Oleg Ilarionov
Faculty of Information Technology, Taras Shevchenko National University of Kyiv, Ukraine
Anton Astakhov
Faculty of Information Technology, Taras Shevchenko National University of Kyiv, Ukraine
Anna Krasovska
Faculty of Information Technology, Taras Shevchenko National University of Kyiv, Ukraine
Iryna Domanetska
Faculty of Information Technology, Taras Shevchenko National University of Kyiv, Ukraine
Abstract
DOI: https://doi.org/10.17721/AIT.2021.1.06
Speech is the main way people communicate, and listeners receive not only semantic but also emotional information from it. Recognition of emotions by voice is relevant to areas such as psychological care, security systems, lie detection, customer relationship analysis, and video game development. Because human recognition of emotions is subjective, and therefore inexact and time consuming, there is a need for software that can solve this problem. The article reviews the state of the problem of recognizing human emotions by voice. Recent publications are analyzed with respect to the approaches they use: emotion models, datasets, feature extraction methods, and classifiers. Existing systems are found to achieve an average accuracy of about 0.75. The general structure of a system for recognizing human emotions by voice is analyzed, and the corresponding intelligent module is designed and developed. The Unified Modeling Language (UML) is used to create a component diagram and a class diagram. The RAVDESS and TESS datasets were selected to diversify the training sample. A discrete model of emotions (joy, sadness, anger, disgust, fear, surprise, calm, neutral), Mel Frequency Cepstral Coefficients (MFCC) for feature extraction, and a convolutional neural network for classification were used. The neural network was developed using the TensorFlow and Keras machine learning libraries. The spectrogram and waveform plots of the audio signal, as well as accuracy and loss curves, are presented. As a result of the software implementation of the intelligent module for recognizing emotions by voice, validation accuracy was increased to 0.8.
Keywords – recognition of emotions by voice, neural networks, deep learning, convolutional neural networks
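The abstract describes a pipeline of MFCC feature extraction followed by a convolutional neural network classifier. The feature extraction step can be sketched as follows; this is a minimal from-scratch MFCC implementation using only NumPy and SciPy. The frame size, hop length, and filter counts (26 mel bands, 13 coefficients) are illustrative assumptions, not values taken from the article, and a practical system would typically use a library such as librosa for this step.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """MFCC: frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    # Frame the signal and apply a Hamming window to each frame
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        if c > lo:
            fbank[i - 1, lo:c] = (np.arange(lo, c) - lo) / (c - lo)
        if hi > c:
            fbank[i - 1, c:hi] = (hi - np.arange(c, hi)) / (hi - c)
    # Log filterbank energies, then DCT to decorrelate -> cepstral coefficients
    log_energies = np.log(power @ fbank.T + 1e-10)
    return dct(log_energies, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Demo: one second of a 440 Hz tone yields a (frames x coefficients) matrix
sr = 16000
t = np.arange(sr) / sr
features = mfcc(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(features.shape)  # (61, 13)
```

The resulting two-dimensional matrix of coefficients over time is the kind of input a convolutional network, such as the Keras model the article describes, can classify into discrete emotion categories.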
Information about the authors
Oleg Ilarionov. Candidate of Technical Sciences, Associate Professor, Head of the Department of Intellectual Technologies, Faculty of Information Technologies, Taras Shevchenko National University of Kyiv. Areas of research: tools and technologies for developing information systems for managing technological processes and objects of different physical nature.
Anton Astakhov. Graduate of the educational program “Technologies of Artificial Intelligence” of Taras Shevchenko National University of Kyiv.
Anna Krasovska. Candidate of Technical Sciences, Associate Professor, works as an associate professor of the Department of Intellectual Technologies of Taras Shevchenko National University of Kyiv, Kyiv, Ukraine. Research interests include intelligent decision support systems, adaptive intelligent systems in education, multi-agent systems and technologies.
Iryna Domanetska. Candidate of Technical Sciences, Associate Professor, works as an associate professor of the Department of Intellectual Technologies of Taras Shevchenko National University of Kyiv, Kyiv, Ukraine. Areas of research are system-technical research in the field of IT, neural network technologies and their application, and adaptive learning systems.
References
- Schuller, B.W. (2018) «Speech emotion recognition: two decades in a nutshell, benchmarks, and ongoing trends», Commun. ACM 61 (5), pp. 90–99. [Online]. Available: doi:10.1145/3129340.
- Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G. (2001) «Emotion recognition in human-computer interaction», IEEE Signal Process. Mag. 18 (1), pp. 32–80. [Online]. Available: doi:10.1109/79.911197.
- Huahu, X., Jue, G., Jian, Y. «Application of speech emotion recognition in intelligent household robot», in International Conference on Artificial Intelligence and Computational Intelligence, 2010, Vol.1, pp. 537–541.
- Yoon WJ., Cho YH., Park KS. A Study of Speech Emotion Recognition and Its Application to Mobile Services, ser. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2007, vol 4611.
- Gupta, P., Rajput, N. «Two-stream emotion recognition for call center monitoring», in Proc. Interspeech 2007, pp.2241–2244.
- Szwoch, M., Szwoch, W. «Emotion recognition for affect aware video games», in Image Processing & Communications Challenges 6, Springer International Publishing, Cham, vol. 313, pp. 227–236.
- Lancker, D.V., Cornelius, C., Kreiman, J. «Recognition of emotional-prosodic meanings in speech by autistic, schizophrenic, and normal children». Develop. Neuropsychol. vol. 5 (2–3), pp. 207–226, 1989.
- Low, L.A., Maddage, N.C., Lech, M., Sheeber, L.B., Allen, N.B. (2011) «Detection of clinical depression in adolescents’ speech during family interactions», IEEE Trans. Biomed. Eng. vol.58, issue 3, pp. 574–586.
- Ververidis, D., Kotropoulos, C. «Emotional Speech Recognition: Resources, Features, and Methods», Speech Communication, vol.48, issue 9, pp. 1162-1181, 2006, [Online]. Available: http://dx.doi.org/10.1016/j.specom.2006.04.003
- Ayadi, M.E., Kamel M.S., Karray F. «Survey on speech emotion recognition: Features, classification schemes, and databases», Pattern Recognition, vol. 44, issue 3, pp. 572-587, 2011.
- Koolagudi, S. G., & Rao, K. S. «Emotion recognition from speech: a review», International Journal of Speech Technology, vol.15 issue 2, pp. 99–117, 2012.
- Anagnostopoulos, C.N., Iliou, T., Giannoukos, I. «Features and classifiers for emotion recognition from speech: A survey from 2000 to 2011», Artif. Intell. Rev., vol. 43, pp. 155–177, 2012.
- Ramakrishnan, S. Recognition of emotion from speech: A review. In: Ramakrishnan, S. (Ed.), Speech Enhancement, Modeling and Recognition Algorithms and Applications, Intec, 2012.
- Sailunaz, K., Dhaliwal, M., Rokne, J., Alhajj, R. «Emotion detection from text and speech: a survey» Soc. Netw. Anal. Min. 8(1), pp.1–26, 2018.
- Basu, S., Chakraborty, J., Bag, A., Aftabuddin, M. «A review on emotion recognition using speech», in International Conference on Inventive Communication and Computational Technologies (ICICCT), 2017, pp. 109–114.
- Livingstone SR, Russo FA (2018) «The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English». PLoS ONE 13(5): e0196391. [Online]. Available: https://doi.org/10.1371/journal.pone.0196391.
- Dupuis, K., Pichora-Fuller, M.K. «Toronto emotional speech set (TESS)», 2010. [Online]. Available: https://tspace.library.utoronto.ca/handle/1807/24487

Published
2021-11-04
How to Cite
Oleg Ilarionov, Anton Astakhov, Anna Krasovska, Iryna Domanetska, “Intelligent module for recognizing emotions by voice”, Advanced Information Technology, vol. 1, pp. 46–52, 2021.
Issue
Advanced Information Technology № 1 (1), 2021
Section
Machine learning and pattern recognition