Abstract:
Speech is an auditory signal produced by the human speech production system and used
to express ourselves. Today, speech signals are also used in biometric identification
technologies and in human-machine interaction, where a machine can respond differently
to different speakers. Emotion recognition is not a new topic; research and applications
already exist that use a variety of methods to extract specific features from speech
signals. This paper presents a classification analysis of emotional human speech based
only on short-term processing features of the speech signal, using an artificial neural
network approach. Speech rate, pitch, and energy are the most basic features of a speech
signal, yet they still differ significantly between emotions such as anger and sadness.
The most common way to analyze emotion in speech is to extract, from the voice signal,
salient features that are related to the different emotional states.
In the speech pre-processing phase, samples of four basic types of emotional speech are
used: sad, angry, happy, and neutral. The extracted short-term features are then fed into
the input of the classifier, which yields the recognized emotions at its output.
Twenty-three short-term audio signal features, over two frames of each of 40 samples,
are selected and extracted from the speech signals to analyze the human emotions.
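As an illustration of such short-term processing, the following Python sketch frames a signal and computes two example per-frame features (log energy and zero-crossing rate); the frame length, hop size, and choice of features are assumptions made here for illustration, since the full 23-feature set is not enumerated in this abstract.

```python
import numpy as np

def short_term_features(signal, frame_len=400, hop=200):
    """Split a speech signal into short-term frames and compute two
    example features per frame: log energy and zero-crossing rate.
    (The paper uses 23 short-term features; only two are sketched here.)"""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    feats = []
    for f in frames:
        energy = np.log(np.sum(f.astype(float) ** 2) + 1e-10)  # log frame energy
        zcr = np.mean(np.abs(np.diff(np.sign(f))) > 0)         # zero-crossing rate
        feats.append([energy, zcr])
    return np.array(feats)  # shape: (num_frames, num_features)

# Hypothetical usage: 1 s of 16 kHz audio -> per-frame feature matrix
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
print(short_term_features(x).shape)
```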
These derived data, along with the corresponding emotion target matrix, are used to
design and test the classifier with an artificial neural network pattern recognition
algorithm. A confusion matrix is generated to analyze the classification performance.
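The abstract does not specify the exact network, so the following scikit-learn sketch only illustrates an analogous pattern-recognition workflow; the hidden-layer size, the 40 × 23 feature matrix, and the synthetic labels are hypothetical stand-ins for the paper's actual data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

EMOTIONS = ["sad", "angry", "happy", "neutral"]

# Hypothetical stand-in data: 40 samples x 23 short-term features,
# with 10 samples per emotion class.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 23))
y = np.repeat(np.arange(4), 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# A small feed-forward network as the pattern-recognition classifier.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

# Confusion matrix: rows = true emotions, columns = predicted emotions.
print(confusion_matrix(y_test, clf.predict(X_test)))
```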
The overall rate of correctly classified emotions is 73.8 % for a network trained twice,
and rises to 95 % when the number of training runs is increased to ten. The accuracy of
the neural network system is thus improved by repeated training. The overall system
delivers reliable performance, correctly classifying more than 85 % of a new,
non-trained dataset.
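One way to sketch "training the network multiple times" is warm-started refitting, where each additional fit call continues from the current weights; the data, network size, and number of runs below are again hypothetical assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical data: features for trained vs. new (non-trained) utterances.
rng = np.random.default_rng(1)
X_train = rng.standard_normal((30, 23))
y_train = np.repeat(np.arange(4), [8, 8, 7, 7])
X_new, y_new = rng.standard_normal((10, 23)), rng.integers(0, 4, size=10)

# warm_start=True makes each .fit() call continue from the current weights,
# mimicking repeated training passes over the same network.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=200,
                    warm_start=True, random_state=0)
for run in range(1, 11):
    clf.fit(X_train, y_train)
    print(f"after {run} training run(s): "
          f"accuracy on new data = {clf.score(X_new, y_new):.2f}")
```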