Abstract:
Speech is an auditory signal produced by the human speech production system and used
to express ourselves. Today, speech signals are also used in biometric identification
technologies and in human-machine interaction, where a machine can respond differently
to different speakers. Emotion recognition is not a new topic; research and applications
already exist that use a variety of methods to extract specific features from speech
signals. This paper presents a classification analysis of emotional human speech based
only on short-term processing features of the speech signal, using an artificial neural
network approach. Speech rate, pitch, and energy are the most basic features of a speech
signal, yet they still differ significantly between emotions such as anger and sadness.
The most common way to analyze emotion in speech is to extract, from the voice signal,
salient features that are related to the different emotional states.
In the speech pre-processing phase, samples of four basic types of emotional speech are
used: sad, angry, happy, and neutral. The extracted short-term features are then fed into
the input of the classifier, which yields the recognized emotions at its output.
Twenty-three short-term audio signal features, over two frames of each of 40 samples,
are selected and extracted from the speech signals to analyze the human emotions.
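As an illustration of such short-term processing, the following Python sketch frames a signal and computes two example per-frame features (log energy and zero-crossing rate); the frame length, hop size, and choice of features are assumptions made here for illustration, since the full 23-feature set is not enumerated in this abstract.

```python
import numpy as np

def short_term_features(signal, frame_len=400, hop=200):
    """Split a speech signal into short-term frames and compute two
    example features per frame: log energy and zero-crossing rate.
    (The paper uses 23 short-term features; only two are sketched here.)"""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    feats = []
    for f in frames:
        energy = np.log(np.sum(f.astype(float) ** 2) + 1e-10)  # log frame energy
        zcr = np.mean(np.abs(np.diff(np.sign(f))) > 0)         # zero-crossing rate
        feats.append([energy, zcr])
    return np.array(feats)  # shape: (num_frames, num_features)

# Hypothetical usage: 1 s of 16 kHz audio -> per-frame feature matrix
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
print(short_term_features(x).shape)
```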
These derived data, along with the corresponding emotion target matrix, are used to
design and test the classifier with an artificial neural network pattern recognition
algorithm. A confusion matrix is generated to analyze the classification performance.
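The abstract does not specify the exact network, so the following scikit-learn sketch only illustrates an analogous pattern-recognition workflow; the hidden-layer size, the 40 × 23 feature matrix, and the synthetic labels are hypothetical stand-ins for the paper's actual data.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

EMOTIONS = ["sad", "angry", "happy", "neutral"]

# Hypothetical stand-in data: 40 samples x 23 short-term features,
# with 10 samples per emotion class.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 23))
y = np.repeat(np.arange(4), 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# A small feed-forward network as the pattern-recognition classifier.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

# Confusion matrix: rows = true emotions, columns = predicted emotions.
print(confusion_matrix(y_test, clf.predict(X_test)))
```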
The overall rate of correctly classified emotions is 73.8 % for a network trained twice,
and rises to 95 % when the number of training runs is increased to ten. The accuracy of
the neural network system is thus improved by repeated training. The overall system
delivers reliable performance, correctly classifying more than 85 % of a new,
non-trained dataset.
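One way to sketch "training the network multiple times" is warm-started refitting, where each additional fit call continues from the current weights; the data, network size, and number of runs below are again hypothetical assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical data: features for trained vs. new (non-trained) utterances.
rng = np.random.default_rng(1)
X_train = rng.standard_normal((30, 23))
y_train = np.repeat(np.arange(4), [8, 8, 7, 7])
X_new, y_new = rng.standard_normal((10, 23)), rng.integers(0, 4, size=10)

# warm_start=True makes each .fit() call continue from the current weights,
# mimicking repeated training passes over the same network.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=200,
                    warm_start=True, random_state=0)
for run in range(1, 11):
    clf.fit(X_train, y_train)
    print(f"after {run} training run(s): "
          f"accuracy on new data = {clf.score(X_new, y_new):.2f}")
```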