THE 3rd INTERNATIONAL SCIENTIFIC CONFERENCES OF STUDENTS AND YOUNG RESEARCHERS
dedicated to the 99th anniversary of the National Leader of Azerbaijan Heydar Aliyev
features in audio for time-frequency decompositions [1]. Mathematically,
STFT can be defined as below:
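$$S(m, k) = \sum_{n=0}^{N-1} x(n + mH)\, w(n)\, e^{-i 2\pi k n / N}$$
where $x$ is the signal, $w$ is the window function, $N$ is the window
length, $H$ is the hop size, $m$ is the frame index and $k$ is the
frequency bin.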
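The STFT definition above can be sketched directly in NumPy. This is a minimal illustration, not the implementation used in the paper: the Hann window, the frame length of 256, the hop size of 128 and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Naive STFT: S[m, k] = sum_n x[n + m*hop] * w[n] * exp(-2j*pi*k*n/frame_len)."""
    w = np.hanning(frame_len)                    # window w(n); Hann is an assumption
    n_frames = 1 + (len(x) - frame_len) // hop   # only frames that fit entirely
    S = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for m in range(n_frames):
        frame = x[m * hop : m * hop + frame_len] * w
        S[m] = np.fft.rfft(frame)                # DFT over n, bins k = 0..frame_len/2
    return S

fs = 8000                                        # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                           # one second of samples
x = np.sin(2 * np.pi * 1000 * t)                 # 1 kHz test tone
S = stft(x)
peak_bin = int(np.argmax(np.abs(S[0])))          # strongest bin in the first frame
peak_hz = peak_bin * fs / 256                    # convert bin index to Hz
```

The magnitude `np.abs(S)` is what is usually rendered as a spectrogram image; for the test tone the energy concentrates in the bin that corresponds to 1 kHz.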
MFCC: After the Fourier transform, Mel-frequency cepstral coefficients
(MFCC) can be obtained, which is the most commonly recommended method for
speech recognition applications. The reason MFCC works well lies in the
nature of speech signals. Speech is essentially the convolution of the
vocal tract response with the glottal pulse [2]. Because in real life only
the resulting speech signal is available as an output, it has to be
decomposed into these two parts in order to extract features from it.
$$C(x(t)) = F^{-1}\left[\log\left(F[x(t)]\right)\right]$$
where $C(x(t))$ is the cepstrum of $x(t)$.
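The decomposition idea behind the cepstrum can be checked numerically: taking the logarithm of the spectrum turns a convolution of two signals into a sum of their cepstra. The sketch below is illustrative only. It uses the real cepstrum (log of the spectral magnitude, an assumption, since the formula in the text does not state it), circular convolution, and random signals that merely stand in for the glottal pulse and the vocal tract response. A full MFCC pipeline would additionally apply a mel filterbank and a DCT, which are omitted here.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)   # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

rng = np.random.default_rng(0)
source = rng.standard_normal(64)     # hypothetical stand-in for the glottal pulse
tract = rng.standard_normal(64)      # hypothetical stand-in for the vocal tract

# Circular convolution of the two signals, computed in the frequency domain.
speech = np.fft.ifft(np.fft.fft(source) * np.fft.fft(tract)).real

c_speech = real_cepstrum(speech)
c_sum = real_cepstrum(source) + real_cepstrum(tract)
additive = bool(np.allclose(c_speech, c_sum, atol=1e-6))
```

Because the two components add in the cepstral domain, they can in principle be separated there, which is what makes cepstral features convenient for speech.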
Aside from these, linear prediction coefficients (LPC), perceptual linear
prediction coefficients (PLP), RASTA-PLP, power-normalized cepstral
coefficients (PNCC) and deep bottleneck features (DBNF) can also be used
for speech recognition applications.
CNN: A convolutional neural network (CNN) is a type of neural network
that consists of three main kinds of layers: convolutional, pooling, and
fully-connected layers. As layers are added, complexity increases, but
larger parts of the image can be identified. A CNN has three key
properties [3]:
Ability to decrease non-white noise effect – locality.
Model robustness can be increased, overfitting and number of weights can be decreased – weight sharing.
Size can be reduced – pooling.
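The three properties listed above can be made concrete with a small NumPy sketch. It is a toy illustration under assumed sizes (a 6x6 input, one 3x3 averaging kernel, 2x2 max pooling), not the network used in the paper. Each output value depends only on a local 3x3 patch (locality), the same 9 kernel weights are reused at every position (weight sharing), and pooling halves each spatial dimension.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Locality: only the kh x kw patch at (i, j) contributes.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    h = fmap.shape[0] // size * size
    w = fmap.shape[1] // size * size
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "spectrogram"
kernel = np.ones((3, 3)) / 9.0                     # one shared 3x3 averaging kernel
fmap = conv2d_valid(image, kernel)                 # feature map, shape (4, 4)
pooled = max_pool2d(fmap)                          # pooled map, shape (2, 2)

shared_weights = kernel.size                       # weights in the convolutional layer
dense_weights = image.size * fmap.size             # weights a dense layer would need
```

For this toy case the convolution needs 9 weights, whereas a fully-connected mapping from the same input to an output of the same size would need 576, which is the saving that weight sharing provides.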
The block diagram for the overall process is given on the right. Images
obtained from the STFT and MFCC operations can then be given to the CNN
for training and building the model. With STFT features the results are
low, whereas MFCC features generate higher results, which shows the
importance of MFCC in comparison with using only STFT. The discrepancy
between the STFT and MFCC methods comes from the behavior of speech
signals: because their power is mostly in the low frequency bands, MFCC,
whose mel scale resolves low frequencies more finely, can represent the
signal better and give better results.
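The low-frequency emphasis of the mel scale can be seen by placing band edges uniformly in mel and converting them back to Hz. The sketch below assumes the commonly used conversion mel = 2595 log10(1 + f/700), which is one of several variants and is not taken from this paper; the band count and the 0 to 4000 Hz range are likewise illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Hz to mel using a common formula (an assumed variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_bands=10, f_min=0.0, f_max=4000.0):
    """Band edges in Hz that are equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands + 1)
    return mel_to_hz(mels)

edges = mel_band_edges()
widths = np.diff(edges)    # bandwidth of each band in Hz

for lo, hi, width in zip(edges[:-1], edges[1:], widths):
    print(f"{lo:7.1f} to {hi:7.1f} Hz  (width {width:6.1f} Hz)")
```

The printed bands get wider as frequency increases, so more of the feature dimensions are spent on the low band, where most of the speech power lies.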