Tezislər / Theses

Short-time Fourier transform



BHOS Tezisler 2022 17x24sm

THE 3rd INTERNATIONAL SCIENTIFIC CONFERENCES OF STUDENTS AND YOUNG RESEARCHERS
dedicated to the 99th anniversary of the National Leader of Azerbaijan Heydar Aliyev

Short-time Fourier transform: The power spectrum of the sound can be
obtained, or in other words, we build a picture model for each command.
Pictures are simply arrays that can be processed by machine learning models
and used to train them. The short-time Fourier transform (STFT), as the name
implies, applies the Fourier transform to small segments of the signal rather
than to the whole of it, using a window function to select each segment. It is
a widely used method to extract features from audio through time-frequency
decomposition [1]. Mathematically, the STFT can be defined as below:
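$$S(m, k) = \sum_{n=0}^{N-1} x(n + mH)\, w(n)\, e^{-j 2\pi k n / N}$$
where x is the signal, w is a window function of length N, H is the hop size,
m is the frame index, and k is the frequency bin.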
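The definition above can be sketched directly in code. This is a minimal, unoptimized illustration rather than the authors' implementation: the Hann window, the window length N = 32, the hop size H = 16, and the test tone are all assumptions chosen for the example, and a real system would normally use a library FFT instead of the explicit sum.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window of length N (an assumed choice for w)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def stft(x, N, H):
    """Return S[m][k] = sum_n x[n + m*H] * w[n] * exp(-2j*pi*k*n/N).

    Only the non-negative frequency bins k = 0 .. N/2 are computed.
    """
    w = hann(N)
    num_frames = 1 + (len(x) - N) // H
    S = []
    for m in range(num_frames):
        # Windowed segment starting at sample m*H.
        frame = [x[n + m * H] * w[n] for n in range(N)]
        S.append([
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)
        ])
    return S


def power_spectrogram(S):
    """Squared magnitude of every STFT coefficient: the 'picture' of the sound."""
    return [[abs(c) ** 2 for c in row] for row in S]


# Hypothetical test signal: a pure tone that falls exactly on bin 4.
N, H = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / N) for n in range(128)]
P = power_spectrogram(stft(signal, N, H))

# Strongest frequency bin in each frame.
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in P]
```

Each row of `P` is one time frame and each column one frequency bin, so `P` is the two-dimensional array that can be treated as an image.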
MFCC: After the Fourier transform, Mel-frequency cepstral coefficients (MFCC)
can be obtained; this is one of the most commonly recommended feature
extraction methods for speech recognition applications. The reason lies in the
nature of speech signals. Speech is simply the convolution of the vocal tract
response with the glottal pulse [2]. Because in real life we only have the
speech signal itself as an output, it should be decomposed into these two
parts to extract features from it.
$$C(x(t)) = F^{-1}\big[\log\big(F[x(t)]\big)\big] \quad \text{(cepstrum)}$$
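To illustrate why the cepstrum helps separate two convolved components, here is a small sketch. It assumes the real cepstrum (the logarithm is taken of the spectrum magnitude, with a small `eps` to avoid log of zero), and it uses a hypothetical toy signal: a unit impulse convolved with a single echo. None of these choices come from the paper.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x, eps=1e-12):
    """C = F^{-1}[ log |F[x]| ], returned as real values."""
    log_mag = [math.log(abs(X) + eps) for X in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]


# Toy signal (assumption): impulse plus an echo of gain 0.5 delayed by 8 samples.
N, delay, gain = 64, 8, 0.5
x = [0.0] * N
x[0] = 1.0
x[delay] = gain

c = real_cepstrum(x)

# The echo shows up as a peak at quefrency index `delay`.
peak_quefrency = max(range(1, N // 2), key=lambda q: c[q])
```

Convolution in time becomes multiplication in the spectrum and addition after the logarithm, so the echo appears as an isolated peak at its delay. The same idea lets slowly varying vocal tract information and the glottal excitation occupy different quefrency regions.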
Aside from MFCC, linear prediction coefficients (LPC), perceptual linear
prediction coefficients (PLP), RASTA-PLP, power-normalized cepstral
coefficients (PNCC), and deep bottleneck features (DBNF) can also be used for
speech recognition applications.
CNN: A convolutional neural network is a type of neural network which
consists of three main kinds of layers: convolutional, pooling, and fully
connected layers. As layers are added, complexity increases, but more parts
of the image are identified. CNN has three key properties [3]:

- Locality: the effect of non-white noise can be decreased.
- Weight sharing: model robustness can be increased, while overfitting and
  the number of weights can be decreased.
- Pooling: size can be reduced.
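The three properties can be seen in a tiny hand-written example. This is only a sketch under stated assumptions: the 6x6 image, the 3x3 edge kernel, and the 2x2 max pooling are hypothetical values, and a real model would use a deep learning framework with learned kernels.

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1).

    Each output value depends only on a small neighbourhood (locality),
    and the same kernel weights are reused at every position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + r][j + c] * kernel[r][c]
                for r in range(kh) for c in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + r][j + c] for r in range(size) for c in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 6x6 "image" with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical 3x3 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 0, 1]] * 3

feature_map = conv2d_valid(image, kernel)    # 4x4 feature map
pooled = max_pool2d(feature_map)             # 2x2 after pooling
num_weights = len(kernel) * len(kernel[0])   # 9 weights, independent of image size
```

The convolution uses the same 9 weights no matter how large the input is, and pooling shrinks the 4x4 feature map to 2x2 while keeping the strongest edge response.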
The block diagram for the overall process is given on the right. The images
obtained from the STFT and MFCC operations can be given to the CNN for
training and building the model. With STFT features the results are low,
whereas MFCC features yield higher results, which shows the importance of
MFCC compared with using only the STFT. The discrepancy between the STFT and
MFCC methods stems from the behavior of speech signals: because their power
is concentrated mostly in the low frequency band, and the mel scale resolves
low frequencies more finely than high ones, MFCC can represent the signal
better and give better results.
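The link between low-frequency power and the mel scale can be made concrete with a short sketch. It assumes one common mel formula, mel = 2595 log10(1 + f/700); the paper does not state which variant was used, and the 10 filters and the 0 to 8000 Hz range are hypothetical values.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (one common formula; an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, num_filters + 1)]


# Hypothetical filterbank: 10 filters between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(10, 0.0, 8000.0)

# Distance in Hz between neighbouring filter centers.
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

The gaps grow with frequency, so more filters cover the low band where speech power is concentrated, while a linearly spaced STFT axis spends the same resolution on every band.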
