Tezislər / Theses


Short-time Fourier transform



BHOS Tezisler 2022 17x24sm

THE 3rd INTERNATIONAL SCIENTIFIC CONFERENCES OF STUDENTS AND YOUNG RESEARCHERS
dedicated to the 99th anniversary of the National Leader of Azerbaijan Heydar Aliyev

Short-time Fourier transform: With the STFT, the power spectrum of the sound
can be obtained; in other words, we build a picture model for each command.
Pictures are simply arrays that can be processed by machine learning models
and used to train them. The short-time Fourier transform, as the name implies,
applies the Fourier transform to small segments of the signal, selected by a
window function, rather than to the whole signal. It is a widely used method
for extracting audio features through time-frequency decomposition [1].
Mathematically, the STFT can be defined as below:
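$$S(m, k) = \sum_{n=0}^{N-1} x(n + mH) \, w(n) \, e^{-i 2 \pi n k / N}$$
where m is the frame index, k is the frequency bin, H is the hop size, w(n) is
the window function, and N is the window length.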
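The STFT definition can be sketched directly in code. The following is a minimal illustration using only the Python standard library, not the implementation used in the paper. The Hann window, the window length N = 32, the hop size H = 16, and the toy sinusoid are all assumptions chosen for the example. In practice a numerical library with a fast Fourier transform would normally replace the direct sum.

```python
import cmath
import math


def hann(N):
    """Periodic Hann window of length N (an assumed choice for w(n))."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def stft(x, N, H):
    """Return S[m][k] = sum_n x[n + m*H] * w[n] * exp(-2j*pi*n*k/N)."""
    w = hann(N)
    n_frames = 1 + (len(x) - N) // H  # keep only frames that fit entirely in x
    S = []
    for m in range(n_frames):
        frame = [x[n + m * H] * w[n] for n in range(N)]
        S.append([
            sum(frame[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)
        ])
    return S


def power_spectrogram(S):
    """Squared magnitude of every STFT coefficient (the 'picture')."""
    return [[abs(c) ** 2 for c in row] for row in S]


# Toy signal (assumed): a sinusoid with exactly 4 cycles per 32-sample window.
N, H = 32, 16
x = [math.sin(2 * math.pi * 4 * n / N) for n in range(128)]
P = power_spectrogram(stft(x, N, H))

# Strongest non-negative-frequency bin in each frame.
peak_bins = [max(range(N // 2), key=lambda k: row[k]) for row in P]
```

Each row of `P` is one time frame and each column one frequency bin, so `P` is the two-dimensional array that can be treated as an image. For this toy signal every frame peaks at bin 4, the frequency of the sinusoid.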
MFCC: After the Fourier transform, Mel-frequency cepstral coefficients (MFCC)
can be obtained; this is the most recommended method for speech recognition
applications. The reason lies in the nature of speech signals. Speech is
simply the convolution of the vocal tract frequency response with the glottal
pulse [2]. Because in real life we only have the speech signal, which is the
output of this convolution, it should be decomposed into these two parts to
extract features from it.
$$C(x(t)) = F^{-1}\big[\log\big(F[x(t)]\big)\big] \quad \text{(cepstrum)}$$
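The reason the cepstrum helps with this decomposition is that the Fourier transform turns convolution into multiplication, and the logarithm turns multiplication into addition. The sketch below illustrates this under stated assumptions: it uses the real cepstrum (log of the spectral magnitude), circular convolution, and two hypothetical 16-sample sequences standing in for the vocal tract response and the glottal excitation. None of these values come from the paper.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (or its inverse)."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[n] * cmath.exp(sign * 2j * math.pi * n * k / N) for n in range(N))
        for k in range(N)
    ]
    return [v / N for v in out] if inverse else out


def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum of x."""
    log_mag = [math.log(abs(X)) for X in dft(x)]
    return [c.real for c in dft(log_mag, inverse=True)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    N = len(a)
    return [sum(a[m] * b[(n - m) % N] for m in range(N)) for n in range(N)]


N = 16
# Hypothetical stand-ins: a decaying filter response and a two-pulse excitation.
vocal_tract = [0.5 ** n for n in range(N)]
glottal = [1.0 if n == 0 else 0.6 if n == 4 else 0.0 for n in range(N)]
speech = circular_convolve(vocal_tract, glottal)

c_speech = real_cepstrum(speech)
c_sum = [a + b for a, b in zip(real_cepstrum(vocal_tract), real_cepstrum(glottal))]

# Largest difference between the cepstrum of the convolved signal and the
# sum of the two individual cepstra.
max_diff = max(abs(a - b) for a, b in zip(c_speech, c_sum))
```

Here `max_diff` is zero up to floating-point rounding: the two components, convolved in the time domain, appear as a plain sum in the cepstral domain, where they are easier to separate.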
Aside from that, linear prediction coefficients (LPC), perceptual linear
prediction coefficients (PLP), RASTA-PLP, power-normalized cepstral
coefficients (PNCC), and deep bottleneck features (DBNF) can be used for
speech recognition applications.
CNN: A convolutional neural network is a form of neural network that consists
of three main types of layer: convolutional, pooling, and fully-connected
layers. As layers are added, complexity increases, but more parts of the image
are identified. A CNN has three key properties [3]:

• Ability to decrease the effect of non-white noise – locality.
• Model robustness can be increased, and overfitting and the number of
  weights can be decreased – weight sharing.
• Size can be reduced – pooling.
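The three properties above can be seen in a very small example. The sketch below is an illustration in plain Python, not the network used in the paper. The 6x6 input, the 3x3 kernel values, and the 2x2 pooling window are all hypothetical.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: one small kernel slides over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(fmap, size):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Hypothetical 6x6 "spectrogram" with a single bright time-frequency cell.
image = [[0.0] * 6 for _ in range(6)]
image[2][3] = 1.0
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]  # the same 9 weights are reused everywhere

fmap = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool(fmap, 2)     # 2x2 map after pooling

# Weight sharing: the convolution needs 9 weights, while a fully-connected
# layer mapping the 36 inputs to the same 16 outputs would need 36 * 16.
conv_weights = len(kernel) * len(kernel[0])
dense_weights = (len(image) * len(image[0])) * (len(fmap) * len(fmap[0]))
```

Locality shows up in `fmap`: only outputs whose 3x3 neighbourhood contains the bright cell are non-zero. Weight sharing shows up in the comparison of `conv_weights` with `dense_weights`, and pooling in the reduction from a 4x4 to a 2x2 map.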
The block diagram for the overall process is given on the right. The images
obtained from the STFT and MFCC operations can thus be given to the CNN for
training and building the model. With STFT features the results are low,
whereas MFCC features give higher results, which shows the importance of MFCC
in comparison with using only the STFT. The discrepancy between the two
methods comes from the behavior of speech signals: because their power is
mostly in the low frequency band, MFCC can represent the signal better and
give better results.
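One way to see why a mel-based representation favors the low frequency band is to look at the mel scale itself. The sketch below uses a commonly quoted mel formula, m = 2595 log10(1 + f / 700); the paper does not state which variant it used, and the 8 kHz upper limit and the choice of 10 bands are assumed values for illustration only.

```python
import math


def hz_to_mel(f_hz):
    """A common mel-scale formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Edges (in Hz) of n_bands bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]


# Assumed analysis range of 0 to 8000 Hz, split into 10 mel-spaced bands.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [b - a for a, b in zip(edges, edges[1:])]

# Number of complete bands that lie below 1 kHz.
bands_below_1k = sum(1 for e in edges[1:] if e <= 1000.0)
```

The band widths in `widths` grow steadily with frequency, so the low band, where most of the speech power lies, is covered by narrow bands and the high band by wide ones.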
