Key Words : Digital signal processing, Machine learning,
Segmentation, Fundamental frequencies, pyAudioAnalysis,
Fundamental frequency estimation, Time-Frequency graph.
1. INTRODUCTION
Digital signal processing (DSP) is the numerical manipulation
of signals, usually to measure, filter, produce, or compress
continuous analog signals. It is characterized by the use of
digital signals to represent these signals as discrete-time,
discrete-frequency, or other discrete-domain signals in the
form of a sequence of numbers or symbols, which permits
digital processing of the signals. Numerical methods require
a digital signal, such as one produced by an analog-to-digital
converter. Digital signal processing and analog signal
processing are subfields of signal processing. DSP
applications include speech and audio signal processing,
sonar and radar signal processing, spectral estimation,
statistical processing, digital image processing, and signal
processing for communication.
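As a rough illustration of how an analog signal becomes a
sequence of numbers, the sketch below samples a sine wave at a
fixed rate and rounds each sample to a signed integer level,
which is what an idealized analog-to-digital converter does.
The function name `sample_and_quantize` and all parameter
values are assumptions chosen for this example and are not
taken from this work.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, n_bits):
    """Sample a unit-amplitude sine wave and quantize it to signed integers.

    Hypothetical helper for illustration only: it models an idealized
    converter with no noise, clipping, or anti-aliasing filter.
    """
    max_level = 2 ** (n_bits - 1) - 1  # largest positive level, e.g. 127 for 8 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                        # time of the n-th sample
        x = math.sin(2 * math.pi * freq_hz * t)       # "analog" value at time t
        samples.append(round(x * max_level))          # quantize to an integer level
    return samples

# One period of a 1 kHz tone sampled at 8 kHz with 8-bit resolution.
digital = sample_and_quantize(freq_hz=1000, sample_rate_hz=8000,
                              n_samples=8, n_bits=8)
print(digital)
```

The resulting list of integers is the discrete-time,
discrete-amplitude representation on which all later
processing operates.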
Segmentation is a very important processing stage for most
audio analysis applications. The goal is to split an
uninterrupted audio signal into homogeneous segments.
Segmentation can be either:
Supervised: some form of supervised knowledge is used to
classify and segment the input signals. This is achieved
either by applying a classifier to assign successive
fixed-size segments to a set of predefined classes, or by
using a hidden Markov model (HMM) approach to achieve joint
segmentation-classification.
Unsupervised: a supervised model is not available and the
detected segments are clustered (example: speaker
diarization).
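The supervised, fixed-size variant described above can be
sketched as follows: each successive window is labeled, and
neighbouring windows with the same label are merged into one
segment. This is a minimal sketch under stated assumptions.
The energy-threshold function `classify_window` is a toy
stand-in for a trained classifier, and all names and
parameter values are hypothetical.

```python
def classify_window(window, energy_threshold=0.01):
    """Toy stand-in for a trained classifier (assumption): label by mean energy."""
    energy = sum(x * x for x in window) / len(window)
    return "sound" if energy > energy_threshold else "silence"

def segment_fixed_windows(signal, sample_rate, window_s=1.0,
                          classify=classify_window):
    """Label successive fixed-size windows and merge equal neighbours.

    Returns a list of [start_seconds, end_seconds, label] entries.
    A trailing partial window is ignored.
    """
    win = int(window_s * sample_rate)  # window length in samples
    segments = []
    for start in range(0, len(signal) - win + 1, win):
        label = classify(signal[start:start + win])
        t0 = start / sample_rate
        t1 = (start + win) / sample_rate
        if segments and segments[-1][2] == label:
            segments[-1][1] = t1               # extend the current segment
        else:
            segments.append([t0, t1, label])   # start a new segment
    return segments

# Synthetic signal at 10 samples per second: 2 s quiet, 2 s loud, 1 s quiet.
signal = [0.0] * 20 + [0.5, -0.5] * 10 + [0.0] * 10
segments = segment_fixed_windows(signal, sample_rate=10, window_s=1.0)
print(segments)
```

In practice the `classify` argument would be replaced by a
model trained on labeled audio features; the merging step
stays the same.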
Applications of audio content analysis can be grouped into
two categories. One is to divide an audio stream into
homogeneous regions, and the other is to divide a speech
stream into segments belonging to different speakers.
Audio segmentation algorithms can be divided into three
general categories. In the first category, classifiers are
designed: features are extracted in the time domain and the
frequency domain, and a classifier is then used to
discriminate audio signals based on their content. The second
category extracts features from statistics, which the
classifier then uses for discrimination. These types of
features are known as posterior probability-based features.
The classifier needs a large amount of training data to give
accurate results. The third category emphasizes building
effective classifiers. The classifiers used in this category
are the Bayesian information criterion, the Gaussian
likelihood ratio, and a hidden Markov model (HMM) classifier.
These classifiers also give good results when a large amount
of training data is provided.
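To make the Bayesian information criterion mentioned above
more concrete, the following is a minimal sketch of one common
formulation of BIC-based change detection on a one-dimensional
feature sequence. It compares modelling the whole sequence
with a single Gaussian against modelling the two halves
separately. The function names, the `margin` parameter, and
the penalty weight are assumptions made for illustration, and
this is not presented as the algorithm proposed in this work.

```python
import math

def variance(xs):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)

def delta_bic(features, split, penalty_weight=1.0):
    """Delta-BIC for splitting a 1-D sequence at index `split`.

    Positive values favour two separate Gaussians, i.e. a change point.
    """
    left, right = features[:split], features[split:]
    n, n1, n2 = len(features), len(left), len(right)
    gain = 0.5 * (n * math.log(variance(features))
                  - n1 * math.log(variance(left))
                  - n2 * math.log(variance(right)))
    # A second 1-D Gaussian adds 2 parameters (mean and variance).
    penalty = penalty_weight * 0.5 * 2 * math.log(n)
    return gain - penalty

def best_change_point(features, margin=5):
    """Return (index, score) of the best split, or (None, score) if no split helps."""
    candidates = range(margin, len(features) - margin + 1)
    split = max(candidates, key=lambda s: delta_bic(features, s))
    score = delta_bic(features, split)
    return (split, score) if score > 0 else (None, score)

# Hypothetical feature track whose level jumps at index 20.
track = [0.0, 0.1] * 10 + [5.0, 5.1] * 10
split, score = best_change_point(track)
print(split, score > 0)
```

Real systems apply the same test to multidimensional features
such as MFCCs, where the variances become covariance
determinants and the penalty grows with the feature dimension.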
The analysis of superimposed speech is a complex problem, and
systems with improved performance are required. In many audio
processing applications, audio segmentation plays a vital
role in the preprocessing step. It also has a significant
impact on frequency recognition performance. For this reason,
a fast and optimized audio classification and segmentation
algorithm is proposed that can be used in real-time
multimedia applications. The audio input is classified and
segmented into four basic audio types: natural-eco, noise,
environment sound, and silence. The proposed algorithm
requires less training data while achieving high accuracy;
that is, the misclassification rate is minimal.
[2] The proposed method of signal segmentation is based on
two sliding overlapping windows and the detection of changes
in signal properties. [4] Most research has integrated
segmentation approaches with intelligent techniques such as
neural networks, support vector machines