Acoustic–phonetic features for the automatic classification of the American English stop consonants are investigated. Features studied in the literature are evaluated for their information content, and new features are proposed. A statistically guided, knowledge-based, acoustic–phonetic system for the automatic classification of stops in speaker-independent continuous speech is proposed. The system uses a new auditory-based front-end and incorporates new algorithms for extracting and manipulating the acoustic–phonetic features that proved to be rich in information content. Recognition experiments are performed using hard-decision algorithms on stops extracted from the continuous speech of 60 TIMIT speakers (not used in the design process) from seven different dialects of American English. An accuracy of 96% is obtained for voicing detection, 90% for place-of-articulation detection, and 86% for the overall classification of stops.
Index Terms— Acoustic–phonetic, feature extraction, phoneme
recognition, speech recognition, stop consonants.
NOMENCLATURE
ALSD: Average localized synchrony detector developed by the authors [3], [5].
Burst Spectrum: Spectral shape during the burst (i.e., release) of the stop.
BF: Burst frequency, defined in (1).
DRHF: Dominance relative to the highest filters, defined in (4).
Fi: The ith formant.
GSD: Generalized synchrony detector developed by Seneff [36], [37].
LINP: Laterally inhibited MDP, defined in (5).
MNSS: Maximum normalized spectral slope, defined in (2).
MDP: Most dominant peak, defined as the peak with the largest amplitude or slope.
Prevoicing: Voicing during the closure period of the stop.
Manuscript received June 29, 1999; revised August 15, 2001. This work was supported under a grant from the Catalyst Foundation. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Rafid A. Sukkar.
A. M. A. Ali is with Texas Instruments, Inc., Research and Development, Warren, NJ 07059 USA, and also with the Department of Electrical Engineering, University of Pennsylvania, Philadelphia, PA 19104-6390 USA (e-mail: ahm@ee.upenn.edu).
J. Van der Spiegel is with the Department of Electrical Engineering, University of Pennsylvania, Philadelphia, PA 19104-6390 USA (e-mail: jan@ee.upenn.edu).
P. Mueller is with Corticon, Inc., King of Prussia, PA 19406 USA (e-mail: cortion@aol.com).
Publisher Item Identifier S 1063-6676(01)09663-8.
VOT: Voicing onset time, the time from the stop release to the voicing onset of the following vowel.
VF2: Second formant of the following vowel.
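To make the MDP entry concrete, the sketch below picks the most dominant peak by amplitude from a single spectral frame. It is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be a list of per-channel amplitudes ordered from low to high frequency, a peak is assumed to be a channel strictly larger than both of its neighbours, and the slope-based variant of the definition is not shown. The function name `most_dominant_peak` is hypothetical.

```python
from typing import Optional, Sequence


def most_dominant_peak(spectrum: Sequence[float]) -> Optional[int]:
    """Return the channel index of the local peak with the largest amplitude.

    Assumptions (illustrative only): `spectrum` holds one amplitude per
    front-end channel, ordered from low to high frequency; a peak is a
    channel strictly greater than both neighbours, so the first and last
    channels are never reported as peaks. Only the amplitude criterion
    is implemented; the slope criterion is omitted.
    """
    peaks = [
        i
        for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1]
    ]
    if not peaks:
        return None  # monotonic or flat frame: no interior peak
    return max(peaks, key=lambda i: spectrum[i])
```

For the frame `[1.0, 3.0, 2.0, 5.0, 4.0, 6.0]` the interior peaks are channels 1 and 3, so the function returns 3; the larger value at the last channel is ignored because edge channels are excluded by design.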
I. INTRODUCTION
Despite the long history of research on the acoustic characteristics of stop consonants, current state-of-the-art automatic speech recognition (ASR) systems still cannot reliably make fine phonemic distinctions within this class of sounds. One of the main reasons is the dynamic, short, speaker- and context-dependent nature of these sounds. Moreover, the information available in the literature is neither sufficient nor consistent enough to be integrated into an ASR system.
The stop consonants /p/, /t/, and /k/ and their voiced cognates /b/, /d/, and /g/ are a class of sounds formed by the greatest degree of obstruction and a complex of movements in the vocal tract. The articulators form an oral occlusion (closure) behind which pressure builds up. The location of the oral occlusion, i.e., the place of articulation, can be bilabial (/p/ and /b/), alveolar (/t/ and /d/), or palatal/velar (/k/ and /g/). During the closure period, the vocal cords may or may not vibrate; if they do, the stop is said to be prevoiced. The closure phase is followed by the release phase, in which the oral occlusion is broken, releasing the air pressure and allowing the air to resume its flow. When a stop is released, an audible burst of noise results. This burst differs from fricative noise in being transient and not prolongable, which is why stops are not continuants.
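The two dimensions described above, place of articulation and voicing, jointly identify each of the six stops. The sketch below is a hypothetical illustration, not the system described in this paper: it shows how a hard place decision and a hard voicing decision could be combined into a single stop label. `Place`, `STOPS`, and `classify_stop` are names introduced here for illustration, and treating voicing as a binary flag is a simplifying assumption.

```python
from enum import Enum


class Place(Enum):
    """Places of articulation for the American English stops."""
    BILABIAL = "bilabial"
    ALVEOLAR = "alveolar"
    VELAR = "palatal/velar"


# The six stops, keyed by (place of articulation, voiced?).
STOPS = {
    (Place.BILABIAL, False): "p",
    (Place.BILABIAL, True): "b",
    (Place.ALVEOLAR, False): "t",
    (Place.ALVEOLAR, True): "d",
    (Place.VELAR, False): "k",
    (Place.VELAR, True): "g",
}


def classify_stop(place: Place, voiced: bool) -> str:
    """Combine a hard place decision and a hard voicing decision into one label."""
    return STOPS[(place, voiced)]
```

Under this scheme a token is labelled correctly only when both decisions are correct, so the overall accuracy can be no higher than the lower of the voicing and place accuracies.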
In this work, we investigate the acoustic–phonetic characteristics of stop consonants. We combine expert knowledge and statistical analysis in a hybrid approach to gain a better understanding of the role of various static and dynamic features, individually and in combination, in the recognition process. The resulting system is best described as a statistically guided knowledge-based system. It uses a new auditory-based front-end that generates mean-rate and synchrony outputs.
We concentrate here on the characteristics responsible for classifying the stops (i.e., detecting their place of articulation and voicing). Extracting the stops, on the other hand, is discussed in more detail elsewhere [5] as part of a segmentation and phoneme categorization system. That system is used in the present experiments to extract the stops and mark their segment boundaries.
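Once a stop's segment boundaries are marked, duration measures such as the VOT defined in the nomenclature follow by subtraction. The sketch below assumes, hypothetically, that three boundaries are available for each token (closure start, release, and voicing onset of the following vowel), all in seconds; the class and field names are illustrative and are not taken from the segmentation system of [5].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StopSegment:
    """Hypothetical container for the boundaries marked for one stop token.

    All times are in seconds from the start of the utterance.
    """
    closure_start: float
    release: float
    voicing_onset: float  # onset of voicing in the following vowel

    @property
    def closure_duration(self) -> float:
        """Time from the start of the closure to the release."""
        return self.release - self.closure_start

    @property
    def vot(self) -> float:
        """Voicing onset time: release to voicing onset of the following vowel."""
        return self.voicing_onset - self.release
```

For example, `StopSegment(1.0, 1.08, 1.135)` gives a closure duration of about 0.08 s and a VOT of about 0.055 s (up to floating-point rounding).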
In the next section, the acoustic–phonetic features of stop consonants reported in the literature are discussed. In the following sections, the results of our research on the characteristics of stop consonants and their automatic classification are presented.
1063–6676/01$10.00 © 2001 IEEE
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 8, NOVEMBER 2001
II. ACOUSTIC–PHONETIC FEATURES