4.1.2. Acoustic representation
The most common type of spectrogram uses the linear Hertz frequency scale, on which the difference between 100 Hz and 200 Hz equals the difference between 1000 Hz and 1100 Hz. Our perception of frequency, however, is non-linear. We hear the step from 100 Hz to 200 Hz as an octave, and we hear the step from 1000 Hz to 2000 Hz as an octave as well. The ear thus evaluates frequency differences not absolutely but relatively, that is, in a logarithmic manner. For this reason the Barkfilter uses the Bark scale, which is roughly linear below 1000 Hz and roughly logarithmic above 1000 Hz (Zwicker and Feldtkeller, 1967).
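The contrast between the linear Hertz scale and the Bark scale can be illustrated with a minimal sketch. The conversion 7·asinh(f/650) is an assumption here: to our knowledge it is the approximation that the PRAAT manual documents for its hertzToBark function, but the text does not state which formula underlies the Barkfilter. The function name `hertz_to_bark` is hypothetical.

```python
import math


def hertz_to_bark(f_hz: float) -> float:
    """Approximate Hertz-to-Bark conversion (assumed formula: 7 * asinh(f / 650))."""
    return 7.0 * math.asinh(f_hz / 650.0)


# The same 100 Hz step on the linear scale covers less of the Bark scale
# at high frequencies than at low frequencies.
low_step = hertz_to_bark(200.0) - hertz_to_bark(100.0)
high_step = hertz_to_bark(1100.0) - hertz_to_bark(1000.0)

# On a purely logarithmic scale both octave steps have the same size.
low_octave = math.log2(200.0 / 100.0)
high_octave = math.log2(2000.0 / 1000.0)

print(f"100 -> 200 Hz:   {low_step:.2f} Bark, {low_octave:.1f} octave")
print(f"1000 -> 1100 Hz: {high_step:.2f} Bark")
print(f"1000 -> 2000 Hz: {high_octave:.1f} octave")
```

Under this approximation the 100 Hz step near 1000 Hz spans roughly half as many Bark as the 100 Hz step near 100 Hz, which reflects the compression of higher frequencies described above.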
The commonly used type of spectrogram represents the power spectral density per frequency per time. The power spectral density is the power per unit of frequency as a function of frequency. In the Barkfilter the power spectral density is expressed in decibels (dB). “The decibel scale is a way of expressing sound amplitude that is better correlated with perceived loudness” (Johnson, 1997, p. 53). The decibel scale is logarithmic: multiplying the sound pressure by ten corresponds to an increase of 20 dB. On the decibel scale intensities are expressed relative to the auditory threshold; a sound pressure of 0.00002 Pa, the auditory threshold, corresponds to 0 dB (Rietveld and Van Heuven, 1997, p. 199).
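The relation between sound pressure and decibels can be checked with a short sketch. It uses the standard definition 20·log10(p / p_ref) with the reference pressure of 0.00002 Pa given in the text; the names `P_REF` and `pressure_to_db` are hypothetical and do not refer to any PRAAT function.

```python
import math

P_REF = 0.00002  # auditory threshold in Pa, as given in the text


def pressure_to_db(p_pa: float, p_ref: float = P_REF) -> float:
    """Sound pressure level in dB relative to the reference pressure."""
    return 20.0 * math.log10(p_pa / p_ref)


threshold_db = pressure_to_db(P_REF)          # pressure at the auditory threshold
tenfold_db = pressure_to_db(10.0 * P_REF)     # ten times the threshold pressure

print(f"threshold: {threshold_db:.1f} dB")
print(f"ten times the threshold pressure: {tenfold_db:.1f} dB")
```

The factor 20 arises because power is proportional to the square of the pressure, so a tenfold increase in pressure is a hundredfold increase in power, and 10·log10(100) = 20.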
A Barkfilter is created from a sound by band filtering in the frequency domain with a bank of filters. In PRAAT the lowest band has a central frequency of 1 Bark by default, and each band has a width of 1 Bark. There are 24 bands, corresponding to the first 24 critical bands of hearing as found along the basilar membrane (Zwicker and Fastl, 1990). A critical band is a frequency region within which two tones influence each other’s perceptibility (Rietveld and Van Heuven, 1997). Because the Bark scale is roughly logarithmic at higher frequencies, the higher bands cover a wider frequency range in Hertz than the lower bands.
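The widening of the higher bands can be made concrete with a rough sketch that lays out 24 bands of 1 Bark, the lowest centred at 1 Bark, and converts their edges to Hertz. Two assumptions are made that the text does not confirm: the inverse conversion 650·sinh(z/7), which mirrors the approximation documented in the PRAAT manual, and the treatment of each band as a plain interval of centre ± 0.5 Bark, whereas the actual filters have smooth, overlapping shapes. All names are hypothetical.

```python
import math

N_BANDS = 24         # number of bands, from the text
FIRST_CENTER = 1.0   # central frequency of the lowest band in Bark, from the text
BAND_WIDTH = 1.0     # width of each band in Bark, from the text


def bark_to_hertz(z_bark: float) -> float:
    """Approximate Bark-to-Hertz conversion (assumed formula: 650 * sinh(z / 7))."""
    return 650.0 * math.sinh(z_bark / 7.0)


# Each entry: (centre in Bark, lower edge in Hz, upper edge in Hz).
bands = []
for i in range(N_BANDS):
    center = FIRST_CENTER + i * BAND_WIDTH
    low_hz = bark_to_hertz(center - BAND_WIDTH / 2.0)
    high_hz = bark_to_hertz(center + BAND_WIDTH / 2.0)
    bands.append((center, low_hz, high_hz))

# Width of each band in Hertz; it grows with the band number.
widths_hz = [high_hz - low_hz for _, low_hz, high_hz in bands]

for (center, low_hz, high_hz), width in zip(bands, widths_hz):
    print(f"{center:4.0f} Bark: {low_hz:8.1f} to {high_hz:8.1f} Hz (width {width:7.1f} Hz)")
```

Every band has the same width in Bark, but the printed widths in Hertz increase from band to band, so the highest bands summarize a far wider range than the lowest.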
We used the default PRAAT settings for the Barkfilter: the sound signal is analysed every 0.005 seconds with an analysis window of 0.015 seconds. Other settings may give different results, but since it was not obvious a priori which settings would give the best results, we restricted ourselves to the defaults. Figure 4 shows Barkfilters for some segments.
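The effect of the time step and window length can be sketched with a generic framing scheme. This is only an illustration under stated assumptions: the sample rate of 16000 Hz and the one-second signal are hypothetical, the function `frame_centers` is not a PRAAT function, and PRAAT's exact frame placement may differ from this simple left-aligned layout.

```python
TIME_STEP = 0.005      # seconds between successive frames, from the text
WINDOW_LENGTH = 0.015  # analysis window in seconds, from the text


def frame_centers(n_samples: int, sample_rate: int) -> list[float]:
    """Centre times (in seconds) of all analysis windows that fit in the signal."""
    window = round(WINDOW_LENGTH * sample_rate)  # window length in samples
    step = round(TIME_STEP * sample_rate)        # time step in samples
    if n_samples < window:
        return []
    n_frames = 1 + (n_samples - window) // step
    return [(i * step + window / 2) / sample_rate for i in range(n_frames)]


# Hypothetical example: one second of sound sampled at 16000 Hz.
centers = frame_centers(n_samples=16000, sample_rate=16000)

print(f"number of frames: {len(centers)}")
print(f"first frame centre: {centers[0]:.4f} s")
```

Because the window is three times as long as the time step, consecutive windows overlap by two thirds, so each stretch of the signal contributes to several successive frames.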
Figure 4. Barkfilter spectrograms of some sounds pronounced by John Wells. The upper panels show [i] (left) and [e] (right); the lower panels show [p] (left) and [s] (right).