EEN540 Project 1

Last edited by Leon D'Angio 15 years, 4 months ago

The Acoustic Features of Speech Sound

Assignment Sheet

     Speech is the preferred means of human communication. Language is the acoustic code that gives meaning to a sequence of spoken sounds. The smallest distinctive unit of spoken language is the phoneme. Each language has its own set of phonemes; English has approximately 40. Every English word is built from some combination of these phonemes.


     Phonemes can be characterized in several ways: by manner of production, manner of articulation, temporal characteristics, or spectral characteristics. This project examines the phonemes of English analytically using a MATLAB function, phoneme_analyzer. The function takes as input the name of the .WAV file, the example word, the phoneme symbol, and the phoneme's start and end times. It produces three figure windows:

     Figure 1: Example word time waveform, 30 ms phoneme waveform, phoneme's magnitude spectrum and spectral envelope

     Figure 2: Example word time waveform, narrowband spectrogram of the word, wideband spectrogram of the word

     Figure 3: 3-D plot of frequency vs. time vs. power spectral density
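     phoneme_analyzer receives the phoneme's location as start and end times, but the 30 ms frame in Figure 1 has to be cut out in samples. The Python sketch below shows one way to do that conversion, assuming the frame is centered on the phoneme's midpoint. phoneme_frame is a hypothetical helper, not code from phoneme_analyzer.m, and the centering rule is an assumption.

```python
def phoneme_frame(samples, fs, start_s, end_s, frame_ms=30.0):
    """Hypothetical helper (not from phoneme_analyzer.m): return a frame_ms-long
    slice of `samples` centred on the midpoint of the phoneme [start_s, end_s].

    samples -- audio samples of the whole example word
    fs      -- sampling rate in Hz (16000 for the recordings described here)
    start_s -- phoneme start time in seconds
    end_s   -- phoneme end time in seconds
    """
    if not 0 <= start_s < end_s:
        raise ValueError("need 0 <= start_s < end_s")
    # Convert times (s) to sample indices.
    start = round(start_s * fs)
    end = min(round(end_s * fs), len(samples))
    if start >= end:
        raise ValueError("phoneme lies outside the recording")
    frame_len = round(frame_ms * fs / 1000.0)      # 30 ms -> 480 samples at 16 kHz
    mid = (start + end) // 2
    # Centre the frame on the midpoint, clamped so it stays inside the recording.
    lo = max(0, min(mid - frame_len // 2, len(samples) - frame_len))
    return samples[lo:lo + frame_len]


if __name__ == "__main__":
    fs = 16000
    word = [0.0] * fs                              # placeholder 1 s recording
    frame = phoneme_frame(word, fs, 0.20, 0.35)    # phoneme from 200 ms to 350 ms
    print(len(frame))                              # 480
```

     Clamping rather than zero-padding keeps every sample in the frame real audio, at the cost of shifting the frame off-center for phonemes at the very start or end of a word.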


     Below are links to each of the 40 phonemes examined. Each link shows the function output for that phoneme's recorded .WAV file. Each word was recorded as a mono .WAV file at a 16 kHz sampling rate with 16 bits per sample. Silence was trimmed from the beginning and end of each recorded word, and each recording was normalized. phoneme_analyzer works with any .WAV file of a word, provided the user knows where the phoneme lies in time within the example word. Also below is a .ZIP archive containing all 40 recordings, along with the MATLAB function used for the analysis.
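     The recording format described above can be checked when a file is loaded. The Python sketch below uses the standard-library wave module to reject files that are not mono, 16-bit, 16 kHz, and then scales the samples so the largest magnitude is 1. Peak normalization is an assumption, since the page does not say which normalization was applied, and load_normalized_wav is a hypothetical name.

```python
import array
import sys
import wave


def load_normalized_wav(path, expected_fs=16000):
    """Hypothetical helper: read a mono, 16-bit PCM .WAV file and
    peak-normalize it to [-1, 1]. Peak normalization is an assumption."""
    with wave.open(path, "rb") as wf:
        # Enforce the recording format described above.
        if wf.getnchannels() != 1:
            raise ValueError("expected a mono recording")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16 bits per sample")
        if wf.getframerate() != expected_fs:
            raise ValueError(f"expected {expected_fs} Hz, got {wf.getframerate()} Hz")
        raw = wf.readframes(wf.getnframes())

    pcm = array.array("h")        # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":    # .WAV sample data is little-endian
        pcm.byteswap()

    peak = max((abs(s) for s in pcm), default=0)
    if peak == 0:                 # all-silent file: nothing to scale
        return [0.0] * len(pcm)
    return [s / peak for s in pcm]
```

     A call would look like samples = load_normalized_wav("bird.wav"), where the file name is only a placeholder for one of the archived recordings.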


     Archive containing the .WAV recordings of all 40 phonemes: LeonDangio_English_Phonemes.zip

     MATLAB function used to analyze each phoneme: phoneme_analyzer.m



     Vowels are produced by vibration of the vocal folds, with the resulting sound shaped by the configuration of the vocal tract. The three categories above (center, front, and back) describe the position of the tongue in the mouth. That position determines the shape and effective length of the resonating cavities of the vocal tract.
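     A textbook idealization shows why vocal-tract length matters. Treating the tract as a uniform tube closed at the glottis and open at the lips gives resonances at F_n = (2n - 1)c / 4L. The sketch below assumes c = 350 m/s and a 17.5 cm tract; the result approximates a neutral, center vowel and ignores the tongue-driven shape changes that distinguish front and back vowels.

```python
def uniform_tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances of a uniform tube closed at one end (glottis) and open at the
    other (lips): F_n = (2n - 1) * c / (4 * L).

    length_m   -- tube (vocal-tract) length in metres
    n_formants -- number of resonances to return
    c          -- speed of sound in m/s; 350 is a common textbook value for
                  warm, humid air in the vocal tract (an assumption)
    """
    if length_m <= 0:
        raise ValueError("length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


print(uniform_tube_formants(0.175))   # roughly [500, 1500, 2500] Hz
print(uniform_tube_formants(0.145))   # shorter tube: every resonance rises
```

     Shortening the tube raises every resonance, which is the sense in which the effective length of the vocal tract sets the formant frequencies.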

     A temporal characteristic of a vowel is its quasi-periodic waveform, caused by the vibration of the vocal folds. Taking the reciprocal of the approximate period yields the fundamental frequency of the speaker's voice. This quasi-periodicity also gives rise to harmonics in the frequency domain. The periodicity is clearly visible in each word's spectrograms: as horizontal striations (the individual harmonics) in the narrowband spectrogram, and as vertical striations (the individual glottal pulses) in the wideband spectrogram. Another spectral characteristic of a vowel is the presence of resonances in the spectral envelope. These resonances, or formants, correspond to the poles of the vocal-tract transfer function when the vocal tract is modeled as a linear all-pole filter. The formants are clearly visible in the spectral envelopes of the center vowels above: bird /R/ and up /A/.
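     Both ideas in the paragraph above can be made concrete in a few lines of Python. estimate_f0 takes the autocorrelation peak within an assumed 60-400 Hz range as the pitch period and returns its reciprocal in Hz; pole_to_formant converts one pole of an all-pole vocal-tract model (for example, one found by LPC analysis) into a formant frequency and bandwidth. Both functions are hypothetical illustrations rather than part of phoneme_analyzer.m, and the impulse train stands in for a real voiced frame.

```python
import cmath
import math


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame.

    The autocorrelation peaks at the lag where the frame best matches a
    shifted copy of itself; that lag approximates the pitch period T0 in
    samples, and f0 = fs / T0 is its reciprocal in Hz. The 60-400 Hz
    search range is an assumption.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]                  # remove DC offset
    lag_min = int(fs / f0_max)                     # shortest period considered
    lag_max = min(int(fs / f0_min), n - 1)         # longest period considered
    if lag_max < lag_min:
        raise ValueError("frame too short for the f0 search range")
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def pole_to_formant(pole, fs):
    """Convert one complex pole of an all-pole vocal-tract model into a
    (formant frequency, bandwidth) pair in Hz, using
    F = angle(pole) * fs / (2*pi) and B = -ln|pole| * fs / pi.
    """
    freq = abs(cmath.phase(pole)) * fs / (2 * math.pi)
    bandwidth = -math.log(abs(pole)) * fs / math.pi
    return freq, bandwidth


if __name__ == "__main__":
    fs = 16000
    # Idealized 30 ms voiced frame: one glottal pulse every 128 samples.
    frame = [1.0 if i % 128 == 0 else 0.0 for i in range(480)]
    print(estimate_f0(frame, fs))                  # 125.0

    # A pole at 500 Hz with a 60 Hz bandwidth.
    pole = cmath.exp(complex(-math.pi * 60 / fs, 2 * math.pi * 500 / fs))
    print(pole_to_formant(pole, fs))               # approximately (500.0, 60.0)
```

     A pole closer to the unit circle gives a narrower bandwidth, which shows up as a sharper peak in the spectral envelope.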



     Semi-vowels, classified as glides or liquids, are vowel-like sounds that form diphthongs with full syllabic vowels. Their characteristics are similar to those of vowels. In liquids the formant frequencies usually rise over time; this is clear in the spectrogram for /r/. Both liquids and glides are quasi-periodic in the time domain.



     Consonants are articulated with either partial or full closure of the upper vocal tract. They are divided into four main categories: nasals, plosives, whispers, and fricatives.

     Nasals, like vowels, have a quasi-periodic time waveform, but the waveform has a more triangular shape. The spectrograms show that the periodicity of nasals is less pronounced than that of vowels; this is especially visible in the narrowband spectrogram for me /m/.

     Plosives are impulsive sounds caused by sudden bursts of air. Their spectra are generally noisy.

     Fricatives can be either voiced or unvoiced. All fricatives have noisy spectra caused by turbulent airflow through a constriction in the oral tract. Voiced fricatives combine both behaviors: their time waveform is quasi-periodic with noise superimposed, and their spectra are noisy. Unvoiced fricatives are purely noisy in both time and frequency.

     Whispers are characteristically very similar to unvoiced fricatives: they are purely noisy.
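     The noisy versus quasi-periodic distinction drawn for fricatives and whispers is often quantified with two short-time measures: the zero-crossing rate, which is high for noise-like frames, and the short-time energy, which is usually higher for voiced frames. The sketch below is a common heuristic, not the method used in this project; the synthetic tone and seeded noise are stand-ins for recorded frames, and any voiced/unvoiced threshold would have to be tuned on real recordings.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as positive)."""
    if len(frame) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


fs = 16000
n = 480                                       # 30 ms at 16 kHz
# Synthetic stand-ins: a 125 Hz tone for a voiced, quasi-periodic frame and
# seeded Gaussian noise for an unvoiced fricative or whisper.
voiced = [math.sin(2 * math.pi * 125 * i / fs) for i in range(n)]
rng = random.Random(0)
unvoiced = [rng.gauss(0.0, 0.1) for _ in range(n)]

for name, frame in (("voiced", voiced), ("unvoiced", unvoiced)):
    print(f"{name}: ZCR = {zero_crossing_rate(frame):.3f}, "
          f"energy = {short_time_energy(frame):.4f}")
```

     A voiced fricative would typically land between the two cases, with moderate energy and a zero-crossing rate raised by its noise component.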



     Affricates are sometimes classed as transitional sounds, but they can also be grouped with consonants. An affricate combines a plosive with a fricative. The unvoiced affricate, /tS/, is noisy in both the time and frequency domains. The voiced affricate, /J/, is quasi-periodic in the time domain.



     Diphthongs, like semi-vowels, are a main category of transitional phonemes. They are voiced, and therefore quasi-periodic in the time domain.

