
EEN540 Project 2

Last edited by Leon D'Angio 15 years, 1 month ago

Sound Production Modeling Using Concatenated Acoustic Tubes

Section A

In this section, the vocal tract is approximated by a set of concatenated lossless tubes, with an extra fictitious tube appended to simulate radiation loss at the lips. Area-function measurements of the male vocal tract were given for the vowels /a/, /e/, /i/, /o/, and /u/. These data were modified to also mimic the female and child vocal tracts.
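The sketch below shows the reflection-coefficient step. It assumes the common convention r_k = (A_{k+1} - A_k)/(A_{k+1} + A_k), with sections numbered from the glottis toward the lips. Textbooks differ on the sign, so check it against the convention used in the m-file. The fictitious radiation tube is handled by appending one extra, much larger area. The area values and the 100 cm^2 lip area are hypothetical placeholders, not the measured data from the assignment.

```python
def reflection_coefficients(areas, lip_area=None):
    """r_k = (A[k+1] - A[k]) / (A[k+1] + A[k]), sections numbered glottis -> lips.
    If lip_area is given it is appended as the fictitious radiation tube."""
    a = list(areas)
    if lip_area is not None:
        a.append(lip_area)
    return [(a[k + 1] - a[k]) / (a[k + 1] + a[k]) for k in range(len(a) - 1)]


# HYPOTHETICAL areas in cm^2 (placeholders, not the measured /a/ data).
areas = [2.6, 1.6, 1.3, 1.0, 0.65, 0.65, 1.6, 2.6, 4.0, 5.0]
r = reflection_coefficients(areas, lip_area=100.0)  # 100 cm^2 lip tube: hypothetical
for k, rk in enumerate(r, start=1):
    print(f"r_{k} = {rk:+.3f}")
```

A step up to a much larger area gives a coefficient close to +1 (about 0.90 for the last junction here). This is how the extra tube approximates the strong mismatch at the open lips.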

The sampling frequency used for this project is 11025 Hz.
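In the standard discrete-time lossless-tube model, the round trip through one section takes one sampling period. The section length is therefore dx = c/(2 fs). The worked example below shows what fs = 11025 Hz implies. It assumes c = 35000 cm/s and a hypothetical 17.5 cm male vocal tract; neither value is stated on this page.

```python
def tube_sections(length_cm, fs_hz, c_cm_per_s=35000.0):
    """Section length dx = c / (2*fs) (round trip through one section takes
    one sampling period) and the number of sections for a tract length."""
    dx = c_cm_per_s / (2.0 * fs_hz)
    return dx, round(length_cm / dx)


# ASSUMED values: c = 35000 cm/s, hypothetical 17.5 cm male tract.
dx, n = tube_sections(17.5, 11025)
print(f"dx = {dx:.3f} cm, N = {n} sections")
```

This gives dx ≈ 1.587 cm and N = 11 sections. At the same sampling rate, a shorter female or child tract maps to fewer sections.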

The MATLAB m-file used for the simulation is attached to this page. The code is divided into cells using MATLAB's Cell Mode, and each cell corresponds to one part of the Assignment Sheet provided above. For each part, the simulation runs simultaneously for a male, a female, and a child speaker. Here is a summary of what each part computes and plots:

Part 1
- Computation of the area function, A(x). Plot of A(x) with the superimposed acoustic tube approximation and the fictitious tube.
- Computation of reflection coefficients. Stem plot of reflection coefficients.
Part 2
- Computation of the vocal tract transfer function, V(z). Plot of the magnitude response, |V(w)|, on a dB scale.
Part 3
- Computation of radiation model, RL(z). Plot of the magnitude response, |H(w)|=|V(w)RL(w)|, on a dB scale.
Part 4
- Computation of glottal waveform, g[n], over six periods. Plot of magnitude response, |G(w)|.
Part 5
- Output of g[n] filtered by H(z), generating synthesized speech signal, s[n].
- Plot of magnitude response, |S(w)|.
Part 6
- Generation of 2 seconds of speech.
- .WAV file produced with the name 'SpeakerType_Synthesized_Vowel' (e.g., Male_Synthesized_A.wav)
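For Part 2, one way to get V(z) from the reflection coefficients is the step-up recursion D_0(z) = 1, D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}). Under that formulation, V(z) = G z^{-N/2} / D_N(z), with G = ∏(1 + r_k) when the glottis is assumed closed (r_G = 1). The last coefficient r_N is the lip reflection from the fictitious tube. The sketch below assumes that formulation and uses hypothetical reflection coefficients. The pure delay z^{-N/2} is dropped because it does not change |V(w)|.

```python
import cmath
import math


def step_up(refl):
    """Coefficients d[0..N] of D_N(z) from reflection coefficients r_1..r_N,
    using D_k(z) = D_{k-1}(z) + r_k z^{-k} D_{k-1}(z^{-1}) with D_0(z) = 1."""
    d = [1.0]
    for k, rk in enumerate(refl, start=1):
        padded = d + [0.0]  # D_{k-1} padded to degree k
        # coefficient of z^{-j} in D_k: d_j + r_k * d_{k-j}
        d = [padded[j] + rk * padded[k - j] for j in range(k + 1)]
    return d


def magnitude_db(d, gain, n_points=512):
    """20*log10|gain / D(e^{jw})| at w = pi*i/n_points, i = 0..n_points-1."""
    mag = []
    for i in range(n_points):
        w = math.pi * i / n_points
        den = sum(dk * cmath.exp(-1j * w * k) for k, dk in enumerate(d))
        mag.append(20.0 * math.log10(gain / abs(den)))
    return mag


# HYPOTHETICAL reflection coefficients (glottis -> lips); the last one plays
# the role of the lip reflection produced by the fictitious radiation tube.
refl = [0.2, -0.1, 0.3, 0.0, 0.9]
d = step_up(refl)
gain = math.prod(1.0 + rk for rk in refl)  # ASSUMES a closed glottis, r_G = 1
mag = magnitude_db(d, gain)

fs = 11025
peak_bin = max(range(len(mag)), key=mag.__getitem__)
print(f"largest peak near {peak_bin * fs / (2 * len(mag)):.0f} Hz")
```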

NOTE: Each part does calculation for male, female, and child speaker simultaneously.
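Parts 3, 5, and 6 come down to filtering g[n] through H(z) = V(z)RL(z) and writing the result to a .WAV file. The page does not give the form of RL(z). The sketch below therefore assumes the common first-difference approximation RL(z) = 1 - z^{-1}. It implements 1/D(z) as a direct difference equation and writes peak-normalized 16-bit mono PCM with Python's standard `wave` module. The excitation and the D(z) coefficients in the usage lines are placeholders.

```python
import io
import struct
import wave


def all_pole_filter(x, d, gain):
    """y[n] = gain*x[n] - sum_{k=1}^{N} d[k]*y[n-k], i.e. Y(z)/X(z) = gain / D(z)."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k in range(1, min(len(d), n + 1)):
            acc -= d[k] * y[n - k]
        y.append(acc)
    return y


def radiation(x):
    """ASSUMED radiation model RL(z) = 1 - z^{-1} (first difference)."""
    return [xn - (x[n - 1] if n > 0 else 0.0) for n, xn in enumerate(x)]


def write_wav(target, samples, fs):
    """Peak-normalize to 16-bit PCM and write a mono .WAV (path or file object)."""
    peak = max(abs(v) for v in samples) or 1.0
    frames = b"".join(struct.pack("<h", int(32767 * v / peak)) for v in samples)
    with wave.open(target, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(frames)


fs = 11025
g = [0.0, 0.5, 1.0, 0.5, 0.0] * 20            # placeholder excitation, not a Rosenberg pulse
d = [1.0, -0.9]                               # HYPOTHETICAL D(z) = 1 - 0.9 z^{-1}
s = radiation(all_pole_filter(g, d, gain=1.0))  # s[n] = g[n] filtered by H(z)

buf = io.BytesIO()
write_wav(buf, s, fs)
```

`wave.open` accepts either a file object or a filename. Passing a name such as "Male_Synthesized_A.wav" instead of the in-memory buffer writes the file to disk.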

RESULTS

ANALYSIS OF RESULTS:

All five synthesized vowels sound very different from one another. Even so, a listener who did not know which vowel was being played might find it hard to classify: /e/ and /i/ could be mistaken for one another, as could /o/ and /u/. Another problem with the synthesized vowels is their apparent choppiness, which is much more noticeable in the male synthesized vowels than in the child synthesized vowels. I believe this choppiness comes from the way the glottal pulse was created. The glottal pulses were based on the Rosenberg model. First, one period was created. This period was then repeated six times to produce the six periods shown in the results. To create two seconds of speech, the six-period segment was appended to itself multiple times and then trimmed to the number of samples in 2 seconds.

Otherwise, if you know which vowel you are listening to and ignore the choppiness, you can hear the vowel in each synthesized version. The fundamental frequencies of the male, female, and child speakers are clearly distinguishable. In the simulation, the fundamental frequencies for male, female, and child were 155 Hz, 245 Hz, and 345 Hz, respectively. These values are close to the upper limits of the fundamental frequency range for each speaker type. They were chosen to reduce some of the choppiness discussed above while still preserving the difference in pitch between speakers.
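The excitation procedure described above can be sketched as follows. The pulse shape is one common form of the Rosenberg pulse: a raised-cosine opening phase followed by a quarter-cosine closing phase. The opening and closing fractions (0.4 and 0.16 of the period) are assumptions, not values taken from the project's m-file. The F0 values are the ones reported above. At fs = 11025 Hz they round to integer periods of 71, 45, and 32 samples.

```python
import math

FS = 11025
F0 = {"male": 155, "female": 245, "child": 345}  # values reported above


def rosenberg_period(period, open_frac=0.4, close_frac=0.16):
    """One period of a Rosenberg-type glottal pulse.
    open_frac and close_frac are ASSUMED, not taken from the project."""
    n1 = max(1, round(open_frac * period))   # samples in the opening phase
    n2 = max(1, round(close_frac * period))  # samples in the closing phase
    g = []
    for n in range(period):
        if n <= n1:
            g.append(0.5 * (1.0 - math.cos(math.pi * n / n1)))
        elif n <= n1 + n2:
            g.append(math.cos(math.pi * (n - n1) / (2 * n2)))
        else:
            g.append(0.0)  # glottis closed for the rest of the period
    return g


def glottal_signal(f0, fs=FS, seconds=2.0, periods=6):
    """Repeat one period `periods` times, tile that block, then trim to `seconds`."""
    period = round(fs / f0)
    block = rosenberg_period(period) * periods
    total = round(seconds * fs)
    reps = -(-total // len(block))  # ceiling division
    return (block * reps)[:total]


for speaker, f0 in F0.items():
    period = round(FS / f0)
    print(speaker, period, "samples/period, actual F0 =", round(FS / period, 1), "Hz")
```

Because 2 seconds (22050 samples) is not always a whole number of periods, the trimmed signal may end partway through a pulse.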

Section B

In this section, we simulate sound propagating through a 10 m long, 3.5 m high tunnel. Our goal is to simulate what a sound will sound like as it leaves the tunnel.

The MATLAB m-file used for this simulation is attached to this page. It returns plots of the area of the tunnel, the reflection coefficients of the tunnel, and the magnitude response of the tunnel. It also generates two .WAV files: one of the tunnel output with no consideration for volume loss, and the other with consideration for volume loss.

The simulation is tested on two input sounds:

Speech Input: .WAV file of me saying "Sound Propagation Modeling Using Concatenated Acoustic Tubes." The sampling rate for this file is 16 kHz.

Music Input: .WAV file of the theme song from the '80s TV show "Batman." The sampling rate for this file is 11.025 kHz.

Here are the results:

All plots and .WAV files below can be downloaded in a single .ZIP file: Tunnel_Data.zip

Input/Output .WAV files:

Speech Input: Speech.wav

Speech Output:
- TunnelOutput_Speech.wav (no volume consideration)
- TunnelOutputSoft_Speech.wav (volume consideration)

Music Input: batman.wav

Music Output:
- TunnelOutput_Batman.wav (no volume consideration)
- TunnelOutputSoft_Batman.wav (volume consideration)

Area of the Tunnel

Reflection Coefficients of the Tunnel

Magnitude Response of the Tunnel

Note: The top response is for the speech input and the lower response is for the music input. They look different because the two input files have different sampling rates. Otherwise, the tunnel has only one magnitude response, independent of the input.
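A little arithmetic makes the sampling-rate note concrete. Assume the tunnel is discretized like the vocal tract, so the round trip through one section takes one sampling period and dx = c/(2 fs). Assume also c = 343 m/s. Under those assumptions, the 10 m tunnel maps to a different number of sections at each input's sampling rate, and each magnitude-response plot spans 0 to fs/2.

```python
C = 343.0      # speed of sound in air, m/s (ASSUMED; not stated on this page)
LENGTH = 10.0  # tunnel length in m, from the description above


def tunnel_discretization(fs_hz, length_m=LENGTH, c=C):
    """Section length (m), section count, and Nyquist frequency (Hz) for one fs,
    ASSUMING one section's round-trip delay equals one sampling period."""
    dx = c / (2.0 * fs_hz)
    return dx, round(length_m / dx), fs_hz / 2.0


def bin_to_hz(i, n_bins, fs_hz):
    """Frequency in Hz of bin i when [0, pi) is split into n_bins points."""
    return i * fs_hz / (2.0 * n_bins)


for name, fs in [("speech", 16000), ("music", 11025)]:
    dx, n, nyq = tunnel_discretization(fs)
    print(f"{name}: {n} sections of {100 * dx:.2f} cm, axis 0 to {nyq:.1f} Hz, "
          f"mid-plot bin = {bin_to_hz(256, 512, fs):.1f} Hz")
```

At 16 kHz this gives 933 sections and an axis from 0 to 8000 Hz. At 11.025 kHz it gives 643 sections and an axis from 0 to 5512.5 Hz. The same position on the two plots therefore refers to different frequencies in Hz.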

Analysis of Results:

The effects of the tunnel are much more noticeable for the speech input: the output sounds reverberant and also has a soft choppy noise. For the music input, the reverberation is not really noticeable, but the choppy noise is still apparent. Because the area of the tunnel is essentially uniform along its length, its frequency response lacks the resonances of the vocal tract. This uniformity also accounts for the negligible reflection coefficients shown above. The frequency response does show attenuation that decreases from low to high frequencies, which accounts for the reverberant effect heard especially in the speech case.
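The snippet below illustrates why a nearly uniform area gives negligible reflection coefficients. For a small step ΔA between neighboring sections of area A, r_k ≈ ΔA/(2A), so area variations of about ±1% keep |r_k| near 0.01 at most. The nominal area, the ±1% jitter, and the 643-section count are hypothetical choices made only for illustration.

```python
import random


def reflection_coefficients(areas):
    """r_k = (A[k+1] - A[k]) / (A[k+1] + A[k]) for adjacent sections."""
    return [(a1 - a0) / (a1 + a0) for a0, a1 in zip(areas, areas[1:])]


random.seed(0)
A0 = 35.0  # HYPOTHETICAL nominal cross-sectional area, m^2
# HYPOTHETICAL near-uniform tunnel: 643 sections, each within +/-1% of A0
areas = [A0 * (1.0 + random.uniform(-0.01, 0.01)) for _ in range(643)]
r = reflection_coefficients(areas)
print(f"max |r_k| = {max(abs(x) for x in r):.4f}")  # bounded by 0.02/1.98 ~ 0.0101
```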
