Amer Sallam, Nashwan Al-Khulaidi


Speech is the output of a time-varying system driven by a time-varying
excitation. The excitation generates pulses at a fundamental frequency F0. This time-varying impulse train is treated as one of the features and is characterized by the fundamental frequency F0 and the formant frequencies. These features vary from one
speaker to another and from one gender to the other. This paper considers
accent issues in continuous speech recognition systems.
Variations in F0 and formant frequencies are the main features that characterize speaker variation. The variation is smallest within
a single speaker, moderate among speakers of the same accent, and largest across different
accents. This variation can be exploited to recognize the speaker's gender
and to improve the performance of speech recognition systems by building separate models based on gender information.
Five sentences are selected for training. Each sentence is spoken
and recorded by 5 female speakers and 5 male speakers. The speech corpus
is preprocessed to identify the voiced and unvoiced regions. Only the voiced
regions carry information about F0, so F0 is computed from each
voiced segment. Each F0 value forms part of the feature space and is labeled with
the speaker identification, i.e., male or female. This information is used to
parameterize the male and female models. The K-means algorithm is used
during both training and testing. Testing is conducted in two ways: speaker-dependent
testing and speaker-independent testing. The SPHINX-III software from
Carnegie Mellon University is used to measure speech recognition
accuracy on the data, taking into account the gender separation
used in this research.
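
The abstract does not say how F0 is extracted or how K-means is configured, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions: F0 is estimated from the autocorrelation peak of a voiced frame, K-means runs with two clusters on scalar F0 values, and the lower-F0 cluster is taken to be male. The function names, the 60-400 Hz search range, and the synthetic test frame are all hypothetical and are not taken from the paper.

```python
import math


def synthetic_voiced_frame(f0, sample_rate, n_samples):
    """Hypothetical stand-in for a voiced segment: a pure sine at f0 Hz."""
    return [math.sin(2 * math.pi * f0 * i / sample_rate) for i in range(n_samples)]


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 (Hz) of one voiced frame from its autocorrelation peak.

    Assumption: the search range f_min..f_max covers typical adult speech.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove DC offset
    lag_min = int(sample_rate / f_max)       # shortest period considered
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


def kmeans_two_clusters(values, max_iters=50):
    """Two-cluster K-means on scalar F0 values.

    Returns (low_centroid, high_centroid). Centroids start at the minimum
    and maximum value, which is one simple initialization among many.
    """
    lo, hi = float(min(values)), float(max(values))
    for _ in range(max_iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)         # low always contains min(values)
        new_hi = sum(high) / len(high) if high else hi
        if new_lo == lo and new_hi == hi:    # assignments have stabilized
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def classify_gender(f0, centroids):
    """Assign an F0 value to the nearest centroid.

    Assumption: the lower-F0 cluster corresponds to male speakers.
    """
    lo, hi = centroids
    return "male" if abs(f0 - lo) <= abs(f0 - hi) else "female"
```

In this sketch, training amounts to calling `kmeans_two_clusters` on the F0 values of the training utterances, and testing amounts to calling `classify_gender` on the F0 of each test segment. The predicted gender could then select which gender-specific recognition model to use, which is the role the abstract assigns to the gender information.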



