A procedure is presented that accurately extracts speaker features and is of
particular value for analyzing records that contain only single words or other
short durations of speech.
By taking advantage of the fast convergence of adaptive filtering, the approach
is capable of modeling the nonstationarities due to both the vocal tract and
vocal cord dynamics. The procedure is simple, requires no manual intervention,
and is distinctive in that it derives both the vocal tract and glottal signal
estimates directly from the time-varying filter coefficients rather than from
the prediction error signal. Several glottal signals are derived using this
procedure and plotted to illustrate the glottal characteristics it recovers.
Finally, to provide a more quantitative performance measure, the procedure is
applied to a simple automatic speaker identity verification task.
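
As a rough illustration of what reading information from time-varying filter coefficients can look like, the sketch below runs an adaptive linear predictor over a signal and keeps the coefficient vector at every sample. It rests on several assumptions not stated above: the adaptive algorithm is not named here, so normalized LMS (NLMS) is used as a stand-in; the input is a synthetic second-order autoregressive signal rather than speech; and the names nlms_predictor and synth_ar2, the step size, and the model order are hypothetical choices. It does not attempt the vocal tract or glottal signal extraction itself.

```python
import random


def nlms_predictor(signal, order=2, mu=0.05, eps=1e-8):
    """Normalized LMS linear predictor (stand-in adaptive algorithm).

    Returns the coefficient vector after every sample and the
    corresponding prediction errors.
    """
    w = [0.0] * order
    history, errors = [], []
    for n in range(len(signal)):
        # Past samples x[n-1] ... x[n-order]; zeros before the record starts.
        past = [signal[n - k] if n - k >= 0 else 0.0
                for k in range(1, order + 1)]
        prediction = sum(wk * xk for wk, xk in zip(w, past))
        err = signal[n] - prediction
        # Step normalized by the energy of the current regressor.
        energy = sum(xk * xk for xk in past) + eps
        w = [wk + mu * err * xk / energy for wk, xk in zip(w, past)]
        history.append(list(w))
        errors.append(err)
    return history, errors


def synth_ar2(n_samples, a1=1.6, a2=-0.8, seed=0):
    """Synthetic test signal: x[n] = a1*x[n-1] + a2*x[n-2] + white noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n_samples):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


signal = synth_ar2(5000)
history, errors = nlms_predictor(signal)

# Average the coefficient track over the last 1000 samples.
tail = history[-1000:]
avg_w = [sum(w[k] for w in tail) / len(tail) for k in range(2)]
print("averaged coefficients:", avg_w)
```

In this toy setting the averaged coefficients should settle near the generating values (1.6 and -0.8). The point is only that the coefficient track, not the prediction error, carries the model information; how the procedure maps such tracks to vocal tract and glottal estimates is not covered by this sketch.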