Decomposition of Speech into Modulators and Temporal Fine Structure
Les Atlas, Ph.D.
Virginia Merrill Bloedel Hearing Research Scholar
Professor of Electrical Engineering
University of Washington; Seattle, WA
Friday, May 10th, 2013
…musical tones are the simpler and more regular elements of the sensations of hearing, and that we have consequently first to study the laws and peculiarities of this class of sensations.
Hermann von Helmholtz, On the Sensations of Tone as a Physiological Basis for the Theory of Music, 2nd English Edition (A. Ellis), translated from the 4th German Edition of 1877, Longman Green, London, 1885, Page 7.
It has been 135 years since this passage was written, yet we still have no formal foundation for going beyond what Helmholtz brilliantly saw as the building blocks he called “musical tones,” which we now simply call “frequency.” Helmholtz also saw that “beats of simple tones” and “beats due to combinational tones” or “differential tones” [op. cit., Page 159.] formed sum and difference beats. We now call the generalization of this effect “modulations” or “envelopes.”
Since the time of Helmholtz, science and technology have developed radio and then very high-speed digital communications, revolutionizing the way we now live. Concepts from the AM and FM radio communications of the 1920s and 1930s still provide the foundation, though a perhaps outdated one. Researchers conventionally model the above modulations as “envelopes,” which multiply “carriers” or, equivalently, “temporal fine structure.” These envelopes, as typically derived after subband filtering, are Hilbert envelopes or, with perhaps a closer connection to physiology, rectified and lowpass-filtered real envelopes. Yet, as will be argued, hearing science’s current foundation, which has frequency subband envelopes modulating temporal fine structure carriers, is still not as precise as Helmholtz was with single tones and harmonics.
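The rectified and lowpass-filtered envelope mentioned above can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the talk: the sample rate, the 1 kHz carrier, the 4 Hz modulator, a 10 ms moving average standing in for the lowpass filter, and the π/2 gain that compensates for the 2/π mean of a rectified cosine. No subband filtering is performed; the test signal is treated as a single subband.

```python
import math

FS = 44100           # sample rate in Hz (assumed)
FC = 1000.0          # carrier ("temporal fine structure") frequency in Hz (assumed)
FM = 4.0             # envelope modulation rate in Hz (assumed)
N = FS // 2          # half a second of signal
WIN = 441            # 10 ms moving average = exactly 10 carrier periods
HALF = WIN // 2


def modulator(t):
    """Slow, strictly positive envelope that multiplies the carrier."""
    return 1.0 + 0.5 * math.cos(2 * math.pi * FM * t)


times = [n / FS for n in range(N)]
signal = [modulator(t) * math.cos(2 * math.pi * FC * t) for t in times]

# Step 1: full-wave rectification.
rectified = [abs(v) for v in signal]

# Step 2: lowpass filtering, here a centered moving average via prefix sums.
prefix = [0.0]
for v in rectified:
    prefix.append(prefix[-1] + v)

# A rectified cosine averages 2/pi, so scale by pi/2 to recover the envelope.
GAIN = math.pi / 2
centers = range(HALF, N - HALF)
estimate = [GAIN * (prefix[i + HALF + 1] - prefix[i - HALF]) / WIN for i in centers]
truth = [modulator(times[i]) for i in centers]

max_err = max(abs(e - m) for e, m in zip(estimate, truth))
print(f"largest envelope error: {max_err:.4f}")
```

In this idealized case the envelope is recovered closely because the modulator is far slower than the carrier. The sketch says nothing about what the carrier, or temporal fine structure, should be once the envelope is removed, which is the question the abstract raises.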
Our talk will begin with demonstrations of simple combinations of tones which have identical envelopes yet sound obviously different. We will show, assuming sufficiently low-rate envelopes, how important it is to remove this ambiguity, especially for speech. We will then suggest how a simple and physiologically viable decomposition into an envelope, with a new form of temporal fine structure, removes this ambiguity. These results raise new questions about the relative roles of temporal fine structure in everyday audio and speech for normal-hearing listeners. These results, with our new definition of temporal fine structure, also suggest that speech is decoded differently in steady noise than among multiple simultaneous masking talkers.
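One simple way to see that an envelope alone can be ambiguous is sketched below. It is not necessarily the construction demonstrated in the talk. Two equal-amplitude tones spaced Δf apart have the Hilbert envelope 2|cos(πΔf t)|, wherever the pair sits in frequency. The tone frequencies (400/500 Hz and 1000/1100 Hz), the sample rate, and the function names are assumptions chosen for illustration.

```python
import cmath
import math

FS = 16000       # sample rate in Hz (assumed)
DURATION = 0.1   # seconds
DELTA_F = 100.0  # spacing between the two tones in Hz (assumed)


def two_tone(f_low, t):
    """Real waveform: two equal-amplitude tones spaced DELTA_F apart."""
    return (math.cos(2 * math.pi * f_low * t)
            + math.cos(2 * math.pi * (f_low + DELTA_F) * t))


def hilbert_envelope(f_low, t):
    """Magnitude of the analytic signal of two_tone.

    For tones at positive frequencies, the analytic signal is the sum of the
    corresponding complex exponentials, so no FFT is needed in this sketch.
    """
    analytic = (cmath.exp(2j * math.pi * f_low * t)
                + cmath.exp(2j * math.pi * (f_low + DELTA_F) * t))
    return abs(analytic)


times = [n / FS for n in range(int(FS * DURATION))]

# Pair A sits at 400/500 Hz, pair B at 1000/1100 Hz (both assumed values).
wave_a = [two_tone(400.0, t) for t in times]
wave_b = [two_tone(1000.0, t) for t in times]
env_a = [hilbert_envelope(400.0, t) for t in times]
env_b = [hilbert_envelope(1000.0, t) for t in times]
closed_form = [2 * abs(math.cos(math.pi * DELTA_F * t)) for t in times]

env_gap = max(abs(a - b) for a, b in zip(env_a, env_b))
closed_gap = max(abs(a - c) for a, c in zip(env_a, closed_form))
wave_gap = max(abs(a - b) for a, b in zip(wave_a, wave_b))

print(f"envelope difference: {env_gap:.2e}")
print(f"waveform difference: {wave_gap:.2f}")
```

The two envelopes agree to within rounding error while the waveforms differ substantially, so the envelope by itself cannot tell the two signals apart.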
Three Learner Outcomes: (1) Temporal fine structure has been ill-defined. (2) Hilbert phase is a flawed method for finding temporal fine structure. (3) A new additive definition of temporal fine structure, with underlying theory and physiological plausibility, is instead hypothesized.