
A Voice from the Crypt

March 16, 2020

After a deluge of anthropomorphic animals in videos, today's children may be surprised to discover that real animals are not as articulate as Peppa Pig. Darwin ascribed the animals' lack of speech to inferior brain development, but later scientific reasoning was that nonhuman primates, such as chimpanzees, can't talk because their vocal tracts are not as developed as the human vocal tract.

However, scientists at Princeton University (Princeton, New Jersey) and the Vrije Universiteit (Brussel, Belgium) analyzed X-ray videos of a long-tailed macaque in 2016 to build a computer model of its vocal tract.[1-2] They found that this primate and its close primate cousins could, in principle, produce intelligible speech. The reason that they don't is that they lack sufficient neural control of their vocal tract muscles.[1-2]

 Blue-fronted Parrot (Amazona aestiva) by Mateus Hidalgo

Parrots can mimic human speech, and this allows Polly to get her cracker reward. A "talking" animal offers some entertainment, but it also needs to be fed and, in the case of a parrot, given the occasional fresh newspaper at the bottom of its cage. A talking machine is much more impressive, and such mechanical speech synthesis was accomplished by the German-Danish scientist, Christian Gottlieb Kratzenstein, who created a device based on the human vocal tract in 1779.

Charles Wheatstone (1802-1875) of the eponymous Wheatstone bridge produced a bellows-operated "speaking machine" in 1837 that modeled the vocal tract and also included a tongue and lips. A less complicated speech mechanism was contained in the 1824 "mama doll" of German inventor, Johann Nepomuk Maelzel (1772-1838).[3] Maelzel's mama doll's voice box was bellows-activated,[3-4] and US inventor, William A. Harwood, patented a variation of this device in which a child would blow air through a tube.[5]


(Modified Wikimedia Commons photograph of a Blue-fronted Parrot (Amazona aestiva) by Mateus Hidalgo.)


Such speaking devices use what's called articulatory synthesis, in which the mechanism of human speech is physically emulated. In our computer age, the first step beyond the static mama doll voice box was using a computer to control a dynamic vocal tract, which was done in the 1960s. The next step was to model the human speech mechanism, which includes the vibrating vocal cords, acoustic wave propagation along the vocal tract, and sound modification by the lips and tongue, and simulate it using electronics. Software has advanced to the point at which a free and open source implementation is available in the form of gnuspeech.
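
The tube-model idea behind such software is compact enough to sketch in a few dozen lines. The Python snippet below, in the spirit of the digital waveguide vocal tract models described in ref. 8, pushes a glottal pulse train through a chain of cylindrical tube sections joined by Kelly-Lochbaum scattering junctions; the area function, section count, glottis and lip reflection coefficients, and output file name are illustrative assumptions, not gnuspeech's actual parameters.

    # A minimal articulatory (tube-model) synthesis sketch: a glottal impulse train
    # travels through a chain of cylindrical tube sections whose areas crudely
    # approximate an 'ah'-like vocal tract. All numerical values are illustrative.
    import numpy as np
    from scipy.io import wavfile

    fs = 44100                                  # sample rate; one sample of delay per section
    duration, f0 = 1.0, 110.0                   # one second of a 110 Hz voiced sound

    # Crude two-tube 'ah' area function (cm^2), ordered glottis -> lips.
    areas = np.concatenate([np.full(11, 1.0), np.full(11, 7.0)])
    k = (areas[:-1] - areas[1:]) / (areas[:-1] + areas[1:])   # junction reflection coefficients
    r_glottis, r_lips = 0.99, -0.99             # nearly rigid glottis, nearly open lips
    n_sections, n_samples = len(areas), int(duration * fs)

    # Glottal source: a bare impulse train, one pulse per pitch period.
    source = np.zeros(n_samples)
    source[::int(fs / f0)] = 1.0

    f = np.zeros(n_sections)                    # right-going waves (toward the lips)
    b = np.zeros(n_sections)                    # left-going waves (toward the glottis)
    out = np.zeros(n_samples)

    for n in range(n_samples):
        f_old, b_old = f.copy(), b.copy()       # waves arriving at the section ends
        f, b = np.zeros(n_sections), np.zeros(n_sections)
        f[0] = source[n] + r_glottis * b_old[0] # glottis end: inject source plus reflection
        out[n] = (1 + r_lips) * f_old[-1]       # lip end: transmitted (radiated) wave
        b[-1] = r_lips * f_old[-1]              # ... and the wave reflected back up the tract
        for i in range(n_sections - 1):         # Kelly-Lochbaum scattering at each junction
            f[i + 1] = (1 + k[i]) * f_old[i] - k[i] * b_old[i + 1]
            b[i] = k[i] * f_old[i] + (1 - k[i]) * b_old[i + 1]

    wavfile.write('tube_vowel.wav', fs, (out / np.max(np.abs(out)) * 32767).astype(np.int16))

With 22 sections at a 44.1 kHz sample rate, the simulated tract is roughly 17 centimeters long, so its resonances land near the formants of an adult 'ah'.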

As I wrote in an earlier article (Computers as Listeners and Speakers, November 4, 2013), human speech information is contained in the audio frequency band below 20 kHz, but intelligible speech, such as that carried by early telephone systems, occupies frequencies between 300 Hz and 3400 Hz, with most of the voice's power sitting near its fundamental frequency of roughly 80-260 Hz. Telephone research resulted in the first parametric encoding of speech in a system called the vocoder, patented in 1939 by Bell Labs acoustical engineer, Homer Dudley.[6]
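
If you'd like to hear what that telephone band does to a recording, the short SciPy sketch below band-limits a speech file to 300-3400 Hz; the file name speech.wav is just a placeholder for any mono recording.

    # Band-limit a speech recording to the classic 300-3400 Hz telephone band.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, sosfiltfilt

    fs, x = wavfile.read('speech.wav')          # placeholder name for any mono recording
    x = x.astype(np.float64)
    sos = butter(4, [300, 3400], btype='bandpass', fs=fs, output='sos')
    y = sosfiltfilt(sos, x)                     # zero-phase band-pass filtering
    wavfile.write('speech_telephone.wav', fs, (y / np.max(np.abs(y)) * 32767).astype(np.int16))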

Spectrograms of vowel sounds (Daniel E. Re, et al., PLoS ONE, 2012)

Spectrograms of the average female (left) and male (right) voicing of vowels. These are the English vowel sounds, 'eh' (bet), 'ee' (see), 'ah' (father), 'oh' (note), and 'oo' (boot). Note the overall lower frequencies of the male voice, as well as the slower male cadence. (Fig. 1 of ref. 7, licensed under a Creative Commons License. Click for larger image.)[7]


Dudley's idea, as shown in the figure from his patent,[6] was to detect the amplitude of the speech signal in audio bands selected by a bank of audio filters. Instead of the speech signal itself, these amplitudes could be transmitted to a remote bank of oscillators to reconstruct the signal. All this was a tour de force in 1939, when vacuum tubes were used to perform the necessary bandpass filtering and amplitude detection.

Portion of fig. 1 of US Patent No. 2,194,298, 'System for the artificial production of vocal or other sounds,' by Homer W. Dudley, March 19, 1940.

Portion of fig. 1 of US Patent No. 2,194,298, "System for the artificial production of vocal or other sounds," by Homer W. Dudley, March 19, 1940.[6]

As the circuit shows, Dudley realized that white noise is an important component of speech.

(Via Google Patents.[6] Click for larger image.)
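
Dudley's analysis-synthesis scheme maps readily onto modern software. The Python sketch below is a toy channel vocoder: a filter bank measures the band envelopes of a speech recording and imposes them on a buzz-plus-noise carrier, echoing the white-noise component in the patent figure. The ten logarithmically spaced bands, the 110 Hz sawtooth carrier, the 30 Hz envelope filter, and the file names are assumptions made for illustration, not values taken from the patent.

    # A toy channel vocoder: analyze speech band envelopes, then impose them on a carrier.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, sosfilt, sawtooth

    fs, speech = wavfile.read('speech.wav')     # placeholder name for any mono recording
    speech = speech.astype(np.float64)
    speech /= np.max(np.abs(speech))
    t = np.arange(len(speech)) / fs

    # Carrier: a 110 Hz sawtooth 'buzz' mixed with a little white noise.
    carrier = 0.8 * sawtooth(2 * np.pi * 110 * t) + 0.2 * np.random.randn(len(speech))

    edges = np.geomspace(300, 3400, 11)         # ten analysis bands spanning the telephone range
    env_lp = butter(2, 30, btype='low', fs=fs, output='sos')   # envelope-follower low-pass

    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        envelope = sosfilt(env_lp, np.abs(sosfilt(band, speech)))   # band amplitude of the speech
        out += sosfilt(band, carrier) * envelope                    # same band of the carrier, scaled

    wavfile.write('vocoder_out.wav', fs, (out / np.max(np.abs(out)) * 32767).astype(np.int16))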


Dudley's research was extended to the development of speech synthesis using the combined techniques of formant synthesis and linear predictive coding (LPC).[8] These techniques were commercialized in 1978 in the Texas Instruments Speak & Spell toy. Several decades later, my e-book reader has an excellent text-to-speech feature with both male and female speakers. My Linux desktop computer has a simple text-to-speech program called espeak. You can hear an mp3 example of a portion of Hamlet's To be, or not to be soliloquy rendered using espeak, here.[9]
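
The LPC half of that story is also compact. The Python sketch below models each frame of a recording as an all-pole filter and re-excites it with a fixed-pitch impulse train, which is roughly what a Speak & Spell did in hardware; the model order, 25 ms frame length, 120 Hz pitch, and file names are arbitrary choices for illustration, not the parameters of the TI speech chip.

    # A bare-bones LPC analysis/re-synthesis sketch: model each frame as an all-pole
    # filter 1/A(z) and re-excite it with a fixed-pitch impulse train.
    import numpy as np
    from scipy.io import wavfile
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    fs, x = wavfile.read('speech.wav')          # placeholder name for any mono recording
    x = x.astype(np.float64) / np.max(np.abs(x))

    order, frame = 12, int(0.025 * fs)          # 12 poles, 25 ms frames
    period = int(fs / 120)                      # fixed 120 Hz excitation pitch
    out = np.zeros(len(x))

    for start in range(0, len(x) - frame, frame):
        seg = x[start:start + frame] * np.hamming(frame)
        # Autocorrelation method: solve the normal equations R a = r for the predictor a.
        r = np.correlate(seg, seg, mode='full')[frame - 1:frame + order]
        if r[0] == 0:
            continue                            # skip silent frames
        a = solve_toeplitz(r[:order], r[1:order + 1])
        gain = np.sqrt(max(r[0] - np.dot(a, r[1:order + 1]), 1e-12))
        excitation = np.zeros(frame)
        excitation[::period] = 1.0              # impulse-train (voiced) source
        out[start:start + frame] = lfilter([gain], np.concatenate(([1.0], -a)), excitation)

    wavfile.write('lpc_resynthesis.wav', fs, (out / np.max(np.abs(out)) * 32767).astype(np.int16))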

An interdisciplinary research team from the University of London (United Kingdom), the University of York (York, United Kingdom), the Leeds Museums and Galleries (Leeds, United Kingdom), the Leeds General Infirmary (Leeds, United Kingdom), and the University of Tübingen (Tübingen, Germany) has recently reported on an unusual project in voice synthesis in an open access paper in the journal, Scientific Reports.[10-11] They used computed tomography (CT) scans of the vocal tract of a 3,000-year-old mummy to develop a precise model of its vocal tract and create a 3-D printed version of it.[10] This simulated vocal tract was then used to create a vowel sound that's similar to those produced by modern humans.[10-11]
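
The step from a tract shape to a sound can also be approximated numerically. The Python sketch below estimates the resonances of a vocal tract from its cross-sectional area function with a lossless concatenated-cylinder (chain matrix) model, a stand-in for exciting the 3-D printed tract with an artificial larynx as the researchers did; the 17 cm length and the area values are invented placeholders, not Nesyamun's CT measurements.

    # Estimate the resonances (formants) of a vocal-tract area function with a lossless
    # concatenated-cylinder (chain matrix) model, assuming an ideal volume-velocity
    # source at the glottis and an ideal open termination at the lips.
    import numpy as np

    rho, c = 1.2, 350.0                         # air density (kg/m^3) and sound speed (m/s)

    # Invented area function, glottis -> lips, in cm^2 (converted to m^2 below).
    areas_cm2 = np.array([0.6, 0.9, 1.3, 1.8, 2.0, 1.6, 1.0, 0.8, 1.2, 2.5, 4.0, 5.0])
    areas = areas_cm2 * 1e-4
    seg_len = 0.17 / len(areas)                 # assumed 17 cm tract, equal-length segments

    def transfer(freq):
        """|U_lips / U_glottis| for a zero-impedance (open) lip termination."""
        k = 2 * np.pi * freq / c
        M = np.eye(2, dtype=complex)
        for A in areas:                         # chain the 2x2 segment matrices, glottis to lips
            Z0 = rho * c / A                    # characteristic acoustic impedance of the segment
            M = M @ np.array([[np.cos(k * seg_len), 1j * Z0 * np.sin(k * seg_len)],
                              [1j * np.sin(k * seg_len) / Z0, np.cos(k * seg_len)]])
        return 1.0 / max(abs(M[1, 1]), 1e-9)    # with P_lips = 0, U_glottis = M[1,1] * U_lips

    freqs = np.arange(50.0, 4000.0, 1.0)
    H = np.array([transfer(f) for f in freqs])
    peaks = [freqs[i] for i in range(1, len(H) - 1) if H[i] > H[i - 1] and H[i] > H[i + 1]]
    print('Estimated formants (Hz):', [round(f) for f in peaks[:4]])

A uniform 17 cm tube, closed at the glottis and open at the lips, would give the familiar quarter-wavelength resonances near 500, 1500, and 2500 Hz; a varying area function shifts those peaks toward the formants of a particular vowel.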

The mummy, housed at the Leeds City Museum in northern England, was that of an Egyptian priest, Nesyamun.[10-11] Nesyamun was a scribe and a high-ranking priest at the Karnak temple in Thebes who held his position during the reign of Ramses XI, pharaoh of Egypt from 1099 BC to 1069 BC.[11] The nature of the specimen limited the scope of the study. The vocal tract has just a single shape, a consequence of Nesyamun's supine burial position.[11] The sound produced is just a groan that sounds like eeuuughhh, the sound he would make if his vocal tract came to life again in his coffin.[11]

Nesyamun in hieroglyphs

Nesyamun's name, spelled in Egyptian hieroglyphs, as inscribed on his coffin. Nesyamun was a common name that translates to mean, "The one belonging to the God, Amun."[11] (Fig. 2 of ref. 10, licensed under a Creative Commons Attribution 4.0 International License.)


Nesyamun was a fitting subject for this study, since he performed rituals in song and speech. He seemingly died from an allergic reaction in his mid-50s, and he also suffered from gum disease and heavily worn teeth. The inscriptions on his coffin attest to Nesyamun's hope that his soul would one day speak to the gods as he had in his lifetime.[11] In a way, this scientific study fulfilled that wish. Other obstacles to generating a true sound are that the tongue is missing and that the fleshy, vibrating vocal folds that give a voice its timbre are absent.[11] In any event, this recorded sound will add an interesting dimension to the museum display.[11]

References:

  1. W. Tecumseh Fitch, Bart de Boer, Neil Mathur, and Asif A. Ghazanfar, "Monkey vocal tracts are speech-ready," Science Advances, vol. 2, no. 12 (December 9, 2016), Article no. e1600723, DOI: 10.1126/sciadv.1600723.
  2. Michael Price, "Why monkeys can't talk—and what they would sound like if they could," Science, December 9, 2016.
  3. Patrick Feaster, "A Cultural History of the Edison Talking Doll Record," Thomas Edison National Historical Park Website.
  4. Daniel Tiffany, "Toy Medium: Materialism and Modern Lyric," University of California Press, March 8, 2000, p. 58 (via Google Books).
  5. William A. Harwood, "Improvement in Talking and Crying Dolls," U. S. Patent no. 189,935 (April 24, 1877).
  6. Homer W. Dudley, "Signal transmission," US Patent no. 2,151,091, March 21, 1939.
  7. Daniel E. Re, Jillian J. M. O'Connor, Patrick J. Bennett and David R. Feinberg, "Preferences for Very Low and Very High Voice Pitch in Humans," PLoS ONE, vol. 7, no. 3 (March 5, 2012), Article No. e32719.
  8. Physical Audio Signal Processing, Voice Synthesis, Vocal Tract Analog Models, Free Books.
  9. The Linux command line is espeak "To be, or not to be--that is the question: Whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune; or, to take arms against a sea of troubles, and by opposing end them."
  10. D. M. Howard, J. Schofield, J. Fletcher, K. Baxter, G. R. Iball, and S. A. Buckley, "Synthesis of a Vocal Sound from the 3,000 year old Mummy, Nesyamun 'True of Voice'," Scientific Reports, vol. 10, Article no. 45000 (January 23, 2020), https://doi.org/10.1038/s41598-019-56316-y. This is an open access article with a PDF file here.
  11. Meilan Solly, "Listen to the Recreated Voice of a 3,000-Year-Old Egyptian Mummy," Smithsonian Magazine, January 24, 2020.