Proto-Tongues

February 18, 2013

I worked for many years in industrial research at a manufacturing company, and we scientists were subjected to training in the management fad of the year, such as Total Quality Management, along with the other corporate employees. One possible reason for the demise of industrial research in America is the fact that too much of our time was spent doing useless, non-scientific things devised by business school theoreticians.

I was even required to make poster presentations to hang in my laboratory to create a "visual workplace." At my billing rate, those were worth more per square foot than sheets of fifty-dollar bills. If the purpose of these business fads was to increase employee productivity, they certainly had the opposite effect in the laboratories.

One emphasis at the time was customer surveys. After all, "the customer is always right," so we need to know what he wants. Unfortunately, the customer didn't understand that what he wanted couldn't be done because of the limitations of the available materials, and he wasn't willing to pay for any fundamental materials research into possibly attaining his impossible goals.

Just as a cook, or a materials scientist, is limited in his recipes by the ingredients at hand, spoken language is limited by the possible sounds produced by the human vocal tract. It's no wonder an infant's first word is "ma-ma," since these sounds are so easy to produce. They're much easier to say than "da-da," so there's a scientific reason why fathers shouldn't feel slighted.

Modern languages have a broad vocabulary of nuanced meanings and delicate vocalizations, but early man was limited to a few simple words and, probably, a lot of hand gestures. Some basic words, such as you, I, man and woman, must have been part of this early language, so it's interesting to see how such basic words may have evolved over time. By spotting trends, we might also conjecture about the rest of the early vocabulary.

(What article about linguistics would be complete without a photograph of the Rosetta Stone, seen here on display in the British Museum? This stone, inscribed in Egyptian hieroglyphs, Demotic and Ancient Greek, allowed the decipherment of hieroglyphics. Not surprisingly, Egypt would like to have its stone repatriated. From a photograph by Félix Martín Sánchez, via Wikimedia Commons.)

Scientists at The University of Reading have found that the English words I, we, two and three have persisted for tens of thousands of years, whereas the current words squeeze, guts, stick and bad will soon be extinct.[1] One clue to the ephemeral nature of the word "dirty" is that it's said 46 different ways among the Indo-European languages, whereas persistent words are regionally similar. The Reading researchers identified two hundred persistent words, their commonality being that they don't relate to technology and they don't have a specific cultural reference.[1]

Computers, of course, make research such as this much easier to do. A research team of a statistician, a psychologist, and two computer scientists from the University of California, Berkeley, and the University of British Columbia have developed a computer program for the rapid reconstruction of the proto-languages from which our modern languages have evolved.[2-8] This is reported as an open access paper in the Proceedings of the National Academy of Sciences (PNAS).[2]

Spoken language existed long before writing, the invention of which dates back about 6,000 years.[3] Even after writing was invented, the number and types of texts were limited. You can imagine the difficulty of tracing words back through time to discover the protolanguages, the ancient languages from which other languages evolved.[6] The set of protolanguages includes Proto-Indo-European, Proto-Afroasiatic and Proto-Austronesian. The latter, which is the root language of the languages of Southeast Asia, parts of continental Asia, Australasia and the Pacific Islands, is the object of the PNAS study.[3]

Linguists presently reconstruct these protolanguages manually by a process known as the comparative method, which is based on the idea that sounds change in certain ways, leaving patterns that humans, or computers, can find.[2-3,8] Says Alexandre Bouchard-Côté, lead author of the study, an assistant professor of statistics at the University of British Columbia and formerly a graduate student at UC Berkeley where the project originated,
"To understand how language changes -- which sounds are more likely to change and what they will become -- requires reconstructing and analyzing massive amounts of ancestral word forms, which is where automatic reconstructions play an important role."[3]

Computers are well suited to pattern matching, so computing was investigated as a way to speed the comparative method process.[5,8] The research team chose 637 languages descended from Proto-Austronesian as a test case.[2-3,7-8] Working from a database of 142,000 words, the computer system was able to reconstruct the protolanguage spoken about 7,000 years ago.[4-6,8]

Computer analysis was by a Markov chain Monte Carlo sampling algorithm, which looked at words in different languages that share a common sound, history and origin.[3-4] Rules were applied, such as one stating that paired sounds will be condensed into a single sound if the result is not confusing.[7] This allowed a probabilistic inference of what the protolanguage word might be.[2-3]
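
To give a feel for what a Markov chain Monte Carlo sampler does, here is a toy Metropolis sampler in Python. It is only a sketch under heavy assumptions, not the model from the PNAS paper: the words are invented, every word has the same length, only single-character substitutions are allowed, and the scoring function, the parameter `beta`, and all function names are hypothetical. The sampler wanders through candidate ancestral forms, preferring those that differ least from the descendant words, and the forms it visits most often are the most probable under this toy model.

```python
import math
import random
from collections import Counter


def total_mismatch(candidate, descendants):
    """Count position-wise differences between a candidate and every descendant."""
    return sum(c != d for word in descendants for c, d in zip(candidate, word))


def sample_protoforms(descendants, alphabet, steps=20000, beta=2.0, seed=0):
    """Toy Metropolis sampler over fixed-length candidate ancestral forms.

    Assumption: a candidate's probability is proportional to
    exp(-beta * total_mismatch). This is illustrative only.
    """
    rng = random.Random(seed)
    length = len(descendants[0])
    current = [rng.choice(alphabet) for _ in range(length)]
    current_cost = total_mismatch(current, descendants)
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: overwrite one random position with a random letter.
        proposal = current.copy()
        proposal[rng.randrange(length)] = rng.choice(alphabet)
        proposal_cost = total_mismatch(proposal, descendants)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if rng.random() < math.exp(-beta * (proposal_cost - current_cost)):
            current, current_cost = proposal, proposal_cost
        visits["".join(current)] += 1
    return visits


toy_words = ["tuka", "duka", "tuga"]  # invented forms, not real language data
visits = sample_protoforms(toy_words, alphabet="tdukga")
print(visits.most_common(3))  # most frequently visited candidate forms
```

The visit counts act as a rough probability estimate for each candidate, which is the sense in which the inference is probabilistic rather than a single hard answer.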

The computer analysis was able to identify 85% of protolanguage words to within one character of those determined by skilled linguists, but at a far faster rate.[2-8] This is a good result; but, at this point, it's not a valid replacement for a linguist.[7] Coauthor Dan Klein is quoted by the BBC as saying,
"Our system still has shortcomings. For example, it can't handle morphological changes or re-duplications - how a word like 'cat' becomes 'kitty-cat'."[8]
One success of the analysis program is that it seems to confirm a 1955 hypothesis called "functional load." This hypothesis, which seems intuitive to me, since I never understood why English persists in having their, there, and they're, is that sounds that distinguish one word from another are more resistant to change.[7]

Is any of this important? Timelines are an important part of history, and the words used to describe certain events may place them in sequence.[5] One interesting linguistic fact, mentioned in a Time Magazine article by Matt Peckham, is that the word "alcohol" has been essentially unchanged from its Sumerian form six thousand years ago.[5] This is another example of essential words being resistant to change.

References:

  1. "'Oldest English words' identified," BBC News, February 26, 2009.
  2. Alexandre Bouchard-Côté, David Hall, Thomas L. Griffiths and Dan Klein, "Automated reconstruction of ancient languages using probabilistic models of sound change," Proceedings of the National Academy of Sciences, published online before print February 11, 2013, doi: 10.1073/pnas.1204678110.
  3. Yasmin Anwar, "Scientists create automated ‘time machine’ to reconstruct ancient languages," University of California, Berkeley, Press Release, February 11, 2013.
  4. "Computerized ‘Rosetta Stone’ reconstructs ancient languages," University of British Columbia Press Release, February 11, 2013.
  5. Matt Peckham, "Move Over, BabelFish: Computer Program Reconstructs Lost Tongues," Techland, Time Magazine, February 12, 2013.
  6. "New cunning linguist computer has got ancient tongues licked," The Register (UK), February 12, 2013.
  7. Philip Ball, "Computer program roots out ancestors of modern tongues," Nature, February 11, 2013.
  8. Rebecca Morelle, "Ancient languages reconstructed by computer program," BBC World Service, February 12, 2013.