Tikalon Header

Short Words

October 19, 2020

The American author, Kurt Vonnegut (1922-2007), is known as a writer of science fiction, but his novels are widely read, since their scientific content serves as a vehicle for his unique perspective on life. I wrote about his ice-nine in an earlier article (Fictional Materials, April 21, 2016).

Ice-nine, not to be confused with the actual ice-IX phase of water, is a fictional material in Vonnegut's novel, Cat's Cradle. Vonnegut's phase of water ice has the dangerous physical property that it will spontaneously crystallize ordinary water into ice-nine under normal temperatures and pressures. Unfortunately, this phase of water is stable, and it will only melt at 45.8 degrees Celsius. This is well above Earth's ambient temperature, even under the worst global warming scenario.

Ice-nine eventually leads to the solidification of all the world's water, and the destruction of life on Earth. Vonnegut said that he got this idea from the 1932 Nobel Chemistry Laureate, Irving Langmuir. Langmuir had apparently suggested this as a story line to H. G. Wells, and Vonnegut heard about it while he was working in the public relations department at General Electric. Vonnegut considered the ice-nine idea to be free for him to use after both Langmuir and Wells had died.

Kurt Vonnegut explaining his concept of "story arcs" at Case Western Reserve University in 2004.

(Screenshot from a YouTube video by Case Western Reserve University, November 29, 2016.)[1]


Unlike many authors who inflate their prose with too many words, Vonnegut used little literary allusion. His books are short, and they can be read in a single sitting of a few hours. Since Vonnegut was technically educated, he devised a theory of literary plot development, graphically illustrated as story arcs. You can learn about these in a YouTube video of a February 4, 2004, lecture he gave at Case Western Reserve University (Cleveland, Ohio). The story arc portion starts at 37:35 in this 54:56 lecture.[1] As Vonnegut said in this lecture, "... I have tried to bring scientific thinking to literary criticism, and there's been very little gratitude for this."

Vonnegut's story arc analysis requires considerable interpretation of story text, and it won't be done by computers until artificial intelligence is somewhat more developed. However, computer analysis of literature using words and an elementary knowledge of grammar has been with us for quite some time. It's done such things as determining that multiple authors had a hand in writing books of the Bible, and piecing together fragments of text into a readable version of the Dead Sea Scrolls.[2]

A literary analysis feature has been present on personal computer word processors since the earliest days of personal computing. I've written about these Flesch-Kincaid readability tests[3] in two earlier articles (Readability and Word Length, September 14, 2012, and Readability, February 12, 2014). I've written a short C language program (source code here) that calculates these scores using some simplified analysis.

The Flesch-Kincaid tests are based on the simple idea that longer words and sentences are more difficult than short words and short sentences. This simplification ignores the fact that there are many short archaic and technical words unknown to students. The research for these tests was funded by the US military, which wanted to ensure that its training materials and maintenance manuals were understood by its recruits.

The two measures of readability are the Flesch-Kincaid grade level and the Flesch reading ease index, calculated as follows:

Flesch-Kincaid Grade =
(0.39*(words/sentences))+(11.8*(syllables/words))-15.59

Flesch Reading Ease =
206.835-(1.015*(words/sentences))-(84.6*(syllables/words))

Words, sentences and syllables in these formulas are the total counts of these objects in the manuscript. The reading ease scale is aligned with age group; viz., 90 to 100 = 11 year old, 60 to 70 = 13-15 year old, and 0 to 30 = college graduates. The grade level is for school grade levels in the United States. The articles in this blog are at about a tenth grade level, and my three science fiction novels have a very accessible sixth grade rating.
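As a rough illustration of how such scores can be computed, here is a minimal Python sketch. It is not the C program mentioned above. The function names and the vowel-group syllable estimate are assumptions made for this example, and real syllable counting is considerably harder. The reading ease constant used is the standard published value, 206.835.

```python
import re

def count_syllables(word):
    """Rough syllable estimate (an assumption): count runs of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("time"), but not an "le" ending ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def readability(text):
    """Return (grade_level, reading_ease) for a block of English text."""
    # Sentences are approximated as runs of text ended by '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    return grade, ease

grade, ease = readability("The cat sat on the mat.")
print(f"grade = {grade:.2f}, ease = {ease:.2f}")
```

The test sentence has six one-syllable words in one sentence, so it scores below first grade and above 100 on the ease scale. Both formulas are unbounded, and very simple text falls outside the nominal ranges.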

"Headless Body in Topless Bar" was the actual headline on the front page of the April 15, 1983, edition of The New York Post.[4]

Some publishers of supermarket tabloids use readability scores to ensure that their content is at the reading level of purchasers at the checkout.

(Created using Inkscape.)


A 2012 study by scientists at the Kazan Federal University (Kazan, Russia) examined the Google Books corpus to determine trends in word usage.[5] They found that the average word length of American English declined from 4.8 to 4.3 characters in the period from 1975 to 2008, while that of British English increased from 4.9 to 5.1 characters over the same period.[5] They also found an interesting temporal trend in the use of personal pronouns from 1700 to 2008 (see graph).[5]
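Average word length is simple to compute for any text. The following Python sketch shows the arithmetic only. It is not the method of the Kazan study, which worked from word frequencies in the Google Books corpus. The function name is an assumption, and counting only runs of letters as words means that a contraction such as "don't" is split in two.

```python
import re

def average_word_length(text):
    """Mean number of letters per word; returns 0.0 for text with no words."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)

# 8 + 4 + 2 + 7 + 3 = 24 letters in 5 words
print(average_word_length("Headless Body in Topless Bar"))  # prints 4.8
```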

Personal pronoun usage from 1835-2008. (Fig. 6 from ref. 5, via arXiv.)


A moment's reflection will reveal that just about every narrative, from trivial television dramas and romance novels to Oedipus Rex, has a progression from a beginning through a middle to an end. While any narrative has its specific characters and setting, it seems that all narratives have a structure that's independent of such details. Computer analysis has enabled a way to quantify this structure, and a tally of positive vs negative "emotion" words has given a means of sorting rags-to-riches stories from tragedies.

Literary analysis was done as early as 335 BC. This is a portion of Part VI of the c. 335 BC Poetics of Aristotle (384-322 BC). This translates as "The plot then is the first principle and as it were the soul of tragedy: character comes second." (Via the Tufts University Department of the Classics Perseus Digital Library.)


A simpler method of computer analysis has just been published by psychologists from Lancaster University (Lancaster, UK) and the University of Texas at Austin (Austin, Texas). Their study appears as an open access paper in Science Advances.[6-9] They did a computer analysis of about 40,000 narratives that included such things as novels, screenplays, 28,664 New York Times articles, 1,580 US Supreme Court opinions, and 2,226 TED talks.[6-7] The analysis revealed that most narratives share a common structure of three primary processes; namely, staging, plot progression, and cognitive tension. This structure was absent in factual texts.[6]

Since the narrative structure was independent of topic, the researchers reasoned that it should be apparent in content-free words. These are the function words, short connector words such as pronouns (she, they), prepositions, articles (a, the), conjunctions, and negations. Such function words also include auxiliary verbs and nonreferential adverbs, such as "so" and "really."[6-7] There is just a small number of common function words in English, fewer than 200, but they account for 50 to 60% of all words that are written or said.[6] These small function words appear in a similar pattern across most narratives, independent of length or type.[7]
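The share of function words in a text can be estimated with a few lines of Python. This is a minimal sketch, not the researchers' software. The word list below is a hypothetical, heavily abbreviated sample of the categories named above, so it will undercount function words in real text.

```python
import re

# Hypothetical, abbreviated sample; a full list would have fewer than 200 entries.
FUNCTION_WORDS = {
    "a", "an", "the",                              # articles
    "she", "he", "they", "it", "i", "you", "we",   # pronouns
    "in", "on", "of", "to", "at", "with",          # prepositions
    "and", "but", "or",                            # conjunctions
    "not", "no", "never",                          # negations
    "is", "was", "have", "had", "will",            # auxiliary verbs
    "so", "really",                                # nonreferential adverbs
}

def function_word_share(text):
    """Fraction of the words in text that appear in FUNCTION_WORDS."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in FUNCTION_WORDS for w in words) / len(words)

sample = "She walked to the laboratory and it was really cold"
print(f"{function_word_share(sample):.0%}")  # 7 of 10 words, prints 70%
```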

The common structure of the narratives followed these three stages:[7]
Staging, the start of the narrative, in which many prepositions and articles are used, apparently as a means to set the scene and convey basic information about concepts and characters in the story (e.g., "the laboratory").

Plot progression, in which more interactional language is used, including auxiliary verbs, adverbs and pronouns (e.g., "his laboratory").

Cognitive tension, using action words, such as "think" and "cause," that brings the narrative to a conclusion.
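The staging pattern described above can be looked for by splitting a text into equal segments and measuring how often articles and prepositions occur in each. The Python sketch below makes several assumptions: the function name, the short word lists, and the choice of five segments are for illustration only and are not taken from the study's code.

```python
import re

ARTICLES = {"a", "an", "the"}
# Hypothetical short list of prepositions for illustration.
PREPOSITIONS = {"in", "on", "of", "to", "at", "with", "from", "by", "for"}

def staging_profile(text, segments=5):
    """Percentage of words in each segment that are articles or prepositions."""
    words = re.findall(r"[a-z]+", text.lower())
    targets = ARTICLES | PREPOSITIONS
    profile = []
    for i in range(segments):
        # Integer arithmetic divides the word list into nearly equal chunks.
        start = i * len(words) // segments
        end = (i + 1) * len(words) // segments
        chunk = words[start:end]
        rate = 100.0 * sum(w in targets for w in chunk) / len(chunk) if chunk else 0.0
        profile.append(rate)
    return profile

story = "the lab in the city she thinks he knows why"
print(staging_profile(story, segments=2))  # prints [60.0, 0.0]
```

For a narrative that follows the staging pattern, the profile should be highest in the first segment and fall off afterwards. The same loop could tally pronouns and auxiliary verbs for plot progression, or words such as "think" and "cause" for cognitive tension.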
Says lead author of the study, Ryan Boyd, an assistant professor at Lancaster University,[7]
"If we want to connect with an audience, we have to appreciate what information they need, but don’t yet have... At the most fundamental level, humans need a flood of logic language at the beginning of a story to make sense of it, followed by a rising stream of action information to convey the actual plot of the story."

Use of articles and prepositions over the course of a narrative.

TAT is a collection of 14,419 brief stories written in response to a standardized thematic apperception test by individuals who accessed a website maintained by an author of the study.

(Created using data from a YouTube Video.[8])


References:

  1. Kurt Vonnegut February 4, 2004, Lecture at Case Western Reserve University, YouTube Video, November 29, 2016. A shorter version with Spanish subtitles can be found here.
  2. Ron Grossman, "Computer Generates Bootleg Copy Of Dead Sea Scrolls." Chicago Tribune, September 4, 1991.
  3. J. Peter Kincaid, Richard Braby and John E. Mears, "Electronic authoring and delivery of technical information," Journal of Instructional Development, vol. 11, no. 2 (June, 1988), pp. 8-13.
  4. Steve Cuozzo, "The genius behind 'Headless Body in Topless Bar' headline dies at 74," New York Post, June 9, 2015.
  5. Vladimir V. Bochkarev, Anna V. Shevlyakova and Valery D. Solovyev, "Average word length dynamics as indicator of cultural changes in society," Social Evolution & History, vol. 14, no. 2 (September, 2015), pp. 153-175. Also at arXiv.
  6. Ryan L. Boyd, Kate G. Blackburn and James W. Pennebaker, "The narrative arc: Revealing core narrative structures through text analysis," Science Advances, vol. 6, no. 32 (August 7, 2020), Article no. eaba2196, DOI: 10.1126/sciadv.aba2196. This is an open access article with a PDF file available here.
  7. Authors' 'Invisible' Words Reveal Blueprint for Storytelling, University of Texas Press Release, August 7, 2020.
  8. Science Advances: The Arc of Narrative, YouTube Video by Ryan Boyd, August 7, 2020.
  9. The Arc of Narrative website.
  10. Aristotle, "Poetics," S. H. Butcher, Trans., at The Internet Classics Archive by Daniel C. Stevenson.