Tikalon Header

Text Topography

May 16, 2022

Most of you likely read the title of this article as "Text Typography." Typography is the craft of choosing fonts and arranging their displayed characters to make writing legible, readable, and aesthetic. After choosing Comic Sans as your font, you strive to make your PowerPoint slides into works of art by adjusting word placement, line lengths, indentation, and line-spacing. However, the topic of this article is topography, the study of surface features, usually those of land; that is, the lay of the land.

Comic Sans, This is not a pipe

This is not a pipe, but the font is Comic Sans. Comic Sans is a typeface released by Microsoft in 1994 and later shipped with Windows 95. It was inspired by comic book lettering, but its widespread use for formal communication has been widely ridiculed. When the Higgs boson was discovered at CERN in July 2012, one of the presentations of Higgs data used Comic Sans.[1] CERN must have liked the publicity, since it announced on April 1, 2014 (April Fools' Day), that it would use the Comic Sans typeface in all its publications.[2] (Modified Wikimedia Commons image by Torsten Bätge.)


In the dark days BC (Before Computers) I would enjoy exploring the older archives at our public library. Many of the books I would sample were large format tomes with ornate leather binding that screamed, "This stuff is important, since we spared no expense in its presentation." It was there that I discovered that viewing book pages obliquely revealed rivers of whitespace, the blank area between words, that flowed down the page. This effect was likely accentuated by the somewhat larger spacing between words that justification demands when it stretches lines to fill the full width of a page.

I've carried this idea with me through five decades, a period in which computers and computer graphics have advanced considerably. It's now very easy to generate aesthetic images based on this idea of text topography. The term, computer art, covers a wide range of topics, so a better term for text topography art might be low-complexity art. Naming these abstract art images is easy, since they're based on particular texts, and the title or theme of the texts supplies the name.

Generating such art is aided by the fact that files of public domain classics exist on the Internet in places such as the Internet Archive and Project Gutenberg. Such free and open resources should be encouraged in an Internet age in which everyone seeks to perpetually monetize their intellectual property, and I've donated to both of these organizations.

Johannes Gutenberg and Tim Berners-Lee

Johannes Gutenberg (c.1400-1468) and Tim Berners-Lee (b. 1955). Although not the inventor of movable type, Gutenberg was the first European to use it, thereby creating a way to make books less expensive and available to more people. Berners-Lee, a computer scientist at CERN, was the originator of the Internet's World Wide Web, thereby making vast quantities of information instantly available. (Left, a Wikimedia Commons image of Johannes Gutenberg. Right, portion of a Wikimedia Commons image by Paul Clarke of Tim Berners-Lee at the November, 2014, Open Data Institute Summit. Click for larger image.)


I created a PHP program to transform text files into images, and these images can be subsequently modified by an image manipulation program, such as the GNU Image Manipulation Program, to add color and other effects. A zip file containing the PHP source code and some example text files can be found here. The input text file should be ASCII text, but a Linux utility can convert the common UTF-8 files to ASCII, as follows:

uni2ascii -e input.txt >output.txt
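Before or after such a conversion, it can help to find out exactly where a text departs from ASCII. The following is a minimal Python sketch of such a check, not part of the PHP program described in this article; the function name find_non_ascii and the sample text are made up for illustration.

```python
def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character.

    Line and column numbers start at 1. This helper is hypothetical and
    only illustrates the idea of checking a text before conversion.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:  # ASCII covers code points 0 through 127
                hits.append((line_no, col_no, ch))
    return hits


if __name__ == "__main__":
    # A typographic apostrophe (U+2019) is a common non-ASCII character
    # in public domain e-texts.
    sample = "Call me Ishmael.\nIt\u2019s a damp, drizzly November"
    for line_no, col_no, ch in find_non_ascii(sample):
        print(f"line {line_no}, column {col_no}: U+{ord(ch):04X}")
```

An empty result means that the text is already pure ASCII and needs no conversion.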

The input text file is read as one long string, and regular expressions are used to replace whitespace and letter characters with ones and zeros to create an image string. There are checks along the way to ensure that the regular expression patterns haven't missed anything. An output image file is opened, and a header is written for a portable bitmap format (*.pbm) file, a type of file that's just black and white pixels represented by ones and zeros. The vertical axis is stretched by a factor of two to give a more aesthetic image, and the *.pbm file is written. The file is crude at this point, as the following image illustrates.

Text topography *.pbm image

Portion of a *.pbm image created by the text topography program. This basic image is subsequently processed to add color and other image effects to produce a finished artwork. One cause of an invalid *.pbm file is writing characters other than zeros and ones. My text topography program tries to prevent such errors, but some texts (Moby Dick, for example) needed manual editing to ensure ASCII encoding before they would work. The program flags most errors.



Left, image-processed output for Pliny's Natural History, book XXXVII, chapters 1-6, in English translation.[3] Right, image-processed and rotated output for Homer's Iliad, book I, in English translation.[4] Edge detection was used to create this effect. (Click for larger image.)


References:

  1. Patrick Kingsley, "Higgs boson and Comic Sans: the perfect fusion," The Guardian, July 4, 2012.
  2. Cian O'Luanaigh, "CERN to switch to Comic Sans," CERN Website, April 1, 2014.
  3. Pliny the Elder, "The Natural History," John Bostock, Trans., Taylor and Francis (London: 1855), via the Tufts University Perseus Digital Library Project.
  4. Homer, "Iliad," Samuel Butler, Trans., via The Internet Classics Archive.