Data Compression
June 21, 2021
I took a
course on
rhetoric as a
college freshman. It was taught by a
rotund humanities professor who had the unenviable task of teaching this subject to a group of
scientists and
engineers whose only goal was to complete a humanities requirement. As you can imagine, not many
textbooks were available for such a niche topic, and the thin volume we had was quite uninspiring. I did, however, glean a few important principles from this course, one of which was the need to express an
idea clearly, but in as few
words as possible.
Rhetoric, the art of persuasion, is most commonly associated with the Greek orator, Demosthenes (Δημοσθένης, 384-322 BC).
Plutarch wrote that Demosthenes had a speech impediment as a boy, and it's thought that it was one in which he mispronounced r as an l. This is a condition known as de-rhotacism, named after the Greek letter for r, ρ (rho). This doesn't seem like that much of a problem, and I'm reminded of the way that news anchor, Tom Brokaw,
pronounces his l.
It's written that Demosthenes cured himself of this speech impediment by shouting above the noise of ocean waves with pebbles in his mouth.
(Portion of Demosthenes on the Seashore, an 1859 painting by Eugène Delacroix (1798–1863), presently at the National Gallery of Ireland, via Wikimedia Commons)
Such brevity in
speech is desirable, and this idea is applied
technically in the various ways that
language is
coded into more
compact forms. Prior to the development of rapid and accurate
speech-to-text software,
symbolic writing methods known as
shorthand were used to allow people to write as rapidly as others would speak. One of my
aunts learned shorthand in
secretarial school. A
mechanical means of recording shorthand known as
stenotyping can be seen in
courtroom scenes in older
movies and
television shows.
It's all Greek to me.
An example of shorthand from page 153 the 1916 book, "Gregg Shorthand. A Light-Line Phonography for the Million", by John Robert Gregg.
The idiom, "It's all Greek to me," can be found in Shakespeare's, The Tragedy of Julius Caesar (1599), in which one of the characters, Casca, describes a speech made in Greek by Cicero, a Roman scholar of Greek literature. Casca, who doesn't understand Greek, says that "... those that understood him smiled at one another and shook their heads; but, for mine own part, it was Greek to me."
(Portion of a Wikimedia Commons image.)
The first principal method of
electrical signal compression was the
telegraph Morse code,
invented by
portrait painter,
Samuel F. B. Morse.[1-2] Morse was
inspired to invent the telegraph after the
death of his
wife while he was absent from his
New Haven, Connecticut,
home while
painting a
portrait of the
Marquis de Lafayette in
Washington, D.C.. Morse, who had received news via a
horseback messenger that his wife was
ill, immediately returned home, but his wife died before he arrived. He reasoned that if he had received the news earlier, he would have been at her
bedside, and at that point he devoted himself to the problem of
telecommunication.
Morse was assisted in the invention of the telegraph by
Alfred Vail,who also assisted in development of the code. Since
certain letters are used more often than others, Morse created his code to make the more common letters easier to transmit. Thus "e" (
frequency of occurrence 12.7%) is a single dot, a "t" (9.1%) is a single dash, while the less common "z" (0.074%) is dash-dash-dot-dot. The
Federal Communications Commission abandoned its Morse code requirement for
amateur radio operator licensing in 2005, and the
United States Coast Guard stopped monitoring Morse Code
distress calls in 1993. An interesting Morse code generating program was created by Frans van Dorsselaer in 1998, and I've included his
C programming language source code here.
Binary tree of Morse code letters and numbers. Dot-heavy letters are to the left, and dash-heavy letters are to the right. In 1854, The US Supreme Court ruled (O'Reilly v. Morse) against applicability of the Morse Code apart from the telegraph implementation. This ruling confirmed the idea that an abstract idea is not patentable, only its implementation. (Created by the author using Inkscape, and also found at Wikimedia Commons. Click for larger image.)
Another example of useful data compression is the
barcode, which is an
automated method for retrieving
data from a
lookup table. The barcode was
invented by
Norman J. Woodland and
Bernard Silver and
patented in 1952.[3] The available
electronics in 1952 were primitive, and this invention was demonstrated using
vacuum tubes and a non-
laser light source. The patent was eventually sold to
Philco for just $15,000. Barcodes became useful only after the availability of inexpensive
computers and laser light sources. Advances in
machine vision have made the similar
QR code possible.
Early
mainframe computers had limited
data storage, so data compression became an important topic. An early method of
lossless data compression was
Huffman coding, published in 1952 by
computer scientist,
David A. Huffman (1925-1999), when he was a
doctoral student at
MIT.[4] It works much like Morse code by creating a table of symbols for data items that's generated based on their frequency of occurrence. Instead of the dots and dashes of Morse code,
binary ones and zeros are used.
At the advent of
ubiquitous computing and the
Internet, text was easy to store and send, but
images were a problem. The saying that
a picture is worth a thousand words was replaced by the need to represent even simple images by tens of thousands of
word
words of computer data. The solution was data compression using the
Graphics Interchange Format (GIF). GIF was based on
Lempel–Ziv–Welch (LZW) data compression, pioneered by
Jacob Ziv (b. 1931) and
Abraham Lempel (b. 1936) in 1977-78, and improved by
Terry Welch (1939-1988) in 1983.[5-6] This
algorithm was deemed significant enough to achieve
IEEE Milestone status in 2004, an achievement reserved for such important things as the
invention of the transistor.
The difference between Huffman coding and LZW is that LZW looks for the frequency of occurrence of repetitive
sequences of data bytes, rather than individual data bytes, and uses these as table entries. For this reason it is especially effective in image encoding in which such repetition is common, such as for a uniform white background. The LZW patent (US Patent no. 4,558,302) was filed on June 20, 1983, and it was issued on December 10, 1965.[6] The importance of this algorithm was such that the patent was
licensed to more than a hundred companies, the most important of which was the license to
CompuServe in connection with
GIF image encoding in 1993.
Unisys, the patent owner, did not require
licensing, or payment of fees, for non-commercial GIF usage, including GIFs on the Internet.
Some computer nostalgia. A snippet of FORTRAN source code, fig. 6 of US Patent No. 4,558,302, "High speed data compression and decompression apparatus and method," by Terry A. Welch, December 10, 1985.[6] (Via Google Patents.)
The
Institute of Electrical and Electronics Engineers (IEEE) recently announced that Jacob Ziv, the sole member of the LZW team who was trained as an
electrical engineer, has been awarded the
2021 IEEE Medal of Honor.[7] This award, first given in 1917, has been given to such luminaries as
Edwin Armstrong (1890-1954), the inventor of
Frequency modulation (1917), and
Claude Elwood Shannon (1916-2001), the father of
information theory," (1966).
Jacob Ziv in 2015.
Ziv experimented with electronic circuits as a boy, and his education was in electrical engineering. He was awarded the BSc (1954), Dip-Eng (1955), and MSc (1957), each in electrical engineering from Technion (Israel Institute of Technolog). His 1962 Sc.D from MIT was for research in message transmission in a noisy channel.[7]
Ziv is presently Technion Distinguished Professor Emeritus in its Faculty of Electrical Engineering,[7]
(Portion of a Wikimedia Commons photograph by Mark Neyman, image courtesy of the Spokesperson unit of the President of Israel.)
As with other awards, the IEEE Medal of Honor isn't given for just a single achievement, but for a life's work in a particular field. Beyond LZW, Ziv made other compression innovations. The
citation for the IEEE medal is for his "... fundamental contributions to information theory and data compression technology, and for distinguished research leadership."[7] He is also the recipient of another IEEE awards, the
Claude E. Shannon Award of the
IEEE Information Theory Society.[7] Ziv, who is an IEEE
fellow, has
published about 100
peer-reviewed papers, and he is a member of the
U.S. National Academy of Engineering, and the
U.S. National Academy of Sciences.[7]
One complaint that Ziv made of his electrical engineering education was its primary focus on
power systems and not electronics. As a consequence, he and his
peers who were working to convert
telemetry systems from
vacuum tubes to
transistors needed to teach themselves electronics.[7] He also had a complaint about computing early in his career. He says that he hated these punch card systems, and that's why he didn’t go into
computer science.[7] That's also the reason why I never took a computer programming course during my own college days. I can still envision all those poor students
trudging around
campus with huge boxes of punch cards under their arms.
Ziv met Abraham Lempel as a fellow faculty member at Technion. Ziv says he and Lempel were the
perfect match to tackle lossless data compression because of their complementary knowledge.[7] Ziv and Lempel were on simultaneous
sabbaticals in the US, Ziv at
Bell Labs, and Lempel at
Sperry Rand, when their compression work was completed[7] Bell decided not to patent their algorithm, since
software patents were not possible at the time, but Sperry decided to patent a
hardware implementation.[7] Sperry subsequently patented Terry Welch's LZW algorithm.[6]
References:
- Samuel F. B. Morse, "Improvement in the Mode Of Communicating Information by Signals by by Application of Electromagnetism," US Patent No. 1,647, June 20, 1840.
- Samuel F. B. Morse, "Improvement in Electro-Magnetic Telegraphs," US Reissue Patent No. RE117, June 13, 1848.
- Norman J. Woodland and Bernard Silver, "Classifying Apparatus And Method," US Patent No. 2,612,994, October 7, 1952.
- D. Huffman, "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of the IRE, vol. 40, no. 9 (September 9, 1952), pp. 1098-1101, doi:10.1109/JRPROC.1952.273898.
- J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Transactions on Information Theory, vol. 23, no. 3 (May 1977), pp. 337-343, doi: 10.1109/TIT.1977.1055714.
- Terry A. Welch, "High speed data compression and decompression apparatus and method," US Patent No. 4,558,302, December 10, 1985.
- Tekla S. Perry, "From WinZips to Cat GIFs, Jacob Ziv's Algorithms Have Powered Decades of Compression," IEEE Spectrum, April 21, 2021.