Archival Data Storage
April 13, 2020
Perhaps motivated by a fear that Americans were generally uncultured, elementary school students of my generation were subjected to a plethora of poetry. One of these poems was Ozymandias, a sonnet written by Percy Bysshe Shelley (1792–1822) in 1818. As I've remarked in earlier articles, education is often stifled by the need to use free, public domain materials, and an 1818 sonnet by a poet who died in 1822 definitely falls into that domain.
Ozymandias was the Greek name for the Egyptian pharaoh Ramesses II, who ruled Egypt from 1279-1213 BC. This sonnet tells the tale of a traveler who finds the ruins of a statue of which just the legs remain. Inscribed in the pedestal are the words, "My name is Ozymandias, king of kings: Look on my works, ye Mighty, and despair!" The sonnet demonstrates the futility of the pharaoh's hubris, and it also shows that humans can create a message, the stone inscription, that lasts three thousand years.
The Phaistos Disk is a fired clay disk about six inches in diameter that's about 3,500 years old. It's completely covered on both sides with stamped symbols arrayed in a spiral from center to edge, and it was found in the Minoan palace of Phaistos. No other examples of the Phaistos script have been found, so it's unlikely that a translation can be made.
(photo by Maksim)
While stone tablets have permanence over the course of millennia, they can't contain much information. The Rosetta Stone, created about 200 BC, was the key to decipherment of ancient Egyptian scripts, since it repeated its Egyptian text in Greek translation. The Greek text of the Rosetta Stone has about 10,000 characters. Compare this to the contents of the English version of Wikipedia at more than 10 gigabytes, which is about a million times greater.
Stone tablet of the computer age, the eighty column punch-card. It's likely that there are boxes of these that are still readable after more than half a century. Punch-cards were once so common that people would make Christmas wreaths from them. (Wikimedia Commons image by Arnold Reinhold.)
While the Zip disks containing decades of my email messages are long gone and never missed, there are some things that people would like to keep for a very long time, such as family photographs. I have photos of my great grandparents that are a century old and still about the same quality as when they were made. Digital photographs made today, although of much higher initial quality and much easier to create, might not survive even a decade without proper care. Aside from the problem of finding the right media player (remember floppy disks?), there's the problem of bit rot.
An eight-inch computer disk, essentially unreadable today because of either the lack of a proper disk drive or decayed data. These usually held about a quarter megabyte of data.
I had many boxes of these in the 1980s, when I had an S-100 bus CP/M computer in my laboratory.
At that time, I did laboratory automation using Forth. I still have fond memories of Forth, although I haven't used it for decades.
(Wikimedia Commons photo by Hannes Grobe/AWI.)
For a time, CDs and DVDs were the preferred archival storage media, but people now save everything on USB flash drives, SD cards, or on one or another "cloud" service. Many people, myself included, reject the idea of cloud storage, since the safety of your data is being relinquished to another party. There are numerous examples in which people have lost data through reliance on cloud storage.
The problem with cloud storage is only with the longevity of the provider, not the longevity of the data. Centralized data centers store archives on magnetic tape, generally on tape cassettes. Such tapes have a very low per bit cost, and they are readable after 15-30 years.[2-3] Their data are usually transcribed onto newer tapes every 5-10 years. Floppy disks and diskettes, which store data by the same magnetic principle, have a shorter lifetime, since rubbing erodes the media.
According to the US National Institute of Standards and Technology (NIST), a DVD will retain your data for less than fifteen years in the worst case, although CD-R media have about double the life expectancy (see graph).[4-7]
Optical media lifetime, as determined by the US National Institute of Standards and Technology (NIST). Recordable CDs are more archival than recordable DVDs, principally because the areal data density of a CD is smaller. Aside from the initial quality of the manufactured CD and DVD, data longevity depends on exposure to heat, humidity, and light. Storage conditions and the handling of the media during use are important factors that affect longevity. (Graph rendered from data in ref. 5 using Gnumeric.[5])
NIST obviously didn't have 45-year-old CDs and DVDs for their study, so how did they get their data? Longevity studies like this are done using accelerated-aging experiments that rely on the parameters of a first-principles model of the system. Any memory material will have an energy barrier between its two data states (logical "0" and logical "1"), as shown in the figure, so an Arrhenius law model can be used.
Material stability modeled as an energy barrier ΔE between two states, I and II.
(Illustration by the author using Inkscape.)
According to an Arrhenius law model, data is corrupted by thermal fluctuations that randomly push a bit from its intended state to its complement. The probability P for this to occur is an exponential function of temperature:
P = 1 - exp(-t/τ(T))
where
τ(T) = (1/f0) exp(ΔE/(kBT))
In these equations, t is the elapsed time, ΔE is the energy barrier, τ is the decay time, kB is the Boltzmann constant, T is the absolute temperature, and f0 is the attempt frequency. If just a single atom can change the data state, the attempt frequency can be estimated as the atomic vibration frequency, and this can be as large as 10^13 Hz.
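As a minimal sketch of how these two equations behave, the Python below evaluates the decay time and the bit-flip probability, and then compares two temperatures in the way an accelerated-aging test would. The 1.4 eV barrier, the 25 °C and 85 °C temperatures, and the function names are illustrative assumptions of mine, not values from the NIST study. The attempt frequency defaults to the 10^13 Hz upper estimate mentioned above.

```python
import math

K_B_EV = 8.617333262e-5      # Boltzmann constant in eV/K
SECONDS_PER_YEAR = 3.156e7   # approximate length of a year in seconds


def decay_time(delta_e_ev, temp_k, f0_hz=1e13):
    """Decay time tau(T) = (1/f0) * exp(dE / (kB * T)), in seconds."""
    return math.exp(delta_e_ev / (K_B_EV * temp_k)) / f0_hz


def flip_probability(t_s, tau_s):
    """Probability P = 1 - exp(-t/tau) that a bit has flipped after time t."""
    return -math.expm1(-t_s / tau_s)


# Hypothetical material: 1.4 eV barrier, stored at 25 C, stress-tested at 85 C.
barrier_ev = 1.4
tau_room = decay_time(barrier_ev, 298.15)
tau_hot = decay_time(barrier_ev, 358.15)

# How much faster the medium ages in the oven than on the shelf.
acceleration = tau_room / tau_hot

print(f"tau at 25 C: {tau_room / SECONDS_PER_YEAR:.3g} years")
print(f"tau at 85 C: {tau_hot / SECONDS_PER_YEAR:.3g} years")
print(f"acceleration factor: {acceleration:.3g}")
p_10y = flip_probability(10 * SECONDS_PER_YEAR, tau_room)
print(f"P(flip) after 10 years at 25 C: {p_10y:.3g}")
```

With these assumed numbers, the acceleration factor is several thousand, which is why a few weeks in an oven can stand in for centuries on a shelf, provided that the single-barrier model actually describes the medium.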
Based on these equations, you have a million year memory with an energy barrier of 63 kBT when error correction codes are included.[8] If you want a billion year memory, you need to increase the barrier just a bit, to 70 kBT, which is 1.8 eV at room temperature.[8] In 2013, an international team of scientists from The Netherlands and Germany proposed a billion year memory based on a pattern of tungsten embedded in silicon nitride (Si3N4). Such a memory could be read by imaging, or by interference of an electron beam (see figure).[8]
The tungsten-silicon nitride billion year memory.
There are two possible architectures: transparent (top), in which the substrate is removed, or one that uses interference effects in electron or photon beams.
The creators of this memory caution that "black swan" events would reduce its billion year lifetime. These include "theft, meteor impact or the sun entering the red giant phase."[8]
(Figs. 3 and 4 of ref. 8, via arXiv.[8] Click for larger image.)
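The barrier heights quoted above for the million and billion year memories can be checked with a few lines of Python. This is a rough sketch that assumes "room temperature" means 300 K and that the attempt frequency is 10^13 Hz. Both values, and the function name, are my assumptions for illustration, and ref. 8 should be consulted for the exact model. The code converts 63 kBT and 70 kBT to electron volts, and it evaluates the decay time τ and the fraction of bits expected to flip over the target lifetime.

```python
import math

K_B_EV = 8.617333262e-5      # Boltzmann constant in eV/K
ROOM_TEMP_K = 300.0          # assumed value for "room temperature"
F0_HZ = 1e13                 # assumed attempt frequency
SECONDS_PER_YEAR = 3.156e7   # approximate length of a year in seconds


def barrier_summary(barrier_in_kbt, target_years):
    """Return (barrier in eV, tau in years, flip probability at target_years)."""
    barrier_ev = barrier_in_kbt * K_B_EV * ROOM_TEMP_K
    tau_years = math.exp(barrier_in_kbt) / F0_HZ / SECONDS_PER_YEAR
    p_flip = -math.expm1(-target_years / tau_years)
    return barrier_ev, tau_years, p_flip


for kbt, years in [(63, 1e6), (70, 1e9)]:
    ev, tau, p = barrier_summary(kbt, years)
    print(f"{kbt} kBT = {ev:.2f} eV, tau = {tau:.2e} years, "
          f"P(flip) after {years:.0e} years = {p:.2f}")
```

Under these assumptions, 70 kBT comes out at about 1.8 eV, and roughly one bit in eight would have flipped by the end of the target lifetime, which is presumably why error correction is part of the estimate.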
There's another method of storing vast quantities of data that's been demonstrated in the past decade; namely, recording the data chemically in DNA. There are about three billion base pairs contained in the 23 chromosome pairs of the human genome, and in 2019 it was reported that the entire 16 GB of the English language Wikipedia had been encoded in DNA. While this is impressive, the longevity of a temperature-sensitive chemical in a test tube is likely far less than that of the billion year W-Si3N4 memory.
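To give a feel for the numbers, here is a toy Python sketch that maps each byte to four DNA bases at two bits per base. The mapping (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration, not the scheme used for the Wikipedia encoding. Practical DNA storage schemes add redundancy and avoid troublesome sequences, so they store fewer bits per base than this idealized code.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; the strand length must be a multiple of four."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


strand = bytes_to_dna(b"Ozymandias")
assert dna_to_bytes(strand) == b"Ozymandias"
print(strand)

# Bases needed for 16 GB at this idealized density, expressed in units of
# the roughly three billion base pairs of the human genome.
bases_needed = 16e9 * 4
genome_lengths = bases_needed / 3e9
print(f"{bases_needed:.1e} bases, about {genome_lengths:.0f} genome lengths")
```

At two bits per base, 16 GB needs about 64 billion bases, which is roughly twenty times the length of the human genome.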
However, DNA is easy to copy, and recent research by a huge international team of scientists from Harvard Medical School (Boston, Massachusetts), the Massachusetts Institute of Technology (Cambridge, Massachusetts), Brandeis University (Waltham, Massachusetts), the Skolkovo Innovation Center (Moscow, Russia), Utrecht University (Utrecht, the Netherlands), and the Tata Institute of Fundamental Research (Bengaluru, India) has increased DNA data permanence by encoding the data in living cells of the microbe Halobacterium salinarum. Their research is posted in a recent paper on bioRxiv.[10-11]
Halobacterium salinarum, an extraordinarily hardy organism, is a halophilic ("salt loving") extremophilic archaeon that's hard to kill.[11] This microbe has, on average, 25 backup copies of each of its chromosomes.[11] It's resistant to thermal extremes, prolonged vacuum, and ionizing radiation, and it can withstand desiccation while being trapped in brine pockets in salt crystals.[11] It has been revived from prolonged stasis in hundred million year old salt deposits.[10-11]
The proof of the pudding is in the Petri dish. This is Halobacterium salinarum in which some DNA sequences have been modified to contain data. (Fig. 7 of ref. 10, licensed under a Creative Commons license.[10])
The research team encoded the digital specification for creation of 3-dimensional figures into the DNA of these microbes, and embedded the cells in crystalline mineral salts.[10] The authors state that such repositories of biological information can be expected to survive for much longer than humans. The average lifespan of mammalian species is about a million years, and estimates of the longevity of Homo sapiens are between 600 and 7.8 million years.[10]
References:
1. Ancient History Sourcebook: The Rosetta Stone: Translation of the Greek Section, Fordham University.
2. John W. C. Van Bogart (National Media Lab), "Mag Tape Life Expectancy 10-30 years," Letter to the editor of the Scientific American, March 13, 1995.
3. S. H. Charap, P. L. Lu, and Y. He, "Thermal stability of recorded information at high densities," IEEE Trans. Magn., vol. 33, no. 1 (January, 1997), pp. 978-983.
4. CD-R and DVD-R RW Longevity Research, US Library of Congress.
5. Final Report: NIST/Library of Congress (LC) Optical Disc Longevity Study, Library of Congress and NIST, September 2007 (414 kB PDF file).
6. How Long Can You Store CDs and DVDs and Use Them Again?, Council on Library and Information Resources.
7. Optical media preservation, Wikipedia.
8. Jeroen de Vries, Dimitri Schellenberg, Leon Abelmann, Andreas Manz and Miko Elwenspoek, "Towards Gigayear Storage Using a Silicon-Nitride/Tungsten Based Medium," arXiv, October 9, 2013.
9. Sang Yup Lee, "DNA Data Storage Is Closer Than You Think," Scientific American, July 1, 2019.
10. J. Davis, A. Bisson-Filho, D. Kadyrov, T. M. De Kort, M. T. Biamonte, M. Thattai, S. Thutupalli, and G. M. Church, "In vivo multi-dimensional information-keeping in Halobacterium salinarum," bioRxiv, February 15, 2020, doi: https://doi.org/10.1101/2020.02.14.949925 .
11. Steve Nadis, "Hardy microbe's DNA could be a time capsule for the ages," Science, vol. 367, no. 6480 (February 18, 2020), p. 840, doi:10.1126/science.abb3588.