Tikalon Header

Scientist Programmers

October 27, 2010

Several years ago, one of my colleagues was involved in a scientific modeling exercise. He located the data he needed, but they were embedded in huge datafiles supplied by a government agency. These datafiles contained the desired data, and also 99% useless stuff - at least useless to the task at hand. He needed to filter the dataset to extract the 1% he needed. Unfortunately, the only programming skill he had was in writing Excel macros. As you can imagine, his filter ran a little slow. His solution was to rent a dozen PCs and install them in a laboratory so they could work in parallel on separate pieces of the dataset. This would have been easier if cloud computing had been available then, but he did accomplish his data filtering task using this brute-force approach. Would his code be useful to anyone else? If you polled a bunch of programmers, they would laugh you out of the lunchroom. If you polled a group of scientists who worked with similar large datasets, some would say, "yes."
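
For illustration only, here is a minimal Python sketch of this kind of filtering task. It is not my colleague's code. The column names, the sample data and the filter_rows function are all hypothetical, and it assumes the datafile is comma-separated text with a header row. The point is that reading one row at a time keeps memory use small no matter how large the file is.

```python
import csv
import io
from typing import Callable, Iterable, Iterator


def filter_rows(lines: Iterable[str], keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the rows for which keep(row) is true, one row at a time."""
    reader = csv.DictReader(lines)  # uses the first line as column names
    for row in reader:
        if keep(row):
            yield row


# Hypothetical sample standing in for a huge agency datafile.
sample = io.StringIO(
    "station,year,value\n"
    "A1,1999,3.2\n"
    "B7,2001,4.8\n"
    "A1,2002,3.9\n"
)

# Keep only the rows for the one station of interest.
wanted = list(filter_rows(sample, lambda row: row["station"] == "A1"))
print(wanted)
```

With a real file, the StringIO object would be replaced by an open file handle, and the selected rows could be written out as they are found instead of being collected in a list.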

This is the point that Nick Barnes, a software engineer who works with scientists, makes in a current article in Nature.[1] Barnes is not only a professional programmer, but he's also director of the Climate Code Foundation, a non-profit organization founded just this summer with several goals. One of these is to promote the public understanding of climate science, but another is to form a framework wherein climate modeling code can be shared. The primary sentiment of the article is best summarized by its subtitle:
"Freely provided working code - whatever its quality - improves programming and enables others to engage with your research..."

Most scientists write software, but they never think of publishing it. This is because it's usually written as a hack, with very few comments, non-descriptive variable names, non-optimized code, and a primitive user interface with no error-trapping. One of my early programs would bomb if you included quotation marks in a text box. Sure, I could have fixed this, but I just remembered not to type quotation marks. There was also the automation program that didn't correctly transition through midnight. Yes, we're dedicated scientists, but we aren't going to be in the lab at midnight, so no problem there. I knew how to use these programs, but would it be fair to foist the code on others?

First, Barnes assures scientists that commercial software, written by professional programmers, is not as good as you might think. You need only a few BSODs or Guru Meditations to remind you of this. He also points out that the code scientists write is good enough to do whatever task it was designed for. As Voltaire said, "The better is the enemy of the good."[2]

Windows nostalgia - A Blue Screen of Death.

One reason that Barnes started the Climate Code Foundation and wrote the Nature article was "Climategate," the public reaction to the publication of climate-related e-mail messages from the University of East Anglia's Climatic Research Unit (Norwich, UK). An official inquiry into supposed irregularities of scientific practice concluded that scientists had done nothing improper. However, it also concluded that the computer codes upon which scientific predictions of climate change were being made needed to be published. Barnes' interest in this issue predated Climategate and involved his work with global temperature code that NASA had released to the Internet in 2007. This code was messy, and it stood in the way of the scientific gold standard; namely, reproducibility. Barnes rewrote the software and made it accessible to a larger number of people.

If NASA hadn't released its code, this happy outcome would not have been possible. Barnes suggests that the same can happen for all sorts of scientific code. As Eric Raymond wrote, "Given enough eyeballs, all bugs are shallow." Of course, it would help if scientists learned a few programming basics. Barnes lists the following as his top five. I somehow learned to use all these when I program, and they really help.
Source code management
Defect tracking
Literate programming
Unit testing
Evolutionary development
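
As an illustration of unit testing, here is a minimal Python sketch of a test for the kind of midnight-transition bug described above. It is not the original automation program. The function names are hypothetical, and it assumes the program needs the elapsed time between two clock readings taken less than 24 hours apart.

```python
import unittest
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60


def seconds_of_day(t: time) -> int:
    """Convert a clock time to whole seconds since midnight."""
    return t.hour * 3600 + t.minute * 60 + t.second


def elapsed_seconds(start: time, end: time) -> int:
    """Seconds from start to end, assuming end is less than 24 hours after start."""
    # The modulo wraps a negative difference, which is what happens
    # when the clock passes through midnight between the two readings.
    return (seconds_of_day(end) - seconds_of_day(start)) % SECONDS_PER_DAY


class ElapsedSecondsTest(unittest.TestCase):
    def test_same_day(self):
        self.assertEqual(elapsed_seconds(time(9, 0, 0), time(9, 30, 0)), 1800)

    def test_across_midnight(self):
        self.assertEqual(elapsed_seconds(time(23, 59, 50), time(0, 0, 10)), 20)

    def test_identical_times(self):
        self.assertEqual(elapsed_seconds(time(12, 0, 0), time(12, 0, 0)), 0)


# Run the tests directly, without relying on command-line arguments.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ElapsedSecondsTest)
result = unittest.TextTestRunner().run(suite)
```

A test like test_across_midnight documents the edge case once, so nobody has to be in the lab at midnight to find out whether it works.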

Now for "Confessions of a Scientist Programmer." I actually published some code in 1980 to describe the stability of magnetic bubbles as a function of material properties.[3] The definition of code publication has changed over the years. My code was published as a microfilm of the source code. You can still read the description of the program in the journal, but it might be a chore trying to get a copy of the microfilm. I don't think I still have a copy myself.

References:

  1. Nick Barnes, "Publish your computer code: it is good enough," Nature, vol. 467, October 13, 2010, p. 753.
  2. "Le mieux est l'ennemi du bien," from Voltaire's La Bégueule.
  3. D.M. Gualtieri, "Computer Program Description: SPECS," IEEE Trans. Magnetics, vol. 16, no. 6, November, 1980, pp. 1440-1441.