Have you ever wondered about the software used in big science facilities? A professional science facility brings together incredibly sophisticated machinery with equally complex software, which is used to do things like drive motors, control robots, and position and run experimental detectors. We also use software to process and store the terabytes of data created by daily science experiments.
In this article you’ll get an inside look at the software infrastructure used at Diamond Light Source (Diamond), which is the UK’s national synchrotron. I’ll take you through the process of setting up a new experiment, running it, and storing the data for analysis, then I’ll introduce each component of Diamond’s Java- and Python-based stack. I think you’ll find it interesting, and possibly learn about technologies that could be useful to your business. I am excited to show you how science labs are using familiar technologies in new ways.
Laboratory science isn’t what it used to be
It’s likely that you’ve heard of at least one large, famous science facility investigating cutting-edge physics or nuclear fusion, such as Fermilab, CERN, NIF, or ITER. Unless you’re a real science geek, it’s less likely that you know about synchrotrons and neutron sources. These are facilities designed to discover thousands of smaller but extremely useful facts each year, like the structure of proteins used in medicine, or of fan blades used for a jet engine.
Neutron sources and synchrotrons are usually large facilities with a huge physical footprint. They cost many hundreds of millions of dollars to construct. Just a single detector might require more than a million dollars to purchase, install, and maintain, and there are typically scores of detectors in a major facility. If you live near a synchrotron, try to visit on an open day. You will understand the scale of what I’m describing when you see the electron gun, the storage ring, and possibly a detector or two.
Figure 1. Electron storage ring at Diamond Light Source
Big data in modern science
In the early days of x-ray and neutron research, chemical reactions captured the images that scientists used to understand their samples. It was common for experimenters to use photographic plates. One example is Rosalind Franklin’s work; she was responsible for the iconic diffraction image Photo 51, taken from a sample of DNA in 1952. Now, experimentalists work on similar problems using electronic detectors that are capable of counting photons directly and storing data to disk.
In Franklin’s day, data was relatively minimal and stored physically. These days, experimental data is produced in terabytes and stored digitally. In a synchrotron, the process starts with the machines creating the high-energy light (or more properly electromagnetic radiation) required for experimentation. The latest generation of synchrotron facilities can produce a high flux of photons, which means more photons in each square millimeter of beam. As the machines evolve, they are capable of producing more and more data. The sheer volume of data enables a wider range of experiments and new experimental possibilities. That data also requires increasingly sophisticated software to process and evaluate.
Detectors are able to detect more, and faster, than ever before. At Diamond, we use Pilatus and Eiger detectors (first developed at the Swiss Light Source), which are able to make multi-megapixel images at a rate of hundreds per second. PERCIVAL is another type of detector that is being developed for use in science facilities.
During a run, an experimentation machine or system is on duty around the clock every day but one, which is reserved for maintenance; at Diamond this is normally a Thursday. So we are talking about petascale data (2^50 bytes), which is about a thousand times smaller than the famous exascale. Compared to some experimental physics, however, the data is rich in content. Modern science experiments usually require that all of the raw data produced is stored.
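To get a feel for how quickly detector data adds up, here is a back-of-the-envelope sketch in Python. The image size, bit depth, and frame rate are assumptions chosen for illustration only; they are not the specifications of any detector at Diamond.

```python
# Illustrative figures only: assumed values, not taken from any detector datasheet.
PIXELS_PER_IMAGE = 4_000_000   # an assumed "multi-megapixel" image
BYTES_PER_PIXEL = 4            # assumed 32-bit counter per pixel
FRAMES_PER_SECOND = 100        # assumed lower end of "hundreds per second"

PETASCALE = 2 ** 50            # bytes
EXASCALE = 2 ** 60             # bytes, about a thousand times larger


def bytes_per_second(pixels, bytes_per_pixel, fps):
    """Uncompressed data rate of one detector running continuously."""
    return pixels * bytes_per_pixel * fps


def seconds_to_fill(capacity_bytes, rate_bytes_per_second):
    """Time needed to produce `capacity_bytes` at a constant rate."""
    return capacity_bytes / rate_bytes_per_second


rate = bytes_per_second(PIXELS_PER_IMAGE, BYTES_PER_PIXEL, FRAMES_PER_SECOND)
days_to_petascale = seconds_to_fill(PETASCALE, rate) / 86_400

print(f"Uncompressed rate: {rate / 1e9:.1f} GB/s")
print(f"Days of continuous acquisition to reach 2^50 bytes: {days_to_petascale:.1f}")
print(f"Exascale / petascale ratio: {EXASCALE // PETASCALE}")
```

Under these assumptions a single detector would reach the petascale in a little over a week of non-stop acquisition. Real acquisition is bursty and data is often compressed, so treat this as an order-of-magnitude estimate rather than a measurement.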
In summary, science data today is created at massively high volumes, and that is increasing. Storage requirements are also large, growing, and potentially long-term. A software stack for cutting-edge science must be able to process and store massive volumes of data at rapidly (almost exponentially) increasing scale.
Working with machines
At synchrotrons, we harness the power of electrons to produce super bright light (10 billion times higher flux than the sun) which is channeled into laboratories known as beamlines. Scientists use the different varieties of light in the beamlines to study anything from fossils, jet engines, and viruses to new vaccines.
The machine circumference is more than half a kilometer, so we have to move samples through the beam rather than trying to move the beam around samples. In addition, a researcher cannot stand in the experimental hutch and move the sample. To do so would be much less accurate and efficient than an automated system. More importantly, the light consists of high-energy x-rays, which can be extremely hazardous to health.
We use motor-controlled stages and accurate rotating devices called goniometers to move samples. Robotic arms fetch the samples from storage devices called dewars and carousels and place them in the beam.
Figure 2. An example of beamline equipment
In my previous JavaWorld feature I talked about how Diamond’s software team migrated our legacy Java server to OSGi. I explained some of the technical challenges involved in the migration, and also how our team adapted to meet those challenges. While I discussed a few technologies in detail, I didn’t introduce our full software stack.
A massive-scale science facility depends on many coordinated components. Once a proposal is accepted, the science team submits samples using a web interface. During the experiment, we use software to run the detector and correctly expose it in coordination with robots and motors. When the experiment is complete, we write the data to disk. Finally, we run automatic analysis of the data on a computer cluster.
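The sequence described above can be sketched as a simple ordered set of stages. This is only an illustration of the order of events; the class, stage names, and functions are hypothetical and do not correspond to any Diamond API.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stage names; the order follows the description above."""
    PROPOSAL_ACCEPTED = "proposal accepted"
    SAMPLES_SUBMITTED = "samples submitted through the web interface"
    ACQUISITION = "detector exposed in coordination with robots and motors"
    DATA_WRITTEN = "data written to disk"
    AUTO_ANALYSIS = "automatic analysis run on a computer cluster"


def next_stage(current):
    """Return the stage that follows `current`, or None after the last one."""
    stages = list(Stage)  # Enum members iterate in definition order
    index = stages.index(current)
    return stages[index + 1] if index + 1 < len(stages) else None


def run_lifecycle(start=Stage.PROPOSAL_ACCEPTED):
    """Walk the stages in order from `start` and return those visited."""
    visited = []
    stage = start
    while stage is not None:
        visited.append(stage)
        stage = next_stage(stage)
    return visited


for stage in run_lifecycle():
    print(f"{stage.name}: {stage.value}")
```

In a real facility each stage is handled by a different system, so the hand-offs between them matter at least as much as the stages themselves.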
In the next sections I will introduce a full stack used for scientific experimentation. This one is specific to Diamond Light Source, but other facilities have solved the problem in similar ways.
The web interface
We use a fairly conventional Java web server interface to schedule and set up experiments. The server is based on Tomcat with an Oracle database and Spring and Hibernate on the server side. We presently code and maintain the client using GWT (Google Web Toolkit); however, web front-ends seem to evolve rapidly, so that may change.
Scientists use our web interface to submit proposals to use the machine, provide experimental data, and arrange to send samples. Once this process is complete, the experiment can begin.
The laboratory environment
Users can come to the synchrotron or use it remotely. For some experiments or beamlines, an increasing amount of experimental time is remote. For many users, remote use has already become the normal way to use the beamline. Whether you’re using the beamline from a local control cabin or a remote desktop, you get the same environment to run your experiment.
At the time of this writing, the laboratory environment is built on top of RHEL6, with a thick client based on Eclipse RCP. (We use an e4-based platform.) Scientists use the front-end interface to move motors and view output. The interface is designed to speak the language of the experimenter, to allow them to define the experiment easily.
Working backward from the front-end, there is an acquisition server and middleware layer, and a hardware control layer.
Figure 3 is a diagram of the system, going from the hardware toward the front-end, which the user sees. I’ll introduce each layer separately.
Figure 3. Full stack software diagram