Batch and Grid Services at CERN
The CERN Batch Service: its role in fulfilling physicists' computing needs, how it works inside the worldwide computing grid, and the challenge of processing the massive amounts of LHC data.
1 The CERN Batch System
Physicists need to analyse the data coming from the LHC experiments – or from detectors dedicated to other scientific projects. They do so with computers, but not just the ones sitting on their desks. Not normally, anyway.
What they usually do is use a batch system: a centrally managed cluster of CPU servers designed to run computing jobs without user intervention. They write their analysis programme, submit it to the batch system and get the results back when they are ready. Such jobs typically involve intensive computation that can sometimes go on for weeks.
The advantage of using such shared computing resources is that many jobs can run at the same time, accessing experiment data from common storage systems. At the time of writing, the CERN Batch System offers 4000 CPU servers, many of which boast several CPU cores, large amounts of memory and fast disks. Together, they run a whopping 400 000 jobs every day.
A batch system is typically managed by a master server, which deals out jobs to suitable CPU servers according to users' priorities and needs. This master server also gives users a single point of contact for information about their jobs, so they never need to know about, or talk directly to, the CPU servers running them.
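To make that dispatching idea concrete, here is a toy Python sketch of a master handing out queued jobs to CPU servers in priority order. It has nothing to do with how LSF actually schedules; the job names, server names and policy are all invented for the example.

    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Job:
        priority: int                         # lower value = dispatched first
        name: str = field(compare=False)
        cores_needed: int = field(compare=False)

    def dispatch(pending, free_cores):
        """Hand out pending jobs to CPU servers that have enough free cores."""
        heapq.heapify(pending)
        assignments = []
        while pending:
            job = heapq.heappop(pending)
            server = next((s for s, free in free_cores.items()
                           if free >= job.cores_needed), None)
            if server is None:                # nothing suitable right now,
                heapq.heappush(pending, job)  # so leave the job in the queue
                break
            free_cores[server] -= job.cores_needed
            assignments.append((job.name, server))
        return assignments

    # Toy run: two CPU servers, three queued jobs of differing priority.
    queue = [Job(2, "dstar-analysis", 1), Job(1, "higgs-fit", 4), Job(3, "mc-generation", 2)]
    print(dispatch(queue, {"batch001": 4, "batch002": 2}))

A real batch system does this matching continuously, against thousands of servers and with far richer policies (fair shares, memory and disk requirements, and so on).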
The batch system solution we use at CERN is an IBM product called Platform LSF. But in all fairness, we've started looking at other options to address some of its shortcomings. Looking at the demanding requirements the experiments will have in the future, we found that we'll need to scale from 4000 CPU servers to several tens of thousands. We'll also need more flexibility to supply various types of computing resources in a snappy and resilient cluster. These are all requirements where we're afraid LSF may not quite cut the mustard.
So we started looking at alternatives and, having heard many good things about HTCondor, we ended up concentrating on this solution. A distributed batch system, HTCondor looks promising in terms of scalability, resilience and flexibility. The fact that it has been under active development since 1988 is also reassuring as far as maturity goes. And it's free. What we didn't quite expect until we started working with it, though, is how unbelievably enthusiastic the developer community is. That alone is a good reason for switching over. We're currently in the process of inviting early adopters in the LHC experiments to help us put a new batch service together.
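For a flavour of what working with HTCondor looks like, here is a small sketch using its Python bindings. The submission API differs between HTCondor versions, and the script and file names below are just placeholders rather than an actual CERN workflow.

    import htcondor  # HTCondor's Python bindings

    # Describe the job with the same key/value pairs as a condor_submit file.
    sub = htcondor.Submit({
        "executable":   "run_analysis.sh",    # placeholder user script
        "arguments":    "dataset_2012.root",  # placeholder input file
        "output":       "job.out",
        "error":        "job.err",
        "log":          "job.log",
        "request_cpus": "1",
    })

    schedd = htcondor.Schedd()                # talk to the scheduler daemon
    with schedd.transaction() as txn:         # older-style submission API
        cluster_id = sub.queue(txn)

    # Ask the scheduler, not the worker node, how the job is doing.
    # JobStatus is an integer code (1 = idle, 2 = running, 4 = completed, ...).
    for ad in schedd.query("ClusterId == %d" % cluster_id, ["ProcId", "JobStatus"]):
        print(ad["ProcId"], ad["JobStatus"])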
2 The CERN Batch System as Part of the Grid
To keep up with the insane amount of data collected by the LHC experiments, at least 100 000 analysis jobs have to be running at all times, and we can't quite achieve that at CERN alone. So how about using computing capacity from other institutes interested in such physics?
That's the idea behind the Worldwide LHC Computing Grid – WLCG, for short. Unlike the World Wide Web, the unique network of pages everyone knows – also invented at CERN, by the way – there is no such thing as The Grid. There are many grids, for many purposes, and the one I'm on about here has been set up to process data from the LHC and related scientific projects.
The WLCG is a collaboration of about 150 institutes worldwide that have decided to share their resources to foster scientific research in the projects they've been funded for, the LHC experiments being some of these. These institutes are organised in layers called tiers: Tier-0 is CERN, there are 13 Tier-1s which have to store a complete LHC data set, and the rest are Tier-2s and Tier-3s. Together, they provide enough capacity in practice to run 200 000 to 300 000 LHC analysis jobs.
Each institute chooses which of its resources to provide: not only computing power in terms of CPUs, but also storage and services. One important idea behind a grid is to make these resources available in a transparent way. There's the analogy of the power grid, where getting electricity should be as simple as sticking a plug into a socket. With a computing grid, you want to be able to send your data up the network and have it processed and stored for you somewhere.
This requires that institutes use a homogeneous interface, a software layer called middleware. This software manages authentication and authorisation, information systems, data access, job management, monitoring and accounting. It provides the services that each institute needs to run to act as a WLCG partner.
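To give a feel for what a homogeneous interface means in practice, here is a deliberately simplified Python sketch. The class and method names are invented for illustration and do not correspond to any real middleware API; in reality each of these functions is a separate, distributed service.

    class GridSite:
        """Hypothetical view of the services a WLCG site exposes."""

        def __init__(self, name, tier):
            self.name = name    # e.g. "CERN"
            self.tier = tier    # 0, 1, 2 or 3

        def authenticate(self, user_certificate):
            """Check the user's grid certificate and experiment membership."""
            raise NotImplementedError

        def submit_job(self, job_description):
            """Pass a job to the site's local batch system and return a job ID."""
            raise NotImplementedError

        def store_file(self, logical_name, data):
            """Write a file into the site's storage element."""
            raise NotImplementedError

        def report_usage(self):
            """Return accounting records, e.g. CPU time delivered per experiment."""
            raise NotImplementedError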
3 Monitoring and Accounting
Monitoring is essential when offering a reliable service to a user community, even more so when the service provides thousands of servers' worth of computing capacity to thousands of users worldwide. We've tried several monitoring solutions over the years and we're currently settling on the Elasticsearch/Kibana duet. Before we had this cutting-edge technology, we used to visualise data with e.g. the no less superb matplotlib package, with which I drew a number of old batch plots.
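As an illustration of how such data can be pulled out and turned into a plot, here is a hedged sketch. The Elasticsearch host, index and field names are invented for the example, and the exact client calls vary between Elasticsearch versions.

    from elasticsearch import Elasticsearch   # official Python client
    import matplotlib.pyplot as plt

    # Hypothetical monitoring cluster and index name.
    es = Elasticsearch(["http://es-monitoring.example.ch:9200"])

    # Count finished jobs per day with a date-histogram aggregation.
    resp = es.search(index="batch-jobs", body={
        "size": 0,
        "aggs": {
            "jobs_per_day": {
                "date_histogram": {"field": "end_time", "interval": "day"}
            }
        },
    })

    buckets = resp["aggregations"]["jobs_per_day"]["buckets"]
    days = [b["key_as_string"][:10] for b in buckets]
    counts = [b["doc_count"] for b in buckets]

    # The kind of plot we used to produce directly with matplotlib.
    plt.bar(range(len(days)), counts)
    plt.xticks(range(len(days)), days, rotation=45, ha="right")
    plt.ylabel("Completed jobs per day")
    plt.tight_layout()
    plt.savefig("jobs_per_day.png")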
Not completely unrelated to monitoring in the nature of its data, accounting is the art of collecting and reporting capacity and usage information. For each job run, we record and publish usage summaries to help the experiments understand whether they have been able to use computing capacity according to plan. The CERN Batch Accounting system underwent a major overhaul a few years back to meet new reporting standards.
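As a minimal sketch of what such a summary can look like, the snippet below rolls up per-experiment CPU time from individual job records. The record fields and figures are invented for the example and are not our actual accounting schema.

    from collections import defaultdict

    # Invented job records; in reality these come from the batch system's logs.
    job_records = [
        {"experiment": "ATLAS", "user": "alice", "cpu_seconds": 7200},
        {"experiment": "CMS",   "user": "bob",   "cpu_seconds": 3600},
        {"experiment": "ATLAS", "user": "carol", "cpu_seconds": 86400},
    ]

    def summarise(records):
        """Aggregate delivered CPU time, in hours, per experiment."""
        totals = defaultdict(float)
        for record in records:
            totals[record["experiment"]] += record["cpu_seconds"] / 3600.0
        return dict(totals)

    for experiment, hours in sorted(summarise(job_records).items()):
        print("%s: %.1f CPU hours" % (experiment, hours))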