As global, broad-based climate change projections have become more useful, effectively managing the vast volumes of data that accompany them has become a major challenge for the computational scientists who support the projections. Understanding and predicting climate change and extreme weather events requires advanced tools to securely store, manage, access, analyze, visualize, and process enormous, distributed data sets.
This “big data” challenge is being met with the Earth System Grid Federation (ESGF), an international collaboration formerly led by Lawrence Livermore National Laboratory (LLNL) with a primary goal of facilitating advancements in Earth system science. Designed and maintained by dozens of American, European, Asian, and Australian research institutions, ESGF now powers most global climate change research, notably including assessments by the United Nations’ Intergovernmental Panel on Climate Change.
ESGF offers an immense climate database that standardizes and organizes observational and simulation data from 23 countries, allowing scientists to compare models against actual observations and reanalysis. It is the largest collaborative data platform for Earth system science. Key capabilities include:
- National and international network infrastructure integrating the world’s climate model and measurement archives;
- Shared resources across multiple centers for high performance computing and storage of tens of petabytes of transportable data;
- Easy-to-use, secure, federated web-based application programming interfaces and data infrastructure;
- Flexible infrastructure allowing participants to customize parameters;
- High performance search, analysis, and visualization tools;
- Access to a broad set of data and tools for comparative and exploratory analysis; and
- Virtual collaborative environment for analysis tasks demanding large, varied datasets.
Virtually all climate science researchers worldwide use ESGF to discover, access, and compute on data. In fact, many of today’s most recognized climate projects employ the software and services developed by the ESGF team and its community. ESGF is designed to remain robust even as data volumes continue to grow exponentially.
Currently, 40,000 users, including scientists and policymakers, from 2,700 sites on six continents are sharing data through ESGF. More than 5 petabytes of data have been downloaded by the climate community through ESGF, making it one of the most complex, successful big data systems in existence. The federation will continue to expand access to relevant data, integrated with analysis and visualization tools and supported by the hardware and network capabilities needed to interpret peta- and exascale scientific data.
While ESGF is a multi-national effort, the Department of Energy's contribution to ESGF on behalf of the United States—known as ESGF2—is a collaboration among Oak Ridge (ORNL), Lawrence Livermore, and Argonne (ANL) national laboratories. ESGF2 provides storage and computational services to the ESGF user community, and ESGF2's leadership, currently ORNL, co-chairs the international ESGF collaboration.
Node-Based Architecture
ESGF combines grid-based computing with a distributed architecture, keeping participating members sovereign while simultaneously linking them together. To achieve this, ESGF developers created a unique system of nodes that requires very little explicit coordination while still providing a robust “data space” for storage and computation. Teams work in highly distributed research environments, using unique scientific instruments, exascale-class computers, and extreme amounts of data.
Users can access ESGF data using Web browsers, scripts, and client applications. A key to ESGF’s success is its ability to effectively produce, validate, and analyze research results collaboratively, so that, for example, new results generated by one team member are immediately accessible to the rest of the team, who can annotate, comment on, and otherwise interact with those results.
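As a rough illustration of scripted access, the sketch below only builds a search URL and sends no request. The host is a hypothetical placeholder. The endpoint path, the parameters (`type`, `format`, `limit`, `offset`), and the facet names (`project`, `variable_id`, `experiment_id`) are assumptions modeled on how ESGF search services are commonly described, and should be checked against the documentation of the node actually used. The helper `build_search_url` is likewise illustrative and not part of any ESGF client.

```python
from urllib.parse import urlencode

# Assumption: placeholder host standing in for a real ESGF index node.
SEARCH_ENDPOINT = "https://esgf-node.example.org/esg-search/search"


def build_search_url(endpoint, facets, limit=10, offset=0):
    """Return a search URL for the given facet constraints (hypothetical helper)."""
    params = {
        "type": "Dataset",                  # assumed: search datasets, not files
        "format": "application/solr+json",  # assumed: JSON response format
        "limit": limit,                     # page size
        "offset": offset,                   # index of the first result
    }
    params.update(facets)
    # Sort the parameters so the same query always yields the same URL.
    return endpoint + "?" + urlencode(sorted(params.items()))


url = build_search_url(
    SEARCH_ENDPOINT,
    {"project": "CMIP6", "variable_id": "tas", "experiment_id": "historical"},
    limit=5,
)
print(url)
```

A script could pass the resulting URL to any HTTP client and page through results by increasing `offset`.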
The ESGF architecture is based on a dynamic system of nodes—independently administered yet united by common protocols and interfaces—that interact on an equal basis and offer a broad range of user and data services, depending on how each is set up. Data are published, stored, and served from dozens of nodes around the globe, yet they are searchable and accessible as if they were stored in a single global archive. Metadata shared among projects help fully integrate the repository of data and components for usability and interoperability. ESGF also promotes standard conventions for data transformation, quality control, and data validation across processes and projects.
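The idea of independently administered nodes that still look like one archive can be sketched with a toy in-memory model. This is not ESGF's implementation: the `Node` class, the `federated_search` function, the `served_by` field, and the sample records are all hypothetical. The sketch only shows how a shared search interface lets one query fan out across peers and return a merged result.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical, independently administered node with its own catalog."""
    name: str
    catalog: list = field(default_factory=list)  # list of metadata dicts

    def search(self, **constraints):
        """Common interface: return local records matching every constraint."""
        return [
            record for record in self.catalog
            if all(record.get(k) == v for k, v in constraints.items())
        ]


def federated_search(nodes, **constraints):
    """Send the same query to every node and merge the results by dataset id."""
    merged = {}
    for node in nodes:
        for record in node.search(**constraints):
            # One entry per dataset id, listing every node that serves it.
            entry = merged.setdefault(record["id"], {**record, "served_by": []})
            entry["served_by"].append(node.name)
    return sorted(merged.values(), key=lambda r: r["id"])


# Two peers with overlapping holdings (made-up sample metadata).
nodes = [
    Node("node-a", [{"id": "ds1", "variable": "tas"},
                    {"id": "ds2", "variable": "pr"}]),
    Node("node-b", [{"id": "ds1", "variable": "tas"},
                    {"id": "ds3", "variable": "tas"}]),
]
results = federated_search(nodes, variable="tas")
for r in results:
    print(r["id"], r["served_by"])
```

Because each node only has to implement the common `search` interface, nodes can be added or removed without changing the caller, which is the property the node-based design relies on.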
Related Projects
The ESGF website contains extensive documentation, developer tools, implementation wikis, user tutorials, the most recent and past ESGF Annual Face-to-Face Conference Reports, and the latest news.
LLNL’s Analytics and Informatics Management Systems group manages ESGF software jointly with collaborators at ORNL and the Centre for Environmental Data Analysis in the United Kingdom. Other climate projects in the Laboratory’s portfolio include: