Multi-university collaboration will use data science to find the next El Nino

September 11, 2018 By Nolan Lendved

Hurricane Harvey, shown in 2017. A new data project hopes to sniff out weather patterns. Photo: NASA/NOAA

The El Nino and La Nina patterns in the Pacific Ocean are notorious for their long-distance effects on weather as far away as Africa and the Midwestern United States. But climate experts also know of several other such patterns, known as teleconnections, and believe that there are many more to be discovered.

The new TRIPODS+Climate project, a collaboration among the University of Wisconsin–Madison, the University of Chicago, and the University of California, Irvine, will develop novel data science tools to sniff out these hidden patterns, improving weather forecasts and scientific understanding of global climate.

The TRIPODS+Climate project will receive $300,000 over three years, part of $8.5 million in grants that the National Science Foundation announced today, Sept. 11, for 19 interdisciplinary TRIPODS+X proposals at 23 institutions.

The collaboration is an expansion of the NSF’s TRIPODS program, which funded several research centers in 2017 to explore the fundamentals of data science — the modern intersection of mathematics, statistics, and computer science. Stephen Wright, Professor of Computer Sciences at UW–Madison and the Wisconsin Institute for Discovery, and Rebecca Willett, Professor of Statistics and Computer Science at the University of Chicago, lead one of the TRIPODS Institutes. With TRIPODS+Climate, they will work with a team of climate scientists to apply data science methods such as machine learning, network analysis, and predictive modeling to the growing flood of climate data.

“There are fundamental challenges pervasive in data science that are epitomized in the climate science setting, making this collaboration a nice opportunity for advances on a number of fronts,” Willett says. “The question really is, can we find some middle ground that’s going to allow us to harness climate data as fully as possible without ignoring existing physical models of climate?”

While El Nino is the best-known climate teleconnection, scientists have found many similar patterns in the Pacific and Atlantic Oceans. For example, TRIPODS+Climate co-investigators at UC Irvine, led by Efi Foufoula-Georgiou, recently found that sea temperature changes near the coast of New Zealand strongly predict precipitation changes three months later and thousands of miles away in the southwestern United States.
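For readers curious what a lagged relationship of this kind looks like in code, the short Python sketch below may help. It is an illustration only, not the UC Irvine team's method: the data are synthetic, the three-month delay is built into them on purpose, and the names lagged_correlation, sst, and precip are hypothetical. It computes the correlation between one series and a later-shifted copy of another, then reports which delay gives the strongest link.

```python
import numpy as np

def lagged_correlation(predictor, target, lag):
    """Pearson correlation between predictor[t] and target[t + lag]."""
    predictor = np.asarray(predictor, dtype=float)
    target = np.asarray(target, dtype=float)
    if predictor.shape != target.shape:
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(predictor) - 1:
        raise ValueError("lag must be non-negative and shorter than the series")
    x = predictor[:len(predictor) - lag]  # earlier values of the predictor
    y = target[lag:]                      # values of the target `lag` steps later
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic monthly anomalies (assumption: invented data, not observations).
rng = np.random.default_rng(0)
n_months = 240
sst = rng.standard_normal(n_months)          # stand-in for sea temperature
precip = 0.5 * rng.standard_normal(n_months) # stand-in for precipitation noise
precip[3:] += 0.8 * sst[:-3]                 # precipitation follows sst by 3 months

# Scan delays of 0 to 6 months and keep the strongest one.
correlations = {lag: lagged_correlation(sst, precip, lag) for lag in range(7)}
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
print("strongest lag (months):", best_lag)
```

On real climate records a scan like this is only a starting point, since neighboring months resemble one another and testing many delays and locations raises the odds of a chance match.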

But despite an unprecedented increase in the volume and resolution of climate observations, these phenomena are difficult to detect in the data. Researchers working with high-dimensional and noisy data must spot complex relationships across geography and time while ruling out spurious correlations and other false positives. Enter data science.

“Interrogating observations and climate model outputs to discover, characterize and understand climate modes of variability and change is fundamental for improving seasonal to sub-seasonal forecasts,” says Foufoula-Georgiou. “However, the large internal variability of the climate system, non-stationarities and space-time dependencies make it hard to discern causal predictive relationships.”

TRIPODS+Climate will create new methods in machine learning and network estimation that reveal the structure of the Earth’s climate system and its regional hydroclimatic impacts. Machine learning, in which statistical algorithms use large datasets to detect patterns and make predictions, can be used to find teleconnections previously hidden from human observation. Network estimation methods can mathematically conceptualize global climate as an interconnected structure of nodes, so that scientists can better quantify and understand complex influences across geography and time.
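The idea of treating climate as a network of nodes can be sketched in a few lines of Python. This is a deliberately simple stand-in, assuming that each node is a region with its own time series and that two nodes are linked when their correlation exceeds a chosen threshold. The project's actual estimation methods are not described here and are likely more sophisticated. The function name correlation_network, the threshold of 0.5, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def correlation_network(series, threshold):
    """Link two nodes when the absolute correlation of their series
    meets the threshold.

    series: array of shape (n_times, n_nodes), one column per node.
    Returns a boolean adjacency matrix of shape (n_nodes, n_nodes).
    """
    corr = np.corrcoef(series, rowvar=False)  # node-by-node correlations
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)        # no self-links
    return adjacency

# Synthetic example (assumption: invented data). Nodes 0 and 1 share one
# hidden driver, nodes 2 and 3 share another, and node 4 is pure noise.
rng = np.random.default_rng(1)
n_times, n_nodes = 500, 5
driver_a = rng.standard_normal(n_times)
driver_b = rng.standard_normal(n_times)
series = 0.5 * rng.standard_normal((n_times, n_nodes))
series[:, 0] += driver_a
series[:, 1] += driver_a
series[:, 2] += driver_b
series[:, 3] += driver_b

adjacency = correlation_network(series, threshold=0.5)
edges = [(i, j) for i in range(n_nodes)
         for j in range(i + 1, n_nodes) if adjacency[i, j]]
print("estimated links:", edges)
```

A fixed threshold is the crudest possible choice. With thousands of noisy, interdependent grid points, as in real climate data, more careful statistical estimation is needed to avoid spurious links.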

“Data science techniques are especially useful for sifting through massive troves of data to discover unexpected relationships between events,” Wright says. “We have seen examples of this phenomenon in the relationships between genetics, environment, and disease. Climate science is an area in which very large collections of data are ready and waiting to be analyzed.”

These tools will then be used to build new computational climate models and create new platforms for climate diagnostics and prognostics, improving seasonal and sub-seasonal forecasts. More accurate predictions will help scientists and policymakers understand and prepare for climate change, extreme weather events, and water allocation under conditions of high or low precipitation.

Like the other TRIPODS+X programs announced today by the NSF, TRIPODS+Climate will also strengthen the broader data science community by training students and postdoctoral researchers at the interface of data and climate science.

“This project will help spread the influence of modern data science through the climate community, and put young data science researchers in touch with a critical area of research that is a rich source of data analysis problems,” Willett says.