Social media for social good: Researchers estimate air pollution from online posts

November 17, 2014

Photo: Woman in China wearing breathing mask

For 30 days, the UW–Madison team monitored Sina Weibo posts from 108 Chinese cities to see how often people complained about the air.

Photo: Nicolò Lazzati/Global Panorama

University of Wisconsin–Madison computer science researchers have developed a method for using social media posts to estimate air pollution levels with significant accuracy.

Graduate students Shike Mei, Han Li and Jing Fan analyzed Sina Weibo — a Twitter-like site that is China’s most popular social media outlet — to uncover real-time information about air pollution levels in Chinese cities. Though the approach cannot forecast future air quality, it can provide accurate, real-time information on the Air Quality Index (AQI).

For 30 days, the team monitored Weibo posts from 108 cities to see how often people complained about the air. The group analyzed the text of the posts, as well as correlations across space and time among cities and days, since pollution flare-ups typically cover large areas and can last for days.

Between 350,000 and 500,000 Chinese citizens die prematurely each year because of air pollution, according to the medical journal The Lancet. Even as smoking rates decrease, lung cancer is on the rise. Yet, while large Chinese cities have physical monitoring stations to gauge air pollution levels, smaller cities generally do not, because of the expense of establishing and maintaining them.

For Mei, the project is more than just an intellectual exercise. In the area of central China where he grew up, there is just one air quality monitoring station for an area where 60 million people live, he says.

“Anhui province, where I was born, is not very wealthy,” Mei says. “There’s not enough information about pollution, and sometimes people suffer from heavier air pollution. We wondered, ‘How can we use a new information source to help people understand [the severity of] the pollution around?’”

The group’s mathematical models did not use preselected keywords to analyze the text of Weibo posts. Rather, they developed a machine learning model to assign different weights to different words used in the posts. The team’s approach to using publicly available data could be applied to a broad range of issues, says computer sciences professor Jerry Zhu, who is working closely with the students, along with computer sciences professor Chuck Dyer.
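For readers curious what learning word weights can look like in practice, here is a minimal sketch in Python. It is not the team's model: it assumes a plain bag-of-words linear regression trained by gradient descent, and the posts, the AQI values and the function names (`fit_word_weights`, `predict`) are all invented for illustration.

```python
from collections import Counter


def tokenize(post):
    """Split a post into lowercase words (a deliberately crude tokenizer)."""
    return post.lower().split()


def fit_word_weights(posts, aqi, epochs=500, lr=0.01, l2=0.01):
    """Learn one weight per word so that weighted word counts predict AQI.

    Illustrative assumption: a linear model fitted by stochastic gradient
    descent on squared error with a small L2 penalty. No keywords are
    chosen in advance; every word starts at weight zero.
    """
    vocab = {word for post in posts for word in tokenize(post)}
    weights = {word: 0.0 for word in vocab}
    bias = sum(aqi) / len(aqi)  # start from the average reading
    for _ in range(epochs):
        for post, target in zip(posts, aqi):
            counts = Counter(tokenize(post))
            pred = bias + sum(weights[w] * c for w, c in counts.items())
            err = pred - target
            for w, c in counts.items():
                weights[w] -= lr * (err * c + l2 * weights[w])
            bias -= lr * err
    return weights, bias


def predict(post, weights, bias):
    """Estimate AQI for a new post; words never seen in training are ignored."""
    counts = Counter(tokenize(post))
    return bias + sum(weights.get(w, 0.0) * c for w, c in counts.items())


# Toy data, made up for this sketch (not from the study).
posts = [
    "smog is terrible today cannot breathe",
    "heavy smog again wearing a mask",
    "blue sky and fresh air today",
    "lovely clear sky this morning",
]
aqi = [300.0, 250.0, 40.0, 50.0]

weights, bias = fit_word_weights(posts, aqi)
print("smog:", round(weights["smog"], 1), "sky:", round(weights["sky"], 1))
print(predict("smog everywhere", weights, bias) > predict("clear sky", weights, bias))
```

On this toy data, words that appear in posts from high-AQI days end up with positive weights and words from clear days with negative ones, without anyone specifying which words matter. The sketch leaves out the correlations across cities and days that the researchers also modeled.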

The research is supported in part by the National Science Foundation’s Early Concept Grants for Exploratory Research (EAGER): Discovering Spontaneous Social Events initiative. The team’s next step will be to expand the model to include photos along with text posts.

—Jennifer Smith