Massive Data Set Analysis for NASA's Atmospheric Infrared Sounder

Event Date: 

Wednesday, February 27, 2008 - 3:15pm

Event Date Details: 

Refreshments served at 3:00 PM

Event Location: 

  • South Hall 5607F
Amy Braverman (Jet Propulsion Laboratory, California Institute of Technology)


NASA's Atmospheric Infrared Sounder (AIRS) has been collecting large quantities of remote sensing data about the vertical structure of Earth's atmosphere since it was launched aboard the Aqua spacecraft in mid-2002. These data pose a classic problem in the analysis of massive data sets: how do we understand the relationships among fine-scale phenomena within their global context? We address that question by partitioning the data on a coarse spatio-temporal grid and estimating the multivariate distribution of the data within each grid cell. We then look for patterns in the evolution of those distributions as functions of space and time, and ultimately tie them back to the physical phenomena that generate the data. Quantifying this evolution is challenging because the data are high-dimensional and the distributions are complex. We attack the problem by using the Wasserstein distance between distributions as a measure of similarity among grid cells' data, and therefore as a measure of similarity among the underlying physical processes. We close with some thoughts on how this strategy might be applied to other problems in which massive data sets arise.
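To make the grid-then-compare idea concrete, here is a minimal Python sketch; it is not the speaker's method, whose estimation details the abstract does not give. It assumes a hypothetical 5-degree latitude/longitude grid (no time dimension), soundings given as `(lat, lon, vector)` tuples with already-standardized values, and cells holding equal numbers of equally weighted points. Under those assumptions the 2-Wasserstein distance reduces to an optimal one-to-one matching, found here by brute force. All function names and data are illustrative.

```python
import itertools
import math
from collections import defaultdict


def grid_cell(lat, lon, cell_deg=5.0):
    """Map a (lat, lon) location to a coarse grid-cell index (hypothetical grid)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def partition(soundings, cell_deg=5.0):
    """Group multivariate observations by grid cell.

    Each sounding is (lat, lon, vector); vector is a tuple of (assumed
    standardized) retrieved quantities.
    """
    cells = defaultdict(list)
    for lat, lon, vec in soundings:
        cells[grid_cell(lat, lon, cell_deg)].append(vec)
    return dict(cells)


def wasserstein2_equal(xs, ys):
    """2-Wasserstein distance between two empirical distributions that have the
    same number of equally weighted points.

    For such distributions an optimal coupling can be taken to be a permutation,
    so we search all matchings. This costs n! and is only for tiny samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(math.dist(xs[i], ys[j]) ** 2 for i, j in enumerate(perm))
        best = min(best, cost)
    return math.sqrt(best / n)


def cell_distances(cells):
    """Pairwise Wasserstein distances between every pair of grid cells."""
    keys = sorted(cells)
    return {
        (a, b): wasserstein2_equal(cells[a], cells[b])
        for a, b in itertools.combinations(keys, 2)
    }


if __name__ == "__main__":
    # Hypothetical standardized two-variable soundings in two grid cells.
    soundings = [
        (12.0, 101.0, (0.8, 1.1)),
        (13.5, 103.0, (0.9, 1.0)),
        (14.0, 104.0, (1.0, 1.2)),
        (-33.0, 20.0, (-1.0, -0.7)),
        (-31.0, 22.0, (-1.2, -0.9)),
        (-34.0, 24.0, (-0.9, -0.8)),
    ]
    cells = partition(soundings)
    for pair, d in cell_distances(cells).items():
        print(pair, round(d, 3))
```

Real AIRS grid cells contain many points in unequal numbers, so the brute-force matching would be replaced by a general optimal-transport solver, or by first summarizing each cell's distribution with a smaller weighted set of points. The sketch only illustrates how a distance between cell distributions becomes a similarity measure between cells.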