Accelerating Science Using Virtualized Data at PO.DAAC

279 views
Aug 28, 2025
1:06:08

As Earth science data archives grow in volume and data density, the traditional science workflow of data download, conditioning, and analysis becomes increasingly unwieldy. Network bandwidth, local storage, and computing performance all impose cost and time constraints that an investigator must account for before science and hypothesis testing can begin. Virtualized datasets offer a way around these issues: these lightweight reference files can be used to access an entire data record using Python packages like Xarray. From there, users can quickly subset to their region and timespan of interest, eliminating the need to download and subset thousands of files and terabytes of data. This opens a new pathway for both streamlined data access and improved science workflows, in which a user can easily iterate over datasets, change space and time bounds, and quickly compare complementary datasets. NASA’s Physical Oceanography Distributed Active Archive Center (PO.DAAC) has created 10 virtualized datasets covering ocean currents, winds, bottom pressure, sea surface height, salinity, and temperature from satellite observations and ocean models. In this webinar, we briefly describe the fundamentals of the technology and demonstrate how to use it in Python scripts and notebooks. We also present performance metrics from computing a regional mean time series of satellite records 25-40 years in length, showing a full order of magnitude improvement in compute time compared to traditional access and methods. Lastly, we present several virtualized data use cases that illustrate the interdisciplinary relationships between wind and ocean response during upwelling events, Indian Ocean Dipole surface characteristics, and the ocean response to El Niño events.
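The idea behind a reference file can be illustrated with a toy sketch. This is not PO.DAAC's format or the Xarray workflow shown in the webinar. It is a standard-library stand-in, assuming a hypothetical layout in which each time step of a variable maps to a byte offset and length inside a larger archive. All names here (`build_archive`, `build_references`, `read_region`, `regional_mean_series`, the `sst/<t>` keys) are invented for illustration. The point is that a small index lets a reader fetch only the bytes it needs and then compute a regional mean time series without touching the rest of the record.

```python
import io
import struct

# Toy archive dimensions (assumed for illustration): 4 time steps,
# each a 3 x 4 (lat x lon) grid of little-endian float32 values.
N_TIME, N_LAT, N_LON = 4, 3, 4
STEP_BYTES = N_LAT * N_LON * 4


def build_archive():
    """Concatenate all time steps into one byte blob (stands in for remote files)."""
    buf = io.BytesIO()
    for t in range(N_TIME):
        values = [float(t * 100 + i) for i in range(N_LAT * N_LON)]
        buf.write(struct.pack(f"<{len(values)}f", *values))
    return buf.getvalue()


def build_references():
    """Hypothetical reference index: chunk key -> (byte offset, byte length)."""
    return {f"sst/{t}": (t * STEP_BYTES, STEP_BYTES) for t in range(N_TIME)}


def read_region(archive, refs, t, lat_slice, lon_slice):
    """Fetch one time step by byte range, then cut out the requested region."""
    offset, length = refs[f"sst/{t}"]
    raw = archive[offset:offset + length]  # stands in for a ranged read over the network
    flat = struct.unpack(f"<{N_LAT * N_LON}f", raw)
    rows = [flat[r * N_LON:(r + 1) * N_LON] for r in range(N_LAT)]
    return [row[lon_slice] for row in rows[lat_slice]]


def regional_mean_series(archive, refs, times, lat_slice, lon_slice):
    """Mean over the region for each requested time step."""
    series = []
    for t in times:
        region = read_region(archive, refs, t, lat_slice, lon_slice)
        cells = [v for row in region for v in row]
        series.append(sum(cells) / len(cells))
    return series


archive = build_archive()
refs = build_references()
# Only time steps 1 and 2 are read; steps 0 and 3 are never touched.
series = regional_mean_series(archive, refs, range(1, 3), slice(0, 2), slice(1, 3))
print(series)
```

In the workflow described above, Xarray plays the role of `read_region` and `regional_mean_series`, and the reference file plays the role of `refs`; changing the space or time bounds only changes which byte ranges are requested.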
Presentation slides: https://go.nasa.gov/4mru8Cv

Chapters
0:00 Webinar Introduction/Logistics
4:15 Introduction to NASA’s Physical Oceanography DAAC
4:45 Overview and agenda
5:22 Quick Look at Python Packages for Virtual Datasets
8:48 Utility of Virtual Datasets
10:43 Basic Usage, Cookbook Resources, Benchmarking
11:45 Demo: PO.DAAC Cookbook - Using Virtual Datasets
12:41 Virtual Dataset Starter Notebook
13:00 Environment Set-Up
14:14 Available Virtual Dataset Products
16:11 Explore the Data
18:28 Alternate Workflows other than Xarray Built-In Functions
21:05 Benchmarking Results
23:33 Science Use Case Example: Gulf of Tehuantepec Upwelling
28:46 Jupyter Notebook: Gulf of Tehuantepec Upwelling
42:27 Scope of Usability, Resources, and Future Objectives
53:46 Question-and-Answer Period
