Description

The growing urgency of the 21st century’s grand challenges, which include a growing population, food and water security, frequent natural disasters, and a changing climate, demands innovative, collaborative, and multidisciplinary solutions for sustainability and resilience. However, scientific data, especially geospatial data, are difficult to access, use, and share effectively: they come in large volumes, from different sources, and in widely varying formats, resolutions, and annotation schemas that can differ among disciplines or even research groups. This presentation describes a recently funded NSF CSSI project to develop an open source, extensible geospatial data framework (GeoEDF). The framework aims to overcome these barriers by creating seamless connections among platforms, data, and tools, making large scientific and social geospatial datasets directly usable in computational tools. By design, GeoEDF abstracts away the complexity of acquiring and using data from various sources. A set of extensible data connectors implements common data query and access protocols such as HTTP, OPeNDAP, FTP, Globus, and REST APIs, supporting both static and streaming data. A data source can then be configured simply by specifying its location, authentication, access protocol, and similar settings. Connectors are also parameterizable, so the same connector can be reused to select different subdatasets, time ranges, and geospatial regions from a data source. Similarly, extensible data processors implement common and domain-specific geospatial data processing such as resampling, reprojection, or a specific simulation model. A plug-and-play workflow composer will allow users to string together connectors and processors into a pipeline that can be executed in various environments, including HUBzero tools, HPC resources, and Jupyter Notebooks. Automated metadata extraction and annotation are integrated into such workflows, supporting FAIR science practices by easing subsequent data discovery and reuse. GeoEDF will enhance interoperability, leveraging data connectors for seamless data transfer and tool invocations with other science gateways. By bringing data to the science, GeoEDF will accelerate data-driven discovery while ensuring that data are not siloed.
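
The connector, processor, and workflow pattern described above can be illustrated with a minimal Python sketch. This is not the GeoEDF API: every name here (`InMemoryConnector`, `resample_to_grid`, `extract_metadata`, `run_pipeline`) and the record layout are hypothetical, and the connector reads from an in-memory list rather than speaking HTTP, OPeNDAP, or another real protocol. It shows only how a parameterized connector, a resampling processor, and automatic metadata extraction might be chained into one pipeline.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]  # assumed layout: "lon", "lat", "t", "value"
Processor = Callable[[List[Record]], List[Record]]


@dataclass
class InMemoryConnector:
    """Hypothetical connector; a real one would fetch over a network protocol."""
    records: List[Record]
    time_range: Optional[Tuple[float, float]] = None
    bbox: Optional[Tuple[float, float, float, float]] = None  # min_lon, min_lat, max_lon, max_lat

    def fetch(self) -> List[Record]:
        """Return only the records inside the configured time range and region."""
        selected = []
        for r in self.records:
            if self.time_range and not (self.time_range[0] <= r["t"] <= self.time_range[1]):
                continue
            if self.bbox:
                min_lon, min_lat, max_lon, max_lat = self.bbox
                if not (min_lon <= r["lon"] <= max_lon and min_lat <= r["lat"] <= max_lat):
                    continue
            selected.append(r)
        return selected


def resample_to_grid(cell: float) -> Processor:
    """Hypothetical processor: average values within square cells of `cell` degrees."""
    def run(records: List[Record]) -> List[Record]:
        sums: Dict[Tuple[int, int], Tuple[float, int]] = {}
        for r in records:
            key = (math.floor(r["lon"] / cell), math.floor(r["lat"] / cell))
            total, n = sums.get(key, (0.0, 0))
            sums[key] = (total + r["value"], n + 1)
        # Report each cell at its centre with the mean of the values it holds.
        return [
            {"lon": (i + 0.5) * cell, "lat": (j + 0.5) * cell, "value": total / n}
            for (i, j), (total, n) in sorted(sums.items())
        ]
    return run


def extract_metadata(records: List[Record]) -> dict:
    """Summarise record count and spatial extent to aid later discovery."""
    if not records:
        return {"count": 0}
    lons = [r["lon"] for r in records]
    lats = [r["lat"] for r in records]
    return {"count": len(records), "bbox": (min(lons), min(lats), max(lons), max(lats))}


def run_pipeline(connector: InMemoryConnector, processors: List[Processor]):
    """Fetch from the connector, apply processors in order, then attach metadata."""
    data = connector.fetch()
    for process in processors:
        data = process(data)
    return data, extract_metadata(data)


source = [
    {"lon": 0.2, "lat": 0.3, "t": 1.0, "value": 10.0},
    {"lon": 0.8, "lat": 0.6, "t": 2.0, "value": 20.0},
    {"lon": 1.5, "lat": 0.5, "t": 3.0, "value": 30.0},
    {"lon": 5.0, "lat": 5.0, "t": 9.0, "value": 99.0},
]
connector = InMemoryConnector(source, time_range=(0.0, 5.0), bbox=(0.0, 0.0, 2.0, 2.0))
result, metadata = run_pipeline(connector, [resample_to_grid(1.0)])
```

Changing only `time_range` or `bbox` reuses the same connector for a different slice of the source, which is the kind of parameterization the abstract describes. In this toy run the fourth record is filtered out and the remaining three collapse into two grid cells.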

Start Date

11-2019

Document Type

Presentation

Keywords

data framework, geospatial data, science gateway, high-performance computing

Session List

Presentation

Nov 1st, 12:00 AM

An Extensible Geospatial Data Framework (GeoEDF) for FAIR Science
