Librarians at Purdue University are partnering with scientists to help them describe, preserve, manage, and share the data generated by their research. Scientists are interested in publishing their data to meet the requirements of funding agencies as well as to enable their datasets to be more broadly discovered and used. The Distributed Data Curation Center (D2C2) at the Purdue University Libraries has developed a data repository framework to house datasets and furnish services and tools to make this possible. This poster will describe an example of such a partnership between librarians and agronomists to create a data collection of water quality samples gathered at Purdue’s Agronomy Center for Research and Education (ACRE). It includes an analysis of the researchers’ workflow and the automation of the description and ingestion of instrument data into a repository using XSLT and programming scripts. We conducted a scan of available thesauri and community formats and practices before creating our methodology and publishing our own descriptive schema. The project has two phases: the first ingests and archives five years’ worth of past data as a batch process, and the second integrates our tools into the data collection process so that current and future data flows into the repository. Metadata from the water quality sample data collection is harvested, aggregated with metadata from other repository collections, indexed for searching, and presented on the web in a context with other digital library content such as e-prints and digitized archival collections.

Date of this Version

June 2007