This paper was presented at the 28th Annual Conference of the International Association of Technological University Libraries (IATUL) in Stockholm, Sweden.


The challenge of accessing, maintaining, sharing, and preserving massive datasets, generally referred to as data curation, has been a direct result of computational e-science. Although scientists and engineers recognized the problem, the solution was not apparent. The principles that underlie library science are not widely understood or appreciated by those outside librarianship. The theory and principles behind librarianship are obscured by their historical application primarily to print materials (books and journals). However, the same principles that apply to the organization, retrieval, and preservation of print materials apply to the digital realm as well.

The National Science Foundation (NSF) in the United States is concerned that much of its funding has been committed to creating datasets that are used for a single research project and then discarded. The question is: couldn’t a dataset be “mined” for more than one research project? The NSF has begun to assess and research the issues associated with “archiving” datasets for present and future research use.

Having observed the needs of domain researchers, Purdue University Libraries determined that a center was needed to focus on and research these issues while fostering collaboration between librarians and domain researchers. The Distributed Data Curation Center (D2C2) was created at the end of 2006.


e-science; curation of scientific data sets; international scholarly communication

Date of this Version

June 2007