Abstract
Knowing how much oil was released into the environment in each incident is critical to studying the environmental, ecological, and economic impacts of oil spills. However, the release amounts (RAs) for numerous oil spill incidents remain unavailable in a structured format suitable for large-scale analysis. The most extensive global oil spill database, managed by NOAA’s Office of Response and Restoration, documents only worst-case estimates in its machine-readable dataset, while more accurate values are embedded within unstructured incident descriptions and subsequent updates. To enhance the dataset with more accurate RAs, we developed a framework that extracts the actual RAs from textual data, considering whether spillage was confirmed and the timeliness of the information. The enhanced dataset includes 3,550 oil spill incidents from 1967 to 2023, with their actual RAs confirmed or updated using the text-form information, improving the accuracy of global oil spill data. By supplementing the original potential maximum RAs with actual RA information, this dataset can support more accurate risk and environmental impact assessment of oil spills.
Date of this Version
8-8-2025
Recommended Citation
Liu, Yiming; Yin, Zhuoli; and Cai, Hua, "Enhanced global oil spill dataset from 1967 to 2023 based on text-form incident information" (2025). The School of Sustainability Engineering and Environmental Engineering (SEE) Faculty Publications. Paper 2.
https://docs.lib.purdue.edu/eeepubs/2
Comments
This is the publisher PDF of Liu, Y., Yin, Z. & Cai, H. Enhanced global oil spill dataset from 1967 to 2023 based on text-form incident information. Sci Data 12, 1394 (2025). Published by Springer, it is made available under a CC-BY-NC-ND license. The version of record can also be found at DOI: 10.1038/s41597-025-05601-9.