Access to credible estimates of water use is critical for making optimal operational decisions and investment plans to ensure reliable and affordable provisioning of water. Furthermore, identifying the key predictors of water use is important for regulators to promote sustainable development policies to reduce water use. In this paper, we propose a data-driven framework, grounded in statistical learning theory, to develop a rigorously evaluated predictive model of state-level, per capita water use in the US as a function of various geographic, climatic, and socioeconomic variables. Specifically, we compare the accuracy of various statistical methods in predicting the state-level, per capita water use and find that the model based on the Random Forest algorithm outperforms all other models. We then leverage the Random Forest model to identify key factors associated with high water-usage intensity among different sectors in the US. More specifically, irrigated farming, thermoelectric energy generation, and urbanization were identified as the most water-intensive anthropogenic activities, on a per capita basis. Among the climate factors, precipitation was found to be a key predictor of per capita water use, with drier conditions associated with higher water usage. Overall, our study highlights the utility of leveraging data-driven modeling to gain valuable insights related to water use patterns across expansive geographical areas.


This is the published version of Wongso, E., Nateghi, R., Zaitchik, B., Quiring, S., & Kumar, R. (2020). A data‐driven framework to characterize state‐level water use in the U.S. Water Resources Research, 56, e2019WR024894. https://doi.org/10.1029/2019WR024894
