Conference Year



Building Energy Prediction, Decision Tree, Root Mean Squared Logarithmic Error


Building energy predictions are critically needed in many fields. The conventional physics-based modeling approach (via EnergyPlus or similar tools) predicts energy consumption reasonably well. However, it is limited to analyzing a single, predefined building and requires extensive time and labor to build models, whereas decision-makers usually need to quantify the energy savings of large building clusters within a short time. The growth of big data and machine learning techniques makes it possible to predict energy consumption accurately for different applications within reasonable time frames. This study aims to develop data-driven models for generalized building energy predictions. The models can be used to establish counterfactual baselines for validating the efficacy of energy-saving measures, and to support energy production and usage planning. The former usually involves medium- to long-term durations, while the latter involves short-term durations. We used real-world open data sets from ASHRAE, which cover the energy consumption of about 1,500 buildings over two years. We then preprocessed the data following standard industry practice. Multiple approaches to missing-value imputation, outlier detection, and feature engineering were explored, and the best-performing methods are recommended for building energy predictions. A gradient boosting (GB) based model was developed for medium- to long-term predictions, and a long short-term memory (LSTM) based model was developed for short-term predictions. Hyperparameter tuning was performed on both model structures and parameters. We used the root mean squared logarithmic error (RMSLE) between the predicted and true energy consumption to evaluate performance. The results show that the GB based model achieves an RMSLE of 0.49 for electricity, 1.10 for chilled water, 1.25 for steam, and 1.32 for hot water. The LSTM model performs better when predicting fewer days ahead and when given more days of input. 
However, extending the input beyond a week does not further improve performance. The LSTM model achieves about 38% lower prediction error than the baselines, which average the energy consumption of similar historical days. The study demonstrates the development process of data-driven models for general-purpose building energy predictions, from data preparation and model selection to development and evaluation.
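The keywords name the root mean squared logarithmic error (RMSLE) as the evaluation metric, and the abstract compares the LSTM model against a baseline that averages similar historical days. The sketch below illustrates both ideas under stated assumptions; it is not the study's implementation. RMSLE is computed here as the square root of the mean of (log(1 + predicted) - log(1 + actual))^2. The function names `rmsle` and `similar_day_baseline` are hypothetical, and "similar day" is assumed, for illustration only, to mean the same weekday in each of the previous few weeks of a daily series.

```python
import math
from statistics import mean


def rmsle(predicted, actual):
    """Root mean squared logarithmic error between two equal-length sequences.

    Values must be non-negative, since log(1 + x) is undefined for x <= -1.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v < 0 for v in predicted) or any(v < 0 for v in actual):
        raise ValueError("energy values must be non-negative")
    # log1p(x) computes log(1 + x) accurately for small x
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
    ]
    return math.sqrt(mean(squared_log_errors))


def similar_day_baseline(daily_history, target_index, weeks_back=4):
    """Hypothetical similar-day baseline for one day of a daily series.

    Assumption: a "similar day" is the same weekday, so the baseline is the
    mean of the values 7, 14, ... days before target_index (up to weeks_back
    weeks), skipping any lookback that falls before the start of the history.
    """
    lookback = [
        daily_history[target_index - 7 * k]
        for k in range(1, weeks_back + 1)
        if target_index - 7 * k >= 0
    ]
    if not lookback:
        raise ValueError("not enough history before the target day")
    return mean(lookback)
```

In use, one would compute `similar_day_baseline` for each day in a test window, then compare `rmsle(baseline_predictions, actual)` against `rmsle(model_predictions, actual)` to obtain a relative error reduction of the kind the abstract reports. The logarithm in RMSLE penalizes relative rather than absolute errors, which keeps buildings and meters with very different consumption magnitudes on a comparable scale.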