Abstract
Machine learning (ML) has proven effective for predicting the compressive strength of laboratory-produced concrete, but industrially produced concrete exhibits greater variability and uncertainty and remains less studied. This work analyses a dataset of 2,617 industrial concrete samples from a UK ready-mix supplier, spanning a full year of supply and demand. The study evaluates the impact on model accuracy of both mix proportions and categorical features (cement type, admixture type, and mix specification). A holistic model incorporating all curing ages outperforms single-age models, and the inclusion of categorical features, particularly mix specification, significantly enhances predictive performance. Five ML models were considered: CatBoost, LightGBM, Gradient Boosting, XGBoost, and Random Forest. XGBoost achieved the highest performance on the test data for the holistic model (R² of 0.75, R of 0.86).
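For readers unfamiliar with the two reported scores, the following is a minimal sketch of how such metrics are commonly computed. It assumes that R² denotes the coefficient of determination and that R denotes the Pearson correlation between measured and predicted strengths; the abstract does not define either. The function names and the strength values are made up for illustration and are not taken from the study's dataset or code.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Hypothetical compressive strengths in MPa (illustrative values only).
y_true = np.array([30.0, 35.0, 40.0, 45.0, 50.0])
y_pred = np.array([32.0, 34.0, 41.0, 43.0, 52.0])

print(f"R^2 = {r_squared(y_true, y_pred):.3f}")
print(f"R   = {pearson_r(y_true, y_pred):.3f}")
```

Under these definitions the two scores measure different things: R² penalises any systematic offset between predictions and measurements, while R only measures how linearly related they are, so R² is not in general equal to the square of R.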
Keywords
industrially produced concrete, concrete compressive strength, machine learning.
DOI
10.5703/1288284318143
Recommended Citation
Etim, Bassey; Arab, Mzakwan; El Mocayd, Nabil; Seaid, Mohammed; and Mohamed, M Shadi, "Machine Learning Based Analysis of Industrially Produced Concrete" (2025). International Conference on Durability of Concrete Structures. 3.
https://docs.lib.purdue.edu/icdcs/2025/slm/3