Hu et al. 2022 — Machine learning for predicting arsenic and chromium in drinking water

This review and modeling paper evaluates machine learning (ML) approaches for predicting the risk of chemical contaminants, primarily arsenic, nitrate, and hexavalent chromium (Cr-VI), in US public drinking water systems. The authors synthesize the literature on ML model performance, including random forest and neural network models trained on the UCMR (Unregulated Contaminant Monitoring Rule) and SDWIS (Safe Drinking Water Information System) databases. They find that models using geospatial, geological, and demographic features predict arsenic exceedance of the 10 µg/L maximum contaminant level (MCL) with AUC values of 0.75–0.92, which supports prioritized monitoring and allocation of remediation resources.

Key numbers

  • US EPA MCL for arsenic in drinking water: 10 µg/L (10 ppb)
  • ML models for arsenic prediction achieved AUC values of 0.75–0.92 depending on feature set and geography
  • Cr-VI lacks a federal MCL; California MCL of 10 µg/L used as reference in several reviewed models
  • ~3,000 US public water systems flagged as high-risk for arsenic exceedance by the reviewed prediction models
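
To make the AUC figures above concrete, here is a minimal sketch of how an exceedance label and an AUC score can be computed. It is not taken from the paper: the function names, the measured concentrations, and the risk scores are all hypothetical, invented for illustration. Only the 10 µg/L arsenic MCL comes from the summary.

```python
ARSENIC_MCL_UG_L = 10.0  # US EPA MCL for arsenic, as listed in the key numbers


def exceeds_mcl(concentration_ug_l, mcl=ARSENIC_MCL_UG_L):
    """Binary label: True when a measured concentration is above the MCL."""
    return concentration_ug_l > mcl


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical values, invented for illustration only (not data from the paper).
measured_as_ug_l = [2.1, 14.0, 8.5, 31.0, 0.7, 11.2]
predicted_risk = [0.10, 0.80, 0.55, 0.90, 0.05, 0.40]

labels = [exceeds_mcl(c) for c in measured_as_ug_l]
auc = roc_auc(labels, predicted_risk)
print(f"AUC = {auc:.3f}")
```

With these invented values, eight of the nine positive-negative pairs are ranked correctly, so the function returns 8/9 (about 0.889). An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking.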

Methods (brief)

Review and synthesis of published machine learning models for chemical contaminant prediction in drinking water. Models included random forest, gradient boosting, logistic regression, and neural networks. Primary contaminants covered: arsenic (As; total and inorganic), nitrate (NO3), and Cr-VI. Speciation note: the reviewed papers generally model total arsenic (tAs); the distinction between tAs and inorganic arsenic (iAs) is not consistently maintained across the reviewed ML literature.
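
As a rough illustration of the simplest model family listed above, the sketch below fits a logistic regression by plain gradient descent to predict a binary exceedance label. It is an assumption-laden toy, not a model from the reviewed literature: the two features, the training rows, the learning rate, and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Predicted probability of exceedance for one feature vector."""
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical features scaled to [0, 1], invented for illustration only:
# column 0 = share of supply drawn from groundwater,
# column 1 = a geological susceptibility index.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]  # 1 = arsenic above the 10 µg/L MCL

weights, bias = fit_logistic(X, y)
p_low = predict_proba([0.1, 0.2], weights, bias)
p_high = predict_proba([0.9, 0.9], weights, bias)
print(f"low-risk system: {p_low:.2f}, high-risk system: {p_high:.2f}")
```

In practice the tree-based and neural models named above would be trained with an established library on real monitoring records; the hand-rolled version is used here only to keep the example self-contained.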

Implications

  • Certification: Establishes ML as a validated tool for predicting drinking water As/Cr exceedance; relevant to supply-chain water source risk assessment.
  • Courses: Demonstrates predictive modeling applications for water quality management.
  • App: Supports geographic risk flagging for drinking water arsenic exposure.

Wiki pages updated on ingest