Source Journal of CSCD
Source Journal of Chinese Scientific and Technical Papers
Included as T2 Level in the High-Quality Science and Technology Journals in the Field of Environmental Science
Core Journal of RCCSE
Included in the CAS Content Collection
Included in the JST China
Indexed in World Journal Clout Index (WJCI) Report
Volume 44 Issue 5
May 2026
ZENG Hongbin, LONG Qi, GAO Jingheng, XU Ketong, WEI Chaohai, QIU Guanglei. Application of machine learning in water quality prediction and analysis for river cross-sections[J]. ENVIRONMENTAL ENGINEERING, 2026, 44(5): 50-60. doi: 10.13205/j.hjgc.202605005

Application of machine learning in water quality prediction and analysis for river cross-sections

doi: 10.13205/j.hjgc.202605005
  • Received Date: 2025-03-24
    Available Online: 2026-06-06
  • This study collected water quality monitoring data from two city-level control monitoring stations in the district between December 2020 and June 2024. The dataset covered eight indicators: water temperature, turbidity, pH, conductivity, dissolved oxygen (DO), ammonia nitrogen (NH4+-N), total phosphorus (TP), and permanganate index (CODMn). To predict water quality at the city-level monitoring sections in the study area, two models were constructed: a seasonal trend decomposition (STD)-Bayesian-random forest (RF) model and an STD-Bayesian-XGBoost model. Both models used water temperature, turbidity, pH, conductivity, and seasonal factors as input features to predict four key indicators: DO, NH4+-N, TP, and CODMn. STD was applied to smooth and denoise the data and to extract the seasonal factors, and Bayesian optimization was used to tune the hyperparameters of the RF and XGBoost models. The evaluation showed that the STD-Bayesian-XGBoost model produced smaller bias errors than the STD-Bayesian-RF model and therefore achieved higher prediction accuracy. This study contributes to water quality prediction modeling for river basins in southern China and provides a technical reference for pollution reduction and carbon emission management in the basin.
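The STD step can be illustrated with a classical additive decomposition, in which each observation is split into a trend, a seasonal component, and a residual. The sketch below is a minimal, standard-library Python version written under stated assumptions: the series is evenly spaced with no missing values, the period is a known integer (for example 12 for monthly means), and the trend is a centered moving average. The abstract does not state the exact decomposition variant, window, or period the authors used, so the function names `centered_moving_average` and `seasonal_decompose_additive` and these settings are hypothetical and for illustration only.

```python
# Minimal sketch of a classical additive seasonal-trend decomposition.
# Assumptions (illustrative, not from the paper): evenly spaced data,
# no missing values, and a known integer period (e.g. 12 for monthly data).
from statistics import mean


def centered_moving_average(x, period):
    """Estimate the trend; positions where the window does not fit are None."""
    n = len(x)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = x[t - half:t + half + 1]
        if period % 2 == 1:
            # Odd period: simple moving average over exactly `period` values.
            trend[t] = sum(window) / period
        else:
            # Even period: 2 x period moving average (period + 1 values,
            # the two end points weighted by 0.5) so the window stays centered.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
            trend[t] = total / period
    return trend


def seasonal_decompose_additive(x, period):
    """Split x into trend, seasonal and residual parts: x = T + S + R."""
    if period < 2 or len(x) < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods")
    trend = centered_moving_average(x, period)

    # Collect detrended values by position within the cycle.
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(x, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)

    # Seasonal indices: bucket means, centered to sum to zero over one cycle.
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    indices = [r - offset for r in raw]
    seasonal = [indices[t % period] for t in range(len(x))]

    # The residual is undefined wherever the trend is undefined.
    resid = [
        None if tr is None else obs - tr - s
        for obs, tr, s in zip(x, trend, seasonal)
    ]
    return trend, seasonal, resid


if __name__ == "__main__":
    # Synthetic example: linear trend plus a period-4 seasonal pattern.
    pattern = [1.0, -1.0, 2.0, -2.0]
    series = [0.5 * t + pattern[t % 4] for t in range(16)]
    trend, seasonal, resid = seasonal_decompose_additive(series, 4)
    print(seasonal[:4])  # prints [1.0, -1.0, 2.0, -2.0]
```

In a workflow like the one described, the `seasonal` output is one plausible form of the "seasonal factor" input feature, and the sum of trend and seasonal components (with the residual discarded) is one way to obtain a smoothed series. Both uses are assumptions here, not the authors' documented procedure.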

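    Corresponding author: CHEN Bin, bchen63@163.com
    • 1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142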

