ANOMALY DETECTION OF SMOKE EMISSIONS BASED ON WORKING CONDITION DATA
Abstract: Pollutant emission data can become abnormal through deliberate tampering or abnormal equipment working conditions, and identifying such anomalies is important for monitoring, remediating, and managing pollution from key pollutant-discharging enterprises. Taking a steel enterprise in Hebei Province as a case study, we built a prediction model from hourly working condition data and smoke concentration. A TabNet model was trained with an improved loss function, MSECorrLoss, and compared with XGBoost, LightGBM, and BiLSTM models. We also propose K-error, an algorithm based on threshold partitioning, to identify anomalous smoke emission data. The results show that: 1) Compared with training under the RMSELoss loss function, training with the improved MSECorrLoss lowers the MAPE of the TabNet model from 15.33% to 15.10% and makes the model converge faster. 2) LightGBM and XGBoost train quickly, but LightGBM has low prediction accuracy (RMSE = 0.3201, MAPE = 29.45%), and the robustness and stability of the XGBoost and BiLSTM models (RMSE: 0.3403 to 0.3425, MAPE: 13.58% to 18.38%) fall short of TabNet (RMSE: 0.2886 to 0.2934, MAPE: 15.10% to 15.33%). Although TabNet takes longer to train, it requires no manual feature selection, has few application restrictions, and performs well in smoke prediction. 3) The TabNet model built on working condition data predicts pollutant emissions with high accuracy and stability, and combining it with the K-error detection algorithm overcomes the subjectivity of threshold methods. The method can quickly detect abnormal pollutant emission data and provide a reference for environmental management decisions.
Key words:
- TabNet
- MSECorrLoss
- smoke
- concentration prediction
- anomaly detection
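The abstract compares models by RMSE and MAPE and names a combined loss, MSECorrLoss, without defining it. The sketch below shows the two standard metrics and one plausible form of a loss that mixes mean squared error with a Pearson correlation penalty. The function `mse_corr_loss`, its `weight` parameter, and the formula `MSE + weight * (1 - r)` are assumptions made for illustration only; they are not taken from the paper and may differ from the authors' definition.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Assumes y_true contains no zeros.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def mse_corr_loss(y_true, y_pred, weight=1.0, eps=1e-8):
    """HYPOTHETICAL combined loss: MSE + weight * (1 - Pearson r).

    This is an illustrative guess at what an MSE-plus-correlation loss
    could look like, not the definition used in the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # Centre both series before computing the Pearson correlation.
    t = y_true - y_true.mean()
    p = y_pred - y_pred.mean()
    r = np.sum(t * p) / (np.sqrt(np.sum(t ** 2) * np.sum(p ** 2)) + eps)
    return float(mse + weight * (1.0 - r))


if __name__ == "__main__":
    observed = [1.0, 2.0, 4.0]   # made-up smoke concentrations
    predicted = [1.0, 2.0, 2.0]  # made-up model outputs
    print("RMSE:", rmse(observed, predicted))
    print("MAPE (%):", mape(observed, predicted))
    print("Hypothetical loss:", mse_corr_loss(observed, predicted))
```

Under this assumed form, the correlation term penalizes predictions that miss the shape of the observed series even when the squared error is small, which is one way such a loss could differ from a plain RMSE objective.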