ANALYSIS OF AIR POLLUTION BASED ON A CLUSTERING MODEL FOR DISCOVERING THE BACKBONE OF POLLUTION CLUSTER
Abstract: Because the distribution of pollution sources is shaped by topography, landforms and meteorological conditions, air pollution data are distributed in space with arbitrary shapes and densities. To investigate this distribution, this paper proposes a clustering model, based on the DP algorithm, for discovering the backbone (core region) of a pollution cluster. The model clusters pollution data directly, without prior statistical aggregation, and extracts the key pollution data while leaving the distribution of the data unchanged, so that the patterns of air pollution variation can be mined more accurately. The proposed model and the k-Means algorithm were compared on three datasets of hourly pollutant concentrations recorded in Lanzhou in January of 2017, 2019 and 2021. On all three datasets the proposed model identified the pollution data more clearly: the key pollution data accounted for 59.0%, 57.2% and 69.0%, respectively, of the backbones of the pollution clusters, and the primary pollutants were NO2 and particulate matter in every case.
To demonstrate its applicability, the model was applied to the Lanzhou data for January 2021. The analysis showed that pollution over the month was driven by NO2 and PM10 acting jointly or alternately, that the diurnal variation was bimodal in both the number of polluted hours and the number of occurrences of the primary pollutants (NO2 and PM10), and that the polluted area was Chengguan District. An analysis of the causes of these patterns supports the effectiveness of the model in extracting key air pollution data without altering their complex distribution.
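For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain density-peaks clustering, assuming that "DP algorithm" in the abstract refers to clustering by fast search and find of density peaks. It is not the backbone-discovery model proposed in the paper. The function name `density_peaks_cluster`, the cutoff distance `d_c`, and the choice to fix the number of clusters in advance are illustrative assumptions.

```python
import numpy as np

def density_peaks_cluster(points, d_c, n_clusters):
    """Illustrative density-peaks clustering (not the paper's model).

    points     : (n, d) array-like of samples
    d_c        : cutoff distance used to count neighbours (assumed parameter)
    n_clusters : number of cluster centres to pick (assumed to be given)
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    # Local density rho: number of other samples closer than d_c.
    rho = (dist < d_c).sum(axis=1) - 1

    # Visit samples from densest to sparsest; ties are broken by index,
    # so an earlier sample in `order` is treated as "higher density".
    order = np.argsort(-rho, kind="stable")

    # delta: distance to the nearest sample of higher density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest sample has no denser neighbour; by convention
            # its delta is its largest distance to any other sample.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Centres combine high density with large separation (large rho * delta).
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest sample is always a centre
    centers = np.argsort(-gamma)[:n_clusters]

    # Each remaining sample inherits the label of its nearest denser sample.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

Because labels propagate along chains of nearest denser neighbours rather than by distance to a centroid, this family of methods can follow clusters of arbitrary shape, which is the property the abstract contrasts with k-Means.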
Key words: air pollution / clustering model / spatial distribution / backbone of cluster