Evaluation of Maximum Likelihood Estimation and regression methods for fusion of multiple satellite Aerosol Optical Depth data over Vietnam

Pham Van Ha
Center of Multidisciplinary Integrated Technology for Field Monitoring, University of Engineering and Technology, VNUH, Hanoi, Vietnam
hapv@fimo.edu.vn

Ngo Xuan Truong
Center of Multidisciplinary Integrated Technology for Field Monitoring, University of Engineering and Technology, VNUH, Hanoi, Vietnam
truongnx@fimo.edu.vn

Astrid Jourdan
Ecole Internationale des Sciences du Traitement de l'Information, EISTI, Pau, France
aj@eisti.eu

Dominique Laffly
University Toulouse II Jean Jaurès, UT2J, Toulouse, France
dominique.laffly@univ-tlse2.fr

Nguyen Thi Nhat Thanh
Center of Multidisciplinary Integrated Technology for Field Monitoring, University of Engineering and Technology, VNUH, Hanoi, Vietnam
thanhntn@fimo.edu.vn

978-1-7281-3003-3/19/$31.00 ©2019 IEEE

Abstract—This paper applies data fusion methods, including Maximum Likelihood Estimation (MLE) and linear regression, to satellite images over Vietnam from the Moderate Resolution Imaging Spectroradiometer (MODIS) and Visible Infrared Imaging Radiometer Suite (VIIRS) sensors. In comparison with ground stations of the Aerosol Robotic Network (AERONET), the regression method performs better than MLE. Our results show that the fusion methods can improve both the data coverage and the quality of satellite aerosol optical depth (AOD). Strong correlations were observed between fused AOD and AERONET AOD (R2 = 0.8118 and 0.7511 for the Terra regression and MLE methods, respectively). This paper presents an evaluation of data fusion algorithms and highlights their importance for the coverage and quality of satellite AOD data from multiple sensors.

Keywords—data fusion, regression, Maximum Likelihood Estimation, Vietnam, satellite images

I. INTRODUCTION

Aerosols are liquid droplets or solid particles suspended in the air, for example as fog or smoke. They may have complex effects on clouds and precipitation, air quality and public health [1]. Aerosol Optical Thickness (AOT), or Aerosol Optical Depth (AOD), is a measure of the amount of aerosol present in the atmosphere. AOD has been used in air quality monitoring applications at multiple scales [2]–[7].

AOD observations can be grouped into two broad kinds: in situ measurements and satellite remote sensing. Although very useful, in situ measurements are limited in both time and space. Satellites are increasingly used to obtain AOD, taking advantage of technical and scientific developments over the last decades. However, satellite AOD data has its own disadvantages, such as coarse temporal frequency, cloud contamination and moderate quality. In particular, the high cloud coverage over tropical areas such as Southeast Asia has a significant impact on satellite monitoring in these regions. A previous study by Lasko [8] shows that the highest cloud cover was observed in Vietnam (monthly average of 72.4%). Therefore, fusing multisource satellite AOD data can be an effective solution.

AOD can be observed from many different satellites, including the MODIS sensors on the Terra and Aqua satellites and the VIIRS sensor on the Suomi-NPP satellite. The Moderate Resolution Imaging Spectroradiometer (MODIS) sensor was launched in 1999 on the Terra satellite and in 2002 on the Aqua satellite [9]. The swath width of the MODIS sensor is about 2330 km and its orbit repeat cycle is 16 days. The MODIS aerosol retrieval consists of two different algorithms, Dark Target and Deep Blue. The Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi National Polar-Orbiting Partnership (Suomi NPP) satellite was launched in October 2011 as a successor to the MODIS sensor. VIIRS aerosol retrievals are made at 550 nm and can provide both the Ångström exponent and the aerosol type [10].

In remote sensing and satellite imagery research, data fusion is defined as the combination of data from various satellite images. The fusion methods in use include optimum interpolation [11]–[13], polynomial functions [14], weighted or arithmetic averaging, least squares and Maximum Likelihood Estimation [15]–[17].

The maximum likelihood estimator (MLE) is one of the most widely used methods in statistical inference. MLE estimates the parameter values of a probability model from observed data by maximizing the likelihood function. In data fusion, different kinds of statistical information have been used to fuse AOD data [17].

Linear regression analysis is a method for analyzing the relationship between a response variable and one or more predictor variables. The model parameters are estimated from data using the ordinary least squares (OLS) method. Many previous studies have used linear regression to integrate AOD data [18]–[20].

In this paper, we integrate MODIS Terra/Aqua and VIIRS satellite aerosol datasets over Vietnam using two different methods: Maximum Likelihood Estimation (MLE) and linear regression. The research question is as follows: which data fusion method most effectively enhances the quality and data coverage of satellite aerosol data over Vietnam?
The study area and datasets are described in Section II. The methodology is presented in Section III. Section IV presents the experiments, including the assessment of data coverage and data quality for the original and fused data. Finally, conclusions are given in Section V.

II. STUDY AREA AND DATASETS

The study area is shown in Fig. 1. The study was conducted in Vietnam, which lies on the eastern edge of the Indochinese Peninsula, near the center of Southeast Asia, between latitudes 8º27' N and 23º23' N and longitudes 102°10' E and 109°30' E. Vietnam is bordered by China to the north along 1400 km, by Laos and Cambodia to the west, and by the sea along 3260 km to the east and south.

The North of Vietnam has four seasons: spring, summer, autumn and winter. The year can also be divided into a dry season (November - March), a rainy season (May - September) and seasonal transitions (April and October). The climate in the Central and South regions is more stable, with a rainy season (September - December in the Central region and May - October in the South) and a dry season for the rest of the year.

Fig. 1. Vietnam boundary and location of AERONET stations (Nghia Do, Son La, Nha Trang, Bac Lieu).

In this paper, MODIS Terra/Aqua AOD Level 2 at 3 km (MOD04_3K/MYD04_3K) and VIIRS AOD EDR at 6 km (VIIRS EDR) over Vietnam are used to test the different data fusion methods.

TABLE I. EXPERIMENTAL DATA

Data              Product / Station   Spatial resolution / Location   Temporal resolution   Period
MODIS Terra AOD   MOD04_3K            3 km                            Daily                 2012 - 2016
MODIS Aqua AOD    MYD04_3K            3 km                            Daily                 2012 - 2016
VIIRS NPP AOD     VIIRS AOD EDR       6 km                            Daily                 2012 - 2016
AERONET AOD       NGHIADO             105.800, 21.048                 15 min                2012 - 2016
                  Nha Trang           109.206, 12.205
                  Bac Lieu            105.730, 9.280
                  SonLa               105.800, 21.048

MODIS data at 3 km spatial resolution for 2012 - 2016 were collected from the LAADS DAAC system [21]. VIIRS aerosol Level 2 products from the Suomi-NPP satellite were also collected. All VIIRS data at 6 km spatial resolution for the period
2012–2016 were collected from the NOAA website [22]. Finally, station AOD data from the AERONET network, set up by NASA and PHOTONS, were collected to validate the original and fused satellite AOD data. Seven AERONET stations have been built in Vietnam, but only four of them provide data for 2012 to 2016. All cloud-screened and quality-assured AERONET AOD data (Level 2) at these four stations, Nghia_Do, Nha_Trang, Bac_Lieu and SonLa (see Fig. 1), were gathered. All collected data are summarized in Table I.

III. METHODOLOGY

A. Data preprocessing

Both the MODIS Terra/Aqua and the VIIRS aerosol algorithms produce not only an AOD value but also a quality flag for each pixel. The quality flag is stored in the aerosol product as a separate dataset. Quality flag values are ordinal: the lowest value marks the lowest quality and the highest value marks the highest quality.

First, we extract the Optical_Depth_Land_And_Ocean and Land_Ocean_Quality_Flag datasets from the MODIS Terra/Aqua AOD images. For VIIRS NPP, the AOD value and quality flag are extracted from the AerosolOpticalDepth_at_550nm and QF1_VIIRSAEROEDR datasets. Each raw image must be converted separately to a georeferenced image. In this paper, the Thin Plate Spline (TPS) method was applied to georeference the raw images using the ground control points (GCPs) in the metadata of each image. Our previous study [23] shows that the TPS function is the best georeferencing method for these satellite images. Then, both the MODIS and the VIIRS NPP AOD data were re-projected onto a regular grid of fixed pixel resolution. These steps were implemented with the GDAL library (Geospatial Data Abstraction Library).

To compare AERONET AOD with MODIS AOD and VIIRS AOD, the AERONET AOD was converted to 550 nm using the Ångström exponent (440 nm – 675 nm) with the following equation:

\frac{\tau_{0.5\,\mu m}}{\tau_{0.55\,\mu m}} = \exp\left(-\alpha_{0.44\,\mu m - 0.67\,\mu m} \ln\frac{0.5}{0.55}\right) \quad (1)

where \tau_{0.55\,\mu m} is the AERONET AOD at 550 nm, \tau_{0.5\,\mu m} is the AERONET AOD at 500 nm, and \alpha_{0.44\,\mu m - 0.67\,\mu m} is the Ångström exponent for the range 440 nm – 675 nm.

B. Data fusion

Two methods were used to combine the aerosol data, namely
Maximum Likelihood Estimation (MLE) and linear regression.

Maximum Likelihood Estimation (MLE)

In this study, AOD images are combined day by day. For each day, a set of three AOD images is merged to produce a single consistent image. For each pixel of the fused image, the AOD value is calculated as the weighted average of the aerosol values from each sensor. The MLE combination is defined by the following equation:

AOD_{fused,i} = W_{Terra,i}\,AOD_{Terra,i} + W_{Aqua,i}\,AOD_{Aqua,i} + W_{Viirs,i}\,AOD_{Viirs,i} \quad (2)

where AOD_{Terra,i} is the MODIS Terra AOD value at pixel i and W_{Terra,i} is the weight of pixel i on the Terra AOD image, and likewise for Aqua and VIIRS. The weight is based on the quality flag of each pixel. Quality flags indicate the algorithm team's assessment of the AOD data quality. In this study, the weight is directly proportional to the quality flag, so a pixel with a high quality flag receives a larger weight than a pixel of medium or low quality. The weights are calculated from the quality flags as follows:

W_{Terra,i} = \frac{QF_{Terra,i}}{\sum_{k=1}^{3} QF_{k,i}}, \quad
W_{Aqua,i} = \frac{QF_{Aqua,i}}{\sum_{k=1}^{3} QF_{k,i}}, \quad
W_{Viirs,i} = \frac{QF_{Viirs,i}}{\sum_{k=1}^{3} QF_{k,i}} \quad (3)

where W_{Terra,i} is the weight of pixel i, QF_{Terra,i} is the quality flag value of pixel i on the Terra AOD image, and QF_{k,i} is the quality flag value of pixel i for sensor k (k = 1, 2, 3).

Linear Regression

In this method, the satellite dataset with the best quality is used as the target data. To increase AOD image coverage, the other AOD datasets are used to estimate the target data where target AOD values are not available but other AOD values exist. First, the target variable is chosen. Based on validation against AERONET data as described in our previous studies [24], [25], MODIS Terra AOD was chosen as the target variable, and the empty MODIS Terra pixels were filled with values wherever MODIS Aqua or VIIRS NPP data existed. After that, for each day, the pixels that have both a MODIS Terra value and a MODIS Aqua or VIIRS AOD value
were selected as the training dataset to build the model, using the following regression equations:

AOD_{Terra} = a \cdot AOD_{Aqua} + b
AOD_{Terra} = c \cdot AOD_{VIIRS} + d \quad (4)

where AOD_{Terra}, AOD_{Aqua} and AOD_{VIIRS} are the satellite AOD values, and (a, c) and (b, d) are the regression slopes and intercepts. Then, regression images of MODIS Aqua and VIIRS NPP are estimated using the regression coefficients. Finally, MODIS Terra and all regression images are merged. All cloud-free pixels on Terra are kept. For each cloud pixel on Terra, the value from the Aqua regression image is used if it is available. For pixels that are cloudy on both Terra and Aqua, the VIIRS regression value is used if it does not contain cloud.

C. Data evaluation

After combination, the spatial coverage of the original and fused images is computed for comparison. In addition, the fused images are evaluated against AERONET data to assess the fused image quality and the accuracy of the fusion methods. The fusion methods are compared using the coefficient of determination (R2), Root Mean Square Error (RMSE), Relative Error (RE), Mean Fractional Bias (MFB) and Mean Fractional Error (MFE), defined as follows:

R^2 = \frac{\left(\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})\right)^2}{\sum_{t=1}^{n} (x_t - \bar{x})^2 \sum_{t=1}^{n} (y_t - \bar{y})^2}

RMSE = \sqrt{\frac{\sum_{t=1}^{n} (x_t - y_t)^2}{n}}

RE = \frac{|x_t - y_t|}{x_t} \times 100\%

MFB = \frac{1}{n} \sum_{t=1}^{n} \frac{y_t - x_t}{\left(\frac{x_t + y_t}{2}\right)}

MFE = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - x_t|}{\left(\frac{x_t + y_t}{2}\right)}

where \bar{x} and \bar{y} are the means of ground AOD and satellite AOD, x_t and y_t are the ground and satellite AOD at time t, and n is the number of samples.

IV. EXPERIMENT RESULTS

A. Evaluation of data coverage

Fig. 2 depicts the monthly averaged data coverage of the original and fused data for the period from 2012 to 2016. The data coverage of an image is calculated as the number of pixels that have valid data divided by the total number of pixels over the whole of Vietnam. The monthly average data coverage is the average of the data coverage over all images in the given month.

The results show that data coverage varies from month to month, with similar monthly trends across the satellite datasets. The MODIS Terra/Aqua data coverage is relatively low, approximately 5% to 20% over the months, while the data coverage of the VIIRS NPP images is relatively high, approximately 25% to 75%. The data coverage of the two fusion methods is similar, and slightly higher than that of the VIIRS NPP images. Although the data coverage increased only slightly, the quality of the combined image increased significantly compared to the VIIRS NPP image (presented in the next section). This shows that the combined image draws on all of the original images to increase both the spatial coverage and the quality of the data.

Fig. 2. Data coverage of original images and fusion image over Vietnam region from 2012 - 2016.

B. Evaluation of data quality

The scatter plots in Fig. 3 show the correlation between the AOD values of the satellite images and AERONET AOD data, for the original images (MODIS Aqua, MODIS Terra, VIIRS NPP) and the fused images (MLE, Terra regression). On these charts, the horizontal axis is the AOD value of the image and the vertical axis is the corresponding AOD value at the AERONET station. Among the original images, MODIS Terra has the highest coefficient of determination (R2 = 0.76), while MODIS Aqua and VIIRS have lower values (R2 = 0.5824 and 0.6018, respectively). Among the fused images, with the same number of samples (n = 680), the highest coefficient of determination (R2 = 0.8118) was observed for the Terra regression method, followed by the MLE method (R2 = 0.7511).

Fig. 3. Correlation of AERONET data with original and fused data (2012 – 2016).

Table II summarizes the quality evaluation of the original and fused data using a variety of metrics. The assessment was conducted by comparison with AERONET monitoring data at four stations in Vietnam for the period from 2012 to 2016. For the original data, the yearly data coverage of VIIRS NPP from 2012 to 2016 was the highest (approximately 45.4396%, compared to 12.5033% and 10.4254% for MODIS Terra and MODIS Aqua, respectively). However, MODIS Terra has the best correlation (R2 = 0.7594, 0.5824 and 0.6018 for MODIS Terra, MODIS Aqua and VIIRS NPP, respectively). In terms of relative error, MODIS Aqua is the lowest (RE = 73.9%, RMSE = 0.2806), followed by MODIS Terra (RE = 74.3%, RMSE = 0.2379), and VIIRS NPP is the highest (RE = 99.25%, RMSE = 0.2667); in terms of RMSE, MODIS Terra is the lowest. Both MFB and MFE are within an acceptable range (MFE of approximately 42% to 52%).

TABLE II. DATA QUALITY OF ORIGINAL AND FUSED IMAGES

Data type          Data coverage   Samples   R2       RMSE     RE        MFB       MFE
MODIS Aqua         10.4254         550       0.5824   0.2806   73.9229   -0.3506   0.4246
MODIS Terra        12.5033         495       0.7594   0.2379   74.325    -0.34     0.4314
VIIRS NPP          45.4396         680       0.6018   0.2667   99.2505   -0.4113   0.5253
MLE                47.9188         680       0.7511   0.2455   80.5099   -0.3343   0.4599
Terra Regression   48.4046         680       0.8029   0.2225   72.7096   -0.3459   0.4476

For the fused data, both methods offer better data coverage and data quality. The data coverage of the fusion methods does not differ significantly, at around 48% (47.9% for MLE and 48.4% for Terra regression). However, the correlation differs clearly between the fusion methods. The MLE and AERONET regression correlations were approximately 0.75, which is similar to the MODIS Terra image and higher than the MODIS Aqua and VIIRS NPP images. The Terra regression method has a higher correlation (approximately 0.8) than the original images and the other methods. In terms of data error, the fusion methods produce a lower error than the original images. The AERONET regression method had the lowest error (58.18%), followed by Terra regression (RE approximately 72-76%) and MLE (RE = 80%). Both MFB and MFE are at acceptable levels, showing that the results of the fusion methods are relatively good.

V. CONCLUSION

In this paper, several data fusion methods were applied to MODIS and VIIRS AOD images over Vietnam. AERONET AOD was used to validate these data fusion methods. The results indicate that the Terra regression method is better than the MLE method. Data fusion methods generally increase data coverage and improve data quality. The Terra regression model shows the best correlation, while MLE gives the lowest relative error. Therefore, the appropriate data fusion method can be chosen depending on the application.

VI. ACKNOWLEDGMENT

The author would like to thank the Agence Universitaire de la Francophonie (AUF) for the PhD Fellowship. This research is also funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.99-2016.22.

REFERENCES

[1] J. Lenoble, L. Remer, and D. Tanre, Aerosol Remote Sensing. Springer Berlin Heidelberg, 2013.
[2] M. Sukitpaneenit and N. T. Kim Oanh, “Satellite monitoring for carbon monoxide and particulate matter during forest fire episodes in Northern Thailand,” Environ. Monit. Assess., vol. 186, no. 4, pp. 2495–2504, 2014.
[3] T. H. Le, T. N. T. Nguyen, K. Lasko, S. Ilavajhala, P. K. Vadrevu, and C. Justice, “Vegetation fires and air pollution in Vietnam,” Environ. Pollut., 2014.
[4] T. T. N. Nguyen et al., “Particulate matter concentration mapping from MODIS satellite data: a Vietnamese case study,” Environ. Res. Lett., vol. 10, no. 9, p. 95016, 2015.
[5] N. A. F. Kamarul Zaman, K. D. Kanniah, and D. G. Kaskaoutis, “Estimating Particulate Matter using satellite based aerosol optical depth and meteorological variables in Malaysia,” Atmos. Res., vol. 193, no. October 2016, pp. 142–162, 2017.
[6] A. Jamil, A. A. Makmom, P. Saeid, R. M. Firuz, and R. Prinaz, “PM10 monitoring using MODIS AOT and GIS, Kuala Lumpur, Malaysia,” Res. J. Chem. Environ., vol. 15, no. 2, pp. 1–5, 2011.
[7] T. Kanabkaew, “Prediction of hourly particulate matter concentrations in Chiangmai, Thailand using MODIS aerosol optical depth and ground-based meteorological data,” EnvironmentAsia, vol. 6, no. 2, pp. 65–70, 2013.
[8] K. Lasko, K. P. Vadrevu, and T. T. N. Nguyen, “Analysis of air pollution over Hanoi, Vietnam using multi-satellite and MERRA reanalysis datasets,” PLoS One, vol. 13, no. 5, p. e0196629, May 2018.
[9] V. V. Salomonson, W. Barnes, and E. J. Masuoka, “Introduction to MODIS and an Overview of Associated Activities,” in Earth Science Satellite Remote Sensing: Vol. 1: Science and Instruments, J. J. Qu, W. Gao, M. Kafatos, R. E. Murphy, and V. V. Salomonson, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 12–32.
[10] J. M. Jackson et al., “Suomi-NPP VIIRS aerosol algorithms and data products,” J. Geophys. Res. Atmos., vol. 118, no. 22, pp. 12673–12689, 2013.
[11] H. Yu, R. E. Dickinson, and M. Chin, “Annual cycle of global distributions of aerosol optical depth from integration of MODIS retrievals and GOCART model simulations,” J. Geophys. Res. Atmos., no. February, 2003.
[12] Y. Xue, H. Xu, J. Guang, L. Mei, J. Guo, and C. Li, “Observation of an agricultural biomass burning in central and east China using merged aerosol optical depth data from multiple satellite missions,” Int. J. Remote, no. November, pp. 37–41, 2014.
[13] Y. Xue, H. Xu, and L. Mei, “Merging aerosol optical depth data from multiple satellite missions to view agricultural biomass burning in Central and East China,” Atmos. Chem. Phys. Discuss., pp. 10461–10492, 2012.
[14] F. Mélin, G. Zibordi, and S. Djavidnia, “Development and validation of a technique for merging satellite derived aerosol optical depth from SeaWiFS and MODIS,” Remote Sens. Environ., vol. 108, pp. 436–450, 2007.
[15] J. Guo, X. Gu, T. Yu, T. Cheng, H. Chen, and D. Xie, “Trend analysis of the aerosol optical depth over china using fusion of MODIS and MISR aerosol products via adaptive weighted estimate algorithm,” vol. 8866, pp. 1–8, 2013.
[16] H. Xu et al., “A consistent aerosol optical depth (AOD) dataset over mainland China by integration of several AOD products,” Atmos. Environ. J., vol. 114, pp. 48–56, 2015.
[17] M. Nirala, “Technical note: Multi-sensor data fusion of aerosol optical thickness,” J. Remote Sens., vol. 29, no. 7, pp. 2127–2136, 2008.
[18] X. Qifang, Z. Obradovic, H. Bo, L. Yong, A. Braverman, and S. Vucetic, “Improving aerosol retrieval accuracy by integrating AERONET, MISR and MODIS data,” 2005 7th Int. Conf. Inf. Fusion, FUSION, vol. 1, pp. 654–660, 2005.
[19] L. Li, R. Shi, L. Zhang, J. Zhang, and W. Gao, “The data fusion of aerosol optical thickness using universal kriging and stepwise regression in East China,” Remote Sens. Model. Ecosyst. Sustain. XI, vol. 9221, pp. 1–11, 2014.
[20] B. Lv, Y. Hu, H. H. Chang, A. G. Russell, and Y. Bai, “Improving the Accuracy of Daily PM2.5 Distributions Derived from the Fusion of Ground-Level Measurements with Aerosol Optical Depth Observations, a Case Study in North China,” Environ. Sci. Technol., vol. 50, no. 9, pp. 4752–4759, 2016.
[21] EARTHDATA, “Level-1 and Atmosphere Archive and Distribution System, Distributed Active Archive System (LAADS DAAC).” [Online]. Available: https://ladsweb.modaps.eosdis.nasa.gov/
[22] NOAA, “Comprehensive Large Array-data Stewardship System, National Oceanic and Atmospheric Administration (CLASS NOAA).” [Online]. Available: https://www.avl.class.noaa.gov/saa/products/welcome
[23] P. V. Ha, N. T. N. Thanh, B. Q. Hung, P. Klein, A. Jourdan, and D. Laffly, “Assessment of georeferencing methods on MODIS Terra / Aqua and VIIRS NPP satellite images in Vietnam,” in 2018 10th International Conference on Knowledge and Systems Engineering (KSE), 2018, pp. 282–287.
[24] T. Vinh et al., “Satellite Aerosol Optical Depth over Vietnam - An Analysis from VIIRS and CALIOP Aerosol Products.” pp. 499–522, 2018.
[25] T. T. N. Nguyen et al., “Particulate matter concentration mapping from MODIS satellite data: a Vietnamese case study,” Environ. Res. Lett., vol. 10, no. 9, p. 95016, 2015.
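
As an illustration of the quality-flag weighting in Eqs. (2) and (3) of Section III.B, the following is a minimal Python sketch for a single pixel, not the authors' implementation. The function name `fuse_pixel` is hypothetical. The handling of missing retrievals (a sensor with no value, or a flag of zero, is given zero weight and the remaining weights are renormalized) is an assumption that the paper does not spell out.

```python
from typing import Optional, Sequence


def fuse_pixel(aod: Sequence[Optional[float]], qf: Sequence[int]) -> Optional[float]:
    """Quality-flag weighted average of one pixel across sensors (Eqs. 2-3).

    aod[k] is the AOD from sensor k, or None where that sensor has no retrieval.
    qf[k] is the quality flag of sensor k at this pixel (non-negative integer).
    """
    # Assumption: sensors without a retrieval, or with flag 0, get zero weight.
    pairs = [(a, q) for a, q in zip(aod, qf) if a is not None and q > 0]
    total_qf = sum(q for _, q in pairs)
    if total_qf == 0:
        return None  # no usable sensor at this pixel
    # W_k = QF_k / sum(QF); fused AOD = sum(W_k * AOD_k)
    return sum(a * q / total_qf for a, q in pairs)
```

With all three sensors present, the weights reduce to exactly the ratio QF_k / (QF_1 + QF_2 + QF_3) of Eq. (3), so a sensor with a higher flag pulls the fused value toward its own retrieval.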
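
The regression-based gap filling of Section III.B (Eq. 4) can be sketched in the same spirit. This is a rough outline under stated assumptions and not the authors' code. Each daily image is represented as a flat list of pixel values with `None` for cloud or missing pixels, and the names `ols_fit`, `fit_on_overlap` and `gap_fill_terra` are hypothetical. Terra pixels are kept where valid, otherwise the Aqua regression value is used, otherwise the VIIRS regression value.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Optional[float]  # None marks a pixel with no valid retrieval


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_on_overlap(target: Sequence[Pixel],
                   source: Sequence[Pixel]) -> Optional[Tuple[float, float]]:
    """Fit target = slope * source + intercept on pixels valid in both images."""
    pairs = [(s, t) for s, t in zip(source, target)
             if s is not None and t is not None]
    if len(pairs) < 2:
        return None  # not enough training pixels for this day
    xs = [s for s, _ in pairs]
    ys = [t for _, t in pairs]
    if max(xs) == min(xs):
        return None  # zero variance in the predictor, slope undefined
    return ols_fit(xs, ys)


def gap_fill_terra(terra: Sequence[Pixel], aqua: Sequence[Pixel],
                   viirs: Sequence[Pixel]) -> List[Pixel]:
    """Fill empty Terra pixels from Aqua first, then VIIRS, via Eq. (4)."""
    fit_aqua = fit_on_overlap(terra, aqua)    # (a, b)
    fit_viirs = fit_on_overlap(terra, viirs)  # (c, d)
    fused: List[Pixel] = []
    for t, a, v in zip(terra, aqua, viirs):
        if t is not None:
            fused.append(t)  # keep valid Terra pixels unchanged
        elif a is not None and fit_aqua is not None:
            fused.append(fit_aqua[0] * a + fit_aqua[1])
        elif v is not None and fit_viirs is not None:
            fused.append(fit_viirs[0] * v + fit_viirs[1])
        else:
            fused.append(None)  # no sensor covers this pixel
    return fused
```

The models are refit for every day, as in the paper, so each daily image pair gets its own slope and intercept. Days with fewer than two overlapping pixels simply contribute no regression values in this sketch.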