INTRODUCTION
Applying Deep Learning in Water Level Prediction
In general, water levels can be estimated using physics-based methods such as numerical modeling, which is mainly used for 2D and 3D water level estimation. In addition, multi-source satellite image data has recently been used to monitor water level changes in many different areas [1]. However, physics-based methods require complex calculations with many parameters, which can introduce uncertainty into the results and make water level estimation time-consuming.
In recent years, advances in information technology and artificial intelligence have opened new opportunities for researching and applying more effective prediction models. Traditional models such as ARIMA have been used and trusted for decades on the basis of their theoretical and empirical foundations. Although ARIMA can make accurate predictions in some cases, it also has many limitations, especially when faced with large and complex data.
With the advancement of deep learning, we no longer need to rely solely on models based on traditional statistical assumptions. Models such as artificial neural networks (ANN) learn from data by adjusting the weights of each neuron in the network. LSTM and GRU, which are variants of recurrent neural networks, are specifically designed to process sequential data such as time series, which makes them well suited to river level prediction based on historical data. Meanwhile, XGBoost, a decision tree-based boosting algorithm, has demonstrated strong performance in many prediction problems.
These models and algorithms not only improve prediction accuracy but also allow large amounts of data to be processed in real time, which was previously unimaginable. The combination of artificial intelligence, machine learning and massive computing has ushered in a new era in river level prediction and management, helping scientists and water resource managers make informed decisions faster and more accurately.
Applying AI to forecast Saigon river water level
Scope and Objective of the Research
The topic "Applying AI to forecast Saigon river water level" focuses on analyzing and predicting the Saigon River water level based on data collected at Phu An station.
By placing IoT sensors based on automatic telemetry technology at the Phu An station observation locations, dynamic water levels can be observed periodically. As a result, a large amount of historical water level data has been collected. How to exploit the value of this historical data, and especially how to capture water level trends for accurate prediction, is the question this thesis sets out to explore.
Building a mathematical model that considers all relevant factors (e.g., rainfall, temperature, riverbed, irrigation projects, and other physical factors) to capture and analyze water level trends is too complex and difficult. Therefore, the more mainstream approach to predicting water levels is to build models from historical water level data using statistics and machine learning techniques.
Specifically, the data used in the study includes two main columns: collection time ("datetime") and the corresponding water level ("Wwaterlevel"). Based on this data, the study investigates and compares the performance of several prediction models, including ARIMA, ANN, LSTM, GRU and XGBoost, to determine the most suitable model for predicting river water level under the given conditions and data.
Figure 1.2: Location of Phu An water level monitoring station.
The main purpose of the research is to identify the water level forecasting model with the highest accuracy among the ARIMA, ANN, LSTM, GRU and XGBoost models studied, in order to support local flood warning agencies, river traffic and people living in flood-prone areas of the city. The core of this research is to build a web service application using advanced deep learning techniques and real-time data sources. This system will give organizations, agencies and individuals important insights into future water levels along the Saigon River, enabling them to make informed decisions and take actions suited to their specific needs and goals. Whether forecasting possible floods, ensuring that boats can safely pass under bridges or predicting high tides, the project serves as an important tool for improving safety and efficiency in the region.
LITERATURE REVIEW

Correlation function
The autocorrelation function (ACF) measures the degree of correlation between the values of a time series and its values at different lags. It is usually displayed as a plot and is commonly used to evaluate correlation in temporal data and to identify cycles or repeating patterns in a time series.
Important characteristics of the ACF chart include:
Horizontal axis (X-axis): The horizontal axis of the ACF chart represents lag values. Each lag corresponds to the time interval between the values being compared.
Vertical axis (Y-axis): The vertical axis of the ACF chart represents the correlation value. The ACF value ranges from -1 to 1, with 1 indicating a perfect positive correlation, -1 indicating a perfect negative correlation, and 0 indicating no correlation.
Intended use: to determine the overall level of correlation and any cyclical patterns in a time series.
The partial autocorrelation function (PACF) is a variation of the ACF that measures the correlation between two values in the same time series after removing the influence of the values at intermediate lags. PACF therefore captures the direct correlation between values separated by a fixed lag, unaffected by the lags in between. It is often used to identify the order of an autoregressive model.
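As a minimal sketch of how these two quantities can be computed, the following pure-Python code estimates the sample ACF directly from its definition and derives the PACF with the Durbin-Levinson recursion. The function names `acf` and `pacf` and the simulated AR(1) series are illustrative assumptions; they are not the thesis code or the Phu An data.

```python
import random


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Sample partial autocorrelation for lags 0..max_lag (Durbin-Levinson)."""
    r = acf(series, max_lag)
    values = [1.0]
    prev = []  # coefficients of the AR(k-1) fit from the previous step
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        values.append(phi_kk)
    return values


# Synthetic AR(1) series with coefficient 0.7 (hypothetical data for illustration).
random.seed(42)
series = [0.0]
for _ in range(1999):
    series.append(0.7 * series[-1] + random.gauss(0, 1))

acf_values = acf(series, 5)
pacf_values = pacf(series, 5)
print("ACF :", [round(v, 3) for v in acf_values])
print("PACF:", [round(v, 3) for v in pacf_values])
```

For an AR(1) process such as this one, the ACF decays gradually with the lag, while the PACF is close to zero beyond lag 1, which is the pattern used to read off the AR order.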
ARIMA stands for Autoregressive Integrated Moving Average. It belongs to a class of models that explain a given time series in terms of its own past values (its own lags and lagged forecast errors), so that the resulting equation can be used to predict future values. ARIMA models can be applied to any "non-seasonal" time series that exhibits patterns and is not just random white noise.
The ARIMA model is specified by three order parameters (p, d, q), where:
• p is the order of the AR term.
• q is the order of the MA term.
• d is the number of differencing steps required to make the time series stationary.
ARIMA: Non-Seasonal Autoregressive Integrated Moving Average.
SARIMAX: Seasonal ARIMA with exogenous variables.
If the time series has a seasonal pattern, seasonal terms must be added, and the model becomes SARIMA, short for Seasonal ARIMA.
2.3.2 AR (p) model
p is the order of the autoregressive (AR) term. It is the number of lags of Y used as predictors, reflecting the dependency between the current observation and the observations in previous periods.
An autoregressive (AR) model is one in which $Y_t$ depends only on its own lags; that is, $Y_t$ is a function of the lags of $Y_t$. It is described by the following equation:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t$$

Where:
• $Y_{t-1}$ is the lag 1 of the series,
• $\beta_1$ is the coefficient of lag 1 that the model estimates, and
• $\alpha$ is the intercept term, also estimated by the model.
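To make the AR idea concrete, here is a small sketch that simulates an AR(1) series and recovers the intercept and the lag-1 coefficient by ordinary least squares on the lagged values. The helper `fit_ar1`, the chosen coefficients and the simulated data are all assumptions made for illustration; a real study would normally rely on a library estimator.

```python
import random


def fit_ar1(series):
    """Least-squares estimates (alpha, beta1) for Y_t = alpha + beta1 * Y_{t-1} + e_t."""
    x = series[:-1]  # lagged values Y_{t-1}
    y = series[1:]   # current values Y_t
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    beta1 = sxy / sxx
    alpha = mean_y - beta1 * mean_x
    return alpha, beta1


# Hypothetical AR(1) process: Y_t = 2.0 + 0.6 * Y_{t-1} + noise.
random.seed(1)
true_alpha, true_beta1 = 2.0, 0.6
series = [true_alpha / (1 - true_beta1)]  # start at the process mean
for _ in range(2999):
    series.append(true_alpha + true_beta1 * series[-1] + random.gauss(0, 1))

alpha_hat, beta1_hat = fit_ar1(series)
next_value_forecast = alpha_hat + beta1_hat * series[-1]
print(f"alpha ~ {alpha_hat:.3f}, beta1 ~ {beta1_hat:.3f}")
print(f"one-step forecast: {next_value_forecast:.3f}")
```

The one-step forecast simply plugs the most recent observation into the fitted equation, with the unknown future error replaced by its expected value of zero.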
MA(q), the moving average component, uses the dependency between an observation and the residual errors from a moving average model applied to lagged observations. It expresses the model error as a combination of previous error terms. The order q is the number of lagged error terms included in the model.
In the moving average (MA) model, $Y_t$ depends only on lagged forecast errors. It is described by the following equation:

$$Y_t = \alpha + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}$$

where the error terms are the errors of the autoregressive models of the respective lags. The errors $\varepsilon_t$ and $\varepsilon_{t-1}$ are the errors from the following equations:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t$$
$$Y_{t-1} = \alpha + \beta_1 Y_{t-2} + \beta_2 Y_{t-3} + \dots + \beta_p Y_{t-p-1} + \varepsilon_{t-1}$$
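The following sketch simulates an MA(1) series and checks a well-known property of moving average processes: the autocorrelation is non-zero only up to lag q. For MA(1) with coefficient $\phi_1$, the theoretical lag-1 autocorrelation is $\phi_1 / (1 + \phi_1^2)$. The function names, the coefficient 0.8 and the simulated data are assumptions chosen for illustration.

```python
import random


def simulate_ma1(n, alpha, phi1, seed):
    """Generate n values of Y_t = alpha + e_t + phi1 * e_{t-1} with standard normal errors."""
    rng = random.Random(seed)
    errors = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [alpha + errors[t] + phi1 * errors[t - 1] for t in range(1, n + 1)]


def autocorr(series, lag):
    """Sample autocorrelation at a single lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


phi1 = 0.8  # hypothetical MA coefficient
series = simulate_ma1(5000, alpha=1.0, phi1=phi1, seed=7)
rho1 = autocorr(series, 1)
rho2 = autocorr(series, 2)
theoretical_rho1 = phi1 / (1 + phi1 ** 2)
print(f"lag 1: sample {rho1:.3f}, theory {theoretical_rho1:.3f}")
print(f"lag 2: sample {rho2:.3f}, theory 0.000")
```

This cut-off in the ACF after lag q is the usual visual cue for choosing the MA order.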
I(d), integration, is the process of differencing the time series observations, that is, subtracting the observation at the previous time step from the current observation. This is done to make the time series stationary. Stationarity is important because the autoregressive (AR) component of ARIMA is essentially a linear regression model that uses the series' own lagged values as predictors, and linear regression models work best when the predictors are uncorrelated and independent of each other. Depending on the complexity of the time series, more than one differencing step may be required. The value of d is the minimum number of differencing steps needed to make the series stationary.
If the time series is already stationary, then d = 0.
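Differencing is simple enough to write out directly. The sketch below applies d rounds of first differencing to a short list; the function name `difference` and the sample values are hypothetical and only serve to show the mechanics.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: y'_t = y_t - y_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


# Hypothetical values following a quadratic trend (not station data).
levels = [1, 4, 9, 16, 25, 36]
first_diff = difference(levels, d=1)
second_diff = difference(levels, d=2)
print(first_diff)   # [3, 5, 7, 9, 11]
print(second_diff)  # [2, 2, 2, 2]
```

Each round shortens the series by one observation. In this toy case the first difference still trends upward, and only the second difference is constant, so d = 2 would be needed.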
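An ARIMA model is one in which the time series has been differenced at least once to make it stationary, and the AR and MA terms are combined. The equation of an ARIMA model therefore becomes:

$$Y_t = \alpha + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t + \phi_1 \varepsilon_{t-1} + \phi_2 \varepsilon_{t-2} + \dots + \phi_q \varepsilon_{t-q}$$

Here $Y_t$ denotes the series after differencing.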
In words: predicted $Y_t$ = constant + linear combination of lags of Y (up to p lags) + linear combination of lagged forecast errors (up to q lags).
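The verbal formula can be evaluated by hand once the coefficients are known. The sketch below computes a one-step prediction for the differenced series and then, assuming d = 1, adds it back to the last observed level. The function `arma_one_step` and every number in the example are hypothetical; they do not come from a fitted model.

```python
def arma_one_step(alpha, betas, phis, recent_values, recent_errors):
    """One-step prediction: constant + AR part + MA part.

    recent_values[0] is Y_{t-1}, recent_values[1] is Y_{t-2}, and so on;
    recent_errors follows the same ordering. The unknown error at time t is
    replaced by its expected value, zero.
    """
    ar_part = sum(b * y for b, y in zip(betas, recent_values))
    ma_part = sum(p * e for p, e in zip(phis, recent_errors))
    return alpha + ar_part + ma_part


# Hypothetical ARIMA(2, 1, 1) coefficients and recent history of the differenced series.
predicted_change = arma_one_step(
    alpha=0.5,
    betas=[0.6, 0.2],
    phis=[0.3],
    recent_values=[2.0, 1.0],
    recent_errors=[0.1],
)

# With d = 1 the model predicts a change, so the level forecast adds it to the last level.
last_level = 10.0
level_forecast = last_level + predicted_change
print(round(predicted_change, 2), round(level_forecast, 2))
```

With these inputs the predicted change is 0.5 + (0.6·2.0 + 0.2·1.0) + 0.3·0.1 = 1.93, giving a level forecast of 11.93.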
2.3.6 How to find the order of differencing (d) in ARIMA model
As previously mentioned, differencing is a crucial step in making a time series stationary. However, it is essential not to over-difference the series: an over-differenced series may still be stationary, but over-differencing can adversely affect the model parameters.
Determining the right order of differencing is key. The appropriate order is the minimum differencing needed to obtain a near-stationary series that fluctuates around a well-defined mean and whose autocorrelation function (ACF) plot approaches zero quickly.
If the autocorrelations remain positive for many lags (10 or more), further differencing is necessary. Conversely, if the lag 1 autocorrelation is too negative, the series is probably over-differenced.
When uncertain between two orders of differencing, choose the order that gives the smallest standard deviation in the differenced series.
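These rules of thumb can be checked numerically. The sketch below, which assumes a simulated random walk rather than real water level data, reports the standard deviation and the lag 1 autocorrelation for d = 0, 1 and 2 and picks the order with the smallest standard deviation. The helper names are illustrative.

```python
import random
import statistics


def difference(series, d):
    """Apply d rounds of first differencing."""
    result = list(series)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    mean = sum(series) / len(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, len(series)))
    return num / denom


# Hypothetical random walk: non-stationary, but stationary after one difference.
random.seed(3)
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + random.gauss(0, 1))

summary = {}
for d in range(3):
    diffed = difference(walk, d)
    summary[d] = (statistics.stdev(diffed), lag1_autocorr(diffed))
    print(f"d={d}: std={summary[d][0]:.3f}, lag-1 autocorrelation={summary[d][1]:.3f}")

best_d = min(summary, key=lambda d: summary[d][0])
print("order with the smallest standard deviation:", best_d)
```

For a random walk, the undifferenced series shows a lag 1 autocorrelation near 1, the first difference behaves like white noise, and the second difference shows a strongly negative lag 1 autocorrelation, which is the over-differencing signal described above.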
To illustrate these concepts, consider the following example:
First, we assess whether the series is stationary using the Augmented Dickey-Fuller (ADF) test from the statsmodels package. This is because differencing is only required for non-stationary series; otherwise, no differencing is needed.
The null hypothesis (H0) of the ADF test is that the time series is non-stationary. If the p-value of the test is less than the significance level (0.05), we reject the null hypothesis and conclude that the time series is stationary.
In our case, if the p-value is greater than 0.05, we proceed with determining the order of differencing.
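To show what the test measures, the sketch below implements a simplified, non-augmented Dickey-Fuller regression in pure Python. It is not the statsmodels implementation: it omits the lagged difference terms of the full ADF test and, instead of a p-value, compares the test statistic with an approximate 5% critical value of -2.86 for the case with a constant and no trend. The function names and the simulated series are assumptions for illustration.

```python
import math
import random

# Approximate 5% critical value for the Dickey-Fuller test with a constant and no trend.
DF_CRITICAL_5PCT = -2.86


def dickey_fuller_stat(series):
    """t-statistic of gamma in the regression: (y_t - y_{t-1}) = a + gamma * y_{t-1} + e_t."""
    x = series[:-1]
    dy = [series[t] - series[t - 1] for t in range(1, len(series))]
    n = len(x)
    mean_x = sum(x) / n
    mean_dy = sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (w - mean_dy) for v, w in zip(x, dy))
    gamma = sxy / sxx
    a = mean_dy - gamma * mean_x
    ssr = sum((w - a - gamma * v) ** 2 for v, w in zip(x, dy))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err


def looks_stationary(series):
    """Reject the unit-root null when the statistic is below the critical value."""
    return dickey_fuller_stat(series) < DF_CRITICAL_5PCT


# Hypothetical series: white noise and a random walk built from the same shocks.
random.seed(5)
noise = [random.gauss(0, 1) for _ in range(500)]
walk = [0.0]
for shock in noise[1:]:
    walk.append(walk[-1] + shock)
walk_diff = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

for name, data in [("noise", noise), ("walk", walk), ("walk differenced", walk_diff)]:
    print(f"{name}: statistic={dickey_fuller_stat(data):.2f}, stationary={looks_stationary(data)}")
```

In practice, `statsmodels.tsa.stattools.adfuller` returns both the test statistic and the p-value referred to in the text, so the decision rule above would be applied to that p-value instead.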
An Artificial Neural Network (ANN) is a computational model inspired by the neural structure of the human brain. It consists of interconnected nodes (neurons) organized into layers. Information flows through these nodes, and the network adjusts the connection strengths (weights) during training to learn from data, enabling it to recognize patterns, make predictions, and solve various tasks in machine learning and artificial intelligence.
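The weight adjustment described above can be sketched with a very small network written in plain Python: one input, one hidden layer of four tanh neurons and a linear output, trained by full-batch gradient descent on a mean squared error loss. The architecture, learning rate, number of epochs and the toy target y = x² are all assumptions chosen for illustration; they are not the configuration used in this thesis.

```python
import math
import random

random.seed(0)
HIDDEN = 4
w1 = [random.uniform(-1, 1) for _ in range(HIDDEN)]  # input -> hidden weights
b1 = [0.0] * HIDDEN                                  # hidden biases
w2 = [random.uniform(-1, 1) for _ in range(HIDDEN)]  # hidden -> output weights
b2 = 0.0                                             # output bias

# Toy training data (hypothetical): learn y = x^2 on [-1, 1].
xs = [i / 10 for i in range(-10, 11)]
ys = [x * x for x in xs]


def forward(x):
    """Return hidden activations and the network output for one input."""
    hidden = [math.tanh(w1[j] * x + b1[j]) for j in range(HIDDEN)]
    output = sum(w2[j] * hidden[j] for j in range(HIDDEN)) + b2
    return hidden, output


def mean_squared_error():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mean_squared_error()
learning_rate = 0.05
for epoch in range(1000):
    grad_w1 = [0.0] * HIDDEN
    grad_b1 = [0.0] * HIDDEN
    grad_w2 = [0.0] * HIDDEN
    grad_b2 = 0.0
    for x, y in zip(xs, ys):
        hidden, output = forward(x)
        d_output = 2 * (output - y) / len(xs)  # derivative of the MSE w.r.t. the output
        for j in range(HIDDEN):
            grad_w2[j] += d_output * hidden[j]
            d_hidden = d_output * w2[j] * (1 - hidden[j] ** 2)  # back through tanh
            grad_w1[j] += d_hidden * x
            grad_b1[j] += d_hidden
        grad_b2 += d_output
    # Gradient descent step: move every weight against its gradient.
    for j in range(HIDDEN):
        w1[j] -= learning_rate * grad_w1[j]
        b1[j] -= learning_rate * grad_b1[j]
        w2[j] -= learning_rate * grad_w2[j]
    b2 -= learning_rate * grad_b2

final_loss = mean_squared_error()
print(f"loss before training: {initial_loss:.4f}, after training: {final_loss:.4f}")
```

The drop in the loss is the "learning" referred to in the text: each update nudges the weights in the direction that reduces the prediction error on the training data.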