Google Trends search index data, for example, has been found to have an impact on business results across a wide range of sectors. This thesis contributes to this field.
2.1 Time-series Techniques
2.1.1 Statistical Modeling
Statistical modeling refers to the data science process of applying statistical analysis to datasets. A statistical model is a mathematical relationship between one or more random variables and other non-random variables. The application of statistical modeling to raw data helps data scientists approach data analysis in a strategic manner, providing intuitive visualizations that aid in identifying relationships between variables and making predictions. The most common statistical modeling methods for analyzing such data are categorized as either supervised learning or unsupervised learning. Some popular statistical model examples include logistic regression, time series, clustering, and decision trees.
The Auto-Regressive Integrated Moving Average (ARIMA) model is among the most widely used and accurate short-term forecasting methods for univariate time series. The model holds that the current value of the time series is linearly related to past values and to external disturbances; that is, the model contains both an autoregressive (AR) term and a moving average (MA) term. The premise of ARMA is that the time series is stationary; if the series has a trend, it does not meet the stationarity condition and must be stabilized by differencing. The mathematical expression is as follows:
$$X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \delta + u_t - \theta_1 u_{t-1} - \cdots - \theta_q u_{t-q} \quad (2.1)$$

where $\phi_1, \ldots, \phi_p$ are the autoregressive coefficients, $\theta_1, \ldots, \theta_q$ are the moving average coefficients, $\delta$ is the constant term, $u_t, \ldots, u_{t-q}$ is the residual series, $p$ and $q$ represent the autoregressive and moving average orders respectively, and $d$ is the degree of differencing of the sequence; the model is denoted ARIMA($p$, $d$, $q$).
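To make the notation concrete, the following is a minimal sketch of fitting an ARIMA model with the statsmodels library; the synthetic weekly sales series and the order (2, 1, 1) are illustrative assumptions rather than values used in this thesis.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical weekly sales series standing in for a real dataset.
rng = np.random.default_rng(42)
sales = pd.Series(rng.poisson(lam=50, size=104).astype(float))

# ARIMA(p, d, q): p autoregressive lags, d differences, q moving-average lags.
# d = 1 differences the series once to help satisfy the stationarity premise.
model = ARIMA(sales, order=(2, 1, 1))
fitted = model.fit()

# Forecast the next 8 weeks.
print(fitted.forecast(steps=8))
```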
2.1.2 Machine Learning Modeling
Machine learning has become one of the mainstays of information technology over the last two decades. Machine learning is a branch of statistics, data mining, computer science, engineering, and other disciplines concerned with data modeling for prediction and inference purposes [6]. Machine learning classifiers are divided into three groups, namely supervised, unsupervised, and semi-supervised machine learning.
Supervised machine learning is used when the data include both input and output values. It is feasible to minimize the error by calculating the difference between the model's predicted value and the actual output value and adjusting the model's weights and biases [7]. Some methods used in supervised learning include neural networks, Naive Bayes, linear regression, logistic regression, random forests, support vector machines (SVM), and so on. Regression and classification problems are solved via supervised learning. Classification assigns data to a class, such as "red" or "automobile". Regression estimates a numerical output value, such as the number of cookies that will be sold.
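As a minimal illustration of the supervised setting, the sketch below trains a random forest regressor on hypothetical labeled data; the synthetic features and target are assumptions made only for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical feature matrix (e.g., price, promotion flag, week number)
# and a numerical target (units sold) for a regression task.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The model learns from labeled input/output pairs; its error is then
# measured on held-out data, as described above.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))
```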
Unsupervised learning is employed when there is no output data. Instead, the program analyzes the data for patterns on its own. Unsupervised learning is used to locate clusters and place previously unseen data in an appropriate location [7]. This can be applied to sales patterns; for example, if a consumer buys milk, they are more likely to buy eggs.
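A minimal unsupervised sketch, assuming hypothetical two-feature customer data: k-means groups the observations into clusters without any output labels.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: two behavioral features per customer,
# with no output labels of any kind.
rng = np.random.default_rng(1)
customers = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),  # one buying pattern
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),  # another pattern
])

# The algorithm finds the clusters on its own; each customer is then
# assigned to the most appropriate cluster, as the paragraph describes.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=1)
labels = kmeans.fit_predict(customers)
print(labels[:10], kmeans.cluster_centers_)
```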
Between supervised and unsupervised learning, semi-supervised learning is a good compromise. It guides categorization and feature extraction from a larger, unlabeled data set using a smaller labeled data set during training.
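The following sketch illustrates self-training, one common semi-supervised scheme, using scikit-learn; the synthetic dataset and the choice of logistic regression as the base classifier are assumptions made for illustration.

```python
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# A small labeled set plus a larger unlabeled set (labels marked as -1).
X, y = make_classification(n_samples=300, random_state=2)
y_partial = y.copy()
y_partial[30:] = -1  # only the first 30 samples keep their labels

# Self-training: the base classifier is fit on the labeled subset, then
# iteratively labels confident unlabeled samples and retrains.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)
print(model.predict(X[:5]))
```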
This section provides an overview of the deep learning approach, which is the main topic of this thesis. Deep learning is composed of multiple "units" modeled on human neurons, and these units are linked to process calculations. Deep learning refers to a group of learning methods that aim to model data using complicated architectures involving multiple non-linear transformations. Neural networks, which are combined to form deep neural networks, are the fundamental building blocks of deep learning [8]. There are several types of neural network architectures, such as multilayer perceptrons, which are the oldest and simplest ones; Convolutional Neural Networks (CNN), particularly adapted for image processing; and recurrent neural networks, used for sequential data such as text or time series.
A neural network works similarly to the human brain's neural network. A "neuron" in a neural network is a mathematical function that collects and classifies information according to a specific architecture. The network bears a strong resemblance to statistical methods such as curve fitting and regression analysis. A neural network contains layers of interconnected nodes. Each node is known as a perceptron and is similar to a multiple linear regression. The perceptron feeds the signal produced by a multiple linear regression into an activation function that may be nonlinear.
A multilayer perceptron (also known as a neural network) is a structure made up of numerous hidden layers of neurons in which the output of one layer's neurons becomes the input of the next layer's neurons. Furthermore, a neuron's output might be the input of a neuron in the same layer or of a neuron from a previous layer (this is the case for recurrent neural networks). Depending on the sort of issue at hand, regression or classification, we can use a different activation function on the last layer, termed the output layer, than on the hidden layers. A neural network with three input variables, one output variable, and two hidden layers is depicted in the diagram below.
[Figure: A neural network with an input layer, two hidden layers, and an output layer.]
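A minimal sketch of the depicted architecture in Keras, assuming a regression setting; the hidden layer widths (8 units each) are arbitrary choices made for illustration.

```python
import tensorflow as tf

# A multilayer perceptron matching the figure: three input variables,
# two hidden layers, and one output variable (a regression setting,
# so the output layer uses a linear activation).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer 2
    tf.keras.layers.Dense(1, activation="linear"),  # output layer
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```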
Multilayer perceptrons are not well suited to certain forms of data, particularly images. Indeed, they are designed to work with vectors as input data; therefore, to use them with images, we must first convert the images to vectors, losing any spatial information they contain, such as shapes.
Convolutional Neural Networks (CNN) have revolutionized image processing by eliminating the need for manual feature extraction. For pictures with three RGB color channels, CNNs act directly on matrices or even tensors. CNNs are currently often used for image classification, image segmentation, object detection, and face recognition, among other applications.
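As a brief illustration (not a model used in this thesis), the sketch below defines a small Keras CNN that acts directly on 32x32 RGB image tensors for a hypothetical 10-class classification task; all layer sizes are assumptions.

```python
import tensorflow as tf

# A small CNN acting directly on 32x32 RGB images (three color channels),
# e.g., for a 10-class image classification task.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```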
Recurrent Neural Networks (RNN) are used to infer sequential data such as text or time series. The simplest recurrent networks were constructed in the 1980s, in which a hidden layer at time t depends on the input at time t, x_t, as well as the same hidden layer at time t-1 or the output at time t-1 [8]. RNNs have been successfully used in a variety of applications in recent years, including speech recognition, translation, and image captioning. This achievement is largely owing to the abilities of LSTMs (Long Short-Term Memory), a type of recurrent neural network that will be discussed in Chapter 3.1.2.
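A minimal sketch of an LSTM forecaster in Keras, assuming a hypothetical sliding-window setup in which the previous 8 weeks of sales predict the next week; the data here are random placeholders, not values from the thesis datasets.

```python
import numpy as np
import tensorflow as tf

# Hypothetical setup: predict the next weekly sales value from a
# sliding window of the previous 8 weeks (shape: samples x 8 x 1).
X = np.random.rand(200, 8, 1).astype("float32")
y = np.random.rand(200, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 1)),
    tf.keras.layers.LSTM(32),  # hidden state carries sequence memory
    tf.keras.layers.Dense(1),  # next-step sales forecast
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```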
Artificial neural networks (ANN) can be defined as a highly connected array of elementary processors called neurons. One major application area of ANNs, for both researchers and practitioners, is forecasting [9]. ANNs have high modeling flexibility and adaptability, as they can control the learning process and can adjust their parameters when new input data become available. Using ANNs in studies thus has its own benefits to some extent.
ANNs must be trained before the forecasting process is implemented; during training, neural networks assign weights to connections. Forecasting problems arise in many different disciplines, and the literature on forecasting with ANNs is scattered across many diverse fields, making it hard for a researcher to be aware of all the work done to date in the area [9]. ANN models are increasingly being used as a decision aid in areas such as manufacturing, marketing, and retailing. Several authors have given comprehensive reviews of neural networks, examples of their applications, and comparisons with the statistical approach.
2.1.3 Combining Statistical and Machine Learning Modeling
Hybrid models that combine statistical and machine learning modeling are often developed because they can combine the capabilities of many models to create a new forecasting method. As a result, "hybrid models" arise, and several of them are thought to be more efficient than pure statistical or artificial intelligence models. In the fashion forecasting literature, hybrid approaches frequently mix distinct schemes such as ANN and ELM with other techniques such as statistical models, the grey model (GM), and so on [10]. According to Liu [10], Vroman [11] employed a NN model with corrective coefficients for the seasonality feature over a mean-term forecasting horizon; they claim that their suggested hybrid technique can predict short-term prices as well. Furthermore, forecasting with the extreme learning machine (ELM) is fast. Its "fast speed" makes it a strong candidate as a component model for more complex hybrid models for fashion forecasting, even though it is not ideal due to its unstable character. They claim that their suggested approach outperforms classic ARIMA models as well as two newly developed neural network models for forecasting apparel sales.
Furthermore, forecasting the sales volume of the following period with appropriate accuracy, based on the historical sales volume sequence, can reduce costs.
To increase accuracy, Zhang [12] evaluated the characteristics of the sales volume sequence and incorporated external meteorological data such as air quality, peak temperature, and lowest temperature. XGBoost is chosen as the training model to predict the time series of sales volume and is compared with other models. Results showcase the dominance of XGBoost performance as measured by MSE and RMSE.
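A minimal sketch of this kind of setup with the xgboost library, assuming hypothetical lagged-sales and weather features in the spirit of Zhang [12]; the column names, synthetic data, and hyperparameters are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Hypothetical training frame: lagged sales plus external weather
# features (air quality, peak and lowest temperatures).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "sales_lag_1": rng.poisson(50, 300),
    "sales_lag_2": rng.poisson(50, 300),
    "air_quality": rng.normal(80, 10, 300),
    "temp_max": rng.normal(28, 4, 300),
    "temp_min": rng.normal(20, 3, 300),
})
target = df["sales_lag_1"] * 0.6 + df["sales_lag_2"] * 0.3 + rng.normal(0, 2, 300)

# Gradient-boosted trees trained on tabular features to predict sales.
model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(df, target)
print(model.predict(df.head()))
```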
Related Works
Several papers have examined web search data in various fields. The topic we are pursuing is the use of Google Trends data in combination with historical sales to improve forecasting.
In 2011, Choi [13] demonstrated how to forecast the near-term values of economic indices using search engine data; automobile sales, jobless claims, trip destination planning, and consumer confidence are all examples. Furthermore, in 2018, Robin [14] investigated the use of Google Trends in improving monthly e-commerce retail forecasts in France. Robin [14] employs a SARIMA model with monthly retail trend surveys, a SARIMA model with Google Trends series, and a composite model with the weighted average forecast from the different models. Google queries, as evaluated by Google Trends, aid in improving the predictive accuracy of the final model, which is created by combining the separate models.
In 2018, Boone [15] investigated whether Google Trends might help a specialty food online shop improve the accuracy of sales estimates. The study's concept is to investigate the link between a search term and a purchase decision for a specific product in an online sale. The search key terms are often used to describe a product category in time-series sales forecasting at the stock-keeping unit (SKU) level. The results show improvements when Google Trends data are included in the forecasting models. However, the authors mention the difficulty of discovering and choosing keywords that might have an impact on forecasting products. In 2019, Silva [16] used Google Trends in fashion retail forecasting, comparing parametric and non-parametric forecasting models to see which model is best for predicting Burberry sales using Google Trends. In 2021, Al-Basha [17] used numerous metrics to evaluate the accuracy of weekly retail sales forecasts. However, the results show no statistically significant difference between the predictions of a model that uses only real data and a model that uses both real data and Google Trends data.
In summary of the related work, several papers show that internet search data can be used to improve prediction for various products. With machine learning approaches such as neural networks for regression, performance can be improved further. We follow the work of Boone [15] on some food and beverage products. In addition, we follow the work of Al-Basha [17] to evaluate the previous results and work with a new dataset collected from Vietnam, to examine whether the findings change.
Feature Engineering
Scale-Dependent Metrics
There are some commonly used accuracy measures whose scale depends on the scale of the data. These are useful when comparing different methods applied to the same set of data.
The Mean Absolute Error (MAE) is calculated by taking the mean of the absolute differences between the actual values and the predicted values:

$$MAE = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|$$

where $y_t - \hat{y}_t$ is the error in period t, i.e., the difference between the actual demand in period t and the forecast for period t.
The Mean Squared Error (MSE) is not expressed in the same unit as the data; to recover the original unit, we can take its square root. The outcome is a new error metric called the Root Mean Squared Error (RMSE):

$$RMSE = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}$$

where $y_t - \hat{y}_t$ is the error in period t, i.e., the difference between the actual demand in period t and the forecast for period t.
Percentage error metrics are scale-independent and are used to compare forecast performance between different time series. However, their weak spot is zero values in a time series, in which case they become infinite or undefined, making them uninterpretable.
The Mean Absolute Percentage Error (MAPE) is one of the most popular error metrics in time series forecasting. It is calculated by taking the average (mean) of the absolute differences between actual and predicted values divided by the actuals:

$$MAPE = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \times 100\%$$

where $y_t - \hat{y}_t$ is the error in period t, i.e., the difference between the actual demand in period t and the forecast for period t.
Another percentage error metric we use for comparison is the Weighted Average Percentage Error (WAPE), also referred to as the MAD/Mean ratio. It weights the error by the total sales:

$$WAPE = \frac{\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|}{\sum_{t=1}^{n} y_t}$$

where $y_t - \hat{y}_t$ is the error in period t, i.e., the difference between the actual demand in period t and the forecast for period t.
In this chapter, we explain the methodology of a comparison experiment that examines Google Trends' predictive power in retail sales forecasting using three real datasets.
By comparing the performance of the XGBoost and LSTM models before and after including Google Trends as input data, the predictive ability of Google Trends in projecting retail sales is investigated. The models and data inputs utilized to make predictions are divided into four experiments.
In experiment 1, we utilize the XGBoost and LSTM machine learning models to produce predictions using historical sales data as input. This arrangement allows for a comparison of performance between machine learning without Google Trends and machine learning with Google Trends (experiment 2). The Brazilian e-commerce, Becungshop, and Breakfast at the Frat datasets are used in this experiment.
In experiment 2, we extend XGBoost and LSTM to incorporate the Google Trends series as a feature for creating sales forecasts. By comparing the performance of the XGBoost and LSTM models that use only historical sales data as input (experiment 1) with the XGBoost and LSTM models that additionally use Google Trends as input (experiment 2), the extent to which Google Trends data enhances retail sales projections is investigated. The Brazilian e-commerce, Becungshop, and Breakfast at the Frat datasets are used in experiments 1 and 2.
In experiment 3, historical sales and extra transactional data, such as information on store visits, promotions, price, and base price, are used as data input to make predictions using XGBoost and LSTM. Experiment 3 uses the Breakfast at the Frat dataset to further the research.
Finally, in experiment 4, XGBoost and LSTM predictions are performed utilizing sales history, extra transactional data, and Google Trends as data input.
Experiment 4 uses the Breakfast at the Frat dataset to further the research. As a result, comparing experiments 3 and 4 provides an additional chance to see how real data combined with Google Trends affects prediction accuracy.
The table below summarizes the experiment model used in each experiment.
Table 3.1 Description of experiment models

Experiment 1: sales history (Brazilian e-commerce, Becungshop, Breakfast at the Frat)
Experiment 2: sales history and Google Trends (Brazilian e-commerce, Becungshop, Breakfast at the Frat)
Experiment 3: sales history and extra transactional data (Breakfast at the Frat)
Experiment 4: sales history, extra transactional data, and Google Trends (Breakfast at the Frat)

Compare the performance of the machine learning models (XGBoost, LSTM) in experiments 1 and 2, and in experiments 3 and 4.
A conceptual flow of the experiment is shown in the figure below. The flow shows the forecasting process step by step, beginning with dataset collection and ending with visualization of the forecasts. Along the way, the accuracy measurements are examined to decide which models perform better.
[Figure: Conceptual flow of the experiment. Data collection (Brazilian e-commerce dataset by Olist, Breakfast at the Frat dataset by dunnhumby, Becungshop dataset), data pre-processing (Python configuration files), model training, and weekly sales forecasts by product category (Brazilian e-commerce, Becungshop) and by product-store combination (Breakfast at the Frat).]
For accuracy evaluation, RMSE, R², MAPE, and WAPE are the performance metrics used. Each metric evaluates accuracy in a different way and aids in assessing performance from a different perspective, so we present all four. The formulas for RMSE, MAPE, and WAPE are defined in the previous section, where $y_t$ represents actual sales at time t, $\hat{y}_t$ represents the forecast at time t, $\bar{y}$ represents the mean of the observed data, and n represents the number of observations. With the same notation, R² is defined as:

$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{n}\left(y_t - \bar{y}\right)^2}$$
RMSE combines the magnitudes of forecast errors into a single measure of model accuracy, allowing comparison of residuals from various models on the same dataset. The lower the RMSE, the better the model. However, outliers affect the RMSE because each prediction error's impact on the final RMSE computation is proportional to the magnitude of the error.
The MAPE measure makes relative error simple to interpret and is a very commonly used metric for forecast accuracy; the M stands for mean (or average), and the metric is simply the average of the calculated APE values across different periods. Since MAPE is a measure of error, high values are bad and low values are good. If the observed dataset contains "zero sales" days, which is the situation in the Brazilian e-commerce dataset, an issue with the MAPE metric arises, since this would entail division by zero in the metric computation. As a result, +1e-6 is added to the denominator of the equation to account for this constraint.
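A minimal sketch of how these metrics can be computed with NumPy, including the +1e-6 guard in the MAPE denominator described above; the sample arrays are placeholders, not thesis results.

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-6):
    # eps guards against division by zero on "zero sales" days,
    # matching the +1e-6 adjustment described in the text.
    return np.mean(np.abs((y_true - y_pred) / (y_true + eps))) * 100

def wape(y_true, y_pred):
    # Absolute errors weighted by total sales (MAD/Mean ratio).
    return np.sum(np.abs(y_true - y_pred)) / np.sum(y_true)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([10.0, 0.0, 8.0, 12.0])
y_pred = np.array([9.0, 1.0, 7.5, 13.0])
print(rmse(y_true, y_pred), mape(y_true, y_pred),
      wape(y_true, y_pred), r_squared(y_true, y_pred))
```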