DOCUMENT INFORMATION
Number of pages: 82
File size: 1.61 MB
Major: Industrial Engineering (Kỹ thuật Công nghiệp)
Major code: 8.52.01.17
Ho Chi Minh City, August 2021

This thesis was completed at Ho Chi Minh City University of Technology (Bách Khoa), VNU-HCM.
Scientific supervisor: Assoc. Prof. Dr. ...
Examiner 1: ...
Examiner 2: ...
The master's thesis was defended at Ho Chi Minh City University of Technology (Bách Khoa), VNU-HCM, on the 15th of ... 202...
The master's thesis evaluation committee consisted of: ...
Confirmation by the Chair of the evaluation committee and the Head of the faculty managing the major after the thesis has been revised (if any).

MASTER'S THESIS ASSIGNMENT
Student ID (MSHV): 1970237
Date of birth: 04/09/1996   ... Bình ...
Major: ...   Major code: 8520117
I. THESIS TITLE: Phân ...
II. TASKS AND CONTENT: ... forecasting ...
III. DATE OF ASSIGNMENT: 22/02/2021
IV. DATE OF COMPLETION: 13/6/2021
V. SUPERVISOR: Assoc. Prof. ...
Ho Chi Minh City, ... 2021

ACKNOWLEDGEMENTS
...
Ho Chi Minh City, 31 ... 2021

THESIS SUMMARY (VIETNAMESE)
... -Exponential ...

ABSTRACT
This study presents an enhancement of the forecasting process used at a jewelry retailer. The TSCZ group in the Ho Chi Minh area accounts for the highest proportion, so TSCZ is chosen for this study. Based on the basic characteristics of the historical time series, appropriate theoretical forecast models are used. The SARIMA and Holt-Winters exponential smoothing techniques are compared in order to provide high-accuracy customer transaction forecasts. The models are ranked by forecast accuracy and forecast bias to find the best forecast model for the case study. After applying the solution, forecast accuracy increased by 10%. The results of this study could be applied to other groups with some necessary modifications. The findings would support more accurate financial planning and budgeting as the demand forecast improves.

DECLARATION
...
Ho Chi Minh City, 31 ... 2021

TABLE OF CONTENTS
Acknowledgements
Thesis summary (Vietnamese)
Abstract
Declaration
Table of contents
List of tables
...
Chapter 1: 1.1, 1.2, 1.3, 1.4
Chapter 2:
2.1 ...
2.1.1 ...
2.1.2 Phân ...
2.1.3 ... (Autoregressive, AR)
2.1.4 ... (Moving Average, MA)
2.1.5 ARMA model (Autoregressive Moving Average)
2.1.6 ...
2.1.7 ARIMA model (Autoregressive Integrated Moving Average)
2.1.8 ...
2.1.9 ...
2.1.10 Forecast bias (FB)
2.1.11 ...
2.1.12 ...
2.2 ... (2.2.1 to 2.2.5)
Chapter 3:
3.1 ... (3.1.1 to 3.1.6)
Chapter 4:
4.1 ...
4.1.1 ...
4.1.2 Selecting the solution model
4.2 ... (4.2.1 to 4.2.8)
4.3 ...
4.4 ... other product groups
5.1 ...
5.2 ...

LIST OF TABLES
Table 2.1 ...
Table 3.1 ...
Table 4.1 ...
Table 4.2 ...
Table 4.3 ... -11/2018
Table 4.4 ...
Table 4.5 ...
Table 4.6 ...
Table 4.8 ... Winter
Table 4.9 ...
Table 5.1 ...

# Imports needed by the code in this appendix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from sklearn.metrics import mean_squared_error
# Note: df (the monthly data with the quantity column 'SL') and the helper
# test_stationarity() are used below but are not defined in this excerpt.

# First we penalize higher values by taking log
df_log = np.log(df['SL'])
df_log.plot(figsize=(15, 6))
plt.show()

# Take 1st differencing on transformed time series
df_log_diff = df_log - df_log.shift(1)  # y_t - y_(t-1)
df_log_diff.dropna(inplace=True)  # drop NA values
df_log_diff.plot(figsize=(15, 6))
test_stationarity(df_log_diff.dropna())

plt.figure(figsize=(15, 9))
plt.subplot(211)
plot_acf(df_log_diff, ax=plt.gca(), lags=30)
plt.subplot(212)
plot_pacf(df_log_diff, ax=plt.gca(), lags=30)
plt.show()

df_log.sort_index(inplace=True)

# Using decomposition method to decompose time series
from pylab import rcParams
# The figure height was lost in extraction; 9 is an assumed value
rcParams['figure.figsize'] = 16, 9
decomposition = sm.tsa.seasonal_decompose(df_log, model='additive')
fig = decomposition.plot()
plt.show()

ARIMA FORECASTING

Auto ARIMA

# Divide into train and validation set
train = df_log_diff[:int(0.75 * len(df_log_diff))]
valid = df_log_diff[int(0.75 * len(df_log_diff)):]
train.plot()
valid.plot()

# Build ARIMA model using Auto ARIMA
from pmdarima import auto_arima
model = auto_arima(train, start_p=0, d=1, start_q=0, max_p=10, max_d=5, max_q=10,
                   start_P=0, D=1, start_Q=0, max_P=10, max_D=5, max_Q=10, m=12,
                   seasonal=True, trace=True, error_action='ignore',
                   suppress_warnings=True)
model.fit(train)
forecast = model.predict(n_periods=len(valid))
# np.asarray keeps the values aligned by position whether predict() returns an array or a Series
forecast = pd.DataFrame(np.asarray(forecast), index=valid.index, columns=['Prediction'])

# Plot the predictions for validation set
plt.plot(df_log_diff, label='Train')
plt.plot(forecast, label='Prediction')
plt.title('RMSE: %.4f' % np.sqrt(mean_squared_error(
    df_log_diff[valid.index.min():valid.index.max()], forecast)))
plt.show()

# Build SARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
my_order = (0, 1, 2)
my_seasonal_order = (2, 1, 0, 12)
sarima_model_auto = SARIMAX(df_log_diff, order=my_order, seasonal_order=my_seasonal_order)
# Fit the model defined just above (the original called fit() on sarima_model, which is only defined later)
sarima_model_fit_auto = sarima_model_auto.fit(disp=-1)
plt.plot(df_log_diff)
plt.plot(sarima_model_fit_auto.fittedvalues, color='red')
plt.title('RSS: %.4f' % np.nansum((sarima_model_fit_auto.fittedvalues - df_log_diff)**2))

# Forecast sales for the next months after choosing the SARIMA model
result_sarima_pred = sarima_model_fit_auto.predict(start=train.index[1], end=pd.Timestamp('2019-03-01'))
result_sarima_diff = pd.Series(result_sarima_pred, copy=True)
result_sarima_diff

# Cumulative sum to reverse differencing
result_sarima_diff_cumsum = result_sarima_diff.cumsum()

# Add the 1st month value, which was removed while differencing
result_sarima_log = pd.Series(df_log.iloc[0], index=result_sarima_diff_cumsum.index)
result_sarima_log = result_sarima_log.add(result_sarima_diff_cumsum, fill_value=0)

# Take exponential to reverse the log transform
result_sarima_auto = np.exp(result_sarima_log)
result_sarima_auto.head()
result_sarima_final_auto = result_sarima_auto.to_frame()
result_sarima_final_auto.reset_index(level=0, inplace=True)
result_sarima_final_auto = result_sarima_final_auto.rename(columns={'index': 'Time', 0: 'Sarima'})
result_sarima_final_auto.tail()

# Compare with the original time series - SARIMA
plt.plot(df)
plt.plot(result_sarima_auto)
plt.title('RMSE: %.4f' % np.sqrt(mean_squared_error(
    df[valid.index.min():valid.index.max()],
    result_sarima_auto[valid.index.min():valid.index.max()])))

Forecasting Time Series

Autoregression - AR(p): models the next step in the sequence as a linear function of the observations at prior time steps
Moving Average - MA(q)
Autoregressive Moving Average - ARMA(p,q)
Autoregressive Integrated Moving Average - ARIMA(p,d,q)

# The ARMA class and fit(disp=...) belong to the legacy statsmodels.tsa.arima_model API,
# which is no longer available in recent statsmodels releases. The import below is an
# assumption: the original import is not shown in this excerpt.
from statsmodels.tsa.arima_model import ARIMA, ARMA

# Build AR(p) model
ar_model = ARIMA(df_log_diff, order=(2, 1, 0))
ar_model_fit = ar_model.fit()
ar_model_fit.summary()
plt.plot(df_log_diff)
plt.plot(ar_model_fit.fittedvalues, color='red')
plt.title('RSS: %.4f' % np.nansum((ar_model_fit.fittedvalues - df_log_diff)**2))
plt.show()

# Build MA model
ma_model = ARIMA(df_log_diff, order=(0, 1, 2))
ma_model_fit = ma_model.fit(disp=False)
plt.plot(df_log_diff)
plt.plot(ma_model_fit.fittedvalues, color='red')
plt.title('RSS: %.4f' % np.nansum((ma_model_fit.fittedvalues - df_log_diff)**2))

# Build ARMA model
arma_model = ARMA(df_log_diff, order=(2, 2))
arma_model_fit = arma_model.fit(disp=False)
plt.plot(df_log_diff)
plt.plot(arma_model_fit.fittedvalues, color='red')
plt.title('RSS: %.4f' % np.nansum((arma_model_fit.fittedvalues - df_log_diff)**2))

# Build ARIMA model
arima_model = ARIMA(df_log_diff, order=(2, 1, 2))
arima_model_fit = arima_model.fit()  # was commented out, but the fitted model is used below
plt.plot(df_log_diff)
plt.plot(arima_model_fit.fittedvalues, color='red')
plt.title('RSS: %.4f' % np.nansum((arima_model_fit.fittedvalues - df_log_diff)**2))

# Build SARIMA
from statsmodels.tsa.statespace.sarimax import SARIMAX
my_order = (2, 1, 2)
my_seasonal_order = (0, 1, 0, 12)
sarima_model = SARIMAX(df_log_diff, order=my_order, seasonal_order=my_seasonal_order)
sarima_model_fit = sarima_model.fit(disp=-1)
plt.plot(df_log_diff)
plt.plot(sarima_model_fit.fittedvalues, color='red')
plt.title('RSS: %.4f' % np.nansum((sarima_model_fit.fittedvalues - df_log_diff)**2))

Compare prediction models based on types of error

# Calculate error
def model_eval(y, predictions):
    # Import library for metrics
    from sklearn.metrics import mean_squared_error, mean_absolute_error
    # Mean absolute error (MAE)
    mae = mean_absolute_error(y, predictions)
    # Mean squared error (MSE)
    mse = mean_squared_error(y, predictions)
    # Mean absolute percentage error (MAPE), in percent
    MAPE = np.mean(np.abs((y - predictions) / y)) * 100
    # Root mean squared error (RMSE)
    RMSE = np.sqrt(mse)
    results = pd.DataFrame({
        "MAE": [round(mae, 3)],
        "MSE": [round(mse, 3)],
        "MAPE": [round(MAPE, 3)],
        "RMSE": [round(RMSE, 3)]})
    print(results)
    # Return the table so the assignments below hold the metrics instead of None
    return results
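The SARIMA sections of this appendix fit the model on the log-differenced series and later undo both transforms with a cumulative sum and an exponential. That round trip can be checked on a toy series. The sketch below is only an illustration: the dates and values are made up and are not the thesis data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up monthly values standing in for the thesis column df['SL']
idx = pd.date_range("2018-01-01", periods=6, freq="MS")
sales = pd.Series([120.0, 135.0, 128.0, 150.0, 161.0, 149.0], index=idx)

# Forward transform: log, then first difference (the first month is lost)
log_sales = np.log(sales)
log_diff = log_sales.diff().dropna()

# Reverse transform: cumulative sum of the differences, plus the first
# log value that differencing removed, then the exponential
restored_log = pd.Series(log_sales.iloc[0], index=log_diff.index).add(log_diff.cumsum())
restored = np.exp(restored_log)

# The restored values should match the original series from month 2 onward
max_abs_error = float(np.abs(restored - sales.iloc[1:]).max())
print(max_abs_error)
```

The same inversion applies to model output: when predicted differences replace `log_diff`, any error in an early month is carried forward by the cumulative sum, which is worth keeping in mind when reading the back-transformed RMSE.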
# The first month is dropped for AR, MA and ARIMA because, with d=1, their fitted values start one step later
ar = model_eval(df_log_diff.drop(pd.Timestamp('2012-03-01')), ar_model_fit.fittedvalues)
ma = model_eval(df_log_diff.drop(pd.Timestamp('2012-03-01')), ma_model_fit.fittedvalues)
arma = model_eval(df_log_diff, arma_model_fit.fittedvalues)
arima = model_eval(df_log_diff.drop(pd.Timestamp('2012-03-01')), arima_model_fit.fittedvalues)
sarima = model_eval(df_log_diff, sarima_model_fit.fittedvalues)
# hw = model_eval(df_log, hw_model_fit.fittedvalues)

Convert predicted values to original scale for SARIMA

# Forecast sales for the next months after choosing the SARIMA model
result_sarima_pred = sarima_model_fit.predict(start=train.index[1], end=pd.Timestamp('2019-03-01'))
result_sarima_diff = pd.Series(result_sarima_pred, copy=True)
result_sarima_diff

# Cumulative sum to reverse differencing
result_sarima_diff_cumsum = result_sarima_diff.cumsum()

# Add the 1st month value, which was removed while differencing
result_sarima_log = pd.Series(df_log.iloc[0], index=result_sarima_diff_cumsum.index)
result_sarima_log = result_sarima_log.add(result_sarima_diff_cumsum, fill_value=0)
result_sarima_log.tail()

# Take exponential to reverse the log transform
result_sarima = np.exp(result_sarima_log)
result_sarima.tail()

# Compare with the original time series
plt.plot(df['SL'])
plt.plot(result_sarima)
plt.title('RMSE: %.4f' % np.sqrt(mean_squared_error(
    df[valid.index.min():valid.index.max()],
    result_sarima[valid.index.min():valid.index.max()])))

HOLT-WINTERS FORECASTING

# Divide into train and validation set
train1 = df[:int(0.75 * len(df['SL']))]
valid1 = df[int(0.75 * len(df['SL'])):]
train1['SL'].plot()
valid1['SL'].plot()

from statsmodels.tsa.holtwinters import ExponentialSmoothing
hw_model01 = ExponentialSmoothing(train1, seasonal='mul', seasonal_periods=12).fit()
hw_pred = hw_model01.predict(start=valid1.index[0], end=valid1.index[-1])
plt.plot(train1.index, train1, label='Train')
plt.plot(valid1.index, valid1, label='Valid')
plt.plot(hw_pred.index, hw_pred, label='Holt-Winters')
plt.legend(loc='best')
plt.title('RMSE: %.4f' % np.sqrt(mean_squared_error(
    df[valid1.index.min():valid1.index.max()], hw_pred)))
predictions_hw = pd.Series(hw_pred, copy=True)
predictions_hw.tail()  # Holt-Winters forecasts (the model is fitted on the untransformed series)

Convert predicted values to original scale for Winters

# Predict the next months after choosing the Winters model
result_hw_pred = hw_model01.predict(start=train.index[0], end=pd.Timestamp('2019-03-01'))
predictions_result_hw = pd.Series(result_hw_pred, copy=True)
predictions_result_hw.head()
result_hw_final = predictions_result_hw.to_frame()
result_hw_final.reset_index(level=0, inplace=True)
result_hw_final = result_hw_final.rename(columns={'index': 'Time', 0: 'HW'})
result_hw_final = result_hw_final.drop(index=0)
result_hw_final.head()

CONSOLIDATE THE RESULTS AND EXPORT TO AN EXCEL FILE

df_final = pd.merge(pd.merge(df, result_hw_final, on=['Time'], how="right"),
                    result_sarima_final_auto, on=['Time'])
df_final
df_final['HW'] = round(df_final['HW'], 0)
df_final['Sarima'] = round(df_final['Sarima'], 0)
df_final
from google.colab import files
df_final.to_excel('df_final.xlsx')
files.download('df_final.xlsx')

Appendix B: Software - Google Colab user guide

Setup
Users need a Google e-mail account to use Google Drive. Install the Google Colab app and create a new notebook:
+ Go to Google Drive.
+ New > More ...
+ Type ... into the search box and select ... > In the window that appears, click Install.
+ Create a new notebook: Drive > New > More > Google Colaboratory > A new tab opens with the newly created notebook.

Using Google Colab
- File operations:
+ Name the file: File > Rename notebook, or click directly on the notebook's name.
+ Save: File > Save, or Ctrl + S.
+ Make a copy: File > Save a copy in Drive, or create the copy directly in Google Drive.
+ Delete the notebook: File > Move to trash, or delete it directly in Google Drive.
+ Download: File > Download .py / Download .ipynb. With the .py format, the downloaded file contains ... the code as one continuous script ... With the .ipynb format, the downloaded file keeps the notebook as it is in Google Colab, but a suitable application is needed to open it; otherwise, download the file in .py format.
+ Share view, comment, and edit permissions: click Share and choose the sharing permissions.
- Cell operations:
+ Add a code or text cell: Insert > Code cell/Text cell, or hover the mouse over the edge of a cell and choose the type of cell to add.
+ Select cells: hold Ctrl and click one or more cells.
+ Move a cell: select the cell > Ctrl + X, or right-click the cell > Cut cell > click the destination > Ctrl + V.
+ Copy a cell: select the cell > Ctrl + C, or right-click the cell > Copy cell > click where the copy should go > Ctrl + V.
+ ...: right-click the cell > ..., or click ... at the end ...
+ Run code cells: open the Runtime menu and choose the run command needed, or use the keyboard shortcuts:
Current cell (Run the focused cell): Ctrl + Enter, or press the run button of the cell
All cells (Run all): Ctrl + F9
...
All following cells (Run after): Ctrl + F10
Selected cells (Run selection): Ctrl + Shift + Enter
+ Stop a run: ... choose the command ..., or press the button ... to cancel the run.
+ Read the results: when a code cell runs, the app shows an output box ... When the code is wrong or fails, read the cell's output to find the line with the error, then correct the code.
- Upload files from other sources to the notebook:
+ Click ...
+ Click ... below the app logo to view the notebook outline ... open the file pane and upload the source files.
Users can upload files from their computer or from Google Drive. Uploaded files are used temporarily and are deleted after a period of time, as set out in the Google Colab terms of use. Other questions users may run into are covered under Help > FAQ.

Date of birth: 04/09/1996
... Ho Chi Minh ...
Email: ntduong49@gmail.com
... Bách Khoa ...
IV. WORK EXPERIENCE
Position: Sourcing specialist (Chuyên viên mua hàng)
...
... carried out ... jewelry ... at the supply department of a ... retail company ... retail stores nationwide.
- Data scope: data collected ...
- Produce forecast figures for ... months with a forecast lead time of ... months.
- ... accuracy of ...
Phase 2: Survey the forecasting models.
Phase 3: Transform the data and validate the models.
After selecting a forecasting model with a suitable set of parameters, run the model to forecast the following ... months and compare with historical data ... forecast accuracy (FA) ...
... re-survey all forecasting models for the new data set.
- Select ...
- Can ...
- Forecast with a lead time of ... months.
- The most suitable forecast result ... product group ... region ...
4.1.2 Selecting the solution model [15]
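The abstract and the phases above rank candidate models by forecast accuracy (FA) and forecast bias. The thesis's own FA and FB formulas are not reproduced in this excerpt, so the sketch below assumes one common pair of definitions: FA = 1 - sum|A - F| / sum(A) and bias = sum(F - A) / sum(A), where A is the actual value and F the forecast. The function names, model labels, and numbers are hypothetical and serve only to illustrate the ranking step.

```python
import numpy as np

def forecast_accuracy(actual, forecast):
    """Assumed definition: 1 minus total absolute error over total actual demand."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 1.0 - np.abs(actual - forecast).sum() / actual.sum()

def forecast_bias(actual, forecast):
    """Assumed definition: total signed error over total actual demand.
    Positive means over-forecasting, negative means under-forecasting."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return (forecast - actual).sum() / actual.sum()

# Made-up validation data and two hypothetical candidate forecasts
actual = [100, 120, 80, 100]
candidates = {
    "model_a": [110, 115, 85, 95],
    "model_b": [90, 100, 70, 90],
}

# Score each candidate as (accuracy, bias)
scores = {
    name: (forecast_accuracy(actual, f), forecast_bias(actual, f))
    for name, f in candidates.items()
}

# Rank by highest accuracy first, then by smallest absolute bias
ranking = sorted(scores, key=lambda n: (-scores[n][0], abs(scores[n][1])))
best = ranking[0]

for name in ranking:
    fa, fb = scores[name]
    print(f"{name}: FA={fa:.4f}, bias={fb:+.4f}")
```

Using accuracy as the primary key and absolute bias as the tie-breaker is one possible design choice; a study that cares more about systematic over- or under-forecasting could weight bias more heavily.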