Summary of Mathematics Doctoral Dissertation: Apply Markov chains model and fuzzy time series for forecasting


This dissertation focuses on two main issues. The first is modeling a time series by states, in which each state is a deterministic probability distribution (the normal distribution), and assessing the suitability of the model on the basis of experimental results. The second is combining Markov chains and fuzzy time series into new models to improve forecast accuracy, and extending the model with higher-order Markov chains to make it compatible with seasonal data.

MINISTRY OF EDUCATION AND TRAINING — VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY

DAO XUAN KY

APPLY MARKOV CHAINS MODEL AND FUZZY TIME SERIES FOR FORECASTING

Major: Math Fundamentals for Informatics. Code: 62.46.01.10

SUMMARY OF MATHEMATICS DOCTORAL DISSERTATION — Ha Noi, 2017

This work was completed at the Graduate University of Science and Technology, Vietnam Academy of Science and Technology.
Supervisor 1: Assoc. Prof. Dr. Doan Van Ban. Supervisor 2: Dr. Nguyen Van Hung.
Reviewer 1: ……… Reviewer 2: ……… Reviewer 3: ………

This dissertation will be officially presented before the Doctoral Dissertation Grading Committee, meeting at the Graduate University of Science and Technology, Vietnam Academy of Science and Technology, at ……… hrs, on day …… month …… year ……

This dissertation is available at the Library of the Graduate University of Science and Technology and at the National Library of Vietnam.

LIST OF PUBLISHED WORKS

[1] Dao Xuan Ky and Luc Tri Tuyen. A Markov-fuzzy combination model for stock market forecasting. International Journal of Applied Mathematics and Statistics, 55(3):109-121, 2016.
[2] Đào Xuân Kỳ, Lục Trí Tuyến, and Phạm Quốc Vương. A combination of higher order Markov model and fuzzy time series for stock market forecasting. In Hội thảo lần thứ 19: Một số vấn đề chọn lọc Công nghệ thông tin truyền thông, Hà Nội, pages 1-6, 2016.
[3] Đào Xuân Kỳ, Lục Trí Tuyến, Phạm Quốc Vương, and Thạch Thị Ninh. Mô hình Markov-chuỗi thời gian mờ dự báo chứng khoán. In Hội thảo lần thứ 18: Một số vấn đề chọn lọc Công nghệ thông tin truyền thông, TP. HCM, pages 119-124, 2015.
[4] Lục Trí Tuyến, Nguyễn Văn Hùng, Thạch Thị Ninh, Phạm Quốc Vương, Nguyễn Minh Đức, and Đào Xuân Kỳ. A normal-hidden Markov model in forecasting stock index. Journal of Computer Science and Cybernetics, 28(3):206-216, 2012.
[5] Dao Xuan Ky and Luc Tri Tuyen. A higher order Markov model for time series forecasting. International Journal of Applied Mathematics and Statistics, 57(3):1-18, 2018.

Introduction

Forecasting a time series, whose predicted variable X changes over time, with good accuracy has always been a challenge for scientists, in Vietnam as well as globally, because it is not easy to find a suitable probability distribution for the predicted variable at each time t. Historical data must be collected and analyzed in order to find a good fit; however, a distribution can fit the data only over a particular period of a time series and varies at other points in time, so using one fixed distribution for the predicted object is not appropriate. For this reason, building a forecasting model requires connecting historical data with the future, that is, setting up a model of dependence between the data observed at the present time t and in the past t−1, t−2, ... If the relation

X_t = φ_1 X_{t−1} + φ_2 X_{t−2} + ⋯ + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ⋯ + θ_q ε_{t−q}

is set up, we obtain the autoregressive integrated moving average (ARIMA) model [15]. This model is applied widely thanks to its well-developed theory and is integrated into almost all current statistical software, such as Eviews, SPSS, Matlab, and R. However, many real time series do not change linearly, so models such as ARIMA do not fit them.
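As a concrete illustration — not taken from the dissertation's appendix — the following minimal sketch shows how such an ARIMA benchmark is fitted in R, the language used later in the dissertation; the example series and the order (1, 1, 1) are arbitrary assumptions.

```r
# A hedged sketch: fit an ARIMA(p,d,q) benchmark with R's built-in stats package.
x <- as.numeric(EuStockMarkets[, "DAX"])   # example series shipped with base R
fit <- arima(x, order = c(1, 1, 1))        # X_t regressed on its own past and past errors
pred <- predict(fit, n.ahead = 5)$pred     # five-step-ahead linear forecasts
```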
R. Parrelli pointed out in [28] that there are nonlinear relations in the variance of economic and financial time series. The generalized autoregressive conditional heteroskedasticity (GARCH) model [25,28] is the most popular nonlinear time series forecasting model to mention. Its limitation lies in the assumption that the data follow a fixed distribution (normally the standard normal distribution), while actual data often show statistically significant skewness [39] (whereas the normal distribution is symmetric). Another time series forecasting approach is the Artificial Neural Network (ANN), developed more recently. ANN models are not based on a deterministic distribution of the data; instead, they function like a human brain, trying to find rules and paths through training data, experimental testing, and result summarizing. ANN models are usually used for classification [23]. More recently, a statistical machine learning theory called the Support Vector Machine (SVM), serving both forecasting and classification, has caught the attention of scientists [36,11,31]. SVM is applied widely in areas such as function approximation, regression analysis, and forecasting [11,31]. Its biggest limitation is that, with huge training sets, it requires enormous computation as well as the complexity of solving the underlying regression optimization. To address the limitations and promote the strengths of existing models, a new research trend called combined analysis, i.e., a combination of different methods to increase forecast accuracy, was introduced. Numerous studies have been conducted along this line, and many combined models have been published [43,5,6]. Some methods use the Markov chain (MC) as well as the hidden Markov model (HMM). Rafiul Hassan [19] developed a united model by matching an HMM with an ANN and a GA to forecast next-day stock prices; this model aims to identify patterns in the historical data similar to the current one, and the ANN and GA are then used to interpolate the neighboring values of the identified data patterns. Yang [41] combined the HMM with a synchronous clustering technique to increase the accuracy of the forecasting model. A weighted Markov model was used by Peng [27] to predict and analyze the disease transmission rate in Jiangsu, China. These combined models proved to bring practical and meaningful results and to increase prediction accuracy compared with traditional ones [27,41,19]. The above models, despite significant improvements in accuracy, still face difficulties with fuzzy data (data containing uncertain elements). To deal with fuzzy data, a new research direction called fuzzy time series (FTS) was introduced recently. The first results of this theory worth mentioning are those of Song and Chissom [34]. Subsequent studies focused on improving the fuzzy time series model and its forecasting procedures. Jilani and Nan combined a heuristic model with the fuzzy time series model to improve accuracy [24]. Chen and Hwang extended the fuzzy time series model to a two-factor model [14], and Hwang and Yu then developed an n-order model to forecast stock indexes [21]. In a recent paper [35], BaiQuing Sun extended the fuzzy time series model to multiple orders to forecast future stock prices. Qisen Cai [10] combined the fuzzy time series model with ant colony optimization and regression to obtain better results. In Vietnam, the fuzzy time series model has recently been
applied in a number of specific areas, including the study of Nguyen Duy Hieu and colleagues [2] in semantic analysis. Additionally, Nguyen Cong Dieu [3,4] combined the fuzzy time series model with techniques for adjusting model parameters or exploiting specific characteristics of the data, aiming at better forecast accuracy. Nguyen Cat Ho [1] used hedge algebras in the fuzzy time series model, which showed higher forecast accuracy compared with several existing models. Up to now, despite many new models combining existing ones with the aim of improving forecast accuracy, these models tend to be complex while accuracy does not improve accordingly; there is therefore room for another direction that simplifies the model while preserving forecast accuracy.

The objective of this dissertation focuses on two key issues. Firstly, to model a time series by states, in which each state is a deterministic probability distribution (the normal distribution), and to evaluate the suitability of the model based on experimental results. Secondly, to combine Markov chains and fuzzy time series into new models to improve forecast accuracy, and, in addition, to extend the higher-order Markov chain model to accommodate seasonal data.

The dissertation consists of three chapters. Chapter 1 presents an overview of Markov chains, hidden Markov models, and fuzzy time series. Chapter 2 presents the modelling of a time series into states in which 1) each state is a normal distribution with mean μ_i and variance σ_i², i = 1, 2, ..., m, where m is the number of states; and 2) the states over time follow a Markov chain. The model is then tested on the VN-Index to evaluate its forecasting performance, and the chapter closes with an analysis of the limitations and mismatches between forecasting models and fixed probability distributions, as the motivation for the combined model proposed in Chapter 3. Chapter 3 presents combined Markov chain and fuzzy time series models for time series forecasting; it also presents the extension to higher-order Markov chains with two variants, the conventional higher-order Markov chain (CMC) and the improved higher-order Markov chain (IMC). These models were programmed in the R language and tested on data sets corresponding exactly to those of the models compared against.

Chapter 1 - Overview & Proposal

1.1 Markov chain

1.1.1 Definitions

Consider an economic or physical system S with m possible states, labelled by the set I = {1, 2, ..., m}. The system S evolves randomly in discrete time (t = 0, 1, 2, ..., n, ...); let C_n be the random variable corresponding to the state of the system S at time n (C_n ∈ I).

Definition 1.1.1. A sequence of random variables (C_n, n ≥ 0) is a Markov chain if and only if, for all c_0, c_1, ..., c_n ∈ I:

Pr(C_n = c_n | C_0 = c_0, C_1 = c_1, ..., C_{n−1} = c_{n−1}) = Pr(C_n = c_n | C_{n−1} = c_{n−1})   (1.1.1)

(whenever these conditional probabilities make sense).

Definition 1.1.2. A Markov chain is called homogeneous if and only if the probability in (1.1.1) does not depend on n, and non-homogeneous otherwise.

For the time being, we consider the homogeneous case, in which Pr(C_n = j | C_{n−1} = i) = γ_ij, and the matrix Γ is defined by Γ = (γ_ij). To define fully the evolution of a Markov chain, it is also necessary to fix an initial distribution for the state C_0, for example a vector p = (p_1, p_2, ..., p_m). In this chapter we restrict attention to homogeneous Markov chains, each characterized by the pair (p, Γ).

Definition 1.2.3. A Markov matrix Γ is called regular if there exists a positive integer k such that all elements of Γ^k are strictly positive.
1.1.2 Markov chain classification

Take i ∈ I and let d(i) be the greatest common divisor of the set of integers n such that γ_ii^(n) > 0.

Definition 1.2.4. If d(i) > 1, the state i is called periodic with period d(i). If d(i) = 1, the state i is aperiodic. It is easy to see that if γ_ii > 0 then i is aperiodic; the converse, however, is not always true.

Definition 1.2.5. A Markov chain all of whose states are aperiodic is called an aperiodic Markov chain.

Definition 1.2.6. A state i is said to reach state j (written i → j) if there exists an integer n such that γ_ij^(n) > 0; i ↛ j means that i cannot reach j.

Definition 1.2.7. States i and j are said to communicate (written i ↔ j) if i → j and j → i, or if i = j.

Definition 1.2.8. A state i is called essential if it communicates with every state that it reaches; otherwise it is called inessential. The relation ↔ is an equivalence relation on the state space I and therefore partitions I into classes; the equivalence class containing i is denoted Cl(i).

Definition 1.2.9. A Markov chain is called irreducible if there is only one equivalence class on it.

Definition 1.2.10. A subset E of the state space I is called closed if Σ_{j∈E} γ_ij = 1 for all i ∈ E.

Definition 1.2.11. A state i ∈ I of the Markov chain (C_t) is called recurrent if there exist a state j ∈ I and n ≥ 0 such that γ_ji^(n) > 0; otherwise, i is called a transient (moving) state.

1.1.3 Markov matrix estimation

Consider a Markov chain (C_t), t = 1, 2, ..., and suppose we observe n states c_1, c_2, ..., c_n generated by the random variables C^(n) = (C_1, ..., C_n). The likelihood of the transition probability matrix is given by:

Pr(C^(n) = c^(n)) = Pr(C_1 = c_1) ∏_{t=2}^{n} Pr(C_t = c_t | C_{t−1} = c_{t−1}) = Pr(C_1 = c_1) ∏_{t=2}^{n} γ_{c_{t−1} c_t}.

Define the transition counts n_ij as the number of times state i is immediately followed by state j in the chain c^(n); the likelihood then becomes

L(p) = Pr(C_1 = c_1) ∏_{i=1}^{m} ∏_{j=1}^{m} γ_ij^{n_ij},

where the unknowns are the γ_ij. To maximize this likelihood, we first take the logarithm of L(p), turning the product into a sum that is easy to differentiate:

ℓ(p) = log L(p) = log Pr(C_1 = c_1) + Σ_{i,j} n_ij log γ_ij.

Since Σ_j γ_ij = 1 for each i, we have γ_i1 = 1 − Σ_{j≥2} γ_ij; differentiating with respect to the parameter γ_ij gives

∂ℓ/∂γ_ij = n_ij/γ_ij − n_i1/γ_i1.

Setting the derivative to zero at γ̂_ij yields n_ij/γ̂_ij = n_i1/γ̂_i1, hence n_ij/γ̂_ij is the same for all j, and therefore

γ̂_ij = n_ij / Σ_{j=1}^{m} n_ij.
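Since this estimator is a closed form in the transition counts, it takes only a few lines of R; the sketch below is illustrative (not the appendix code), assuming an integer-encoded state sequence.

```r
# Minimal sketch of the transition-matrix MLE: gamma_hat[i, j] = n_ij / sum_j n_ij.
# 'cn' is an integer state sequence with values in 1..m; names are illustrative.
estimate_gamma <- function(cn, m) {
  n_ij <- matrix(0, m, m)
  for (t in 2:length(cn)) {
    n_ij[cn[t - 1], cn[t]] <- n_ij[cn[t - 1], cn[t]] + 1  # count i -> j transitions
  }
  rs <- rowSums(n_ij)
  rs[rs == 0] <- 1          # guard states never observed as a "from" state
  n_ij / rs                 # divide each row i by its total count
}
```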
1.2 Hidden Markov Model

An HMM includes two basic components: a chain of observations X_t, t = 1, ..., T, and hidden states C_t = i, t = 1, ..., T, i ∈ {1, 2, ..., m}, from which the observations are generated. In fact, the HMM is a special case of the dependent mixture model [16], with the C_t as the mixture components.

1.2.1 Definitions and notation

Let X^(t) and C^(t) denote the histories from time 1 up to time t. The simplest HMM can then be summarized as follows:

Pr(C_t | C^(t−1)) = Pr(C_t | C_{t−1}), t = 2, 3, ..., T,
Pr(X_t | X^(t−1), C^(t)) = Pr(X_t | C_t), t = 1, ..., T.

We now introduce some notation used throughout. In the case of discrete observations, define p_i(x) = Pr(X_t = x | C_t = i); in the continuous case, p_i(x) is the probability density of X_t given that the Markov chain is in state i at time t. We denote the transition matrix of the homogeneous Markov chain by Γ, with elements γ_ij = Pr(C_t = j | C_{t−1} = i). From now on, the m distributions p_i(x) are called the state-dependent distributions of the model.

1.2.2 Likelihood and maximum likelihood estimation

For a discrete observation X_t, define u_i(t) = Pr(C_t = i) for i = 1, 2, ..., m and t = 1, ..., T. We have:

Pr(X_t = x) = Σ_{i=1}^{m} Pr(C_t = i) Pr(X_t = x | C_t = i) = Σ_{i=1}^{m} u_i(t) p_i(x).   (1.2.1)

For convenience in calculation, formula (1.2.1) can be rewritten in matrix form:

Pr(X_t = x) = (u_1(t), ..., u_m(t)) diag(p_1(x), ..., p_m(x)) 1′ = u(t) P(x) 1′,

in which P(x) is the diagonal matrix whose i-th diagonal element is p_i(x) and 1′ is a column vector of ones. On the other hand, by the nature of the Markov chain, u(t) = u(1) Γ^{t−1}, where u(1) is the initial distribution of the Markov chain, usually taken to be the stationary distribution δ. Thus we have

Pr(X_t = x) = u(1) Γ^{t−1} P(x) 1′.   (1.2.2)

Now let L_T be the likelihood of the model given T observations x_1, x_2, ..., x_T; then L_T = Pr(X^(T) = x^(T)). Starting from the joint probability formula

Pr(X^(T), C^(T)) = Pr(C_1) ∏_{k=2}^{T} Pr(C_k | C_{k−1}) ∏_{k=1}^{T} Pr(X_k | C_k),   (1.2.3)

summing over all possible states of the C_k and using the same method as for (1.2.2), we obtain

L_T = δ P(x_1) Γ P(x_2) ⋯ Γ P(x_T) 1′.

If the initial distribution δ is the stationary distribution of the Markov chain, then L_T = δ Γ P(x_1) Γ P(x_2) ⋯ Γ P(x_T) 1′. To compute the likelihood efficiently, reducing the number of operations the computer needs to perform, we define vectors α_t for t = 1, ..., T:

α_t = δ P(x_1) Γ P(x_2) ⋯ Γ P(x_t) = δ P(x_1) ∏_{s=2}^{t} Γ P(x_s).   (1.2.4)

Then we have

L_T = α_T 1′,  α_t = α_{t−1} Γ P(x_t), t ≥ 2,   (1.2.5)

so L_T is easy to compute by a forward recursion. To find the parameter set maximizing L_T, two methods can be used.

Direct maximization (MLE): From equation (1.2.5), we compute log L_T in a numerically stable way via the scaled forward probabilities. For t = 0, 1, ..., T define the vector φ_t = α_t / w_t, where w_t = Σ_i α_t(i) = α_t 1′, and let B_t = Γ P(x_t). We have

w_0 = α_0 1′ = δ 1′ = 1, φ_0 = δ, w_t φ_t = w_{t−1} φ_{t−1} B_t, L_T = α_T 1′ = w_T (φ_T 1′) = w_T.

Then L_T = w_T = ∏_{t=1}^{T} (w_t / w_{t−1}), and since w_t = w_{t−1} (φ_{t−1} B_t 1′),

log L_T = Σ_{t=1}^{T} log(w_t / w_{t−1}) = Σ_{t=1}^{T} log(φ_{t−1} B_t 1′).

EM algorithm: This is the Baum-Welch algorithm [9] for a homogeneous Markov chain (not necessarily stationary). The algorithm uses forward probabilities (FWP) and backward probabilities (BWP) to compute L_T.
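To make the scaled recursion concrete, here is a minimal R sketch of log L_T for a normal-HMM; this is an illustrative implementation under assumed parameter names, not the appendix code.

```r
# Scaled forward recursion (1.2.5): log L_T = sum_t log(w_t / w_{t-1}).
hmm_loglik <- function(x, delta, Gamma, mu, sigma) {
  phi <- delta * dnorm(x[1], mu, sigma)           # alpha_1 = delta P(x_1)
  w <- sum(phi); llk <- log(w); phi <- phi / w    # scale to avoid underflow
  for (t in 2:length(x)) {
    v <- as.vector(phi %*% Gamma) * dnorm(x[t], mu, sigma)  # alpha_t = alpha_{t-1} Gamma P(x_t)
    w <- sum(v); llk <- llk + log(w); phi <- v / w
  }
  llk
}
```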
1.2.3 Forecast distribution

For discrete observations, the forecast distribution Pr(X_{T+h} = x | X^(T) = x^(T)) is a likelihood ratio, by conditional probability:

Pr(X_{T+h} = x | X^(T) = x^(T)) = Pr(X^(T) = x^(T), X_{T+h} = x) / Pr(X^(T) = x^(T))
 = δ P(x_1) B_2 ⋯ B_T Γ^h P(x) 1′ / (δ P(x_1) B_2 ⋯ B_T 1′) = α_T Γ^h P(x) 1′ / (α_T 1′).

Setting φ_T = α_T / (α_T 1′), we have Pr(X_{T+h} = x | X^(T) = x^(T)) = φ_T Γ^h P(x) 1′. The forecast distribution can thus be written as a mixture of the state-dependent distributions:

Pr(X_{T+h} = x | X^(T) = x^(T)) = Σ_{i=1}^{m} ξ_i(h) p_i(x),

where the weight ξ_i(h) is the i-th component of the vector φ_T Γ^h.

1.2.4 Viterbi algorithm

The objective of the Viterbi algorithm is to find the best state sequence i_1, i_2, ..., i_T corresponding to the observation sequence x_1, x_2, ..., x_T, i.e., the sequence maximizing the joint probability. Set ξ_{1i} = Pr(C_1 = i, X_1 = x_1) = δ_i p_i(x_1), and for t = 2, 3, ..., T,

ξ_{ti} = max_{c_1, ..., c_{t−1}} Pr(C^(t−1) = c^(t−1), C_t = i, X^(t) = x^(t)).

Then the probabilities ξ_{tj} satisfy the recursion, for t = 2, 3, ..., T and j = 1, 2, ..., m:

ξ_{tj} = ( max_{i} (ξ_{t−1,i} γ_ij) ) p_j(x_t).

The best state sequence i_1, ..., i_T is determined by backtracking from i_T = argmax_{i=1,...,m} ξ_{Ti}, and, for t = T−1, T−2, ..., 1, i_t = argmax_{i=1,...,m} (ξ_{ti} γ_{i, i_{t+1}}).

1.2.5 State forecasting

For state forecasting, we simply use the Bayes formula: for i = 1, 2, ..., m,

Pr(C_{T+h} = i | X^(T) = x^(T)) = α_T Γ^h(·, i) / L_T = φ_T Γ^h(·, i).

Note that, as h → ∞, Γ^h moves towards the stationary distribution of the Markov chain.

1.3 Fuzzy time series

1.3.1 Some concepts

Let U be the universe of discourse; this space determines a set of objects. If A is a crisp subset of U, then we can determine exactly a characteristic function χ_A(u), equal to 1 if u ∈ A and 0 otherwise.

Definition 1.3.1 [34]: Let U be the universe of discourse with U = {u_1, u_2, ..., u_n}. A fuzzy set A in U is defined as

A = f_A(u_1)/u_1 + f_A(u_2)/u_2 + ⋯ + f_A(u_n)/u_n,

where f_A is the membership function of the fuzzy set A, f_A: U → [0, 1], and f_A(u_i) is the degree of membership (the rank) of u_i in A.

Definition 1.3.2 [34]: Let Y(t) (t = 0, 1, 2, ...) be a time series whose values lie in the universe of discourse, a subset of the real numbers. Fuzzy sets f_i(t) (i = 0, 1, 2, ...) are defined on Y(t), and F(t) is the collection of the sets f_1(t), f_2(t), ...; then F(t) is called a fuzzy time series on Y(t).

Definition 1.3.3 [34]: Suppose F(t) is inferred only from F(t−1), denoted F(t−1) → F(t). This relationship can be expressed as F(t) = F(t−1) ∘ R(t, t−1), called the first-order model of F(t); R(t, t−1) is the fuzzy relationship between F(t−1) and F(t), and "∘" is the max-min composition operator.

Definition 1.3.4 [34]: Let R(t, t−1) be the first-order model of F(t). If, for any t, R(t, t−1) = R(t−1, t−2), then F(t) is said to be a time-invariant fuzzy time series; otherwise, F(t) is a time-variant fuzzy time series.

Chapter 2 fits HMMs by the EM algorithm; the final, convergence-checking steps of its listing are:

8: compute gamma[j, k]
9: compute mu[j]
10: compute sigma[j]
11: compute delta
12: crit ← sum(abs(mu[j] − mu0[j])) + sum(abs(gamma[j,k] − gamma0[j,k])) + sum(abs(delta[j] − delta0[j])) + sum(abs(sigma[j] − sigma0[j]))  {the convergence criterion}
13: if crit < tol then
14:   AIC ← −2(llk − np)  {the AIC criterion}
15:   BIC ← −2 llk + np log(n)  {the BIC criterion}
16:   return (mu, sigma, gamma, delta, mllk, AIC, BIC)
17: else {if not converged} mu0 ← mu; sigma0 ← sigma; gamma0 ← gamma; delta0 ← delta  {reassign the new starting parameters}
18:   if the maximum number of iterations maxiter is reached without convergence, stop; otherwise loop

2.2 Experimental results for the HMM with Poisson distribution

2.2.1 Parameter estimation

Table 2.2.1 Estimated parameters of the Poisson-HMM for the series time.b.to.t with m = 2, 3, 4, 5 states (means λ̂ and stationary distributions δ̂; the estimated transition matrices Γ̂ for each m are also reported in the dissertation)

m = 2: λ̂ = (11.46267, 40.90969); δ̂ = (0.6914086, 0.3085914)
m = 3: λ̂ = (5.78732, 21.75877, 57.17104); δ̂ = (0.3587816, 0.5121152, 0.1291032)
m = 4: λ̂ = (5.339722, 16.943339, 27.711948, 58.394102); δ̂ = (0.3189824, 0.3159413, 0.2301279, 0.1349484)
m = 5: λ̂ = (5.226109, 15.679316, 25.435562, 38.459987, 67.708874); δ̂ = (0.31513881, 0.28158191, 0.22224329, 0.10376304, 0.07727294)

Table 2.2.2 Mean, variance, and minus log-likelihood compared with the sample

m | Mean | Variance | −log L
2 | 20.45238 | 205.5624 | 216.8401
3 | 20.45238 | 272.6776 | 171.1243
4 | 20.45238 | 303.7112 | 159.898
5 | 20.45238 | 307.083 | 154.6275
Sample | 20.45238 | 303.4568 | —

The results show that the Poisson-HMM with 4 states has a variance close to the sample variance. However, there is not enough evidence to confirm that the 4-state model is the best.
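The AIC and BIC used in steps 14-15 of the listing are simple functions of the minimized minus log-likelihood; the small R sketch below reproduces the criteria (the sample size n = 42 is an assumption inferred from the reported values, not stated in the text).

```r
# AIC = -2(llk - np) = 2*mllk + 2*np ;  BIC = -2*llk + np*log(n) = 2*mllk + np*log(n).
model_criteria <- function(mllk, np, n) {
  c(AIC = 2 * mllk + 2 * np, BIC = 2 * mllk + np * log(n))
}
# For a Poisson-HMM with m states: np = m means + m(m-1) transition probabilities.
# e.g. m = 3, mllk = 171.1243, assumed n = 42:
model_criteria(171.1243, np = 3 + 3 * 2, n = 42)   # approx. (360.2486, 375.8876)
```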
In order to have better methods of selection, we need more detailed model-selection criteria.

2.2.2 Model selection

Suppose the observations x_1, ..., x_T were generated by an unknown "true" model f, and we model them by two different approximating families {g_1 ∈ G_1} and {g_2 ∈ G_2}. The purpose of model selection is to identify the best model in some respect. Applying the AIC and BIC criteria to the Poisson-HMM for the data set time.b.to.t gives the results listed in Table 2.2.3.

Table 2.2.3 AIC and BIC criteria

m | AIC | BIC
2 | 441.6803 | 448.6309
3 | 360.2486 | 375.8876
4 | 351.7961 | 379.5988
5 | 359.2551 | 402.6968

2.2.3 Forecast distribution

As mentioned above, the training data for the HMM were taken from January 2006 to 19 June 2013. We then take the data from 14/06/2013 to 22/08/2013 to compare against the forecasts of the model. Figure 2.2.1 shows the fluctuation of the VN-Index closing values during this period. As we can see, the number of sessions the VN-Index takes to move from the bottom (26/06/2013) to the peak (19/08/2013) is 35 days; this value corresponds to state 3 of the model (the Poisson distribution with mean 27.711948). We will wait to see the results of the forecast model.

Figure 2.2.1 VN-Index fluctuation from 14/06/2013 to 22/08/2013 and the waiting time from bottom to peak.

Now we need the formula for the forecast distribution Pr(X_{T+h} = x | X^(T) = x^(T)). With the matrix formulas of the previous sections, this distribution is computed as follows:

Pr(X_{T+h} = x | X^(T) = x^(T)) = Pr(X^(T) = x^(T), X_{T+h} = x) / Pr(X^(T) = x^(T))
 = δ P(x_1) Γ P(x_2) ⋯ Γ P(x_T) Γ^h P(x) 1′ / (δ P(x_1) Γ P(x_2) ⋯ Γ P(x_T) 1′)
 = α_T Γ^h P(x) 1′ / (α_T 1′).

Given φ_T = α_T / (α_T 1′), we have Pr(X_{T+h} = x | X^(T) = x^(T)) = φ_T Γ^h P(x) 1′. These distributions are summarized in Table 2.2.4.

Table 2.2.4 Forecast distribution information and intervals (forecast means and the probabilities of the estimated intervals with probability over 90%; the interval endpoints are given in the dissertation)

h | Forecast mean | Interval probability
1 | 42.30338 | 0.9371394
2 | 30.16801 | 0.9116366
3 | 25.53973 | 0.9342868
4 | 23.68432 | 0.9279009
5 | 22.48149 | 0.9237957
6 | 21.91300 | 0.9215904

The forecast modes for the first horizons are 27, 26, and then 5; the realized value is 35.

2.2.4 Forecast states

In the previous section we found the conditional distribution of the state C_t given the observations X^(T); in this way we only consider present and past states. However, it is also possible to compute the conditional distribution of a future state C_{T+h}. This is called state forecasting:

Pr(C_{T+h} = i | X^(T) = x^(T)) = α_T Γ^h(·, i) / L_T = φ_T Γ^h(·, i),

with φ_T = α_T / (α_T 1′). We perform state forecasting for the Poisson-HMM on the data time.b.to.t for six horizons, with the results shown in Table 2.2.5.

Table 2.2.5 Six-step state forecasts for time.b.to.t

h | State 1 | State 2 | State 3 | State 4
1 | 0.006577011 | 0.003744827 | 0.506712945 | 0.482965217
2 | 0.09686901 | 0.27624774 | 0.37858412 | 0.24829913
3 | 0.2316797 | 0.2658957 | 0.3104563 | 0.1919683
4 | 0.2688642 | 0.2931431 | 0.2698832 | 0.1681095
5 | 0.2934243 | 0.3048425 | 0.2508581 | 0.1508750
6 | 0.3060393 | 0.3098824 | 0.2407846 | 0.1432937

2.3 Experimental results of the HMM with normal distribution

2.3.1 Parameter estimation

With an arbitrary initial distribution (e.g., (1/4, 1/4, 1/4, 1/4)), the EM estimates are:

Γ̂ =
[ 0.9717 0.0283 0.0000 0.0000
  0.0927 0.8106 0.0804 0.0163
  0.0000 0.0748 0.8624 0.0628
  0.0000 0.0000 0.0818 0.9182 ]

μ̂ = (453.9839, 484.6801, 505.9007, 530.8300), σ̂ = (10.6857, 7.1523, 6.4218, 13.0746).

Figure 2.3.1 depicts the VN-Index values with the best state sequence found by the Viterbi algorithm; the dashed lines represent the four states, while the dark dots represent the best state for the value at each time.
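The state forecast φ_T Γ^h used in 2.2.4 above (and again in 2.3.4 below) is a single matrix-power product; a minimal R sketch with illustrative names:

```r
# h-step state forecast: Pr(C_{T+h} = i | X^(T)) = (phi_T %*% Gamma^h)[i].
state_forecast <- function(phiT, Gamma, h) {
  Gh <- diag(nrow(Gamma))
  for (s in seq_len(h)) Gh <- Gh %*% Gamma   # Gamma^h by repeated multiplication
  as.vector(phiT %*% Gh)
}
# As h grows, Gamma^h approaches the stationary distribution, which is why
# long-horizon state forecasts flatten out.
```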
Figure 2.3.1 VN-Index data: best state sequence.

2.3.2 Model selection

Applying the HMM theory of the BIC and AIC criteria to the VN-Index, both AIC and BIC select the 4-state model. The values of the criteria are given in Table 2.3.1.

Table 2.3.1 VN-Index data: selecting the number of states

Model | −log L | AIC | BIC
2-state HM | 1597.832 | 3205.664 | 3225.312
3-state HM | 1510.989 | 3043.978 | 3087.204
4-state HM | 1439.179 | 2916.358 | 2991.02
5-state HM | no convergence | — | —

2.3.3 Forecast distribution

As described in Section 1.2.3 of Chapter 1, Figure 2.3.2 presents 10 forecast distributions for the VN-Index value. We see that the forecast distributions move toward the stationary distribution very fast.

Figure 2.3.2 VN-Index data: forecast distributions for the following 10 days.

Thus, the HMM with normal distributions is certainly suitable for prediction in some cases, especially when the data actually fit the distribution selected in the model. However, whether the time series is generated by a random variable that fits the normal distribution (or a mixture of normal distributions) or some other distribution is a question that determines the appropriateness and accuracy of the forecasts.

2.3.4 Forecast states

Table 2.3.2 Maximum probability forecasts for each state over the following 30 days, starting from the last date 13/05/2011

Day | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
State 1 | 0.0975 | 0.1695 | 0.2261 | 0.2709 | 0.3065 | 0.3350 | 0.3579 | 0.3764 | 0.3915 | 0.4039
State 2 | 0.8062 | 0.6622 | 0.5517 | 0.4665 | 0.4005 | 0.3492 | 0.3092 | 0.2778 | 0.2530 | 0.2334
State 3 | 0.0799 | 0.1351 | 0.1724 | 0.1971 | 0.2128 | 0.2223 | 0.2274 | 0.2296 | 0.2298 | 0.2288
State 4 | 0.0162 | 0.0330 | 0.0496 | 0.0653 | 0.0800 | 0.0933 | 0.1053 | 0.1160 | 0.1255 | 0.1338

Day | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
State 1 | 0.4141 | 0.4225 | 0.4296 | 0.4355 | 0.4405 | 0.4448 | 0.4484 | 0.4515 | 0.4542 | 0.4565
State 2 | 0.2177 | 0.2052 | 0.1951 | 0.1870 | 0.1803 | 0.1749 | 0.1705 | 0.1669 | 0.1639 | 0.1614
State 3 | 0.2270 | 0.2248 | 0.2224 | 0.2200 | 0.2176 | 0.2154 | 0.2133 | 0.2113 | 0.2096 | 0.2080
State 4 | 0.1410 | 0.1473 | 0.1527 | 0.1573 | 0.1613 | 0.1647 | 0.1676 | 0.1701 | 0.1722 | 0.1739

Day | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
State 1 | 0.4586 | 0.4604 | 0.4619 | 0.4633 | 0.4646 | 0.4657 | 0.4667 | 0.4676 | 0.4684 | 0.4692
State 2 | 0.1593 | 0.1576 | 0.1561 | 0.1549 | 0.1539 | 0.1530 | 0.1523 | 0.1517 | 0.1512 | 0.1507
State 3 | 0.2066 | 0.2053 | 0.2041 | 0.2031 | 0.2022 | 0.2014 | 0.2007 | 0.2000 | 0.1995 | 0.1990
State 4 | 0.1754 | 0.1766 | 0.1776 | 0.1784 | 0.1791 | 0.1797 | 0.1801 | 0.1805 | 0.1807 | 0.1809

We see that the highest probability in the first six days falls on state 2 and from day 7 onwards on state 1. Therefore, the model is not effective in the long term but is good in the short term. However, we can forecast by continuously updating the data automatically. The data from 14/5/2011 to 23/6/2011, 30 closing prices of the stock, are used to compare the forecast states with the actual values. Figure 2.3.3 shows that the values of these 30 days fall mostly in the forecast state, which confirms the forecast.

Figure 2.3.3 VN-Index data: comparison of forecast states and actual states.

2.4 Result comparison

This section presents the forecast results of the HMM against a number of models from [19] on several stock-index data sets. Because these growth time series take real values, the HMM with normal distribution is chosen. The proposed model and the comparative models are run on the same training set and the same test set to ensure a fair comparison. The accuracy measure used is the mean absolute percentage error (MAPE), calculated by:

MAPE = (1/n) Σ_{i=1}^{n} p_i × 100%,

where p_i is the absolute percentage error of the i-th forecast.
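A one-line R version of this accuracy measure (illustrative, assuming two aligned numeric vectors):

```r
# Mean absolute percentage error: p_i = |actual_i - predicted_i| / |actual_i|.
mape <- function(actual, predicted) {
  mean(abs(actual - predicted) / abs(actual)) * 100
}
```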
Table 2.4.1 MAPE over multiple runs of the HMM on the Apple data

1.812, 1.779, 1.778, 1.788, 1.790, 1.802, 1.784, 1.816, 1.815, 1.778, 1.777, 1.800, 1.812, 1.790, 1.794, 1.789; mean: 1.795.

The mean accuracy of 1.795 and the average forecast values are illustrated in Figure 2.4.1.

Figure 2.4.1 HMM forecast for the Apple share price: actual - real price; predict - forecast price.

The same procedure is applied to the Ryanair Airlines stock data from January 6, 2003 to January 17, 2005; IBM Corporation from January 10, 2003 to January 21, 2005; and Dell Inc. from 10/01/2003 to 21/01/2005. A comparison of the MAPE accuracy with 400 training observations is shown in Table 2.4.2.

Table 2.4.2 Comparison of MAPE with other models

Data | ARIMA model | ANN model | HMM model
Apple | 1.801 | 1.801 | 1.795
Ryanair | 1.504 | 1.504 | 1.306
IBM | 0.660 | 0.660 | 0.660
Dell | 0.972 | 0.972 | 0.863

From the results in Table 2.4.2 we see that the HMM with normal distribution provides higher forecast accuracy than the classic ARIMA and ANN models.

Chapter 3 - EXTENSION OF THE HIGHER-ORDER MARKOV CHAIN MODEL AND FUZZY TIME SERIES IN FORECASTING

3.1 Higher-order Markov chain

Assume that each data point C_t in a given categorical data series takes values in the set I = {1, 2, ..., m}, with m finite, i.e., the value set has m types or states. A Markov chain of order k is a set of random variables such that

Pr(C_n = c_n | C_{n−1} = c_{n−1}, ..., C_1 = c_1) = Pr(C_n = c_n | C_{n−1} = c_{n−1}, ..., C_{n−k} = c_{n−k}).

In [30], Raftery proposed a higher-order Markov chain model (CMC). This model can be written as:

Pr(C_n = c_n | C_{n−1} = c_{n−1}, ..., C_{n−k} = c_{n−k}) = Σ_{i=1}^{k} λ_i q_{c_n c_{n−i}},   (3.1.1)

where Σ_{i=1}^{k} λ_i = 1 and Q = [q_ij] is a transition matrix with column sums equal to 1, such that

0 ≤ Σ_{i=1}^{k} λ_i q_{c_n c_{n−i}} ≤ 1, for all c_n, c_{n−i} ∈ I.   (3.1.2)

3.1.1 Improved higher-order Markov model (IMC)

In this subsection, the dissertation extends the Raftery model [30] to a more general higher-order Markov chain model by allowing Q to vary with the lag. Here we assume non-negative weights λ_i satisfying

Σ_{i=1}^{k} λ_i = 1.   (3.1.3)

Then (3.1.1) can be rewritten as

C_{n+k+1} = Σ_{i=1}^{k} λ_i Q C_{n+k+1−i},   (3.1.4)

where C_{n+k+1−i} is the probability distribution of the states at time (n + k + 1 − i). Using (3.1.3) and the fact that Q is a transition probability matrix, each element of C_{n+k+1} lies between 0 and 1 and the elements sum to 1. In the Raftery model, λ is not required to be non-negative, so conditions (3.1.2) are added to ensure that C_{n+k+1} is a probability distribution over the states. The Raftery model (3.1.4) can be generalized as follows:

C_{n+k+1} = Σ_{i=1}^{k} λ_i Q_i C_{n+k+1−i}.   (3.1.5)

The total number of independent parameters in the new model is (k + km²).

3.1.2 Parameter estimation

In this section, the author presents efficient methods for estimating the parameters Q_i and λ_i, i = 1, 2, ..., k. To estimate Q_i, we can regard Q_i as the i-step transition matrix of the categorical data sequence C_n. Given the sequence, we can count the transitions from state l to state j in i steps and build the i-step transition frequency matrix

F^(i) = [f_jl^(i)], j, l = 1, ..., m,

where f_jl^(i) is the number of transitions from state l to state j in i steps. From F^(i) we obtain the estimate Q̂_i = [q̂_jl^(i)] as follows:

q̂_jl^(i) = f_jl^(i) / Σ_{j=1}^{m} f_jl^(i) if Σ_{j=1}^{m} f_jl^(i) ≠ 0, and 0 otherwise.
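A minimal R sketch of this estimator follows (illustrative names; the "from" state sits in the column index, matching Raftery's convention of unit column sums):

```r
# Build the i-step frequency matrix F^(i) and normalize each column,
# so that Q_i[j, l] estimates Pr(C_{n+i} = j | C_n = l).
estimate_Qi <- function(cn, m, i) {
  F_i <- matrix(0, m, m)
  for (t in seq_len(length(cn) - i)) {
    F_i[cn[t + i], cn[t]] <- F_i[cn[t + i], cn[t]] + 1  # row = "to", column = "from"
  }
  cs <- colSums(F_i)
  cs[cs == 0] <- 1      # columns with no observed transitions stay all-zero
  sweep(F_i, 2, cs, "/")
}
```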
We note that the construction of F^(i) requires O(L²) operations, where L is the length of the data sequence, so the total complexity of constructing F^(1), ..., F^(k) is O(kL²), where k is the number of lags. We now present the steps for estimating the parameters λ_i, following [15], which the dissertation uses in the proposed combined model. Given that C_n → C as n tends to infinity, C can be estimated from the sequence C_n by computing the frequency of occurrence of each state in the sequence; call this estimate Ĉ. Since Σ_i λ_i Q_i Ĉ ≈ Ĉ, this yields an estimate of the parameters λ = (λ_1, ..., λ_k) as follows. Consider the minimization problem:

min_λ ‖ Σ_{i=1}^{k} λ_i Q̂_i Ĉ − Ĉ ‖ subject to Σ_{i=1}^{k} λ_i = 1, λ_i ≥ 0 for all i,

where ‖·‖ is a vector norm. In the special case of ‖·‖_∞, we have the min-max problem:

min_λ max_l | [ Σ_{i=1}^{k} λ_i Q̂_i Ĉ − Ĉ ]_l | subject to Σ_{i=1}^{k} λ_i = 1, λ_i ≥ 0,

where [·]_l denotes the l-th element of the vector. The difficulty here is that the optimization must also ensure the existence of the stationary distribution C. The min-max problem can be formulated as a linear program:

min_{λ, w} w
subject to  w 1 ≥ Ĉ − [Q̂_1 Ĉ | Q̂_2 Ĉ | ⋯ | Q̂_k Ĉ] λ,
            w 1 ≥ −Ĉ + [Q̂_1 Ĉ | Q̂_2 Ĉ | ⋯ | Q̂_k Ĉ] λ,
            w ≥ 0, Σ_{i=1}^{k} λ_i = 1, λ_i ≥ 0 for all i.

Solving this linear program yields the parameters λ_i. Instead of solving a min-max problem, we can also choose ‖·‖_1 and build the minimization problem:

min_λ Σ_{l=1}^{m} | [ Σ_{i=1}^{k} λ_i Q̂_i Ĉ − Ĉ ]_l | subject to Σ_{i=1}^{k} λ_i = 1, λ_i ≥ 0.

The corresponding linear program is:

min_{λ, w} Σ_{l=1}^{m} w_l
subject to  w ≥ Ĉ − [Q̂_1 Ĉ | Q̂_2 Ĉ | ⋯ | Q̂_k Ĉ] λ,
            w ≥ −Ĉ + [Q̂_1 Ĉ | Q̂_2 Ĉ | ⋯ | Q̂_k Ĉ] λ,
            w_l ≥ 0 for all l, Σ_{i=1}^{k} λ_i = 1, λ_i ≥ 0 for all i.

In constructing these linear problems, the number of variables is of order k and the number of constraints is of order (2m + 1). The complexity of solving such a linear program is O(n³L), where n is the number of variables and L is the number of binary bits needed to store all of the data (constraints and objective function) [18].

3.2 Selecting the fuzzy time series in the combined model

Consider a time series with observations X_1, X_2, ..., X_T and growth chain x_1, x_2, ..., x_T (defined immediately below). We want to classify growth into different states such as "slow", "normal", "fast", or even finer levels. However, each x_t at a time t is unclear as to which level it belongs, even when the levels are clearly defined: x_t can belong to several levels with different degrees of membership. The fuzzy time series theory of Section 1.3 in Chapter 1 therefore serves to classify the x_t into states of which they are members. Assuming these states follow a Markov chain, the Markov model gives us a predicted future state, and from the future state the predicted value of x_t is computed by inverting the fuzzification.

3.2.1 Defining and partitioning the universe of discourse

Given the training set {y_t}, t = 1, ..., N, we define the universe of discourse for the growth space as

U = [ min_{t∈{1,...,N}} y_t − ε_1 ; max_{t∈{1,...,N}} y_t + ε_2 ],

with ε_1, ε_2 positive numbers chosen so that future growth does not exceed the bounds. For each data set we can select different ε; however, the chosen ε must cover all stock growths.
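Under the ℓ1 norm the problem above is a small linear program; the following R sketch uses the lpSolve package with variables (λ, w) — an assumption for illustration, since the dissertation's appendix may use different tooling.

```r
# Estimate lag weights by minimizing || sum_i lambda_i Q_i C_hat - C_hat ||_1
# s.t. sum(lambda) = 1, lambda >= 0, via the LP with auxiliary variables w.
library(lpSolve)
estimate_lambda <- function(Q_list, C_hat) {
  k <- length(Q_list); m <- length(C_hat)
  B <- sapply(Q_list, function(Q) as.vector(Q %*% C_hat))  # m x k, columns Q_i C_hat
  obj <- c(rep(0, k), rep(1, m))                           # minimize sum of w_l
  con <- rbind(cbind(B, diag(m)),                          #  B lambda + w >=  C_hat
               cbind(-B, diag(m)),                         # -B lambda + w >= -C_hat
               c(rep(1, k), rep(0, m)))                    # sum lambda = 1
  sol <- lp("min", obj, con, c(rep(">=", 2 * m), "="), c(C_hat, -C_hat, 1))
  sol$solution[1:k]                                        # lambda, w >= 0 by default
}
```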
In order to fuzzify the universe U into growth labels such as "fast increase", "slow increase", "steady increase", or even k levels, the universe U is divided into k intervals (in the simplest case, consecutive equal intervals) u_1, u_2, ..., u_k. For example, with the partition of the VN-Index (Vietnam stock index) growth

U = [−0.0449, −0.0150] ∪ [−0.0150, 0.0149] ∪ [0.0149, 0.0448],

the VN-Index growths are coded as in Table 3.2.1.

Table 3.2.1 Fuzzified growth chain

Date | x_i | Index | growth (y_i) | Code
04/11/2009 | 537.5 | −0.015997 | NA | NA
05/11/2009 | 555.5 | −0.031866 | 0.0334883 | 3
06/11/2009 | 554.9 | −0.026580 | −0.0010801 | 2
09/11/2009 | 534.1 | 0.054237 | −0.0374842 | 1
10/11/2009 | 524.4 | 0.020036 | −0.0181613 | 1
11/11/2009 | 537.6 | 0.002917 | 0.0251716 | 3

3.2.2 Fuzzification rule of the time series

We identify fuzzy sets A_i, each assigned a growth label and defined on the specific intervals u_1, u_2, ..., u_k. A fuzzy set A_i can be described as

A_i = μ_{A_i}(u_1)/u_1 + μ_{A_i}(u_2)/u_2 + ⋯ + μ_{A_i}(u_k)/u_k,

where μ_{A_i} is the membership function giving the membership of each u_j, j = 1, ..., k, in A_i, i = 1, ..., k. Each fuzzy value of the time series y_t is recovered from the fuzzification rule μ_{A_i}. For example, with

A_1 = 1/u_1 + 0.5/u_2 + 0/u_3 + ⋯ + 0/u_k
A_2 = 0.5/u_1 + 1/u_2 + 0.5/u_3 + ⋯ + 0/u_k
...
A_k = 0/u_1 + 0/u_2 + 0/u_3 + ⋯ + 1/u_k,

if y_t ∈ A_2 is a fuzzy value, then the crisp value is recovered from the fuzzification rule by the defuzzification

y_t = (0.5 m_1 + m_2 + 0.5 m_3) / 2,

where m_1, m_2, m_3 are the midpoints of u_1, u_2, u_3, respectively. For different fuzzification rules, the inverse rule differs.
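The partitioning and encoding just described can be sketched in a few lines of R (illustrative names; the ε margins are assumed values, not taken from the text):

```r
# Compute returns, build the universe U = [Dmin - eps1, Dmax + eps2] with
# k - 2 equal interior intervals, and encode each return by its interval.
fuzzify_returns <- function(x, nStates, eps1 = 0.01, eps2 = 0.01) {
  y <- diff(x) / head(x, -1)                     # y_t = (x_{t+1} - x_t) / x_t
  breaks <- c(min(y) - eps1,
              seq(min(y), max(y), length.out = nStates - 1),
              max(y) + eps2)                     # boundaries of u_1, ..., u_k
  list(returns = y,
       encoded = cut(y, breaks, labels = FALSE, include.lowest = TRUE),
       mids    = (head(breaks, -1) + tail(breaks, -1)) / 2)   # midpoints m_i
}
```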
3.3 Combined model of Markov chain and fuzzy time series

3.3.1 Combined model with a first-order Markov chain

In this section we describe in detail the combination of the Markov model and fuzzy time series, illustrated in Figure 3.3.1:

Step 1: Compute the returns of the training set and define the universe of discourse.
Step 2: Partition the universe of discourse.
Step 3: Fuzzify the returns data.
Step 4: Train the (higher-order) Markov model for the fuzzy series.
Step 5: Forecast the encoded series and defuzzify into a forecast price.

Figure 3.3.1 Model structure.

Step 1: Given the observations {x_1, x_2, ..., x_T} of a time series, the growth chain of the training data is calculated by

y_t = (x_{t+1} − x_t)/x_t, so that x_{t+1} = (1 + y_t) x_t.

Given D_max and D_min, the maximum and minimum values of the growth chain after removing outliers, the universe of discourse is U = [D_min − ε_1, D_max + ε_2], where ε_1, ε_2 ≥ 0 can be set as thresholds for future changes.

Step 2: A partition of the universe is generated, in the simplest way, by dividing [D_min, D_max] into k − 2 equal intervals. The universe becomes U = u_1 ∪ u_2 ∪ ⋯ ∪ u_k, where u_1 = [D_min − ε_1, D_min] and u_k = [D_max, D_max + ε_2].

Step 3: In this research, the linguistic terms A_1, A_2, A_3, ..., A_k of the time series, represented by fuzzy sets, are also defined in the simplest way:

A_1 = 1/u_1 + 0.5/u_2 + 0/u_3 + ⋯ + 0/u_k
A_2 = 0.5/u_1 + 1/u_2 + 0.5/u_3 + ⋯ + 0/u_k
...
A_k = 0/u_1 + 0/u_2 + 0/u_3 + ⋯ + 1/u_k.

Each A_i is encoded by i, for i ∈ {1, 2, ..., k}; hence, if a datum of the time series belongs to u_i, it is encoded by i. We obtain an encoded time series {c_t}, t = 1, ..., T, c_t ∈ {1, 2, ..., k} — for instance, under the partition of the VN-Index discourse given in Section 3.2.1.

Step 4: This step explains how Markov chains are applied to the encoded time series. According to Section 3.2, we assume that the encoded time series is a Markov chain as in Definition 1.1.1. Following the estimation of Section 1.1.3, it is easy to estimate the transition probability matrix Γ = [γ_ij], i, j ∈ {1, 2, ..., k}, where γ_ij = Pr(c_{t+1} = j | c_t = i). If a state c_t = i is an absorbing state, then, to ensure the regularity of Γ, we define Pr(c_{t+1} = j | c_t = i) = 1/k for all j = 1, 2, ..., k; this means the probability of moving from state i to any other state is the same.

Step 5: Next, we generate the one-step-ahead forecast for the encoded time series and defuzzify the forecast fuzzy set into a forecast value of the returns. Given c_t, the column Γ[·, c_t] is the probability distribution of c_{t+1} = j, j = 1, 2, ..., k. Let

M = ( (2/3)(m_1 + 0.5 m_2), (1/2)(0.5 m_1 + m_2 + 0.5 m_3), ..., (2/3)(0.5 m_{k−1} + m_k) ),

where m_i is the midpoint of the interval u_i. Then the forecast return at time t + 1 is calculated by

ŷ_{t+1} = Γ[·, c_t] · M = Σ_{j=1}^{k} γ_{j c_t} M_j.

In this step, the vector M can be chosen differently according to the fuzzification method of Step 3. Finally, the forecast value of x is calculated by x̂_{t+1} = (ŷ_{t+1} + 1) x_t.

3.3.2 Extension to higher-order Markov chains

The higher-order Markov chain model combined with fuzzy time series differs from the first-order model in Step 4 and Step 5.

Step 4: For the conventional higher-order Markov model associated with fuzzy time series (called CMC-Fuz), by the same maximization as in the first-order model it is easy to estimate the (l+1)-dimensional transition probability tensor Γ = [γ_{i_1 i_2 ... i_{l+1}}], i_j ∈ {1, 2, ..., k}. As defined for the higher-order Markov chain, γ_{i_1 i_2 ... i_{l+1}} is the probability of observing c_{t+1} given the known c_t, ..., c_{t−l+1}:

γ_{i_1 i_2 ... i_{l+1}} = Pr(c_{t+1} = i_{l+1} | c_t = i_l, ..., c_{t−l+1} = i_1).

For the new combined higher-order Markov model (called IMC-Fuz), the transfer matrix is Σ_{i=1}^{l} λ_i Q_i as in Section 3.1.

Step 5: Next, we generate the one-step forecast for the encoded time series from the transition probabilities and invert it into the prediction of the time series. With the CMC-Fuz model, given c_t, ..., c_{t−l+1}, the column Γ[·, c_t, ..., c_{t−l+1}] is the probability distribution of c_{t+1} = j over all encoded values j = 1, 2, ..., k; the forecast growth at time t + 1 is computed by

x̂_{t+1} = Γ[·, c_t, ..., c_{t−l+1}] · M = Σ_{j=1}^{k} γ_{j c_t ... c_{t−l+1}} M_j.

With the IMC-Fuz model, the forecast growth at time t + 1 is calculated by

x̂_{t+1} = ( Σ_{i=1}^{l} λ_i Q_i[·, c_{t−i+1}] ) · M.

Finally, the forecast of X_{t+1} is computed by X̂_{t+1} = (x̂_{t+1} + 1) X_t.

Algorithm 3.1 Combined Markov-fuzzy algorithm
Input: Data, ε_1, ε_2, nTrain, nOrder, nStates
Output: predict, RMSE, MAPE, MAE
1: growth_t ← (Data_{t+1} − Data_t)/Data_t, t = 1, ..., nTrain − 1
2: fuzzify the growth chain into the encoded series encoded_t with nStates states
3: train the higher-order Markov model of order nOrder on the encoded series (transition.Mats)
4: M ← ((2/3)(mid(A_1) + 0.5 mid(A_2)), (1/2)(0.5 mid(A_1) + mid(A_2) + 0.5 mid(A_3)), ..., (2/3)(0.5 mid(A_{k−1}) + mid(A_k)))  ⊲ inverse fuzzification rule, with mid(A_i) the midpoint of interval A_i
5: predict_t ← (transition.Mats[, encoded_{t−1}, encoded_{t−2}, ..., encoded_{t−nOrder+1}] %*% M + 1) * Data_t
6: errors (RMSE, MAPE, MAE) ← f(predict_t − actual_t)  ⊲ calculate accuracy

Here, nTrain is the number of observations in the training set, nOrder is the order of the higher-order Markov chain, and nStates is the number of states (the A_k) in the model. The CMC-Fuz and IMC-Fuz models with nOrder = 1 coincide with the first-order combined model of Section 3.3.1; as a result, the experimental results for the first-order Markov chain model were obtained simultaneously with those of the higher-order model.
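For the first-order case, Step 5 reduces to one inner product followed by the inverse growth transform; a minimal R sketch (illustrative, keeping the text's column convention Γ[·, c_t]):

```r
# One-step-ahead forecast: y_hat = Gamma[, c_t] . M, then x_hat = (y_hat + 1) * x_t;
# 'M' is the defuzzification vector built from the interval midpoints.
forecast_next <- function(x_t, c_t, Gamma, M) {
  y_hat <- sum(Gamma[, c_t] * M)
  (y_hat + 1) * x_t
}
```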
3.3.3 Experimental results

Data collection: In order to compare with the results in [19, 20, 17, 26, 38, 33], we use the same data as used in [40, 29, 7, 37]. Moreover, other data sets are also used to check the accuracy of the model. Details are given in Table 3.3.2.

Table 3.3.2 Comparative data sets

Data Name | From | To | Frequency
Apple Computer Inc. | 10/01/2003 | 21/01/2005 | Daily
IBM Corporation | 10/01/2003 | 21/01/2005 | Daily
Dell Inc. | 10/01/2003 | 21/01/2005 | Daily
Ryanair Airlines | 06/01/2003 | 17/01/2005 | Daily
TAIEX (Taiwan exchange index) | 01/01/2001 | 31/12/2009 | Daily
SSE (Shanghai Stock Exchange) | 21/06/2006 | 31/12/2012 | Daily
DJIA (Dow Jones Industrial Average Index) | 04/08/2006 | 31/08/2012 | Daily
S&P500 | 04/08/2006 | 31/08/2012 | Daily
Unemployment rate | 01/01/1948 | 01/12/2013 | Monthly
Australian electricity | 01/01/1956 | 01/08/1995 | Monthly
Poland Electricity Load | from the 1990s, 1500 values | | Daily

This study does not fix the training and test sets, so readers can make appropriate choices when applying it to specific data. In many cases, the experiments show that using between 75% and 85% of the data for training gives the best prediction results.

Results compared with other models: The first models compared are those in [19]. The training and test sets for Apple Inc., Dell Comp., IBM Corp., and Ryanair Airlines are used similarly (nTrain = 400). British Airlines and Delta Airlines are not compared because the data on http://finance.yahoo.com// are not complete and consistent with [19].

Table 3.3.3 Comparison of MAPEs against other models

Stock | HMM-based forecasting (MAPE) | Fusion HMM-ANN-GA with weighted average (MAPE) | Combination HMM-fuzzy model (MAPE) | CMC-Fuz (nStates = 6, nOrder = 1) | IMC-Fuz (nStates = 6, nOrder = 2)
Ryanair Air | 1.928 | 1.377 | 1.356 | 1.275 | 1.271
Apple | 2.837 | 1.925 | 1.796 | 1.783 | 1.783
IBM | 1.219 | 0.849 | 0.779 | 0.660 | 0.656
Dell Inc. | 1.012 | 0.699 | 0.405 | 0.837 | 0.823

In Table 3.3.3, with nStates = 6, we can see that the IMC-Fuz model with nOrder = 2 is better than the CMC-Fuz model with nOrder = 1, and both are better than the models compared against on the data of [19]. Comparative results for the SSE, DJIA, and S&P500 data are shown in Table 3.3.4: the IMC-Fuz and CMC-Fuz models are slightly better than the other models on the SSE data and much better on the DJIA and S&P500 data.

Table 3.3.4 Comparison of different models on the SSE, DJIA, and S&P500 data sets

Data | Measure | IMC-Fuz | CMC-Fuz | BPNN | STNN | SVM | PCA-BPNN | PCA-STNN
SSE | MAE | 20.5491 | 20.4779 | 24.4385 | 22.8295 | 27.8603 | 22.4485 | 22.0844
SSE | RMSE | 27.4959 | 27.4319 | 30.8244 | 29.0678 | 34.5075 | 28.6826 | 28.2975
SSE | MAPE | 0.8750 | 0.8717 | 1.0579 | 0.9865 | 1.2190 | 0.9691 | 0.9540
DJIA | MAE | 90.1385 | 90.4159 | 258.4801 | 230.7871 | 278.2667 | 220.9163 | 192.1769
DJIA | RMSE | 123.2051 | 123.2051 | 286.6511 | 258.3063 | 302.793 | 250.4738 | 220.4365
DJIA | MAPE | 0.7304 | 0.7304 | 2.0348 | 1.8193 | 2.2677 | 1.7404 | 1.5183
S&P500 | MAE | 10.4387 | 10.4387 | 24.7591 | 22.1833 | 22.9334 | 16.8138 | 15.5181
S&P500 | RMSE | 14.2092 | 14.2092 | 28.1231 | 25.5039 | 25.9961 | 20.5378 | 19.2467
S&P500 | MAPE | 0.8074 | 0.8074 | 1.8607 | 1.6725 | 1.7722 | 1.282 | 1.1872

In the recently published [33], the authors proposed a new fuzzy time series prediction model and compared it with different methods on the TAIEX forecast from 2001 to 2009. Data from January to October of each year are used as training data and the rest, from November to December, for forecasting and accuracy calculation. Table 3.3.5 shows that our models with nOrder = 1 and 2 are better than all the models mentioned.
Table 3.3.5 Comparison of RMSEs for the TAIEX for the years 2001 to 2009

Method | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | Average
Chen 1996 [12] | 104.25 | 119.33 | 68.06 | 73.64 | 60.71 | 64.32 | 171.62 | 310.52 | 92.75 | 118.36
ARIMA | 97.43 | 121.23 | 71.23 | 70.23 | 58.32 | 64.43 | 169.33 | 306.11 | 94.39 | 116.97
Yu 2005 [42] | 100.54 | 119.33 | 65.35 | 71.50 | 57.00 | 63.18 | 168.76 | 310.09 | 91.32 | 116.34
ETS | 96.80 | 119.43 | 68.01 | 72.33 | 54.70 | 63.72 | 165.04 | 303.39 | 95.60 | 115.45
Yu 2005 [42] | 98.69 | 119.18 | 63.66 | 70.88 | 54.69 | 60.87 | 167.69 | 308.40 | 89.78 | 114.87
Huarng 2006 [22] | 97.86 | 116.85 | 61.32 | 70.22 | 52.36 | 58.37 | 167.69 | 306.07 | 87.45 | 113.13
Chen 2011 [13] | 96.39 | 114.08 | 61.38 | 66.75 | 52.18 | 55.83 | 165.48 | 304.35 | 85.06 | 111.28
ARFIMA | 95.18 | 115.13 | 59.43 | 58.47 | 50.78 | 51.23 | 163.77 | 315.17 | 89.23 | 110.93
Javedani 2014 [32] | 94.80 | 111.70 | 59.00 | 64.10 | 49.80 | 55.30 | 163.10 | 301.70 | 84.80 | 109.37
Sadaei 2016 [33] | 89.47 | 104.37 | 49.67 | 59.43 | 37.80 | 47.30 | 154.43 | 294.37 | 78.80 | 101.74
Sadaei 2016 [33] | 86.67 | 101.62 | 45.04 | 55.80 | 34.91 | 45.14 | 152.88 | 293.96 | 74.98 | 99.00
IMC-Fuz, order 1 | 117.73 | 68.44 | 55.96 | 56.58 | 55.97 | 51.87 | 159.36 | 106.9 | 71.51 | 82.7
IMC-Fuz, order 2 | 115.75 | 67.5 | 53.75 | 56.58 | 55.97 | 51.73 | 159.36 | 105.12 | 71.51 | 81.92
CMC-Fuz, order 1 | 116.52 | 68.45 | 55.97 | 56.58 | 55.97 | 51.87 | 159.37 | 106.9 | 71.51 | 82.57
CMC-Fuz, order 2 | 119.42 | 71.51 | 54.81 | 56.93 | 60.12 | 53.57 | 164.32 | 106.97 | 82.03 | 85.52

In summary, this chapter presented the combination of the Markov chain model (both first order and higher order) with fuzzy time series in time series forecasting. First, the time series is fuzzified, and the resulting fuzzy sets become the states of a Markov chain. Second, the model is extended to the conventional higher-order Markov chain and the improved higher-order Markov chain, together with the corresponding parameter estimation algorithms. Third, experiments on the same training and test sets as recent prediction models suggest that the proposed model has significantly higher accuracy although its algorithm is simpler.

CONCLUSION

Dissertation results

With the aim of developing forecasting models by integrating existing models into new ones to improve predictive accuracy, the dissertation has carried out the following research.

An overview of the Markov chain, the higher-order Markov chain, and Markov chain parameter estimation methods was given, analyzing the potential applications of the Markov chain in time series prediction. The dissertation observes that the fuzzy time series model suits prediction when the time series data are unclear (fuzzy); accordingly, some fuzzy time series theory as well as some forecasting algorithms using fuzzy time series were generalized.

From the advantages and limitations of existing forecasting models, the dissertation proposes new combined predictive models to improve forecast accuracy. First, the hidden Markov model (HMM) with the Poisson distribution and with the normal distribution is applied to forecast particular time series, based on an analysis of the compatibility of the data with the model (Section 2.1). A series of algorithms was implemented and run on real data, showing the reasonableness of the forecasts over short periods. Secondly, to overcome the disadvantages of the HMM (its reliance on a fixed probability distribution that the empirical distribution does not follow) and to handle the fuzziness (unclearness) of time series data, the dissertation proposes the combined model of Markov chain and fuzzy time series for time series forecasting. Algorithms combining the two models have been established and tested on a wide range of data against recent forecasting models, showing a significant improvement in prediction accuracy. In particular, the higher-order Markov model combined with fuzzy time series has great potential when applied to seasonal time series forecasting. The contributions of the dissertation have been implemented experimentally in the R programming language, and the functions are given in the Appendix.

Development of the dissertation topic

Incorporate the Markov chain with more complex fuzzy rules in order to determine more accurately the role of each value of the time series in a fuzzy set; this can further improve forecast accuracy.

Extend the model to multivariate time series, in which the time series components depend on each other. The target time series is related to the other chains (impact chains)
according to the Markov states defined on these impact sequences. From many impact sequences, it is possible to combine with the ANN model to build predictive models that take external factors into account, which is in line with reality.

The parameter optimization problem is still an open direction. Specifically, the proposed models were implemented with values of nOrder and nStates sufficient for comparison with other models; however, these are not the best parameters. Therefore, the construction of an inference facility and an algorithm that determines the best parameters for the model are also issues that can be extended.

Ngày đăng: 17/01/2020, 11:57

Từ khóa liên quan

Tài liệu cùng người dùng

  • Đang cập nhật ...

Tài liệu liên quan