VIET NAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS
DO HOANG HIEP
THESIS GRADUATION
REAL ESTATE PRICE FORECAST IN
DISTRICT 7 AT HO CHI MINH CITY BY
LONG SHORT-TERM MEMORY MODEL
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
HO CHI MINH CITY, 2021
VIET NAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS
DO HOANG HIEP-18520726
THESIS GRADUATION
REAL ESTATE PRICE FORECAST IN
DISTRICT 7 AT HO CHI MINH CITY BY
LONG SHORT-TERM MEMORY MODEL
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
THESIS ADVISOR
Dr CAO THI NHAN
HO CHI MINH CITY, 2021
ASSESSMENT COMMITTEE
The Assessment Committee is established under Decision ........ dated ........ by the
Rector of the University of Information Technology.
1. ........ - Chairman
2. ........ - Secretary
3. ........ - Member
First of all, I would like to express my gratitude to Dr. Cao Thi Nhan, who has been more
than just my thesis advisor during this graduation thesis: she has guided me ever since I
joined the University of Information Technology as a student of the Faculty of Information
Systems. With patience, motivation, and immense knowledge, she helped me keep track of
the research direction and gave me a great deal of advice to complete the thesis.
Furthermore, she carefully reviewed my thesis and offered insightful comments,
suggestions, and corrections.
Besides, I would like to extend my sincere gratitude to Dr. Ngo Duc Thanh for the
invaluable guidance and support during the completion of my thesis. This expertise and
these insightful suggestions have greatly contributed to the successful outcome of my work.
I am very grateful to my university and faculty for allowing me to prepare my
graduation report and to meet the teachers who guided me each semester, so that I have
enough knowledge to write this thesis.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Objective and scope
CHAPTER 2: LITERATURE REVIEW AND THEORETICAL BACKGROUND
2.1 Related work
2.2 Data collection and data set generating
2.3 Long-Short Term Memory (LSTM) model and Evaluation Metrics Used
2.3.1 Recurrent Neural Network (RNN)
2.3.2 Long-Short Term Memory
2.3.3 Performance Metrics
CHAPTER 3: EXPERIMENT RESULT
3.1 Technology used
3.2 Data description
3.3 Data pre-processing
3.4 Experiment processes
3.5 Experiment result
CHAPTER 4: CONCLUSION
4.2 Limitations and challenges
REFERENCES
LIST OF FIGURES
Figure 2-1: Data Preparation Process [5]
Figure 2-2: Process Flow of Prediction Model [6]
Figure 2-3: Homepage of https://batdongsan.com.vn/
Figure 2-4: The raw dataset
Figure 2-5: Processed Dataset
Figure 2-6: Number of category post's sale
Figure 2-7: Number of category post's rent
Figure 2-8: The proportion of post's sale
Figure 2-9: The proportion of post's rent
Figure 2-10: Average price of sale
Figure 2-11: Average price of rent
Figure 2-12: The distribution of project real estate
Figure 2-13: Number of posts sale following by project
Figure 2-14: Number of posts rent following by project
Figure 2-15: Sale price of place
Figure 2-16: Format data price
Figure 2-17: RNN model [13]
Figure 2-18: LSTM model [13]
Figure 3-1: Environment running
Figure 3-2: Configuration
Figure 3-3: Step by step of experiment processes
Figure 3-4: Add libraries to practice LSTM model
Figure 3-5: Dataframe of sale
Figure 3-6: Normalize the rent dataset
Figure 3-7: Dataframe of rent
Figure 3-8: Normalize the rent dataset
Figure 3-9: Create and reshape sale matrix
Figure 3-10: Create and reshape rent matrix
Figure 3-11: Create and fit LSTM model
Figure 3-12: Training MSE and validation MSE [12]
Figure 3-13: MSE of sale and rent
Figure 3-14: The predicted result of sale
Figure 3-15: The predicted result of sale
Figure 3-16: The selling price data in 2015 “8008092” [25]
The selling price data in 2015 “8004059” [27]
The stored data use for train model in 2015 “8004059”
The selling price data in 2023 “31928898” [29]
The rent price in 2016 “8000576” [30]
The stored data use for train model in 2016 “8000576”
The selling price data in 2023 “32971380” [31]
The rent price in 2016 “8006099” [32]
The stored data use for train model in 2016 “8006099”
The selling price data in 2023 “36228195” [34]
The selling price of apartments in Ho Chi Minh City increased faster than the rental price [22]
LIST OF TABLES
Table 2-1: Data Description
Table 3-1: The result of the 1st experiment of model evaluation parameters
Table 3-2: The closest and highest mean value of the 1st experiment
Table 3-3: The result of the 2nd experiment of model evaluation parameters
Table 3-4: The closest and highest mean value of the 2nd experiment
Table 3-5: The result of the 3rd experiment of model evaluation parameters
Table 3-6: The closest and highest mean value of the 3rd experiment
LIST OF ABBREVIATIONS
LSTM: Long-Short Term Memory
RNN: Recurrent Neural Network
MSE: Mean Square Error
RMSE: Root Mean Square Error
HCMC: Ho Chi Minh City
ABSTRACT
To study the impact of various factors on housing prices, I propose building predictive
models based on deep learning that use existing real estate data to predict housing
prices, or their future trends, more accurately. Considering that the factors that
influence housing prices vary widely, the proposed predictive models fall into two
categories. The first is based on many factors that are characteristic of real estate.
The real estate market is one of the most price-focused and volatile markets, which makes
it a key area for applying machine learning to predict prices with high accuracy.
This study aims to predict house prices in Ho Chi Minh City, specifically District 7,
with the Long-Short Term Memory model. It will help clients invest in property without
going through a broker. The results of this research indicate that the model predicts
prices with high accuracy.
CHAPTER 1: INTRODUCTION
1.1 Background
The main goal of this thesis is to present a methodological framework for using modern
machine learning and deep learning techniques to incorporate external data in real
estate price forecasting. By experimenting with the forecasting ability of the Long-Short
Term Memory (LSTM) methodology, it aims to help buyers, renters, and sellers, especially
those with little market information, make more precise choices and reduce risk. This
model is suitable for very large data sets that can be arbitrarily expanded, which is
also an advantage of applying information technology to data analysis.
New graduates often need to find a place to stay, such as an apartment, after their
term in the school dormitory expires. Additionally, real estate prices change every day,
especially during relocation seasons. Usually, the average young person stays for about
one year before moving to a new place.
Initially, a Recurrent Neural Network (RNN) was considered, since it uses its internal
memory to process arbitrary (short-term) sequences of inputs and predict values. However,
the simple RNN is not good enough for this prediction task, so I decided to use the more
complex LSTM architecture, a type of RNN that is excellent at remembering values over
either long or short durations of time. To obtain better results, I must create
prediction models specifically designed for the real estate market by collecting
abundant data from public sources. Additionally, future studies should attempt to extract
parameters that can ensure higher reliability by performing numerous simulations,
even if via a trial-and-error approach.
The application will support users in the following ways:
• Know the market price trend
• Make a plan to manage their finances
The data used in this thesis is available from the website https://batdongsan.com.vn and
mainly focuses on District 7 in Ho Chi Minh City. In particular, it covers two typical
real estate transactions, namely rent and sale.
The full list of data variables is given in Section 2.2.1.
The price of real estate is influenced by several important factors, such as:
• Location factor
• Trending buy/rent factor
• Sale unit price factor
1.2 Objective and scope
1.2.1 Objective
• Understand how business data analysis and machine learning are applied to produce
results.
• Develop a price prediction model based on the attributes of District 7 listings: price,
price level, sale unit price, unit price, city, district, ward, category, start date,
end date, lat, long.
• Provide a method that helps predict future prices and shows the distribution of
apartments/condominiums in District 7.
1.2.2 Scope
The Long-Short Term Memory algorithm is used to predict real estate prices. RMSE is
the main metric used to evaluate the performance of the model.
1.2.3 Thesis structure
• Chapter 1: Introduction
• Chapter 2: Literature review and theoretical background
• Chapter 3: Experimental result
• Chapter 4: Conclusions
CHAPTER 2: LITERATURE REVIEW AND THEORETICAL BACKGROUND
2.1 Related work
With today's technology, applying machine learning is essential in many fields,
especially in the real estate sector. Because real estate prices change significantly,
buyers and sellers struggle to decide when to buy or how to adjust the price
appropriately. This section describes previous work by several researchers in the
domain of housing price prediction. Their contributions are as follows:
In 2016, Hujia Yu and Jiafu Wu, students at Stanford University, applied regression and
classification to real estate prices [1]. House prices were forecast with several
regression techniques, including Lasso, Ridge, SVM regression, and Random Forest.
According to this paper, the most effective method for the regression problem was SVR
with a Gaussian kernel, with an RMSE of 0.5271; however, visualization for SVM was
difficult due to its high dimensionality. Following its analysis, living area in square
feet, the material of the roof, and the neighborhood have the greatest statistical
significance in predicting a house's sale price.
At the 2018 Second International Conference on Inventive Communication and
Computational Technologies (ICICCT), a group of students from KJ Somaiya College of
Engineering used machine learning and neural networks for house price prediction [2].
The paper aims to make evaluations based on every basic parameter that is considered
when determining the price. The model uses various regression techniques, and the result
is not determined by one technique alone; rather, it is the weighted mean of various
techniques, which gives the most accurate results. The results showed that this approach
yields lower error and higher accuracy than the individual algorithms applied.
Besides, the prediction of house prices can be considered a regression problem when
there is sufficient transaction and characteristic information about houses. Many
studies have forecast housing prices through simple machine learning techniques (such
as the examples above) or deep learning neural networks with comparably simple
structures [3],[4]. In recent years, some researchers have considered the effect of
temporal features on house prices, while fully accounting for other house
characteristics, and have used time series models to predict housing prices.
Predicting real estate prices is a challenging task that can be inherently inaccurate
due to the dependence on a multitude of factors, including location, the property's
style and value, economic and political conditions, and so forth. Therefore, the task
of predicting real estate prices requires regular updates and verifications to ensure
accuracy.
Based on the information sources gathered, I chose an upgraded version of the Recurrent
Neural Network (RNN) for time series prediction in deep learning, namely Long-Short Term
Memory, because it retains information over long periods and mitigates the vanishing
gradient problem of the RNN algorithm. Additionally, the dataset is crawled from
https://batdongsan.com.vn because the website is popular with many users and has the
associated licenses/certifications.
There are several reasons why deep learning models such as LSTM may be a superior
choice for real estate price prediction compared to basic models like Linear Regression
or Logistic Regression:
• Superior performance: A deep learning model can represent more complex relationships
with the support of multiple layers of neurons, enabling it to learn from the data in a
more sophisticated manner and attain better outcomes.
• Automated learning process: Deep learning models automate the learning process and
the adjustment of model weights, allowing them to identify important features and
relationships between features in the data automatically.
• High Availability: A deep learning model can learn from complex structured data,
enabling it to make more accurate real estate predictions on such data.
This complexity is illustrated by the fact that real estate prices have been a diverse
and changing indicator in recent years. They can increase or decrease based on many
factors, including the financial market, the real estate market, social and geographical
factors, legal regulations, etc. They can change yearly or monthly and may vary
depending on location and market.
However, LSTMs also have some disadvantages, including longer training times and
higher computational cost. Furthermore, if the data lacks relationships between events
over a long period of time, LSTMs may not perform optimally.
• Data preparation: To fit the parameters of the model, the raw data needs to be
pre-processed. Redundant attributes must be removed and the remaining ones converted
into appropriate data types. Figure 2-1 shows the data preparation process after the
data is crawled from the website.
• Building model: When the data is ready, multiple algorithms are trained on it and
compared to find the best-performing one for making forecasts. Figure 2-2 shows the
process of applying predictive algorithms.
[Figure: process flow diagram with the stages Preprocess Data, Identify Condition Indicators, and Train Model]
Figure 2-2: Process Flow of Prediction Model [6]
2.2 Data collection and data set generating
2.2.1 Data collection
The thesis concentrates on the Vietnamese real estate market, specifically HCMC.
Although there are many websites where users can look up information, retrieving their
data is a difficult problem. The data was crawled from https://batdongsan.com.vn/, a
website that collects statistical data on real estate. Figure 2-3 illustrates the
homepage of the website.
Figure 2-3: Homepage of https://batdongsan.com.vn/
Due to the limited time available to complete the thesis, it was not possible to cover
all districts in Ho Chi Minh City, so I used only District 7 to train the model. I chose
this district because many apartments are concentrated there, which provides data
suitable for analysis and for the model used.
The crawled dataset was exported to a spreadsheet (.csv) file and contains 451,608
records with 50 attributes. It holds a lot of information but also has several missing
values and false entries. First, I assembled all the datasets into one and sorted them
by project name. After that, I used imputation, which fills missing values with
statistically relevant data. At the same time, I normalized the data, removed redundant
records to clean the dataset, and reformatted some attributes so that they fit the
models better. When the dataset was ready, the file was saved in CSV format for
training; a thorough explanation is available in the Exploratory Data Analysis section.
Figure 2-4 illustrates the raw data as downloaded, and Figure 2-5 shows the processed
data.
[Figure: screenshot of the raw crawled listings, with image URLs, post titles in Vietnamese, post IDs, and descriptions]
Figure 2-4: The raw dataset
[Figure: screenshot of the processed listings, with image URLs, post titles, post IDs, descriptions, and prices]
Figure 2-5: Processed Dataset
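The cleaning steps described above (removing redundant records, imputing missing values, and formatting attributes) can be sketched in plain Python. This is a minimal illustration rather than the code used in the thesis: the field names `post_id`, `project`, and `price`, the sample rows, and the choice of mean imputation are all assumptions.

```python
from statistics import mean

# Hypothetical sample rows; the real dataset has 50 attributes.
raw_rows = [
    {"post_id": "A1", "project": "Project B", "price": "4600"},
    {"post_id": "A2", "project": "Project A", "price": ""},      # missing price
    {"post_id": "A1", "project": "Project B", "price": "4600"},  # duplicate record
    {"post_id": "A3", "project": "Project A", "price": "1800"},
]


def clean(rows):
    """Drop duplicate posts, parse prices, impute missing prices, sort by project."""
    seen = set()
    unique = []
    for row in rows:
        if row["post_id"] in seen:
            continue  # redundant record: keep only the first copy
        seen.add(row["post_id"])
        price = float(row["price"]) if row["price"] else None
        unique.append({**row, "price": price})

    # Mean imputation (an assumed strategy): fill gaps with the mean known price.
    known = [r["price"] for r in unique if r["price"] is not None]
    fill = mean(known)
    for r in unique:
        if r["price"] is None:
            r["price"] = fill

    return sorted(unique, key=lambda r: r["project"])


cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Mean imputation is only one option; a median or a per-project mean may be more robust when prices are skewed.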
2.2.2 Exploratory Data Analysis
2.2.2.1 Number of posts factor
Buyers often notice, and are impressed by, listings that an organization posts in
large numbers, which shapes customer behavior. The following statistics show the
number of posts by place category. There are five place categories: apartment,
street house, home, boarding house, and store:
[Figure: bar chart of the number of sale posts per category (CateName)]
Figure 2-6: Number of category post's sale
[Figure: bar chart of the number of rent posts per category; visible categories include street-front house, private house, and shop/kiosk for rent]
Figure 2-7: Number of category post's rent
The dataset covers many kinds of real estate, but most of the data concerns
apartments, for the following reasons. Firstly, viewers tend to remember a kind of
place after reading many related posts. At the same time, the number of posts
reflects customers' interest in the type and scale of real estate that meets their
needs. The two diagrams above show that apartments attract the interest of many
customers and investors. Especially in Vietnam, people look for apartments to buy or
rent in order to settle down; in terms of price and demand, most of these apartments
target people with stable economic conditions or those expanding their business. The
figures below show how large the proportion of apartments in the real estate market is:
[Figure: pie chart for District 7, Ho Chi Minh City, with the sale categories apartment, project land, street-front house, private house, and villa/townhouse]
Figure 2-8: The proportion of post's sale
[Figure: pie chart for District 7, Ho Chi Minh City, with the rent categories apartment, shop/kiosk, boarding house/room, private house, and street-front house]
Figure 2-9: The proportion of post's rent
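The post counts and proportions shown in the charts above amount to a frequency count per category. The sketch below shows one way to compute them; the category labels and the sample list are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical category labels, one per post (translated for readability).
post_categories = [
    "apartment", "apartment", "apartment", "private house",
    "street-front house", "apartment", "private house", "villa",
]


def category_shares(categories):
    """Return {category: (count, share of all posts)} ordered by count."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {name: (n, n / total) for name, n in counts.most_common()}


shares = category_shares(post_categories)
for name, (count, share) in shares.items():
    print(f"{name}: {count} posts ({share:.1%})")
```

The counts correspond to a bar chart and the shares to a pie chart of the same data.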
2.2.2.2 Average price factor
The development of apartment rental and sale prices over the years shows that
apartments occupy an important part of the real estate market. At the same time,
buyers tend to be interested in the scale of real estate, so developers keep
launching new projects.
[Figure: average sale price of apartments in District 7, Ho Chi Minh City, in million VND per m2, by year from 2015 to 2020]
Figure 2-10: Average price of sale
[Figure: average rental price of apartments in District 7, Ho Chi Minh City]
Figure 2-11: Average price of rent
Figure 2-10 demonstrates that the average sale price of apartments increases year by
year as demand increases. According to Lao Dong news [20], tight supply and fluctuating
material prices have caused recent apartment prices to fluctuate continuously. Many
experts believe that house prices in Vietnam are about 20 times higher than the average
income, while in developed countries house prices are only about 7-10 times higher than
the average income. The rental market was active, but rents decreased slightly year by
year (Figure 2-11). In Vietnam, young people, especially Gen Z, often look for an
apartment instead of a house, because getting the house they want requires a lot of
time for research, pricing, and long-term planning. Therefore, a pre-built, furnished
apartment is something that most of them expect to own sooner or later [21].
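The yearly averages discussed above can be obtained by grouping listings by posting year and averaging the price in each group. The following is a minimal sketch under assumed inputs: the (year, price) pairs are invented for illustration and do not come from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year, price per m2) pairs; values are illustrative only.
listings = [(2015, 26.0), (2015, 28.0), (2016, 29.0), (2016, 31.0), (2017, 33.0)]


def average_price_by_year(rows):
    """Group prices by posting year and return the mean price for each year."""
    by_year = defaultdict(list)
    for year, price in rows:
        by_year[year].append(price)
    return {year: mean(prices) for year, prices in sorted(by_year.items())}


yearly = average_price_by_year(listings)
print(yearly)
```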
2.2.2.3 Location factor
In this thesis, I mainly focus on District 7 because it is likened to a Singapore
in the heart of Saigon, with the number of apartment projects growing year by year,
as the previous factors showed. Figure 2-12 shows the distribution of real estate
projects.
Figure 2-12: The distribution of project real estate
In addition, the project owners are also active on the website, with posts spanning
2015-2020 (Figures 2-13 and 2-14). This shows that they have consistently invested in
real estate based on people's needs and that the trend is gradually moving toward
apartment buildings as places to live and settle down.
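[Figure: bar chart of the number of posts per project, District 7, Ho Chi Minh City]
Figure 2-13: Number of posts sale following by project
[Figure: bar chart of the number of posts per project, District 7, Ho Chi Minh City]
Figure 2-14: Number of posts rent following by project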
2.3 Long-Short Term Memory (LSTM) model and Evaluation Metrics Used
2.3.1 Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a type of artificial neural network that processes
sequential or time series data and in which connections between units form a directed
cycle [12]. RNNs can use their internal memory to process arbitrary sequences of
inputs and can therefore learn from previous steps during training.
In theory, RNNs can carry information from earlier steps to later steps. In practice,
this works only over a limited number of states, after which the gradient vanishes; in
other words, the model only learns from states that are close to the current one.
Let's take the example of short-term memory. The problem is to predict the next word
in a passage. In the first passage, "My home is in the direction of ...", I can use
just the previous words in the sentence to guess "east". However, with the passage
"Before, I lived in district 3. My house is in district 7. My home is in the direction
of ...", using only the words in that sentence or the previous sentence is not enough
to predict that the missing word is "east". This illustrates the vanishing gradient of
the RNN, and LSTM was created to overcome this weakness.
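The vanishing gradient can be made concrete with a one-unit RNN. The sketch below assumes the simple recurrence h_t = tanh(w * h_{t-1} + x_t), which is an illustrative choice and not the model used in the thesis, and tracks how strongly the final state still depends on the initial state. The weight 0.5 and the constant input 1.0 are assumed values.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| after each step for h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad, history = h0, 1.0, []
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: d h_t / d h_{t-1}
        history.append(abs(grad))
    return history


# Assumed values: recurrent weight 0.5 and a constant input of 1.0 for 30 steps.
history = gradient_through_time(w=0.5, inputs=[1.0] * 30)
print(f"after 1 step:   {history[0]:.3e}")
print(f"after 30 steps: {history[-1]:.3e}")
```

Because every factor in the product has magnitude below 0.5 here, the gradient shrinks geometrically, which is why early inputs stop influencing learning.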
2.3.2 Long-Short Term Memory
Long-Short Term Memory, abbreviated as LSTM, is a special RNN architecture capable of
learning long-term dependencies, introduced by Hochreiter & Schmidhuber (1997). LSTM
has been shown to overcome many limitations of earlier RNNs related to the vanishing
gradient. An LSTM network is an artificial neural network that includes LSTM units
instead of, or in addition to, other network units. The LSTM unit is a recurrent
network unit that is exceptional at remembering values for either long or short
durations of time [14].
At state $t$ of the LSTM model:
- Output: $c_t$, $h_t$, where $c$ is the cell state and $h$ is the hidden state.
- Input: $c_{t-1}$, $h_{t-1}$, $x_t$, where
• $x_t$ is the input at the $t$-th state of the model,
• $c_{t-1}$ and $h_{t-1}$ are the outputs of the previous layer; $h$ plays the same role
as $s$ in the RNN, while $c$ is the new element in LSTM.
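The mapping from the inputs (x_t, h_{t-1}, c_{t-1}) to the outputs (h_t, c_t) can be sketched with a scalar LSTM cell. This follows the commonly used gate formulation (forget, input, and output gates plus a candidate value); the scalar weights and the input sequence are assumptions chosen only for illustration, and a real model would use weight matrices learned during training.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell: returns the new (h_t, c_t)."""
    def pre(gate):  # pre-activation w_x * x_t + w_h * h_{t-1} + b for one gate
        w_x, w_h, b = params[gate]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of c_{t-1} to keep
    i = sigmoid(pre("input"))         # how much new information to store
    g = math.tanh(pre("candidate"))   # candidate values for the cell state
    o = sigmoid(pre("output"))        # how much of the cell state to expose

    c_t = f * c_prev + i * g
    h_t = o * math.tanh(c_t)
    return h_t, c_t


# Assumed weights (w_x, w_h, b) per gate, chosen only for illustration.
params = {name: (0.5, 0.1, 0.0) for name in ("forget", "input", "candidate", "output")}

h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:.1f}  h={h:.4f}  c={c:.4f}")
```

The cell state c is updated additively (f * c_prev + i * g), which is what lets information persist across many steps.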
The LSTM (Long-Short Term Memory) is a type of Recurrent Neural Network (RNN) widely
used in processing and predicting time-series data. It is characterized by the
following features:
• Memory Cell: LSTM has a "memory cell" to retain information through time steps.
• Gates: The LSTM has three gates, the "Input Gate", the "Forget Gate", and the
"Output Gate", to control the process of storing and transmitting information in the
Memory Cell.
• Status: LSTM has two states, the "hidden state" and the "cell state", to retain
information over time steps.
• Supports long-term data: LSTM can handle long-term data, as it can retain
information through time steps without losing or diminishing it due to the
"vanishing gradient problem".
LSTM is a deep learning algorithm used to predict time series, a type of sequential
data with a time component. Its characteristics make it better at processing data
with temporal relationships between data points, and it can avoid the vanishing
gradient problem when training on long sequences.
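To train a model on a time series, the series is usually cut into fixed-length input windows, each paired with the value that follows it. The sketch below shows this sliding-window step in plain Python; the price values and the window length of 3 are assumptions for illustration, not settings reported in the thesis.

```python
def make_windows(series, window):
    """Split a series into (inputs, target) pairs using a sliding window."""
    samples, targets = [], []
    for start in range(len(series) - window):
        samples.append(series[start:start + window])
        targets.append(series[start + window])
    return samples, targets


# Hypothetical average prices over time; the window length of 3 is an assumption.
prices = [26.0, 26.5, 27.1, 27.0, 27.8, 28.4]
X, y = make_windows(prices, window=3)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
```

Each window becomes one training sequence, and its target is the next value the model should predict.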
Training time and input data volume both affect the quality of the real estate price
predictions made by the LSTM model:
• Training time: The LSTM model needs sufficient training time to predict real estate
prices accurately. If the training time is too short, the model will not learn enough
about the data patterns, which may lead to incorrect predictions. On the other hand,
if the training time is too long, the model may fit the training data too closely and
fail to adapt to new data patterns, resulting in overfitting.
• Input data volume: The amount of input data is also an important factor for
prediction quality. If there is too little input data, the model will not learn
enough from it. If there is too much input data, the model may take too long to
train.
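One common way to balance too little against too much training is early stopping: training halts once the validation error has not improved for a set number of epochs. The thesis does not state that this technique was used, so the sketch below is an assumption, and the validation values and the `patience` setting are hypothetical.

```python
def best_epoch(val_losses, patience=3):
    """Return the epoch to keep, stopping after `patience` epochs without improvement."""
    best_idx, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss stopped improving: likely overfitting
    return best_idx


# Hypothetical validation MSE per epoch: improves, then rises as the model overfits.
val_mse = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.30]
print("keep weights from epoch", best_epoch(val_mse))
```

A larger `patience` tolerates more noisy epochs before stopping, at the cost of longer training.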
The LSTM model has some advantages over other models in processing and
predicting time-series data:
• Long-term signal processing: The LSTM model processes and retains long-term
signals in time-series data better than other models.
• Solving the vanishing gradient problem: The LSTM model effectively mitigates the
vanishing gradient problem, a common issue when processing time-series data with
traditional neural network models.
• Ability to handle null input signals: The LSTM model can handle missing input
signals better than other models.
The LSTM model has some disadvantages, such as slow running speed and the need for a
large amount of input data to learn well. Therefore, the choice of the most suitable
model depends on specific requirements and input data.
2.3.3 Performance Metrics
2.3.3.1 MSE
The mean squared error (MSE) tells you how close a regression line is to a set of
points. It does this by taking the distances from the points to the regression line
(these distances are called the "errors") and squaring them. The squaring is
essential to remove any negative signs. It also gives more weight to larger
discrepancies. It is called the mean squared error because you are finding the
average of a set of squared errors. The lower the MSE, the better the forecast [15].
The formula is:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (2.1)$$
Where:
• $y_i$ is the $i$-th observed value.
• $\hat{y}_i$ is the corresponding predicted value.
• $n$ is the number of observations.
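Equation (2.1) translates directly into a few lines of Python. The sketch below is a plain implementation for illustration; the observed and predicted values are hypothetical.

```python
def mse(observed, predicted):
    """Mean squared error between two equal-length sequences, as in equation (2.1)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    n = len(observed)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / n


# Hypothetical observed and predicted prices.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]
print(mse(y_true, y_pred))
```

Here the squared errors are 1, 0, 4, and 1, so the single error of 2 contributes more than the two errors of 1 combined, which illustrates the extra weight given to larger discrepancies.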
2.3.3.2 RMSE
Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction
errors). A residual is a measure of the distance of a data point from the regression
line; RMSE is a measure of the spread of these residuals. In other words, it tells
you how concentrated the data is around the line of best fit. Root mean square error
is commonly used in climatology, forecasting, and regression analysis to verify
experimental results [16]. The formula is:
$$\mathrm{RMSE}_{fo} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(z_{f_i} - z_{o_i}\right)^2\right]^{1/2} \qquad (2.2)$$
Where:
• $\Sigma$ = summation ("add up")
• $(z_{f_i} - z_{o_i})^2$ = differences, squared
• $N$ = sample size
Equation (2.2) is the square root of the MSE in (2.1), written with different symbols.
If you don't like formulas, you can find the RMSE by:
1. Squaring the residuals
2. Finding the average of the squared residuals
3. Taking the square root of the result
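The three steps listed above map one-to-one onto code. The sketch below is a plain implementation for illustration, with hypothetical observed and predicted values.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, computed in the three steps listed above."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]  # 1. square the residuals
    average = sum(squared) / len(squared)                          # 2. average the squares
    return math.sqrt(average)                                      # 3. take the square root


# Hypothetical observed and predicted prices.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [1.0, 5.0, 4.0, 9.0]
print(rmse(y_true, y_pred))
```

Because of the final square root, the RMSE is expressed in the same unit as the prices, which makes it easier to interpret than the MSE.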
CHAPTER 3: EXPERIMENT RESULT
3.1 Technology used
• Programming language:
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Python supports modules and packages, encouraging program
modularization and code reuse. The Python interpreter and extensive standard library
are available in source or binary form without charge for all major platforms and can
be freely distributed [17].
• Experimental running environment:
import platform

# Query basic information about the machine that runs the experiments
my_system = platform.uname()

print(f"System: {my_system.system}")
print(f"Node Name: {my_system.node}")
print(f"Release: {my_system.release}")
print(f"Version: {my_system.version}")
print(f"Machine: {my_system.machine}")
print(f"Processor: {my_system.processor}")
Total size of Disk   : 136.7 GB (49.1 GB Used)
Total amount of Mem  : 12985 MB (1159 MB Used)
Total amount of Swap : 0 MB (0 MB Used)
System uptime        : 0 days, 0 hour 1 min
Write performance    : 192MB/s
Write IOPS           : 25.0k

Speedtest
Node Name                 Download Speed
main: line 44: ping: command not found
CacheFly                  15.6MB/s
Vultr, Los Angeles, CA    15.1MB/s
Vultr, Seattle, WA        11.5MB/s
Linode, Tokyo, JP         28.9B/s
Linode, Singapore, SG     333MB/s