
Graduation thesis in Information Systems: Real estate forecast in area by recurrent neural network model based on long short-term memory


DOCUMENT INFORMATION

Basic information

Title: Real Estate Price Forecast in District 7 at Ho Chi Minh City by Long Short-Term Memory Model
Author: Do Hoang Hiep
Advisor: Dr. Cao Thi Nhan
Institution: University of Information Technology
Major: Information Systems
Type: Thesis Graduation
Year of publication: 2021
City: Ho Chi Minh City
Pages: 65
File size: 24.57 MB

Content


VIET NAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

ADVANCED PROGRAM IN INFORMATION SYSTEMS

DO HOANG HIEP

THESIS GRADUATION

REAL ESTATE PRICE FORECAST IN

DISTRICT 7 AT HO CHI MINH CITY BY

LONG SHORT-TERM MEMORY MODEL

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

HO CHI MINH CITY, 2021


VIET NAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

ADVANCED PROGRAM IN INFORMATION SYSTEMS

DO HOANG HIEP-18520726

THESIS GRADUATION

REAL ESTATE PRICE FORECAST IN

DISTRICT 7 AT HO CHI MINH CITY BY

LONG SHORT-TERM MEMORY MODEL

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

THESIS ADVISOR

Dr CAO THI NHAN

HO CHI MINH CITY, 2021


ASSESSMENT COMMITTEE

The Assessment Committee is established under the Decision ........, dated ........, by the Rector of the University of Information Technology.

1. ........ - Chairman

2. ........ - Secretary

3. ........ - Member

First of all, I would like to express my gratitude to Dr. Cao Thi Nhan for being more than just my thesis advisor, not only during this graduation thesis but ever since I joined the University of Information Technology as a student of the Faculty of Information Systems. With patience, motivation, and immense knowledge, she helped me keep track of the research direction and gave me a great deal of advice for completing the thesis. Furthermore, she carefully reviewed my thesis, and I am grateful for all her insightful comments, suggestions, and corrections.

I would also like to extend my sincere gratitude to Dr. Ngo Duc Thanh for his invaluable guidance and support during the completion of my thesis. His expertise and insightful suggestions contributed greatly to the successful outcome of my work.

I am very grateful to my university and faculty for allowing me to prepare my graduation report and to meet the teachers who guided me each semester, so that I had enough knowledge to write this thesis.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Objective and scope
2.3 Long-Short Term Memory (LSTM) model and Evaluation Metrics Used
2.3.1 Recurrent Neural Network (RNN)
2.3.2 Long-Short Term Memory
3.1 Technology used
3.2 Data description
3.3 Data pre-processing
3.4 Experiment processes
3.5 Experiment result
CHAPTER 4: CONCLUSION
4.2 Limitations and challenges
REFERENCES

LIST OF FIGURES

Figure 2-1: Data Preparation Process [5]
Figure 2-2: Process Flow of Prediction Model [6]
Figure 2-3: Homepage of https://batdongsan.com.vn/
Figure 2-4: The raw dataset
Figure 2-5: Processed Dataset
Figure 2-6: Number of category post’s sale
Figure 2-7: Number of category post’s rent
Figure 2-8: The proportion of post's sale
Figure 2-9: The proportion of post's rent
Figure 2-10: Average price of sale
Figure 2-11: Average price of rent
Figure 2-12: The distribution of project real estate
Figure 2-13: Number of posts sale following by project
Figure 2-14: Number of posts rent following by project
Figure 2-15: Sale price of place
Figure 2-16: Format data price
Figure 2-17: RNN model [13]
Figure 2-18: LSTM model [13]
Figure 3-1: Environment running
Figure 3-2: Configuration
Figure 3-3: Step by step of experiment processes
Figure 3-4: Add libraries to practice LSTM model
Figure 3-5: Dataframe of sale
Figure 3-6: Normalize the rent dataset
Figure 3-7: Dataframe of rent
Figure 3-8: Normalize the rent dataset
Figure 3-9: Create and reshape sale matrix
Figure 3-10: Create and reshape rent matrix
Figure 3-11: Create and fit LSTM model
Figure 3-12: Training MSE and validation MSE [12]
Figure 3-13: MSE of sale and rent
Figure 3-14: The predicted result of sale
Figure 3-15: The predicted result of sale
Figure 3-16: The selling price data in 2015 “8008092” [25]
The selling price data in 2015 “8004059” [27]
The stored data use for train model in 2015 “8004059”
The selling price data in 2023 “31928898” [29]
The rent price in 2016 “8000576” [30]
The stored data use for train model in 2016 “8000576”
The selling price data in 2023 “32971380” [31]
The rent price in 2016 “8006099” [32]
The stored data use for train model in 2016 “8006099”
The selling price data in 2023 “36228195” [34]
The selling price of apartments in Ho Chi Minh City increased faster than the rental price [22]

LIST OF TABLES

Table 2-1: Data Description
Table 3-1: The result 1st experiment of model evaluation parameters
Table 3-2: The closest and highest mean value 1st experiment
Table 3-3: The result 2nd experiment of model evaluation parameters
Table 3-4: The closest and highest mean value 2nd experiment
Table 3-5: The result 3rd experiment of model evaluation parameters
Table 3-6: The closest and highest mean value 3rd experiment

LIST OF ABBREVIATIONS

LSTM: Long-Short Term Memory
RNN: Recurrent Neural Network
MSE: Mean Square Error
RMSE: Root Mean Square Error
HCMC: Ho Chi Minh City

ABSTRACT

To study the impact of various factors on housing prices, I propose building predictive models based on deep learning that use existing real estate data to predict housing prices, or their future trends, more accurately. Because the factors that influence housing prices vary widely, the proposed predictive models fall into two categories. The first is based on many factors that are characteristic of real estate. The real estate market is one of the most price-focused and volatile markets, which makes it a key area for applying machine learning to predict costs with high accuracy. This study aims to predict house prices in Ho Chi Minh City, specifically District 7, with a Long-Short Term Memory model. It will help clients invest in a property without going through a broker. The results of this research show that the model gives the highest accuracy.

CHAPTER 1: INTRODUCTION

1.1 Background

The main goal of this thesis is to present a methodological framework for using modern machine learning and deep learning techniques, together with external data, in real estate price forecasting. The thesis experiments with the forecasting ability of the Long-Short Term Memory (LSTM) methodology to help buyers, renters, and sellers, especially those with little market information, make more precise choices and reduce risk. The model is suitable for very large data sets that can be expanded arbitrarily, which is also an advantage of applying information technology to data analysis.

New graduates often need to find a room or an apartment after their term in the school dormitory expires. Additionally, real estate prices change every day, especially during relocation seasons. A young person typically stays for about one year before moving to a new place.

Initially, a Recurrent Neural Network (RNN) is used to predict values, using its internal memory to process arbitrary (short-term) sequences of inputs. However, the simple RNN is not good enough for this prediction task, so I decided to use the more complex LSTM architecture, an RNN variant that excels at remembering values over both long and short durations. To get better results, I must create prediction models specifically designed for the real estate market by collecting abundant data from public sources. Additionally, future studies should attempt to extract parameters that ensure higher reliability by performing numerous simulations, even if via a trial-and-error approach.

The application will help users:

• Know the market price trend
• Plan and manage their finances

The data used in this thesis comes from the website https://batdongsan.com.vn and mainly covers District 7 of Ho Chi Minh City. In particular, it covers two typical types of real estate transaction: rent and sale.

The full list of data variables is given in Section 2.2.1.

Various considerations influence the price of a property. Important factors include:

• Location factor
• Buy/rent trend factor
• Sale unit price factor

1.2 Objective and scope

1.2.1 Objective

• Understand how business data analysis and machine learning are applied to produce results
• Develop a price prediction model based on attributes of District 7 listings: price, price level, sale unit price, unit price, city, district, ward, category, start date, end date, lat, long
• Help predict future prices and understand the distribution of apartments/condominiums in District 7

1.2.2 Scope

The thesis uses the Long-Short Term Memory algorithm to predict real estate prices. RMSE is the main metric used to evaluate the model's performance.

1.2.3 Thesis structure

• Chapter 1: Introduction
• Chapter 2: Literature review and theoretical background
• Chapter 3: Experimental result
• Chapter 4: Conclusions

CHAPTER 2: LITERATURE REVIEW AND THEORETICAL BACKGROUND

2.1 Related work

With today's technology, machine learning is being applied in every field, and the real estate sector is no exception. Because real estate prices change significantly, buyers and sellers struggle to decide when to buy or how to adjust a price appropriately. This section describes previous work by several researchers on housing price prediction. Their contributions are as follows.

In 2016, Hujia Yu and Jiafu Wu, students at Stanford University, applied regression and classification to real estate prices [1]. House prices were forecast with several regression techniques, including Lasso, Ridge, SVM regression, and Random Forest. According to this paper, the most effective method for the regression problem was SVR with a Gaussian kernel, with an RMSE of 0.5271; however, visualization for SVM was difficult because of its high dimensionality. According to their analysis, living area in square feet, the roof material, and the neighborhood have the greatest statistical significance in predicting a house's sale price.

At the 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), a group of students from KJ Somaiya College of Engineering used machine learning and neural networks for house price prediction [2]. The paper aims to make evaluations based on every basic parameter that is considered when determining the price. The model uses various regression techniques, and the result is not determined by a single technique but is the weighted mean of several techniques, to give the most accurate results. The results showed that this approach yields lower error and higher accuracy than the individual algorithms.

The prediction of house prices can be treated as a regression problem when sufficient transaction and characteristic information about houses is available. Many studies have forecast housing prices using simple machine learning techniques (such as the examples above) or deep learning neural networks with comparably simple structures [3], [4]. In recent years, some researchers have considered the effect of temporal features on house prices, alongside other house characteristics, and have used time series models to predict housing prices.

Predicting real estate prices is a challenging task that can be inherently inaccurate because it depends on a multitude of factors, including location, the property's style and value, and economic and political conditions. Therefore, real estate price predictions require regular updates and verification to remain accurate.

Based on the information gathered, I chose an upgraded version of the Recurrent Neural Network (RNN) for time series prediction in deep learning, namely Long-Short Term Memory, for its faster training runtime, long-term storage, and ability to prevent the vanishing gradient problem that affects the RNN algorithm. Additionally, the dataset is crawled from https://batdongsan.com.vn because the site is among the most widely used and has associated licenses/certifications.

There are several reasons why deep learning models such as LSTM may be a superior choice for real estate price prediction compared to basic models like Linear Regression or Logistic Regression:

• Superior performance: A deep learning model can build more complex models from multiple layers of neurons, enabling it to learn from the data in a more sophisticated manner and attain better outcomes.
• Automated learning process: Deep learning models automate the learning process and adjust model weights themselves, allowing them to identify important features, and relationships between features, in the data.
• High Availability: A deep learning model can learn from complex structured data, enabling it to make more accurate real estate predictions on such data.

The complexity is illustrated by the fact that real estate prices have been diverse and changing in recent years. They can increase or decrease based on many factors, including the financial market, the real estate market, social and geographical factors, and legal regulations. They can change yearly or monthly and may vary depending on location and market.

However, LSTMs also have some disadvantages, including longer training times and less cost-efficient computation. Furthermore, LSTMs may not perform optimally if the data is complex and lacks relationships between events over a long period of time.

• Data preparation: Before the model parameters can be fitted, the raw data needs to be pre-processed. Redundant attributes must be removed and the remaining ones converted into appropriate data types. Figure 2-1 shows the data preparation process after the data is crawled from the website.

• Building model: When the data is ready, multiple algorithms are trained on it and the best-performing one is selected for forecasting. Figure 2-2 shows the process of applying predictive algorithms.

Figure 2-2: Process Flow of Prediction Model [6]

2.2 Data collection and data set generating

2.2.1 Data collection

The thesis concentrates on the Vietnamese real estate market, specifically HCMC. Although there are many websites where users can look up information, retrieving their data is difficult. The data was crawled from https://batdongsan.com.vn/, a website that collects statistical data on real estate. Figure 2-3 illustrates the homepage of the website.

Figure 2-3: Homepage of https://batdongsan.com.vn/

Because of the limited time available to complete the thesis, it was not possible to cover all districts of Ho Chi Minh City, so I used only District 7 to train the model. I chose this district because many apartments are concentrated there, which provides data suitable for the model.

The crawled dataset is exported to a .csv file that can be opened in Excel. It contains 451,608 records with 50 attributes, holding a lot of information as well as several missing records and some false information. First, I assembled all the datasets into one and sorted them by project name. After that, I used imputation, which synthesizes statistically relevant data for missing values. At the same time, the data was normalized, redundant records were removed to clean the data, and some attributes were reformatted to better fit the models. When the dataset is ready, it is saved as a csv file in a format suitable for training; a thorough explanation is available in the Exploratory Data Analysis section. Figure 2-4 illustrates the raw data as downloaded, and Figure 2-5 shows the processed data.
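The cleaning steps described above (removing redundant records, imputing missing values, and normalizing) can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual pipeline: the field names `post_id` and `price`, the use of the median for imputation, min-max scaling as the normalization, and the sample rows are all assumptions made for the example.

```python
from statistics import median


def preprocess(records):
    """Deduplicate by post id, impute missing prices, and min-max scale them.

    `records` is a list of dicts with hypothetical keys "post_id" and "price";
    a missing price is represented by None.
    """
    # 1. Remove redundant records: keep the first occurrence of each post id.
    seen, unique = set(), []
    for rec in records:
        if rec["post_id"] not in seen:
            seen.add(rec["post_id"])
            unique.append(dict(rec))  # copy so the input is not modified

    # 2. Impute missing prices with the median of the observed prices
    #    (the median is an assumption; the thesis does not name the statistic).
    observed = [r["price"] for r in unique if r["price"] is not None]
    if not observed:
        raise ValueError("at least one record must have a price")
    fill = median(observed)
    for r in unique:
        if r["price"] is None:
            r["price"] = fill

    # 3. Min-max normalize prices into [0, 1].
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0  # avoid division by zero when all prices are equal
    for r in unique:
        r["price_scaled"] = (r["price"] - lo) / span
    return unique


# Made-up sample rows, including one duplicate and one missing price.
rows = [
    {"post_id": 1, "price": 1500.0},
    {"post_id": 2, "price": None},
    {"post_id": 2, "price": None},
    {"post_id": 3, "price": 3500.0},
    {"post_id": 4, "price": 2500.0},
]
clean = preprocess(rows)
```

Scaling to [0, 1] is a common choice before feeding values to an LSTM, since the gates use sigmoid and tanh activations that saturate on large inputs.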

Figure 2-4: The raw dataset

Figure 2-5: Processed Dataset

2.2.2 Exploratory Data Analysis

2.2.2.1 Number of posts factor

Buyers often notice, and are impressed by, listings posted in large numbers by an organization, which shapes customer behavior. Below are the numbers of posts by property category. There are five categories: apartment, street-front house, private house, boarding house, and store.

Figure 2-6: Number of category post’s sale

Figure 2-7: Number of category post’s rent

The dataset covers many kinds of real estate, but most of the data concerns apartments, for the following reasons. First, viewers tend to remember a kind of property after reading many related posts. At the same time, this reflects customers' interest in the type and scale of real estate that meets their needs. The two diagrams above show that apartments attract strong interest from many customers and investors. In Vietnam especially, people look for apartments to buy or rent in order to settle down; given price and demand, most other property types are for those with stable economic conditions or those expanding their business. The figures below show how large the proportion of apartments in the real estate market is:

Figure 2-8: The proportion of post's sale

Figure 2-9: The proportion of post's rent

2.2.2.2 Average price factor

The development of apartment rental and sale prices over the years shows how important a part apartments play in the real estate market. At the same time, buyers tend to be interested in the scale of real estate, so developers constantly promote new development.

Figure 2-10: Average price of sale (Ho Chi Minh City, District 7: average sale price of apartments, in million VND/m2, 2015 to 2020)

Figure 2-11: Average price of rent (Ho Chi Minh City, District 7: average rental price of apartments)

Figure 2-10 shows that the average sale price of apartments increases year by year as demand increases. According to laodongnews [20], tight supply and fluctuating material prices have caused recent apartment prices to fluctuate continuously. Many experts believe that house prices in Vietnam are about 20 times the average income, whereas in developed countries house prices are only about 7 to 10 times the average income. The rental market was active, but rents decreased slightly year by year (Figure 2-11). In Vietnam, young people, especially Gen Z, tend to look for an apartment instead of a house, because getting the house they want requires a lot of time for research, pricing, and long-term planning. Therefore, there is demand for pre-built, furnished apartments that buyers can own sooner rather than later [21].

2.2.2.3 Location factor

In this thesis, I mainly focus on District 7 because it is likened to a Singapore in the heart of Saigon, with the number of apartment projects growing year by year, as the previous factors indicate. Figure 2-12 shows the distribution of real estate projects.

Figure 2-12: The distribution of project real estate

In addition, project owners are active on the website, with posts spanning 2015 to 2020 (Figures 2-13 and 2-14). This shows that they have consistently invested in real estate based on people's needs, and that the trend is gradually moving toward apartment buildings as places to live and settle down.

Figure 2-13: Number of posts sale following by project

Figure 2-14: Number of posts rent following by project

2.3 Long-Short Term Memory (LSTM) model and Evaluation Metrics Used

2.3.1 Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a type of artificial neural network that processes sequential or time series data and in which connections between units form a directed cycle [12]. RNNs can use their internal memory to process arbitrary sequences of inputs, and can therefore learn from previous time steps during training.

In theory, RNNs can carry information from earlier steps to later steps. In practice, this works only over a limited number of states, after which the gradient vanishes; in other words, the model learns only from states that are close to the current one.

Consider an example of short-term memory, where the task is to predict the next word in a passage. Given the passage "My home is in the direction of ...", I can use just the previous words in the sentence to guess "east". However, given the passage "Before, I lived in district 3. My house is in district 7. My home is in the direction of ...", using only the words in that sentence or the previous sentence is not enough to predict that the missing word is "east". This illustrates the vanishing gradient of the RNN, and LSTM was created to overcome this weakness.
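The vanishing gradient described above can be illustrated numerically. The sketch below is not taken from the thesis; it assumes a one-unit RNN of the form h_t = tanh(w * h_{t-1} + x_t), with a hypothetical weight `w = 0.5` and a constant made-up input, and tracks how much the final state still depends on the initial state as the sequence grows.

```python
import math


def gradient_through_time(w, inputs):
    """Return |d h_t / d h_0| after each step of a scalar RNN.

    The RNN is h_t = tanh(w * h_{t-1} + x_t). By the chain rule, each step
    multiplies the gradient by w * (1 - h_t**2), the derivative of tanh
    times the recurrent weight.
    """
    h, grad, history = 0.0, 1.0, []
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # d h_t / d h_{t-1}
        history.append(abs(grad))
    return history


# 30 time steps with a constant illustrative input of 0.1.
grads = gradient_through_time(w=0.5, inputs=[0.1] * 30)
```

Because every factor has magnitude at most `|w|`, the gradient here shrinks at least geometrically, so after 30 steps the earliest input has almost no influence on learning. This is the effect that the LSTM cell state is designed to mitigate.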

2.3.2 Long-Short Term Memory

Long-Short Term Memory, abbreviated as LSTM, is a special RNN architecture capable of learning long-term dependencies, introduced by Hochreiter & Schmidhuber (1997). LSTM has been shown to overcome many limitations of earlier RNNs related to the vanishing gradient. An LSTM network is an artificial neural network that includes LSTM units instead of, or in addition to, other network units. The LSTM unit is a recurrent network unit that is exceptional at remembering values for either long or short durations of time [14].

At state t of the LSTM model:

- Output: c_t, h_t, where c is the cell state and h is the hidden state

- Input: c_{t-1}, h_{t-1}, x_t, where

• x_t is the input at the t-th state of the model

• c_{t-1} and h_{t-1} are the outputs of the previous step; h plays the same role as s in an RNN, while c is the new element of LSTM

The LSTM is a type of Recurrent Neural Network (RNN) widely used for processing and predicting time-series data. It is characterized by the following features:

• Memory cell: LSTM has a "memory cell" that retains information across time steps.

• Gates: LSTM has three gates, the "input gate", the "forget gate", and the "output gate", which control how information is stored in and passed on from the memory cell.

• States: LSTM has two states, the "hidden state" and the "cell state", which retain information over time steps.

• Support for long-term data: LSTM can handle long sequences because it retains information across time steps without it being lost or diminished by the "vanishing gradient problem".

LSTM is a deep learning algorithm used to predict time series, a type of sequential data with a time component. Its characteristics make it better at processing data with temporal relationships between data points, and help it avoid the vanishing gradient problem when training on long sequences.
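To make the roles of the memory cell, the three gates, and the two states concrete, here is a minimal sketch of one step of an LSTM cell using the standard textbook equations. It is a scalar (one-unit) version written for readability, not the layer used in the thesis experiments; the parameter layout `(w_x, w_h, bias)` per gate and all numeric values are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    `params` maps each of "forget", "input", "output", "candidate" to a
    hypothetical (w_x, w_h, bias) triple. Returns (h_t, c_t).
    """
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))      # how much of c_prev to keep
    i = sigmoid(pre_activation("input"))       # how much new content to write
    o = sigmoid(pre_activation("output"))      # how much of the cell to expose
    g = math.tanh(pre_activation("candidate")) # candidate cell content

    c_t = f * c_prev + i * g       # cell state: additive update
    h_t = o * math.tanh(c_t)       # hidden state: gated view of the cell
    return h_t, c_t


# Run three steps over a short made-up input sequence.
params = {name: (0.5, 0.5, 0.0)
          for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(x, h, c, params)
```

The additive form of the cell update (`f * c_prev + i * g`) is what lets information and gradients pass across many time steps when the forget gate stays close to 1.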

The training time and the volume of input data both affect the quality of the LSTM model's real estate price predictions:

• Training time: The LSTM model needs sufficient training time to predict real estate prices accurately. If the training time is too short, the model will not learn enough about the patterns in the data, which may lead to incorrect predictions. On the other hand, if the training time is too long, the model may learn too much about the input data and fail to adapt to new data patterns, resulting in overfitting.

• Input data volume: The amount of input data is also an important factor in the quality of the LSTM's real estate price predictions. If there is too little input data, the model won't learn enough from the data. If there is too much input data, the model may take too long to run while learning from the data.
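One common way to balance the training-time trade-off described above is early stopping: monitor the validation error after each epoch and keep the weights from the epoch where it was lowest. The sketch below illustrates the idea only; early stopping is not stated to be part of the thesis's method, and the function name, the `patience` parameter, and the validation MSE values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights should be kept.

    Scans the validation losses in order and stops once the loss has not
    improved for `patience` consecutive epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation error keeps rising: likely overfitting
    return best_epoch


# Illustrative validation MSE per epoch: improves, then degrades.
val_mse = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.70]
stop_at = early_stopping_epoch(val_mse)
```

In this made-up run the validation error bottoms out at epoch index 3, so training beyond that point only fits the training data more closely without improving predictions on unseen data.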

The LSTM model has some advantages over other models in processing and predicting time-series data:

• Long-term signal processing: The LSTM model is better than other models at processing and retaining long-term signals in time series data.

• Solving the vanishing gradient problem: The LSTM model effectively solves the vanishing gradient problem, a common issue when processing time-series data with traditional neural network models.

• Ability to handle null input signals: The LSTM model can handle missing input signals better than other models.

The LSTM model also has some disadvantages, such as slow running speed and the need for a large amount of input data to learn well. Therefore, the choice of the most suitable model depends on the specific requirements and the input data.

2.3.3 Performance Metrics

2.3.3.1 MSE

The mean squared error (MSE) tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are called the "errors") and squaring them. The squaring removes any negative signs and also gives more weight to larger discrepancies. It is called the mean squared error because you are finding the average of a set of squared errors. The lower the MSE, the better the forecast [15]. The formula is:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{2.1}$$

Where:

• $y_i$ is the i-th observed value.

• $\hat{y}_i$ is the corresponding predicted value.

• $n$ is the number of observations.
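As a small worked example of the MSE definition above, the following sketch computes the metric for four observations. The function name and the sample values are illustrative only.

```python
def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return sum(squared_errors) / len(observed)


y = [3.0, 5.0, 2.0, 7.0]       # observed values (made up)
y_hat = [2.0, 5.0, 4.0, 8.0]   # predicted values (made up)

# Errors are 1, 0, -2, -1; their squares sum to 6, so MSE = 6 / 4 = 1.5.
mse = mean_squared_error(y, y_hat)
```

Note how the single error of magnitude 2 contributes 4 of the 6 units of squared error, which is the "more weight to larger discrepancies" effect.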

2.3.3.2 RMSE

Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). A residual measures how far a data point is from the regression line; RMSE measures the spread of these residuals. In other words, it tells you how concentrated the data is around the line of best fit. RMSE is commonly used in climatology, forecasting, and regression analysis to verify experimental results [16]. The formula is:

$$\mathrm{RMSE}_{fo} = \left[ \sum_{i=1}^{N} \frac{(z_{f_i} - z_{o_i})^2}{N} \right]^{1/2} \tag{2.2}$$

Where:

• $\Sigma$ = summation ("add up")

• $(z_{f_i} - z_{o_i})^2$ = differences between predicted (f) and observed (o) values, squared

• $N$ = sample size

Equivalently, you can find the RMSE by:

1. Squaring the residuals

2. Finding the average of the squared residuals

3. Taking the square root of the result
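The three steps listed above translate directly into code. This is a minimal sketch with made-up values; the function name is hypothetical, and the averaging in step 2 is applied to the squared residuals.

```python
import math


def root_mean_squared_error(observed, predicted):
    """RMSE computed in the three steps: square, average, square root."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [f - o for f, o in zip(predicted, observed)]
    squared = [r * r for r in residuals]   # step 1: square the residuals
    mean_sq = sum(squared) / len(squared)  # step 2: average the squares
    return math.sqrt(mean_sq)              # step 3: take the square root


observed = [2.0, 4.0, 6.0, 8.0]    # illustrative observed values
predicted = [4.0, 2.0, 8.0, 6.0]   # illustrative predictions, each off by 2

rmse = root_mean_squared_error(observed, predicted)
```

Because RMSE is the square root of MSE, it is expressed in the same unit as the predicted quantity, which makes it easier to interpret for prices than MSE.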


CHAPTER 3: EXPERIMENT RESULT

3.1 Technology used

• Programming language: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Python supports modules and packages, encouraging program modularization and code reuse. The Python interpreter and extensive standard library are available free of charge, in source or binary form, for all major platforms and can be freely distributed [17].

• Experimental running environment:

    import platform

    # Collect basic information about the machine running the experiments.
    my_system = platform.uname()

    print(f"System: {my_system.system}")
    print(f"Node Name: {my_system.node}")
    print(f"Release: {my_system.release}")
    print(f"Version: {my_system.version}")
    print(f"Machine: {my_system.machine}")
    print(f"Processor: {my_system.processor}")

    Total size of Disk   : 136.7 GB (49.1 GB Used)
    Total amount of Mem  : 12985 MB (1159 MB Used)
    Total amount of Swap : 0 MB (0 MB Used)
    System uptime        : 0 days, 0 hour 1 min
    Write performance    : 192MB/s
    Write IOPS           : 25.0k

    Speedtest
    Node Name                 IPv4 address    Download Speed
    main: line 44: ping: command not found
    CacheFly                                  15.6MB/s
    main: line 44: ping: command not found
    Vultr, Los Angeles, CA                    15.1MB/s
    main: line 44: ping: command not found
    Vultr, Seattle, WA                        11.5MB/s
    main: line 44: ping: command not found
    Linode, Tokyo, JP                         28.9B/s
    main: line 44: ping: command not found
    Linode, Singapore, SG                     333MB/s
    main: line 44: ping: command not found

Date posted: 23/10/2024, 00:46
