HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
BACHELOR THESIS
DEVELOPMENT OF A MOBILE APPLICATION FOR
PRICE PREDICTION OF REAL ESTATES
Major: Computer Science
Committee: Computer Science 2
Supervisor: Assoc. Prof. Dr. Quan Thanh Tho
Reviewer: Assoc. Prof. Dr. Bui Hoai Thang
FACULTY: Computer Science and Engineering
GRADUATION THESIS ASSIGNMENT
DEPARTMENT: Computer Science
Note: The student must attach this sheet to the first page of the thesis report
FULL NAME: Phạm Tuấn Anh Student ID: 1651006
1 Thesis title:
Development of a mobile application for price prediction of real estates
2 Tasks (requirements on content and initial data):
✔ Investigate background technologies and frameworks to build the mobile application.
✔ Analyze and design the desired mobile application.
✔ Research the theory of Linear Regression.
✔ Implement a price prediction AI model using a Linear Regression model.
✔ Implement a prototype.
3 Date of thesis assignment:
4 Date of completion:
5 Supervisor's full name: Part supervised:
Assoc. Prof. Dr. Quản Thành Thơ
FOR THE FACULTY AND DEPARTMENT:
Approved by (preliminary grading):
Date (day, month, year):
THESIS DEFENSE GRADING FORM
(For the supervisor/reviewer)
1 Student's full name: Phạm Tuấn Anh
Student ID: 1651006 Major (specialization): Computer Science
2 Thesis topic: Development of a mobile application for price prediction of real estates
3 Supervisor/reviewer's full name: Assoc. Prof. Dr. Quản Thành Thơ
4 Overview of the thesis report:
- Number of hand-drawn drawings: Number of computer-generated drawings:
6 Main strengths of the thesis:
The student has successfully developed a mobile application that can process the collected data and visualize information in a meaningful way. The student has also employed an AI technique to predict prices of the real estates based on the historical data.
7 Main shortcomings of the thesis:
The thesis needs to be elaborated in many parts to provide more details and discussion about the technologies used.
8 Recommendation: Approved for defense □ Needs additions before defense □ Not approved for defense □
9 Three questions the student must answer before the Committee:
a
b
c
10 Overall assessment (in words: excellent, good, average): Score: 7.3/10
Signature (with full name)
THESIS DEFENSE GRADING FORM
(For the supervisor/reviewer)
1 Student's full name: Phạm Tuấn Anh
Student ID: 1651006 Major (specialization): Computer Science
2 Thesis topic: Development of a mobile application for price prediction of real estates
3 Supervisor/reviewer's full name: Bùi Hoài Thắng
4 Overview of the thesis report:
- Number of hand-drawn drawings: Number of computer-generated drawings:
6 Main strengths of the thesis:
- Showed an understanding of some machine learning techniques based on Linear Regression, such as Simple Linear Regression, Multiple Linear Regression, and Polynomial Regression. Those techniques were used in predicting data, especially real estate prices.
- Designed and implemented a mobile application for users to predict real estate prices based on the location of the properties, such as City, District, Ward, and Street.
7 Main shortcomings of the thesis:
8 Recommendation: Approved for defense X Needs additions before defense Not approved for defense
9 Three questions the student must answer before the Committee:
a
b
c
10 Overall assessment (in words: excellent, good, average): Average Score: 7.0/10
Signature (with full name)
Bùi Hoài Thắng
We declare that this thesis was carried out by ourselves under the guidance and supervision of Associate Prof. Dr. Quan Thanh Tho. The figures presented in this thesis for analysis and evaluation are our own work. In addition, figures taken from other sources are explicitly cited in the references. We take full responsibility for any fraud detected in our thesis.
First of all, I would like to express my sincere thanks to my supervisor, Associate Professor Doctor Quan Thanh Tho of the Faculty of Computer Science and Engineering at Ho Chi Minh City University of Technology, who instructed me in the knowledge and technologies that apply to my topic. During the thesis period, he helped me learn new technology effectively and supported me with every small problem so that I could finish my work.
Besides, I am also extremely grateful to my family for providing me with unfailing support and continuous encouragement throughout my years of study. This accomplishment would never have been possible without them.
The real estate market in Vietnam is developing strongly and attracts many investors. When investing in or buying real estate, price is the factor that concerns investors the most. According to investors, searching and comparing real estate prices on many websites in order to make price predictions takes them a lot of time. Therefore, they want a tool that solves this problem. In this thesis, we develop a mobile application that applies an artificial intelligence model to price prediction, and we propose directions for further development.
Declaration
Acknowledgement
Abstract
List of figures
Chapter 1 Introduction
1.1 Introduction to topic
1.2 General Objectives and Scope of topic
Chapter 2 Background
2.1 React Native
2.2 Node.js
2.3 Firebase
2.3.1 Firebase Authentication
2.3.2 Firebase Real-time Database
2.4 Python and supported libraries
2.5 PostgreSQL
Chapter 3 Price Prediction Model
3.1 Linear Regression
3.1.1 Linear Regression
3.1.2 Polynomial Features
3.1.3 Cost function
3.1.4 Model Evaluation
3.2 Data preparation
3.2.1 Data set using
3.2.2 Outliers Removal
3.3 Training Model
Chapter 4 Implementation
4.1 System Architecture
4.2 Use-case Diagram
4.3 Activity Diagram
Chapter 5 Conclusion
6.1 Achieved Results
6.2 Future work of the Thesis
BIBLIOGRAPHY
LIST OF FIGURES
3.1 Data set using
3.2 Describing Interquartile Range and Outliers
3.3 Training model with Simple Linear Regression
3.4 Training model Polynomial Regression (Degree of 6)
3.5 The best degree of predicting line in District 7 (2-degree model)
3.6 The best degree of predicting line in District 6 (3-degree model)
4.1 System Architecture
4.2 Use-case Diagram of the app
4.3 Register Flow chart
4.4 Login Flow chart
4.5 Log out flow chart
4.6 View profile flow chart
4.7 Visualize chart Flow chart
4.8 Manage users Flow Chart
1.1 Introduction to topic
Real estate price is one of the vital factors affecting the investment decisions of real estate investors. Therefore, they need a forecasting model to help them predict the price of a particular property. With the fast development of artificial intelligence, AI models are increasingly being applied to price prediction.
In this thesis, we develop a mobile application incorporating an AI model (a model that is based on collected data and generates the prediction result) to give investors a useful tool when investing in or buying a property. A mobile application is convenient to carry when investors travel. When they reach a destination, they simply choose that location and the type of land in the app, and the system automatically generates a chart that describes the predicted price.
1.2 General Objectives and Scope of topic
The main aim of this topic is to develop a mobile application with the following main features:
- Register/Login/Logout: Before using our system, users need to sign up for a unique account. After that, users can log in to and log out of the system; users must be logged in to use it.
- Visualize chart: After users provide the required information, the application generates a chart based on that information.
- View profile: Users and admins can view their own profile information.
- Manage users: The administrator has the right to manage users: he/she can view the list of all users who have used the application and the detailed information of each person, and find specific users by name or email. In addition, the admin has permission to delete users.
- Display price-prediction colors on Google Maps: if the price of land has an upward trend, that area is displayed in green; otherwise it is displayed in red.
Since the volume of real estate data is very large, we limit our system as follows:
- The area covered is Ho Chi Minh City, Vietnam.
- We predict prices of land only, not of whole real estate properties. Real estate usually comprises buildings and land. However, buildings on the same type of land have a wide range of characteristics, which leads to different prices for a property.
- The app should be deployed on Android devices.
- The app should handle at least 1000 users without any problems.
- The app's response time for any function should be less than 10 seconds.
- The app size is at most 200 MB.
Chapter 2
Background
In this chapter, we discuss the technologies used to build the application.
Contents
2.1 React Native
2.2 Node.js
2.3 Firebase
2.3.1 Firebase Authentication
2.3.2 Firebase Real-time Database
2.4 Python and supported libraries
2.5 PostgreSQL
2.1 React Native
React Native is a framework released by the technology company Facebook in 2015. It is used for creating mobile apps for both the Android and iOS platforms in one common language, JavaScript; because of this, React Native apps save development time. Furthermore, with the React Native framework, you can render the UI for both iOS and Android platforms. It is an open-source framework.
2.2 Node.js
Modern apps have several requirements that cannot be met by the app itself, such as central data storage, communication routing, and user management. To provide such services, apps rely on an external software component known as the back-end. The back-end is executed on one or more remote servers, listens for network requests from the devices that run the app, and provides them with the services those requests require. With Node.js, the back-end is written almost entirely in JavaScript.
2.3 Firebase
2.3.1 Firebase Authentication
Most apps need to know the identity of a user. Knowing a user's identity allows an app to securely save user data in the cloud and provide the same personalized experience across all of the user's devices.
Firebase Authentication provides backend services and ready-made UI libraries to authenticate users to the app. It supports authentication using passwords, phone numbers, popular federated identity providers like Google, Facebook and Twitter, and more.
2.3.2 Firebase Real-time Database
The Firebase Real-time Database is a NoSQL database in which we can store data and sync it between our users in real time. Real-time syncing makes it easy for users to access their data from any device, web or mobile. The Real-time Database integrates with Firebase Authentication to provide simple and intuitive authentication.
2.4 Python and supported libraries
Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Python supports modules and packages, which encourages program modularity and code reuse. Python is becoming increasingly popular in Machine Learning, along with its frameworks and standard libraries. One of the reasons behind Python's increasing popularity is its wealth of libraries. Some libraries used for training the model are as follows:
- NumPy: NumPy is a Python library used for working with arrays. It is an open-source project and you can use it freely. In Python we have lists that serve the purpose of arrays, but they are slow to process. NumPy aims to provide an array object that is much faster than traditional Python lists.
- SciPy: SciPy is a scientific computation library that uses NumPy underneath. It provides more utility functions for optimization, statistics and signal processing. SciPy adds optimized versions of functions that are frequently used with NumPy.
- Matplotlib: Matplotlib is a plotting library used for 2D graphics. Matplotlib can be used in Python scripts, the Jupyter notebook, web application servers, and more.
- Pandas: Pandas is an open-source library made for working with relational or labeled data. It provides various data structures and operations for manipulating numerical data and time series. Pandas is fast and offers high performance and productivity.
- Scikit-learn: Scikit-learn is a library for machine learning in Python. It contains many efficient tools for machine learning and statistical modeling.
- Flask: Flask is a web application framework written in Python. It is a good choice for building an API for a machine learning service because it is easy to use and works with many Python libraries.
2.5 PostgreSQL
PostgreSQL is a powerful, open-source object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads. PostgreSQL runs on all major operating systems and is the open-source relational database of choice for many people and organizations. It comes with many features aimed at helping developers build applications.
Chapter 3
Price Prediction Model
In this chapter, we discuss the theory of the Linear Regression model, the processing of the data sets, and the building of the price prediction model based on the theory of Linear Regression.
Contents
3.1 Linear Regression
3.1.1 Linear Regression
3.1.2 Polynomial Features
3.1.3 Cost Function
3.1.4 Model Evaluation
3.2 Data preparation
3.2.1 Data set using
3.2.2 Outliers Removal
3.3 Training model
3.1 Linear Regression
3.1.1 Linear Regression
Linear Regression is a common and simple machine learning algorithm used for predictive analysis in statistics. In statistics, linear regression is a linear approach to modeling the relationship between an output (the dependent variable) and one or more input variables (the independent variables). Linear Regression models have been applied in many real-life areas to solve predictive problems. Here we apply a Linear Regression model to predict the future price of a particular piece of land based on the past prices of that property. There are two types of Linear Regression: Simple Linear Regression and Multiple Linear Regression.
Simple Linear Regression is a linear model that has only one dependent variable (the output) and only one independent variable (the input variable). The Simple Linear model can be represented by the following equation:
$$ y = b_0 + b_1 x $$
Where:
$b_0$ is called the intercept
$b_1$ is the coefficient of $x$, which is the input (independent) variable
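To make the Simple Linear Regression equation y = b0 + b1·x concrete, the following is a minimal sketch that estimates the intercept and the coefficient by ordinary least squares. It is only an illustration, not the thesis's implementation: it uses the Python standard library rather than the libraries listed in Chapter 2, and the function name `fit_simple_linear_regression` as well as the month and price values are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) that minimize the squared error of y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx             # coefficient of x
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


# Hypothetical data: month index versus land price (arbitrary units).
months = [1, 2, 3, 4, 5]
prices = [52.0, 54.0, 56.0, 58.0, 60.0]
b0, b1 = fit_simple_linear_regression(months, prices)
next_month_price = b0 + b1 * 6  # prediction for month 6
```

Because the sample prices lie exactly on a straight line, the fitted line reproduces them and the prediction for month 6 simply continues the trend.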
Multiple Linear Regression is also a linear model; it has a target variable (the dependent variable) and a set of independent variables $\{x_1, x_2, \ldots, x_n\}$. A Multiple Linear Regression can be seen as a generalization of a Simple Linear Regression. It is described by the following equation:
$$ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n $$
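As a small illustration of the Multiple Linear Regression form y = b0 + b1·x1 + ... + bn·xn, the sketch below evaluates a prediction as the intercept plus a weighted sum of the input variables. The function name `predict_multiple` and all numeric values are hypothetical and chosen only for the example; they are not coefficients from the thesis.

```python
def predict_multiple(intercept, coefficients, features):
    """Evaluate y = b0 + b1*x1 + ... + bn*xn for one sample."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical model with two input variables: b0 = 10, b1 = 2, b2 = 0.5.
example_prediction = predict_multiple(10.0, [2.0, 0.5], [3.0, 4.0])
```

With a single feature the function reduces to the Simple Linear Regression equation, which is the sense in which the multiple model generalizes the simple one.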
3.1.2 Polynomial Features
In Simple Linear Regression, the prediction line is a straight line that shows the relationship between the target variable and the input variable. The data used for training the model is often complicated, so a straight line no longer predicts well enough. To solve this problem, we use another approach that better captures the important relationships between the input variables and the target variable. This approach uses Polynomial Features, which are features created by raising existing features to an exponent (creating new input features based on the existing features). This gives another type of Linear Regression called Polynomial Regression (polynomial regression extends the linear model by adding extra features, obtained by raising each of the original features to a power), which has the following form:
$$ y = b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n $$
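The idea of Polynomial Features can be sketched in a few lines: each input value x is expanded into the powers x, x², ..., xⁿ, and the model stays linear in the coefficients b0 ... bn. The code below is an illustrative sketch using only the standard library; the function names `polynomial_features` and `predict_polynomial` and the sample coefficients are hypothetical, and the thesis itself lists scikit-learn among its training libraries.

```python
def polynomial_features(x, degree):
    """Expand one input value into [x, x**2, ..., x**degree]."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return [x ** k for k in range(1, degree + 1)]


def predict_polynomial(coefficients, x):
    """Evaluate y = b0 + b1*x + b2*x**2 + ... for coefficients [b0, b1, b2, ...]."""
    return sum(b * x ** k for k, b in enumerate(coefficients))


features = polynomial_features(2.0, 3)            # powers of 2.0 up to degree 3
value = predict_polynomial([1.0, 2.0, 3.0], 2.0)  # 1 + 2*2 + 3*2**2
```

Once the features are expanded, fitting the coefficients is the same problem as in Multiple Linear Regression, with the powers of x playing the role of the separate input variables.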
3.1.3 Cost Function
J is the Cost Function
h(x) is the prediction function (the function that describes the relationship between the input variables and the target variable)
y is the real value in the data samples
To find the prediction function h(x) (the prediction line), we need to find the minimum of the Cost Function. One method for finding the minimum of this function is Gradient Descent (an iterative optimization algorithm for finding the minimum of a function).
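The following sketch shows how Gradient Descent can minimize a cost function for a straight-line prediction function h(x) = b0 + b1·x. The exact cost formula is not reproduced in this text, so the code assumes the commonly used halved mean squared error, J = (1 / 2m) · Σ (h(x_i) − y_i)²; that formula, the learning rate, the iteration count, and the sample data are all assumptions made for illustration.

```python
def cost(b0, b1, xs, ys):
    """Assumed cost: J = (1 / 2m) * sum((h(x_i) - y_i)**2), with h(x) = b0 + b1*x."""
    m = len(xs)
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    """Repeatedly step b0 and b1 against the gradient of the assumed cost."""
    b0, b1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [b0 + b1 * x - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / m                             # dJ/db0
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/db1
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Hypothetical samples generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
initial_cost = cost(0.0, 0.0, xs, ys)
b0, b1 = gradient_descent(xs, ys)
final_cost = cost(b0, b1, xs, ys)
```

The learning rate controls the step size: too large a value makes the updates diverge, while too small a value needs many more iterations to approach the minimum.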
3.1.4 Model Evaluation
The main objective of Linear Regression is to find a prediction line that minimizes the prediction error over all the data points. Thus, we need metrics to evaluate the accuracy of the trained models. There are many evaluation metrics, but we focus mainly on two common ones: Root Mean Square Error (RMSE) and R-squared score (R²).
Root Mean Square Error (RMSE)
Root Mean Square Error (RMSE) is the square root of the average squared difference between the actual values and the predicted values in the data set.
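$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $$
Where:
$y_i$ is the actual value
$\hat{y}_i$ is the predicted value
$n$ is the number of data points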
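As a worked example of the RMSE metric, the sketch below computes the square root of the mean squared difference between actual and predicted values. It uses only the standard library; the function name `rmse` and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical actual and predicted prices; the squared errors are 1, 0 and 4.
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

RMSE is expressed in the same unit as the target variable, so a lower value means the predicted prices lie closer to the actual prices on average.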