VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
THONG THI KIM ANH
COST ESTIMATING FOR DESIGN AND BUILD (D&B) PROJECTS ACCORDING TO CONDITION IN VIETNAM
Major: Construction Management
Major code: 8580302
MASTER’S THESIS
THIS RESEARCH IS COMPLETED AT:
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY – VNU-HCM
Supervisor: Assoc. Prof. Long Duc LUONG
Examiner 1: Dr Nhat Minh HUYNH
Examiner 2: Dr Ngoc Chau DANG
This master’s thesis was defended at Ho Chi Minh City University of Technology, VNU-HCM, on 13th July 2023.
Master’s Thesis Committee:
1. Dr. Thu Anh NGUYEN - Chairman
2. Assoc. Prof. Hoc Duc TRAN - Member, Secretary
3. Dr. Minh Nhat HUYNH - Reviewer 1
4. Dr. Chau Ngoc DANG - Reviewer 2
5. Dr. Cuong Viet CHU - Member
Approval of the Chairman of Master’s Thesis Committee and Dean of Faculty of Civil Engineering after the thesis being corrected (If any)
VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM Independence – Freedom - Happiness
THE TASK SHEET OF MASTER’S THESIS
Full name: Thong Thi Kim Anh Student ID: 217085
Date of birth: 18/04/1985 Place of birth: Ninh Thuan
Major: Construction Management Major ID: 8580302
I. THESIS TITLE:
COST ESTIMATING FOR DESIGN AND BUILD (D&B) PROJECTS ACCORDING TO CONDITION IN VIETNAM
ƯỚC TÍNH CHI PHÍ CHO CÁC DỰ ÁN THIẾT KẾ VÀ THI CÔNG THEO ĐIỀU KIỆN TẠI VIỆT NAM
II. TASKS AND CONTENTS:
THE GOAL OF THE RESEARCH IS TO PROVIDE A MORE PRECISE METHOD FOR COST ESTIMATION OF NEW PROJECTS BY USING THE DATA ALREADY AVAILABLE FROM PREVIOUS PROJECTS.
III. THESIS START DAY: 6TH FEBRUARY 2023
IV. THESIS COMPLETION DAY: 11TH JUNE 2023
V. SUPERVISOR: LUONG DUC LONG, ASSOCIATE PROFESSOR
Ho Chi Minh City, date ………
SUPERVISOR
(Full name and signature)
LUONG DUC LONG
HEAD OF DEPARTMENT
(Full name and signature)
DEAN OF FACULTY OF CIVIL ENGINEERING
(Full name and signature)
ACKNOWLEDGEMENT
It is my pleasure to extend my appreciation to all those who helped me to complete this master's thesis.
Most importantly, I am greatly indebted to my supervisor, Associate Professor Luong Duc Long, for his excellent advice, encouragement, and support throughout my master's degree. He gave me every opportunity to grow as a researcher during the master's program. His perceptive guidance, keen advice, and support have been the source of my inspiration.
Next, I would like to thank the professors of the Department of Construction Management, Faculty of Civil Engineering, for their dedication in teaching and imparting specialized knowledge during my studies at the school.
I also thank my colleagues and friends who gave their comments, participated in surveys, shared valuable knowledge and experience, and supported me in completing this research.
Finally, I would like to thank my classmates for sharing student life and studying with me during the master's course.
ABSTRACT
In the construction industry, particularly during the tender stage, accurate cost estimates play a crucial role in the Design & Build procurement delivery process. These estimates serve as essential data for decision-makers who must determine whether to continue or stop the project. Additionally, they are vital for effectively mobilizing capital to ensure successful project execution. Therefore, achieving high accuracy in cost estimates at an early stage is of utmost importance.
Historically, traditional cost estimating techniques have not been able to effectively leverage existing knowledge from previous projects and their associated costs. Consequently, these methods often produce significant variance, are time-consuming and error-prone, and significantly affect the financial aspects of proposal drafting.
Fortunately, advancements in computer technology and mathematical programming have paved the way for more sophisticated cost estimating strategies that rely on large datasets and complex procedures. Artificial Intelligence (AI) methods have emerged as a product of these breakthroughs.
This research aims to develop a precise AI-based approach to cost estimation, utilizing data from previous projects to estimate costs for new ones. The main task is to establish an AI methodology for cost estimation and compare its accuracy with existing methods.
The study has identified that AI approaches have the potential to address the limitations of conventional methods. Recent research has shown promising results using Case-Based Reasoning (CBR), Random Forest (RF), and Artificial Neural Networks (ANN) to overcome these challenges and successfully estimate project costs.
Keywords: Design-Build; Parametric estimating; Comparative estimating; Artificial
MASTER’S THESIS SUMMARY
In the construction industry, especially during the tender stage, accurate cost estimation plays an important role, particularly for the Design & Build form of contract. These estimates serve as essential data for decision-makers who must determine whether to continue or stop the project. In addition, they are very important for mobilizing capital effectively to ensure successful project execution. Therefore, achieving high accuracy in cost estimation at an early stage is very important.
History shows that traditional cost estimating techniques cannot make full and effective use of existing knowledge from previous projects and their associated costs. As a result, these methods often lead to significant variance, are time-consuming and error-prone, and significantly affect the financial side of proposal drafting.
Fortunately, advances in computer technology and mathematical programming have opened the way for more sophisticated cost estimating strategies based on large datasets and complex procedures. Artificial Intelligence (AI) methods have emerged as a product of these breakthroughs.
This research aims to develop an accurate AI approach to cost estimation, using data from previous projects to estimate costs for new projects. The main task is to build an AI approach for cost estimation and compare its accuracy with existing methods.
The study has determined that AI methods have the potential to address the limitations of traditional methods. Recent research has shown promising results using Case-Based Reasoning (CBR), Random Forest (RF), and Artificial Neural Networks (ANN) to overcome these challenges and successfully estimate project costs.
Keywords: Design-Build; Parametric estimating; Comparative estimating; Artificial neural network; Lý
AUTHOR’S COMMITMENT
I, the undersigned:
Name: Thong Thi Kim Anh Student ID: 2170852
Place and date of birth: Ninh Thuan, April 18, 1985 Address: Ho Chi Minh City
hereby declare that the master's thesis entitled “Cost Estimating for Design and Build Projects according to condition in Vietnam” was carried out by the author under the supervision of the instructor. All works, ideas, and material obtained from other references have been cited correctly.
Ho Chi Minh City, June 10th 2023
CONTENTS
THE TASK SHEET OF MASTER’S THESIS
ACKNOWLEDGEMENT
ABSTRACT
MASTER’S THESIS SUMMARY
AUTHOR’S COMMITMENT
CHAPTER 1 INTRODUCTION
1.1- Definitions
1.2- Research background
1.3- Problem statement
1.4- Research Goal
1.5- Relevance
CHAPTER 2 OVERVIEW
2.1- Conventional cost estimation approaches
2.2- Artificial intelligence estimation methods
2.3- The appropriate cost estimation method
2.4- Related researches
CHAPTER 3 RESEARCH METHODOLOGY
3.1- Machine Learning (ML)
3.2- Research Steps
3.3- Random Forest (RF)
3.4- Artificial neural networks (ANN)
3.5- Case based reasoning (CBR)
CHAPTER 4 COST DATA AS AN EXAMPLE APPLICATION
4.1- Problem Definition
4.3- Identify the factors and profile of casebase
4.4- Selecting performance evaluation metrics
4.5- Performance evaluation
4.6- Result Comparison and Discussion
CHAPTER 5 CONCLUSION
5.1- Conclusion
5.2- Future research direction
LIST OF PUBLICATIONS
REFERENCES
LIST OF FIGURES
Figure 1.1- Design-Bid-Build
Figure 1.2- Design & Build
Figure 1.3- Transition from early cost estimation to final cost
Figure 3.1- Machine learning procedure
Figure 3.2- Different types of machine learning
Figure 3.3- Research steps
Figure 3.4- Model of Random Forest
Figure 3.5- Structure of neural network
Figure 3.6- Processing Unit
Figure 3.7- Identity function
Figure 3.8- Sigmoid function
Figure 3.9- Hyperbolic Tan Function
Figure 3.10- Feedforward propagation Network
Figure 3.11- Recurrent neural network
Figure 3.12- Back-propagation
Figure 3.13- CBR cycle process
Figure 3.14- Formatting and Data Organization
Figure 3.15- Formula of similarity case for Textual symbol
Figure 3.16- Formula of similarity case for numerical symbol
Figure 3.17- Calculation of the Weight
Figure 3.18- GA Solver Program
Figure 3.19- Calculate the similarity of case
Figure 4.2- Cronbach's Alpha coefficient
Figure 4.3- Cost Comparison between Actual cost and predicted cost in RF model
Figure 4.4- Category of variable
Figure 4.5- Setup of location variable
Figure 4.6- Setup of floor type variable
Figure 4.7- Setup of specification of material variable
Figure 4.8- Arrangement of variables
Figure 4.9- Selection of Architecture
Figure 4.10- Selection of training type
Figure 4.11- Selection of Output
Figure 4.12- ANN architecture
Figure 4.13- The error of model
Figure 4.14- The matrix of weight
Figure 4.15- Scatterplot chart of predicted value of each dependent variable
Figure 4.16- Cost Comparison between Actual cost and predicted cost in ANN model
Figure 4.17- Cost Comparison between Actual cost and predicted cost in CBR model
LIST OF TABLES
Table 1.1- Summary of advantages & disadvantages of Project Delivery Methods
Table 2.1- Summary of Strengths & weaknesses of AI estimation methods
Table 2.2- Cost variable sources
Table 3.1- Summary of Strengths & weaknesses of Activation function of ANN method
Table 4.1- Construction Index for Construction
Table 4.2- Meaning of Cronbach’s Alpha coefficient values
Table 4.3- Factors affect the cost of façade works
Table 4.4- Profile of cases for model validation
Table 4.5- RF model result
Table 4.6- ANN model result
Table 4.7- CBR model result
LIST OF ABBREVIATIONS
AI Artificial intelligence
ANN Artificial neural networks
CBR Case-based reasoning
RF Random Forest
CER Cost estimating relationships
ES Evolutionary systems
HS Hybrid systems
KBS Knowledge-based systems
ML Machine learning
MLR Multiple linear regression
MRA Multiple regression analysis
WBS Work breakdown structure
SPSS Statistical Package for the Social Sciences
D&B Design & Build
CHAPTER 1
INTRODUCTION
- - - § - - -
1.1- Definitions:
Project Delivery Methods:
- Traditional Approach: The design must be completed before construction works start. Design and construction are usually performed by two different parties who interact directly and separately with the owner (Figure 1.1) (Hegazy, 2002).
Figure 1.1- Design‐Bid‐Build
- Design and Build (D&B): In this approach, Contractor is in charge
Figure 1.2- Design & Build
- Turnkey: This approach is similar to the design-build approach, but the Builder is responsible for design, construction and project financing (Hegazy, 2002).
Advantages & Disadvantages of Project Delivery Methods:
Table 1.1- Summary of advantages & disadvantages of Project Delivery Methods
Method: Traditional
Benefits:
- Competitive price
- The designer, engineer, and constructor are familiar with this method
- Easy to use in all markets, including public and private
Shortcomings:
- The project takes longer
- The designer cannot benefit from construction experience
- Conflict between the parties
- Disputes and claims if there are any changes
Method: Design-Build / Turnkey
Benefits:
- No conflict among the parties
- Minimum owner involvement
- The design can benefit from construction experience
- Time can be reduced since design and construction overlap
Shortcomings:
- Cost may not be known until the end of design
Role of Cost Estimating in construction project:
Cost estimation is probably the most crucial function for the success of construction organizations, because it serves several purposes, including feasibility analysis, budgeting, preparing the owner's funding, and providing a baseline for evaluating contractors' bids.
The primary goal of cost estimating is to offer a basis for controlling project costs by producing cost estimates that fit within the permitted budget, and to provide the essential data for the project's decision-making process.
Cost estimating needs to be done in different ways at different stages of a project. The estimator, who often works on behalf of the owner or the designer, may have to make such an estimate from the concept alone, without dimensions, details, specifications, or a schedule of the owner's requirements.
1.2- Research background
In a fiercely competitive industry where market shares are declining and profit margins are shrinking, the cost of delivering services or products emerges as a pivotal factor in decision-making (Günaydın & Dogan, 2004). During the tendering phase of a project, the cost estimate for capital expenditures significantly impacts planning, bidding, design, construction management, and overall cost management (Arage & Dharwadkar, 2017). Decisions based on these estimates often lead to critical commitments, including resource allocation, with far-reaching consequences. Accurate cost estimates enable project managers to assess project feasibility and exercise effective cost control. Moreover, these estimates can influence the client's decision on whether to proceed with the project or not (Ahiaga-Dagbui & Smith, 2012). Clients consistently expect contractors to deliver projects within the approved budget, as it plays a vital role in ensuring client satisfaction. Consequently, precise cost estimates can bolster a contractor's reputation and foster stronger relationships with clients.
The traditional method is employed when the Bill of Quantity (BoQ) and design drawings, prepared by experienced professionals in a specific field, are in regulatory compliance and ready for execution. While this procurement approach provides design certainty and clear risk allocation, it lacks cost-saving measures, speed, and the seamless integration of design and construction, which has led to a shift in client perception (Young, Seidu, Ponsford, Robinson, & Adamu, 2021). Unfortunately, the BoQ used during the tender stage does not possess the capability to accurately predict the final project cost. This inadequacy can be attributed to incomplete information in the drawings and specifications used during the tender stage, resulting in a limited understanding of the client's requirements.
Embracing a more efficient approach than the traditional one is poised to alleviate the challenges confronting Vietnam. Design-build, acknowledged as an effective project implementation method, has garnered widespread recognition and offers numerous advantages to all stakeholders involved in the process. This progressive approach is extensively adopted worldwide, including:
- In the UK, D&B was already in use in the 1960s, and by the end of the 1990s, D&B had captured 23% of the market for new construction projects (Ling, Liu, & Environment, 2004).
- In the USA, D&B started in the early 1900s, and by the mid-1990s more than one-third of construction projects in the USA used the D&B approach (Ling et al., 2004).
- In Singapore, D&B has been employed for construction projects since 1992 (Ling et al., 2004). From 1992 to 2000, the share of D&B projects climbed gradually, reaching 16% for public sector projects and 34.5% for private sector projects.
Hence, adopting the design-build method for project implementation is likely to result in high satisfaction among project participants in Vietnam, thanks to the numerous benefits it offers.
numerous small packages. The method has the potential to shorten the project schedule by up to 20-30%, optimizing cash flow for all involved parties. In practice, some projects have achieved savings of 10-15% of the total investment capital when implementing design-build.
However, it is essential to recognize that design-build is not without its challenges, especially when applied to large-scale and complex projects. Nonetheless, embracing the design-build method provides construction companies with an opportunity to showcase their capabilities while also encouraging the adoption of modern technologies in Vietnam, making the country more attractive to foreign investment.
1.3- Problem statement
Figure 1.3- Transition from early cost estimation to final cost (Petroutsatou et al., 2012)
1.4- Research Goal
Motivated by the limitations outlined in section 1.3, the author introduces an advanced AI-based method that aims to reduce the error percentage compared with traditional methods while remaining cost-effective, by integrating with readily available tools such as Excel or other existing computational tools.
The target of this research is to create an accurate AI-based cost estimation method that uses the data already available from previous projects to estimate costs for new projects procured under the D&B method. By employing this technique, the estimator can accelerate the cost assessment process and make the cost estimate more precise. To reach this goal, specific models are constructed and the performance of each is evaluated.
This research is important because it shows the tender team which variables they must pay closer attention to in order to meet the expected budget required at the tender stage. The significant factors that are controllable can then be properly managed to increase the chances of a successful tender package.
1.5- Relevance
The research aims to identify the best cost estimation method within the construction industry. AI methodology is considered the most promising approach and a scientific innovation. The study's contributions to the scientific literature include:
✓ Providing a comprehensive overview of both AI-based and traditional cost estimation methods
✓ Exploring the potential of AI-based cost estimation methods
✓ Improving accuracy and efficiency in cost estimation through the use of AI techniques
✓ Offering solutions for tender preparation in the construction industry
From a commercial perspective, the research is highly relevant and useful due to the following reasons:
➢ Addressing the issue of low accuracy and inefficiency in cost estimation at the pricing proposal stage, which directly impacts profit margins
➢ Increasing opportunities and competitive advantage for contractors by reducing the costs associated with tenders
➢ Potentially leading to faster cost estimation methods, which can reduce cost overruns and overhead expenses
CHAPTER 2
OVERVIEW
- - - § - - -
2.1- Conventional cost estimation approaches
The literature offers a comprehensive understanding of construction project cost estimation techniques. There are many different conventional cost estimation techniques, and the choice among them depends on the project's goal, amount of planning and/or design, size, complexity, conditions, timetable, and location.
Many of the techniques employed in cost estimation for construction projects can also be applied to cost estimation in other fields. These techniques can be classified as parametric, historical bid-based, unit cost quantity-based, range, and probabilistic risk-based estimating (Geberemariam, 2018).
The identified traditional methods are divided below into parametric, detailed, comparative, and probabilistic estimates. The advantages and disadvantages of these strategies are then analyzed and listed.
Parametric estimating:
This approach is typically applied in the project's first stages. A parametric estimate is generated using a model with a mathematical representation of the cost estimating relationships (CER), which forecasts cost from a logical correlation with the physical and functional aspects of a project (Geberemariam, 2018). The features could be functional requirements, performance requirements, or physical characteristics. A CER is expressed with cost-to-cost or cost-to-non-cost variables. In a cost-to-cost relationship, the cost of the independent variable is used to forecast the cost of the dependent variable. For example, the cost of labor hours for one component could be used to forecast the cost of labor hours for another component.
In a cost-to-non-cost relationship, a non-cost quantity such as the number of output items can be used to estimate the cost of labor hours. The relationships might be as simple as one-to-one or as sophisticated as an algorithm. An estimating model is a collection of such intricate relationships.
equations 1 and 2 below. Equation 1 is known as the cost estimating relationship (CER) framework. The CER employs quantitative methods to quantify the relationship between an independent variable and the contract price (Geberemariam, 2018).
\[ T_c = \sum_{i}^{n} P_{cr} \times P_I \qquad \text{(Eq. 1)} \]
Where:
$T_c$ = Total cost
$P_{cr}$ = Parameter cost ratio
$P_I$ = Parameter of interest
Equation 2 gives the CER in its nonlinear form:
\[ T_c = \sum_{i}^{n} P_{cr} \times P_I^{\,n_i} \qquad \text{(Eq. 2)} \]
Where:
$P_I$ = Parameter (independent variable) of interest
$n_i$ = Exponent used to transform $P_I$
The exponents in Eq. 2 transform and normalize the independent variables and other metrics to account for temporal cost effects such as inflation and sharp rises in material costs.
The variables considered in a parametric estimate should be the project's cost drivers. The underlying presumption is that the factors that influenced costs in the past will continue to influence costs in the future. Employing a parametric technique therefore requires access to historical data from which the cost drivers and the pertinent CER can be identified. Based on the unique characteristics of the project, the parametric CER can then be used to forecast costs for upcoming projects.
Pros:
- Cost estimating is usually quick and easy.
Cons:
- The CERs should be regularly reviewed to ensure that they remain consistent with the present link between project attributes and costs.
- The CER should be accurately and completely described, because using it incorrectly could result in substantial estimate errors (Geberemariam, 2018).
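To make the CER concrete, the following is a minimal Python sketch of Equations 1 and 2, not the thesis's own implementation. The function name `parametric_cost`, the façade parameters, and all numeric values are hypothetical and chosen only for illustration.

```python
def parametric_cost(parameters, cost_ratios, exponents=None):
    """Total cost from a CER: sum of cost_ratio * parameter ** exponent.

    With exponents omitted (all equal to 1) this is the linear form (Eq. 1);
    with exponents supplied it is the nonlinear form (Eq. 2).
    """
    if exponents is None:
        exponents = [1.0] * len(parameters)
    if not (len(parameters) == len(cost_ratios) == len(exponents)):
        raise ValueError("parameters, cost_ratios and exponents must have equal length")
    return sum(r * p ** n for p, r, n in zip(parameters, cost_ratios, exponents))


# Hypothetical cost drivers: façade area (m2) and number of floors.
parameters = [1200.0, 10.0]
# Hypothetical parameter cost ratios: cost per m2 and cost per floor.
cost_ratios = [150.0, 2000.0]

linear_total = parametric_cost(parameters, cost_ratios)
nonlinear_total = parametric_cost(parameters, cost_ratios, exponents=[0.9, 1.0])
print(linear_total, nonlinear_total)
```

In practice the cost ratios and exponents would be fitted to historical project data rather than chosen by hand, which is why the CER needs regular review.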
Detailed estimation:
The detailed estimation approach is also known as the bottom-up or analytical estimation method. It generates a detailed project cost estimate by establishing a Work Breakdown Structure (WBS) in which each activity carried out in the project is computed by elements, time, and scope.
A quantity surveyor or other technical person with extensive experience in a certain activity usually calculates the costs per activity and links them to the WBS elements.
The general mathematical formula is shown in Equation 3 below, although each project requires a different approach.
\[ T_c = \sum_{i} q_i \times (M_i + L_i + E_i) + \sum_{j} I_j \qquad \text{(Eq. 3)} \]
In which:
$T_c$ = Total cost
$q_i$ = Quantity
$M_i$ = Material rates
$E_i$ = Equipment rates
$L_i$ = Labor rates
$I_j$ = Indirect rate
Advantages:
- One of the biggest benefits of the detailed estimation approach is that it shows exactly what the estimate includes and whether anything was missed (Geberemariam, 2018).
Disadvantages:
- Producing a detailed estimate can take a lot of time, which makes it expensive.
- Every new project requires a fresh estimate. Estimates of some recurring tasks may be obtained from earlier projects, but they have to be adapted to the conditions of the new estimate.
- To provide a trustworthy estimate, the project specifications must be well known. If the project's specifications keep changing, the estimate must continuously account for these changes.
- During the summation of the many WBS elements, small inaccuracies can become large errors.
- It takes a lot of time to establish, especially in large, complex projects with many work breakdown structure components.
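The bottom-up summation in Equation 3 can be sketched as follows. This is an illustrative sketch under assumed inputs: the `WorkItem` class, the item names, and all rates are hypothetical and are not taken from the thesis data.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """One WBS element with its quantity and unit rates (hypothetical fields)."""
    name: str
    quantity: float
    material_rate: float
    labor_rate: float
    equipment_rate: float


def detailed_cost(items, indirect_costs):
    """Eq. 3: sum of q_i * (M_i + L_i + E_i) plus the sum of indirect costs I_j."""
    direct = sum(
        item.quantity * (item.material_rate + item.labor_rate + item.equipment_rate)
        for item in items
    )
    return direct + sum(indirect_costs)


# Hypothetical WBS for a small façade package.
items = [
    WorkItem("glazing", quantity=100, material_rate=50, labor_rate=20, equipment_rate=5),
    WorkItem("framing", quantity=40, material_rate=30, labor_rate=10, equipment_rate=10),
]
indirect_costs = [500, 300]  # e.g. site overhead and insurance (assumed)

total = detailed_cost(items, indirect_costs)
print(total)
```

Because every WBS element appears as an explicit record, it is easy to see what the estimate includes, which matches the advantage noted above, while the effort of filling in each record reflects the time cost.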
Comparative estimating:
When a new project is similar to one that has recently been completed, the comparative estimating approach can be used to estimate it quickly by comparing the two. This requires the main cost factors and current expertise from similar past projects. The estimate is adjusted for differences in the project's size, complexity, performance requirements, duration, location, and available technology. A comparative estimate is typically used to examine the project's viability and provides guidance on whether to move forward with the project within the specified parameters (Burke, 2013). In addition, this analogous method is employed when attempting to estimate a generic system with few specifications available.
This technique is normally applied through the unit method, cost indexes, the cost-capacity equation (power law and sizing model), and factored estimates (Geberemariam, 2018). The generic mathematical forms of the unit method and the cost index are given in Equations 4 and 5:
- Unit approach
\[ T_c = \sum^{n} U \times N \qquad \text{(Eq. 4)} \]
In which:
$T_c$ = Total cost
$U$ = Unit price
$N$ = Quantity
- Cost Index: The cost index (CI) is the ratio of current costs to previous costs. The CI is dimensionless and tracks changes in cost over time to account for the influence of inflation (Geberemariam, 2018).
\[ T_c = \sum C_0 \left( \frac{I_t}{I_0} \right) \qquad \text{(Eq. 5)} \]
Where:
$T_c$ = Total cost estimate at present
$C_0$ = Previous cost
$I_t$ = Index value at time t
$I_0$ = Index value at base time
Merits:
- An estimate can be completed quite quickly.
- The accuracy is maintained even if the reference data from earlier projects changes slightly.
- Everyone involved can easily understand the resulting estimate.
Demerits:
- It is quite difficult to identify a past project that is similar enough to the new project to allow a comparison.
- The method relies on extrapolation and professional judgment for the factor adjustments. As a result, the need for normalization may lead to a subjective evaluation of the data and may affect the estimate's accuracy.
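The unit method and cost index calculations described above can be illustrated with a short sketch. The function names and all prices, quantities, and index values are hypothetical; real index values would come from a published construction cost index.

```python
def unit_method_cost(unit_prices, quantities):
    """Unit method: total cost as the sum of unit price times quantity."""
    if len(unit_prices) != len(quantities):
        raise ValueError("unit_prices and quantities must have equal length")
    return sum(u * n for u, n in zip(unit_prices, quantities))


def index_adjusted_cost(previous_cost, index_now, index_base):
    """Cost index method: scale a past cost by the ratio of index values."""
    if index_base <= 0:
        raise ValueError("index_base must be positive")
    return previous_cost * index_now / index_base


# Hypothetical unit prices and quantities for two work items.
unit_total = unit_method_cost([120.0, 80.0], [10, 5])

# Hypothetical past project cost, updated from a base index of 100 to 110.
updated_cost = index_adjusted_cost(1_000_000, index_now=110, index_base=100)
print(unit_total, updated_cost)
```

The index adjustment only corrects for price movement over time. Differences in size, location, or complexity between the reference project and the new one still need the judgment-based adjustments listed under the demerits.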
Probabilistic estimating:
The probabilistic estimation approach concentrates on the risks and uncertainties associated with the project and attempts to quantify the variability of the project cost by using probability distributions for one or more parameters as input to the cost estimate (Zwaving, 2014).
By providing quantified effects on the likelihood of meeting planned cost and schedule baselines, this method more effectively communicates the impact of changes to planned or requested resources. It addresses concerns about the likelihood of exceeding a specific cost in the range of possible costs, the potential amount of the cost overrun, and the various types of uncertainties and how they drive cost (Geberemariam, 2018).
Additionally, the design and requirements are still mostly undefined at this early stage. Therefore, using probabilistic estimation as opposed to deterministic estimation or other conventional methods makes sense (Elkjaer, 2000).
In simulation modeling, the probability distribution is essential since it frequently affects the correctness of the output (Chou, Yang, & Chong, 2009). A probability distribution is a statistical function that enumerates all the potential outcomes and probabilities that a random variable might have over a particular range (Zwaving, 2014).
Benefits:
- The likelihood of cost overrun is informative, which helps confirm the accuracy of the estimate.
Challenges:
- A cost distribution should be determined for each cost component.
- It is necessary to determine the relationship between cost components. The estimates may not be reliable if this is not done appropriately.
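A common way to carry out a probabilistic estimate is Monte Carlo simulation. The sketch below is illustrative only: it assumes a triangular distribution for each cost component and, as a simplification, treats the components as independent, which sidesteps the correlation challenge noted above. The component values, function names, and budget are hypothetical.

```python
import random


def simulate_total_cost(components, n_runs=10_000, seed=0):
    """Sample total project cost n_runs times.

    Each component is a (low, mode, high) tuple for a triangular distribution.
    Components are assumed independent (a simplifying assumption).
    Returns the simulated totals sorted in ascending order.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.triangular(low, high, mode) for low, mode, high in components)
        for _ in range(n_runs)
    ]
    totals.sort()
    return totals


def overrun_probability(sorted_totals, budget):
    """Share of simulated totals that exceed the budget."""
    return sum(t > budget for t in sorted_totals) / len(sorted_totals)


def percentile(sorted_totals, p):
    """Simple percentile lookup on sorted totals, with p between 0 and 1."""
    index = min(len(sorted_totals) - 1, int(p * len(sorted_totals)))
    return sorted_totals[index]


# Hypothetical (low, most likely, high) costs for three components.
components = [(90.0, 100.0, 130.0), (40.0, 50.0, 70.0), (15.0, 20.0, 30.0)]
totals = simulate_total_cost(components, n_runs=5_000, seed=1)
print(overrun_probability(totals, budget=185.0), percentile(totals, 0.9))
```

The overrun probability and the upper percentiles are the outputs that decision-makers typically use to set contingency. If components are correlated in practice, independent sampling tends to understate the spread of the total.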
2.2- Artificial intelligence estimation methods
To address the issues and limitations of traditional cost estimation methods, this overview discusses modern and theoretical cost estimation approaches that are capable of fully utilizing the knowledge already available in data.
intelligence (AI) and AI technologies were made possible by advancements in mathematical programming approaches. AI tools make it possible to investigate multi- and non-linear correlations between design variables and final costs (Günaydın & Dogan, 2004). In this study, four distinct AI cost estimating techniques are introduced:
Machine-learning:
A machine learning (ML) system is a system that can learn and generate predictions, identify patterns, or categorize data.
Pros:
- The capacity to handle uncertainty, cope with incomplete evidence, and make decisions about new circumstances based on knowledge gained from precedent cases.
- The ability to explore multiple, non-linear links between cost factors, and to self-learn.
Cons:
- Lack of technical justification for the results. The model can also be too heavy for devices with small memory and can take so long to run that it becomes impractical.
Machine learning approaches can be divided into two primary strategies, the artificial neural network (ANN) and the support vector machine (SVM):
- An artificial neural network (ANN) is a computational model based on the design of biological neural networks. An ANN can train itself and learn from available data when it is given input datasets with known matching output values. Without the assistance of an expert, ANNs can find solutions to problems and look for hidden patterns in data (Ahiaga-Dagbui & Smith, 2012). The learning capacity of neural networks is a benefit when solving complicated problems for which analytical or numerical solutions are hard to find (Rafiq, Bugmann, & Easterbrook, 2001).
performs rather well with smaller datasets since it is independent of the dimensionality of the input layer (Son, Changmin, & Kim, 2012). This is an advantage over ANN when only small datasets are available. However, SVR systems do not currently have the capability to handle multi- or non-linear relationships between cost factors. This means that the only relationships that can be identified are those that are linear and simpler.
Knowledge-based systems:
A knowledge-based system (KBS) is a type of computer system that analyzes knowledge, data, and other information from various sources to generate new knowledge. It uses artificial intelligence concepts to solve problems, which is useful for making decisions. These systems often have built-in problem-solving capabilities that allow them to understand the context of the data they review and process, and to make informed decisions based on the knowledge they store.
Merits: The capacity to justify any outcome and the simplicity of model development.
Demerits: Establishing the rules at the outset takes a lot of time, and self-learning is a challenge for a KBS.
Evolutionary systems:
Evolutionary systems (ES) are a family of algorithms for global optimization, and also the subfield of artificial intelligence and soft computing that studies these algorithms. ES are employed as an optimization tool when there are numerous candidate solutions but it is unknown which one is the best. This approach generates an initial collection of potential answers to a given problem and repeatedly updates it. A new generation is created each time by stochastically eliminating less desirable solutions and making minor random alterations. Most concepts in ES are population-based.
The main limitation of ES is that they are developed using particular heuristics, which makes them difficult to generalize and difficult to self-learn.
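The generate, eliminate and alter loop described above can be illustrated with a minimal Python sketch. It is not taken from this study: the function name `evolve`, the toy objective `toy_cost`, the pairwise tournament used for the stochastic elimination, and all parameter values are assumptions chosen only for illustration.

```python
import random

def evolve(fitness, lower, upper, pop_size=20, generations=100, step=0.3, seed=0):
    """Minimise `fitness` on [lower, upper] with a simple evolutionary loop."""
    rng = random.Random(seed)
    # Generate the first collection of potential answers at random.
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]
    for _ in range(generations):
        # Stochastic elimination: random pairs compete and the better one survives.
        survivors = []
        for _ in range(pop_size // 2):
            a, b = rng.sample(population, 2)
            survivors.append(a if fitness(a) <= fitness(b) else b)
        # Minor random alterations of the survivors give the new candidates.
        children = [min(upper, max(lower, s + rng.gauss(0.0, step)))
                    for s in survivors]
        population = survivors + children
    return min(population, key=fitness)

def toy_cost(x):
    # Hypothetical objective with its minimum at x = 3.
    return (x - 3.0) ** 2

best = evolve(toy_cost, lower=-10.0, upper=10.0)
```

Because the elimination and the alterations are random, the returned value only approaches the optimum; a fixed seed is used here so that repeated runs give the same answer.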
Hybrid systems:
Hybrid systems (HS) are a combination of different approaches used to solve a particular issue.
Benefit: can provide better results than individual methods.
Drawbacks:
- Lack of tools to support computation at the implementation stage
- Deep knowledge of various methods is needed
- Greater effort is required to create the method at the initial stage
2.3- The appropriate cost estimation method
In this section, we assess the most appropriate cost estimation approach to achieve our research objective by evaluating the merits and limitations of different techniques. Subsequently, we conduct a comparative analysis. The table below lists the benefits and drawbacks of several modern cost estimation techniques:
Table 2.1- Summary of strengths and weaknesses of AI estimation methods

Estimation method: Machine learning
Strengths:
- Able to self-learn
- The capacity to incorporate and manage ambiguity
- Easily retrained with new data
- The capacity to estimate new project costs using knowledge of past project costs
- Quick and highly precise cost estimation
- Identifies multi- and non-linear relationships between cost parameters
Weaknesses:
- Supporting techniques are lacking
- Black-box decisions
- Requires an enormous dataset
- Difficult to identify input parameters
Requirements:
- Enormous data sets
- Predetermining the set of variables of the input layer
- Target values for variables in the input layers that are in line with
- Software for Statistical

Estimation method: Knowledge-based systems
Strengths:
- Ability to explain any result
- Easy to establish
- Knowledge can be preserved
Weaknesses:
- Difficult for KBS to self-learn
- Takes a lot of time to develop a process at the beginning
Requirements:
- Establishing if/then rules
- Wide knowledge is required from the expert(s)
- Set-up of an inference system

Estimation method: Evolutionary systems
Strengths:
- The capacity to eliminate less desirable options
- Favorable to population-based concepts
- Optimization of a group of answers
Weaknesses:
- Hard to generalize because it was developed using particular heuristics
Requirements:
- The initial group of potential solutions
- Paradigms based on a population

Estimation method: Hybrid systems
Strengths:
- The capacity to overcome a method's specific restrictions
- Inherits some advantages of different methods
Weaknesses:
- Because no computational tools are available, it is difficult to establish the model
- Requires a profound knowledge of different methods
Requirements:
- Extensive expertise in a variety of methods
- Computing tools that can handle the approaches
Based on the data presented in the table, machine learning emerges as the most suitable approach to address the current issue, specifically through the use of Case-Based Reasoning (CBR), Random Forest (RF), and Artificial Neural Networks (ANN). ANN stand out for their ability to self-learn, which significantly reduces the time required for technique development and refinement. Employing predefined algorithms, this approach learns from existing data and establishes relationships between cost components and project costs. Furthermore, ANN excel at identifying non-linear connections between cost components and project costs. Once the approach is established, it facilitates quick estimation or projection of the project's expenses.
... unique characteristic of case-based reasoning paves the way for the development of intelligent systems.
On the other hand, the random forest model comprises multiple decision trees, effectively mitigating the risk of overfitting and simplifying the determination of feature importance.
2.4- Related research
Many studies in the past decade have applied artificial intelligence approaches to construction cost estimating, such as:
- Construction Cost Estimation Using a Case-Based Reasoning Hybrid Genetic Algorithm Based on Local Search Method (Jung et al., 2020)
- Performance evaluation of normalization based CBR models for improving construction cost estimation (Ahn et al., 2020)
- Improving cost estimation in construction project (Sayed, Abdel-Hamid, & El-Dash, 2020)
- A model utilizing the artificial neural network in cost estimation of construction projects in Jordan (Al-Tawal, Arafah, & Sweis, 2021)
- Use of artificial neural network for predesign cost estimation of building projects (Ambrule, Bhirud, Computing, & Communication, 2017)
- Application of support vector machines in assessing conceptual cost estimates (An, Park, Kang, Cho, & Cho, 2007)
- Construction cost estimation of spherical storage tanks: artificial neural networks and hybrid regression-GA algorithm (Arabzadeh, Niaki, & Arabzadeh, 2018)
- Deep learning based cost estimation of circuit boards: a case study in the automotive industry (Bodendorf, Merbele, & Franke, 2021)
- Prediction of unit price bids of resurfacing highway projects through ensemble machine learning (Cao, Ashuri, & Baek, 2018)
- Evolutionary fuzzy decision model for construction management using support vector machine (Cheng & Roy, 2010)
- Comparison of machine learning models to ...
- ... for predicting the ultimate buckling load of variable-stiffness composite cylinders (Kaveh et al., 2021)
- Efficient training of two ANNs using four meta-heuristic algorithms for predicting the FRP strength (Kaveh & Khavaninzadeh, 2023)
Many efficient and recent metaheuristics can also be found in the comprehensive, insightful, and valuable book written by Prof. Kaveh (Kaveh, 2021).
This study focuses on three popular and suitable algorithms for cost estimation in construction: RF, ANN, and CBR. The main reasons for choosing these algorithms are as follows:
Although there are many studies on neural networks for optimization and prediction, all of the above articles build the network architecture with tools such as MATLAB and Genetic Algorithms (GA), and very little research has used the Statistical Package for the Social Sciences (SPSS) to build neural network architectures. SPSS offers advantages such as:
(i) User-friendly interface: SPSS provides a user-friendly graphical interface that makes it accessible to users with limited programming experience. It simplifies the process of building and analyzing ANN models, allowing researchers or analysts to focus on the data and interpretation rather than coding.
(ii) Available algorithms and options: SPSS offers a range of pre-built algorithms and options for creating and training ANN models. It includes various activation functions, learning algorithms, and optimization methods. This allows users to experiment with different configurations and find the best fit for their specific research or analysis goals.
(iii) Model validation and evaluation: SPSS provides tools for evaluating and validating the ANN models. Users can assess model performance through metrics like accuracy and recall. The software also supports cross-validation techniques for estimating the generalization capability of the models.
Therefore, the effectiveness of the ANN model run in SPSS will be assessed and compared with other methods in this study.
... to outliers: RF is robust to noisy data and outliers because it averages the predictions from multiple trees, making it less sensitive to extreme values that may exist in the dataset.
(4) Feature importance: RF provides a measure of feature importance, which helps in understanding the most influential variables in making predictions. This information can be valuable for feature selection and data analysis.
(5) Handles missing values: The algorithm can handle missing values without imputation; when making a prediction, missing values are simply skipped in the respective trees.
(6) Reduces overfitting: RF reduces the risk of overfitting by building an ensemble of trees and aggregating their predictions, which helps it generalize better to unseen data.
(7) Efficient for large datasets: RF is efficient for training on large datasets and can handle thousands of input variables and millions of instances effectively.
(8) No assumptions about the data: RF does not make any assumptions about the distribution or relationship of the data; it can handle non-linear relationships and interactions between variables without any prior knowledge.
Overall, RF is a powerful and flexible algorithm that yields accurate predictions, handles various data types, and is less prone to overfitting.
In this study, the authors referenced cost-influencing factors from two sources: H. Murat Gunaydın and S. Zeynep Dogan (2004), and Won-Gil Hyung, Sangyong Kim and Jung-Kyu Jo (2005), as presented in Table 2.2 below, to gather expert opinions on cost management through online meetings. After reviewing these sources, nine factors: (i) Gross floor area, (ii) The height of the building, (iii) The height between stories of the building, (iv) Location of building, (v) The floor type of the building, (vi) Number of
Table 2.2- Cost variable sources

No. 1
- Name of factors: 8 different factors were used: (a) The total area of the building, (b) The ratio of the typical floor area to the total area of the building, (c) The ratio of ground floor area to the total area of the building, (d) The number of floors, (e) The console direction of the building, (f) The foundation system of the building, (g) The floor type of the building, (h) The location of the core of the building
- Name of article and authors: A neural network approach for early cost estimation of structural systems of buildings, by H. Murat Gunaydın and S. Zeynep Dogan

No. 2
- Name of factors: 21 different factors were used: (a) Duration, (b) No. of floors, (c) No. of parking spaces, (d) Year, (e) Floor area ratio, (f) Building coverage ratio, (g) Total area, (h) Type of building, (i) Location, (j) Form system, (k) Roof, (l) Foundation, (m) Superstructure, (n) Substructure, (o) Retaining wall, (p) External wall, (q) Internal wall, (r) Ceiling, (s) Floor
- Name of article and authors: Improved similarity measure in case-based reasoning: a case study of construction cost estimation, by ...
CHAPTER 3
RESEARCH METHODOLOGY
- - - § - - -
3.1 - Machine Learning (ML)
3.1.1- General:
In computer science, machine learning is a technique that creates a "model" out of "data" (P Kim, 2017) and predicts future data based on current data (Paluszek & Thomas, 2017). Rather than being explicitly programmed, it works by learning from training data. The machine learning procedure is shown in Figure 3.1 below:
[Figure 3.1 diagram; recoverable labels: Dataset for Training, Machine Learning, Model, Input Data, Output]
Figure 3.1- Machine learning procedure (Source: Phil Kim)
3.1.2- Elements of machine learning:
Machine learning techniques can be classified by training method into three types: unsupervised learning, supervised learning and reinforcement learning. Figure 3.2 below gives an overview of the three techniques:
Figure 3.2- Different types of machine learning
3.2 - Research Steps
Step 1: Import libraries and divide the data
Step 2: Train the model
Step 3: Evaluate the model
Step 4: Make predictions
Step 5: Import the prediction results
Step 6: Evaluate the results
Figure 3.3- Research Steps
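The six research steps can be traced in a short Python sketch. It is an illustration only and not the code used in this study: a one-variable least-squares line stands in for the RF, ANN and CBR models, the records are hypothetical toy values, and the names `fit_line` and `predict` are assumptions.

```python
# Step 1: import libraries and divide the data.
import random
from statistics import mean

# Hypothetical records: (gross floor area, project cost) in arbitrary units.
records = [(float(a), 2.0 * a + 50.0) for a in range(100, 1100, 50)]
rng = random.Random(42)
rng.shuffle(records)
split = int(0.8 * len(records))
train, test = records[:split], records[split:]

# Step 2: train the model (a least-squares line as a simple stand-in).
def fit_line(pairs):
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x, _ in pairs))
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(train)

def predict(x):
    return slope * x + intercept

# Step 3: evaluate the model on the training data (mean absolute error).
train_mae = mean(abs(predict(x) - y) for x, y in train)

# Steps 4 and 5: make predictions for the held-out projects and collect them.
results = [(x, y, predict(x)) for x, y in test]

# Step 6: evaluate the results against the held-out costs.
test_mae = mean(abs(p - y) for _, y, p in results)
```

The toy data are exactly linear, so both error values are close to zero; with real project data the held-out error in Step 6 is the figure that indicates how well the model generalizes.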
3.3 - Random Forest (RF)
3.3.1- General:
... algorithm because it can find out which attributes are more important than others, and it can also show which attributes have no effect in the decision tree.
The RF algorithm usually works in four steps, as shown in Figure 3.4:
- Select random samples from the given data set
- Set up a decision tree for each sample and get a prediction result from each decision tree
- Hold a vote over the prediction outcomes
- Choose the most-voted outcome as the final prediction
[Figure 3.4 diagram; recoverable labels: Training Set, Test Set, Training Sample 1, Training Sample 2, ..., Training Sample n, Decision Tree 1, Decision Tree 2, ..., Decision Tree n, Polling, Prediction]
Figure 3.4- Model of Random Forest (Source: Nguyen Duy Sim, 2018)
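The four steps can be sketched in plain Python as follows. This is a simplified illustration and not the model developed in this study: each "tree" is a single-split decision stump on one feature, so the random feature selection of a full Random Forest is omitted, and the data, labels and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a one-split decision tree on (feature, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [lab for x, lab in sample if x <= threshold]
        right = [lab for x, lab in sample if x > threshold]
        if not right:
            continue  # a split must leave data on both sides
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != left_label for lab in left)
                  + sum(lab != right_label for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # All feature values are identical: always predict the majority label.
        majority = Counter(lab for _, lab in sample).most_common(1)[0][0]
        return (float("inf"), majority, majority)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: select a random sample (with replacement) from the data set.
        sample = [rng.choice(data) for _ in data]
        # Step 2: set up a decision tree for that sample.
        forest.append(train_stump(sample))
    return forest

def predict_forest(forest, x):
    # Step 3: collect one vote per tree. Step 4: keep the most-voted outcome.
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: floor area labelled with a cost class.
data = [(a, "low" if a < 500 else "high") for a in range(100, 1000, 50)]
forest = train_forest(data)
```

For a numeric target such as project cost, the vote in the last two steps would typically be replaced by averaging the predictions of the individual trees.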
In this research Python 3.10.9 and Spyder IDE 5.4.1 were used to develop the Random Forest model
3.3.2- Validating the model:
The study evaluates the model's predictive ability using the Nash-Sutcliffe Efficiency (NSE) index, which is calculated as in Equation 6:
$NSE = 1 - \frac{\sum_{t=1}^{n} (y_t - x_t)^2}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$ (E.q- 6)
In which:
- $n$ is the sample size
- $y_t$ is the value selected for evaluation
- $x_t$ is the predicted value
- $\bar{y}$ is the mean of $y_t$ in the sample
The closer the NSE value is to 1, the more accurate the model's predictions. The accuracy of the predictive model is evaluated by randomly selecting a set of predicted values and comparing them with the corresponding test values; the closer the predicted values are to the test values, the more accurate the prediction.
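As a minimal sketch, the NSE index in its standard form (one minus the ratio of the squared prediction errors to the squared deviations of the evaluation values from their mean) can be computed as below. The function name and the sample numbers are assumptions for illustration and are not results of this study.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """Return the NSE of `predicted` against `observed` (equal-length lists)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_observed = sum(observed) / len(observed)
    # Numerator: squared differences between evaluation and predicted values.
    residual = sum((y - x) ** 2 for y, x in zip(observed, predicted))
    # Denominator: squared deviations of the evaluation values from their mean.
    spread = sum((y - mean_observed) ** 2 for y in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observed values are equal")
    return 1.0 - residual / spread

# Hypothetical values, for illustration only.
perfect = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only = nash_sutcliffe_efficiency([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect prediction gives an NSE of 1, while a model that always predicts the mean of the evaluation values gives an NSE of 0; negative values indicate predictions that are worse than that mean.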
3.4 - Artificial neural networks (ANN)
3.4.1- Definition:
Artificial Neural Networks (ANN) simulate the learning of biological neural networks. An ANN is a structure of simple, closely interconnected computational nodes, in which the connections between neurons determine the function of the network.
The basic features of neural networks include:
- A set of processing nodes (artificial neurons)
- An activation or output state for each processing node
- Links between nodes. In general, each link is defined by a weight Wjk that describes the effect that the signal of node j has on node k. The weight increases or decreases the strength of the signal at a node
- A propagation rule that determines how the output of each node is calculated from its input
- An activation function, or transfer function, that determines the new activation level based on the current activation level
- An adjustment unit: deviation/bias/offset
Generally, neurons are aggregated into layers, as shown in Figure 3.5 below. Different layers can perform different transformations on their inputs. Signals move from the first layer (the input layer) to the last layer (the output layer).
[Figure 3.5 diagram; recoverable labels: the input layer (X1 to X9), the hidden layer (H1 to H4), the output layer (O)]
Figure 3.5- Structure of neural network
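The movement of signals from the input layer to the output layer can be sketched in Python for the layer sizes shown in Figure 3.5 (nine inputs, four hidden nodes, one output). The weights and biases below are placeholder values, and the choice of tanh for the hidden layer with an identity output is an assumption for illustration, not the configuration used in this study.

```python
import math

def layer_output(inputs, weights, biases, activation):
    """Compute the outputs of one layer; `weights` holds one row per node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    # Input layer -> hidden layer (tanh activation, assumed).
    hidden = layer_output(inputs, hidden_w, hidden_b, math.tanh)
    # Hidden layer -> output layer (identity activation, assumed).
    return layer_output(hidden, output_w, output_b, lambda a: a)

n_inputs, n_hidden = 9, 4
# Placeholder parameters; a trained network would learn these from data.
hidden_w = [[0.1] * n_inputs for _ in range(n_hidden)]
hidden_b = [0.0] * n_hidden
output_w = [[0.5] * n_hidden]
output_b = [1.0]

estimate = forward([1.0] * n_inputs, hidden_w, hidden_b, output_w, output_b)
```

Each call to `layer_output` applies the same rule to every node of a layer, which is why a network with more layers only needs further calls chained in the same way.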
3.4.2- Components of an artificial neural network
A processing unit, also known as a neuron or a node, performs a very simple task: it receives input from preceding units or an external source and uses it to calculate an output signal that is propagated to other units.
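[Diagram of processing unit j; recoverable labels: inputs x0, x1, ..., xn; weights wj0, wj1, ..., wjn; bias Ѳj; net input aj; output zj]
$a_j = \sum_{i=0}^{n} w_{ji} x_i + \theta_j$
$z_j = g(a_j)$
In which: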
- The circle and arrows illustrate the node and the signal flow
- 𝑥0, 𝑥1, …, 𝑥n: the input variables
- 𝑤ji: the weights of the corresponding signals
- Ѳj: the bias
- aj: net input
- zj: output
- g(x): transfer function (activation function)
In a neural network there are three types of units:
✓ Input node: receives signals from the outside
✓ Output node: sends data to the outside
✓ Hidden node: its input and output signals are located inside the network
Each node j can have one or more inputs x0, x1, x2, …, xn, but only one output zj. An input to a node can be data from outside the network, the output of another node, or its own output.
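The computation performed by a single node j, a weighted sum of its inputs plus the bias followed by the transfer function, can be written as a short Python sketch. The function names and the numeric values are hypothetical and serve only to illustrate the symbols defined above.

```python
def node_output(inputs, weights, bias, g):
    """Return z_j = g(a_j), where a_j is the weighted sum of inputs plus the bias."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias  # a_j
    return g(net_input)  # z_j

def identity(a):
    # Simplest transfer function: the output equals the net input.
    return a

# Hypothetical inputs x_i, weights w_ji and bias for one node.
z = node_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.25, identity)
```

Here the net input is 0.5 - 2.0 + 6.0 + 0.25 = 4.75, and because the identity function is used the output zj has the same value.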
3.4.3- Activation function:
- Linear function (identity function): Equation 7 is used when the inputs are treated as a node. Sometimes the net input is multiplied by a constant to produce a linear function
$g(x) = x$ (E.q- 7)