
Master's thesis in Construction Management: Application of support vector machine (SVM) to predict potential leaks on water supply network


DOCUMENT INFORMATION

Basic information

Title Application of Support Vector Machine (SVM) to Predict Potential Leaks on Water Supply Network
Author Nguyen Duc Phuong Huy
Supervisor Assoc. Prof. PhD. Tran Duc Hoc
University Ho Chi Minh City University of Technology
Major Construction Management
Type Master’s Thesis
Year of publication 2024
City Ho Chi Minh City
Number of pages 129
File size 1.27 MB

Structure

  • CHAPTER 1 INTRODUCTION
    • 1.1 Problem Statement
      • 1.1.1 HCMC water supply’s status quo
      • 1.1.2 The use of machine learning in water supply industry
      • 1.1.3 Construction projects’ status quo
      • 1.1.4 Conclusion
    • 1.2 Research Objectives
    • 1.3 Research Subjective
    • 1.4 Scope of Study
    • 1.5 Methodology
      • 1.5.1 Risk Identification
      • 1.5.2 Methodology to apply SVM
    • 1.6 Contribution to academic and practical fields
      • 1.6.1 Academically
      • 1.6.2 Pragmatically
  • CHAPTER 2 LITERATURE REVIEW
    • 2.1 Investment conceptualisations
      • 2.1.1 Investment activity
      • 2.1.2 Capital construction investment (CCI)
      • 2.1.3 Features of capital construction investment
      • 2.1.4 The role of capital construction investment
    • 2.2 Investment project (IP) explanation
      • 2.2.1 Definition of an investment project
      • 2.2.2 Obligations of an investment project
      • 2.2.3 Project scheduling
      • 2.2.4 Schedule overruns
    • 2.3 Delay risks of previous research
      • 2.3.1 Group of external environmental factors
      • 2.3.2 Group of elements of management information system
      • 2.3.3 Policy related factors
      • 2.3.4 Group of factors on decentralisation of authority to the investor
      • 2.3.5 Group of factors on capital sources for project managing
      • 2.3.6 Group of factors related to the capacity of project participants
      • 2.3.7 Group of factors on the capacity of investors
      • 2.3.8 Group of project characteristics factors
    • 2.4 Introduction to WDS
      • 2.4.1 General
      • 2.4.2 Water supply network in Ho Chi Minh City
    • 2.5 Support vector machine
      • 2.5.1 Introduction to support vector machine
      • 2.5.2 How support vector machine works
      • 2.5.3 Kernel function in support vector machine
      • 2.5.4 Margin in support vector machine
      • 2.5.5 Solution to optimisation in support vector machine
    • 2.6 Linear regression
    • 2.7 Previous studies on leakage pinpointing
      • 2.7.1 Usage of variational autoencoder and support vector machine to establish a
      • 2.7.3 Integration of hydraulic software in machine learning
      • 2.7.4 Conventional methods besides machine learning techniques
      • 2.7.5 The revolution of water leak detection and localisation
  • CHAPTER 3 FRAMEWORK STRUCTURING
    • 3.1 Architecture of proposed framework
    • 3.2 Survey data compilation
      • 3.2.1 Questionnaire establishment
      • 3.2.2 Variables size and methodology of determination
    • 3.3 Data investigation technique
      • 3.3.1 Surveying
      • 3.3.2 Evaluation of the reliability
      • 3.3.3 Exploratory factor analysis
      • 3.3.4 Linear regression analysis in statistics
      • 3.3.5 Test for difference
    • 3.4 Model operation
      • 3.4.1 Support vector regression (SVR)
      • 3.4.2 Python programming language integration
    • 3.5 Model assessment
      • 3.5.1 Root mean squared error
      • 3.5.2 Mean absolute error
      • 3.5.3 Coefficient of determination
  • CHAPTER 4 DATA PROCESSING
    • 4.1 Surveying
    • 4.2 Cronbach’s alpha
    • 4.3 Factor analysis
    • 4.4 Factors convergence
    • 4.5 Linear regression analysis
      • 4.5.1 Correlation investigation
      • 4.5.2 Inspection for Multicollinearity
      • 4.5.3 Assumptions testing
    • 4.7 Model training
      • 4.7.1 Data taking technique
      • 4.7.2 Correlation matrix
  • CHAPTER 5 RESULT
    • 5.1 Calculation results of Support Vector Regression model
    • 5.2 Analysis between two datasets in kernel function
  • CHAPTER 6 CONCLUSION
    • 6.1 Conclusion
    • 6.2 Values of research
      • 6.2.1 Academically
      • 6.2.2 Practically
    • 6.3 Study’s limitation
    • 6.4 Future perspectives
  • APPENDIX 1: QUESTIONNAIRE
  • APPENDIX 2: SPSS STATISTICAL RESULT

Content

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY NGUYEN DUC PHUONG HUY APPLICATION OF SUPPORT VECTOR MACHINE SVM TO PREDICT POTENTIAL LEAKS ON WA

INTRODUCTION

Problem Statement

1.1.1 HCMC water supply’s status quo

Throughout history, water resources have been pivotal to the socio-economic development of every civilisation. Today, in Vietnam generally and Ho Chi Minh City (HCMC) specifically, securing a hygienic water supply is one of the key political missions of the local government. In recent years, HCMC has developed rapidly in many fields, resulting in an explosive increase in population and labour force; the city's water supply network (WSN) therefore requires matching investment to meet the rocketing demand.

Water supply in HCMC has a history of roughly 15 decades, dating from the first recorded documents of a water tower (French: “Château d’Eau”) and water harnessing structures (French: “Captage”). Before 1975, the Department of Saigon Water (Vietnamese: “Sài Gòn Thủy Cục”) was responsible for this public service, and between 1975 and 2005 the HCMC Water Supply Company operated it prior to its transformation into SAWACO. Since then, the People's Committee of HCMC and the enterprise have made continuous efforts to serve the city's residents.

Saigon Water Corporation - Single-shareholder limited company (hereinafter referred to as “SAWACO”), located at No. 1 Cong Truong Quoc Te, Vo Thi Sau Ward, District 3, Ho Chi Minh City, was established in 2005 and operates under the parent company - subsidiary company model. Its charter capital of 5,139,426,000,000 Vietnamese dong was contributed by the Ho Chi Minh City People's Committee; SAWACO is therefore a state-owned enterprise.

The WDS in HCMC comprises around 11,578 kilometres of water transmission pipelines (pipes with a diameter of not less than 100 mm) and is managed, supervised, and operated by different establishments, both public and private. SAWACO, its member units, and its subsidiaries run the water treatment plants, namely Thu Duc, Tan Hiep, and Tan Phu groundwater, and the entire WSN in 21 out of 22 districts and city of HCMC (clean water in Cu Chi district is produced and provided by Saigon Water Infrastructure Joint Stock Company).

NGUYEN DUC PHUONG HUY - 2270185

Figure 1-1 The logo for SAWACO’s 17th anniversary of establishment

Over the past few years, thousands of billions have been invested in the abovementioned network, in areas such as technology, monitoring systems, and new pipe installation. Nevertheless, a characteristic of HCMC’s WSN is its ageing pipelines, some of which were installed more than 40 years ago; pipe leaks and bursts along the water distribution system (WDS) are therefore inevitable. Reportedly, water leakage worldwide could reach 32 billion cubic metres a year (Darsana & Varija, 2018); in 2024, the water loss of HCMC's WSN is anticipated to be 15.90% of a daily capacity of approximately 2 million cubic metres (Tổng Công ty Cấp nước Sài Gòn TNHH MTV [SAWACO], 2024).
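To put the quoted loss rate into volume terms, the short sketch below multiplies the two figures cited above. It is only an illustration: the constant names and the helper `lost_volume` are hypothetical, and the annual figure assumes the same rate applies on every day of the year, which the source does not state.

```python
# Figures quoted in the text (SAWACO, 2024), used here as rough inputs.
DAILY_CAPACITY_M3 = 2_000_000   # approximate daily capacity, cubic metres
LOSS_RATE = 0.1590              # anticipated water loss for 2024 (15.90%)


def lost_volume(capacity_m3: float, loss_rate: float) -> float:
    """Return the volume of water lost per day, in cubic metres."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be a fraction between 0 and 1")
    return capacity_m3 * loss_rate


daily_loss = lost_volume(DAILY_CAPACITY_M3, LOSS_RATE)
# Assumption: the same loss rate holds on each of 365 days.
annual_loss = daily_loss * 365

print(f"Daily loss:  {daily_loss:,.0f} m3")
print(f"Annual loss: {annual_loss / 1e6:,.1f} million m3")
```

Under these inputs the daily loss is on the order of 318,000 cubic metres.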

Today, HCMC’s water supply sector has established District Metered Areas (DMAs) to control flows and quantities. In managing these sectorised areas, the remarkable number of leakages has prompted a plethora of methods and state-of-the-art technologies, based on water balance, acoustic, or non-acoustic approaches, to decrease the loss. Several practices are popular among water supply enterprises, each with its own merits and demerits. The night flow methodology is widely adopted because water consumption during night hours is more stable than in the daytime (Cabrera, Cabrera Jr., & García, 2008); the mean minimum night flow obtained from this method can thus be compared with a new flow figure to reveal leakage (Boxall, Machell, & Mounce, 2007; Abebe, Mohammed, & Zeleke, 2021). In practice, this method requires well-trained staff to handle various parameters, and it contains several uncertainties, which make it unreliable. Moreover, studies have also demonstrated the effectiveness of many other approaches in operation, such as water needs calculations, sensors, and ultrasound (Alves, Blesa, Duviella, & Rajaoarisoa, 2022; Amditis, Bimpas, & Uzunoglu, 2010; Cody & Narasimhan, 2020; Goh et al., 2011; Kothari & Marimuthu, 2019; Huang, Xin, Li, & Tao, 2014; Müller et al., 2020; Csail, Madden, Nachman, Tokmouline, & Stoianov, 2007; Xue et al., 2020; Vanijjirattikhan et al., 2022).
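The night flow comparison described above can be sketched in a few lines. This is a minimal illustration and not the procedure of any cited study: the night window (hours 2 to 4), the 15% tolerance, the helper names, and the hourly flow values are all assumptions made for the example.

```python
from statistics import mean


def minimum_night_flow(hourly_flow, night_hours=(2, 3, 4)):
    """Lowest flow (m3/h) recorded in the assumed night window of one day."""
    return min(hourly_flow[h] for h in night_hours)


def baseline_mnf(history):
    """Mean of the daily minimum night flows over a reference period."""
    return mean(minimum_night_flow(day) for day in history)


def leak_suspected(today, baseline, tolerance=0.15):
    """Flag a day whose minimum night flow exceeds the baseline by more than `tolerance`."""
    return minimum_night_flow(today) > baseline * (1 + tolerance)


def make_day(night_flow, day_flow=40.0):
    """Build a synthetic 24-value hourly profile (hypothetical data)."""
    return [night_flow if hour < 6 else day_flow for hour in range(24)]


# Seven reference days with a stable night flow of about 10 m3/h.
history = [make_day(f) for f in (10.0, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3)]
baseline = baseline_mnf(history)

normal_day = make_day(10.4)   # within the usual range
leaky_day = make_day(13.0)    # night flow clearly above the baseline

print(f"Baseline minimum night flow: {baseline:.2f} m3/h")
print("Normal day flagged:", leak_suspected(normal_day, baseline))
print("Leaky day flagged: ", leak_suspected(leaky_day, baseline))
```

The fixed tolerance is where the uncertainty noted in the text shows up: legitimate night use varies, so a threshold chosen by hand can miss small leaks or raise false alarms.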

1.1.2 The use of machine learning in water supply industry

Artificial intelligence (AI) is a branch of computer science concerned with creating systems or software capable of performing tasks that typically require human intelligence. Since it was first introduced in the 1950s, AI has set milestones in countless applications, although many people do not recognise how widely it has been integrated.

In medicine, AI has improved disease diagnosis and treatment through its ability to analyse data, identify symptoms, and provide accurate treatment recommendations. Manufacturing benefits from the integration of AI into industrial robots throughout the production line, the key value of which is higher productivity and national competitiveness and less industrial waste. The transportation sector gains safe and efficient automation, with the ability to monitor traffic flow, detect congestion, and propose alternative routes; at the same time, AI-based self-driving technologies are becoming prominent, helping to reduce accidents and improve user experience. In finance and business, algorithms forecast market trends, optimise risk management, and find potential investment opportunities thanks to accurate data and thorough analysis. In education, AI has created more effective and personalised learning experiences: AI-based learning systems can assess competencies, manage learning progress, and provide teaching materials tailored to individual needs or requests, enhancing learning efficiency, encouraging creativity, and meeting diverse learning needs.

The world has witnessed breakthroughs by AI in solving complex and diverse problems, including finding the factors that cause delays in construction projects; more specifically, the advancement of AI, especially through machine learning (ML), provides effective and supportive methods for accurate and trustworthy predictions. First, ML can process big data efficiently. In the construction industry, there is a vast amount of data from past projects about costs, materials, labour, machinery, and project scheduling. Organising and analysing this data is time-consuming and requires considerable effort; with ML, however, the entire process can be automated, saving significant time and resources (Mitchell, 2017). Second, ML models can make more accurate projections because they learn from errors in previous predictions and adjust themselves to improve accuracy, which not only minimises the risk of time overruns but also helps contractors distribute resources more productively and adequately. Burkov (2019) observes that ML helps to detect non-obvious trends and patterns in data of which humans may not be aware. Such models have further been shown to classify possible factors affecting delays, such as variations in raw material prices or changes in construction policy regulations.

On account of these abilities, using ML to classify leak points in the WDS is not unrealistic; indeed, the method could reasonably outperform conventional leakage pinpointing techniques. A WSN has many relevant components, such as pressure, flow quantity, pipe diameter, and materials; even geological conditions and elevation play a considerable role in the sustainability of the system. This creates a large and consistent amount of data, which is well suited to training ML models. Elshaboury & Marzouk apply several models, including Multiple Linear Regression, Feed-Forward Neural Network, General Regression Neural Network, and Support Vector Regression, to anticipate the lifecycle of an actual water supply network in Egypt. The research concludes that the system is at the end of its service life and needs repair plans; the comparison among the ML models is shown in Table 1-1.

Table 1-1 The output of four machine learning models

1.1.3 Construction projects’ status quo

A nation's prosperity is believed to be built on investment and construction projects, since this industry plays a key role in such a strategy. Take China as an example: billions of US dollars have been channelled into numerous construction works, from ground to air infrastructure, over the past few decades, and the fruitful outcome is reflected in the size of its economy. In Vietnam, during the past few years, strategic decisions to allocate and disburse state funds for public projects have been urged by the central government and broadly welcomed by local authorities in order to serve the country's thirst for breakthroughs in all areas. The failure of a project leads to societal drawbacks, for example adverse impacts on the economy and commercial competitiveness, a lack of amenities and welfare facilities, poverty, and low living standards; as a result, the assessment of a given risk factor is sometimes followed by valuable proposals (Mai, Nguyen, & Vu Quoc, 2021).

RRUWPP are undeniably construction projects; consequently, once a water supply project is not accomplished according to the initial plan, the community is highly likely to encounter a shortage of clean and potable water. According to Durdyev & Hosseini (2020), the probability of falling behind schedule is high for a number of reasons, and many studies have catalogued the causes of delays in construction projects. Sometimes, because of limited financial resources, investment is not well allocated, either objectively or subjectively, which leaves urgent tasks poorly managed or exposed to legal risks.

1.1.4 Conclusion

Therefore, ensuring the stable operation of this long-standing WDS requires a sustainable approach to allocating resources to crucial water supply projects. Adopting a machine learning method, meanwhile, appears to be more than a trend in this era of globalisation; it can be considered a transition towards a 4.0 construction sector in which the community adopts the strengths of AI. Furthermore, reducing uncontrolled water loss while implementing projects on the network on schedule not only protects natural, financial, and human resources, the environment, and operating expenditure but also helps ensure the safety of surrounding structures. Yeng et al. (2024) conduct an experiment showing that ensuring water supply also contributes to decreasing the risk of water and food shortages, given the mutual impacts among the water, energy, and food systems.

Moreover, regarding AI, model training poses certain challenges and difficulties, of which a major obstacle is the collection and management of input data, especially in nations that apply traditional methods without recording past leakages. Accurate forecasts require a complete, accurate, and reliable source of input data, or an appropriate method based on real-time or real data. In addition, using machine learning models in the water supply industry requires an understanding of algorithms and technology from engineers who are acquainted with the various parameters, as well as support from computer science professionals. Training WDS operating staff in ML is important to ensure project expertise and quality; in the long run, the operating expense of the traditional acoustic method, which is currently widely used in HCMC, will outweigh the initial investment.

Research Objectives

This study aims to identify the risks causing delays of RRUWPP and to employ SVM to pinpoint potential leakages and bursts on the WDS. To achieve this goal, the following steps are proposed:

- Systematise scientific works such as domestic and international journal articles regarding risks affecting the progress of construction projects. On that basis, a study model is proposed to find the factors causing delays of RRUWPP using state funds in HCMC, thereby providing implications for the parties participating in such projects to better schedule and manage RRUWPP in Vietnam's biggest economy.

- Data collection and analysis: To build a model that anticipates probable leak points in the WDS, a source of real, accurate data is necessary, including information such as pressure and pipe length. The data can be processed by professional modelling software from raw data inputs or other sources.

- Model training: Apply SVM for forecasting, using the Python programming language to process the input data and train the model. The data are either provided directly or built from the analysis, by the water supply modelling software Bentley WaterGEMS, of actual figures from a specific WSN in HCMC. More specifically, the trained model is expected to predict the size and location of leakage in the WDS.
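As a rough illustration of the model-training step, the sketch below fits a linear support vector regression by subgradient descent on the epsilon-insensitive loss. It is deliberately simplified and is not the thesis's implementation: a library solver with a kernel function would normally be used, and the data (a standardised pressure drop mapped to a leak size), the function names, and all hyperparameters are hypothetical.

```python
def predict(w, b, x):
    """Linear model: weighted sum of the features plus a bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svr(X, y, epsilon=0.1, lam=0.001, lr=0.01, epochs=5000):
    """Minimise (lam/2)*||w||^2 + mean(max(0, |f(x) - y| - epsilon)).

    Uses plain full-batch subgradient descent with a constant step size.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = predict(w, b, xi) - yi
            # Points inside the epsilon tube contribute no loss gradient.
            if abs(residual) > epsilon:
                sign = 1.0 if residual > 0 else -1.0
                for j in range(d):
                    grad_w[j] += sign * xi[j] / n
                grad_b += sign / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical training data: standardised pressure drop -> leak size (L/s).
# The relation y = 2x + 3 is invented purely for illustration.
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

w, b = train_linear_svr(X, y)
print("weights:", w, "bias:", b)
print("predicted leak size at x = 0.25:", predict(w, b, [0.25]))
```

The epsilon tube is what distinguishes this from least squares: errors smaller than `epsilon` are ignored, so only the points on or outside the tube (the support vectors) drive the fit.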

Scope of Study

The survey in this dissertation is conducted within the HCMC area, whereas the hydraulic information comes from a system covering 37 wards across 3 districts of the metropolis. The study is conducted from February to May 2024, with the questionnaire delivered and its responses received until the end of April 2024.

Methodology

To analyse the primary factors that delay the construction of RRUWPP, a questionnaire is established to collect data. More precisely, a review of relevant domestic and international studies helps identify factors for the questionnaire; in addition, preliminary consultation with experts, government agency specialists, and engineers with more than 10 years of experience in water supply construction projects is carried out through direct interviews in order to finalise the questionnaire delivered to the respondents. The identified risks, which belong to the factor groups of supplies, design, external conditions, finance, labour, equipment, regulations, and other factors, are rated on a 5-level Likert scale so that the author can conduct a quantitative survey. The respondents are professionals, specialists, engineers, and other individuals working in project management who have experience and knowledge of, and participate either directly or indirectly in, water supply projects using state capital in HCMC.
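A simple way to see how 5-level Likert responses become quantitative is to average the scores per factor group and rank the groups. The sketch below assumes this mean-score ranking purely for illustration; the scores are invented, the scale labels are an assumption, and the function name is hypothetical.

```python
from statistics import mean

# Hypothetical responses on a 5-level Likert scale
# (assumed meaning: 1 = very low impact on delay, 5 = very high impact).
responses = {
    "finance": [5, 4, 4, 5, 3],
    "design": [3, 4, 3, 2, 4],
    "labour": [2, 3, 3, 2, 2],
}


def rank_factor_groups(responses):
    """Return (group, mean score) pairs sorted from highest to lowest mean."""
    for group, scores in responses.items():
        if any(score not in range(1, 6) for score in scores):
            raise ValueError(f"{group}: Likert scores must be integers from 1 to 5")
    means = ((group, mean(scores)) for group, scores in responses.items())
    return sorted(means, key=lambda pair: pair[1], reverse=True)


ranking = rank_factor_groups(responses)
for group, score in ranking:
    print(f"{group:8s} {score:.2f}")
```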

The studied data are collected from a water supply joint stock company operating a WDS adjacent to the centre of HCMC; however, the company requested that the figures be provided anonymously. After the network is analysed with the water supply hydraulic modelling software WaterGEMS by Bentley, the inputs used to train the SVM model comprise a few pivotal parameters such as pressure, flow quantity, and elevation. One of WaterGEMS's advantages known to engineers is its ability to simulate different scenarios such as pipe bursts, fires, and future population growth; importing real-time data from ArcGIS by Esri into WaterGEMS is among the methods used to supervise, operate, and manage the WSN. The research on SVM in this study covers the feasibility of predicting potential leak points on the WSN from basic parameters of the water supply network.
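Pressure, flow, and elevation are measured in different units and ranges, and SVM-type models are sensitive to feature scale, so inputs of this kind are commonly standardised before training. The sketch below shows one such preprocessing step as an assumption; the source does not state which scaling, if any, was applied, and the node values and function name are hypothetical.

```python
from statistics import mean, pstdev


def standardise(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical node records: (pressure in m, flow in m3/h, elevation in m).
nodes = [
    (32.0, 120.0, 4.5),
    (28.5, 150.0, 6.0),
    (30.0, 135.0, 5.2),
    (25.0, 180.0, 8.1),
]

scaled = standardise(nodes)
for row in scaled:
    print([round(value, 3) for value in row])
```

After scaling, no single parameter dominates the distance computations inside the model merely because its raw numbers are larger.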

Contribution to academic and practical fields

The study contributes theory on RRUWPP, including a methodology for finding leakage and an application of ML in the water supply sector. In addition, it can serve as a reference for students researching issues related to the progress of RRUWPP, contributing part of the theoretical basis and discussion for further understanding in this field.

The practical outcomes are the identification of the primary delay risks of RRUWPP in HCMC and an actual application of the support vector machine, a branch of machine learning, in an information-technology-oriented era.

LITERATURE REVIEW

Investment conceptualisations

In the most general sense, investment is the activity of committing planned resources in socio-economic fields to gain benefits in different forms in the future; it is the act of using tangible or intangible assets to form new assets. Some regard investment as the sacrifice of current means, such as money, labour, and physical assets, to conduct activities expected to yield results greater than the expenditure incurred in implementing a specific strategy. All the abovementioned resources are collectively called investment capital. Theoretically, the activity is usually categorised into various types based on the nature and scope of the benefits brought by investment:

- Financial asset investment allows people to lend money or buy valuable certificates to earn either predetermined interest rates, as with savings and Government bonds, or returns that depend on the business results of the issuing enterprises, as with stocks and corporate bonds.

- Commercial investment gives businesspeople opportunities to purchase goods and then sell them at a higher price, profiting from the difference between the buying price and the selling price.

- Investment in physical assets and labour is a type of investment in which people spend available funds on activities that maintain and directly create new assets for the economy, increasing the capacity of production, business, and all other social activities; its benefits can be seen in lower rates of unemployment and social evils.

- Capital investment is an activity that creates fixed assets to be put into operation in different socio-economic fields. In investment activities, investors must pay attention to the following factors: labour power, the means of labour, and the objects of labour. Unlike the objects of labour, namely raw materials, unfinished products, and semi-finished products, the means of labour, such as machinery and transportation, are the physical means used to act upon the objects of labour, transforming them to suit human purposes.

Overall, no investment activity can do without fixed assets. These include qualified technical facilities and can be adjusted to suit prices in the corresponding period. Investment activities take different forms, including direct investment, indirect investment (loans), and short-term, medium-term, and long-term investment.

Capital construction investment (CCI) is a type of investment in which available resources are spent on building construction works in accordance with the investor’s purpose; it is a field of physical production that creates fixed assets and technical facilities for society and is thus an economic activity. Because it creates immovable assets for the economy through new construction, expansion, modernisation, or restoration of established properties, CCI is regarded as part of general investment activities: the investment of capital in capital construction for the simple or expanded reproduction of fixed assets for the national economy. In every country, investment capital is accumulated mainly from the post-consumption savings of individuals and the government out of gross domestic product. In the long term, therefore, internal savings ensure stable economic growth and development, helping the country become independent and self-reliant in the economic field as well as in other fields. At the same time, every nation is interested in attracting foreign investment to meet its socio-economic development requirements.

2.1.3 Features of capital construction investment

CCI activities are a part of development investment and consequently bear its characteristics:

- It requires a large amount of financial, labour, and material resources, and this capital may lie dormant throughout the project lifetime. Therefore, during the investment process, project stakeholders, especially investors, must plan to mobilise and allocate resources appropriately for the project to stay on schedule. These resources include, but are not limited to, finance, labour, machinery, and materials.

- Long duration with many fluctuations: the period from making an investment until its results bear fruit is usually long, and many fluctuations occur during it.

- CCI often carries long-term use value, sometimes lasting hundreds or thousands of years, and even indefinitely if periodically and correctly maintained. World-renowned constructions include, but are not limited to, the Hanging Gardens of Babylon in Iraq, the Egyptian pyramids, the Great Wall of China, and the Angkor Wat temple in Cambodia.

- The results of construction investment activities are construction works fixed at the place where they are built, so geographical and topographic conditions greatly influence both the implementation process and the realisation of investment results. Therefore, construction sites must be reasonably arranged to meet national security and defence requirements, to be consistent with approved plans, and to be located in places with favourable conditions; furthermore, the outcome of the investment must exploit and secure the comparative advantages of the region.

- CCI activities are very complex and involve many industries and fields. They take place not only within one locality but also across many localities together. Therefore, when such an activity is implemented, close coordination among sectors and levels is needed in managing the investment process, to clearly define the scope of responsibility of the entities participating in the investment.

2.1.4 The role of capital construction investment

Overall, CCI is first and foremost an investment activity and therefore plays the common roles of investment, such as influencing aggregate demand, affecting stability and the supply chain, and strengthening the country's scientific and technological capabilities. In addition, owing to its specific nature, capital construction investment is a necessary foundation for national prosperity, with its own effects on the economy and on each production facility. More specifically:

- CCI ensures correspondence between technical facilities and production methods, while serving as a condition for developing economic sectors and changing the balance among them.

- CCI can resolve the complicated combination of production requirements. Each production method, depending on product characteristics, human resources, capital, and location conditions, places different requirements on machinery, equipment, and factories. When investment in capital construction is strengthened, the development of the industries' technical facilities sharply raises their material and service production capacity, which encourages the establishment of industries serving the national economy. Thus, the production capacity of the entire economy improves because CCI changes the structure and development scale of a plethora of economic sectors. This is a basis for rapidly increasing production and the total value of domestic products, boosting accumulation, improving the material and spiritual life of the workforce, and meeting the requirements of basic political and socio-economic targets and tasks.

In a nutshell, CCI is a momentous activity in the process of implementing development investment because it directly shapes the formation of economic advancement strategies for each period, contributing to changes in the economic management mechanism and economic policies of a state.

Investment project (IP) explanation

2.2.1 Definition of an investment project

An investment project is regarded by some as a set of interrelated policies, activities, and costs planned to achieve certain goals within a certain period, whereas others think of it as a collection of proposals to invest medium- and long-term capital in investment activities in a specific area, within a planned schedule, to achieve a certain goal or requirement. Investment projects are a primary basis for competent state agencies to apply management measures and grant investment licences. They are also the basis for investors to implement investment activities and evaluate the effectiveness of a project, and they are crucial in convincing investors to invest and credit institutions to fund the project. In this study, RRUWPP receive capital from SAWACO or its subsidiaries. These capital sources are state capital outside the budget, so the projects are classified as capital construction investment projects serving social benefits.

Formally, an investment project is a set of documents that present, in detail and systematically, the activities and costs of a plan to achieve certain results and goals in the future. From a management perspective, an investment project is a tool that uses capital, materials, and labour to create financial and socio-economic results over a long period of time. Regarding planning, an investment project is a tool that represents a detailed plan of an investment in production and business or socio-economic development, serving as a premise for investment decisions. In terms of content, an investment project is a set of related activities planned to achieve set goals by creating specific results.

2.2.2 Obligations of an investment project

To ensure feasibility, investment projects must meet the following basic requirements:

- Scientific rigour: the parties drafting the IP must follow a thorough research process and calculate each part of the project carefully and accurately, especially its financial and technical features. Scientific rigour is also shown in the drafting process itself, which requires advice from specialised agencies.

- Practicality: the contents of the IP must be comprehensively studied and determined on the basis of proper consideration, analysis and evaluation of the specific conditions and circumstances directly and indirectly related to the operation, such as exchange rates and the supply chain.

- Legality: IPs need a solid legal basis and must be consistent with prevailing policies and laws. This is one of the pivotal requirements, so project stakeholders must carefully study the State’s guidelines, policies and legal documents related to investment activities.

- Uniformity: IPs must comply with the general regulations of the authorities on investment activities, including regulations on investment procedures. Importantly, an IP carried out under agreements with foreign stakeholders must also obey international regulations.

The project lifecycle includes 3 stages: investment preparation; investment implementation; and construction inspection and operation (end of investment project).

Figure 2-1 Diagram of a project lifecycle (phases shown: pre-feasibility, feasibility, design, bidding, construction, inspection; stages shown: investment preparation, investment implementation, accomplishment)

- Investment preparation stage: This stage requires a thorough understanding of laws and regulations and of how they apply to a given project, because the applicable procedure depends on the nature, purpose, scale and importance of the project. For example, for important national projects, the investor must prepare an investment report (or pre-feasibility report) and submit it to the Government, which considers it and submits it to the National Assembly for approval of the policy and permission to invest. For group A projects that are not included in the industry planning approved by competent authorities, the investor must report to the industry management ministry, which either supplements the planning within its authority or proposes that the Prime Minister approve the supplement, before a construction investment project is established. The location and scale of construction must be consistent with the construction planning approved by the competent authority; otherwise, they must be approved by the People's Committee of the province or city.

+ The pre-construction period includes, but is not limited to, the following activities: construction survey; formulation and appraisal of the pre-feasibility study report, and decision on or approval of the investment policy; formulation, appraisal and approval of detailed construction planning as a basis for the feasibility study report; formulation and appraisal of the feasibility study report for construction approval or decision; and other necessary tasks related to pre-construction.

- Investment implementation stage: After the IP is approved, it moves to the next stage, investment implementation.

+ The first step is choosing a consulting organisation that is experienced in consulting and design and capable of carrying out the work from the design stage to the construction management and supervision stage. This is considered a decisive and complex task. When choosing a consulting unit, the decisive factor is the experience the consultant has gained on previous projects, assessed from multiple perspectives. A common selection method is to ask consulting agencies to provide their profiles and portfolios, including past projects and works. Construction consulting contractors must be selected in accordance with current regulations and laws, for example through bidding or direct appointment.

+ After the design contractor is selected, the consultant organises the design work on the basis of the approved project. Depending on the scale and nature of the construction project, the design can be done in one, two or three steps, according to the total investment and the construction grade. The consultant must be knowledgeable enough to understand which regulations apply to the project. One-step design consists of construction drawing design and applies to projects that only prepare techno-economic reports. Two-step design consists of basic design and construction drawing design and applies to works that require an investment project. Three-step design consists of basic design, technical design and construction drawing design and applies to works that require project preparation and are of special grade, grade I or grade II, and/or demand complex techniques. The decision for each project must comply with current regulations.

+ After the design product is formed for each step, the investor organises the appraisal dossier and submits it to the competent agency (specifically, the person with authority to make investment decisions) for approval. If the investor lacks the capacity to carry out the inspection, they may hire qualified consulting establishments and individuals to verify the project design, estimates and technology (if any) as a basis for approval. Based on the results of the appraisal, the person with investment decision-making authority decides whether to approve the IP. Approval is followed by bidding procedures to select a contractor with sufficient capacity and relevant experience to provide appropriate construction products and services at a reasonable bid price while meeting the investor's requirements and the project’s targets.

NGUYEN DUC PHUONG HUY - 2270185

+ After selecting the construction contractor, the investor negotiates and signs a construction contract with the contractor and proceeds to manage the construction of the project. Construction management covers, but is not limited to, construction quality; construction progress; construction volume; labour safety on construction sites; the environment; and the safety of the surroundings.

+ In short, during this period the investor is responsible for compensation and site clearance according to schedule; handing over the construction site to the construction contractor; design approval; procurement procedures; negotiating and signing contracts; managing the technical quality of the project throughout the construction process; and all work performed during project implementation.

+ The construction phase includes, but is not limited to, the following activities: site preparation and demining (if any); construction survey; formulation, assessment and approval of the design and construction estimate; issuance of the construction permit (if required); selection of contractors and signing of construction contracts; execution of the works; supervision of work execution; advances or payments for completed works; commissioning; taking-over of the works; handover of the completed works and putting them into operation; and other necessary tasks.

2.3 Delay risks of previous research

2.3.1 Group of external environmental factors

The group of external environmental factors comprises not only economic and natural factors but also other risks that are beyond the control of project participants. Drawing on previous studies, Vasugi (2018) and Berihu et al (2023) conclude that the uncontrollable risks causing project overruns include, but are not limited to, severe weather and ground conditions, inflation, interest rates, escalation and force majeure. For community projects, the capital for implementation is allocated annually from the budget and the interest rate contributes little to the total estimate; therefore, this factor is only meaningful to the financial capacity of the contractors. Projects funded from local budgets rarely require imported equipment, in which case the exchange rate is not a key factor. It can also be inferred that inflation and construction material prices can be treated as one factor, because material prices change in step with inflation; the only remaining economic factor is therefore construction material price slippage. Besides, adverse weather and engineering geological conditions are natural factors that can affect project progress because they lead to design adjustments and ground treatment (Doloi, 2012). On-site foundation work takes longer when actual ground conditions differ from the survey results. Therefore, external factors include, but are not limited to, economic issues and weather and geological conditions.

2.3.2 Group of elements of management information system

One feature that distinguishes projects using state budget resources from those using private capital is that they are strongly influenced by the construction legal system and other relevant regulations issued by authorities from central to local level. As a result, the timely dissemination of policies also contributes to project completion progress (Nhân, 2010). The elements of the management information system are the degree to which regulations on investment and construction management are popularised, the level of dissemination of planning information for the project area, and the quality of geological information in the project area.

According to BS 6079-3 (2000, p.21), the 7 policy factors are unexpected changes in management regulations, changes in tax policy, nationalisation, change of government, war and enemy destruction, property rights, and compensation costs. In Vietnam, the political system is stable and sustainable, and no nationalisations, wars, enemy sabotage or strikes occur. Moreover, capital for project implementation comes from the state budget, so tax policy has almost no impact unless the government makes strategic changes to encourage economic growth, and such changes are usually minor. The remaining policy factors relate mainly and directly to regulations on investment and construction management for state budget capital. Therefore, the policy elements in this group are the establishment of guidelines on investment and construction, the stability of salary arrangements, stability in bidding policy, and the cohesion of contract policy. In a survey conducted by Nguyen et al (2021), policy change risks make a decisive contribution to the completion of water supply projects.

2.3.4 Group of factors on decentralisation of authority to the investor

The role of clients or investors in investment and construction management is a crucial factor in accelerating project implementation, and some studies demonstrate the importance of this party's responsibility to the process (Abdelaal, Farrell, & Emam, 2015; Israngkura Na Ayudhya, 2011). Strongly decentralising decision-making authority to investors is effective because it increases their initiative to adapt promptly to market mechanisms. Decentralisation to investors is associated with the construction investment sequence, from project planning, detailed design and bidding to settlement, with adjustment procedures in between when changes are made to the original plan. As a result, the group of factors on decentralisation of authority to investors comprises the investor's decision authority over the approval of the project, designs and estimates, bidding results, payment, and adjustments to the plan.

2.3.5 Group of factors on capital sources for project managing

According to Faniran, Oluwoye, and Lenard (1998), funding for the project is one of the important factors with a great impact on a project’s schedule. Similarly, research by Belassi and Tukel (1996, p.146) shows that construction projects benefit when funds are available in the budget plan. As Vietnam's economy moves towards the market, the need for investment resources accelerates. According to behavioural theory, contractors will not carry out construction on schedule if capital is not allocated sufficiently and on time, or if completed works are not paid for. The importance of financial sources for CCI projects has been confirmed by the government’s determination to improve capital allocation in the past few years. Besides capital arrangement, completing payment documents is also a factor in paying the contractor on time. Based on the above arguments, capital sources for project implementation are represented by the following factors: availability of project capital in the budget plan, timeliness in completing payment documents, and timeliness of payment after the documents are completed.

2.3.6 Group of factors related to the capacity of project participants

For project participants, Thi (2006) separates the capacity of the project manager and the capacity of the remaining parties into two groups of factors that affect the success of the project. For projects funded from the state budget, the investor either manages the project directly or hires a project management consultant. In both cases, the investor remains responsible for decisions and advice under the regulations, because project management consultants bear only the responsibility of an advisor. Because of its great influence, the investor's capacity is considered separately. The group of capacity factors of the remaining project participants includes the capacity of the design consultant, the supervising consultant and the project management consultant, and the labour resource capacity, financial capacity, and machinery and equipment capacity of the main contractor.

2.3.7 Group of factors on the capacity of investors

Belassi and Tukel (1996) affirm that the group of factors comprising “ability to decentralise; negotiation ability; coordination ability; decision-making ability; cognitive ability about the roles and responsibilities of managers” has a huge impact on the success of construction projects. The role of the investor in budget projects in Vietnam is undeniably decisive. In addition to the above factors, investors of projects using state budget capital in Vietnam must be able to understand the law and to report project implementation statistics to superiors or management levels with respect to the relevant plans and budgets. On that basis, the study proposes the following factors to represent investor capacity: the capacity to coordinate contract implementation, to understand regulations and technical expertise, to make decisions within their authority, to resolve problems, to report statistics, and to understand management roles and responsibilities.

2.3.8 Group of project characteristics factors

Nguyen & Nguyen (2019) synthesise from previous studies that project characteristics are groups of factors with an indirect relationship to project success: they affect how strongly the groups of direct factors influence the success of a project. Typical project characteristics include goals, scale, project value, finance, total investment, specificity of tasks, complexity, urgency, lifecycle and project type. Researching all of these factors is very complicated and makes data collection challenging. Budget projects in Vietnam are usually managed according to budget level and take one of two main forms of project management: the investor manages the project directly or hires a project management consultant. The study therefore proposes budget level and form of project management as the two representative factors.

Introduction to WDS

The world's WDSs are shaped by many key components such as water source, terrain and water demand. Sources such as rainwater and groundwater from shallow wells often do not need a supply pipe system; they are often equipped with hand pumps so that water can be collected and consumed directly. Treated surface water, however, is distributed through WSNs. To build an effective network and ensure its continuity, the system is designed and built on suitable terrain with a combination of different types of pipes, equipment and materials that are highly durable and of guaranteed quality. In addition, water demand plays a decisive role in determining the scale of the network. The WDS must be arranged to deliver adequate water to consumers in any situation, including firefighting, while keeping the water hygienic. Importantly, all pipes in the WSN must be watertight to minimise losses due to leakage and should preferably be installed one metre away from or above the sewage pipeline.

Figure 2-2 Distribution of water supply (water source, treatment, distribution, consumption)

Accurately determining the flow direction of the water supply system from source to distribution junction is of great help in finding dangers, because it helps to identify the risks that consumers may face and to take measures to control them. Regularly checking the flow direction against practical experience also provides important data. For large systems, dividing the flow direction by network element, for example water source, treatment and distribution, and consumption, is very effective for water management and network operation.

2.4.2 Water supply network in Ho Chi Minh City

Figure 2-3 Ho Chi Minh City’s water distribution system

Currently, HCMC’s WSN is managed and operated by many different units, including state and private enterprises: SAWACO and its affiliated units and subsidiaries are responsible for supply in 21 of the 22 districts and cities, while Saigon Water Infrastructure Joint Stock Company manages the WDS in Cu Chi district.

Table 2-1 SAWACO’s subsidiaries and their supervision

Subsidiary | Area of supervision
Ben Thanh WASUCO | District 1, 3 (except Ward 12, 13, 14)
Cho Lon WASUCO | District 5, 6, 8, Binh Tan
Gia Dinh WASUCO | Binh Thanh, Phu Nhuan, Go Vap (Ward …)
Thu Duc WASUCO | Thu Duc city
Phu Hoa Tan WASUCO | District 10, 11 and Tan Phu (Phu Trung ward)
Nha Be WASUCO | District 4, 7, Nha Be
Tan Hoa WASUCO | Tan Binh, Tan Phu district (except Phu Trung ward)
Trung An WASUCO | District 12, Go Vap (except Ward 1)
Can Gio Water Supply Factory | Can Gio District
HCMC Rural Water Supply Factory | Binh Chanh district
Saigon Ground Water Company | Hoc Mon District

2.5 Support vector machine

2.5.1 Introduction to support vector machine

SVM was one of the most widely used algorithms in machine learning before artificial neural networks made a comeback through deep learning models. It is a supervised algorithm that can be used for both classification and regression.

Figure 2-4 Hyperplane and support vectors

The goal of SVM is to find a hyperplane in the N-dimensional space (where N is the number of features) that divides the data into two parts corresponding to their classes. In this algorithm, each data item is plotted as a point in N dimensions, with the value of each feature as one coordinate, and a hyperplane that divides the classes is then sought. In two dimensions a hyperplane is simply a straight line that divides the plane into two separate parts; in general it is a flat surface with one dimension fewer than the feature space. Support vectors can be understood simply as data points on the plot, so SVM can be thought of as finding the border that best divides two classes. More precisely, a point in a vector space can be considered as a vector from the origin to that point. The data points closest to the hyperplane, which determine its location and direction, are called support vectors; for the maximum-margin hyperplane, the closest points of the two classes are equidistant from it (Figure 2-4). These vectors are used to optimise the margin, and if they are removed, the position of the hyperplane will change.

Figure 2-5 shows that many hyperplanes can divide the classes (graph on the left); however, the goal is to find the hyperplane with the widest margin, that is, the one with the greatest distance to the nearest points of the two classes (graph on the right).

Figure 2-5 Possible hyperplanes and one with the maximum margin

2.5.2 How support vector machine works

There are usually a few basic rules to determine the right hyperplane:

- Rule number 1: Choose a hyperplane that best divides the two classes (see line B in Figure 2-6 Hyperplane in Rule 1).

- Rule number 2: Here, all three lines A, B and C satisfy rule 1. However, the best hyperplane is line C, because it has the largest margin, that is, the largest distance from the hyperplane to the closest point of either class (see Figure 2-7). If a hyperplane with a smaller margin is chosen, there is a higher risk of assigning new data to the wrong class as the data set grows.

Figure 2-7 Hyperplane with the largest margin Source: Internet

- Rule number 3: B can be mistaken for the suitable hyperplane because it has a larger margin than A. This is not correct, because rule number 1 takes priority: the hyperplane must first separate the classes. Therefore, line A is the correct hyperplane (see Figure 2-8).

Figure 2-8 The appropriate hyperplane even though it does not have the largest margin

- Rule 4 (distinguishing two separate classes): In most cases, two classes cannot be separated by a straight line into regions that contain only stars or only points.

A star that lies far out among the other class can instead be treated as an outlier, because SVM is able to ignore outliers and still find the hyperplane with the maximum margin. SVM is therefore robust to such exceptions.
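The tolerance for outliers described above is usually expressed through the soft-margin objective, which the text does not spell out. The following is a minimal sketch under that assumption: the function name `soft_margin_objective`, the penalty parameter `c` and all data values are illustrative and not taken from the thesis. It compares a wide-margin line that misclassifies one outlying star with a narrow-margin line that separates every point.

```python
def soft_margin_objective(w, b, data, c=1.0):
    """0.5 * ||w||^2 + c * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    hinge = sum(
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in data
    )
    return 0.5 * sum(wi * wi for wi in w) + c * hinge

# Hypothetical data: stars (+1), points (-1), and one star lying close to the points.
data = [
    ((3.0, 3.0), 1), ((4.0, 3.0), 1), ((3.0, 4.0), 1),
    ((1.5, 1.5), 1),                       # the outlying star
    ((1.0, 1.0), -1), ((0.0, 1.0), -1), ((1.0, 0.0), -1),
]

wide = ((0.5, 0.5), -2.0)    # wide margin, puts the outlying star on the wrong side
narrow = ((2.0, 2.0), -5.0)  # separates every point, but with a narrow margin

# With a small penalty the wide margin wins; with a large penalty the narrow one wins.
wide_c1 = soft_margin_objective(*wide, data, c=1.0)
narrow_c1 = soft_margin_objective(*narrow, data, c=1.0)
wide_c10 = soft_margin_objective(*wide, data, c=10.0)
narrow_c10 = soft_margin_objective(*narrow, data, c=10.0)
```

In this sketch the penalty `c` decides how much a single exception is allowed to narrow the margin, which is one way to read the claim that SVM can accept exceptions.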

2.5.3 Kernel function in support vector machine

In practice, data are often not linearly separable, so it is difficult to determine a hyperplane that classifies the red and blue data points according to the rules above. To solve this problem, SVM uses the kernel method, which maps data from a lower-dimensional space to a higher-dimensional space in which a separating hyperplane can be determined. Visually, this technique is like folding a piece of paper so that a round hole can be cut in it with scissors.

Figure 2-10 Description of the Kernel method

SVM solves this problem by simply adding a feature through a kernel, $z = x^2 + y^2$. The data are then plotted on the x and z axes (see Figure 2-11).

Figure 2-11 A kernel added to create z axis

In the diagram above, all values on the z axis are non-negative, because z is the sum of the squares of x and y. More specifically, the red circular points lie closer to the origin of the (x, y) plane, so their z values are smaller and they lie closer to the x axis in the (x, z) plot. In this transformed space, a linear hyperplane can divide the data into two classes.
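The added feature can be illustrated with a few lines of Python. This is only a sketch of the explicit mapping $z = x^2 + y^2$ described above; the coordinates, the helper names `lift` and `classify`, and the way the threshold is chosen are illustrative assumptions rather than the thesis's implementation.

```python
# Hypothetical data: an inner class near the origin and an outer class around it.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4), (0.2, 0.2)]
outer = [(2.0, 0.0), (0.0, 2.5), (-1.8, 1.2), (1.5, -1.6)]

def lift(point):
    """Map (x, y) to (x, z) with the extra feature z = x^2 + y^2."""
    x, y = point
    return (x, x * x + y * y)

inner_z = [lift(p)[1] for p in inner]
outer_z = [lift(p)[1] for p in outer]

# In the (x, z) plane the horizontal line z = threshold is a linear separator,
# even though no straight line separates the classes in the (x, y) plane.
threshold = (max(inner_z) + min(outer_z)) / 2

def classify(point):
    """Return -1 for the inner class and +1 for the outer class."""
    return 1 if lift(point)[1] > threshold else -1
```

Every z value is non-negative, and the inner points receive the smaller z values, which matches the description of the (x, z) plot.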

2.5.4 Margin in support vector machine

Margin is the distance between the hyperplane and the closest data points of each class. For example, in Figure 2-5 the margin is the distance between the hyperplane and the nearest points of the two classes (squares and circles). SVM always tries to maximise this margin, thereby obtaining the hyperplane that lies furthest from both classes. As a result, SVM can minimise the misclassification of newly introduced data points.
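As a small numerical illustration of this definition, the sketch below computes the margin of two candidate separating lines on made-up data. The function name `margin`, the points and the two lines are assumptions chosen for the example.

```python
import math

def margin(w, b, points):
    """Smallest distance |w.x + b| / ||w|| from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x in points)

# Hypothetical 2-D data: squares (one class) and circles (the other class).
squares = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]
circles = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
points = squares + circles

# Two candidate lines, each of which separates the two classes.
line_a = ((1.0, 0.0), -1.5)  # the vertical line x = 1.5
line_b = ((1.0, 1.0), -4.0)  # the diagonal line x + y = 4

margin_a = margin(*line_a, points)
margin_b = margin(*line_b, points)
```

Both lines separate the data, but the diagonal line keeps the nearest points further away, so it is the one a maximum-margin criterion would prefer of the two.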

2.5.5 Solution to optimisation in support vector machine

Assume that the training data set consists of the pairs $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, in which $x_i \in \mathbb{R}^d$ is the input of a data point and $y_i$ is its label, while $d$ is the number of dimensions and $N$ is the total number of data points (see Figure 2-12).

Figure 2-12 Data points in two-dimensional space

Suppose that the blue squares belong to class 1, on the positive side, while the red dots belong to class -1, on the negative side of the surface $W^T x + b = 0$ that divides the two classes. If the two classes swap sides relative to the surface, the signs of $W$ and $b$ are simply reversed. The coefficients $W$ and $b$ need to be found. For any pair of data $(x_n, y_n)$, the distance from that point to the dividing surface is:

$$\frac{y_n (W^T x_n + b)}{\|W\|_2}$$

As assumed above, $y_n$ always has the same sign as the side on which $x_n$ lies (either positive or negative). As a result, $y_n$ has the same sign as $W^T x_n + b$, and the numerator is always non-negative.

With the dividing surface as above, the margin is calculated as the closest distance from any point (regardless of its class) to that surface:

$$\text{margin} = \min_n \frac{y_n (W^T x_n + b)}{\|W\|_2}$$

Based on this, the optimisation problem in SVM is the problem of finding $W$ and $b$ for which the margin reaches its maximum value. Moreover, if $W$ and $b$ are both multiplied by the same positive constant, the dividing surface does not change, so the distance from each point to the surface, and therefore the margin, does not change either. Based on this property, it can be assumed that

$$y_n (W^T x_n + b) = 1$$

for the points closest to the dividing surface, as in Figure 2-12. Then $y_n (W^T x_n + b) \geq 1$ for every $n$, and the margin equals $1 / \|W\|_2$.

Maximising the margin is therefore equivalent to solving:

$$(W, b) = \arg\min_{W, b} \frac{1}{2} \|W\|_2^2$$

in which the condition is:

$$y_n (W^T x_n + b) \geq 1, \quad n = 1, 2, \dots, N$$
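The rescaling argument behind this formulation can be checked numerically. The sketch below uses hypothetical data and helper names (`signed_score`, `distance`): it rescales a separating surface so that the closest points satisfy $y_n (W^T x_n + b) = 1$, and confirms that the margin is unchanged and equals $1 / \|W\|_2$ after rescaling.

```python
import math

def signed_score(w, b, x):
    """Return w.x + b for a single point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance(w, b, x, y):
    """Distance y * (w.x + b) / ||w|| of a correctly classified point to the surface."""
    return y * signed_score(w, b, x) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical training pairs (x_n, y_n).
data = [((3.0, 3.0), 1), ((4.0, 3.0), 1), ((1.0, 1.0), -1), ((0.0, 1.0), -1)]

w, b = (1.0, 1.0), -4.0  # a separating surface: x1 + x2 - 4 = 0

# Rescale (w, b) by one positive constant so the closest points score exactly 1.
k = 1.0 / min(y * signed_score(w, b, x) for x, y in data)
w_c, b_c = tuple(k * wi for wi in w), k * b

margin_before = min(distance(w, b, x, y) for x, y in data)
margin_after = min(distance(w_c, b_c, x, y) for x, y in data)
inv_norm = 1.0 / math.sqrt(sum(wi * wi for wi in w_c))
```

Because the margin of a rescaled surface is the reciprocal of its norm, making the norm as small as possible under the constraint is the same as making the margin as large as possible.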

2.6 Linear regression

Linear regression is one of the most basic and popular predictive analysis techniques. It focuses on two main aspects:

- Evaluating the appropriateness of the independent variables: Determine whether the group of independent variables can reliably predict the dependent variable. This involves examining the extent to which the independent variables explain variation in the dependent variable.

- Impact of independent variables: Analyse which variables have a significant influence on the dependent variable and how that influence occurs. The goal is to identify relationships between variables and better understand how they affect each other.

Linear regression is a powerful tool for understanding and predicting relationships between variables, and it supports data-driven decision making. Its three main applications are:

- Determining the influence of predictor variables: Assess how independent variables affect the dependent variable, helping to understand the relationship between them.

- Anticipating the effects of value changes: Predict the change in the dependent variable when the independent variables change. This is useful in assessing the impact of changes in inputs.

- Forecasting future trends and values: Use regression analysis to predict future changes, such as predicting product prices based on market factors. This is a popular application in economics and finance.
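The two aspects above can be shown with a one-variable ordinary least squares fit. This is a minimal sketch: the function names `fit_line` and `r_squared` are illustrative, and the numbers are made up for the example, not measurements from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Hypothetical values: one independent variable and one dependent variable.
pipe_age_years = [5, 10, 15, 20, 25]
leaks_per_km = [1.0, 2.1, 2.9, 4.2, 4.8]

intercept, slope = fit_line(pipe_age_years, leaks_per_km)
fit_quality = r_squared(pipe_age_years, leaks_per_km, intercept, slope)
forecast_at_30 = intercept + slope * 30  # forecasting a value outside the sample
```

Here `slope` describes the influence of the independent variable, `fit_quality` indicates how well it explains the dependent variable, and `forecast_at_30` corresponds to the forecasting application.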

2.7 Previous studies on leakage pinpointing

2.7.1 Usage of variational autoencoder and support vector machine to establish a framework in water distribution system

In the study “Domain-informed variational neural networks and support vector machines based leakage detection framework to augment self-healing in water distribution networks”, McMillan et al (2024) summarise research that uses machine learning and deep learning to predict pipe bursts, in order to identify the methods suited to their scientific aim. The authors recognise that real operation produces many erroneous data and that maintenance logs may sometimes be insufficient; Kalman smoothing and isolation forest techniques are therefore used to process the raw data and address the problem. Moreover, the study highlights the benefits of the variational autoencoder and SVM, as well as of their combination, in building a framework for flow time series that avoids bias. Specifically, the variational autoencoder reduces the number of dimensions while retaining the variability of the data and delivering sufficient information, whereas SVM proves powerful as the classifier in this study. The authors also compare the effectiveness of the proposed method with the minimum night flow method.

In conclusion, the framework offers a better understanding of a new technique for the water supply industry and of its merit in terms of operating cost. Nevertheless, the authors also admit that the tool has some drawbacks and that further research is needed.

2.7.2 Application of artificial neural network to strategically detect water leakage

The study “Machine learning model and strategy for fast and accurate detection of leaks in water supply network” by Fan et al (2021) illustrates an advance in the practical use of algorithms in the water supply industry by combining EPANET, a hydraulic modelling software, with ML to identify leaks. In the study, an artificial neural network can pinpoint leaks and classify them among mixed data, but it requires balanced data, which the complicated operation of a real WDS rarely provides. Therefore, an autoencoder neural network is also developed to predict leak points from imbalanced data; this unsupervised ML model shows that the strategy is remarkably effective. The strategy, which monitors the WSN on the basis of different coefficients using ML and raises classification efficiency through multiple separate detections, shows fruitful outcomes. Additionally, the technique can work with Internet of Things technologies, including smart water meters.

2.7.3 Integration of hydraulic software in machine learning

The work of Mashford et al (2009), “An approach to leak detection in pipe networks using analysis of monitored pressure values by support vector machine”, shows a creative approach to anticipating leak points from pressure data in a water supply network in Australia. SVM is applied to simulated results produced by EPANET, a reputable hydraulic modelling software. The authors summarise the benefits and drawbacks of other conventional and advanced leak prediction techniques before developing a highly efficient ML-oriented method. EPANET is used to obtain training data because the authors recognise that gathering actual leak records is infeasible. Although the pressure exponent is experimentally assigned for different piping materials, its value is set to 0.5 for lack of confirming evidence. This might limit the data obtained from EPANET, since the formula $EC = Q / P^{P_{exp}}$ is the primary means of generating the values. After many adjustments and tests, the output of the model is promising, and even though the study is based on experimental data, the method can be brought into practice effectively.
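The relation quoted above can be written as a short Python sketch. It assumes that EC is the emitter coefficient, Q the leak flow and P the pressure at the node, which the text does not define explicitly; the function names and the numerical values are illustrative only, and units are left unspecified.

```python
PRESSURE_EXPONENT = 0.5  # the value reported as used in the study

def emitter_coefficient(flow, pressure, exponent=PRESSURE_EXPONENT):
    """EC = Q / P**Pexp."""
    if pressure <= 0:
        raise ValueError("pressure must be positive")
    return flow / pressure ** exponent

def leak_flow(ec, pressure, exponent=PRESSURE_EXPONENT):
    """Inverse relation: Q = EC * P**Pexp."""
    if pressure < 0:
        raise ValueError("pressure must be non-negative")
    return ec * pressure ** exponent

# Illustrative values only.
ec = emitter_coefficient(flow=2.0, pressure=25.0)  # 2.0 / 25**0.5
q_at_higher_pressure = leak_flow(ec, pressure=36.0)
```

With an exponent of 0.5 the leak flow grows with the square root of pressure, so the choice of exponent directly shapes the simulated training data.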

2.7.4 Conventional methods besides machine learning techniques

Obeid et al (2016), in “Towards realisation of wireless sensor network-based water pipeline monitoring systems: a comprehensive review of techniques and platforms”, provide deep insights into various non-ML techniques for detecting water leakage in the distribution system, such as acoustic, ultrasound and pressure methods. Many platforms are evaluated according to their pros and cons, and an optimal chip-based solution is then proposed to achieve trustworthy outcomes. The authors gather several previous studies to clarify the advantages and disadvantages of methods including visual surveillance, listening techniques, ultrasound, electromagnetic methods and computer-based supervision. Even though these methods have been deployed and researched globally, the drawbacks found in their deployment lead the authors to develop a new model called EARN-PIPE, which uses a single chip to perform several steps within the architecture. However, the authors recognise the demerits of this procedure and suggest external add-ons so that the technique can operate comprehensively.

2.7.5 The revolution of water leak detection and localisation

“Leak detection and localization in water distribution networks: Review and perspective” by Romero-Ben et al. (2023) reviews different model- and data-based methods drawn from 92 past studies and clarifies the pros and cons of each technique. A summary of the article, organised by categories and types of data, is shown in Table 2-2; the abbreviations are explained in detail in the original research.

Table 2-2 Summary of model-based techniques

Sensorisation Validation Type Density Case U Model-based

Poulakis et al (2003) L(SE) ✓ (P,F) A (7/31, 7/50) SN M, S,

Wu et al (2010) L(SE) ✓ P 28/841 RN

Islam et al (2011) D+L(SE) (P,F) (5/27, ?/40) SN M, D

Li, Chu et al (2022) L P 20/491 RN

Daniel et al (2022) D-L(SE) ✓ (P,D) (33,82)/785 RN

Marzola et al (2022) D-L(SE) ✓ (P,D) (33,82)/785 RN

Wang et al (2022) D-L(SE) ✓ (P,D) (33,82)/785 RN

RN (RD) - Rajeswaran et al

CHAPTER 3 FRAMEWORK STRUCTURING

Architecture of proposed framework

The research process includes 7 steps as follows:

- Step 1: Identify delay factors based on prior works by domestic and foreign authors and on perspectives gathered from experts in the field of water supply in HCMC

- Step 2: Collect opinions from experts in the field of water supply and generate a questionnaire for a mass survey

- Step 3: Re-evaluate the questionnaire and conduct the survey. The survey must be sent to experts working in the same field. The materials of this convenience sampling survey are delivered in hard copy (in person) and online (Google Forms)

- Step 4: Once the survey data are available, conduct SPSS analysis: calculate the average values, check Cronbach's Alpha (α) reliability, eliminate inappropriate variables, and rank the factors

- Step 5: Amass data by simulating the provided parameters on a software platform to establish the input database for training

- Step 6: Process the raw data and feed it into the SVM model, implemented in the Python programming language

- Step 7: Evaluate the model and conclude.

[Figure: flowchart of the research process. Recoverable box labels: Standardise data with variables after elimination; Adjust parameters for algorithm in Python; Load processed data to ML model for training execution; Analyse, compare, evaluate ML model and conclude]

Survey data compilation

Questionnaires play an important role in collecting data for scientific work because they can provide accurate and diverse information from participants. The accuracy of the data obtained depends heavily on how the questions are designed, which directly affects the results of the research project. The questionnaire comprises four main parts:

- Part 1: A brief introduction of the research topic, which gives interviewees the essential background to understand the objectives of the research and the researcher

- Part 2: Gathers basic information about respondents, including questions about position, field of work, experience, and education, in order to screen out those who are not suitable for the research content

- Part 3: Focuses on obtaining assessments of the factors affecting the schedule of RRUWPP, using a 5-level Likert scale from strongly disagree to strongly agree

- Part 4: Collects personal information of survey participants, used solely for research purposes

3.2.2 Variables size and methodology of determination

“Sample” in statistics can be defined as a small part that represents a larger whole: it is selected from a population by one of various methods, provided that it remains representative of the population to which it belongs. The population may comprise a group of people, items, or other units related to the study’s objectives. Two main types of population are distinguished: theoretical populations and accessible ones

- The theoretical population includes all subjects relevant to the study; it is usually larger than, and contains, the accessible population. For example, in a study about students, the theoretical population includes all students

- An accessible population is the group of subjects that the researcher can approach and select for a sample. For example, not all students can be reached because of their wide distribution; consequently, only a portion of them belongs to the accessible population

Finally, the sampling frame is defined as the list of members of the accessible population from which the sample is chosen. This list needs to be comprehensive, complete, and updated regularly or periodically during professional work, or compiled from units operating in the surveyed field

Table 3-1 Two sampling methods: probability and non-probability

Samples selection method

Easy to implement and ensures objectivity In addition, can be flexibly integrated into complex probability sampling techniques

It is necessary to have a complete list of sample units available, which is not appropriate for samples of large or variable sizes Furthermore, the selected samples may be widely dispersed, making collection difficult Finally, there is a risk of missing certain groups within the population targeted by the research method

Fast implementation, high accuracy, and helps select target audience clearly High representativeness

Duplicates sometimes occur, affecting the representativeness of the sample

Remarkably accurate and representative, sample management is easier than a simple random sample

It is necessary to establish a sampling frame for each layer, which is often difficult to do in practice

Suitable for large range with high dispersion, low cost

The population must be large and less accurate or representative than a random sample

Effective in collecting primary data, saving costs and time, extremely flexible

Subjective, not completely representative, requires group-level information


Practical and convenient, samples are always available

Lack of representativeness, does not accurately reflect the overall research

Can be done when there is metric data describing the proportions of groups

Continuous data updates are needed to maintain accurate ratios

The most cost- and timesaving, suitable for anthropological research with a limited number of data sources

Errors from researcher assessment, low reliability, high bias, difficult to generalize

Appropriate when there is no available sampling frame

Sampling bias, failure to check who participated

Suitable for hard-to-reach markets or groups

May contain sampling errors and lack of representativeness

Suitable for in-depth research or experiential reference

Challenging to gather experts, requiring solid knowledge

Plays an important role in setting research orientations for the development of investigation

The interest group does not need to be large in number but needs to be highly representative, meaning they must reflect the characteristics and opinions of the larger population

Considering the limited time and resources, as well as the obstacles to reaching subjects, convenience sampling, a non-probability method, is chosen. Although this technique may not represent the population accurately and carries a high risk of bias, it remains very useful in specific situations, especially when the study targets those who work in the field of water supply

Hair, Anderson, Tatham, and Black (1998) show that the number of observations in an analysis must be greater than 100 and must stand in a ratio of at least 5 to 1 to the number of variables, ideally between 5 to 1 and 10 to 1. Accordingly, 200 sheets are expected to be delivered so that, after invalid responses are eliminated, the number of responses is at least 5 times the number of observed variables; otherwise, the survey continues until the minimum is reached.

Data investigation technique

The process of analysing data is conducted through the following stages:

The formal research is a quantitative study. Using the chosen sampling method, questionnaires were designed and sent to experts, specialists, engineers, and individuals with experience participating in, or directly or indirectly related to, the RRUWPP, through channels such as direct interaction, email, and social networks

In this study, the author uses the exploratory factor analysis (EFA) technique, so the sample size is determined according to the rule of 5 observations per measured variable (Hair et al., 1998); with 31 measured variables in this survey, at least 155 valid observations must be collected for the survey to qualify
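The sample-size rule can be written as a small helper. This is a minimal sketch: the function name is hypothetical, and `floor=101` is an assumed encoding of the "greater than 100" condition attributed to Hair et al. (1998).

```python
def minimum_sample_size(n_variables, ratio=5, floor=101):
    """Smallest sample satisfying both the ratio rule and the absolute floor."""
    return max(floor, ratio * n_variables)


# 31 measured variables, as stated in this study.
required = minimum_sample_size(31)
print(required)                           # 155
print(minimum_sample_size(31, ratio=10))  # 310, upper end of the recommended range
print(163 >= required)                    # True: the 163 valid responses are enough
```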

After the responses are received, the analysis pursues two targets: describing the nature of the data using descriptive statistics, and checking the reliability of the scale using Cronbach's α in IBM SPSS 21. The reliability and validity of the scale are assessed using Cronbach's α and EFA in SPSS; the Cronbach's α coefficient is used to filter out observed variables (junk variables) that do not meet the reliability standard. The formula is as follows:

$$\alpha = \frac{N \bar{\rho}}{1 + (N - 1)\,\bar{\rho}}$$

where $\bar{\rho}$ is the average correlation coefficient among questions and $N$ is the total number of questions. Nguyen (2014) and Hoang & Chu (2008) suggest that when the context explored is new or unfamiliar to the respondents, a Cronbach's α coefficient of 0,6 or higher is acceptable, and α ≥ 0,70 indicates high reliability. As a result, this study takes 0,7 as the minimum value for this coefficient

In addition, the overall variable correlation coefficient (item-total correlation) is adopted, and variables whose item-total correlation is less than 0,3 are removed
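The two reliability checks can be sketched in plain Python. The sketch assumes the form of Cronbach's α based on the average inter-item correlation described in the text, and the corrected item-total correlation (each item against the sum of the other items); SPSS output may differ slightly because it normally reports the covariance-based α. The Likert answers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cronbach_alpha(items):
    """Alpha from the average inter-item correlation: N*r / (1 + (N-1)*r)."""
    n_items = len(items)
    pairs = list(combinations(items, 2))
    mean_r = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


def item_total_correlations(items):
    """Correlation of each item with the sum of the remaining items."""
    result = []
    for index, item in enumerate(items):
        others = [col for j, col in enumerate(items) if j != index]
        rest_total = [sum(values) for values in zip(*others)]
        result.append(pearson(item, rest_total))
    return result


# Hypothetical 5-point Likert answers: one list per item, one entry per respondent.
scale = [
    [4, 5, 3, 2, 4, 5, 1, 3],
    [4, 4, 3, 2, 5, 5, 2, 3],
    [5, 5, 2, 1, 4, 4, 1, 3],
]
alpha = cronbach_alpha(scale)
kept = [r >= 0.3 for r in item_total_correlations(scale)]
print(round(alpha, 3), kept)
```

An item whose entry in `kept` is `False` would be a candidate for removal under the 0,3 rule.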

Table 3-2 Evaluation regarding specific value

1 Cronbach’s Alpha (> 0,95) Duplication in measurement

2 Cronbach’s Alpha from 0,8 - 0,95 Excellent trustworthiness

4 Cronbach’s Alpha from 0,6 Accepted scale

- Bartlett's Test of Sphericity and the Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO) are used to evaluate the suitability of EFA. EFA is considered appropriate when 0,5 ≤ KMO ≤ 1 and the Bartlett test gives Sig < 0,05, in which case the hypothesis H0 that the variables are uncorrelated in the population is rejected

- The factor extraction standard includes Eigenvalues, which indicate the amount of variation explained by a factor, and the cumulative variance, which indicates how much of the total variance the extracted factors explain together. Factors with Eigenvalue > 1 are retained, and the solution is accepted when the cumulative variance is ≥ 50%

- Factor loadings express the relationship between variables and factors and are used to evaluate the significance of EFA

+ Factor loading > 0,3 is minimum accepted

+ Factor loading > 0,5 is considered to bear practical value

- Additionally, if a variable loads on several factors and the difference between its loadings is so small that it does not distinguish between the factors, that variable is removed. The remaining variables are grouped by the corresponding factors extracted from the pattern matrix. In contrast, if a variable loads on several factors with low loadings or small differences between loadings but makes a significant contribution to the content value of the concept being measured, the variable need not be discarded (Nguyen, 2014). The sample size of 163 allows EFA to be computed, followed by confirmatory factor analysis (CFA). When checking Cronbach's α, the author plans to retain scales with a Cronbach's α value of 0,6 or higher and to remove observed variables with an overall variable correlation of less than 0,3. The principal axis factoring method with Promax rotation is applied, and observed variables are eliminated if their factor loadings are less than or equal to 0,5 or if they load on several factors with a difference in loadings of less than or equal to 0,3
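The two loading rules stated at the end of the list (loading greater than 0,5 and a gap greater than 0,3 to the next-highest loading) can be expressed as a simple filter. This is an illustrative sketch only: the function name and the loading values are hypothetical, and the item codes are borrowed from the questionnaire purely as labels.

```python
def screen_items(loadings, min_loading=0.5, min_gap=0.3):
    """Return (kept, removed) item names according to the two loading rules."""
    kept, removed = [], []
    for name, row in loadings.items():
        ranked = sorted((abs(value) for value in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        # Keep the item only if it loads clearly on exactly one factor.
        if primary > min_loading and (primary - secondary) > min_gap:
            kept.append(name)
        else:
            removed.append(name)
    return kept, removed


# Hypothetical pattern-matrix rows (item -> loadings on three factors).
example = {
    "VT1": [0.82, 0.10, 0.05],  # clear primary loading
    "TK2": [0.55, 0.40, 0.02],  # cross-loading: gap of only 0.15
    "MT3": [0.45, 0.08, 0.11],  # primary loading too low
}
print(screen_items(example))  # (['VT1'], ['TK2', 'MT3'])
```

The exception described in the text, where a cross-loading item is retained for its content value, is a judgement call and is deliberately not automated here.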

3.3.4 Linear regression analysis in statistics

The linear regression analysis can be carried out through the following steps:

- Step 1: Verify the correlation between independent and dependent variables through the correlation coefficient matrix; a condition for regression analysis is that these two categories of variables are correlated. However, according to John and Benet (2000), discriminant validity between variables is secured when the correlation coefficient is < 0,85. In other words, if the correlation coefficient is greater than 0,85, multicollinearity (one independent variable being explained by another) may occur; thus, the role of that independent variable must be considered carefully

- Step 2: Establishing and scrutinising the regression model are accomplished in the following procedure:

$$Y = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \cdots + \beta_k X_k$$

+ Selection of variables for the model

+ The coefficient of determination (R²) assesses the fit of the model. However, R² never decreases as more independent variables are added, so a model with more independent variables is not guaranteed to fit the data set better. Accordingly, the adjusted R², which is not inflated by the number of variables added to the model, is used in place of the multiple regression R² to assess the suitability of the model

+ To appraise the appropriateness of the model and finalise an optimal one, the ANOVA analysis is used to test hypothesis H0. If the Sig value is < 0,05, hypothesis H0 is rejected. The rejection leads to the conclusion that the set of independent variables in the model is able to explain the variation in the dependent variable, which means that the built model is suitable for the data set and can therefore be used

+ Determine the coefficients of the regression equation. These partial regression coefficients measure the average change in the dependent variable when one independent variable changes by one unit while the other independent variables are held constant. However, since the magnitude of each coefficient depends on the measurement unit of its independent variable, comparing them directly is not meaningful. Therefore, to compare regression coefficients and clarify the importance (explanation level) of each independent variable to the dependent variable, the coefficients are expressed as standardised Beta weights

- Step 3: Inspect for violations of the regression assumptions

A regression model is appropriate for the overall study only if its assumptions are not violated. Therefore, once the regression equation is obtained, the following assumptions must be satisfied:

+ A linear relationship exists between the independent and dependent variables

+ The residuals follow a normal distribution

+ The variance of the error remains constant

+ No correlation between residuals (error independence)

+ No correlation between independent variables (no multicollinearity)

These assumptions are checked as follows:

+ The assumption of a linear relationship can be inspected with a scatterplot of the standardised residuals against the standardised predicted values

+ A histogram or a P-P plot of the residuals is used to check whether the residuals follow a normal distribution

+ The assumption of constant error variance can be checked with a scatterplot of residuals against predicted values or with Spearman's rank correlation coefficient test

+ The assumption that the residuals are uncorrelated can be checked with the Durbin-Watson statistic or a scatterplot of standardised residuals, whilst a Variance Inflation Factor (VIF) greater than 10 signals the presence of multicollinearity (Hoang & Chu, 2008)
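Three of the numerical checks named in the steps above (the 0,85 correlation threshold, the VIF and the Durbin-Watson statistic) can be sketched without any statistical package. The data and helper names are hypothetical. The VIF is computed from the R² obtained when one predictor is regressed on the others; with only two predictors, as assumed here, that R² equals the squared correlation between them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def vif(r_squared):
    """VIF = 1 / (1 - R^2) for one predictor regressed on the others."""
    return 1.0 / (1.0 - r_squared)


def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical scores of two independent variables.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
r = pearson(x1, x2)
print(round(r, 3), r > 0.85)          # correlation and the 0,85 check
print(round(vif(r ** 2), 2))          # VIF for the two-predictor case
print(durbin_watson([1, -1, 1, -1]))  # 3.0 for strictly alternating residuals
```

Durbin-Watson values near 2 suggest uncorrelated residuals; the alternating residuals above give 3.0, which points to negative autocorrelation.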

The tools used are the independent-samples T-test, analysis of variance (ANOVA), or the Kruskal-Wallis test. The independent-samples T-test is applied when a demographic factor has two attributes (gender, for example, with male and female), which divide the entire study sample into two separate groups.

Model operation

SVR is a version of SVM for regression problems and is effective for repeated forecasting, such as anticipating leak points in a WDS. Instead of finding a hyperplane that separates the data, SVR tries to find the hyperplane that results in the smallest acceptable error on the actual data points. Besides, SVR helps generalise from training data to unseen data, reducing the risk of overfitting
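The idea of an "acceptable error" is usually formalised in SVR through the ε-insensitive loss, which ignores deviations smaller than a tolerance ε. The thesis does not spell this out, so the sketch below is an illustration of the standard formulation rather than the author's implementation; the function name and the data are hypothetical.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon=0.1):
    """Mean loss that ignores errors inside the +/- epsilon tube."""
    losses = [max(0.0, abs(a - p) - epsilon) for a, p in zip(actual, predicted)]
    return sum(losses) / len(losses)


# Hypothetical targets and predictions.
actual = [1.0, 2.0, 3.0]
predicted = [1.05, 2.5, 3.0]

# Only the second prediction lies outside the tube of width 0.1.
print(epsilon_insensitive_loss(actual, predicted, epsilon=0.1))
```

A wider tube (larger `epsilon`) tolerates more error and typically gives a simpler model, which is one way SVR limits overfitting.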

Consolidating algorithms in research helps leverage the strengths of each method, ranging from performance and resistance to overfitting to interpretability, which provides a comprehensive and powerful approach to leakage forecasting. After being processed, the data are fed into the training model. Python has become one of the most widely used programming languages in ML model development thanks to several advantages:

- Different libraries: Python has rich libraries and frameworks for ML such as TensorFlow, PyTorch, Scikit-learn, Pandas, NumPy, and Matplotlib. These libraries provide powerful and easy-to-use tools for data processing, deployment, model building, and evaluation

- Huge and supportive community: The language has a community ranging from beginners to top experts in the field, whose support includes numerous learning resources, tutorials, and discussion forums for problem solving and knowledge sharing

- Uncomplicated and accessible: Python is known for its lean and approachable syntax, which lowers the barrier for beginners by reducing the complexity of building ML models

- Flexible and versatile: Python can be integrated with other systems and programming languages, which enhances its flexibility in serving both scientific work and practical development

- Good performance: Python is not the fastest programming language; nonetheless, it provides adequate performance for most machine learning demands. For tasks that require higher performance, Python can use libraries written in C/C++ or be integrated with other languages

The combination of simplicity, strong ecosystem, and large community turns Python into an ideal solution for both academic and pragmatic applications in ML and data science.

Model assessment

Interpreting an ML model usually requires several criteria to gain insight into its performance; three popular indices are root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²)

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^2}$$

RMSE is the square root of the mean squared error (MSE). The error here is the difference between the predicted value (yi’) and the actual value (yi). Because RMSE squares each error, errors that are larger in absolute value have a greater impact on its value. This makes RMSE sensitive to outliers and large errors. RMSE is commonly used for predictive models with continuous data and is a crucial metric for evaluating model quality. A low RMSE indicates that the error between the predicted and actual values is small; the lower the RMSE, the better the prediction quality of the model

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y_i'\right|$$

MAE is the average of the absolute values of the errors; it measures the difference between the predicted and actual values. Unlike RMSE, MAE weights all errors in proportion to their size because it does not square them before averaging, which makes MAE less sensitive to outliers or very large errors than RMSE. Additionally, the MAE value is easy to understand and interpret, as it is simply the average magnitude of deviation. The units of MAE are identical to those of the original data, making interpretation intuitive. A low MAE value indicates that the model has higher prediction accuracy. Nonetheless, because it does not distinguish between error types, a model with a low MAE can still have very biased individual predictions. RMSE and MAE are both error metrics, but RMSE is more sensitive to large errors

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - y_i'\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where R² is the proportion of variation in the dependent variable explained by the model, and $\bar{y}$ is the mean of the actual values yi. The higher the R², the better the model explains the variation in the data. An R² value close to 1 indicates that the model has high predictive accuracy. However, R² can be misleading if the model is overfitting
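The three indices can be computed directly from their definitions. The following is a minimal sketch with hypothetical values; the function name is illustrative and no ML library is assumed.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return (RMSE, MAE, R2) for paired actual and predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e ** 2 for e in errors)               # residual sum of squares
    rmse = sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical actual and predicted values.
rmse, mae, r2 = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(round(rmse, 4), mae, round(r2, 4))  # 0.7071 0.5 0.9
```

In this example RMSE (0.7071) exceeds MAE (0.5) because the two non-zero errors are squared before averaging, which is the sensitivity to larger errors described above.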

CHAPTER 4 DATA PROCESSING

Surveying

The field of infrastructure construction undergoes continuous updates over time, reflecting the evolution of technology together with technical demands and legal regulations from the state. The reasons why construction projects fall behind schedule differ from nation to nation. For example, Motaleb and Kishk (2013) show in their study that time overrun ranks first among delay-related issues arising from numerous causes. In this study, the author synthesises influencing factors from scientific works and finalises the contributing factors based on the opinions of experts in the water supply industry in HCMC, in order to understand the roots of delays in this municipality more thoroughly. Some variables are suggested by the experts on the basis of their experience with delayed projects. These factors are classified into 8 groups and evaluated through a survey process. The factors are shown in Table 4-1:

Table 4-1 Factors affecting the delay of RRUWPP in HCMC

1.1 Shortage of supplies in market VT1 [1], [3], [14], [20], [28]

1.3 Damage of sorted supplies VT3 [5], [14]

1.4 Procurement process of supplies prior to installation VT4 [5], [6], Personal experience

II Design related factors TK

2.1 Delay of design submission TK1 [10], [28], Experts

2.2 Design change during construction TK2 [6], [14], [24], [28], [29],

2.3 Unclear/Defective design details TK3 [1], [3], [15], [20], [26], [28],

2.4 Unrealistic project scheduling and estimation TK4 [3], [10], [15], [19], [20],

3.4 Accidents or unexpected events during construction MT4 [14], [17], Expert

IV Finance related factors TC

4.1 (Annual) budgeting from investor TC1 [7], [17], [30], [31], [32],

4.3 Slow payment progress from investor TC3 [1], [6], [7], [13], [14], [17],

[29], Personal experience 5.2 Shortage of labour NL2 [12], [14], [18], [21], [25]

5.3 Low productivity human resources NL3 [1], [12], [14], [17], [18],

VI Equipment related factors TB

6.1 Shortage of essential equipment TB1 [1], [7], [14], [18], [25]

Equipment differs from bidding documents

VII Regulation related factors PL

Complicated and heavy procedures (such as from local authorities, investor, and consultants)

7.2 Lack of specialised planning by government entities PL2 [4], Personal experience,

7.3 Changes in governmental laws and regulations PL3 [3], [10], [14], [21]

7.4 Conflicts/Dispute related to contracts/projects PL4 [3], [7], [14], [15]

8.2 Hand-over process for construction K2 [8], [26]

8.3 Lack of communication among stakeholders K3 [7], [18], [22], [33], Personal experience

8.4 High pressure of current network at project site K4 [6], [8], [12], [18], [24], [26]

8.5 Diameter of pipelines installed K5 [5], [20], Experts

8.6 Insufficient supervision (from any of stakeholders) K6 [6], [19], Experts

IX Schedule overrun of the project due to surveyed factors TD

9.1 Delay less than 3 months TD1 Experts

9.2 Delay from 3 months to 6 months TD2 Experts

9.3 Delay from 6 months to 1 year TD3 Experts

9.4 Delay more than 1 year TD4 Experts

The variables observed in this study are all evaluated on a 5-point Likert scale that surveys respondents’ opinions from strongly disagree to strongly agree (from 1 to 5). To classify survey participants, the author applies a nominal scale that clearly distinguishes between them. The target interviewees of the research are professionals, state agency specialists, engineers, and individuals with experience working in the water supply industry. In this survey, there are 31 observed variables; hence, the minimum number of qualified responses must be 155 (ratio of 5 to 1). The author delivers 190 sheets via an online platform and in person and retains 163 after eliminating imperfect responses, which still meets the minimum requirement. Responses are disqualified because of missing selections or information, multiple answers to a single question, and other reasons such as unsuitable respondents or the same selection for all questions. The frequency of interviewees joining the survey can be seen in APPENDIX 2: SPSS STATISTICAL RESULT

Figure 4-1 The percentage of male and female interviewee

Of the 163 valid responses, 29,5% of respondents are female and 70,5% are male (see Figure 4-1). Approximately 73,6% are younger than 35 years old, while the age groups between 35 and 45 years old and between 45 and 60 years old account for 19,0% and 7,4% respectively (see Figure 4-2)

Figure 4-2 Age group of participants

[Figure 4-3 legend: High school; College/Vocation; Undergraduate; Postgraduate]

Figure 4-3 Pie chart illustrates the information of education

Regarding education (Figure 4-3), all interviewed individuals have at least graduated from high school and therefore have the qualifications and awareness needed for the field under study. Roughly 80% of participants have a college or vocational qualification or higher, which supports the quality of the survey responses

In terms of experience, Figure 4-4 illustrates the number of years the interviewees have worked in the surveyed field. Overall, the largest group, nearly three-tenths of respondents, has less than 3 years of experience, whilst the smallest, 17,2%, has more than 10 years. Despite its small size, this group is retained because its opinions and perspectives are highly valuable. The two middle groups, 3 - 5 years and 5 - 10 years, together account for more than half of the 163 participants

[Figure 4-4 legend: Less than 3 years; From 3 - 5 years; From 5 - 10 years; Above 10 years]

In the pie chart of position (see Figure 4-5), the distribution is uneven: staff positions account for an overwhelming 73,0%, ranking first, and team leaders come second with roughly a quarter. The remaining category, Director, accounts for 3,1%, indicating the scarcity of the title among the groups studied; nevertheless, the opinions of these experts are very important thanks to their long professional exposure and deep knowledge of the water supply industry.

Cronbach’s alpha

- The supplies related factors (VT) scale achieves an acceptable Cronbach's α coefficient of 0,882; additionally, eliminating any item in the group decreases the value. All variables are retained, as the overall variable correlation coefficient is more than 0,3, which supports the reliability of the scale

Table 4-2 Cronbach’s alpha for group of supplies-related factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Supplies related factors, Cronbach’s α = 0,882

1.1 Shortage of supplies in market VT1 11,82 8,518 ,779 ,835

1.3 Damage of sorted supplies VT3 11,80 8,212 ,734 ,853

Procurement process of supplies prior to installation

- The design related factors (TK) scale has an acceptable Cronbach's α coefficient of 0,885; all variables are retained, as the overall variable correlation coefficient is higher than 0,3, which supports the reliability of the scale

Table 4-3 Cronbach’s alpha for group of design-related factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Design related factors, Cronbach’s α = 0,885

2.1 Delay of design submission TK1 11,72 9,090 ,751 ,853

2.3 Unclear/Defective design details TK3 11,65 9,945 ,731 ,860

Unrealistic project scheduling and estimation

- The external factors scale has a Cronbach's α coefficient of 0,848, and the overall variable correlation coefficient is greater than 0,3, which supports the reliability of the scale

Table 4-4 Cronbach’s alpha for group of external factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

External factors, Cronbach’s α = 0,848

Accidents or unexpected events during construction

- For the finance related factors, the Cronbach’s α is 0,757 and the item-total correlations are greater than 0,3

Table 4-5 Cronbach’s alpha for group of finance-related factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Finance related factors, Cronbach’s α = 0,757

4.1 (Annual) budgeting from investor TC1 7,73 3,519 ,550 ,716

Slow payment progress from investor

- For the labour related factors, the Cronbach’s α is 0,781 and the item-total correlations are larger than 0,3

Table 4-6 Cronbach’s alpha for group of labour-related factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Labour related factors, Cronbach’s α = 0,781

5.3 Low productivity human resources NL3 7,90 3,521 ,604 ,720

- For the equipment related factors, the Cronbach’s α is 0,800 and the item-total correlations are greater than 0,3

Table 4-7 Cronbach’s alpha for group of equipment-related factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Equipment related factors, Cronbach’s α = 0,800

Equipment differs from bidding documents

- For the regulation related factors, the Cronbach’s α is 0,859 and the item-total correlations are greater than 0,3

Table 4-8 Cronbach’s alpha for group of regulation-related factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Regulation related factors, Cronbach’s α = 0,859

Complicated and heavy procedures (such as from local authorities, investor, and consultants)

Lack of specialised planning by government entities

Changes in governmental laws and regulations

Conflicts/Dispute regarding contracts/projects

- For the other factors group, the Cronbach’s α is 0,891 and the item-total correlations are greater than 0,3

Table 4-9 Cronbach’s alpha for group of other factors

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Other factors group, Cronbach’s α = 0,891

Hand-over process for construction

Lack of communication among stakeholders

High pressure of current network at project site

- For the dependent variable group of schedule overrun, the Cronbach’s α is 0,903 and the item-total correlations are higher than 0,3

Table 4-10 Cronbach’s alpha for dependent variable of overrun schedule

Scale mean if item deleted

Scale variance if item deleted

Cronbach’s α if item deleted

Schedule overrun of the project (TD), Cronbach’s α = 0,903

9.1 Delay less than 3 months TD1 11,43 10,271 ,780 ,876

Delay from 3 months to 6 months

9.3 Delay from 6 months to 1 year TD3 11,17 10,835 ,793 ,871

Factor analysis

Factor analysis is often exploited to check the one-dimensionality of measurement scales during the process of constructing scales to measure various aspects of a research concept (Hoang & Chu, 2008) ; hence, the activity not only helps reduce a huge quantity to a relatively small number of variables but also supports to determine the cohesion or reliability of variables on the same scale Before conducting the study, the expectation is that 31 independent variables are concentrated in 8 different groups and the determination, as like other statistical analysis methods, is if the method is appropriate This check is performed by calculating Bartlett’s Test of Sphericity and Kaiser-Meyer-Olkin Measure of Sampling Adequacy In this case, the values of KMO are quite large, reaching 0,741 and Sig of Bartlett's test is less than 1/1000, which shows that these 31 independent variables are correlated with each other and are fully suitable for factor analysis

Table 4-11 KMO and Bartlett’s Test of 31 independent variables

Kaiser-Meyer-Olkin Measure of Sampling

In this study, principal component analysis is used for extraction with an Eigenvalue threshold of 1, which means that only components whose Eigenvalue is larger than 1 are kept in the model. The result table demonstrates that 8 components extracted from the 31 surveyed variables possess an Eigenvalue of more than 1 and are therefore retained. The EFA output shows that the Eigenvalue reported for the 8 extracted factors is 4,178 and the cumulative variance is 71,590%, while the KMO measure of sampling adequacy is 0,741; as a result, factor analysis is suitable, and the variance extracted (AVE) is acceptable because it exceeds 50%. With Principal Component Analysis as the extraction method and Varimax with Kaiser Normalisation as the rotation method, the variables converge to 8 main components.
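The retention rule described here (keep components whose Eigenvalue exceeds 1 and check the cumulative variance) can be illustrated with a short NumPy sketch. The data are synthetic with an assumed two-factor structure, so the numbers differ from Table 4-12; only the procedure is the point.

```python
import numpy as np

# Synthetic survey (assumption): 300 respondents, 6 items, 2 latent factors
rng = np.random.default_rng(1)
factors = rng.normal(size=(300, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.85, 0.0],
                     [0.0, 0.9], [0.0, 0.8], [0.0, 0.85]])
data = factors @ loadings.T + 0.5 * rng.normal(size=(300, 6))

# Principal components of the correlation matrix, largest Eigenvalue first
corr = np.corrcoef(data, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
explained_pct = eigenvalues / eigenvalues.sum() * 100

# Kaiser criterion: retain components with Eigenvalue > 1
retained = eigenvalues > 1.0
n_components = int(retained.sum())
cumulative_pct = explained_pct[retained].sum()

print("Eigenvalues:", np.round(eigenvalues, 3))
print(f"components retained: {n_components}")
print(f"cumulative variance of retained components: {cumulative_pct:.2f}%")
```

Because the Eigenvalues of a correlation matrix sum to the number of variables, each retained component explains more variance than a single original variable would.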

Table 4-12 Eigenvalue of the 8 components

Initial Eigenvalues** Extraction Sums of

Rotation Sums of Squared Loadings

**: The table shows only components whose Eigenvalue is higher than 1

Additionally, to ensure the reliability and cohesion of the factors, the dependent variable must also be examined; in the survey design, this variable captures the impact of the studied risks on RRUWPP. The EFA of the dependent variable “Progress” yields a single component with an Eigenvalue of 3,104, a cumulative variance of 77,595% and a KMO of 0,841; the analysis is therefore suitable, because the cumulative variance exceeds the required 50% of variance extracted.

Table 4-13 KMO and Bartlett’s Test of the dependent variable

Kaiser-Meyer-Olkin Measure of Sampling

Table 4-14 Eigenvalue of the dependent variable

Initial Eigenvalues** Extraction Sums of

Rotation Sums of Squared Loadings

**: The table shows only components whose Eigenvalue is higher than 1

Table 4-15 Analysis of item “Progress”

The calculation is conducted to verify the cohesion and reliability of independent variables and the result is as follows:

Table 4-16 Conducted calculation of independent variables

No Group factor KMO Eigenvalue AVE

Factors convergence

Moreover, the KMO value of 0,741 (74,10%; see Table 4-11) suggests that the share of common variance among the variables is fairly high for factor analysis. In the unrotated component matrix, the factor loading coefficients do not make clear which variables explain which factor, so the factors must be rotated. The rotation method chosen is the Varimax procedure, which heightens the explanatory power of the factors by rotating them to minimise the number of variables with large coefficients on the same factor. After rotation, variables whose factor loadings exceed 0,5 are retained to explain a particular factor.

Table 4-17 Rotated Component Matrix of EFA analysis

Linear regression analysis

Table 4-18 depicts the correlations among the independent variables. Taking variable PL as an example, its correlation with the supplies variable (VT) is the smallest at 0,270, while its correlation with the design variable (TK) is the highest at 0,581.

PL VT TK MT TC NL TB K

** Correlation is significant at the 0,01 level (2-tailed)

* Correlation is significant at the 0,05 level (2-tailed)

4.5.2 Inspection for Multicollinearity

The highest VIF value is 1,374, which indicates that the independent variables are not strongly related to each other; the data therefore show no sign of multicollinearity (see Table 4-21 Outcome of linear regression). The checks of the linear regression assumptions found no violations, so the results of the regression model can be considered reliable.
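For readers who want to reproduce this kind of check, the sketch below computes variance inflation factors from the correlation matrix of the predictors, using the identity that the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix. The data are synthetic and the function name is an assumption for illustration.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(2)

# Three unrelated predictors: VIF values stay close to 1
independent = rng.normal(size=(500, 3))
vif_independent = variance_inflation_factors(independent)

# Adding a near-copy of the first predictor inflates the VIF sharply
near_copy = independent[:, 0] + 0.1 * rng.normal(size=500)
vif_collinear = variance_inflation_factors(
    np.column_stack([independent, near_copy])
)

print("independent predictors:", np.round(vif_independent, 3))
print("with a near-duplicate: ", np.round(vif_collinear, 1))
```

A maximum VIF of 1,374, as reported above, lies far below the thresholds of 5 or 10 that are commonly used to flag multicollinearity.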

Table 4-19 Model summary for regression coefficient inspection

Model R R² Adjusted R² Standard error of the estimate

In which: a Predictors: (Constant), TB, MT, VT, TC, NL, TK, PL, K b Dependent Variable: TD

To assess whether the model is suitable, the coefficient of determination R² and the adjusted R² are calculated. The linear regression output gives R² = 0,600 and adjusted R² = 0,579. The adjusted R² means that 57,9% of the variation in the variable ‘progress’ is explained by the eight independent variables above.

Table 4-20 Analysis of variance ANOVA in regression investigation

Squares df Mean Square F Sig

In which: a Dependent Variable: TD b Predictors: (Constant), TB, MT, VT, TC, NL, TK, PL, K

The ANOVA test (see Table 4-20) shows that the F statistic is significant at Sig = 0,000, so the regression model fits the surveyed response data and can be considered trustworthy and of practical value. In addition, the output in Table 4-21 illustrates that all 8 factor groups of the proposed research, obtained from random sampling, influence the progress of RRUWPP in HCMC, since every Sig value is less than 0,05; the ranking of each factor group's impact on RRUWPP is therefore determined from the standardised regression coefficients. Since all independent variables are measured using a Likert scale (bearing the same units), the regression equation also shows the importance of each factor to the mentioned progress. Among them, finance related factors (TC) have the greatest influence, followed by design (TK) and the remaining risks. The regression equation is therefore: TD = 0,305 TC + 0,278 TK + 0,211 TB + 0,190 NL + 0,150 MT + 0,135 VT + 0,133 PL + 0,130 K
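As a small worked example, the standardised coefficients of the regression equation can be ranked and applied in a few lines of Python. The coefficients are copied from the equation above; the helper `predict_td` and the sample input are hypothetical and assume standardised (z-score) inputs.

```python
# Standardised coefficients taken from the regression equation in the text
betas = {
    "TC": 0.305, "TK": 0.278, "TB": 0.211, "NL": 0.190,
    "MT": 0.150, "VT": 0.135, "PL": 0.133, "K": 0.130,
}

# Rank factor groups from strongest to weakest influence on progress (TD)
ranking = sorted(betas, key=betas.get, reverse=True)

# Relative share of each coefficient in the sum of all coefficients
total = sum(betas.values())
shares = {name: betas[name] / total for name in ranking}


def predict_td(z_scores):
    """Predicted standardised TD for a dict of standardised factor scores."""
    return sum(betas[name] * z_scores.get(name, 0.0) for name in betas)


for name in ranking:
    print(f"{name}: beta = {betas[name]:.3f}, share = {shares[name]:.1%}")

# Hypothetical case: finance score one standard deviation above the mean
print(round(predict_td({"TC": 1.0}), 3))
```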

Table 4-21 Outcome of linear regression

Table 4-21 shows that all 8 factor groups affect the progress of RRUWPP, based on the regression coefficients and significance levels. Hypothesis H1, “Supplies related factors affect the schedule of RRUWPP”, is supported: the regression coefficient is β1 = 0,135 at Sig = 0,027. The remaining 7 hypotheses are also accepted, with coefficients as in the regression equation: H2 for design related factors with β2 = 0,278 and Sig = 0,000; H3 for external factors with β3 = 0,150 and Sig = 0,016; H4 for finance related risks with β4 = 0,305 and Sig = 0,000; H5 for labour related risks with β5 = 0,190 and Sig = 0,004; H6 for the equipment related group with β6 = 0,211 and Sig = 0,001; H7 for regulation related factors with β7 = 0,133; and H8 for other risks with β8 = 0,130, both with Sig below 0,05. As finance related factors have the strongest impact on the progress of RRUWPP, this group deserves a comprehensive understanding; in practice, water supply enterprises must be well prepared and organised for future projects and must allocate resources adequately and efficiently to projects of considerable necessity.

Table 4-22 Group statistics based on gender

Gender N Mean Std. Deviation Std. Error Mean

Table 4-23 ANOVA test for difference between gender and progress

Levene's Test for Equality of Variances

Table 4-23 shows that gender does not influence the evaluation of RRUWPP progress (Sig = 0,110 > 0,05), and Table 4-24 demonstrates that no difference occurs across Position, Experience and Gender (Sig values are higher than 0,05), whereas differences do occur across Education and Age.

Table 4-24 ANOVA test for difference

Sum of Squares df Mean

Model training

The data are based on an actual WSN managed by a water supply company in HCMC. The dataset comprises 10 categories related to the WDS, some of which are provided directly while others are processed from ArcGIS’ actual database via the modelling software WaterGEMS. In this study, 10 factors affecting the possibility of water leakage, associated with roughly 32.940 junctions in about 80 district metered areas, are collected from various sources, as shown in Table 4-25, which lists the main causes of water leakage in the current WSN used to build a matrix of relevant size (Nguyễn et al., 2022).

Table 4-25 Factors that affect the risk of leakage

No Factors Abbr Unit Symbol

7 Mean population which a junction provides C People X7

10 Leaked quantity (dependent variable) A m³/day Y

WaterGEMS, developed by Bentley, is a software application for hydraulic modelling and water quality simulation in WDSs, with advanced interactive capabilities, geospatial model building and integrated management tools. It provides an approachable working environment that allows users to analyse, design and optimise a WSN, and it assists users in tasks such as localising leaks, performing hydraulic calculations and forecasting network behaviour under different scenarios. The software is used in this study to support leakage prediction with SVM because actual field data are difficult to gather for several reasons. The dataset, a matrix of 9 independent variables (columns) and 32.940 rows (junctions), is standardised and then divided randomly into a training set and a test set at an 80 to 20 ratio. Since the output to be predicted is continuous, the machine learning model chosen is support vector regression, implemented in Python using JupyterLab.
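A minimal sketch of this workflow with scikit-learn is given below. It is an illustration under stated assumptions, not the thesis's notebook: the data are synthetic (1.000 junctions instead of 32.940), the hyperparameters are arbitrary, and the scaler is fitted inside a pipeline on the training part only, which differs slightly from standardising the whole matrix before the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the junction matrix (assumption): 9 features X1..X9
rng = np.random.default_rng(42)
n_junctions, n_features = 1000, 9
X = rng.normal(size=(n_junctions, n_features))
true_weights = rng.uniform(0.1, 1.0, size=n_features)
y = X @ true_weights + 0.1 * rng.normal(size=n_junctions)  # leaked quantity Y

# Random 80/20 split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardise the features, then fit support vector regression
# (kernel, C and epsilon are illustrative choices)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)

r2_test = model.score(X_test, y_test)  # coefficient of determination R²
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"R² on the test set: {r2_test:.3f}")
```

Fitting the scaler inside the pipeline keeps test-set statistics out of the training step, which is a common precaution against information leakage.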

Figure 4-6 The distribution system in an area in Ho Chi Minh City (legend: transmission pipelines, distribution pipelines)

A correlation matrix is square (the number of rows equals the number of columns), symmetric (the matrix equals its transpose) and positive semidefinite (all its eigenvalues are non-negative), with all elements on the principal diagonal equal to 1. The correlation matrix is a useful statistical tool for understanding and summarising a data set, and it can help select the data and features that will be most impactful. Each entry is a coefficient between -1 and 1 that measures the linear relationship between two random variables X and Y, where:

- Minus 1 (-1) means that the 2 variables have a perfect inverse linear relationship: when X increases, Y decreases

- 0 means no linear correlation between X and Y

- 1 means that the 2 variables have a perfect positive linear relationship: when X increases, Y increases too

The r coefficient can be determined in several steps. With 9 independent variables, the process must be carried out 9 times to obtain the correlation coefficients; because the data set contains a large amount of information, the calculation is automated in JupyterLab using Python.

- Step 1: Rename the variables to “x” and “y” and find the sums of x and y

- Step 2: Calculate x² and y², creating two new columns that contain the squares of x and y, and take the sums of these columns

- Step 3: Calculate the cross product by multiplying x and y together, and take its sum

- Step 4: Determine the value of the r coefficient through the formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}$$

where n is the number of data points.
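The steps above can be sketched in Python as follows. The function name `pearson_r` and the sample data are assumptions for illustration; the result is compared against NumPy's built-in `np.corrcoef` as a cross-check.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient computed from running sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size

    # Step 1: sums of x and y
    sum_x, sum_y = x.sum(), y.sum()
    # Step 2: sums of the squared columns
    sum_x2, sum_y2 = (x ** 2).sum(), (y ** 2).sum()
    # Step 3: sum of the cross products
    sum_xy = (x * y).sum()
    # Step 4: combine the sums into r
    numerator = n * sum_xy - sum_x * sum_y
    denominator = np.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical sample: one feature column against the leaked quantity
rng = np.random.default_rng(3)
feature = rng.normal(size=200)
leakage = 0.8 * feature + 0.5 * rng.normal(size=200)

r_manual = pearson_r(feature, leakage)
r_numpy = np.corrcoef(feature, leakage)[0, 1]
print(f"manual r = {r_manual:.6f}, numpy r = {r_numpy:.6f}")
```

In practice a single call such as `np.corrcoef` or the pandas method `DataFrame.corr` produces the whole matrix at once; the manual version only makes the intermediate sums visible.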

Figure 4-7 Correlation Matrix among variables

The correlation matrix, displayed as a heat map in Figure 4-7, offers insight into the relationships among variables G, H, I, K, F, B, C, D, E and A, corresponding to the criteria given in Table 4-25; each cell shows the correlation coefficient between the variables in that row and column. More specifically, correlation coefficients close to 1 or -1 indicate a strong positive or negative correlation, respectively, while those close to 0 indicate little or no linear relationship. Many pairs have high coefficients ranging from 0,85 to 1,0, which indicates that when one variable changes, the other is very likely to change with it; in contrast, low coefficients, such as for the pairs A - E, G - D and H - D, mean that the two variables depend only weakly on each other. Low correlations may bring some benefits:

Firstly, low correlation among the independent variables helps reduce the risk of overfitting, which occurs when a model learns noisy patterns or unrepresentative features in the training data and therefore performs poorly on new data. Moreover, low correlation between variables helps prevent multicollinearity, a situation where the independent variables are highly correlated with each other, making it difficult to determine the separate effect of each variable on the target variable. The independence of features can be clarified by analysing the correlation coefficients between variables, and it can be advantageous because each feature then provides unique information independent of the other features.

RESULT

Calculation results of Support Vector Regression model

Theoretically, support vector regression (SVR) is inspired by the support vector machine algorithm for binary response variables. SVM analysis is a popular machine learning tool for classification and regression; moreover, SVM regression is considered a nonparametric technique because it relies on kernel functions. The main idea of the algorithm is to ignore residuals smaller in absolute value than some constant (commonly denoted ε). As in binary classification, two sets of points are defined: those falling inside the tolerance band around the predicted function, which are not penalised, and those falling outside, which are penalised according to their distance from the predicted function, in a way similar to the penalisation used by SVMs in classification. The idea behind SVR is thus very similar to that of SVM: finding a well-fitting hyperplane in a kernel-induced feature space that generalises well. The process of model training can be summarised as follows:
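The penalisation rule described above is usually called the ε-insensitive loss. The short sketch below shows it in isolation; the function name, the sample values and the choice ε = 0,1 are assumptions for illustration.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the band of half-width epsilon, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float)
                      - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


# Hypothetical actual and predicted leakage values
actual = [1.0, 1.0, 1.0]
predicted = [1.05, 1.30, 0.50]

losses = epsilon_insensitive_loss(actual, predicted, epsilon=0.1)
print(losses)  # first point lies inside the band, so its loss is zero
```

Only the points with a non-zero loss, together with those on the edge of the band, shape the fitted function; these are the support vectors.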

Figure 5-1 Process of model training and principle of SVR (stages: data pre-processing, model training, model evaluation; kernels considered: linear, polynomial, sigmoid, RBF)

After the algorithm is run, the results are summarised in Table 5-1. The support vector regression model achieves an R-squared (R²) of approximately 0,926 and an RMSE of nearly 0,11, which means that the support vector-based regression, using the set of input data to model the correlation and impact of factors affecting water network leakage, achieves high efficiency. Both MSE and RMSE are sensitive to large errors because the residuals are squared before they are averaged, which gives a disproportionately large weight to large errors (outliers); MSE tends to be the more sensitive of the two, since it stays in squared units while RMSE takes the square root of the average. This can skew the overall error metric if the data have many outliers or are highly variable. MAE, by taking the absolute value of the errors, treats all deviations from the true values equally, providing a more robust error metric in such scenarios. Other algorithms, such as Random Forest Regression, Extreme Gradient Boosting Regression and Light Gradient Boosting Regression, can also be run for comparison to identify the optimal solution; moreover, the results shown in Table 5-1 will fluctuate if the division of the data into the two above-mentioned datasets changes, e.g. to a 70 to 30 ratio. More importantly, given the importance of leak prediction, the Akaike Information Criterion and Bayesian Information Criterion are adopted in some SVR models to double-check the predictive capabilities of a model. Additionally, the results of model training may be affected by the quality of the raw input data, the model itself and sometimes the target and predictor variables.

Table 5-1 Results of SVR model

Model MSE MAE RMSE R-squared

Linear Regression 0,000016 0,003293 0,004050 0,999892

Support Vector Regression 0,011306 0,085139 0,106330 0,925584
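The four metrics reported in Table 5-1 can be computed as in the sketch below. The function name and the small example arrays are assumptions for illustration; the last line only checks that the reported SVR values are internally consistent, since RMSE is the square root of MSE.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(mse)
    ss_res = np.sum(errors ** 2)                      # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted values
metrics = regression_metrics([3, 5, 7, 9], [2, 5, 8, 9])
print(metrics)

# Consistency check on Table 5-1 (SVR row): sqrt(MSE) should match RMSE
print(round(float(np.sqrt(0.011306)), 6))
```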

Analysis between two datasets in kernel function

Although the Mean Absolute Error and Root Mean Squared Error are not high, a slight difference between the predicted and actual values remains; moreover, the high value of R-squared (R²) indicates that the model explains a large portion of the variation in the actual leakage. According to the analysis, the SVR model produces reliable predictions with good accuracy, despite minor imperfections; fine-tuning the model and checking the quality and completeness of the input data therefore remain important. Specifically, modest differences between the training and test sets for each kernel function are shown in Table 5-2. The variation in accuracy between the two data sets shows that the Polynomial and Radial Basis Function kernels obtain higher accuracy on the training set, while the opposite trend can be detected for the Linear kernel. For further analysis, several charts, such as boxplots and scatterplots, can be built to study whether a strong linear relationship between the actual and predicted leakages in the network holds, and to describe the distribution of data points in order to assess the model’s ability to predict markedly leaky WDSs accurately.
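A comparison of kernels on the training and test sets, in the spirit of Table 5-2, can be sketched with scikit-learn as follows. The data are synthetic and all hyperparameters are library defaults, so the numbers will not match the thesis; the structure of the loop is the point.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumption): 9 features with a mildly non-linear target
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 9))
weights = rng.uniform(0.2, 1.0, size=9)
y = X @ weights + 0.3 * X[:, 0] ** 2 + 0.1 * rng.normal(size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)
parts = {"train": (X_train, y_train), "test": (X_test, y_test)}

results = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = make_pipeline(StandardScaler(), SVR(kernel=kernel))
    model.fit(X_train, y_train)
    results[kernel] = {}
    for name, (X_part, y_part) in parts.items():
        pred = model.predict(X_part)
        results[kernel][name] = {
            "RMSE": float(np.sqrt(mean_squared_error(y_part, pred))),
            "MAE": float(mean_absolute_error(y_part, pred)),
            "R2": float(r2_score(y_part, pred)),
        }

for kernel, row in results.items():
    print(kernel, {k: round(v["R2"], 3) for k, v in row.items()})
```

A training score that is clearly higher than the test score for a given kernel is a sign of overfitting, which is the pattern the paragraph above describes for the Polynomial and RBF kernels.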

Table 5-2 Difference of kernel function in two sets

Function RMSE MAE R² RMSE MAE R²

Figure 5-2 Predict Model with testing dataset

Figure 5-3 Predict Model with training dataset

The model also helps to identify, analyse, evaluate and describe the impact of factors affecting the leakage-causing risks on an anonymised water supply network through 9 key influencing factors. Although the results of predicting the number of leakage points in the water supply network in this research have not yet reached a high level, they show that machine learning models have great potential in various fields. Firstly, they can help identify leaks, prioritise investments in upgrades and repairs, and manage water loss effectively. Moreover, the forecast results can be useful information to support water supply companies in monitoring, management, operation and quality improvement of water supply services.

CONCLUSION

Conclusion

The survey on factors affecting the implementation time of RRUWPP yields inferential statistical results from data that were carefully screened and coded before processing and analysis. Cronbach's alpha coefficients and factor analysis confirm that the eight factors in the original scale are reliable in measuring the factors affecting RRUWPP; moreover, linear regression analysis with the Ordinary Least Squares method is performed to obtain a linear regression equation as well as the strength of each factor's influence on progress. The output shows that finance related factors have the greatest impact on the progress of RRUWPP, which encourages stakeholders to be well prepared: the investor should concentrate capital solely on projects in need while avoiding mass investment, and contractors should participate only in projects that match their financial capacity.

Based on the analysis results in Chapter 5, the Support Vector Regression model shows high prediction accuracy and good generalisation on both the training and test sets, as evidenced by evaluation indicators such as MSE, RMSE, MAE and R². The differences between the predicted and actual values, especially in the training set, are moderate in the overall context, while the variability of predictions and the number of notable outliers in the training set require deeper examination. In summary, for proper management of RRUWPP, timely and adequate investment in pipelines with a high risk of leakage, regardless of their age, can be an optimal solution when guided by leakage forecasting or detection based on a model that integrates the above SVM model with hypothetical future scenarios from the WaterGEMS hydraulic model.

Values of research

Theoretically, the dissertation results help demonstrate the appropriateness of applying statistical methods and machine learning in project management and water supply networks to allocate resources appropriately to repairing, renovating and upgrading water pipeline projects (RRUWPP) in Ho Chi Minh City. More specifically, this study built a model to predict potential leakage points in the water distribution system from actual data sources processed by hydraulic simulation software, which is expected to enhance the operational capabilities of water supply companies and save their resources.

The thesis contributes to the field of water supply by forecasting leakages in the water distribution system with an advanced machine learning technique, SVR, demonstrating that the model can generate accurate and reliable predictions as a foundation for better and more efficient financial decision-making and risk management. The ML-oriented approach in the study is expected to help investors and clients arrange capital funds for social service projects more effectively. Furthermore, this study provides insight into the applicability of machine learning in complex data analysis, which is invaluable in today’s era of big data.

Study’s limitation

Despite the encouraging outcome of the model training, some limitations remain. For the quantitative survey, the sampling method has certain drawbacks, so further surveys may be required for deeper understanding, and the data collection is sometimes affected by carelessness and bias from respondents. From the ML perspective, the model is trained and tested on specific datasets that are not representative of all cases, for reasons such as the limited willingness to provide actual numbers (instead of simulated ones) and the difficulty of accessing different types of parameters to orient the training. In practice, the variety of objective causes limits the generalisation ability of the model, and real and sufficient data always require careful and accurate recording. Thus, a model evaluation that concentrates on simulated or reported numbers while ignoring actual ones can be inadequate. Secondly, outliers and noise are obstacles because the model does not handle them well, which results in prediction errors. Furthermore, the research methodology is not diversified: a single machine learning method is applied without comparing its performance to other techniques or combining multiple models to increase accuracy.

The above-mentioned limitations open opportunities for further observation and analysis to broaden the horizon in this field and to advance the hands-on application of machine learning-based prediction models.

Future perspectives

To improve the performance and accuracy of the SVR model, several study paths are suggested. Firstly, hyperparameter tuning and cross-validation can be employed to identify the best set of parameters, together with a careful review of the quality and completeness of the input data and model improvement methods such as finding and processing outliers and conflicts in the data. Additionally, an in-depth study of outliers and of data points that the model does not predict accurately is necessary to understand the reasons and clarify directions to minimise them. The model can also be applied in a variety of actual projects to test its accuracy and generalisability in a wider range of scenarios. Finally, new machine learning or deep learning techniques can be compared with the current SVR model. Pursuing this research is expected to raise the reliability and efficiency of leak location prediction with the SVR model while expanding its applicability in water supply and construction management.
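One possible shape for the suggested hyperparameter tuning with cross-validation is sketched below using scikit-learn's `GridSearchCV`. The data are synthetic, and the parameter grid, the number of folds and the scoring metric are assumptions chosen for illustration rather than recommendations from the thesis.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data (assumption): 300 junctions, 9 features
rng = np.random.default_rng(11)
X = rng.normal(size=(300, 9))
y = X @ rng.uniform(0.2, 1.0, size=9) + 0.1 * rng.normal(size=300)

pipeline = Pipeline([
    ("scale", StandardScaler()),   # refitted inside every fold
    ("svr", SVR(kernel="rbf")),
])

# Illustrative grid; real studies would usually search a wider range
param_grid = {
    "svr__C": [1.0, 10.0],
    "svr__epsilon": [0.05, 0.2],
    "svr__gamma": ["scale", 0.1],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_   # scores are negated RMSE values
print("best parameters:", best_params)
print(f"cross-validated RMSE: {best_rmse:.4f}")
```

Because each candidate is scored on folds it was not trained on, the selected parameters are less likely to reflect one particular 80 to 20 split.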

Amditis, A., Bimpas, M., & Uzunoglu, N. (2010). Detection of water leaks in supply pipes using continuous wave sensor operating at 2.45GHz. Journal of Applied Geophysics, 70(3), 226-236. doi:https://doi.org/10.1016/j.jappgeo.2010.01.003

Al-Shamma'a, A. I., Shaw, A., Goh, J. H., Cullen, J. D., Oliver, M., Vines, M., & Brockhurst, M. (2011). Water pipe leak detection using electromagnetic wave sensor for the water industry. Paper presented at the 2011 IEEE Symposium on Computers & Informatics, Kuala Lumpur, Malaysia.

Abebe, B. A., Grum, B., Berihu, L. G., & Tariku, Z. (2023). Causes, Effects, and Mitigation Measures of Time and Cost Overruns in Water Supply Projects: Case of Tigrai Region, Northern Ethiopia. Advances in Civil Engineering,

Andersen, B., & Zidane, Y. J. T. (2018). The top 10 universal delay factors in construction projects. International Journal of Managing Projects in Business, 11(3), 650-672. doi:10.1108/IJMPB-05-2017-0052

Alves, D., Duviella, E., Blesa, J., & Rajaoarisoa, L. (2022). Leak Detection in Water Distribution Networks Based on Water Demand Analysis. IFAC-PapersOnLine, 55(6), 679-684. doi:https://doi.org/10.1016/j.ifacol.2022.07.206

Alves, D., Duviella, E., Cembrano, G., Blesa, J., Romero-Ben, L., & Puig, V. (2023). Leak detection and localization in water distribution networks: Review and perspective. Annual Reviews in Control, 55, 392-419. doi:https://doi.org/10.1016/j.arcontrol.2023.03.012

Alsuliman, J. A. (2019). Causes of delay in Saudi public construction projects. Alexandria Engineering Journal, 58(2), 801-808. doi:https://doi.org/10.1016/j.aej.2019.07.002

Anusart, K., Khomsay, K., Kitbutrawat, N., Vanijjirattikhan, R., Khomsay, S., Udomsuk, S., & Supakchukul, U. (2022). AI-based acoustic leak detection in water distribution systems. Results in Engineering, 15, 100557. doi:https://doi.org/10.1016/j.rineng.2022.100557

Burkov, A. (2019). The Hundred-Page Machine Learning Book.

Boxall, J., Machell, J., & Mounce, S. (2007). An Artificial Neural Network/Fuzzy Logic system for DMA flow meter data analysis providing burst identification and size estimation. Paper presented at the 9th Computing and Control for the Water Industry (CCWI2007) and the Sustainable Urban Water Management (SUWM) conferences, Leicester, UK.

British Standard (BS 6079-3:2000). Project Management - Part 3: Guide to the management of business related project risk. BSI 01-2000.

Belassi, W., & Tukel, O. I. (1996). A new framework for determining critical success/failure factors in projects. International Journal of Project Management, 14(3), 141-151. doi:https://doi.org/10.1016/0263-7863(95)00064-X

Cabrera, E., & Cabrera, Jr., & García, V. (2008). The Minimum Night Flow Method

Cody, R. A., & Narasimhan, S. (2020). A field implementation of linear prediction for leak-monitoring in water distribution networks. Advanced Engineering Informatics, 45. doi:https://doi.org/10.1016/j.aei.2020.101103

Demissew, A., & Abiy, F. (2023). Causes and Impacts of Delays in Ethiopian Public Construction Projects (Case on Debre Markos University Construction Projects). Advances in Civil Engineering, 2023, 6577676. doi:10.1155/2023/6577676

Enshassi, A., Al-Najjar, J., & Kumaraswamy, M. (2010). Significant Factors Causing Time and Cost Overruns in Construction Projects in the Gaza Strip: Contractors’ Perspective. International Journal of Construction Management,

Emam, H., Abdelaal, M., & Farrell, P. (2015). Causes of Delay on Large Infrastructure Projects in Qatar. Retrieved from https://www.arcom.ac.uk/-docs/proceedings/d75beb362c997a654af8bb3e742a0891.pdf

Fayaz, J., McMillan, L., & Varga, L. (2024). Domain-informed variational neural networks and support vector machines-based leakage detection framework to augment self-healing in water distribution networks. Water Research, 249. doi:https://doi.org/10.1016/j.watres.2023.120983

Fan, X., Zhang, X., & Yu, X. (2021). Machine learning model and strategy for fast and accurate detection of leaks in water supply network. Journal of Infrastructure Preservation and Resilience, 2(1), 10. doi:10.1186/s43065-021-

Gündüz, M., Özdemir, M., & Nielsen, Y. (2013). Quantification of Delay Factors Using the Relative Importance Index Method for Construction Projects in Turkey. Journal of Management in Engineering, 29(2), 133-139. doi:10.1061/(ASCE)ME.1943-5479.0000129

Huang, H., Xin, K., Li, R., & Tao, T. (2014). A review of methods for burst/leakage detection and location in water distribution systems. Water Supply, 15(3), 429-

Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis (5th ed.). New Jersey, NJ: Prentice-Hall.

Hosseini, M. R., & Durdyev, S. (2020). Causes of delays on construction projects: a comprehensive list. International Journal of Managing Projects in Business,

Halwatura, R., & Perera, W. (2012). Causes and Effects of Delays in Construction of Medium Scale Drinking Water Supply Projects in Sri Lanka. Annual Transactions of Institution of Engineers, Sri Lanka, 151-159. Retrieved from https://www.academia.edu/32511464/Causes_and_Effects_of_Delays_in_Construction_of_Medium_Scale_Drinking_Water_Supply_Projects_in_Sri_Lanka

Hoang, T., & Chu, N. M. N. (2008). Phân tích dữ liệu nghiên cứu với SPSS. Hà Nội:

Ikudayisi, A. E., Oviasogie, A., & Shittu, H. (2015). Identifying critical success and failure factors in construction projects in Nigeria. International Journal of Advanced Academic Research, 6(7), 48-59. doi:10.46654/ij.24889849.e6725

Israngkura Na Ayudhya, B. (2011). Evaluation of Common Delay Causes of Construction Projects in Singapore. Journal of Civil Engineering and Architecture, 5. doi:10.17265/1934-7359/2011.11.008

John, O., & Benet, V. (2000). Measurement, scale construction, and reliability. Handbook of Research Methods in Social and Personality Psychology, New

Kothari, A., & Marimuthu, B. (2019). An Efficient scheme for Water Leakage Detection using Support Vector Machines (SVM) - Zig. Retrieved from https://www.researchgate.net/publication/346309472_An_Efficient_scheme_for_Water_Leakage_Detection_using_Support_Vector_Machines_SVM_-_Zig

Kazaz, A., Tuncbilekli, N., & Ulubeyli, S. (2012). Causes of Delays in Construction Projects in Turkey. Journal of Civil Engineering and Management, 18, 426-

Kishk, M., & Motaleb, O. (2013). An Investigation into the Risk of Construction Projects Delays in the UAE. Int J Inf Technol Proj Manag., 4, 50-65. doi:10.4018/jitpm.2013070104

K V, P., & Vasugi, V. (2018). Delays in construction projects: A review of causes, need & scope for further research. Malaysian Construction Research Journal,

Le, A., Nguyen, B. N., & Toan, N. (2014). Time Delays Causes in Construction Projects in Hanoi, Vietnam: Contractors’ Perspectives. Paper presented at New Technologies for Urban Safety of Mega Cities in Asia Conference,

Linnhoff-Popien, C., Platschek, C., Ritz, F., Ochs, J., Müller, R., Illium, S., & Schröder, T. (2020). Acoustic Leak Detection in Water Networks. Retrieved from https://www.scitepress.org/Link.aspx?doi.5220/0010295403060313

Liu, D., Chen, J., Xiong, L., Liu, P., Guo, S., Zhou, W., & Zeng, Y. (2024). Assessment of the impacts of water resources allocation on the reliability, resilience and vulnerability of the water–energy–food–society (WEFS) nexus system. Agricultural Water Management, 295. doi:https://doi.org/10.1016/j.agwat.2024.108780

Lenard Dennis, J., Faniran Olusegun, O., & Oluwoye Jacob, O. (1998). Interactions between Construction Planning and Influence Factors. Journal of Construction Engineering and Management, 124(4), 245-256. doi:10.1061/(ASCE)0733-9364(1998)124:4(245)

Lãm, V. Q. (2015). Các yếu tố gây chậm tiến độ và vượt dự toán các dự án đầu tư công tại Việt Nam. Tạp chí Phát triển và Hội nhập, 23(33), 24-31.

Mohammed, E., Zeleke, E., & Abebe, S (2021) Water leakage detection and localization using hydraulic modeling and classification Journal of Hydroinformatics, 23 doi:10.2166/hydro.2021.164

Marzouk, M M., & El-Rasas, T I (2014) Analyzing delay causes in Egyptian construction projects Journal of Advanced Research, 5(1), 49-55 doi:https://doi.org/10.1016/j.jare.2012.11.005

Marzouk, M., Elshaboury, N (2020) Comparing Machine Learning Models For

Predicting Water Pipelines Condition Paper presented at the 2020 2nd Novel

Intelligent and Leading Emerging Sciences Conference (NILES), Giza, Egypt

Mitchell, T M (2017) Machine Learning New York: McGraw Hill

Mezher, T M., & Tawil, W (1998) Causes of delays in the construction industry in

Lebanon Engineering, Construction and Architectural Management, 5(3),

Menesi, W (2007) Construction Delay Analysis under Multiple Baseline Updates

(Master’s thesis of Applied Science) University of Waterloo, Waterloo, Ontario, Canada

Nhân, C N A (2010) Cải thiện tiến độ giải ngân vốn ngân sách đầu tư xây dựng cơ bản tại Việt Nam Kinh tế Xây dựng, 3/2010, 20-26

Nguyen, D T (2014), Giáo trình phương pháp nghiên cứu khoa học trong kinh doanh, Hà Nội: NXB Tài chính

Nguyễn H H., Nguyễn, V S (2019) Nghiên cứu mức độ ảnh hưởng của các nhân tố gây chậm tiến độ thi công công trình thủy lợi, thủy điện ở Việt Nam Khoa học

Thuỷ lợi và Môi trường, 67, 93-100

Oke, A., Aigbavboa, C., & Dlamini, E (2017) Factors Affecting Quality of

Construction Projects in Swazilland Paper presented at 9th International Conference on Construction in the 21st Century, Dubai, UAE

Obeid, A., Karray, F., Abid, M., Bensaleh, M., Qasim, S M., & Jmal, W (2016)

Toward Realisation of Wireless Sensor Network based Water Pipeline Monitoring Systems: A Comprehensive Review of Techniques and Platforms

IET Science Measurement & Technology, 10 doi:10.1049/iet-smt.2015.0255

Riehle, E., Xiang, H., Fuchun, J., Tao, L., Bowen, N., Singh, R P., & Xue, Z (2020)

Application of acoustic intelligent leak detection in an urban water supply pipe network Journal of Water Supply: Research and Technology-Aqua, 69(5),

Sawhney, A., Iyer, K C., Doloi, H., & Rentala, S (2012) Analysing factors affecting delays in Indian construction projects International Journal of Project

Management, 30(4), 479-489 doi:https://doi.org/10.1016/j.ijproman.2011.10.004

Sichombo, B., Kaliba, C., Muya, M., & Shakantu, W (2013) Cost Escalation,

Schedule Overruns and Quality Shortfalls on Construction Projects: The Case of Zambia International Journal of Construction Management, 13(1), 53-68 doi:10.1080/15623599.2013.10773205

Silva, D D., Marney, D., Mashford, J., & Burn, S (2009) An approach to leak detection in pipe networks using analysis of monitored pressure values by support vector machine Paper presented at the 2009 Third International Conference on Network and System Security, Gold Coast City, QLD, Australia

Soliman, E (2010) Delay causes in kuwait construction projects Retrieved from https://www.researchgate.net/publication/282245511_delay_causes_in_kuwa it_construction_projects

Soliman, E., & Albader, H., & Alrasheed, K (2023) Systematic review of construction project delays in Kuwait Journal of Engineering Research, 11(4), 347-355 doi:https://doi.org/10.1016/j.jer.2023.08.009

Sweis, G (2013) Factors Affecting Time Overruns in Public Construction Projects:

The Case of Jordan International Journal of Business and Management, 8,

Stoianov, I., Nachman, L., Csail, M., Madden, S., & Tokmouline, T (2007)

PIPENET: A wireless sensor network for pipeline monitoring Paper presented at The 6th International Conference on Information Processing in Sensor Networks, Cambridge, MA, USA Doi: 10.1109/IPSN.2007.4379686

Song, J., & Bagaya, O (2016) Empirical Study of Factors Influencing Schedule

Delays of Public Construction Projects in Burkina Faso Journal of Management in Engineering, 32(5) doi:10.1061/(ASCE)ME.1943- 5479.0000443

Triệu A N., Huỳnh D L., Trần, Đ A., & Nguyễn, H T (2022) Dự báo khả năng rò rỉ trên mạng lưới cấp nước bằng một số kỹ thuật học máy: Nghiên cứu điển hình cho hệ thống cấp nước Trung An - Thành phồ Hồ Chí Minh Tạp chí Khoa học Thủy lợi và Môi trường, 78, 44-52

Telvari, A., Lork, A., & Feyzbakhsh, S (2018) Investigating the Causes of Delay in

Construction of Urban Water Supply and Wastewater Project in Water and WasteWater Project in Tehran Civil Engineering Journal, 3 doi:10.28991/cej-030958

Thi, C H (2006) Critical Success Factors in Project Management - An Analysis of

Infrastructure projects in Viet Nam (Master’s thesis of Construction

Managent) School of Management, Bangkok, Thailand

Tổng Công ty Cấp nước Sài Gòn TNHH MTV (2024) Báo cáo mục tiêu tổng quát, kế hoạch kinh doanh năm 2024 Retrieved from https://sawaco.com.vn/post-

NGUYEN DUC PHUONG HUY - 2270185 96 detail/bao-cao-ke-hoach-san-xuat-kinh-doanh-hang-nam-2024-

Tuấn, T H (2014) Các nhân tố ảnh hưởng đến chi phí và thời gian hoàn thành dự án trong giai đoạn thi công trường hợp nghiên cứu trên địa bàn thành phố cần thơ

Tạp chí Khoa học Đại học cần Thơ 30, 26-33 Retrieved from https://ctujsvn.ctu.edu.vn/index.php/ctujsvn/article/view/1784

Varija, K., & Darsana, P (2018) Leakage Detection Studies for Water Supply

Systems - A Review Water Science and Technology Library, 78

Vu Quoc, H., Nguyen, M., & Mai, S (2021) Risk Assessments in Construction of

Water Supply Projects in Hanoi, Vietnam American Journal of Industrial and

Zhou, J., Akogbe, R., & Feng, X (2013) Importance and Ranking Evaluation of

Delay Factors for Development Construction Projects in Benin KSCE Journal of Civil Engineering, 17 doi:10.1007/s12205-013-0446-2

QUESTIONNAIRE

My name is Nguyễn Đức Phương Huy, and I am a student in the Master of Construction Management program at Bach Khoa University. I am writing my master's dissertation, “Application of support vector machine (SVM) to predict potential leakages on water supply network”, part of which requires a survey of the factors that delay projects to repair, renovate, and upgrade water supply pipelines in Ho Chi Minh City. The survey collects assessments from professionals, specialists, engineers, and others who work in this field.

So that the thesis can be completed and deliver practical value, I sincerely look forward to receiving your response to this questionnaire, and I thank you for your support.

I guarantee that your responses to these questions will be used solely for this master's thesis.

Should you have any questions about this survey, please do not hesitate to contact me via:

My full name: Nguyễn Đức Phương Huy

Like many other construction projects, water supply projects are at risk of delay from a range of factors, and such delays waste considerable money and labour. Moreover, keeping this long-standing water distribution system operating stably requires a sustainable way of allocating resources to crucial water supply projects. Understanding a machine learning method is also more than a trend in this era of globalisation; it can be considered part of the transition toward a 4.0 construction sector in which the community is adopting the strengths of AI.

Please mark with a cross where applicable:

☐ Team leader/Team manager or equivalent

☐ Director/Project management unit director or equivalent

PART 3: INQUIRIES REGARDING OPINIONS ON FACTORS DELAYING PROJECTS TO REPAIR, RENOVATE, AND UPGRADE WATER SUPPLY PIPELINES IN HO CHI MINH CITY

Please indicate your agreement with the statements below by marking "X" in the corresponding boxes. Specifically:

I Supplies, fittings, and materials (supplies)

1.1 Shortage of supplies in market VT1

1.3 Damage to stored supplies VT3

1.4 Procurement process of supplies prior to installation VT4

II Design-related factors TK

2.1 Delay of design submission TK1

2.2 Design change during construction TK2

2.3 Unclear/Defective design details TK3

2.4 Unrealistic project scheduling and estimation TK4

3.4 Accidents or unexpected events during construction MT4

IV Finance-related factors TC

4.1 (Annual) budgeting from investor TC1

4.3 Slow payment progress from investor TC3

5.3 Low productivity human resources NL3

VI Equipment-related factors TB

6.1 Shortage of essential equipment TB1

6.3 Improper equipment, or equipment that differs from the bidding documents TB3

VII Regulation-related factors PL

7.1 Complicated and burdensome procedures (such as those of local authorities, the investor, and consultants) PL1

7.2 Lack of specialised planning by government entities PL2

7.3 Changes in governmental laws and regulations PL3

7.4 Conflicts/Disputes regarding contracts/projects PL4

8.2 Hand-over process for construction K2

8.3 Lack of communication among stakeholders K3

8.4 High pressure of current network at project site K4

8.6 Insufficient supervision (from any of stakeholders) K6

IX Schedule overrun of the project due to surveyed factors TD

9.1 Delay less than 3 months TD1

9.2 Delay from 3 months to 6 months TD2

9.3 Delay from 6 months to 1 year TD3

9.4 Delay more than 1 year TD4

PART 4: PERSONAL INFORMATION

First and foremost, I wholeheartedly thank you for taking the time to complete this survey.

My goal is to obtain trustworthy and accurate data for my study, “Application of support vector machine (SVM) to predict potential leakages on water supply network”. To that end, I would appreciate your basic personal information for reference.

This information includes full name, age, gender, email, and phone number. Because this information is sensitive, I pledge to keep it confidential and to use it only for scientific research purposes. All personal data will be handled and stored carefully; it will not be disclosed to third parties or used for purposes beyond the scope of this research.

Please note that providing this information is completely voluntary. If you feel uncomfortable sharing any information, you have the right to decline without affecting your participation in the survey.

I greatly appreciate your support and commit to using the information responsibly and respectfully. Any questions or concerns can be shared with me directly via the contact information above.

Please fill in the form:

Full name: ……… Age: ……… Gender: ……… Email: ……… Phone number: ………

SPSS STATISTICAL RESULT

Deputy head of department or equivalent: Frequency 39; Percent 23.9; Valid Percent 23.9; Cumulative Percent 96.9

Item-Total Statistics (reliability analysis; column headers: Scale Variance if Item Deleted; Cronbach's Alpha if Item Deleted)

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy

Total Variance Explained (columns: Initial Eigenvalues; Extraction Sums of Squared Loadings)

Extraction Method: Principal Component Analysis.

a. 1 component extracted.

Correlations (variables: PL, VT, TK, MT, TC, NL, TB, K)

** Correlation is significant at the 0.01 level (2-tailed).

* Correlation is significant at the 0.05 level (2-tailed).

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .741

Total Variance Explained (columns: Component; Initial Eigenvalues; Extraction Sums of Squared Loadings; Rotation Sums of Squared Loadings)

Extraction Method: Principal Component Analysis.

Rotation Method: Varimax with Kaiser Normalization.

a. Rotation converged in 8 iterations.

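Variables Entered/Removed
Model 1. Variables entered: TB, MT, VT, TC, NL, PL, K, TK (b). Method: Enter.
a. Dependent Variable: TD
b. All requested variables entered.

Model Summary
Model 1: R = .774 (a); R Square = .600; Adjusted R Square = .579; Std. Error of the Estimate = .64846846; R Square Change = .600; F Change = 29.939; df1 = 8; df2 = 120; Sig. F Change = .000; Durbin-Watson = 1.284
a. Predictors: (Constant), TB, MT, VT, TC, NL, TK, PL, K
b. Dependent Variable: TD

ANOVA (columns: Sum of Squares; df; Mean Square; F; Sig.)
Total: Sum of Squares = 163.000; df = 163
a. Dependent Variable: TD
b. Predictors: (Constant), TB, MT, VT, TC, NL, TK, PL, K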

Date posted: 25/09/2024, 14:38
