MINISTRY OF EDUCATION & TRAINING
STATE BANK OF VIET NAM
HO CHI MINH CITY UNIVERSITY OF BANKING

NGUYEN THI NGOC ANH

APPLICATION OF MACHINE LEARNING FOR PREDICTING PROBABILITY OF DEFAULT OF SMALL AND MEDIUM ENTERPRISES

GRADUATION THESIS
MAJOR: FINANCE – BANKING
CODE: 7340201

SUPERVISOR: Ph.D. NGUYEN MINH NHAT

HO CHI MINH CITY, 2022

ABSTRACT

Corporate default prediction plays an essential role in every sector of the economy, as the Covid-19 pandemic has highlighted. The recent high incidence of bankruptcies among Small and Medium Enterprises (SMEs) has underlined the need to anticipate defaults in many sectors. Against this background, this study investigates which Machine Learning models are appropriate for predicting the probability of default of SMEs in the Vietnamese commercial banking system, and how to choose among them, using a unique database of 400 Vietnamese SMEs over the 2019-2021 period with 13 independent financial variables.

The most significant contribution of this research is the application of Machine Learning approaches to financial indicators in order to anticipate the default likelihood of SMEs, which may in turn improve the efficiency of credit risk control in Vietnamese commercial banks in the future. The research analyzes the performance of a set of Machine Learning (ML) models in predicting default risk and compares them with a standard statistical model, the Logistic Regression model. When only a restricted amount of information is available, as is the case with financial indicators, the ML models (Decision Tree and Random Forest) were found to outperform the statistical model in terms of discriminatory power and precision. The confusion matrix and the F1-score are used to evaluate which model is the most appropriate for predicting the probability of default of SMEs.

DECLARATION

I declare that this thesis has been composed solely by myself and that it has not been submitted, in whole or in part, in any previous application for a degree. Except where stated otherwise by reference or acknowledgment, the work presented is entirely my own.

The author
NGUYEN THI NGOC ANH

ACKNOWLEDGEMENTS

Throughout the writing of this thesis, I have received a great deal of support and assistance.

First and foremost, I would like to express my heartfelt and profound gratitude to the professors of the Ho Chi Minh City University of Banking for their passionate teaching and for building the firm foundation of knowledge that enabled me to finish the university program successfully.

Second, I would like to give my warmest thanks to my supervisor, Ph.D. Nguyen Minh Nhat, for providing me with thorough instruction and unwavering support in finishing my graduation thesis. It would have been difficult for me to complete the thesis without his careful assistance.

Because of my limited practical experience, this thesis cannot avoid some faults; nonetheless, I look forward to receiving more advice from lecturers so that I can gain new experience, which I believe will be highly beneficial to my future development.

I sincerely thank you!
Nguyen Thi Ngoc Anh

TABLE OF CONTENTS

LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 The urgency of the research
  1.2 Research Objectives
  1.3 Research Questions
  1.4 Research Subject and Scope
    1.4.1 Research Subject
    1.4.2 Research Scope
  1.5 Research Contributions
  1.6 Research Methodology
  1.7 The Structure of Research
CHAPTER 2 LITERATURE REVIEW
  2.1 Probability of Default (PD)
  2.2 Overview of the models used to predict the Probability of Default of SMEs
    2.2.1 The Structural Models
      2.2.1.1 Regression Analysis Models
      2.2.1.2 Discriminant Analysis Models
      2.2.1.3 Logistic Models
    2.2.2 The Non-Structural Models
      2.2.2.1 Decision Tree Model (DT)
      2.2.2.2 Random Forest Model (RF)
      2.2.2.3 Artificial Neural Network Models (ANNs)
      2.2.2.4 Ensemble Learning
  2.3 Previous Related Research
CHAPTER 3 DATA AND METHODOLOGY
  3.1 Methodological Model Framework
  3.2 Data collection
  3.3 Input Variables Selection
  3.4 The Probability of Default prediction models
    3.4.1 Logistic Regression Model
    3.4.2 Decision Tree Model (DT)
    3.4.3 Random Forest Model (RF)
    3.4.4 Confusion Matrix
    3.4.5 F1-Score
CHAPTER 4 EMPIRICAL RESULTS
  4.1 Descriptive statistics results
  4.2 Correlations
  4.3 Regression results of a parametric model
    4.3.1 Logistic Regression Result
    4.3.2 Confusion matrix of the parametric model
  4.4 Regression results of non-parametric models
    4.4.1 Decision Tree
    4.4.2 Random Forest
  4.5 Regression result of Ensemble Learning
CHAPTER 5 CONCLUSION AND RECOMMENDATION
  5.1 Applying the model to forecast the likelihood of default for SME customers at Vietnamese Commercial banks
    5.1.1 Tools that aid in the identification of groups of prospective SMEs customers
    5.1.2 The model results serve as the foundation for credit policy orientation
    5.1.3 Applying the model results to improve credit risk management efficiency in Commercial banks
  5.2 Applying the model to anticipate the likelihood of default for Credit Rating Agencies in Vietnam
  5.3 Topic limitation and potential research directions
    5.3.1 Topic limitation
    5.3.2 Potential research directions
REFERENCES
APPENDIX

LIST OF ABBREVIATIONS

Symbol   English
PD       Probability of Default
SMEs     Small and Medium Enterprises
VCCI     Vietnam Chamber of Commerce and Industry
ANN      Artificial Neural Networks
RF       Random Forest Model
DT       Decision Tree Model
MDA      Multivariate Discriminant Analysis
AAFS     Annual Audited Financial Statements

LIST OF FIGURES

Figure 1.1: Forecast Insolvency Growth in 2022 compared to 2019
Figure 3.1: A proposed schematic process of the methodology of this study
Figure 3.2: Simulation of the decision tree model
Figure 3.3: Random Forest Simplified
Figure 4.1: Correlation Matrix
Figure 4.2: Regression result of the Decision Tree model
Figure 1: Six steps with the codes to run Logistic Regression Model
Figure 2: The usage code to draw Decision Tree model
Figure 3: The usage code to run the Confusion Matrix of Decision Tree model
Figure 4: The usage code to run Confusion Matrix of Random Forest model
Figure 5: The usage code to run the Confusion Matrix of Ensemble Learning model

... the results show that the Random Forest model produces the best results, with a forecast accuracy of up to 100%, which is an important basis for CRAs to choose an appropriate credit rating model.

Competition, in principle, ensures innovation and serves as a healthy check on product quality. The rating business, on the other hand, is wholly built on reputation. There is some tension between competition and reputation. The leading CRAs have huge reputational capital. Investors have faith in CRAs' judgment, and they request a risk premium depending on the issuer's rating. Even after the subprime catastrophe, rating judgments remain regularly debated in the financial press, emphasizing their ongoing significance. As a result,
selecting accurate client information and data is critical for CRAs to anticipate the likelihood of default. The data used in this study can also support the process of gathering data to anticipate the default rates of CRA clients.

5.3 Topic limitation and potential research directions

5.3.1 Topic limitation

Aside from its outcomes, the thesis has several limitations and difficulties.

The limited data set is the most obvious shortcoming of this investigation. Due to time restrictions, only 400 firms were gathered, and they were restricted to three industries. The number of variables is 14 (13 independent variables and one dependent variable), which is suitable for the number of observations. However, because the sample is small, the study produced no significant findings.

Furthermore, the quality of the input data is low. Although the gathered financial statements have been audited to assure the quality of the information sources, the auditing of financial statements in Vietnam is not as transparent, clear, and effective as in developed nations. In Vietnam, a company can prepare three or four sets of financial statements for different purposes, including taxation, banking, auditing, and internal control. As a result, managing and analyzing the quality of the model's inputs is critical for obtaining the most accurate results.

Additionally, the probability of default models presented in the thesis are based solely on financial data, with no regard for non-financial elements, unlike the internal credit rating models used by commercial banks in Vietnam today. In reality, because customers' financial statements do not always precisely and completely reflect their business outcomes and financial status, banks must also depend on non-financial information to screen and classify customers.

5.3.2 Potential research directions

This thesis recommends further research in the following direction.

Expanding the data sets and time intervals: to obtain more trustworthy findings, the number of firms gathered could be increased to 1,000 or more. Furthermore, instead of yearly financial statements, quarterly corporate financial statements might be gathered for greater accuracy. Finally, if the acquired data is large enough to be disaggregated for each group of customers in different business sectors, it can produce reliable findings tailored to each organization's business operations, enhancing the model's applicability.

REFERENCES

Breiman, L., & Ihaka, R. (1984). Nonlinear discriminant analysis via scaling and ACE. Davis One Shields Avenue Davis, CA, USA: Department of Statistics, University of California.
Khoshgoftaar, T.M.; Fazelpour, A.; Dittman, D.J.; Napolitano, A. Ensemble vs. Data Sampling: Which Option Is Best Suited to Improve Classification Performance of Imbalanced Bioinformatics Data? In Proceedings of the IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI), Vietri sul Mare, Italy, 9-11 November 2015; pp. 705-712.
Fan, X.N.; Tang, K.; Weise, T. Margin-Based Over-Sampling Method for Learning from Imbalanced Datasets. In Advances in Knowledge Discovery and Data Mining; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6635, pp. 309-320.
Hamori, S., Kawai, M., Kume, T., Murakami, Y., & Watanabe, C. (2018). Ensemble learning or deep learning?
Application to default risk analysis. Journal of Risk and Financial Management, 11(1), 12.
Gupta, J., Gregoriou, A., & Ebrahimi, T. (2018). Empirical comparison of hazard models in predicting SMEs failure. Quantitative Finance, 18(3), 437-466.
McCann, F., & McIndoe-Calder, T. (2012). Determinants of SME loan default: the importance of borrower-level heterogeneity (Vol. 6). Central Bank and Financial Services Authority of Ireland.
Mestre, D.; Fonseca, J.M.; Mora, A. Monitoring of in-vitro plant cultures using digital image processing and random forests. In Proceedings of the 8th International Conference of Pattern Recognition Systems (ICPRS 2017), Madrid, Spain, 11-13 July 2017; pp. 1-6.
Pompe, P., & Bilderbeek, J. (2005). The prediction of bankruptcy of small- and medium-sized industrial firms. Journal of Business Venturing, 20(6), 847-868.
Abdou, H., & Pointon, J. (2011). Credit Scoring, Statistical Techniques, and Evaluation Criteria: A Review of the Literature. Intelligent Systems in Accounting, Finance & Management, 59-88.
Lundholm, R., & Sloan, R. (2004). Equity valuation and analysis.
Dichev, I. D., & Skinner, D. J. (2002). Large-sample evidence on the debt covenant hypothesis. Journal of Accounting Research, 40(4), 1091-1123.
Smith, C. W., & Warner, J. B. (1979). Bankruptcy, secured debt, and optimal capital structure: Comment. The Journal of Finance, 34(1), 247-251.
Demerjian, P. R. (2007). Financial ratios and credit risk: The selection of financial ratio covenants in debt contracts. AAA.
Crouhy, M., Galai, D., & Mark, R. (2001). Prototype risk rating system. Journal of Banking & Finance, 25(1), 47-95.
Yap, B. C. F., Yong, D. G. F., & Poon, W. C. (2010). How well financial ratios and multiple discriminant analysis predict company failures in Malaysia? International Research Journal of Finance and Economics, 54(13), 166-175.
Ravi, V., & Pramodh, C. (2008). Threshold accepting trained principal component neural network and feature subset selection: Application to bankruptcy prediction in banks. Applied Soft Computing, 8(4), 1539-1548.
Liou, F. M. (2008). Fraudulent financial reporting detection and business failure prediction models: a comparison. Manag. Audit. J., 23, 650-662.
Perboli, G., & Arabnezhad, E. (2021). A Machine Learning-based DSS for mid and long-term company crisis prediction. Expert Systems with Applications, 174, 114758.
Perboli, G., Tadei, R., & Gobbato, L. (2014). The multi-handler Knapsack problem under uncertainty. European Journal of Operational Research, 236, 1000-1007.
Baldi, M. M., Manerba, D., Perboli, G., & Tadei, R. (2019). A Generalized Bin Packing Problem for parcel delivery in last-mile logistics. European Journal of Operational Research, 274(3), 990-999.
Altman, E. I. (2014). Predicting financial distress of companies: Revisiting the Z-Score and ZETA models. In A. R. Bell, C. Brooks & M. Prokopczuk (Eds.), Handbook of research methods and applications in empirical finance (pp. 428-456). Edward Elgar Pub.
Begley, J., Ming, J., & Watts, S. (1996). Bankruptcy classification errors in the 1980s: An empirical analysis of Altman's and Ohlson's models. Review of Accounting Studies, (4), 267-284.
Schalck, C., & Yankol-Schalck, M. (2021). Predicting French SME failures: new evidence from machine learning techniques. Applied Economics, 53(51), 5948-5963.
Ciampi, F., & Gordini, N. (2013). Small Enterprise Default Prediction Modeling through Artificial Neural Networks: An Empirical Analysis of Italian Small Enterprises. Journal of Small Business Management, 51(1), 23-45.
James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning, 112; Springer: New York, NY, USA, 2013.
Barboza, F., H. Kimura, and E. Altman, "Machine learning models and bankruptcy prediction", Expert Systems with Applications: An International
Journal, 83(c), 2017, 405-417.
Brown, I., and C. Mues, "An experimental comparison of classification algorithms for imbalanced credit scoring data sets", Expert Systems with Applications: An International Journal, 39, 2012, 3446-3453.
Resti, A., and A. Sironi, Risk Management and Shareholders' Value in Banking: From Risk Measurement Models to Capital Allocation Policies (John Wiley & Sons Ltd, 2007).
Chernozhukov, V., D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, W. Newey, and J. Robins, "Double/debiased machine learning for treatment and structural parameters", The Econometrics Journal, 21(1), 2018, C1-C68.
Guidotti, R., A. Monreale, S. Ruggieri, F. Turini, D. Pedreschi, and F. Giannotti, "A Survey of Methods for Explaining Black Box Models", ACM Computing Surveys (CSUR), 51(5), 2019, 93.
INE (2014). Empresas em Portugal – 2012. Lisboa: Instituto Nacional de Estatística.
Psillaki, M., Tsolas, I. E., & Margaritis, D. (2010). Evaluation of credit risk based on firm performance. European Journal of Operational Research, 201(3), 873-881.
Back, B., Laitinen, T., Sere, K., & Van Wezel, M. (1996). Choosing bankruptcy predictors using discriminant analysis, logit analysis, and genetic algorithms. Turku Centre for Computer Science Technical Report, 40.
Lo, A. (1986). Logit versus discriminant analysis: A specification test and application to corporate bankruptcies. Journal of Econometrics, 31(2), 151-178.
Global Economic Outlook - January 2022 (2022, January 25). Retrieved from the group Atradius: https://group.atradius.com/publications/economic-research/global-economic-outlook-january-2022.html
Insolvency increases expected as support ends (2021, October 07). Retrieved from Atradius Collections: https://atradiuscollections.com/global/reports/economic-research-insolvency-increases-expected-as-support-ends.html
Report on COVID-19's impact on Vietnamese businesses released (2021, March 12). Retrieved from NhanDan: https://en.nhandan.vn/business/item/9663802-reporton-covid-19%E2%80%99s-impact-on-vietnamese-businesses-released.html
Ince, Huseyin, and Bora Aktan. 2009. "A Comparison of Data Mining Techniques for Credit Scoring in Banking: A Managerial Perspective". Journal of Business Economics and Management 10 (3): 233-40. https://doi.org/10.3846/16111699.2009.10.233-240
Platt, H. D. (1991). Predicting corporate financial distress: Reflections on choice-based sample bias. Journal of Economics and Finance, 2002.
Decree No. 88/2014/ND-CP dated September 26, 2014, of the Government on credit rating services (2014, November 15). Retrieved from LuatVietnam: https://english.luatvietnam.vn/decree-no-88-2014-nd-cp-dated-september-262014-of-the-government-on-credit-rating-services-89671-Doc1.html
Basel Committee on Banking Supervision. 2006. International convergence of capital measurement and capital standards. www.bis.org
Ohlson, J.A. (1980), "Financial ratios and the probabilistic prediction of bankruptcy", Journal of Accounting Research, Vol. 18 No. 1, pp. 109-31.
Lennox, C. (1999), "Identifying failing companies: a re-evaluation of the logit, probit, and DA approaches", Journal of Economics and Business, Vol. 51, pp. 347-64.
Tsai, C., and J. Wu, "Using neural network ensembles for bankruptcy prediction and credit scoring", Expert Systems with Applications, vol. 34, no. 4, pp. 2639-2649, 2008.
Olson, D.L.; Delen, D.; Meng, Y. Comparative analysis of data mining methods for bankruptcy prediction. Decis. Support Syst. 2012, 52, 464-473.
Du Jardin, P. Failure pattern-based ensembles applied to bankruptcy forecasting. Decis. Support Syst. 2018, 107, 64-77.
Yang, Z., Platt, M.B., and Platt, H.D. Probabilistic neural networks in bankruptcy prediction. J. Bus. Res. 1999, 44, 67-74.
Beaver, W.H. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 4, 71-111.
Altman, E. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589-609.
Eisenbeis, R. (1977).
Pitfalls in the application of discriminant analysis in business, finance, and economics. Journal of Finance, 32(3), 875-900.
Altman, E., Haldeman, R.G., & Narayan, P. (1977). Zeta-analysis: A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance, 1(1), 29-54.
Gombola, M., Haskins, M., Ketz, J., & Williams, D. (1987). Cash flow in bankruptcy prediction. Financial Management, 16(4), 55-65.
Mossman, Ch.E., Bell, G., Swartz, L., & Turtle, H. (1998). An empirical comparison of bankruptcy models. The Financial Review, 33(2), 35-54.
Becchetti, L., & Sierra, J. (2003). Bankruptcy risk and productive efficiency in manufacturing firms. Journal of Banking and Finance, 27(11), 2099-2120.
Le, N. S. V. (2013). Investment decisions and bankruptcy risks of companies listed on the Vietnamese stock market. Master's thesis in economics, University of Economics, Ho Chi Minh City.
Nguyen, T. T. L. (2019). Factors affecting bankruptcy risk of listed companies in the construction industry in Vietnam. Journal of Banking Science & Training, No. 205.
Than dinh Tin dung Blog (2018). Overview of Credit Rating Agency (CRA) in the world. Available from
Vo, H. D., and Nguyen, D. T. (2013a). Credit rating for listed companies in Vietnam using fuzzy theory. Journal of Economic Development, No. 269.

APPENDIX

Table 1: Financial Statement Ratios used in RiskCalc Japan

Category            Ratio
PROFITABILITY       Gross Profit to Total Assets
                    Previous Year's Net Income to Previous Year's Net Sales
LEVERAGE            Total Liabilities less Cash to Total Assets
                    Retained Earnings to Total Liabilities
LIQUIDITY           Cash to Total Assets
ACTIVITY            Trade Receivables to Net Sales
                    Inventory to Net Sales
INTEREST COVERAGE   EBITDA to Interest Expense
GROWTH              Sales Growth
SIZE                Real Net Sales

Source: Risk, F.C. (2004). Moody's KMV RiskCalc™ v3 model

Table 2: The input variables selected by Gupta, Jairaj, Andros Gregoriou, and Tahera Ebrahimi (2018)

Short term debt/equity book value
[Leverage]
Total liabilities/tangible total assets
Total liabilities/net worth
Capital employed/total liabilities
Cash and short-term investments/total assets
Current Ratio; current assets/current liabilities
[Liquidity]
Quick Ratio; (current assets – stocks prepayments)/current liabilities
Cash Ratio; (cash + bank + marketable securities)/current liabilities
[Working Capital]
Financial expenses/total assets
Financial expenses/sales
[Financing]
Retained earnings/total assets
Earnings before interest taxes depreciation and amortization/interest expense
Earnings before interest taxes depreciation and amortization/total assets
[Profitability]
Operating profit/capital employed
Return on equity; Net profit/equity
Net income/sales
Operating profit/net income
Stock holding period; (stock × 365)/sales
Debtor collection period; (trade debtors × 365)/sales
[Activity]
Trade creditors payment period; (trade creditors × 365)/sales
Working capital/total assets
Working capital/sales
Sales/tangible assets
Capital growth
[Growth]
Sales growth
Earnings growth
[Other Control]
Income taxes/total assets
Micro
Small
Risk

Source: Gupta, Jairaj, Andros Gregoriou, and Tahera Ebrahimi (2018)

Table 3: The financial indicators of the SMEs sample

First rows

Financial Indicator        Company
Log EBITDA                 4.13    4.07    3.99    4.37    4.54    4.52
Log Net Worth              5.12    5.13    5.13    5.17    4.65    4.61
EBITDA/Net Worth           0.1006  0.0884  0.0722  0.1587  0.7741  0.8030
Log Working Capital        4.31    -4.20   4.53    4.84    -3.64   -4.01
Log Cash and Equivalents   3.68    3.55    4.71    4.34    4.31    4.24
EBITDA to IE               1.43    1.36    1.79    4.02    1.97    1.69
Debt to Net Worth          1.74    1.84    2.36    2.58    10.09   12.54
Debt to EBITDA             7.17    3.51    4.78    16.25   3.91    4.56
Current Ratio              1.09    0.94    1.11    1.18    0.99    0.98
Quick Ratio                0.77    0.60    0.76    0.92    0.32    0.36

…

Financial Indicator        Company
Log EBITDA                 4.43    4.44    4.54    4.64
Log Net Worth              4.70    4.70    5.55    5.55
EBITDA/Net Worth           0.5458  0.5405  0.0988  0.1233
Log Working Capital        3.84    3.71    5.37    5.85
Log Cash and Equivalents   3.71    3.21    4.29    4.71
EBITDA to IE               2.04    1.56    3.40    5.05
Debt to Net Worth          8.74    8.53    1.71    4.10
Debt to EBITDA             4.03    15.78   4.47    2.06
Current Ratio              1.02    1.01    1.39    1.71
Quick Ratio                0.43    0.36    0.70    0.42

Last rows

Financial Indicator        Company
                           394     395     396     397     398     399
Log EBITDA                 -1.87   5.01    4.51    4.88    4.47    4.55
Log Net Worth              5.25    5.87    5.24    5.28    5.32    5.35
EBITDA/Net Worth           0.0004  0.1381  0.1834  0.3919  0.1436  0.1572
Log Working Capital        4.59    5.92    4.70    4.96    4.90    4.73
Log Cash and Equivalents   4.88    4.86    4.26    5.10    4.95    4.72
EBITDA to IE               0.00    115.84  3.87    22.42   20.95   20.42
Debt to Net Worth          0.33    0.38    3.26    1.14    0.93    1.57
Debt to EBITDA             -4.15   2.77    2.19    0.18    0.83    9.98
Current Ratio              1.74    5.75    1.25    1.49    1.53    1.26
Quick Ratio                1.63    2.26    0.92    1.38    1.38    1.19

Source: The author's statistics

Figure 1: Six steps with the code to run the Logistic Regression model
First, import the collected data with 14 variables, as follows:
Step 1: Separating the explanatory variables from the target variable
Step 2: Splitting the data into training and test sets
Step 3: Setting up the Logistic Regression model
Step 4: Carrying out training
Step 5: Making a prediction on the test data set
Step 6: Evaluating the effectiveness of the model
Source: The author's statistics

Figure 2: The usage code to draw the Decision Tree model
Source: The author's statistics

Figure 3: The usage code to run the Confusion Matrix of the Decision Tree model
Source: The author's statistics

Figure 4: The usage code to run the Confusion Matrix of the Random Forest model
Source: The author's statistics

Figure 5: The usage code to run the Confusion Matrix of the Ensemble Learning model
Source: The author's statistics

... Therefore, the thesis focuses on the issue of "Application of Machine Learning for predicting the probability of default of Small and Medium Enterprises (SMEs)" to provide theoretical basis and ...
... choice of model to predict the default probability of small and medium enterprises?
ii. How Machine Learning approaches greatly influence predicting the PD of SMEs and which model of Machine Learning ...
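The code pictured in Figure 1 of the appendix is not reproduced in this text. As a rough illustration only, the six steps listed there could look like the Python sketch below. It is not the author's code: the data are synthetic, the two features (current ratio and debt to net worth) and every function name (`make_synthetic_firms`, `standardize`, `predict_proba`, `fit_logistic`) are hypothetical, and the model is fitted with plain batch gradient descent from the standard library rather than with a machine learning package. Feature scaling is an extra assumption folded into Step 3.

```python
import math
import random

def make_synthetic_firms(n=400, seed=0):
    """Hypothetical stand-in for the thesis data: two ratios plus a default flag."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        default = 1 if rng.random() < 0.3 else 0
        # Assumption: defaulters have lower liquidity and higher leverage.
        current_ratio = rng.gauss(0.8 if default else 1.6, 0.25)
        debt_to_net_worth = rng.gauss(6.0 if default else 2.0, 1.0)
        rows.append((current_ratio, debt_to_net_worth, default))
    return rows

def standardize(X, means, stds):
    """Scale each column to zero mean and unit variance using given statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def predict_proba(w, b, x):
    """Logistic function applied to the linear score (clamped for stability)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient descent on the average logistic log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [predict_proba(w, b, xi) - yi for xi, yi in zip(X, y)]
        w = [wj - lr * sum(e * xi[j] for e, xi in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

rows = make_synthetic_firms()

# Step 1: separate the explanatory variables (X) from the target (y).
X = [list(r[:2]) for r in rows]
y = [r[2] for r in rows]

# Step 2: split the data into training (70%) and test (30%) sets.
idx = list(range(len(rows)))
random.Random(1).shuffle(idx)
cut = len(idx) * 7 // 10
train_idx, test_idx = idx[:cut], idx[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Step 3: set up the model; features are scaled with training statistics only.
means = [sum(col) / len(col) for col in zip(*X_train)]
stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(zip(*X_train), means)]
X_train_s = standardize(X_train, means, stds)
X_test_s = standardize(X_test, means, stds)

# Step 4: carry out training.
w, b = fit_logistic(X_train_s, y_train)

# Step 5: predict on the test set with a 0.5 probability threshold.
y_pred = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X_test_s]

# Step 6: evaluate the model on the held-out data.
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.3f}")
```

With real data, the synthetic generator would be replaced by the 13 financial indicators and the default flag from the collected sample. Accuracy on such easily separated synthetic data says nothing about performance on actual SME financial statements.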
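Figures 3 to 5 of the appendix refer to code that computes a confusion matrix for each model, and the thesis also uses the F1-score for evaluation. The following minimal Python sketch shows how both quantities follow from a list of true and predicted labels. The labels are made up for illustration, and the helper names (`confusion_counts`, `precision_recall_f1`) are hypothetical, not taken from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN), treating `positive` as the default class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = default, 0 = non-default.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print("            predicted 1  predicted 0")
print(f"actual 1    {tp:11d}  {fn:11d}")
print(f"actual 0    {fp:11d}  {tn:11d}")
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because defaults are usually the minority class, the F1-score of the default class can be more informative than raw accuracy: a model that never predicts default would score high accuracy but an F1 of zero.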