Limitations and future research directions

This study built a methodology and identified the factors that affect the default risk of individual customers at Vietnam International Commercial Joint Stock Bank, using random forest, decision tree and logistic regression methods. However, the sample of 2,560 observations is not large, so it may not support a comprehensive analysis for the bank as a whole.

The data set covers 2,560 customers over the period 2019-2020. A sample of this size cannot capture the characteristics specific to individual bank branches, to regions such as provinces and localities, or to the business types of individual customers. Access to individual customer data at each bank was a barrier for the author, so the study could only analyze the limited data that could be obtained. Given these limitations, the author also recommends that future studies could

CHAPTER 5. CONCLUSION

Default by individual customers impairs a bank's operations, so banks continually work to address this risk and reduce it as far as possible. Banks adopt credit policies aimed at lowering the default rate on loans: lending plans, application screening mechanisms, collateral, third-party credit guarantees and credit ratings are all used to control the risk of lending to individual customers.

Individual customers are a particularly difficult group to manage: information about them is relatively confidential, the reliability of information sources is hard to assess, and the information changes frequently. This is especially true in a business environment that still lacks transparent economic and financial information, as in the Vietnamese market. These conditions strongly affect the risk banks face when lending to individual customers.

Applying machine learning to forecast customer default is therefore one way for banks to reduce the risk associated with individual customers. The research on the topic “Applying machine learning to default prediction at Vietnam International Commercial Joint Stock Bank” achieved the following results of scientific and practical value:

First, the study systematized the basic theory of banking and bank credit and gave an overview of machine learning. Drawing on previous studies, it also presented the factors that affect the default risk of individual customers, together with the methods and credit rating models for individual customers built on those factors.

Second, the study built a research model for assessing the default risk of individual customers using 12 independent variables, which describe customer characteristics and loan attributes, and 1 dependent variable.
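As an illustration for the logistic regression method, a model with these 12 independent variables can be written in the standard logit form below. This is a generic sketch, not the thesis's fitted model: p denotes the probability that a customer defaults given X_1 to X_12, and the coefficients β_0 to β_12 are notation introduced here; the estimated coefficients are not reproduced in this excerpt.

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{j=1}^{12} \beta_j X_j,
\qquad
p = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{j=1}^{12} \beta_j X_j\right)\right)}
```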

Third, based on previous studies of the variables that affect individual customers' default risk and on its own data analysis, the study identified (using the random forest method) the factors that affect customer default: age, gender, marital status, source of income, income, expenses, loan type, loan amount, collateral value, loan term, total accumulated assets, and outstanding debt at other credit institutions.

Fourth, in comparing three methods for estimating the default risk of individual customers at Vietnam International Commercial Joint Stock Bank, the study shows that the random forest method predicts well, with a predictive ability above 76%. The logistic regression and decision tree methods
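Comparisons of this kind rest on standard classification metrics computed on held-out data. The base-R sketch below shows one way to compute accuracy, sensitivity and specificity from a confusion matrix, plus a rank-based AUC; the function names `classification_metrics` and `auc_rank` are hypothetical, the coding 1 = default, 0 = non-default is an assumption, and the excerpt does not state which metric the 76% figure refers to.

```r
# Hypothetical helper: metrics from actual and predicted class labels.
# Assumes labels are coded 1 = default, 0 = non-default (an assumption).
classification_metrics <- function(actual, predicted) {
  # Fix both factors to levels "0", "1" so the table is always 2 x 2
  actual    <- factor(actual, levels = c(0, 1))
  predicted <- factor(predicted, levels = c(0, 1))
  cm <- table(Actual = actual, Predicted = predicted)
  tp <- cm["1", "1"]; tn <- cm["0", "0"]
  fp <- cm["0", "1"]; fn <- cm["1", "0"]
  c(accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),   # share of defaulters correctly flagged
    specificity = tn / (tn + fp))   # share of non-defaulters correctly cleared
}

# Rank-based AUC (Mann-Whitney statistic): the probability that a randomly
# chosen defaulter gets a higher score than a randomly chosen non-defaulter.
auc_rank <- function(actual, score) {
  r <- rank(score)                  # tied scores receive average ranks
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Usage with made-up labels (illustration only, not thesis data)
actual    <- c(1, 0, 1, 1, 0, 0)
predicted <- c(1, 0, 0, 1, 1, 0)
print(classification_metrics(actual, predicted))
```

Sensitivity and specificity are reported alongside accuracy because, when defaulters are a minority, a model can reach high accuracy while missing most of them.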

REFERENCES

Vietnamese-language references

Bình, Đ. T. (2019). Xây dựng mô hình chấm điểm tín dụng khách hàng cá nhân vay tiêu dùng tại Việt Nam [Building a credit scoring model for individual consumer-loan customers in Vietnam]. Available at https://hotroontap.com/wp-content/uploads/2019/07/X%C3%82Y-D%E1%BB%B0NG-M%C3%94-H%C3%8CNH-CH%E1%BA%A4M-%C4%90I%E1%BB%82M-T%C3%8DN-D%E1%BB%A4NG-KH%C3%81CH-H%C3%80NG-C%C3%81-NH%C3%82N-VAY-TI%C3%8AU-D%C3%99NG-T%E1%BA%A0I-VI%E1%BB%86T-NAM.pdf.

Lan, N. T., Nhâm, Đ. T., Châu, N. M., & Hỗ, L. V. (2018). Ứng dụng một số phương pháp xây dựng hàm phân loại trong cảnh báo sớm nguy cơ vỡ nợ của các ngân hàng thương mại cổ phần Việt Nam [Applying several methods for constructing classification functions to early warning of default risk at Vietnamese joint stock commercial banks]. Available at http://tapchi.vnua.edu.vn/wp-content/uploads/2019/01/T%E1%BA%A1p-ch%C3%AD-s%E1%BB%91-7.74-82.pdf.

English-language references

Abid, L., Masmoudi, A., & Zouari-Ghorbel, S. (2018). The consumer loan’s payment default predictive model: an application of the logistic regression and the discriminant analysis in a Tunisian commercial bank. Journal of the Knowledge Economy, 9(3), 948-962. Available at https://doi.org/10.1007/s13132-016-0382-8.

Akindaini, B. (2017). Machine learning applications in mortgage default prediction (Master's thesis). Available at http://urn.fi/URN:NBN:fi:uta-201712122923.

Altman, E. I., Hotchkiss, E., & Wang, W. (2019). Corporate financial distress, restructuring, and bankruptcy: analyze leveraged finance, distressed debt, and bankruptcy. John Wiley & Sons.

Awad, M., & Khanna, R. (2015). Machine learning in action: examples. In Efficient Learning Machines (pp. 209-240). Apress, Berkeley, CA. Available at https://doi.org/10.1007/978-1-4302-5990-9_11.

Bacham, D., & Zhao, J. (2017). Machine learning: challenges, lessons, and opportunities in credit risk modeling. Moody’s Analytics Risk Perspectives, 9, 30-35.

Basel II USA - Definition of Default. Wholesale Default, Retail default. (2021). Basel-II-Association.com. Available at https://www.basel-ii-

Berger, A. N., & Humphrey, D. B. (1997). Efficiency of financial institutions: International survey and directions for future research. European journal of operational research, 98(2), 175-212. Available at https://doi.org/10.1016/s0377-2217(96)00342-6.

Bloem, A. M., & Gorter, C. N. (2001). The treatment of nonperforming loans in macroeconomic statistics. IMF Working Papers, 2001(209). Available at https://doi.org/10.5089/9781451874754.001.

Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32. Available at https://doi.org/10.1023/a:1010933404324.

Carroll, R. J., & Pederson, S. (1993). On robustness in the logistic regression model. Journal of the Royal Statistical Society: Series B (Methodological), 55(3), 693-706. Available at https://doi.org/10.1111/j.2517-6161.1993.tb01934.x.

Carter, J. R. (2007). An empirical note on economic freedom and income inequality. Public Choice, 130(1-2), 163-177. Available at https://doi.org/10.1007/s11127-006-9078-0.

Cooper, G. F., Aliferis, C. F., Ambrosino, R., Aronis, J., Buchanan, B. G., Caruana, R., ... & Spirtes, P. (1997). An evaluation of machine-learning methods for predicting pneumonia mortality. Artificial intelligence in medicine, 9(2), 107-138. Available at https://doi.org/10.1016/s0933-3657(96)00367-3.

Cox, D. R. (1958). Two further applications of a model for binary regression. Biometrika, 45(3/4), 562-565. Available at https://doi.org/10.2307/2333203.

Cutler, A., Cutler, D. R., & Stevens, J. R. (2012). Random forests. In Ensemble machine learning (pp. 157-175). Springer, Boston, MA. Available at https://doi.org/10.1007/978-1-4419-9326-7_5.

Dastile, X., Celik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring: A systematic literature survey. Applied Soft Computing, 91, 106263. Available at https://doi.org/10.1016/j.asoc.2020.106263.

De Castro Vieira, J. R., Barboza, F., Sobreiro, V. A., & Kimura, H. (2019). Machine learning models for credit analysis improvements: predicting low-income families’ default. Applied Soft Computing, 83, 105640. Available at https://doi.org/10.1016/j.asoc.2019.105640.

DeMaris, A., & Selman, S. H. (2013). Logistic regression. In Converting Data into Evidence (pp. 115-136). Springer, New York, NY. Available at https://doi.org/10.1007/978-1-4614-7792-1_7.

Donges, N. (2019). A complete guide to the random forest algorithm. Built In, 16.

Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589-607. Available at https://doi.org/10.2307/2978933.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern recognition letters, 27(8), 861-874. Available at https://doi.org/10.1016/j.patrec.2005.10.010.

Fofack, H. (2005). Nonperforming loans in Sub-Saharan Africa.

Galindo, J., & Tamayo, P. (2000). Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications. Computational Economics, 15(1), 107-143. Available at https://doi.org/10.1023/A:1008699112516.

Grennepois, N., Alvirescu, M. A., & Bombail, M. (2018). Using Random Forest for credit risk models. Deloitte Risk Advisory.

Jacobson, T., & Roszbach, K. (2003). Bank lending policy, credit scoring and value-at-risk. Journal of banking & finance, 27(4), 615-633. Available at https://doi.org/10.1016/S0378-4266(01)00254-0.

Jaquette, O., & Hillman, N. W. (2015). Paying for default: Change over time in the share of federal financial aid sent to institutions with high student loan default rates. Journal of Student Financial Aid, 45(1), 2. Available at https://ir.library.louisville.edu/jsfa/vol45/iss1/2/.

Karim, M. Z., Chan, S. G., & Hassan, S. (2010). Bank efficiency and non-performing loans: Evidence from Malaysia and Singapore. Prague Economic Papers, 2(1). Available at https://www.academia.edu/728779/Bank_Efficiency_And_Non_Performing_Loans_Evidence_From_Malaysia_And_Singapore?auto=citations&from=cover_page.

Keenan, S. (1999). Historical default rates of corporate bond issuers, 1920-1998. Moody’s Investors Service, Global Credit Research, January 1999.

Khandani, A. E., Kim, A. J., & Lo, A. W. (2010). Consumer credit-risk models via machine-learning algorithms. Journal of Banking & Finance, 34(11), 2767-2787. Available at https://doi.org/10.1016/j.jbankfin.2010.06.001.

Kim, D. S., & Shin, S. (2021). The economic explainability of machine learning and standard econometric models-an application to the US mortgage default risk. International Journal of Strategic Property Management, 25(5), 396-412. Available at https://doi.org/10.3846/ijspm.2021.15129.

Kocenda, E., & Vojtek, M. (2011). Default predictors in retail credit scoring: Evidence from Czech banking data. Emerging Markets Finance and Trade, 47(6), 80-98. Available at https://doi.org/10.2753/REE1540-496X470605.

Lalkhen, A. G., & McCluskey, A. (2008). Clinical tests: sensitivity and specificity. Continuing education in anaesthesia critical care & pain, 8(6), 221-223. Available at https://doi.org/10.1093/bjaceaccp/mkn041.

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R news, 2(3), 18-22. Available at https://cogns.northwestern.edu/cbmg/LiawAndWiener2002.pdf.

Lou, K. R., & Wang, W. C. (2013). Optimal trade credit and order quantity when trade credit impacts on both demand rate and default risk. Journal of the Operational Research Society, 64(10), 1551-1556. Available at https://doi.org/10.1057/jors.2012.134.

Maalouf, M. (2011). Logistic regression in data analysis: an overview. International Journal of Data Analysis Techniques and Strategies, 3(3), 281-299. Available at https://doi.org/10.1504/IJDATS.2011.041335.

Makowski, P. (1985). Credit scoring branches out. The Credit World, 75(1), 30-37.

Mensah, C., Raphael, G., Dorcas, O., & Kwadwo, B. Y. (2013). The relationship between loan default and repayment schedule in microfinance institutions in Ghana: A case study of Sinapi Aba Trust. Research Journal of Finance and Accounting, 4(19), 165-75. Available at https://core.ac.uk/download/pdf/234629751.pdf.

Moffatt, P. G. (2005). Hurdle models of loan default. Journal of the operational research society, 56(9), 1063-1071. Available at https://doi.org/10.1057/palgrave.jors.2601922.

Neema, S., & Soibam, B. (2017). The comparison of machine learning methods to achieve most cost-effective prediction for credit card default. Journal of Management Science and Business Intelligence, 2(2), 36-41. Available at https://doi.org/10.5281/zenodo.851527.

Ojiako, I. A., & Ogbukwa, B. C. (2012). Economic analysis of loan repayment capacity of smallholder cooperative farmers in Yewa North Local Government Area of Ogun State, Nigeria. African Journal of Agricultural Research, 7(13), 2051-2062. Available at https://doi.org/10.5897/AJAR11.1302.

Oni, O. A., Oladele, O. I., & Oyewole, I. K. (2005). Analysis of factors influencing loan default among poultry farmers in Ogun State Nigeria. Journal of Central European Agriculture, 6(4), 619-624. Available at https://hrcak.srce.hr/17331.

Paleologo, G., Elisseeff, A., & Antonini, G. (2010). Subagging for credit scoring models. European journal of operational research, 201(2), 490-499. Available at https://doi.org/10.1016/j.ejor.2009.03.008.

Petropoulos, A., Siakoulis, V., Stavroulakis, E., & Vlachogiannakis, N. E. (2020). Predicting bank insolvencies using machine learning techniques. International Journal of Forecasting, 36(3), 1092-1113. Available at https://doi.org/10.1016/j.ijforecast.2019.11.005.

Pregibon, D. (1981). Logistic regression diagnostics. The annals of statistics, 9(4), 705-724. Available at https://doi.org/10.1214/aos/1176345513.

Quinlan, J. R. (1986). Induction of decision trees. Machine learning, 1(1), 81-106. Available at https://doi.org/10.1007/bf00116251.

Rokach, L., & Maimon, O. (2005). Decision trees. In Data mining and knowledge discovery handbook (pp. 165-192). Springer, Boston, MA. Available at https://doi.org/10.1007/0-387-25465-x_9.

Rymarczyk, T., Kozlowski, E., Klosowski, G., & Niderla, K. (2019). Logistic regression for machine learning in process tomography. Sensors, 19(15), 3400. Available at https://doi.org/10.3390/s19153400.

Saltelli, A. (2002). Sensitivity analysis for importance assessment. Risk analysis, 22(3), 579-590. Available at https://doi.org/10.1111/0272-4332.00040.

Schuermann, T. (2004). What do we know about loss given default?. Available at https://doi.org/10.2139/ssrn.525702.

Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press. Available at http://103.47.12.35/bitstream/handle/1/1069/understanding-machine-learning-theory-algorithms.pdf?sequence=1&isAllowed=y.

Son, Y., Byun, H., & Lee, J. (2016). Nonparametric machine learning models for predicting the credit default swaps: An empirical study. Expert Systems with Applications, 58, 210-220. Available at https://doi.org/10.1016/j.eswa.2019.05.028.

Soofi, A. A., & Awan, A. (2017). Classification techniques in machine learning: applications and issues. Journal of Basic and Applied Sciences, 13, 459-465. Available at https://pdfs.semanticscholar.org/2678/e213cec548d278879ceaf01582ee8913cc3f.pdf.

Srinivasan, V., & Kim, Y. H. (1987). Credit granting: A comparative analysis of classification procedures. The Journal of Finance, 42(3), 665-681. Available at https://doi.org/10.1111/j.1540-6261.1987.tb04576.x.

Steenackers, A., & Goovaerts, M. (1989). A credit scoring model for personal loans. Insurance: Mathematics & Economics, 8(1), 31-34. Available at https://doi.org/10.1016/0167-6687(89)90044-9.

Stepanova, M., & Thomas, L. C. (2001). PHAB scores: proportional hazards analysis behavioural scores. Journal of the Operational Research Society, 52(9), 1007-1016. Available at https://doi.org/10.1057/palgrave.jors.2601189.

Svetnik, V., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., & Feuston, B. P. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of chemical information and computer sciences, 43(6), 1947-1958. Available at https://pubs.acs.org/doi/abs/10.1021/ci034160g.

Sy, W. N. (2007). A causal framework for credit default theory. Australian Prudential Regulation Authority Working Paper. Available at https://doi.org/10.2139/ssrn.2389605.

Tam, K. Y., & Kiang, M. Y. (1992). Managerial applications of neural networks: the case of bank failure predictions. Management science, 38(7), 926-947. Available at https://doi.org/10.1287/mnsc.38.7.926.

Thomas, M., Sing, H., Belenky, G., Holcomb, H., Mayberg, H., Dannals, R., ... & Redmond, D. (2000). Neural basis of alertness and cognitive performance impairments during sleepiness. I. Effects of 24 h of sleep deprivation on waking human regional brain activity. Journal of sleep research, 9(4), 335-352. Available at https://doi.org/10.1046/j.1365-2869.2000.00225.x.

Tiwari, A. K. (2018). Machine learning application in loan default prediction. Machine Learning, 4(5). Available at https://media.neliti.com/media/publications/342392-machine-learning-application-in-loan-def-72ea0cf3.pdf.

Townsend, J. T. (1971). Theoretical analysis of an alphabetic confusion matrix. Perception & Psychophysics, 9(1), 40-50. Available at https://doi.org/10.3758/BF03213026.

Twala, B. (2010). Multiple classifier application to credit risk assessment. Expert systems with applications, 37(4), 3326-3336. Available at https://doi.org/10.1016/j.eswa.2009.10.018.

Van Gestel, T., Baesens, B., Suykens, J., Espinoza, M., Baestaens, D. E., Vanthienen, J., & De Moor, B. (2003, March). Bankruptcy prediction with least squares support vector machine classifiers. In 2003 IEEE International Conference on Computational Intelligence for Financial Engineering, 2003. Proceedings. (pp. 1-8). IEEE. Available at https://doi.org/10.1109/cifer.2003.1196234.

Van Liebergen, B. (2017). Machine learning: a revolution in risk management and compliance?. Journal of Financial Transformation, 45, 60-67. Available at https://ideas.repec.org/a/ris/jofitr/1592.html.

Wagner, H. (2017). Default definition under Basel. Intelligent credit scoring: Building and implementing better credit risk scorecards, 119-130. Available at https://doi.org/10.1002/9781119282396.ch7.

Ward, T. J., & Foster, B. P. (1997). A note on selecting a response measure for financial distress. Journal of Business Finance & Accounting, 24(6), 869-879. Available at https://doi.org/10.1111/1468-5957.00138.

Widiastuti, J. (2018). Klasifikasi pembiayaan warung mikro menggunakan metode random forest dengan teknik sampling kelas imbalanced (Studi kasus: Data nasabah pembiayaan warung mikro Bank Syariah Mandiri KC Jambi) [Classification of Warung Mikro financing using the random forest method with imbalanced-class sampling techniques (Case study: Warung Mikro financing customer data, Bank Syariah Mandiri, Jambi branch)]. Available at https://dspace.uii.ac.id/handle/123456789/7690.

Zuech, R., Hancock, J., & Khoshgoftaar, T. M. (2021). Detecting web attacks using random undersampling and ensemble learners. Journal of Big Data, 8(1), 1-20. Available at https://doi.org/10.1186/s40537-021-00460-8.

APPENDIX - RESULTS OF RUNNING THE MODELS

# Libraries
library(caret)
library(caTools)
library(rpart)
library(rpart.plot)
library(tidyverse)
library(MLmetrics)
library(randomForest)
library(skimr)

# Relabel the levels of the target factor Y in DATA ("Có" = Yes, "Không" = No)
levels(DATA$Y)[levels(DATA$Y) == "0"] <- "Có"
levels(DATA$Y)[levels(DATA$Y) == "1"] <- "Không"
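The appendix loads caTools and caret, which are commonly used for train/test splitting and evaluation, but this excerpt does not show the split, cut-off or settings the thesis actually used. As a minimal sketch under stated assumptions, a hold-out workflow for logistic regression can be written with base R alone (`sample`, `glm`, `predict`). The data are simulated, and the variable names `age`, `income`, `loan_amount`, the data-generating coefficients, the 70/30 split and the 0.5 cut-off are all hypothetical, not the thesis's variables or results.

```r
# Minimal hold-out evaluation sketch for logistic regression (base R only).
# SYNTHETIC data: hypothetical variables, not the thesis's 12 variables.
set.seed(123)
n <- 2560                                   # same size as the thesis sample
sim <- data.frame(
  age         = round(runif(n, 22, 65)),
  income      = rlnorm(n, meanlog = 2.5, sdlog = 0.5),  # hypothetical units
  loan_amount = rlnorm(n, meanlog = 5, sdlog = 0.7)     # hypothetical units
)
# Assumed data-generating process for the default indicator (1 = default)
eta <- -1 + 0.03 * (sim$age - 40) - 0.08 * sim$income + 0.002 * sim$loan_amount
sim$default <- rbinom(n, size = 1, prob = plogis(eta))

# Hypothetical 70/30 random split into training and test sets
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit the logit model on the training set only
fit <- glm(default ~ age + income + loan_amount, data = train,
           family = binomial)

# Predicted default probabilities on the test set, then a 0.5 cut-off
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob >= 0.5, 1, 0)
accuracy <- mean(pred == test$default)

print(summary(fit)$coefficients)
print(accuracy)
```

The same `train` and `test` objects could be reused for the decision tree and random forest models loaded above; for classification, those packages generally expect the target as a factor rather than a 0/1 numeric column.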
