Risk Analysis in Microfinance Using Machine Learning and Potential Integration with Artificial Intelligence Agent

Diego Arriola León; Mohsen Ghodrat

doi:10.21678/jb.2026.2798

Authors

Diego Arriola León, Pontificia Universidad Católica del Perú
Mohsen Ghodrat, University Canada West


Keywords:

Microfinance, credit risk, payment default, small businesses, machine learning, predictive modeling, artificial intelligence agents.

Abstract

This study proposes a comprehensive approach for the early detection of default risk in microfinance portfolios, combining machine learning techniques with historical analysis of clients’ payment behavior. It uses a database of more than 50,000 microcredits granted between 2019 and 2021 by a microfinance institution in Huancayo, Peru. From these records, a risk indicator is constructed as the proportion of days in arrears relative to the agreed payment frequency, with a critical threshold of 25% of the installment period. This criterion separates clients with a higher propensity to default from those with only minor delays, so minor delays are not penalized and the analysis becomes more accurate.
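The risk indicator described above can be illustrated with a minimal sketch. The abstract does not say how arrears are aggregated across installments or whether the 25% boundary is inclusive, so the code below works per installment and assumes a strict "greater than" comparison. The names `Installment`, `arrears_ratio`, and `is_high_risk` are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: 25% of the installment period.
ARREARS_THRESHOLD = 0.25


@dataclass
class Installment:
    days_in_arrears: int  # days the payment was late (0 if paid on time)
    period_days: int      # agreed payment frequency in days, e.g. 7 or 30


def arrears_ratio(inst: Installment) -> float:
    """Days in arrears as a proportion of the agreed payment period."""
    if inst.period_days <= 0:
        raise ValueError("period_days must be positive")
    # Early payments (negative arrears) are treated as zero delay.
    return max(inst.days_in_arrears, 0) / inst.period_days


def is_high_risk(inst: Installment, threshold: float = ARREARS_THRESHOLD) -> bool:
    """Assumption: a delay is critical only when it strictly exceeds the threshold."""
    return arrears_ratio(inst) > threshold


weekly = Installment(days_in_arrears=2, period_days=7)    # ratio is 2/7
monthly = Installment(days_in_arrears=5, period_days=30)  # ratio is 5/30
print(is_high_risk(weekly), is_high_risk(monthly))
```

Under these assumptions, a two-day delay on a weekly installment is flagged, while a five-day delay on a monthly installment is not, which reflects the idea of scaling the delay by the payment frequency.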

The study focuses on microenterprises and informal entrepreneurs, traditionally excluded from formal banking. It provides predictive tools adapted to segments with limited credit history, fostering financial inclusion and strengthening risk management in microfinance institutions.

Four predictive models were evaluated, each representing a main family of supervised learning: Gradient Boosting Machine (GBM) for boosting, Bayesian Additive Regression Trees (BART) for Bayesian ensembles, Random Forest (RF) for bagging, and Support Vector Machines (SVM) as optimal margin classifiers. This selection makes it possible to contrast methodologies and identify the most suitable approach for the microfinance context.

Supervised learning is appropriate because the data carry historical labels of default and non-default, so the resulting predictions apply directly to credit decision-making. Performance was assessed using Cohen’s Kappa, Geometric Mean, and F1-score. Results show that GBM delivers the most consistent performance across metrics, BART achieves the best F1-score, and SVM attains the highest Geometric Mean. These findings support the effectiveness of supervised learning in segmenting credit risk and optimizing operational management, and they lay the foundation for incorporating artificial intelligence agents to monitor payments in real time and reduce losses from default.
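For readers unfamiliar with the three metrics, the following sketch computes them from a binary confusion matrix using their standard definitions. The counts in the usage example are invented for illustration and are not results from the study. The function names are hypothetical.

```python
import math


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement between predictions and labels, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present
    return (observed - expected) / (1.0 - expected)


def geometric_mean(tp: int, fp: int, fn: int, tn: int) -> float:
    """Square root of sensitivity times specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (default) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative counts only (default = positive class).
tp, fp, fn, tn = 40, 10, 20, 130
print(round(cohen_kappa(tp, fp, fn, tn), 3))     # 0.625
print(round(geometric_mean(tp, fp, fn, tn), 3))  # 0.787
print(round(f1_score(tp, fp, fn), 3))            # 0.727
```

All three metrics are less sensitive to class imbalance than plain accuracy, which matters when defaults are a minority of the portfolio.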


