A. U. Farouk, A. Abdulkadir, A. Y. Abarshi


The study evaluated the performance of five classification models for predicting loan approval. A new feature selection method was applied to a loan dataset obtained online from Kaggle to remove redundant features that can slow the algorithms, and to improve the performance of the models. The dataset consists of 12 variables and 689 instances. All analyses in this work were implemented in R version 4.1.1. The evaluation showed that the new variable selection technique, apart from removing redundant features, also improves the performance of the models. Credit history proved to be the best predictor of loan approval. The three performance evaluation metrics used unanimously showed that the Naïve Bayes model outperformed the logistic regression, decision tree, support vector machine and random forest algorithms. The overall accuracy and AUC of Naïve Bayes with 6 predictors are 83.2% and 79.2% respectively. Logistic regression with 6 predictors came second, with an overall accuracy and AUC of 81.6% and 73.7% respectively. Although random forest with 9 predictors achieved a higher overall accuracy than logistic regression, logistic regression was chosen as second best because of the smaller number of features in its model and its higher AUC. Based on the evaluation metrics used, the study concluded that Naïve Bayes is the best algorithm for predicting loan approval. It is recommended that the new feature selection method be compared with other classical methods to validate its performance, and that classification algorithms not used in this work be compared with Naïve Bayes to substantiate the claim.

Keywords: Features, Machine learning, Loan, Prediction, Preprocessing
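The abstract ranks the models by overall accuracy and AUC. As a minimal illustration of how those two metrics are computed from class labels and predicted scores — a sketch, not the authors' R code, which is not shown on this page — the following plain-Python functions implement accuracy and the rank-based (Mann-Whitney) formulation of AUC:

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one;
    tied scores are handled with average ranks.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the group of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, y_true) if y == 1]
    n_pos = len(pos_ranks)
    n_neg = n - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical scores from some classifier on four held-out loans
# (1 = approved, 0 = rejected); thresholding at 0.5 gives the labels.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred))  # 0.75
print(auc(y_true, scores))       # 0.75
```

Reporting both metrics, as the paper does, guards against a model that scores well on accuracy alone by favouring the majority class: AUC depends only on how the scores rank approved versus rejected applications, not on any single decision threshold.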





© IJSAR 2016. All rights reserved