Default Prediction for Real Estate Companies with Imbalanced Dataset

Yuan-Xiang Dong, Zhi Xiao and Xue Xiao
Volume: 10, No: 2, Page: 314 ~ 333, Year: 2014
Keywords: Default prediction, Imbalanced dataset, Real estate listed companies, Minority sample generation approach

When predicting default in real estate companies, the number of non-defaulted cases far exceeds the number of defaulted ones, which creates a two-class imbalance problem. This imbalance weakens the ability of prediction models to identify the default samples. To avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation approach to create new minority samples. Logistic regression, support vector machine (SVM) classification, and neural network (NN) classification applied to the imbalanced dataset serve as benchmarks against the same single prediction models applied to a balanced dataset corrected by the minority sample generation approach. Instead of prediction-oriented tests and overall accuracy, the true positive rate (TPR), the true negative rate (TNR), G-mean, and F-score are used to measure the performance of default prediction models on imbalanced datasets. We describe an empirical experiment that used a sample of 14 default and 315 non-default listed real estate companies in China and report that, in most cases, single prediction models using the balanced dataset produced better results than those using the imbalanced dataset.
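To illustrate why overall accuracy can mislead on such data, the following minimal sketch computes the four measures named above from confusion-matrix counts. It is not the authors' code. The function name `imbalance_metrics` is hypothetical, defaulted companies are assumed to be the positive class, and the F-score is assumed to be the balanced F1 form (the paper may use a weighted variant). The example classifier that labels every company as non-default is likewise an assumption, applied to the class sizes reported in the abstract.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute accuracy, TPR, TNR, G-mean, and F-score from confusion counts.

    Assumption: defaulted companies are the positive (minority) class.
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall on defaults
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # recall on non-defaults
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(tpr * tnr)                     # geometric mean of TPR and TNR
    # Assumption: F-score taken as F1 (harmonic mean of precision and TPR).
    denom = precision + tpr
    f_score = 2 * precision * tpr / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": g_mean,
        "f_score": f_score,
    }


# Hypothetical classifier that predicts "non-default" for every company,
# using the class sizes from the abstract: 14 default, 315 non-default.
baseline = imbalance_metrics(tp=0, fn=14, tn=315, fp=0)
for name, value in baseline.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the trivial classifier reaches an accuracy of 315/329 while its TPR, G-mean, and F-score are all zero, which is the kind of failure that the TPR, G-mean, and F-score expose and overall accuracy hides.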

Cite this article
IEEE Style
Y. Dong, Z. Xiao and X. Xiao, "Default Prediction for Real Estate Companies with Imbalanced Dataset," Journal of Information Processing Systems, vol. 10, no. 2, pp. 314~333, 2014. DOI: 10.3745/JIPS.04.0002.

ACM Style
Yuan-Xiang Dong, Zhi Xiao, and Xue Xiao. 2014. Default Prediction for Real Estate Companies with Imbalanced Dataset, Journal of Information Processing Systems, 10, 2, (2014), 314~333. DOI: 10.3745/JIPS.04.0002.