Feature Selection Using Submodular Approach for Financial Big Data


Girija Attigeri, Manohara Pai M. M, Radhika M. Pai, Journal of Information Processing Systems Vol. 15, No. 6, pp. 1306-1325, Dec. 2019  

10.3745/JIPS.04.0149
Keywords: Classification, Correlation, Feature Subset Selection, Financial Big Data, Logistic Regression, Submodular Optimization, Support Vector Machine

Abstract

As the world moves towards digitization, data is generated from various sources at an ever faster rate. The resulting volume is enormous and is termed big data. The financial sector is one domain that needs to leverage this data to identify financial risks, fraudulent activities, and so on. Designing predictive models for such financial big data is imperative for maintaining the health of a country's economy. Financial data has many features, such as transaction history, repayment data, purchase data, and investment data. The main problem in predictive modeling is finding the right subset of representative features from which a model can be constructed for a particular task. This paper proposes a correlation-based method that uses submodular optimization to select the optimal number of features, thereby reducing the dimensionality of the data for faster and better prediction. The key proposition is that the optimal feature subset should contain features that are highly correlated with the class label but not with one another. Experiments are conducted to understand the effect of the various subsets on different classification algorithms for loan data. The IBM Bluemix Big Data platform is used for experimentation along with the Spark notebook. The results indicate that the proposed approach achieves considerable accuracy with optimal subsets in significantly less execution time. The algorithm is also compared with existing feature selection and extraction algorithms.
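
The selection principle stated in the abstract (high correlation with the class label, low correlation among the chosen features) can be illustrated with a small greedy sketch. This is not the authors' algorithm or objective function, which the abstract does not specify. The function name `greedy_select`, the `redundancy_weight` parameter, the relevance-minus-redundancy score, and the synthetic data are all assumptions made for illustration, and the score is a simple heuristic that is not claimed to be submodular. The sketch also assumes no feature column is constant, since a constant column has an undefined correlation.

```python
import numpy as np

def greedy_select(X, y, k, redundancy_weight=1.0):
    """Greedily pick up to k column indices of X (hypothetical helper).

    Each step adds the feature with the largest score:
    |corr(feature, label)| - redundancy_weight * max |corr(feature, chosen)|.
    """
    n_features = X.shape[1]
    # Relevance: absolute Pearson correlation of each feature with the label.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Redundancy: absolute feature-to-feature correlation matrix.
    redundancy = np.abs(np.corrcoef(X, rowvar=False))

    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        def score(j):
            # Penalise similarity to the most correlated already-chosen feature.
            penalty = max((redundancy[j, s] for s in selected), default=0.0)
            return relevance[j] - redundancy_weight * penalty

        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic example (assumed data, not the loan data used in the paper).
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
X = np.column_stack([
    a,                                   # informative feature
    a + 0.01 * rng.normal(size=500),     # near-duplicate of column 0
    b,                                   # second, independent informative feature
    rng.normal(size=500),                # pure noise
])
y = (a + b > 0).astype(float)            # binary class label

selected = greedy_select(X, y, k=2)
print(selected)
```

With two features requested, the redundancy penalty should keep the near-duplicate columns 0 and 1 from being chosen together, so the result pairs one of them with column 2.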


Cite this article
[APA Style]
Girija Attigeri, Manohara Pai M. M, & Radhika M. Pai (2019). Feature Selection Using Submodular Approach for Financial Big Data. Journal of Information Processing Systems, 15(6), 1306-1325. DOI: 10.3745/JIPS.04.0149.

[IEEE Style]
G. Attigeri, M. P. M. M and R. M. Pai, "Feature Selection Using Submodular Approach for Financial Big Data," Journal of Information Processing Systems, vol. 15, no. 6, pp. 1306-1325, 2019. DOI: 10.3745/JIPS.04.0149.

[ACM Style]
Girija Attigeri, Manohara Pai M. M, and Radhika M. Pai. 2019. Feature Selection Using Submodular Approach for Financial Big Data. Journal of Information Processing Systems, 15, 6, (2019), 1306-1325. DOI: 10.3745/JIPS.04.0149.