Predicting status of Chinese listed companies based on features selected by penalized regression
Chinese companies have attracted much attention as the stock market in China has developed. The listing status of a Chinese listed company has become an important indicator of the potential risk of its stock, so predicting that status is crucial for stockholders and investors when they make further decisions. Because there are four possible listing statuses for Chinese companies, researchers formulate the task as a classification problem, which is typical in the data mining area. Many classification techniques have been used to predict the status of listed Chinese companies from their financial factors. There are usually more than 150 financial factors for each listed company, so feature selection is needed before a classification method is applied. In the literature, researchers have used the t-test together with variance inflation factor (VIF) analysis to select relevant factors. However, such a method cannot be applied in the high-dimensional case. In this paper, we apply the idea of penalized regression to select the factors of interest based on a logistic regression model, and then apply popular classification methods to predict the companies’ statuses. Our results show that the proposed method finds more representative factors and improves the prediction accuracy of the classification methods.
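The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary simplification (one hypothetical "at risk" status against all others, rather than the four listing statuses), an L1 (lasso) penalty, and a plain proximal gradient solver chosen only for brevity. The function names, the toy data generator, and the values of `lam` and `step` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient.

    The intercept is not penalised. Returns (intercept, beta).
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(intercept + sum(b * v for b, v in zip(beta, xi))) - yi
            grad0 += err
            for j in range(p):
                grad[j] += err * xi[j]
        intercept -= step * grad0 / n
        # Gradient step on the smooth loss, then shrinkage from the penalty.
        beta = [soft_threshold(b - step * g / n, step * lam)
                for b, g in zip(beta, grad)]
    return intercept, beta


def selected_features(beta, tol=1e-8):
    """Indices of factors whose coefficient was not shrunk to zero."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]


def make_toy_data(n=200, p=6, seed=0):
    """Hypothetical data: only factors 0 and 1 influence the binary status."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    _, beta = fit_l1_logistic(X, y, lam=0.1)
    print("selected factors:", selected_features(beta))
```

The factors returned by `selected_features` would then be the only inputs passed to whichever classifier is used in the prediction step. A larger `lam` keeps fewer factors; in practice it would be tuned, for example by cross-validation.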
Keywords: classification, data mining, feature selection, penalized regression
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions, which helped improve the quality of this manuscript. Honghao Zhao’s research is supported by the Macau University of Science and Technology and the Macau Foundation. Rui Ma’s research is funded by the Premier-Discipline Enhancement Scheme supported by the Zhuhai Government and the Premier Key-Discipline Enhancement Scheme supported by Guangdong Government Funds.
- Friedman, J., Hastie, T., Hofling, H. & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33: 1–22.