Abstract
Insider trading is a form of criminal behavior in security markets and has existed since the birth of the security market. As of 2018, the Chinese security market had a history of less than 30 years; nonetheless, insider trading occurred frequently. In this study, we explore the features of insider trading behavior by studying relevant indicators during the sensitive period (the time window before the release of insider information). For this purpose, an intelligent system that integrates Principal Component Analysis (PCA) and Random Forest (RF) is proposed to identify insider trading in the Chinese security market. In the proposed method, we first collect twenty-six relevant indicators for insider trading samples that occurred from 2007 to 2017 and for corresponding non-insider trading samples in the Chinese security market. Next, PCA is used to reduce the indicator dimension and extract principal components. Then, the relations between insider trading samples and principal components are learned by the RF algorithm. In the identification phase, the trained PCA-RF model is applied to classify insider trading and non-insider trading samples and to analyze the relative importance of indicators for insider trading identification. Experimental results show that, under time window lengths of 30, 60, and 90 days, the recall of the proposed method for out-of-sample identification was 73.53%, 83.87%, and 79.41%, respectively. We further investigate the voting threshold of RF and find that when it is raised above 70%, the proposed method achieves identification accuracy of more than 90%. In addition, the relative importance results of RF indicate that three indicators are crucial for insider trading identification. Moreover, the identification accuracy and efficiency of the proposed method are substantially superior to those of benchmark methods.
In summary, the experimental results indicate that the proposed method can be applied efficiently to the Chinese security market and can therefore provide useful suggestions to market regulators for insider trading investigations.
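The role of the RF voting threshold can be illustrated with a minimal Python sketch. It assumes each tree in a forest casts a binary vote (1 = insider trading) and that a sample is flagged only when the share of positive votes reaches the threshold. The function names and the toy votes are hypothetical; this is not the authors' implementation or data.

```python
def classify_by_vote(tree_votes, threshold=0.5):
    """Flag a sample as insider trading (1) when the share of trees
    voting 1 is at least `threshold`; otherwise return 0."""
    share = sum(tree_votes) / len(tree_votes)
    return 1 if share >= threshold else 0


def recall(y_true, y_pred):
    """Recall (REC) = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy, made-up votes from a hypothetical 10-tree forest for four samples.
votes = [
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],  # 80% of trees vote "insider"
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # 60%
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 10%
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # 30%
]
labels = [1, 1, 0, 0]  # made-up ground truth

majority = [classify_by_vote(v, threshold=0.5) for v in votes]
strict = [classify_by_vote(v, threshold=0.7) for v in votes]
```

With a stricter threshold, only samples on which most trees agree are flagged. In this toy data that lowers recall; how the threshold affects accuracy on real samples is an empirical question.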
Abbreviations
- PCA: Principal component analysis
- PC: Principal component
- RF: Random forest
- CSRC: China Securities Regulatory Commission
- SVM: Support vector machine
- KNN: K nearest neighbors
- ANN: Artificial neural network
- NB: Naive Bayesian
- DT: Decision tree
- SEC: Securities and Exchange Commission
- PRA: Prudential Regulation Authority
- CARTs: Classification and regression trees
- FPR: False positive rate
- ACC: Accuracy
- PRE: Precision
- REC: Recall
Acknowledgements
This work is funded by Hubei Provincial Department of Education (No. Q20171208), Science Foundation of China Three Gorges University (No. KJ2016A001), and Starting Grant of China Three Gorges University (No. 20170907).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
(1) Volatility
It is the degree of variation of a security's price series over time, measured by the standard deviation of logarithmic returns.
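As a minimal sketch of this definition, the following Python computes logarithmic returns from a price series and takes their standard deviation. The use of the sample standard deviation (n − 1 denominator) and the function names are assumptions; the text does not state which estimator or return frequency is used.

```python
import math
import statistics


def log_returns(prices):
    """Logarithmic returns ln(P_t / P_{t-1}) of a price series."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def volatility(prices):
    """Sample standard deviation of log returns (assumed estimator)."""
    return statistics.stdev(log_returns(prices))


# Made-up closing prices for illustration only.
flat = volatility([10.0, 10.0, 10.0, 10.0])
swing = volatility([100.0, 110.0, 100.0])
```

A constant price series gives zero volatility, while the series that rises and falls back gives a positive value.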
(2) ERCSS (Excess Return Compared with Same Scale)
It is the excess rate of return on a security over the average return of securities with the same company scale. The calculation formula is as follows:
$$ \text{ERCSS} = \text{Return of security} - \text{Return of securities with same company scale} $$
(3) ERCSM (Excess Return Compared with Same Market)
It is the excess rate of return on a security over the average return of the market. The calculation formula is as follows:
$$ \text{ERCSM} = \text{Return of security} - \text{Return of market} $$
(4) ERCSR (Excess Return Compared with Same Risk)
It is the excess rate of return on a security over the average return of investments with the same risk. The calculation formula is as follows:
$$ \text{ERCSR} = \text{Return of security} - \text{Return of the investment with same risk} $$
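The three excess-return indicators share one form: the security's return minus a benchmark return. A minimal Python sketch, with made-up return values and a hypothetical helper name, could look like this:

```python
def excess_return(security_return, benchmark_return):
    """Excess return = return of security - return of the chosen benchmark."""
    return security_return - benchmark_return


# Made-up returns over one window, expressed as fractions.
security = 0.08
benchmarks = {
    "ERCSS": 0.05,  # average return of securities with the same company scale
    "ERCSM": 0.03,  # average return of the market
    "ERCSR": 0.06,  # average return of investments with the same risk
}

excess = {name: excess_return(security, bench) for name, bench in benchmarks.items()}
```

Only the benchmark changes between the three indicators, so one helper covers all of them.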
(5) SC (Sigma Coefficient)
The SC is the standard deviation of security prices in a certain period.
(6) TSTR (Total Share Turnover Rate)
It is the frequency of stock transfer in a certain period, relative to total shares. The formula is as follows:
$$ \text{TSTR} = (\text{Volume of transaction}/\text{Total stocks}) \times 100\% $$
(7) FSTR (Floating Stock Turnover Rate)
It measures the frequency of stock transfer in a certain period, relative to floating shares. The formula is as follows:
$$ \text{FSTR} = (\text{Volume of transaction}/\text{Floating stocks in circulation}) \times 100\% $$
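Both turnover rates divide the same trading volume by a different share count. The sketch below uses a hypothetical helper and made-up share counts to show the difference:

```python
def turnover_rate(volume, shares):
    """Turnover rate in percent: (volume / shares) * 100."""
    if shares <= 0:
        raise ValueError("share count must be positive")
    return 100.0 * volume / shares


# Made-up figures: shares traded in the window, total shares, floating shares.
volume = 2_000_000
total_shares = 100_000_000
floating_shares = 40_000_000

tstr = turnover_rate(volume, total_shares)
fstr = turnover_rate(volume, floating_shares)
```

Because floating shares cannot exceed total shares, FSTR is never smaller than TSTR for the same volume.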
(8) BC (Beta Coefficient)
The BC is a measure of an asset’s risk and return in relation to a market. A security’s beta coefficient is calculated by dividing the covariance of the security’s returns and the benchmark’s returns by the variance of the benchmark’s returns over a certain period.
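The covariance-over-variance calculation can be sketched in plain Python as follows. The return series are made up, and the sample (n − 1) estimators are an assumption; the same denominator appears in both terms, so it cancels in the ratio.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equally long return series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def beta(security_returns, benchmark_returns):
    """Beta = Cov(security, benchmark) / Var(benchmark)."""
    variance = covariance(benchmark_returns, benchmark_returns)
    return covariance(security_returns, benchmark_returns) / variance


# Made-up returns: the security moves exactly twice as much as the benchmark.
benchmark = [0.01, -0.02, 0.03, 0.00]
security = [2 * r for r in benchmark]
bc = beta(security, benchmark)
```

A security whose returns are always twice the benchmark's has a beta of about 2, which matches the intuition of amplified market risk.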
(9) P/E ratio (Price-Earning ratio)
The P/E ratio is the ratio of a company’s stock price to its earnings per share and is used in valuing companies. It is calculated as:
$$ \text{P/E} = \text{Share Price}/\text{Earnings Per Share} $$
(10) P/B ratio (Price-Book ratio)
The price-to-book ratio, or P/B ratio, is a financial ratio used to compare a company’s current market price to its book value. It is calculated as:
$$ \text{P/B} = \text{Market Price Per Share}/\text{Book Value Per Share} $$
(11) P/S ratio (Price-Sales ratio)
The price–sales ratio, or P/S ratio, is a valuation metric for stocks. It is calculated by dividing the company’s stock price by the revenue per share:
$$ \text{P/S} = \text{Share Price}/\text{Revenues Per Share} $$
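The three valuation ratios divide the share price by a different per-share quantity. A short sketch with made-up per-share figures follows; returning `None` for a zero denominator is an illustrative choice, not something the text prescribes.

```python
def per_share_ratio(price, per_share_value):
    """Price divided by a per-share quantity; None when the quantity is zero."""
    if per_share_value == 0:
        return None
    return price / per_share_value


# Made-up per-share figures for one company.
share_price = 20.0
earnings_per_share = 2.0
book_value_per_share = 5.0
revenue_per_share = 8.0

pe = per_share_ratio(share_price, earnings_per_share)
pb = per_share_ratio(share_price, book_value_per_share)
ps = per_share_ratio(share_price, revenue_per_share)
```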
(12) DR (Debt Ratio)
The DR is a financial ratio that indicates the percentage of a company’s assets that are provided via debt. It is the ratio of total liabilities to total assets. The calculation formula is as follows:
$$ \text{DR} = (\text{Total Liabilities}/\text{Total Assets}) \times 100\% $$
(13) CR (Current Ratio)
It is a ratio that measures whether a company has enough resources to meet its short-term obligations. It compares a company’s current assets to its current liabilities:
$$ \text{CR} = (\text{Current Assets}/\text{Current Liabilities}) \times 100\% $$
(14) OPR (Operating Profit Ratio)
It is operating profit expressed as a percentage of net sales. This percentage is used to comprehensively reflect the business efficiency of a company. The calculation formula is as follows:
$$ \text{OPR} = (\text{Operating profit}/\text{Net Sales}) \times 100\% $$
(15) QR (Quick Ratio)
The QR refers to the ratio of the company’s quick assets to its current liabilities. It is calculated as follows:
$$ \text{QR} = (\text{Quick assets}/\text{Current liabilities}) \times 100\% $$
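These four ratios can be derived together from a handful of balance-sheet and income-statement items. The sketch below groups them in a small data class; the class, field names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Made-up financial statement items, all in the same currency unit."""
    total_assets: float
    total_liabilities: float
    current_assets: float
    current_liabilities: float
    quick_assets: float
    operating_profit: float
    net_sales: float


def percent(numerator, denominator):
    """(numerator / denominator) * 100, as in the formulas above."""
    return 100.0 * numerator / denominator


def solvency_and_profit_ratios(s):
    return {
        "DR": percent(s.total_liabilities, s.total_assets),
        "CR": percent(s.current_assets, s.current_liabilities),
        "OPR": percent(s.operating_profit, s.net_sales),
        "QR": percent(s.quick_assets, s.current_liabilities),
    }


example = Statement(
    total_assets=1000.0, total_liabilities=400.0,
    current_assets=300.0, current_liabilities=150.0,
    quick_assets=180.0, operating_profit=50.0, net_sales=500.0,
)
ratios = solvency_and_profit_ratios(example)
```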
(16) TAT (Total Asset Turnover)
The TAT is the ratio of the net sales income to the average total assets in a certain period of time. The calculation formula is as follows:
$$ \text{TAT} = \text{Sales Revenue}/\text{Total Assets} $$
(17) RGR (Revenue Growth Rate)
The RGR refers to the ratio of the increase in operating income of the company to the total operating income of the previous year. The calculation formula is as follows:
$$ \text{RGR} = (\text{Increased operating income}/\text{Total operating income in the previous year}) \times 100\% $$
(18) TAGR (Total Asset Growth Rate)
The TAGR is the ratio of the total assets growth of the listed company to the total assets at the beginning of the current year, reflecting the growth of the assets of the company in the current year. The calculation formula is as follows:
$$ \text{TAGR} = (\text{Total asset growth in the current year}/\text{Total assets at the beginning of the current year}) \times 100\% $$
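RGR and TAGR are both growth rates relative to a base-period value. A minimal sketch with a hypothetical helper and made-up figures:

```python
def growth_rate(current, base):
    """Growth in percent: ((current - base) / base) * 100."""
    if base == 0:
        raise ValueError("base value must be non-zero")
    return 100.0 * (current - base) / base


# Made-up figures: operating income in two consecutive years,
# and total assets at the beginning and end of the current year.
rgr = growth_rate(current=120.0, base=100.0)
tagr = growth_rate(current=550.0, base=500.0)
```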
(19) ROE (Return On Equity)
The ROE is the ratio of net income to shareholders’ equity.
$$ \text{ROE} = \text{Net Income}/\text{Shareholders Equity} $$
(20) ROA (Return On Assets)
The ROA is an indicator used to measure how much net profit is generated per unit of assets. The calculation formula is as follows:
$$ \text{ROA} = \text{Net Income}/\text{Total Value of Assets} $$
(21) CR5 Index
It is the proportion of shares held by the top five largest shareholders.
(22) CR10 Index
It is the proportion of shares held by the top ten largest shareholders.
(23) Z index
The Z index refers to the ratio of the largest shareholder’s stock amount to the second-largest shareholder’s stock amount.
(24) H5 index
It is the sum of squares of the top five largest shareholders’ stock proportions. The closer the index is to 1, the greater the difference in stock proportions among the top five largest shareholders.
(25) H10 index
It is the sum of squares of the top ten largest shareholders’ stock proportions. The closer the index is to 1, the greater the difference in stock proportions among the top ten largest shareholders.
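The five ownership-concentration indicators above can all be computed from one list of shareholding proportions. The sketch below assumes proportions are expressed as fractions between 0 and 1 (consistent with the H indices approaching 1); the function names and holdings are made up.

```python
def top_k(proportions, k):
    """The k largest shareholding proportions, largest first."""
    return sorted(proportions, reverse=True)[:k]


def concentration_ratio(proportions, k):
    """CR5 / CR10: combined proportion held by the top k shareholders."""
    return sum(top_k(proportions, k))


def z_index(proportions):
    """Largest shareholder's stake divided by the second largest stake."""
    largest, second = top_k(proportions, 2)
    return largest / second


def h_index(proportions, k):
    """H5 / H10: sum of squared proportions of the top k shareholders."""
    return sum(p * p for p in top_k(proportions, k))


# Made-up holdings of the six largest shareholders, as fractions of all shares.
holdings = [0.30, 0.15, 0.10, 0.05, 0.05, 0.02]

cr5 = concentration_ratio(holdings, 5)
z = z_index(holdings)
h5 = h_index(holdings, 5)
```

If a single shareholder held all shares, `h_index` would return 1, which is the upper bound the definitions refer to.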
(26) ARAGMS (Attendance Ratio at Annual General Meeting of Shareholders)
The ARAGMS is the ratio of the number of shareholders attending the annual general meeting to the total number of shareholders.
Cite this article
Deng, S., Wang, C., Fu, Z. et al. An Intelligent System for Insider Trading Identification in Chinese Security Market. Comput Econ 57, 593–616 (2021). https://doi.org/10.1007/s10614-020-09970-8
DOI: https://doi.org/10.1007/s10614-020-09970-8