Examine Manipulated Datasets with Topology Data Analysis: A Case Study
Learning and mining technologies are widely applied to extract value from massive data and to inform decision-making. The correctness of the resulting decisions usually depends on the truthfulness of the underlying data. Data fraud is widespread, and even data that were originally true can be maliciously manipulated by cyber-attackers. Methods for examining data authenticity have long been studied, but they are less effective when only values are manipulated without violating their valid ranges and definitions. Decisions made from fraudulent or manipulated data are then wrong or hijacked. Data manipulation has been described as the latest technique in “the art of war in cyberspace.” Examining each data instance against its source is exhaustive and often impossible; re-collecting the data of a national census is one example. In this paper, through a case study on banknote data, we exploit Topological Data Analysis (TDA) to examine manipulated data. A fraction of the data records is examined as a whole rather than record by record. We first test the feasibility of using TDA to detect such manipulation efficiently, and then discuss the limitations of the state of the art. Although TDA is not yet mature, it has been reported to be effective in many applications, and our work provides evidence of its usefulness for detecting data anomalies.
Keywords: Data manipulation, Topological features, TDA, Mapper
This work is supported by the Key Program of the National Natural Science Foundation of China under Grant No. 61732013 and the Key R&D Project of Zhejiang Province under Grant No. 2017C02036.