# Examine Manipulated Datasets with Topology Data Analysis: A Case Study

## Abstract

Learning and mining technologies have been broadly applied to reveal the value of massive data and to inform decision-making. The correctness of the resulting decisions usually depends on the truthfulness of the data. Data fraud is pervasive, and even genuine data can be maliciously manipulated by cyber-attackers. Methods for examining data authenticity have long been studied, but they are less effective when only values are manipulated without violating valid ranges and definitions. Decisions made from fraudulent or manipulated data are then wrong or hijacked, and data manipulation has been called the latest technique in “the art of war in cyberspace.” Examining each data instance against its source is exhaustive and often impossible, as with recollecting the data for a national census. In this paper, through a case study on banknote data, we exploit *Topological Data Analysis (TDA)* to examine manipulated data: a fraction of the data records is examined collectively rather than individually. We first test whether TDA can support such detection efficiently, and then discuss the limitations of the state of the art. Although TDA is not yet mature, it has been reported to be effective in many applications, and our work provides evidence of its usefulness for detecting data anomalies.
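The idea of examining a batch of records as a whole, instead of record by record, can be illustrated with the simplest topological summary: the 0-dimensional persistence barcode. The sketch below is not the authors' pipeline (the keywords point to Mapper, which it does not implement). It assumes a Vietoris-Rips filtration in which an edge enters at its Euclidean length, so the H0 death times equal the edge lengths of a minimum spanning tree. The helper name `h0_deaths` and the toy points are hypothetical and are not the banknote data.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Finite death times of the H0 barcode of a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. When an edge of
    length d first joins two components, one bar dies at d. These deaths
    are exactly the minimum-spanning-tree edge lengths, so Kruskal's
    algorithm with union-find computes them. The single infinite bar
    (the last surviving component) is omitted.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, ordered by the scale at which they enter.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # two components merge, so one bar dies at d
            deaths.append(d)
            if len(deaths) == n - 1:
                break
    return deaths


# Hypothetical toy data: two tight groups, then one record moved to a
# value that stays inside the valid range but fills the gap between them.
genuine = [(0, 0), (1, 0), (5, 0), (6, 0)]
manipulated = [(0, 0), (1, 0), (3, 0), (6, 0)]
print(h0_deaths(genuine))      # [1.0, 1.0, 4.0]
print(h0_deaths(manipulated))  # [1.0, 2.0, 3.0]
```

No single value in `manipulated` is out of range, yet the barcode of the batch changes: the long bar that separated the two groups shrinks from 4.0 to 3.0. A practical check would compare diagrams with a proper metric such as the bottleneck distance, as provided by a TDA library, rather than by eye.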

## Keywords

Data manipulation, Topological features, TDA, Mapper

## Notes

### Acknowledgements

This work is supported by the Key Program of National Natural Science Foundation of China with grant No. 61732013, and the Key R&D Project of Zhejiang Province with No. 2017C02036.
