A Topological Approach to Representational Data Models

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10904)


As data accumulate faster and in greater volume, building representational models has become something of an art form. Although scientific disciplines share common data types, each often takes a different approach to modeling them. In this work, we propose representational models grounded in the mathematics of algebraic topology to understand foundational data types. We present hypergraphs for multi-relational data, point clouds for vector data, and sheaf models when both data types are present and interrelated. All three models rest on similar principles from algebraic topology and together provide a domain-agnostic framework. We discuss each method, provide references to its foundational mathematical papers, and give examples of its use.


Relational data · Vector data · Hypergraph models · Topological data models · Data-agnostic models



We wish to thank Prof. Francisco Munoz, University of Chile, for providing the sample power grid data studied in Sect. 3.3, and Will Hutton, Pacific Northwest National Laboratory, and his team for providing the data described in Sect. 3.2. This work was supported in part by (a) the Applied Mathematics Program of the Office of Advanced Scientific Computing Research within the Office of Science of the U.S. Department of Energy (DOE) through the Multifaceted Mathematics for Complex Energy Systems (M2ACS) project, (b) the Asymmetric Resilient Cybersecurity Initiative at Pacific Northwest National Laboratory, and (c) the High Performance Data Analytics program at the Pacific Northwest National Laboratory (PNNL). PNNL is operated by Battelle for the United States Department of Energy under Contract DE-AC05-76RL01830.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Pacific Northwest National Laboratory, Seattle, USA
  2. Pacific Northwest National Laboratory, Richland, USA
  3. American University, Washington, DC, USA
