
Information Measures of Frequency Distributions with an Application to Labeled Graphs

  • Cliff Joslyn
  • Emilie Purvine
Conference paper
Part of the Association for Women in Mathematics Series book series (AWMS, volume 6)

Abstract

The problem of describing the distribution of labels over a set of objects is common in many domains. Applications in cyber security, social media, and protein interactions all depend on the manner in which labels are distributed among different objects. In this paper we present three interacting statistical measures on label distributions, thought of as integer partitions, inspired by entropy and information theory. Of central concern to us is how the open- versus closed-world semantics of one’s problem leads to different ways of accounting for information about the support of a distribution. In particular, we can consider the number of labels seen in a particular data set in relation to both the number of items and the number of labels available, if known. This leads us to consider two alternative entropy normalizations, as well as a new measure specifically of support size, based not on entropy but on nonspecificity measures as used in nontraditional information theory. The entropy- and nonspecificity-based measures are related in their ability to index integer partitions within Young’s lattice. Labeled graphs are discussed as a specific case of labels distributed over a set of edges. We describe a use case in cyber security using a labeled directed multigraph of IPFLOW. Finally, we show how these measures respond when labels are updated in certain ways corresponding to particular changes of the Young diagram of an integer partition.
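As a rough illustration of the quantities the abstract refers to, the following Python sketch computes the integer partition induced by a list of labels, its Shannon entropy, and two entropy normalizations: one relative to the labels actually observed and one relative to a known label vocabulary. It also computes a Hartley-style nonspecificity of the support, log2 of the number of distinct labels seen. This is only one plausible reading under stated assumptions: the abstract does not give the paper's definitions, so the function names, the choice of normalizers, the sample labels, and the vocabulary size of 8 are all hypothetical.

```python
import math
from collections import Counter


def label_partition(labels):
    """Return label counts in decreasing order, an integer partition of len(labels)."""
    return sorted(Counter(labels).values(), reverse=True)


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the frequency distribution given by counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def normalized_entropy(counts, num_labels):
    """Entropy divided by its maximum, log2(num_labels); taken as 0 if num_labels <= 1."""
    if num_labels <= 1:
        return 0.0
    return shannon_entropy(counts) / math.log2(num_labels)


def hartley_nonspecificity(counts):
    """log2 of the support size, i.e. of the number of distinct labels observed."""
    return math.log2(sum(1 for c in counts if c > 0))


# Hypothetical edge labels, e.g. protocols on the edges of a flow graph.
labels = ["tcp", "tcp", "udp", "tcp", "icmp", "udp"]
counts = label_partition(labels)          # [3, 2, 1]

observed = len(counts)                    # labels seen in the data
available = 8                             # assumed size of a known label vocabulary

h = shannon_entropy(counts)
h_observed = normalized_entropy(counts, observed)    # normalizer uses only seen labels
h_available = normalized_entropy(counts, available)  # normalizer uses the full vocabulary
support = hartley_nonspecificity(counts)

print(counts, round(h, 4), round(h_observed, 4), round(h_available, 4), round(support, 4))
```

Under these assumptions, normalizing by the full vocabulary penalizes a data set that uses only a few of the available labels, while normalizing by the observed labels does not; the support-size term records that difference separately from the evenness of the counts.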

Keywords

Information measures, Distributions, Entropy, Nonspecificity, Labeled graph

Mathematics Subject Classification

94A17, 05C90, 05A17

Notes

Acknowledgments

The research described in this paper is part of the Asymmetric Resilient Cybersecurity Initiative at Pacific Northwest National Laboratory. It was conducted under the Laboratory Directed Research and Development Program at PNNL, a multi-program national laboratory operated by Battelle for the U.S. Department of Energy.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Pacific Northwest National Laboratory, Seattle, USA
