Marginality: A Numerical Mapping for Enhanced Exploitation of Taxonomic Attributes

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7647)

Abstract

Hierarchical attributes appear in taxonomic or ontology-based data (e.g. NACE economic activities, ICD-classified diseases, animal/plant species). Such taxonomic data are often treated as flat nominal data without hierarchy, which loses substantial information and analytical power. We introduce marginality, a numerical mapping for taxonomic data that allows many of the algorithms and analytical techniques designed for numerical data to be applied to them. We show how to compute descriptive statistics such as the mean, the variance and the covariance on marginality-mapped data. We also define a mathematical distance between records that include hierarchical attributes, based on marginality-based variances. This distance paves the way to reusing, on taxonomic data, clustering and anonymization techniques designed for numerical data.

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Domingo-Ferrer, J. (2012). Marginality: A Numerical Mapping for Enhanced Exploitation of Taxonomic Attributes. In: Torra, V., Narukawa, Y., López, B., Villaret, M. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2012. Lecture Notes in Computer Science, vol 7647. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34620-0_33

  • DOI: https://doi.org/10.1007/978-3-642-34620-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34619-4

  • Online ISBN: 978-3-642-34620-0

  • eBook Packages: Computer Science, Computer Science (R0)
