Enterprise Data Classification Using Semantic Web Technologies

  • David Ben-David
  • Tamar Domany
  • Abigail Tarem
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6497)

Abstract

Organizations today collect and store large amounts of data in various formats and locations. However, they are sometimes required to locate all instances of a certain type of data. Good data classification marks enterprise data in a way that enables quick and efficient retrieval of information when needed. We introduce a generic, automatic classification method that exploits Semantic Web technologies to assist in several phases of the classification process: defining the classification requirements, performing the classification, and representing the results. Using Semantic Web technologies enables flexible and extensible configuration, centralized management, and uniform results. This approach creates general and maintainable classifications and enables applying semantic queries, rule languages, and inference to the results.
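
To make the three phases concrete, the following is a minimal sketch, not the authors' implementation. It assumes that classification requirements can be written as simple pattern rules, that results are recorded as RDF-style (subject, predicate, object) triples, and that a wildcard triple match stands in for a semantic query. The `ex:` names, the `RULES` table, and the `classify` and `match` helpers are all hypothetical and chosen only for illustration.

```python
import re

# Phase 1 (assumed form): classification requirements as category -> pattern.
# These rules are illustrative assumptions, not taken from the paper.
RULES = {
    "EmailAddress": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CreditCardNumber": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(resources):
    """Phase 2: return RDF-style triples for every rule that matches a resource."""
    triples = set()
    for uri, text in resources.items():
        for category, pattern in RULES.items():
            if pattern.search(text):
                triples.add((uri, "ex:hasClassification", f"ex:{category}"))
    return triples


def match(triples, s=None, p=None, o=None):
    """Phase 3: select triples by pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Hypothetical enterprise resources keyed by made-up identifiers.
resources = {
    "ex:doc1": "Contact: alice@example.com",
    "ex:table7": "card 4111 1111 1111 1111 on file",
    "ex:doc2": "Quarterly meeting notes",
}

results = classify(resources)
for triple in match(results, o="ex:EmailAddress"):
    print(triple)
```

Because every result has the same triple shape regardless of where the data came from, a single query such as `match(results, o="ex:CreditCardNumber")` can locate all resources of one type. A real deployment would more likely use an RDF store and SPARQL than this in-memory stand-in.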

Keywords

Semantic Techniques, RDF, Classification, Modeling

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • David Ben-David (1)
  • Tamar Domany (2)
  • Abigail Tarem (2)

  1. Technion – Israel Institute of Technology, Haifa, Israel
  2. IBM Research – Haifa, Haifa, Israel
