An Innovative Framework for Securing Unstructured Documents

  • Flora Amato
  • Valentina Casola
  • Antonino Mazzeo
  • Sara Romano
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6694)


The coexistence of structured and unstructured data is a major limitation for document management in public and private contexts. To identify and protect specific resources within monolithic documents, we have adopted several techniques that analyze texts and automatically extract relevant information. In this paper we propose an innovative framework for data transformation that is based on a semantic approach and can be adapted to many different contexts; in particular, we illustrate the applicability of this framework to the formalization and protection of e-health medical records.


Knowledge extraction, document transformation, fine-grain document protection





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Flora Amato (1)
  • Valentina Casola (1)
  • Antonino Mazzeo (1)
  • Sara Romano (1)
  1. Dipartimento di Informatica e Sistemistica, University of Naples Federico II, Napoli, Italy
