Text Mining Through Semi-Automatic Semantic Annotation

  • Nadzeya Kiyavitskaya
  • Nicola Zeni
  • Luisa Mich
  • James R. Cordy
  • John Mylopoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4333)


The Web is the greatest information source in human history. Unfortunately, mining knowledge from this source is a laborious and error-prone task. Many researchers believe that a solution can be founded on semantic annotations, inserted into web-based documents, that guide information extraction and knowledge mining. In this paper, we further elaborate a tool-supported process for the semantic annotation of documents, based on techniques and technologies traditionally used in the software analysis and reverse engineering of large-scale legacy code bases. The outcomes of the paper include an experimental evaluation framework and empirical results from two case studies drawn from the tourism sector. The conclusions suggest that our approach can facilitate the semi-automatic annotation of large document bases.


Keywords: semantic annotation, large-scale document analysis, conceptual schemas, software analysis
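To make the idea of inserting semantic annotations into document text more concrete, here is a minimal Python sketch of lightweight, vocabulary-driven markup: each phrase matching one of a concept's terms is wrapped in an XML-style tag named after that concept. This is an illustration under stated assumptions, not the authors' tool, which relies on software-analysis and reverse-engineering technology. The concept names, the terms, and the `annotate` function are all hypothetical, chosen only to echo the tourism setting of the case studies.

```python
import re

# Hypothetical mini-vocabulary: concept name -> regex alternatives for its terms.
# Illustrative only; not the conceptual schema used in the paper's case studies.
VOCABULARY = {
    "Accommodation": ["hotel", "hostel", "campsite"],
    "Price": [r"\d+(?:\.\d{2})? ?(?:EUR|euro)"],
}


def annotate(text: str, vocabulary: dict[str, list[str]]) -> str:
    """Wrap each vocabulary match in a tag named after its concept.

    All concepts are compiled into one regex with a named group per concept,
    so the text is scanned once and no span is tagged twice.
    """
    groups = [
        f"(?P<{concept}>{'|'.join(terms)})"
        for concept, terms in vocabulary.items()
    ]
    regex = re.compile(r"\b(?:" + "|".join(groups) + r")\b", re.IGNORECASE)

    def tag(match: re.Match) -> str:
        concept = match.lastgroup  # name of the concept group that matched
        return f"<{concept}>{match.group(0)}</{concept}>"

    return regex.sub(tag, text)


if __name__ == "__main__":
    print(annotate("The hotel costs 80 EUR per night.", VOCABULARY))
```

The single combined pattern is a deliberate choice: rewriting the text once per concept would risk matching inside tags already inserted, while one pass with named groups tags each span at most once.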


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nadzeya Kiyavitskaya (1)
  • Nicola Zeni (1)
  • Luisa Mich (2)
  • James R. Cordy (3)
  • John Mylopoulos (4)

  1. Dept. of Information and Communication Technology, University of Trento, Italy
  2. Dept. of Computer and Management Sciences, University of Trento, Italy
  3. School of Computing, Queen's University, Kingston, Canada
  4. Dept. of Computer Science, University of Toronto, Ontario, Canada