Text Mining Through Semi Automatic Semantic Annotation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4333)

Abstract

The Web is the greatest information source in human history. Unfortunately, mining knowledge out of this source is a laborious and error-prone task. Many researchers believe that a solution to the problem can be founded on semantic annotations that need to be inserted in web-based documents and guide information extraction and knowledge mining. In this paper, we further elaborate a tool-supported process for semantic annotation of documents based on techniques and technologies traditionally used in software analysis and reverse engineering for large-scale legacy code bases. The outcomes of the paper include an experimental evaluation framework and empirical results based on two case studies adopted from the Tourism sector. The conclusions suggest that our approach can facilitate the semi-automatic annotation of large document bases.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kiyavitskaya, N., Zeni, N., Mich, L., Cordy, J.R., Mylopoulos, J. (2006). Text Mining Through Semi Automatic Semantic Annotation. In: Reimer, U., Karagiannis, D. (eds) Practical Aspects of Knowledge Management. PAKM 2006. Lecture Notes in Computer Science, vol 4333. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11944935_13

  • DOI: https://doi.org/10.1007/11944935_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49998-5

  • Online ISBN: 978-3-540-49999-2

  • eBook Packages: Computer Science, Computer Science (R0)
