Using Document Dimensions for Enhanced Information Retrieval

  • Conference paper
Applied Computing (AACC 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3285)

Abstract

Conventional document search techniques are constrained by attempting to match individual keywords or phrases to source documents. As a result, these techniques miss documents that contain semantically similar terms, and so achieve a relatively low degree of recall. At the same time, processing capabilities and tools for syntactic and semantic analysis of language have advanced to the point where an index-time linguistic analysis of source documents is both feasible and realistic. In this paper, we introduce document dimensions, a means of classifying or grouping terms discovered in documents. Using an enhanced version of Jakarta Lucene [1], we demonstrate that supplementing keyword analysis with some syntactic and semantic information can indeed enhance the quality of information retrieval results.
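
The recall problem described in the abstract can be illustrated with a small sketch. It is not the paper's implementation and does not use Lucene: the `DIMENSIONS` map, the sample documents, and the function names are all hypothetical, invented here to show how grouping terms at index time might let a query reach documents that contain related terms but not the query keyword itself.

```python
from collections import defaultdict

# Hypothetical dimension map: each term is assigned to a named group.
# These groupings are invented for illustration, not taken from the paper.
DIMENSIONS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "physician": "medical_person",
    "doctor": "medical_person",
}

# Toy document collection (also an assumption).
DOCS = {
    1: "the automobile was parked outside",
    2: "a car drove past the doctor",
    3: "the physician arrived by truck",
}


def build_indexes(docs):
    """Build a plain keyword index and a dimension index at index time."""
    keyword_index = defaultdict(set)
    dimension_index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            keyword_index[term].add(doc_id)
            dim = DIMENSIONS.get(term)
            if dim is not None:
                dimension_index[dim].add(doc_id)
    return keyword_index, dimension_index


def search(term, keyword_index, dimension_index):
    """Return (keyword-only hits, hits expanded through the term's dimension)."""
    term = term.lower()
    exact = set(keyword_index.get(term, set()))
    dim = DIMENSIONS.get(term)
    if dim is None:
        return exact, exact
    return exact, exact | dimension_index.get(dim, set())


if __name__ == "__main__":
    kw, dims = build_indexes(DOCS)
    exact, expanded = search("car", kw, dims)
    print("keyword only:", sorted(exact))
    print("with dimensions:", sorted(expanded))
```

In this toy collection, a keyword-only search for "car" matches only document 2, while the dimension-expanded search also returns documents 1 and 3, which mention "automobile" and "truck". Expanding a query this way raises recall and may lower precision, so a real system would need to weight the two kinds of match.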

References

  1. Jakarta Lucene, http://jakarta.apache.org/lucene/docs/index.html

  2. van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Butterworths, London (1980)

  3. Salton, G., Y.C.: On the specification of term values in automatic indexing. Journal of Documentation 29, 351–372 (1973)

  4. Brin, S., Page, L.: Anatomy of a hypertextual web search engine. In: WWW7 (1998)

  5. Brooks, T.: The semantic distance model of relevance assessment. In: Proceedings of the 61st Annual Meeting of ASIS, Pittsburgh, PA. Information Access in the Global Information Economy, vol. 35, pp. 33–44 (1998)

  6. Budanitsky, A.: Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In: Workshop on WordNet and Other Lexical Resources, in NAACL 2000, Pittsburgh, PA, June 2001 (2000)

  7. Dixon, M.: (An overview of document mining technology)

  8. Rijke, M.V.: Beyond document retrieval. In: Trento, Nice (2003)

  9. Yang, K.: Combining Text-, Link-, and Classification-based Retrieval Methods to Enhance Information Discovery on the Web. PhD thesis, University of North Carolina (2002)

  10. Modelling and mining of network information systems, http://www.mathstat.dal.ca/~mominis/

  11. Lawrence, S., Giles, C.: Indexing and retrieval of scientific literature. In: Eighth International Conference on Information and Knowledge Management (1999)

  12. Lawrence, S.: Context in web search. In: IEEE Data Engineering Bulletin (2000)

  13. Hu, W.: An overview of world wide web search technologies. In: International Conference on Information Systems, Analysis and Synthesis, vol. 12 (2001)

  14. Etzioni, O.: On the instability of search engines. In: Content-Based Multimedia Information Access (RIAO), Paris, France (2000)

  15. WebFountain, http://www.almaden.ibm.com/webfountain/

  16. Eder, J., Koncilia, C.: Evolution of dimension data in temporal datawarehouses. Springer, Heidelberg (1998)

  17. Roellke, T.: The accessibility dimension for structured document retrieval. Journal of Documentation (1998)

  18. Mothé, J.: Information mining: using document dimensions to analyse a document set interactively. In: European Colloquium on IR Research: ECIR, pp. 66–77 (2001)

  19. Mothé, J.: Doccube: Multi-dimensional visualization and exploration of large document sets. In: JASIST (Journal of American Society for Information Science and Technology) (2003)

  20. Tsang, V., Stevenson, S.: Calculating semantic distance between word sense probability distributions. In: Proceedings of CoNLL 2004, Boston, MA, USA (2004)

  21. Heydon, A., Najork, M.: Mercator: A scalable, extensible web crawler. World Wide Web 2, 219–229 (1999)

  22. Mailing list archives of nutch.org, http://sourceforge.net/mailarchive/forum.php?forum_id=13068&viewmonth=%200404&viewday=26

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jayasooriya, T., Manandhar, S. (2004). Using Document Dimensions for Enhanced Information Retrieval. In: Manandhar, S., Austin, J., Desai, U., Oyanagi, Y., Talukder, A.K. (eds) Applied Computing. AACC 2004. Lecture Notes in Computer Science, vol 3285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30176-9_19

  • DOI: https://doi.org/10.1007/978-3-540-30176-9_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23659-7

  • Online ISBN: 978-3-540-30176-9

  • eBook Packages: Springer Book Archive
