Effectively Searching Maps in Web Documents

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

Maps are an important source of information in archaeology and other sciences. Users want to search for historical maps to determine the recorded history of the political geography of regions in different eras, to find out exactly where archaeological artifacts were discovered, and so on. Currently, they have to use a generic search engine and add the term "map" to their other keywords. This crude method generates a significant number of false positives that the user must sift through to get the desired results. To reduce this manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that help distinguish maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more heavily. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
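The idea of weighting metadata fields differently when ranking maps can be illustrated with a minimal sketch. This is not the ranking algorithm from the paper, whose formula the abstract does not give; it is a simple weighted sum of normalized term frequencies. The field names, the weight values, and the functions `score_map` and `rank_maps` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical field weights. The paper determines which fields matter more
# empirically; these numbers are placeholders, not the authors' settings.
FIELD_WEIGHTS = {
    "caption": 3.0,         # map-level metadata
    "reference_text": 2.0,  # map-level: sentences in the text that mention the map
    "title": 1.0,           # document-level metadata
    "abstract": 0.5,        # document-level metadata
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_score(query_terms, text):
    """Fraction of the field's tokens that match any query term."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms) / len(tokens)


def score_map(query, record, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field scores for one map record."""
    query_terms = set(tokenize(query))
    return sum(
        weight * field_score(query_terms, record.get(field, ""))
        for field, weight in weights.items()
    )


def rank_maps(query, records, weights=FIELD_WEIGHTS):
    """Return ids of matching records, best score first."""
    scored = [(score_map(query, r, weights), r["id"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for score, rid in scored if score > 0]


if __name__ == "__main__":
    # Made-up records, used only to exercise the functions.
    records = [
        {"id": "m1", "caption": "Map of the Roman Empire", "title": "Trade routes"},
        {"id": "m2", "caption": "Excavation site photo", "title": "Roman Empire pottery"},
        {"id": "m3", "caption": "Pottery shards", "title": "Bronze age"},
    ]
    print(rank_maps("roman empire", records))  # ['m1', 'm2']
```

In this sketch a caption match outranks a title match because the caption weight is larger, which mirrors the abstract's point that map-level and document-level fields contribute unequally. A real system would likely use a stronger per-field scoring function and tune the weights on labeled queries.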



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tan, Q., Mitra, P., Giles, C.L. (2009). Effectively Searching Maps in Web Documents. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_17

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
