Abstract
Maps are an important source of information in archaeology and other sciences. Users search for historical maps to determine the recorded history of the political geography of regions in different eras, to find out exactly where archaeological artifacts were discovered, and so on. Currently, they have to use a generic search engine and add the term "map" to their other keywords. This crude method generates a significant number of false positives that the user must sift through to get the desired results. To reduce this manual effort, we propose an automatic map identification, indexing, and retrieval system that enables users to search for and retrieve maps appearing in a large corpus of digital documents using simple keyword queries. We identify features that help distinguish maps from other figures in digital documents and show how a Support-Vector-Machine-based classifier can be used to identify maps. We propose map-level metadata (e.g., captions and references to the maps in the text) and document-level metadata (e.g., title, abstract, citations, and how recent the publication is), and show how they can be automatically extracted and indexed. Our novel ranking algorithm weights different metadata fields differently and also uses the document-level metadata to help rank retrieved maps. Empirical evaluations show which features should be selected and which metadata fields should be weighted more. We also demonstrate improved retrieval results in comparison to adaptations of existing methods for map retrieval. Our map search engine has been deployed in an online map-search system that is part of the Blind-Review digital library system.
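The idea of weighting metadata fields differently when ranking maps can be illustrated with a minimal sketch. This is not the ranking algorithm of the paper, which the abstract does not specify. The field names, the weight values, and the length-normalized term-frequency score are all assumptions chosen for illustration only.

```python
from collections import Counter

# Hypothetical field weights: the abstract reports no concrete values.
# Map-level fields (caption, reference_text) are weighted above
# document-level fields (title, abstract) purely for illustration.
FIELD_WEIGHTS = {
    "caption": 3.0,
    "reference_text": 2.0,
    "title": 1.5,
    "abstract": 1.0,
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or punctuation handling)."""
    return text.lower().split()


def field_score(query_terms, field_text):
    """Length-normalized count of query-term occurrences in one field."""
    tokens = tokenize(field_text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms) / len(tokens)


def score_map(query, record, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field scores for one map record."""
    terms = tokenize(query)
    return sum(w * field_score(terms, record.get(f, "")) for f, w in weights.items())


def rank_maps(query, records, weights=FIELD_WEIGHTS):
    """Return records sorted by descending weighted score."""
    return sorted(records, key=lambda r: score_map(query, r, weights), reverse=True)


if __name__ == "__main__":
    records = [
        {"id": "a", "caption": "map of roman gaul", "title": "pottery"},
        {"id": "b", "caption": "pottery kiln", "title": "roman gaul survey"},
    ]
    for r in rank_maps("roman gaul", records):
        print(r["id"], round(score_map("roman gaul", r), 3))
```

In this toy setup a query match in a map's caption counts for more than the same match in the enclosing document's title, so record "a" ranks above record "b". A real system would tune such weights empirically, as the abstract describes.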
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tan, Q., Mitra, P., Giles, C.L. (2009). Effectively Searching Maps in Web Documents. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_17
DOI: https://doi.org/10.1007/978-3-642-00958-7_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)