1 Introduction

The notion of geographic relevance and the role of geographic space in information access have been recognized for a long time [15]. For example, the PERSEUS digital library aimed to make humanities documents accessible spatially, while the SEQUOIA and SPIRIT projects [8] and the GeoCLEF shared task [1] aimed to study geographic information retrieval. More recently, the pervasiveness of mobile computing devices [11] and other developments associated with the Internet of Things (IoT) have made it necessary to reflect on the role of geographic space in making collected and stored information accessible, not just indexed using words and numbers but also spatially. However, to date, neither ECIR nor other IR conferences have offered a tutorial that makes the body of research constituting the state of the art accessible to interested researchers and practitioners.

To address this gap, we propose a half-day tutorial. We will introduce or recap the core concepts from geography and its intersection with IR, and survey existing techniques to (a) construct spatial representations from textual documents and queries (typically exploiting geographic knowledge from gazetteers [7]), and (b) utilize geographic knowledge (both prior and extracted from data) to better access document collections in which geographic space plays a substantial role. We will also cover example applications [5], e.g. in fields such as the digital humanities [12], and discuss possible avenues for future work in the area.

2 Goals and Objectives

In this tutorial, we aim to give a survey of the concepts and methods used to make implicit spatial evidence contained in text collections accessible. We cover selected early and seminal attempts [3, 8, 10, 13] and more recent Machine Learning (ML) methods [6, 16, 17, 18], hoping to inspire students and fellow researchers to conduct their own research in this area. Bringing together two seemingly disparate worlds like geographic space and text documents is exciting!

By the end of the tutorial session, the attendees will have a clear sense of the key concepts in Geo-NLP and Geographical Information Retrieval (GIR), and they will understand some seminal methods as well as open problems.

3 Description and Structure of the Tutorial

This half-day tutorial will be divided into five sessions:

  • Geography and text: an introduction to the ways in which geographic concepts are reflected in natural language and in text;

  • Toponym recognition and resolution [9]: key to most geographically inspired analysis is the use of place names in text, together with their identification, disambiguation, and resolution to unique locations;

  • Geographic relevance and ranking [4]: methods for incorporating geographic information in IR indexes and ranking algorithms, with a discussion of what geographic relevance is and how it varies with context and application domain;

  • Applications: concrete examples of the application of the introduced methods, in fields ranging from Digital Humanities to Web search, together with a discussion of requirements and their implications for algorithmic and data choices;

  • Future challenges: where the most likely applications of GIR will be in the future, and what the key societal and methodological challenges are.

The first four sessions will each present fundamental challenges and a selection of examples from the state of the art, and will include interactive exercises (computer- and/or paper-based) to illustrate basic concepts to participants.
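
To make the second and third sessions more concrete, the following is a minimal Python sketch of gazetteer-based toponym recognition and resolution, followed by a simple distance-based geographic score. It is purely illustrative and is not the tutorial's actual exercise material. Everything in it is an assumption made for the example: the toy gazetteer (with approximate coordinates and rounded, illustrative population figures), the function names, the largest-population disambiguation heuristic, and the exponential distance decay with its `scale_km` parameter.

```python
import math
import re

# Hypothetical toy gazetteer: name -> candidate referents as
# (label, latitude, longitude, population). Coordinates are approximate and
# populations are rounded, illustrative values only.
GAZETTEER = {
    "london": [
        ("London, United Kingdom", 51.5074, -0.1278, 8_900_000),
        ("London, Ontario, Canada", 42.9849, -81.2453, 400_000),
    ],
    "edinburgh": [("Edinburgh, United Kingdom", 55.9533, -3.1883, 500_000)],
    "lisbon": [("Lisbon, Portugal", 38.7223, -9.1393, 500_000)],
}


def recognize_toponyms(text):
    """Toponym recognition by plain dictionary lookup of lower-cased tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t in GAZETTEER]


def resolve(name):
    """Toponym resolution with a simple heuristic: pick the candidate
    with the largest population."""
    return max(GAZETTEER[name], key=lambda cand: cand[3])


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def geo_score(doc_text, query_lat, query_lon, scale_km=500.0):
    """Score a document by exp(-d / scale_km), where d is the distance from
    the query location to the nearest resolved toponym; 0.0 if none found."""
    names = recognize_toponyms(doc_text)
    if not names:
        return 0.0
    nearest = min(
        haversine_km(query_lat, query_lon, *resolve(n)[1:3]) for n in names
    )
    return math.exp(-nearest / scale_km)


def rank(docs, query_lat, query_lon):
    """Return documents ordered by decreasing geographic score."""
    return sorted(docs, key=lambda d: geo_score(d, query_lat, query_lon), reverse=True)


if __name__ == "__main__":
    docs = [
        "A conference report from Lisbon.",
        "Walking tours of Edinburgh old town.",
        "An essay without any place names.",
    ]
    # Query location: roughly Glasgow (illustrative coordinates).
    for doc in rank(docs, 55.86, -4.25):
        print(doc)
```

In practice, a purely geographic score like this would be combined with a textual relevance score, and population-based disambiguation is only one of several possible heuristics; how geographic relevance should be defined depends on context and application domain.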

4 Prerequisites

Some knowledge of basic IR and ML concepts will be helpful. However, the tutorial is designed for a broad audience: it introduces key high-level concepts and provides participants with material to deepen their knowledge afterwards.

5 Target Audience

The target audience for this tutorial includes the following four groups:

  • students of computer science, especially in information retrieval, who want to learn about mobility-relevant spatial computation around search/IR (e.g. [2]);

  • practicing IR engineers who would like to expand their areas of expertise so as to include geographic search;

  • information retrieval researchers interested in an introduction to, and state-of-the-art review [14] of, GIR and Geo-NLP;

  • geographers or GIS experts who have not yet worked with text, and who would like to learn how the spatial knowledge implicit in text collections can be used to support geospatial analysis.

Beyond these directly targeted groups, the tutorial could be of interest to anyone who would like to understand better how the world of geographic space relates to the world of unstructured textual documents.

6 Presenters and Their Experience

Jochen L. Leidner is a computer scientist and research manager. He is Director of Research at Refinitiv Labs (formerly Thomson Reuters F&R) in London, where he leads the Research & Development function and team. A computational linguist by training, he holds Master’s degrees (Erlangen and Cambridge) and a Ph.D. (Edinburgh). His 2007 Ph.D. thesis “Toponym Resolution in Text” (published in book form in 2008) has attracted over 200 citations. He is a Fellow of the Royal Geographical Society and currently also the Academy of Engineering Visiting Professor of Data Analytics in the Department of Computer Science at the University of Sheffield, a role intended to instill industry practice into engineering training.

Bruno Martins is an assistant professor at the Department of Computer Science and Engineering of Instituto Superior Técnico in the University of Lisbon and a researcher at INESC-ID, where he works on problems related to the general areas of information retrieval, text mining, and the geographical information sciences. He has been involved in several research projects related to geospatial aspects of information access and retrieval, and he has accumulated significant expertise in addressing challenges at the intersection of information retrieval, machine learning, and the geographical information sciences.

Katherine McDonough is a Senior Research Associate at The Alan Turing Institute with the Living with Machines project and a Research Fellow at Queen Mary, University of London. She previously taught and worked on digital humanities projects at Stanford University, Western Sydney University, and Bates College. With a background in eighteenth-century French history, her early research focused on the politics of infrastructure. She has written on GIR challenges for humanities research and is a member of the GéoDisco project, which examines geographic discourse in historical French encyclopedias. Her current work explores new approaches to GIR informed by humanistic source criticism.

Ross Purves is a professor at the University of Zurich. His research focuses on the geographic analysis of text, exploring both methodological issues (e.g. gazetteer quality and representation of vernacular names) and analysis of text to better understand landscape. He collaborated on the SPIRIT project, which investigated a number of concepts fundamental to geographic information retrieval. Together with Chris Jones, he organises the workshop on Geographic Information Retrieval which has been hosted by CIKM, SIGIR and ACM SIGSPATIAL, and which has been an important incubator of many ideas related to GIR. He recently co-authored a comprehensive review of GIR [14].

7 Previous Events

This is a new tutorial and has therefore not been presented before. All of the presenters are experienced teachers and have given seminars on related material at a range of international conferences.

8 Summary and Conclusion

We have presented a tutorial proposal for geospatial content processing and retrieval. Geographic aspects of information access and retrieval have been increasing in relevance, given the interest in analysing huge volumes of unstructured data in fields such as the digital humanities or the computational social sciences, and given the pervasiveness of networked sensors, GPS-enabled mobile devices, and in-car navigation systems. Modern information systems need to spatially enable text to make it accessible to a variety of use cases that involve a notion of “geographic relevance”. This suggests that our novel tutorial is likely to be of interest to most attendees of ECIR 2020.