Encyclopedia of GIS

2017 Edition
Editors: Shashi Shekhar, Hui Xiong, Xun Zhou

Ontology-Based Geospatial Data Integration

Reference work entry
DOI: https://doi.org/10.1007/978-3-319-17885-1_917

Synonyms

Definition

Information integration is the combination of different types of information in a framework so that it can be queried, retrieved, and manipulated. This integration is usually done through an interface that acts as the integrator of information originating from different places. For integration to be efficient and to deliver the kind of information that the user is expecting, it is necessary to have an agreement on the meaning of the information. In a broader scope, it is necessary to reach an agreement about the meaning of the entities of the geographic world.

For information sharing among different communities to happen in an effective and meaningful way, some preconditions are necessary. The concepts that people have about the real world must be explicitly formalized; such an explicit formalization of mental models is called an ontology. An ontology is often seen as an engineering artifact that describes a certain reality with a specific vocabulary, using a set of assumptions regarding the intended meaning of the vocabulary words. In philosophy, ontology has a different meaning: it is the science that studies what exists. The fact that philosophers have been studying ontology since Aristotle may mean that their field can help information scientists learn to build good ontologies, but philosophy itself has different branches that have different assumptions about the world and how it is understood.

Historical Background

The effective integration of multiple resources and domains is known as interoperation. Interoperability is formally defined by the Open GIS Consortium as the “capability to communicate, execute programs, or transfer data among various functional units in a manner that requires the user to have little or no knowledge of the unique characteristics of those units.” Efforts towards geographic information systems (GIS) interoperation are well documented. In the past, exchanging geographic information was as simple as sending paper maps or raw data tapes through the mail. Today, computers throughout the world are connected and the use of GIS has become widespread. The scope of interoperability has changed from static data exchange using flat files to global systems, interconnected using sophisticated protocols to exchange information on-line. In the future, computers are expected to be able to share not only information but also knowledge. Although GIS have been characterized as an integration tool, GIS interoperability is far from being fully operational.

Research on the integration of databases can be traced back to the mid-1980s, and today it is widespread among the GIS community. The complexity and richness of geographic information and the difficulty of modeling it raise specific issues for geographic information integration, such as the integration of different models of geographic entities (i.e., objects and fields) and different computer representations of these entities (i.e., raster and vector).

In GIS, the focus is changing from format integration to semantic interoperability. The first attempts to obtain GIS interoperability involved the direct translation of geographic data from one vendor format into another. A variation of this practice is the use of a standard file format. These formats can lead to information loss, as is often the case with the popular CAD-based (Computer Aided Design) format DXF (Drawing Exchange Format). Alternatives that avoid this problem are usually more complex, such as the Spatial Data Transfer Standard (SDTS) and the Spatial Archive and Interchange Format (SAIF).

The literature shows many proposals for the integration of information, ranging from federated databases with schema integration to ontologies. The new generation of information systems needs to handle semantic heterogeneity in order to make use of the large amount of information that became available with the arrival of the internet and distributed computing. The support and use of multiple ontologies should be a basic feature of modern information systems if they are to support semantics in the integration of information. Ontologies can capture the semantics of information, can be represented in a formal language, and can be used to store the related metadata, thus enabling a semantic approach to information integration. Sophisticated structures, such as ontologies, are good candidates for abstracting and modeling geographic information with the final objective of information integration.

This new generation of systems is characterized by the use of multiple ontologies and contexts to achieve semantic interoperability. Since Aristotle’s theory of substances (objects, things, and persons) and accidents (qualities, events, and processes), ontology has been used as the foundation for theories and models of the world. Current research on the use of ontologies can be found throughout the computer science community in areas such as computational linguistics and database theory. The areas being researched range from knowledge engineering, information integration, and object-oriented analysis to applications in medicine, mechanical engineering, and geographic information systems. The use of explicit ontologies contributes to the improvement of GIS. Every information system is based on an implicit ontology; when that ontology is made explicit, two kinds of conflict are avoided: conflicts between the common-sense ontology of the user and the mathematical concepts in the software, and conflicts between the ontological concepts and the implementation.

Scientific Fundamentals

According to Guarino “an ontology refers to an engineering artifact, constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words. This set of assumptions has usually the form of a first-order logical theory, where vocabulary words appear as unary or binary predicate names, respectively called concepts and relations. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation.”
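The simplest case in this definition, a hierarchy of concepts related by subsumption, can be illustrated with a minimal sketch. The concept names and the helper functions `ancestors` and `subsumes` are hypothetical and chosen only for illustration; they do not come from Guarino or from any particular ontology language.

```python
# Hypothetical toy ontology: each concept maps to its direct parent
# (None marks the root of the hierarchy).
PARENT = {
    "GeographicEntity": None,
    "WaterBody": "GeographicEntity",
    "Lake": "WaterBody",
    "River": "WaterBody",
    "LandParcel": "GeographicEntity",
}


def ancestors(concept, parent=PARENT):
    """Return the concepts that subsume `concept`, nearest first."""
    chain = []
    current = parent[concept]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def subsumes(general, specific, parent=PARENT):
    """True if `general` is `specific` itself or one of its ancestors."""
    return general == specific or general in ancestors(specific, parent)
```

Under these assumptions, `subsumes("WaterBody", "Lake")` holds while `subsumes("Lake", "River")` does not. More sophisticated ontologies would add axioms on top of such a hierarchy, which this sketch does not attempt.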

The purpose of ontologies is to bring more meaning (semantics) to the way data is collected, stored, and integrated. According to Guarino, an explicit ontology plays a central role in an ontology-driven information system (ODIS) and drives all aspects and components of the system. Ontologies can be used at development time or at run time. In the geographic field, Fonseca proposed the use of ontologies in the development and use of GIS to enhance the integration of geographic information. He called those systems ontology-driven GIS.

Ontologies are usually created with one of two purposes. The first is to facilitate information integration, and the second is to facilitate communication between software agents. Both problems require what is usually called ontology integration, the mapping of concepts from one ontology to another, or as some approaches suggest, the creation of a common ontology from previously independent ontologies. From the technological point of view, for agents to be able to carry on a successful negotiation it is necessary that not only the ontologies be formally expressed, but also that they be expressed in a computer-readable language. Therefore, ontology languages are necessary.
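The mapping of concepts from one ontology to another can be sketched in a few lines. This is only an illustration under stated assumptions: the source terms, the target concepts, the record layout, and the `translate` function are all hypothetical, and real ontology integration involves far more than a lookup table.

```python
# Hypothetical mapping from the terms of a source ontology to the concepts
# of a target ontology. None marks a term with no agreed counterpart.
SOURCE_TO_TARGET = {
    "lago": "Lake",
    "rio": "River",
    "varzea": None,
}


def translate(records, mapping=SOURCE_TO_TARGET):
    """Split records into those relabeled with target concepts and those left unmapped."""
    mapped, unmapped = [], []
    for record in records:
        target = mapping.get(record["concept"])
        if target is None:
            # No correspondence is known, so the record is kept aside unchanged.
            unmapped.append(record)
        else:
            mapped.append({**record, "concept": target})
    return mapped, unmapped


records = [
    {"id": 1, "concept": "lago"},
    {"id": 2, "concept": "varzea"},
]
mapped, unmapped = translate(records)
```

Keeping the unmapped records separate, rather than forcing them into the nearest target concept, makes visible the cases where the two communities' vocabularies do not line up.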

The idea of ontology integration leads to great challenges. A fundamental barrier in the way of the development of fully general and reusable ontologies is what is called the Tower of Babel problem. The difficulty is that insofar as database engineers attempt to accommodate, with the same database, groups of users possessing distinct ontologies (in the sense of preconceived theories), they must address the problem of integrating information in ways that are compatible with the perspectives of all significant potential users. It might be possible to integrate a limited number of alternative ontologies by working out correspondences among them for a limited domain of data on a case-by-case, ad hoc basis. However, such solutions are, by their nature, incompatible with the technological imperative behind the development of ontologies: they will be idiosyncratic rather than general and reusable.

Accordingly, in order to achieve more general and reusable solutions, it has been suggested that techniques of logical and analytic philosophy be used to develop formal ontological structures that are terminologically consistent and subject to certain computationally convenient and efficient organizational principles. The problem of integrating databases derived from distinct ontologies is to be solved by requiring designers to conform, from the beginning, to a single ontology. That is, the Tower of Babel problem is resolved by eliminating ontological differences at the outset, requiring all database designers to submit to first-order logic and/or whatever other formal and substantive constraints are compatible with a consistent ontology. Consequently, the complexity, subtlety, and possibly surprising multidimensionality of the data, and the categories that organize them, must be limited in order to fit the needs of the database engineers. This kind of solution to the Tower of Babel problem is called the Newspeak Solution after George Orwell’s introduction of the term in his novel 1984. In order to meet the demands of the technological society envisaged by Orwell, there was a continual effort to create a reformed English, Newspeak, which was simpler and less capable of expressing the ambiguity inherent in different points of view than traditional English. The consequence was that the language became less expressive, and thus reduced the complexity of thought of those using it.

The difficulties associated with constructing a more complex alternative to Newspeak ontologies on a general scale are overwhelming to say the least. How, for example, could one provide a common or neutral framework for organizing and integrating all of the distinct descriptions that have been offered for any reasonably complex conceptual realm? The answer is, of course, that one cannot provide such a common ontology. If there is something like a common framework, it does not lie at the level of computational ontologies at all, but at the level at which users from different communities (paradigms) may learn to communicate with one another.

The main question is: what is information used for? Formal approaches to ontology integration have to incorporate a hermeneutic dimension to information integration and interoperability issues. The three aspects of the conventions that structure human knowledge (analysis, synthesis, and application) are precisely the dimensions that are central to hermeneutics. A hermeneutic contextualization of ontology creation and integration can make room for communication among users who hold different points of view. Representation of diverse ontologies can be a setting within which users with differing conceptualizations of the world can learn to understand each other. Staying strictly within the ontological level of analysis, the problem of full-fledged information integration is insuperable. It is possible, however, to design a hermeneutic context: a place where users may come to learn from one another in a way much more fundamental than merely exchanging information within a mutually accepted paradigm. In order to do this, however, it is necessary first to explicitly recognize the hermeneutic context that is always present, though largely invisible when there are no disagreements about ontologies, for it is in this context that the adjudication of disagreements must go on. The key is to see that a database, as well as the world to which it refers, is itself an object of interpretation, and that, as such, those who use it are engaging in hermeneutic activity. Moreover, this activity of interpretation is strongly constrained by the applications users have in view. The use of hermeneutics in information integration provides a context from which it is possible to address the various problems facing ontologists and users: choice of ontological categories, ontology integration, and communication among users coming from different perspectives.

From such a perspective, information is seen as a process that is dependent on a certain preknowledge that the user of the data brings along. Thus, the concept of the preunderstanding of a user of information, and its extension to the preunderstanding of a community (and its ontology), is very close to the central role of presuppositions, or prejudices, in framing and guiding the emergence of experience in the work of philosophical hermeneutics. Hence, attempts to develop frameworks aiming at information integration that will satisfy both the formalism and the practical and intuitive issues will have to deal with a hermeneutic approach.

Key Applications

Ontology Integration to Share Geographic Data About the Environment

Environmental researchers need improved integration tools and methods for a global sharing of scientific information. Historically, humans in every civilization have gathered information on the environment for utilitarian purposes. This information is required to develop effective strategies for the conservation of the environment and to design better policies to address common concerns about the environment. Environmental phenomena do not respect borders or frontiers, but the information resources, local knowledge, and strategies used to combat them are all constructed independently and are not shared or directly sharable. Unfortunately, although sufficient information may exist to solve a particular problem, the existing data and metadata are inaccessible or not readily usable because they have been collected by different agents with a diversity of purposes. The type and quality of these data are also greatly influenced by the culture and language of the investigators. Thus, for the data to be useful to other scientists around the world, these diverse purposes need to be made explicit, with differences in meaning resolved, so that the data can be understood by those among whom it is shared. However, the global character of environmental issues is a source of many impediments to the synthesis and utility of global data about the Earth. The three main impediments are (1) language, (2) semantics, and (3) culture: although the researchers are studying the same subject, they use different ontologies when collecting their data.

Future Directions

The use of semantic tools such as ontologies still has a long way to go. It is necessary to make the current technology easier to use. Meaning is ultimately established by interpretation at the user level, and the tools available today are still not user-friendly enough to enable more effective use.

Another challenge is the availability of good ontologies. Although some ontologies are available, it is currently difficult to search for and understand them, and thus difficult to evaluate their usefulness effectively.

Cross-References

Recommended Reading

  1. Berners-Lee T, Hendler J, Lassila O (2001) The semantic web: a new form of web content that is meaningful to computers will unleash a revolution of new possibilities. Sci Am 284:34–43
  2. Egenhofer MJ (2002) Toward the semantic geospatial web. In: Proceedings of the 10th ACM international symposium on advances in geographic information systems, McLean, 8–9 Nov 2002
  3. Fonseca F, Egenhofer M, Agouris P, Câmara G (2002) Using ontologies for integrated geographic information systems. Trans GIS 6:231–257
  4. Goodchild M (2002) Geographical data modeling. Comput Geosci 18:401–408
  5. OpenGIS (1996) The OpenGIS guide: introduction to interoperable geoprocessing and the OpenGIS specification. Open GIS Consortium, Wayland
  6. Smith B, Mark D (1998) Ontology and geographic kinds. In: Proceedings of the 8th international symposium on spatial data handling, Vancouver, 12–15 July 1998

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. College of Information Sciences and Technology, The Pennsylvania State University, University Park, USA