# Provenance for Explaining Taxonomy Alignments

• Mingmin Chen
• Shizhuo Yu
• Parisa Kianmajd
• Nico Franz
• Shawn Bowers
• Bertram Ludäscher
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8628)

## Abstract

Derivations and proofs are a form of provenance in automated deduction that can assist users in understanding how reasoners derive logical consequences from premises. However, system-generated proofs are often overly complex or detailed, and making sense of them is non-trivial. Conversely, without any form of provenance, it is just as hard to know why a certain fact was derived.

We study provenance in the application of Euler/X [1], a logic-based toolkit for aligning multiple biological taxonomies. We propose a combination of approaches to explain both logical inconsistencies in the input alignment and the derivation of new facts in the output taxonomies.

Taxonomy Alignment. Given taxonomies $$T_1,T_2$$ and a set of articulations $$A$$, all modeled as monadic, first-order constraints, the taxonomy alignment problem is to find “merged” taxonomies that satisfy $$\varPhi = T_1\cup T_2\cup A$$. An alignment can be inconsistent ($$\varPhi$$ is unsatisfiable), unique ($$\varPhi$$ has exactly one minimal model), or ambiguous ($$\varPhi$$ has more than one minimal model). For example, let $$T_1$$ be given by isa (subset) constraints $$\mathsf {b \subseteq a}$$, $$\mathsf {c \subseteq a}$$, coverage constraint $$\mathsf {a = b\cup c}$$, and sibling disjointness $$\mathsf {b\cap c=\emptyset }$$. Similarly, $$T_2$$ is given by $$\mathbin {\mathrm {isa}}$$ constraints $$\mathsf {e \subseteq d}$$, $$\mathsf {f \subseteq d}$$, coverage $$\mathsf {d=e\cup f}$$, and sibling disjointness $$\mathsf {e}\cap \mathsf {f}=\emptyset$$.

An expert aligns $$T_1$$ and $$T_2$$ using the articulations $$\mathsf {A}_1$$: $$\mathsf {a=d}$$, $$\mathsf {A}_2$$: $$\mathsf {b\subsetneq e}$$, $$\mathsf {A}_3$$: $$\mathsf {c\subsetneq f}$$, and $$\mathsf {A}_4$$: $$\mathsf {b\subsetneq d}$$; see Fig. 1. We would like to “apply” all of these relations between the two taxonomies and output a merged taxonomy.
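The satisfiability question for this small example can be checked by brute force. The sketch below is not how Euler/X is implemented; it is a minimal illustration that assumes every concept must be non-empty and that uses hypothetical helper names (`taxonomy`, `equal`, `proper_subset`, `satisfiable`). It enumerates the possible "types" of an element (the set of concepts it belongs to), keeps the types allowed by the universal constraints, and then checks that each proper-subset constraint has a witness.

```python
from itertools import product

CONCEPTS = "abcdef"

# An element's "type" is the set of concepts it belongs to. Universal
# constraints restrict which types may occur; a proper subset x ⊊ y
# additionally needs a witness: some type inside y but outside x.
TYPES = [frozenset(c for c, bit in zip(CONCEPTS, bits) if bit)
         for bits in product([0, 1], repeat=len(CONCEPTS))]

def taxonomy(parent, left, right):
    """isa + coverage + sibling disjointness for one parent with two children."""
    def ok(t):
        covered = (parent in t) == (left in t or right in t)
        return covered and not (left in t and right in t)
    return (ok, None)

def equal(x, y):
    return (lambda t: (x in t) == (y in t), None)

def proper_subset(x, y):
    return (lambda t: x not in t or y in t, (y, x))

def satisfiable(constraints):
    # Existential conditions are monotone in the set of types, so it is
    # enough to test them against the set of all allowed types.
    allowed = [t for t in TYPES if all(ok(t) for ok, _ in constraints)]
    witnesses = [w for _, w in constraints if w is not None]
    # Assumption of this sketch: every concept is non-empty.
    nonempty = all(any(c in t for t in allowed) for c in CONCEPTS)
    witnessed = all(any(inside in t and outside not in t for t in allowed)
                    for inside, outside in witnesses)
    return nonempty and witnessed

T1 = taxonomy("a", "b", "c")
T2 = taxonomy("d", "e", "f")
A = [equal("a", "d"), proper_subset("b", "e"),
     proper_subset("c", "f"), proper_subset("b", "d")]

print(satisfiable([T1, T2]))      # True: the taxonomies alone are consistent
print(satisfiable([T1, T2] + A))  # False: the four articulations clash
```

Type enumeration works here because all constraints are monadic: a model is fully described by which element types are inhabited.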

Inconsistency Explanation. Usually $$T_1$$ and $$T_2$$ are considered immutable or correct by definition, whereas $$A$$ might contain modeling errors. Applied to the example in Fig. 1, Euler/X finds that the constraints are unsatisfiable and performs a model-based diagnosis. The result lattice (Fig. 2) highlights minimal inconsistent subsets (MIS) and maximal consistent subsets (MCS). The MIS $$\{\mathsf {A}_1, \mathsf {A}_2, \mathsf {A}_3\}$$ indicates which articulations are jointly inconsistent with $$T_1,T_2$$. To further explore the inconsistency, the system-derived MCS can be employed: Fig. 3 shows the merged taxonomies (a.k.a. “possible worlds”) obtained from the MCS. Here, each MCS corresponds to one possible world.¹
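The MIS and MCS of the running example can be reproduced with an exhaustive search over all subsets of the articulations. This is a naive sketch, not the diagnosis procedure used by Euler/X. It assumes the labeling $$\mathsf {A}_1$$: $$\mathsf {a=d}$$, $$\mathsf {A}_2$$: $$\mathsf {b\subsetneq e}$$, $$\mathsf {A}_3$$: $$\mathsf {c\subsetneq f}$$, $$\mathsf {A}_4$$: $$\mathsf {b\subsetneq d}$$ (the order in which the articulations are listed), assumes that concepts are non-empty, and uses hypothetical function names.

```python
from itertools import combinations, product

CONCEPTS = "abcdef"
TYPES = [frozenset(c for c, bit in zip(CONCEPTS, bits) if bit)
         for bits in product([0, 1], repeat=len(CONCEPTS))]

def base_ok(t):
    """Universal constraints of T1 and T2 on one element type t."""
    def tax(p, l, r):
        return ((p in t) == (l in t or r in t)) and not (l in t and r in t)
    return tax("a", "b", "c") and tax("d", "e", "f")

# name -> (universal part, witness (inside, outside) for properness).
# The A1..A4 labeling is an assumption of this sketch.
ARTICULATIONS = {
    "A1": (lambda t: ("a" in t) == ("d" in t), None),        # a = d
    "A2": (lambda t: "b" not in t or "e" in t, ("e", "b")),  # b ⊊ e
    "A3": (lambda t: "c" not in t or "f" in t, ("f", "c")),  # c ⊊ f
    "A4": (lambda t: "b" not in t or "d" in t, ("d", "b")),  # b ⊊ d
}

def consistent(names):
    chosen = [ARTICULATIONS[n] for n in names]
    allowed = [t for t in TYPES if base_ok(t) and all(u(t) for u, _ in chosen)]
    witnesses = [w for _, w in chosen if w is not None]
    nonempty = all(any(c in t for t in allowed) for c in CONCEPTS)
    witnessed = all(any(i in t and o not in t for t in allowed)
                    for i, o in witnesses)
    return nonempty and witnessed

def diagnose():
    names = sorted(ARTICULATIONS)
    subsets = [frozenset(s) for k in range(len(names) + 1)
               for s in combinations(names, k)]
    status = {s: consistent(s) for s in subsets}
    # MIS: inconsistent, but dropping any one articulation restores consistency.
    mis = [s for s in subsets if not status[s]
           and all(status[s - {n}] for n in s)]
    # MCS: consistent, but adding any further articulation breaks consistency.
    mcs = [s for s in subsets if status[s]
           and all(not status[s | {n}] for n in names if n not in s)]
    return mis, mcs

mis, mcs = diagnose()
print([sorted(s) for s in mis])
print([sorted(s) for s in mcs])
```

Under these assumptions the search returns the single MIS {A1, A2, A3} and three MCS, each obtained by dropping one articulation of the MIS. Checking only one-element removals and additions is sufficient because consistency is monotone: every subset of a consistent set is consistent.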

Using expert knowledge or further constraints,² a preferred merge result can be selected to further analyze and then repair the inconsistency. Here, suppose the user chose the first maximal consistent subset $$\{\mathsf {A}_1, \mathsf {A}_2, \mathsf {A}_4\}$$. It follows from $$\mathsf {A}_1, \mathsf {A}_2$$ and the input taxonomies $$T_1,T_2$$ that $$\mathsf {f}\subsetneq \mathsf {c}$$. However, $$\mathsf {A}_3$$ states $$\mathsf {c}\subsetneq \mathsf {f}$$, yielding a contradiction. Now the problem is to explain why $$\mathsf {f} \subsetneq \mathsf {c}$$ is inferred.

Derivation Explanation. To understand how $$\mathsf {f}\subsetneq \mathsf {c}$$ is inferred, we may need to inspect its logical derivation or an abstraction of it. We obtain this provenance in Euler/X by keeping track of the rules $$r_1,\dots , r_8$$ and input alignments $$\mathsf {A}_1,\dots , \mathsf {A}_4$$ used by the reasoner. Figure 4 depicts the resulting provenance overview.
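A short set-theoretic argument shows why $$\mathsf {f}\subsetneq \mathsf {c}$$ must hold. It is an illustrative derivation only: it assumes that $$\mathsf {A}_1$$ is $$\mathsf {a=d}$$ and $$\mathsf {A}_2$$ is $$\mathsf {b\subsetneq e}$$, and it does not reproduce the reasoner's rules $$r_1,\dots , r_8$$, which are not listed here.

$$
\begin{aligned}
\mathsf{f} &= \mathsf{d}\setminus \mathsf{e} && (\mathsf{d = e\cup f},\ \mathsf{e\cap f=\emptyset})\\
&= \mathsf{a}\setminus \mathsf{e} && (\mathsf{A}_1:\ \mathsf{a=d})\\
&\subseteq \mathsf{a}\setminus \mathsf{b} && (\mathsf{A}_2:\ \mathsf{b\subseteq e})\\
&= \mathsf{c} && (\mathsf{a = b\cup c},\ \mathsf{b\cap c=\emptyset})
\end{aligned}
$$

For strictness, $$\mathsf {A}_2$$ also provides some $$x\in \mathsf {e}\setminus \mathsf {b}$$. Then $$x\in \mathsf {d}=\mathsf {a}$$ and $$x\notin \mathsf {b}$$, so $$x\in \mathsf {c}$$; and $$x\in \mathsf {e}$$ implies $$x\notin \mathsf {f}$$. Hence $$\mathsf {c}\setminus \mathsf {f}\ne \emptyset$$ and $$\mathsf {f}\subsetneq \mathsf {c}$$.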

Related Work. Data provenance is an actively researched area and is closely related to proofs and derivations in logical reasoning. Our inconsistency explanation is based on Reiter’s model-based diagnosis [6], which has been studied extensively and applied to many areas, e.g., type error debugging, circuit diagnosis, and OWL debugging. We have adapted the HST algorithm in [4] to compute all MIS and MCS for inconsistency explanation. The underlying enumeration problem was shown to be Trans-Enum-complete by Eiter and Gottlob [2]. Inspired by the ideas of provenance semirings [3] and Datalog debugging [5], our approach explains the derivation of the inferred relations.
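The link between diagnosis and hypergraph transversals can be illustrated on the running example: the complements of the minimal hitting sets of the MIS collection are exactly the MCS. The sketch below is a brute-force enumeration, not the hitting-set-tree (HST) algorithm of [4]; the function name `minimal_hitting_sets` is hypothetical, and the single MIS is taken from the text as given input.

```python
from itertools import combinations

def minimal_hitting_sets(universe, conflicts):
    """Enumerate minimal hitting sets by increasing size (brute force)."""
    found = []
    items = sorted(universe)
    for k in range(len(items) + 1):
        for cand in map(frozenset, combinations(items, k)):
            hits_all = all(cand & c for c in conflicts)
            # Smaller hitting sets were found earlier, so a candidate is
            # minimal exactly when it contains none of them.
            if hits_all and not any(h <= cand for h in found):
                found.append(cand)
    return found

articulations = {"A1", "A2", "A3", "A4"}
mis = [frozenset({"A1", "A2", "A3"})]  # the MIS reported in the text

diagnoses = minimal_hitting_sets(articulations, mis)
mcs = [frozenset(articulations - d) for d in diagnoses]

print([sorted(d) for d in diagnoses])
print([sorted(s) for s in mcs])
```

Each minimal hitting set is a smallest set of articulations whose removal breaks every MIS, which is why its complement is a maximal consistent subset.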

## Footnotes

1. In general, an MCS can yield many possible worlds. Such ambiguities arise when the alignment input is underspecified.

2. E.g., the output for MCS $$\{\mathsf {A}_2, \mathsf {A}_3, \mathsf {A}_4\}$$ might be less desirable since it is not a tree.

## Notes

### Acknowledgments

Supported in part by NSF IIS-1118088 and DBI-1147273.

## References

1. Chen, M., Yu, S., Franz, N., Bowers, S., Ludäscher, B.: Euler/X: A toolkit for logic-based taxonomy integration. In: 22nd International Workshop on Functional and (Constraint) Logic Programming (WFLP), Kiel, Germany (2013)
2. Eiter, T., Gottlob, G.: Hypergraph transversal computation and related problems in logic and AI. In: Flesca, S., Greco, S., Leone, N., Ianni, G. (eds.) JELIA 2002. LNCS (LNAI), vol. 2424, pp. 549–564. Springer, Heidelberg (2002)
3. Green, T., Karvounarakis, G., Tannen, V.: Provenance semirings. In: ACM Symposium on Principles of Database Systems (PODS), pp. 31–40 (2007)
4. Horridge, M., Parsia, B., Sattler, U.: Explaining inconsistencies in OWL ontologies. In: Godo, L., Pugliese, A. (eds.) SUM 2009. LNCS, vol. 5785, pp. 124–137. Springer, Heidelberg (2009)
5. Köhler, S., Ludäscher, B., Smaragdakis, Y.: Declarative datalog debugging for mere mortals. In: Barceló, P., Pichler, R. (eds.) Datalog 2.0 2012. LNCS, vol. 7494, pp. 111–122. Springer, Heidelberg (2012)
6. Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987)

© Springer International Publishing Switzerland 2015

## Authors and Affiliations

• Mingmin Chen (1)
• Shizhuo Yu (1)
• Parisa Kianmajd (1)
• Nico Franz (2)
• Shawn Bowers (3)
• Bertram Ludäscher (1)

1. Department of Computer Science, UC Davis, Davis, USA
2. School of Life Sciences, Arizona State University, Tempe, USA
3. Department of Computer Science, Gonzaga University, Spokane, USA