A 1NF temporal relational model and algebra coping with valid-time temporal indeterminacy

Anselma, Luca; Piovesan, Luca; Terenziani, Paolo

doi:10.1007/s10844-015-0367-2

Published: 24 June 2015

Volume 47, pages 345–374, (2016)

Journal of Intelligent Information Systems

Luca Anselma¹,
Luca Piovesan¹ &
Paolo Terenziani²

Abstract

In the real world, many phenomena are time related, and in the last three decades the database community has devoted much work to dealing with “time of facts” in databases. While many approaches incorporating time in the relational model have already been devised, most of them assume that the exact time of facts is known. However, this assumption does not hold in many practical domains, in which temporal indeterminacy of facts occurs. The treatment of valid-time indeterminacy requires in-depth extensions to the current relational approaches. In this paper, we propose a theoretically grounded approach to cope with this issue, overcoming the limitations of related approaches in the literature. In particular, we present a 1NF temporal relational model and propose a new temporal relational algebra to query it. We also formally study the properties of the new data model and algebra, thus granting that our approach is interoperable with pre-existent temporal and non-temporal relational approaches, and is implementable on top of them. Finally, we consider computational complexity, showing that only a limited overhead is added when moving from determinate to indeterminate time.

Notes

In Example 1, the temporal indeterminacy stems from the fact that the history of an infectious disease can be described from two different points of view: a clinical history and a bacteriological one. Roughly speaking, the clinical history corresponds to the evolution of the symptoms in the patient, and can be reasonably determined by observing both subjective and objective parameters. On the other hand, the bacteriological history describes the entire life cycle of the presence of the pathogen in the host organism, starting from contagion and ending with the complete elimination of the agent. Taking into account the bacteriological history is of fundamental importance because, for example, patients can be infectious even in the absence of symptoms, or because they run the risk of suffering a relapse even after the disappearance of the symptoms. However, unlike the clinical history, the bacteriological one is often not completely observable. In this case the physician can only make reasonable assumptions about it. For instance, at the time of diagnosis the physician can determine (observing, e.g., a pulmonary infiltrate) that Bill suffered from pneumonia from the appearance of the symptoms, which occurred on March 17th, until the disappearance of the symptoms on April 2nd. These times represent the clinical history of the disease. Considering the incubation period of pneumonia and its remission period, the physician can also reasonably assume that the contagion started from 1 to 20 days before the symptom appearance and that Bill’s body will completely eliminate the pneumonia pathogen in a time that may vary from the symptom disappearance to four weeks later.
In Example 2, a homogeneous group of patients is given a chemotherapeutic drug A and they exhibit severe (debilitating) nausea. In order to demonstrate the improvements that the drug A brings to the quality of life of patients, the analyst considers the presence of the nausea side effect also in a comparable group of patients treated with a different drug, drug B.
In many TDB approaches two independent time dimensions have been identified, namely transaction time and valid time. Valid time represents the time when the fact described by a tuple holds in the modeled world. Transaction time represents the time when a tuple is present in the database. Temporal indeterminacy may only concern valid time, since transaction time (i.e., the database insertion/deletion time) is always known in an exact way. As a consequence, in this paper, we just focus on valid time. Extensions to cope also with transaction time are easy since transaction time can be coped with as in the other TDB approaches, e.g., as in TSQL2.
In the relation DISEASES^DET we consider determinate time only. Thus, the valid time of the tuple regarding Bill’s pneumonia refers only to the clinical history of the disease.
As in many TDB approaches, for the sake of convenience time intervals $[c_{s}, c_{e})$ are closed on the left and open on the right (i.e., their left bound is included and their right bound is excluded). Notice, however, that our approach is not dependent on such a choice.
Of course, more efficient implementations of determinate-time relations (with just two temporal attributes) can be easily provided.
The min and max functions have the obvious meanings. The increment function can be defined as $ c+1 = c^{\prime }\in T^{C} \setminus c^{\prime }> c \wedge \nexists c^{\prime \prime }\in T^{C} (c<c^{\prime \prime }<c^{\prime }) $.

References

Allen, J.F. (1991). Time and time again: the many ways to represent time. International Journal of Intelligent Systems, 6(4), 341–355.
Anselma, L., Bottrighi, A., Montani, S., & Terenziani, P. (2013a). Extending BCDM to cope with proposals and evaluations of updates. IEEE Transactions on Knowledge and Data Engineering, 25(3), 556–570. doi:10.1109/TKDE.2011.170.
Anselma, L., Stantic, B., Terenziani, P., & Sattar, A. (2013b). Querying now-relative data. Journal of Intelligent Information Systems, 41(2), 285–311. doi:10.1007/s10844-013-0245-8.
Anselma, L., Terenziani, P., & Snodgrass, R.T. (2013c). Valid-time indeterminacy in temporal relational databases: Semantics and representations. IEEE Transactions on Knowledge and Data Engineering, 25(12), 2880–2894. doi:10.1109/TKDE.2012.199.
Brusoni, V., Console, L., Terenziani, P., & Pernici, B. (1999). Qualitative and quantitative temporal constraints and relational databases: Theory, architecture, and applications. IEEE Transactions on Knowledge and Data Engineering, 11(6), 948–968.
Chomicki, J., & Toman, D. (2009). Temporal relational calculus. In L. Liu, & M.T. Özsu (Eds.), Encyclopedia of database systems (pp. 3015–3016). US: Springer. doi:10.1007/978-0-387-39940-9_1531.
Codd, E.F. (1971). Further normalization of the data base relational model. San Jose: IBM Research Report.
Codd, E.F. (1972). Relational completeness of data base sublanguages. In R. Rustin (Ed.), Database systems (pp. 65–98). San Jose: Prentice Hall and IBM Research Report RJ987.
Combi, C., Cucchi, G., & Pinciroli, F. (1997). Applying object-oriented technologies in modeling and querying temporally oriented clinical databases dealing with temporal granularity and indeterminacy. IEEE Transactions on Information Technology in Biomedicine, 1(2), 100–127.
Das, A.K., & Musen, M.A. (1994). A temporal query system for protocol-directed decision support. Methods of Information in Medicine, 33(4), 358–370. PMID:7799812.
Dekhtyar, A., Ross, R.B., & Subrahmanian, V.S. (2001). Probabilistic temporal databases, i: algebra. ACM Transactions on Database Systems, 26(1), 41–95.
Dunn, J., Davey, S., Descour, A., & Snodgrass, R.T. (2002). Sequenced subset operators: Definition and implementation. In R. Agrawal, & K.R. Dittrich (Eds.), ICDE (pp. 81–92): IEEE Computer Society.
Dutta, S. (1989). Generalized events in temporal databases. In Proceedings Fifth International Conference on Data Engineering, 1989 (pp. 118–125). doi:10.1109/ICDE.1989.47207.
Dyreson, C.E. (2009). Temporal indeterminacy. In L. Liu, & M.T. Özsu (Eds.), Encyclopedia of database systems (pp. 2973–2976). US: Springer.
Dyreson, C.E., & Snodgrass, R.T. (1998). Supporting valid-time indeterminacy. ACM Transactions on Database Systems, 23(1), 1–57.
Emerson, E.A. (1990). Temporal and modal logic. In J. van Leeuwen (Ed.), Handbook of theoretical computer science, volume B: Formal models and semantics (pp. 995–1072): The MIT Press.
Gadia, S.K., Nair, S.S., & Poon, Y.C. (1992). Incomplete information in relational temporal databases. In L.Y. Yuan (Ed.), VLDB (pp. 395–406): Morgan Kaufmann.
Jensen, C., & Snodgrass, R. (2008). Temporal Database Entries for the Springer Encyclopedia of Database Systems, TimeCenter Technical Report, Timecenter.
Jensen, C.S., & Snodgrass, R.T. (1996). Semantics of time-varying information. Information Systems, 21(4), 311–352.
Jensen, C.S., & Snodgrass, R.T. (1999). Temporal data management. IEEE Transactions on Knowledge and Data Engineering, 11(1), 36–44.
McKenzie, L.E., & Snodgrass, R.T. (1991). Evaluation of relational algebras incorporating the time dimension in databases. ACM Computing Surveys, 23(4), 501–543.
Özsoyoglu, G., & Snodgrass, R.T. (1995). Temporal and real-time databases: A survey. IEEE Transactions on Knowledge and Data Engineering, 7(4), 513–532.
Snodgrass, R.T. (1982). Monitoring distributed systems: A relational approach. PhD thesis, Computer Science Department. Pittsburgh: Carnegie Mellon University.
Snodgrass, R.T. (Ed.) (1995). The TSQL2 Temporal Query Language: Kluwer.
Snodgrass, R.T. (1999). Developing time-oriented database applications in SQL: Morgan Kaufmann.
Stantic, B., Terenziani, P., Governatori, G., Bottrighi, A., & Sattar, A. (2012). An implicit approach to deal with periodically repeated medical data. Artificial Intelligence in Medicine, 55(3), 149–162.
Tansel, A.U., Clifford, J., Gadia, S.K., Jajodia, S., Segev, A., & Snodgrass, R.T. (Eds.) (1993). Temporal Databases: Theory, Design, and Implementation: Benjamin/Cummings.
Terenziani, P. (2003). Symbolic user-defined periodicity in temporal relational databases. IEEE Transactions on Knowledge and Data Engineering, 15(2), 489–509. doi:10.1109/TKDE.2003.1185847.
Terenziani, P. (2012). Temporal aggregation on user-defined granularities. Journal of Intelligent Information Systems, 38(3), 785–813. doi:10.1007/s10844-011-0179-y.
Terenziani, P. (2013). Coping with events in temporal relational databases. IEEE Transactions on Knowledge and Data Engineering, 25(5), 1181–1185. doi:10.1109/TKDE.2011.265.
Terenziani, P., & Snodgrass, R. (2004). Reconciling point-based and interval-based semantics in temporal relational databases: a treatment of the telic/atelic distinction. IEEE Transactions on Knowledge and Data Engineering, 16(5), 540–551. doi:10.1109/TKDE.2004.1277816.
Terenziani, P., Snodgrass, R.T., Bottrighi, A., Torchio, M., & Molino, G. (2007). Extending temporal databases to deal with telic/atelic medical data. Artificial Intelligence in Medicine, 39(2), 113–126.
Vila, L. (1994). A survey on temporal reasoning in artificial intelligence. AI Communications, 7(1), 4–28.
Wu, Y., Jajodia, S., & Wang, X.S. (1997). Temporal database bibliography update. In Temporal Databases, Dagstuhl (pp. 338–366).

Acknowledgments

The authors are very much indebted to R.T. Snodgrass for many enlightening suggestions and invaluable support he gave us in the preliminary stages of this work.

The work described in this paper was partially supported by Compagnia di San Paolo in the Ginseng project.

Author information

Authors and Affiliations

Dipartimento di Informatica, Università degli Studi di Torino, Torino, Italy
Luca Anselma & Luca Piovesan
DISIT, Università del Piemonte Orientale “Amedeo Avogadro”, Alessandria, Italy
Paolo Terenziani

Corresponding author

Correspondence to Luca Piovesan.

Appendices

Appendix A: Proofs

Let us recall from the text the following notation.

Given a tuple $ t = (v_{1}, \dots , v_{n} | d_{s}, d_{e}, i_{s}, i_{e}) $, $\langle d, i\rangle$ represents its temporal component, where d stands for the determinate time interval $[d_{s}, d_{e})$ and i stands for the indeterminate time interval $[i_{s}, i_{e})$.

Proof (Property 4)

We consider the relational operators of Cartesian product and of difference. The proof for the other operators is easy.

Cartesian Product

In the case of determinate ITEs, the ITE intersection results in

$$\langle d,d\rangle \cap^{ITE} \langle d^{\prime},d^{\prime}\rangle = \langle d\cap d^{\prime}, d\cap d^{\prime}\rangle $$

which, for the property of consistent extension on ITEs, is equivalent to the determinate temporal element

$$d\cap d^{\prime}$$

Therefore, the definition of temporally indeterminate Cartesian product

$$\begin{array}{ll}r \times^{TI} s& = \{ (v_{r} \cdot v_{s}|\langle d,i\rangle ) \setminus\\ &\exists \langle d_{r}, i_{r}\rangle ,\langle d_{s}, i_{s}\rangle ((v_{r}|\langle d_{r}, i_{r}\rangle) \in r \wedge (v_{s}| \langle d_{s}, i_{s}\rangle )\in s \wedge\\ &\langle d, i\rangle = \langle d_{r}, i_{r}\rangle \cap^{ITE} \langle d_{s}, i_{s}\rangle \wedge i\neq \emptyset ) \} \end{array} $$

is equivalent to the definition of TSQL2 Cartesian product considering valid time only:

$$\begin{array}{ll}r \times^{T} s& = \{ (v_{r} \bullet v_{s}|t ) \setminus\\ &\exists t_{r}, t_{s} ((v_{r}|t_{r} ) \in r \wedge (v_{s}|t_{s} ) \in s \wedge t = t_{r} \cap t_{s} \wedge t \neq \emptyset) \} \end{array} $$

Difference

Now we examine the relational difference in our approach, taking into consideration the case where only determinate time is dealt with. The definition of temporally indeterminate relational difference, in case we deal with determinate times (each represented by identical determinate and indeterminate intervals), can be written as:

$$\begin{array}{ll}r -^{TI} s& = \{ (v|\langle d, d\rangle ) \setminus (\exists \langle d_{r},d_{r}\rangle ((v|\langle d_{r}, d_{r}\rangle)\in r \wedge\\ &\nexists \langle d_{s}, d_{s}\rangle ((v|\langle d_{s}, d_{s}\rangle )\in s \wedge \langle d, d\rangle = \langle d_{r}, d_{r}\rangle )) ) \vee\\ &(\exists \langle d_{r}, d_{r}\rangle ((v|\langle d_{r},d_{r}\rangle )\in r \wedge \exists ! (v|\langle d_{1}, d_{1}\rangle ), \dots,\\ &(v|\langle d_{k},d_{k}\rangle ) ((v|\langle d_{1},d_{1}\rangle )\in s, \dots, (v|\langle d_{k},d_{k}\rangle )\in s \wedge\\ &\langle d,d\rangle = \langle d_{r}, d_{r}\rangle -^{ITE} \{\langle d_{1}, d_{1}\rangle , \dots, \langle d_{k}, d_{k}\rangle \} \wedge\\ &d\neq \emptyset)) ) \} \end{array} $$

In the case of determinate ITEs, the ITE difference can be written as:

$$\begin{array}{ll}\langle d,d\rangle -^{ITE}& \{\langle d^{\prime}_{1}, d^{\prime}_{1}\rangle , \dots, \langle d^{\prime}_{k}, d^{\prime}_{k}\rangle \} =\\ &cover(chr(d) - (chr(d^{\prime}_{1}) \cup {\dots} \cup chr(d^{\prime}_{k})),\\ &chr(d) - (chr(d^{\prime}_{1}) \cup {\dots} \cup chr(d^{\prime}_{k}))) \end{array} $$

The cover function returns the ITEs $ \left \langle d^{\prime \prime }_{j}, d^{\prime \prime }_{j}\right \rangle $ where each $ d^{\prime \prime }_{j} $ corresponds to the maximal convex set of chronons in $chr(d) - \left (chr(d^{\prime }_{1}) \cup \dots \cup chr(d^{\prime }_{k})\right )$.

Substituting in the difference, we have:

$$\begin{array}{ll} r -^{TI} s& = \{ (v|\langle d, d\rangle ) \setminus (\exists \langle d_{r},d_{r}\rangle ((v|\langle d_{r}, d_{r}\rangle )\in r \wedge\\ &\nexists \langle d_{s}, d_{s}\rangle ((v|\langle d_{s}, d_{s}\rangle )\in s \wedge \langle d, d\rangle = \langle d_{r}, d_{r}\rangle )) ) \vee \\ &(\exists \langle d_{r}, d_{r}\rangle ((v|\langle d_{r},d_{r}\rangle )\in r \wedge \exists ! (v|\langle d_{1}, d_{1}\rangle ), \dots,\\ &(v|\langle d_{k},d_{k}\rangle ) ((v|\langle d_{1},d_{1}\rangle )\in s, \dots, (v|\langle d_{k},d_{k}\rangle )\in s \wedge \\ &\langle d,d\rangle = cover(chr(d_{r}) - (chr(d_{1}) \cup {\dots} \cup chr(d_{k})),\\ &chr(d_{r}) - (chr(d_{1}) \cup {\dots} \cup chr(d_{k}))) \wedge d\neq \emptyset)) ) \}. \end{array} $$

Now we report the definition of relational difference of TSQL2.

$$\begin{array}{ll} r -^{B} s &= \{z \setminus \exists x \in r (z[A] = x[A] \wedge\\ &\exists t \in cover^{B}(bi\_chr(x[TT], x[VT])-\\ &\{bi\_chr(y[TT], y[VT]) \setminus y \in s \wedge y[A] = x[A]\}) \wedge \\ &z[TT_{s}] = min\_1(t) \wedge z[TT_{e}] = max\_1(t) \wedge \\ &z[VT_{s}] = min\_2(t) \wedge z[VT_{e}] = max\_2(t))\} \end{array} $$

where A, TT, VT represent the non-temporal, transaction-time and valid-time attributes, respectively, and the subscripts s and e represent the starting and ending chronons of the interval.

Considering relations with valid time only, the definition may be simplified as:

$$\begin{array}{ll} r -^{V} s& = \{z \setminus \exists x \in r (z[A] = x[A] \wedge \exists t \in cover^{V}(chr(x[VT]) -\\ &\{chr(y[VT]) \setminus y \in s \wedge y[A] = x[A]\}) \wedge\\ &z[VT_{s}] = min(t) \wedge z[VT_{e}] = max(t))\} \end{array} $$

Now we prove that $-^{TI}$ and $-^{V}$ are equivalent. In the definition of $-^{TI}$, we provide for two cases.

The first disjunct of $-^{TI}$ corresponds to the case where s contains no tuple that is value-equivalent to the tuple $(v|\langle d_{r}, d_{r}\rangle )$ in r; in this case the tuple $(v|\langle d_{r}, d_{r}\rangle )$ is included in the result. Also $-^{V}$, since in this case the set $\{y[VT] \setminus y \in s \wedge y[A] = x[A]\}$ is empty, includes in the result the tuples in r with no value-equivalent tuples in s.

The second disjunct of $-^{TI}$ corresponds to the case where a tuple $(v|\langle d_{r}, d_{r}\rangle )$ in r has the value-equivalent tuples $(v|\langle d_{1}, d_{1}\rangle ), \dots , (v|\langle d_{k}, d_{k}\rangle )$ in s. In the definition of $-^{V}$, the set $\{y[VT] \setminus y \in s \wedge y[A] = x[A]\}$ corresponds to the same value-equivalent tuples $(v|\langle d_{1}, d_{1}\rangle ), \dots , (v|\langle d_{k}, d_{k}\rangle )$ in s. Thus, both $-^{TI}$ and $-^{V}$ perform set difference between the same sets of chronons. In $-^{TI}$ the $cover$ function returns the minimum and maximum chronons in the convex sets of the set difference, whereas in $-^{V}$ the $cover^{V}$ function returns only the convex sets and the minimum and maximum chronons are determined in the definition of $-^{V}$. For the consistent extension property on ITEs, an ITE $\langle d, d\rangle$ is equivalent to a determinate temporal element d. It is worth noticing that neither $-^{TI}$ nor $-^{V}$ returns tuples with empty temporal elements, because of the clause $d \neq \emptyset$ for $-^{TI}$ and because of the existential quantification $\exists t$ for $-^{V}$. □
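The determinate-case difference used in this proof can be illustrated with a small sketch. It assumes, purely for illustration, that chronons are integers and that an interval is a half-open pair `(start, end)`; the names `chr_`, `cover` and `determinate_difference` are hypothetical and only mirror the roles of $chr$ and $cover^{V}$ in the formulas above.

```python
def chr_(interval):
    """Chronons of a half-open interval [start, end); integer chronons are an assumption."""
    start, end = interval
    return set(range(start, end))

def cover(chronons):
    """Return the maximal convex (contiguous) runs of chronons as half-open intervals."""
    result = []
    run_start = prev = None
    for c in sorted(chronons):
        if run_start is None:
            run_start = prev = c          # open the first run
        elif c == prev + 1:
            prev = c                      # extend the current run
        else:
            result.append((run_start, prev + 1))
            run_start = prev = c          # a gap closes the run and opens a new one
    if run_start is not None:
        result.append((run_start, prev + 1))
    return result

def determinate_difference(d, subtrahends):
    """cover(chr(d) - (chr(d_1) U ... U chr(d_k))) for determinate intervals."""
    remaining = chr_(d)
    for s in subtrahends:
        remaining -= chr_(s)
    return cover(remaining)
```

An empty list is returned when the subtrahends cover the whole minuend, which corresponds to the clause $d \neq \emptyset$ excluding the tuple from the result.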

Proof (Property 5)

For the sake of brevity, we prove the property considering the Cartesian product operator. The proofs for the other operators are similar. Let r and s be ITE relations with schemas (A|T) and (B|T) respectively, where A, B and T stand for the attributes $ \{A_{1},\dots , A_{l}\} $, $\{B_{1}, \dots , B_{m}\}$ and $\{D_{s}, D_{e}, I_{s}, I_{e}\}$ respectively, then

$$\rho^{TI}_{t}\left( r \times^{TI} s\right) = \rho^{TI}_{t}(r) \times \rho^{TI}_{t}(s) $$

where ×^TI is the ITE Cartesian product, × is the standard non-temporal Cartesian product and $\rho _{t}^{TI}$ is the timeslice operator. We show the equivalence by proving the two inclusions separately, i.e., we prove that the left-hand side of the formula (henceforth lhs) implies the right-hand side (henceforth rhs) and that the rhs implies the lhs.

$\mathbf {(x^{\prime \prime } \in lhs \Rightarrow x^{\prime \prime } \in rhs)}$

Let $x^{\prime \prime } \in lhs$. Then, by the definition of $\rho _{t}^{TI}$, there exists a tuple $x^{\prime } \in \left (r \times ^{TI} s\right )$ such that $x^{\prime }[A, B]= x^{\prime \prime }[A, B]$ and $t\in x^{\prime }[D]$.

By the definition of $\times^{TI}$, there exist tuples $x_{1} \in r$ and $x_{2} \in s$ such that $x_{1}[A] = x^{\prime }[A], x_{2}[B] = x^{\prime }[B]$ and $x_{1}[T]\cap x_{2}[T]= x^{\prime }[T]$.

Then, by the definition of $\rho _{t}^{TI}$, there exists a tuple $x_{1}^{\prime }\in \rho _{t}^{TI}(r)$ such that $x_{1}^{\prime }[A] = x_{1}[A] = x^{\prime }[A]$, and there exists a tuple $x_{2}^{\prime }\in \rho _{t}^{TI}(s)$ such that $x_{2}^{\prime }[B] = x_{2}[B] = x^{\prime }[B]$.

Therefore, by the definition of ×, there exists $x_{12}^{\prime \prime } \in rhs$ such that $x_{12}^{\prime \prime }[A] = x_{1}^{\prime }[A]$ and $x_{12}^{\prime \prime }[B] =x_{2}^{\prime }[B]$.

By construction, $x_{12}^{\prime \prime } = x^{\prime \prime }$.

$\mathbf {(x^{\prime \prime } \in rhs \Rightarrow x^{\prime \prime } \in lhs)}$

Now assume $x^{\prime \prime }\in rhs$. Then, by definition of ×, there exist tuples $x_{1}^{\prime }\in \rho _{t}^{TI}(r)$ and $x_{2}^{\prime }\in \rho _{t}^{TI}(s)$ such that $x_{1}^{\prime }[A] = x^{\prime \prime }[A]$ and $x_{2}^{\prime }[B] = x^{\prime \prime }[B]$.

By the definition of $\rho _{t}^{TI}$, there exists a tuple $x_{1} \in r$ such that $x_{1}[A] = x_{1}^{\prime }[A]$ and $t\in x_{1}[D]$, and there exists a tuple $x_{2} \in s$ such that $x_{2}[B] = x_{2}^{\prime }[B]$ and $t\in x_{2}[D]$.

Then by definition of $\times^{TI}$ there must exist a tuple $x^{\prime } \in (r \times ^{TI} s)$ such that $x^{\prime }[A] = x_{1}[A]$, $x^{\prime }[B] = x_{2}[B]$, $x^{\prime }[T] = x_{1}[T]\cap x_{2}[T]$ and $t\in x^{\prime }[D]$.

Then, by definition of $\rho _{t}^{TI}$, there exists a tuple $x_{12}^{\prime \prime } \in lhs$ such that $x_{12}^{\prime \prime }[A, B] = x^{\prime }[A, B]$.

By construction, $x_{12}^{\prime \prime } = x^{\prime \prime }$. □
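The commutation stated by Property 5 can be checked on a toy instance. The sketch below is not the paper's definition of the operators: it assumes integer chronons, represents a tuple as `(values, d, i)` with `d` possibly `None` when the determinate interval is empty, assumes that the ITE intersection is taken component-wise on `d` and `i`, and lets the timeslice select tuples whose determinate interval contains t (as in the condition $t \in x^{\prime}[D]$ used in the proof). All function names are hypothetical.

```python
def intersect(a, b):
    """Intersection of two half-open intervals, or None if it is empty."""
    if a is None or b is None:
        return None
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

def product_ti(r, s):
    """Temporally indeterminate Cartesian product (component-wise intersection assumed)."""
    out = set()
    for (vr, dr, ir) in r:
        for (vs, ds, is_) in s:
            i = intersect(ir, is_)
            if i is not None:              # clause i != empty set
                out.add((vr + vs, intersect(dr, ds), i))
    return out

def timeslice(rel, t):
    """Non-temporal tuples whose determinate interval contains chronon t."""
    return {v for (v, d, i) in rel if d is not None and d[0] <= t < d[1]}

def product(r, s):
    """Standard non-temporal Cartesian product."""
    return {vr + vs for vr in r for vs in s}

# Hypothetical relations; every determinate interval lies inside its indeterminate one.
r = {(("a",), (3, 6), (1, 8)), (("b",), None, (2, 4))}
s = {(("x",), (5, 9), (4, 10)), (("y",), (0, 2), (0, 3))}

def property5_holds(r, s, chronons):
    return all(timeslice(product_ti(r, s), t) == product(timeslice(r, t), timeslice(s, t))
               for t in chronons)
```

On this instance the two sides agree at every chronon; at t = 5, for example, both yield the single tuple `("a", "x")`.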

Appendix B: Algorithms

In Section 4 we have proposed a definition of the relational difference between two temporally indeterminate relations based on an abstract definition of the difference between an ITE (henceforth minuend) and a set of ITEs (henceforth subtrahends). For the sake of clarity, the ITE difference in Section 4 was based on a conversion from ITEs to sets of chronons and back. The definition is very general, covering all the possible alternative solutions (since there are, in general, multiple equivalent ways of converting the chronons in the result into a set of ITEs).

On the other hand, in this Appendix we propose an actual algorithm to perform ITE difference. The algorithm is based on the abstract definition of Section 4, but it is more efficient, since it directly operates on time intervals instead of sets of chronons. Also, it is based on a specific partitioning policy. Before detailing the algorithm, we need to introduce some useful concepts.

The basic issue with temporal relational difference (independently of whether determinate or indeterminate time intervals are adopted) is that interval difference must, in general, be performed between sets of intervals. Even in the determinate case, the difference between an interval $[s_{1}, e_{1})$ and an interval $[s_{2}, e_{2})$ contained in it (i.e., such that $s_{1} < s_{2} < e_{2} < e_{1}$) results in two intervals $[s_{1}, s_{2})$ and $[e_{2}, e_{1})$. Thus, even if the operation starts with the difference between one interval and a set of intervals (one for each value-equivalent tuple), intermediate computational steps must consider difference between two sets of intervals. In general, such an operation would require quadratic time. However, this complexity can be reduced by exploiting ordering (the ordering between intervals can be trivially defined on the basis of the temporal ordering of their endpoints). We exploit such an idea also in our case, in which ITEs are considered (instead of determinate intervals). To do so, we introduce the notion of (ordered) list of “Typed Intervals”.
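For the determinate case, the benefit of ordering can be sketched as follows. This is only an illustration under the assumption of integer chronons and half-open `(start, end)` pairs; `subtract_sorted` is a hypothetical name, not one of the paper's algorithms. After sorting the subtrahends, a single left-to-right pass suffices, because the not-yet-processed part of the minuend only shrinks from the left.

```python
def subtract_sorted(minuend, subtrahends):
    """Subtract a set of half-open intervals from one half-open interval."""
    start, end = minuend
    result = []
    for s, e in sorted(subtrahends):       # ordering step, O(n log n)
        if e <= start:
            continue                       # entirely before what is left of the minuend
        if s >= end:
            break                          # this and all later subtrahends start after the minuend
        if s > start:
            result.append((start, s))      # surviving piece to the left of the subtrahend
        start = max(start, e)              # what is left begins after the subtrahend
        if start >= end:
            break
    if start < end:
        result.append((start, end))        # surviving tail
    return result
```

With the intervals of the text, subtracting $[s_{2}, e_{2})$ from $[s_{1}, e_{1})$ yields the two pieces $[s_{1}, s_{2})$ and $[e_{2}, e_{1})$; for instance, `subtract_sorted((1, 10), [(3, 5)])` gives `[(1, 3), (5, 10)]`.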

A Typed (Temporal) Interval (henceforth TY) represents a convex set of chronons, which are all “labeled” either as determinate (DET) or as indeterminate (INDET). A TY is completely described by the triple $\langle start, end, type\rangle$, where $start, end \in T^{C}$ are the starting and ending points of the TY (as in the ITE representation, a TY interval $[start, end)$ includes the starting chronon and excludes the ending one) and $type \in \{DET, INDET\}$. Hereinafter, for the sake of brevity, we will use the dot notation for TY (e.g., if ty is a TY, ty.start is the starting point of ty).

We use three relations between TYs. In particular, given two TYs $ty_{1}$ and $ty_{2}$,

$before(ty_{1}, ty_{2})$ stands for $ty_{1}.end \leq ty_{2}.start$
$meets(ty_{1}, ty_{2})$ stands for $ty_{1}.end = ty_{2}.start$
$overlaps(ty_{1}, ty_{2})$ stands for $ty_{1} \cap ty_{2} \neq \emptyset $
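The TY triple and the three relations translate almost literally into code. The sketch below assumes integer chronons; the class and function names simply follow the text, and the string constants for the two types are an illustrative choice.

```python
from dataclasses import dataclass

DET, INDET = "DET", "INDET"

@dataclass(frozen=True)
class TY:
    start: int   # starting chronon, included
    end: int     # ending chronon, excluded
    type: str    # DET or INDET

def before(ty1: TY, ty2: TY) -> bool:
    return ty1.end <= ty2.start

def meets(ty1: TY, ty2: TY) -> bool:
    return ty1.end == ty2.start

def overlaps(ty1: TY, ty2: TY) -> bool:
    # the two half-open intervals share at least one chronon
    return max(ty1.start, ty2.start) < min(ty1.end, ty2.end)
```

Note that, with these definitions, `meets` is a special case of `before`, and two TYs that meet do not overlap because the ending chronon is excluded.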

In the algorithms below, we will also use the notion of List of TYs. A List of TYs is a collection of TYs such that it is:

maximal in the sense that $ty_{1}, ty_{2} \in l \wedge meets(ty_{1},\, ty_{2}) \Rightarrow ty_{1}.type \neq ty_{2}.type$;
without intersections, i.e., $ty_{1}, ty_{2} \in l \Rightarrow \neg overlaps(ty_{1},\, ty_{2})$;
ordered, i.e., in a List of TYs $(ty_{1}, \dots , ty_{i}, \dots , ty_{j}, \dots , ty_{n})$ $i<j \Rightarrow before(ty_{i},\, ty_{j})$.

Given l: list_of_ty, we denote with l.size the number of elements contained in l. l[i] is the i-th element of the list l (with 1 ≤ i ≤ l.size). We also use the notation “append el to l” and “remove from l element in position i” to denote, respectively, insertion in the last position and the classical deletion of an element from the list l. In order to grant maximality, if an element is added to a List of TYs and it meets the subsequent element or the previous one meets it and their types are equal, they are automatically merged.
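A minimal sketch of the “append” operation with automatic merging, together with a check of the three List of TYs properties, is given below. It assumes integer chronons and 0-based Python lists (the text indexes from 1); `append` and `is_list_of_tys` are hypothetical helper names. Since append inserts in the last position, only the case in which the previous element meets the new one has to be handled.

```python
from typing import NamedTuple

class TY(NamedTuple):
    start: int   # starting chronon, included
    end: int     # ending chronon, excluded
    type: str    # "DET" or "INDET"

def append(l, el):
    """Append el in last position; merge with the previous TY if it meets el and has the same type."""
    if l and l[-1].end == el.start and l[-1].type == el.type:
        l[-1] = TY(l[-1].start, el.end, el.type)
    else:
        l.append(el)

def is_list_of_tys(l):
    """Check that l is ordered, without intersections, and maximal."""
    for prev, nxt in zip(l, l[1:]):
        if prev.end > nxt.start:                           # not before: unordered or overlapping
            return False
        if prev.end == nxt.start and prev.type == nxt.type:
            return False                                   # equal-typed TYs that meet: not maximal
    return True
```

For example, appending `TY(3, 5, "INDET")` to a list ending with `TY(1, 3, "INDET")` leaves a single element `TY(1, 5, "INDET")`.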

In addition, two conversion operations are defined: the toTY operation converts a set of ITEs into a List of TYs, and the toITE operation converts a List of TYs into a set of ITEs. Given the previous notions, we now describe the algorithm for difference between ITEs (see Algorithm 6). It is basically divided into three phases:

1. Both the minuend and the subtrahends are converted into Lists of TYs (Algorithms 1, 2, 3). This operation is performed by using the toTY function (Algorithm 1) that, given as input a set of ITEs set, returns a List of TYs. In the basic case, in which set contains only one element, called ite, the number of returned elements depends on the structure of ite: in case its determinate interval $[d_{s}, d_{e})$ is empty, a single INDET TY is returned; otherwise a list is returned containing an INDET TY representing $[i_{s}, d_{s})$ (if not empty), a DET one representing $[d_{s}, d_{e})$ and an INDET one representing $[d_{e}, i_{e})$ (if not empty).

In the case in which set contains more than one element (e.g., for the subtrahends), set is partitioned into two subsets, then, for each of them, a List of TYs is obtained separately. Finally, the two lists are combined by the merge algorithm (see Algorithm 2), which grants that the list in the result respects the properties previously mentioned (i.e., it is maximal, without intersections and ordered).
2. The subtrahends are subtracted from the minuend (Algorithms 6, 4). A many-to-many difference between the List of TYs deriving from the minuend and the one obtained from the subtrahends needs to be performed. Exploiting the ordering of both lists, each of them is visited only once. In particular, if an element of subtrahend_list is before the element minuend_el currently considered for the minuend, there is no need to compare it with the following elements of the minuend. On the other hand, if the current minuend_el is before the i-th element of subtrahend_list, there is no need to compare minuend_el with the following elements of subtrahend_list. Thus, Algorithm 6 takes into account each element subtrahend_list[j] of subtrahend_list until it finds a subtrahend_list[j] that is not before minuend_el. There are three cases:
- the end of subtrahend_list is reached. If j > subtrahend_list.size, minuend_el and the following elements of minuend_list do not intersect with any element in the subtrahend. Thus, they are added to the result.
- an element subtrahend_list[j] is found such that minuend_el is before subtrahend_list[j]. In this case, the current minuend_el can be added to the result as it is and the algorithm continues with the next TY of minuend_list (and with the current subtrahend_list[j]).
- subtrahend_list[j] overlaps minuend_el. In such a case, the difference between minuend_el and subtrahend_list[j] is computed (Algorithm 4). The result of the difference between two TYs is a List of TYs composed of up to three TYs. All the elements of this list, except for the last one, can be inserted in the result, while the last one (if there is at least one element) takes the place of minuend_el in Algorithm 6 and is compared with the following elements of subtrahend_list.
3. A set of ITEs representing the result of the difference between TYs is obtained (Algorithm 5). This operation is performed by the function toITE (see Algorithm 5), which accomplishes, for the TYs, the same task as the cover function of Fig. 2. Given the particular structures used in this implementation, the partition policy of our algorithm tends to create ITEs similar to the ones shown in the upper part of Fig. 3.
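The first two phases can be sketched as follows. This is not a transcription of Algorithms 1 to 6, which are not reproduced here, and it rests on several assumptions: chronons are integers, an ITE is a tuple `(d_s, d_e, i_s, i_e)`, an empty determinate interval is encoded as `d_s >= d_e`, and the subtrahend list comes from a single ITE (the merge of several subtrahends is omitted). The rule for subtracting overlapping TYs is also an assumption, inferred from the two-argument cover in the determinate case: a DET subtrahend removes the overlapping chronons, while an INDET subtrahend leaves them in the result as INDET. The toITE phase is omitted because it depends on the partitioning policy.

```python
from typing import NamedTuple

class TY(NamedTuple):
    start: int   # starting chronon, included
    end: int     # ending chronon, excluded
    type: str    # "DET" or "INDET"

def append(lst, el):
    """Append in last position, skipping empty TYs and merging equal-typed TYs that meet."""
    if el.start >= el.end:
        return
    if lst and lst[-1].end == el.start and lst[-1].type == el.type:
        lst[-1] = TY(lst[-1].start, el.end, el.type)
    else:
        lst.append(el)

def to_ty(ite):
    """Phase 1 for a single ITE: up to three TYs (INDET, DET, INDET)."""
    d_s, d_e, i_s, i_e = ite
    if d_s >= d_e:                          # assumed encoding of an empty determinate interval
        return [TY(i_s, i_e, "INDET")]
    out = []
    append(out, TY(i_s, d_s, "INDET"))
    append(out, TY(d_s, d_e, "DET"))
    append(out, TY(d_e, i_e, "INDET"))
    return out

def difference(minuend_list, subtrahend_list):
    """Phase 2: one pass over both ordered lists."""
    result, j = [], 0
    for cur in minuend_list:
        while cur is not None:
            # skip subtrahends that are before the current minuend piece
            while j < len(subtrahend_list) and subtrahend_list[j].end <= cur.start:
                j += 1
            # end of subtrahend_list reached, or cur is before subtrahend_list[j]
            if j == len(subtrahend_list) or cur.end <= subtrahend_list[j].start:
                append(result, cur)
                break
            s = subtrahend_list[j]          # s overlaps cur
            lo, hi = max(cur.start, s.start), min(cur.end, s.end)
            append(result, TY(cur.start, lo, cur.type))    # piece of cur before s
            if s.type == "INDET":                          # assumed rule, see lead-in
                append(result, TY(lo, hi, "INDET"))
            cur = TY(hi, cur.end, cur.type) if hi < cur.end else None
    return result
```

As in the description above, the piece of the minuend element that lies after the current subtrahend takes its place and is compared with the following subtrahends, so the index `j` never moves backwards.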

1.1 Discussion on Complexity

Complexity (Difference between two sets of ITEs).

Suppose that n is the number of ITEs that have to be subtracted from one ITE. The difference algorithm operates in three main steps: (1) pre-processing (toTY), (2) difference computation, and (3) post-processing (toITE). The toTY conversion takes as input a set of ITEs and converts it into an ordered list of TYs. In general, each ITE may correspond to up to three TYs. toTY basically operates like the classical mergesort algorithm, with a complexity of $ O(m \log _{2} m) $, where m is the number of TYs (i.e., m is at most 3(n+1)). By exploiting the ordering of the list of TYs, in step 2 the difference can be computed by visiting each TY at most once, i.e., in O(m) time. Finally, toITE converts TYs back into ITEs, also “coalescing” indeterminate TYs that meet each other. By exploiting the ordering of TYs in the list, this operation is also performed by visiting each TY once, i.e., in linear time O(m). Overall, the complexity is thus dominated by the initial ordering step (step 1), and it is $ O(m \log _{2} m) $.

As discussed in Section 4, the complexity of our relational operator of difference is the same as the one of many TDB approaches (in particular, the same I/O operations are performed in both cases), including TSQL2, except for the operation of difference between time intervals. We now consider the complexity of the difference between time intervals, comparing our specific implementation described above with the complexity of difference between determinate-time intervals.

Let us now consider the complexity in the determinate case, supposing that n is the cardinality of the set of (determinate) time intervals to be subtracted from a given one. For the sake of efficiency, also the difference between sets of determinate-time intervals can exploit a pre-processing step to order them. A slight variation of mergesort can be used so that the complexity is $ O(n \log _{2} n) $. After that, the difference can be computed subtracting one interval at a time in time O(n). No post-processing step is needed in the determinate case (since no conversion is required and the output of determinate difference is already coalesced). As in the case of indeterminate time, the complexity of difference is thus dominated by the ordering step, which, applied to sets of x elements, requires $ O(x \log _{2} x) $ time. The main difference is thus a multiplicative constant, due to the fact that an ITE indeed corresponds to an ordered list of (at most) three intervals.

Appendix C: Comparison with a non-closed approach

In this appendix, we compare our approach with the approach proposed by Das and Musen (1994) in the medical field, to exemplify the importance of devising a data model and algebra with the closure property. As discussed in Section 5, Das and Musen (1994) proposed a 1NF temporal relational model coping with temporal indeterminacy through the introduction of two intervals of uncertainty (IOUs), one for the starting time and one for the ending time. The interval between the upper bound of the starting time and the lower bound of the ending time represents an interval of certainty (IOC; i.e., an interval of time in which the fact necessarily holds). Notice that Das and Musen propose an implementation in which (at most) three tuples are used to model a temporally indeterminate fact. A tuple with Type equal to “body” represents the time when the fact certainly holds (IOC), while the other two tuples with Type equal to “start” and “end” represent the time when the fact possibly holds (the IOU of the starting time and the IOU of the ending time, respectively). Considering Example 2, Das and Musen represent data as shown in Table 7.

Table 7 Relation SIDE_EFFECTS^DM (Das and Musen representation of Example 2)

However, Das and Musen did not extend their temporal algebra to cope with temporal indeterminacy.

“To manipulate states, on the other hand, we must choose between the minimum or maximum span representation of the state. […] Either of these approaches then results in a single pair of endpoints for the state-based data” (Das and Musen 1994).

In other words, they cope with temporal indeterminacy in queries by first removing it from the input data (taking either the minimum or the maximum valid-time interval for indeterminate time), and then applying their temporal algebra for determinate time to the result. This is restrictive: their extended formalism coping with indeterminacy is not closed under their algebra, since their algebraic operators cannot operate on indeterminate time and the output of their queries cannot be an indeterminate temporal relation (indeterminacy is removed in the first, necessary, step of their queries). This is a major limitation. Das and Musen themselves noticed that their algebraic operators, when operating on temporally indeterminate facts, “[…] may produce anomalous results […]” (Das and Musen 1994). In the following, we show a simple example demonstrating that certain queries cannot be properly managed with Das and Musen’s approach.

Let us consider the situation described by Example 2, and suppose that the user wants to know when drug B and not drug A caused nausea (i.e., Query 2). Considering the information in Example 2, the output should be that B and not A caused nausea certainly on day 4, and possibly on days 2, 3, 5 and 6. Notice, however, that such a result cannot be obtained by operating as proposed by Das and Musen. If the minimum valid time (i.e., the “certain” time, IOC) is first selected, then the difference between the time intervals [1,4] and [1,1] is computed, obtaining [2,4] as a result (see Table 8). On the other hand, if the maximum valid time (i.e., the “possible” time, IOU) is first selected, then the difference between the time intervals [1,6] and [1,3] is computed, obtaining [4,6] as a result (see Table 9). Neither of them is the desired (correct) result for Query 2.
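The gap between the two determinate answers and the desired one can be checked with a small sketch over plain sets of days. It is not the ITE/TY algorithm of this paper. It assumes one common reading of indeterminate difference: "B and not A" certainly holds on the days where B certainly holds and A does not even possibly hold, and it possibly holds on the days where B possibly holds and A does not certainly hold. The function names are hypothetical, and the spans are the ones quoted above for Query 2.

```python
from typing import Set, Tuple

Days = Set[int]


def days(start: int, end: int) -> Days:
    """All days in the closed interval [start, end]."""
    return set(range(start, end + 1))


def indeterminate_difference(
    certain_r: Days, possible_r: Days, certain_s: Days, possible_s: Days
) -> Tuple[Days, Days]:
    """Return (certain days, merely possible days) of 'r and not s'.

    Assumption: possible_* are maximum spans, so they include the certain days.
    """
    certain = certain_r - possible_s
    possible_only = (possible_r - certain_s) - certain
    return certain, possible_only


# Spans quoted in the text: drug B (minuend) and drug A (subtrahend).
b_certain, b_possible = days(1, 4), days(1, 6)
a_certain, a_possible = days(1, 1), days(1, 3)

certain, possible_only = indeterminate_difference(
    b_certain, b_possible, a_certain, a_possible
)
print(sorted(certain))        # [4]
print(sorted(possible_only))  # [2, 3, 5, 6]

# The two determinate shortcuts each lose part of this answer.
print(sorted(b_certain - a_certain))    # [2, 3, 4]  (minimum valid time)
print(sorted(b_possible - a_possible))  # [4, 5, 6]  (maximum valid time)
```

Under this reading, the minimum-span answer reports days 2 and 3 as if they were certain, while the maximum-span answer drops days 2 and 3 altogether; neither distinguishes day 4 from the merely possible days.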

Table 8 Das and Musen’s answer to Query 2, considering the minimum valid time

Table 9 Das and Musen’s answer to Query 2, considering the maximum valid time

Although quite simple, the above example demonstrates the necessity of developing a closed temporal algebra for indeterminate time, i.e., an algebra in which temporally indeterminate data (relations) are managed directly, with no need of removing indeterminacy, as first-class entities that may be both input and output of queries. Despite the diffusion of the relational model and the relevance of temporally indeterminate data in many real-world contexts, so far there is no temporal relational approach providing both a 1NF data representation formalism and a closed relational algebra operating on it to cope with temporally indeterminate data (see also the discussion in Section 5). Providing such an approach and proving its reducibility to the standard non-temporal algebra are the results of the work described in this paper.

About this article

Cite this article

Anselma, L., Piovesan, L. & Terenziani, P. A 1NF temporal relational model and algebra coping with valid-time temporal indeterminacy. J Intell Inf Syst 47, 345–374 (2016). https://doi.org/10.1007/s10844-015-0367-2

Received: 06 October 2014
Revised: 14 May 2015
Accepted: 14 May 2015
Published: 24 June 2015
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10844-015-0367-2
