# Join

**DOI:** https://doi.org/10.1007/978-1-4899-7993-3_1260-2

## Definition

The join is a binary operator of the relational algebra that combines tuples of different relations based on a relationship between the values of their attributes. The primitive version of the join operator is called the *natural join*. Given two relation instances \(R_1\), over a set of attributes \(U_1\), and \(R_2\), over a set of attributes \(U_2\), the natural join \(R_1 \bowtie R_2\) returns a new relation, over the set of attributes \(U_1 \cup U_2\), consisting of the tuples \(\{t \mid t(U_1) \in R_1 \text{ and } t(U_2) \in R_2\}\). Here \(t(U)\) denotes the restriction of the tuple \(t\) to the attributes in the set \(U\).
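The set definition can be read as a membership test. The following is a minimal Python sketch of that reading, not a join algorithm: tuples are modelled as dicts from attribute names to values and relations as lists of such dicts, which is an assumption of this example, and the names `restrict` and `in_natural_join` are illustrative.

```python
def restrict(t, attrs):
    """Return t(U): the restriction of tuple t (a dict) to the attributes in attrs."""
    return {a: t[a] for a in attrs}


def in_natural_join(t, r1, u1, r2, u2):
    """Check whether t belongs to the natural join of r1 (over u1) and r2 (over u2)."""
    # t must be a tuple over exactly the attributes U1 ∪ U2.
    if set(t) != set(u1) | set(u2):
        return False
    # t is in the join iff both of its restrictions occur in the inputs.
    return restrict(t, u1) in r1 and restrict(t, u2) in r2


# Hypothetical sample data, chosen only for illustration.
u1, r1 = {"A", "B"}, [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
u2, r2 = {"B", "C"}, [{"B": 2, "C": 5}, {"B": 9, "C": 6}]

matching = in_natural_join({"A": 1, "B": 2, "C": 5}, r1, u1, r2, u2)
non_matching = in_natural_join({"A": 3, "B": 4, "C": 6}, r1, u1, r2, u2)
print(matching, non_matching)
```

The second candidate is rejected because its restriction to \(U_2\), namely (B = 4, C = 6), is not a tuple of the second relation.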

A derivable version of the join operator is obtained by composing the natural join with the selection operator \(\sigma\): the *theta join* \(R_1 \bowtie_{\theta} R_2\) is defined as \(\sigma_{\theta}(R_1 \bowtie R_2)\), where \(\theta\) is an arbitrary condition allowed in a generalized selection over the set of attributes \(U_1 \cup U_2\). When \(\theta\) is a conjunction of equality atoms of the form \(A = B\), where \(A\) is an attribute in \(U_1\) and \(B\) an attribute in \(U_2\), the theta join is called an *equijoin*.
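The composition of selection and natural join can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: relations are lists of dicts, the condition \(\theta\) is passed as a Python predicate on the joined tuple, and the names `natural_join`, `select` and `theta_join` are not taken from any library. The sample relations have disjoint attribute sets, so the inner natural join yields every pairing.

```python
def natural_join(r1, r2):
    """Combine tuples (dicts) of r1 and r2 that agree on their common attributes."""
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                merged = {**t1, **t2}
                if merged not in result:  # keep set semantics
                    result.append(merged)
    return result


def select(relation, theta):
    """Selection: keep the tuples that satisfy the predicate theta."""
    return [t for t in relation if theta(t)]


def theta_join(r1, r2, theta):
    """Theta join as a selection applied to the natural join."""
    return select(natural_join(r1, r2), theta)


# Hypothetical sample data over disjoint attribute sets {A, B} and {C, D}.
r1 = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
r2 = [{"C": 1, "D": 15}, {"C": 2, "D": 5}]

equi = theta_join(r1, r2, lambda t: t["A"] == t["C"])  # equijoin on A = C
less = theta_join(r1, r2, lambda t: t["B"] < t["D"])   # general theta join
print(equi)
print(less)
```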

Another derivable join operator is the *semijoin*, denoted by \(R_1 \ltimes R_2\); it is defined as \(\pi_{U_1}(R_1 \bowtie R_2)\), where \(\pi_{U_1}\) denotes the projection on the attributes \(U_1\).
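A minimal Python sketch of the semijoin follows. Instead of materializing the natural join and projecting it, it keeps each tuple of the first relation that agrees with at least one tuple of the second on their common attributes, which gives the same result as the definition. The list-of-dicts representation and the name `semijoin` are assumptions of this example.

```python
def semijoin(r1, r2):
    """Tuples of r1 that agree with some tuple of r2 on the common attributes."""
    result = []
    for t1 in r1:
        has_partner = any(
            all(t1[a] == t2[a] for a in t1.keys() & t2.keys()) for t2 in r2
        )
        if has_partner and t1 not in result:
            result.append(t1)
    return result


# Hypothetical sample data: (A=1, B=2) has two partners, (A=3, B=4) has none.
r1 = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
r2 = [{"B": 2, "C": 5}, {"B": 2, "C": 7}]

kept = semijoin(r1, r2)
print(kept)
```

Although the tuple with A = 1 has two partners in the second relation, it appears only once in the result, as the projection on \(U_1\) requires.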

## Key Points

In the natural join \(R_1 \bowtie R_2\), tuples of \(R_1\) and \(R_2\) having the same values of common attributes are combined. If the sets of attributes of \(R_1\) and \(R_2\) are disjoint, \(R_1 \bowtie R_2\) coincides with the Cartesian product.

The natural join is often used to combine tuples based on attributes correlated by a foreign key constraint. Consider a relation *Students* over attributes (*student-number, student-name*), containing tuples {(1001, *Black*), (1002, *White*)}, and a relation *Exams* over attributes (*course-number, student-number, grade*), containing tuples {(*EH1*, 1001, *A*), (*EH1*, 1002, *A*), (*GH5*, 1001, *C*)}. Then the natural join *Students* \(\bowtie\) *Exams* is a relation over attributes (*student-number, student-name, course-number, grade*) with tuples {(1001, *Black*, *EH1*, *A*), (1001, *Black*, *GH5*, *C*), (1002, *White*, *EH1*, *A*)}.
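The *Students* and *Exams* example can be reproduced with a short Python sketch. The nested-loop function below is one straightforward way to compute a natural join, assuming relations are represented as lists of dicts keyed by attribute name; it is meant as an illustration, not as how a database system evaluates joins.

```python
def natural_join(r1, r2):
    """Combine tuples (dicts) of r1 and r2 that agree on their common attributes."""
    result = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in result:  # keep set semantics
                    result.append(merged)
    return result


students = [
    {"student-number": 1001, "student-name": "Black"},
    {"student-number": 1002, "student-name": "White"},
]
exams = [
    {"course-number": "EH1", "student-number": 1001, "grade": "A"},
    {"course-number": "EH1", "student-number": 1002, "grade": "A"},
    {"course-number": "GH5", "student-number": 1001, "grade": "C"},
]

joined = natural_join(students, exams)
for row in joined:
    print(row)
```

The only common attribute is *student-number*, so each exam tuple is paired with the student it refers to, giving the three tuples listed above.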

In the absence of attribute names, the only primitive notion of join is the Cartesian product. In this case the theta join and equijoin operators can be derived by composing selection and Cartesian product. More precisely, if \(\theta\) is a Boolean combination of atoms of the form \(j \;\alpha\; k\) with \(j \le \mathit{arity}(R_1)\), \(k \le \mathit{arity}(R_2)\) and \(\alpha \in \{=, \ne, <, >, \le, \ge\}\), then the theta join \(R_1 \bowtie_{\theta} R_2\) in the unnamed algebra is defined as \(\sigma_{\theta'}(R_1 \times R_2)\), where \(\theta'\) is obtained from \(\theta\) by replacing each atom \(j \;\alpha\; k\) with \(j \;\alpha\; (\mathit{arity}(R_1) + k)\).
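The index shift in the unnamed algebra can be illustrated with a short Python sketch. It makes several simplifying assumptions: tuples are plain Python tuples with 1-based column positions, the condition is restricted to a conjunction of atoms rather than an arbitrary Boolean combination, comparison symbols are spelled in ASCII, and the name `theta_join_unnamed` is hypothetical.

```python
import operator
from itertools import product

# ASCII spellings of the comparison operators =, ≠, <, >, ≤, ≥.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def theta_join_unnamed(r1, r2, atoms, arity1):
    """Theta join over positional tuples.

    atoms is a list of (j, alpha, k): column j of r1 compared with column k
    of r2 (both 1-based). The list is interpreted as a conjunction.
    """
    # Rewrite each atom j alpha k into j alpha (arity(R1) + k),
    # so that it refers to columns of the Cartesian product.
    shifted = [(j, alpha, arity1 + k) for j, alpha, k in atoms]
    result = []
    for t1, t2 in product(r1, r2):  # Cartesian product
        t = t1 + t2
        if all(OPS[alpha](t[j - 1], t[k - 1]) for j, alpha, k in shifted):
            result.append(t)
    return result


# Hypothetical sample data: two binary relations.
r1 = [(1, 10), (2, 20)]
r2 = [(1, 15), (2, 5)]

equi = theta_join_unnamed(r1, r2, [(1, "=", 1)], arity1=2)
less = theta_join_unnamed(r1, r2, [(2, "<", 2)], arity1=2)
print(equi)
print(less)
```

In the first call the atom 1 = 1 becomes 1 = 3 over the product, so the first column of the first relation is compared with the first column of the second.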

In each of the join operators described above (except the semijoin), there can be tuples of the input relations which do not occur in the output, because they satisfy the join condition with no tuple of the other relation. The *left (right) outer join* adds to the join of \(R_1\) and \(R_2\) all tuples of \(R_1\) (\(R_2\)) not occurring in the join, completed with nulls on the attributes contributed only by \(R_2\) (\(R_1\)). The *full outer join* adds the unmatched tuples of both \(R_1\) and \(R_2\) to the join, completed with nulls in the same way.
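The outer joins can be sketched in Python on top of a natural join. The sketch assumes relations are lists of dicts, uses `None` to stand for a null, and takes the attribute sets `u1` and `u2` as explicit arguments so that the padding is well defined; the function names and the `how` parameter are illustrative choices, not a fixed API.

```python
def natural_join(r1, r2):
    """Combine tuples (dicts) of r1 and r2 that agree on their common attributes."""
    result = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                merged = {**t1, **t2}
                if merged not in result:
                    result.append(merged)
    return result


def outer_join(r1, u1, r2, u2, how="full"):
    """Left, right or full outer join; None plays the role of a null."""
    joined = natural_join(r1, r2)
    result = list(joined)
    if how in ("left", "full"):
        for t1 in r1:
            # t1 is dangling if no joined tuple restricts to it on u1.
            if not any(all(t[a] == t1[a] for a in u1) for t in joined):
                result.append({**{a: None for a in u2}, **t1})
    if how in ("right", "full"):
        for t2 in r2:
            if not any(all(t[a] == t2[a] for a in u2) for t in joined):
                result.append({**{a: None for a in u1}, **t2})
    return result


# Hypothetical sample data: one matching pair and one dangling tuple per side.
u1, r1 = ("A", "B"), [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
u2, r2 = ("B", "C"), [{"B": 2, "C": 5}, {"B": 9, "C": 6}]

left = outer_join(r1, u1, r2, u2, how="left")
right = outer_join(r1, u1, r2, u2, how="right")
full = outer_join(r1, u1, r2, u2, how="full")
print(full)
```

The dangling tuple with A = 3 keeps its own value for the shared attribute B and receives a null only for C, the attribute that comes from the other relation.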