2. Relational Data Model

Formal Definitions

Formal	Informal
Relation	Table
Tuple	Row
Attribute	Column
Degree	Num. attributes
Cardinality	Num. rows

Relation: set of tuples
Schema of relation: $R(A_1, \dots, A_n)$
Tuple: ordered set of values
Domain: set of valid values for an attribute

Relation: subset of the cartesian product of the domains i.e. any valid combination of values for all the attributes of the table. $R$ is the intension - the schema of the relation, and $r\ of\ R$ is an extension - a specific valid value of the relation.

The ordering of tuples irrelevant, but the ordering of values within a tuple relevant.

Constraints

Conditions that hold for all valid relation instances.

Domain constraint

Specifies that all values in a relation must be inside their respective domains.

Key Constraint

Superkey of $R$ : set of attributes such that no two tuples will have the same value
Key - a minimal superkey: removing any attribute stops it from being a superkey
A primary key is arbitrarily chosen from the set of candidate keys. The rest are called secondary keys
- Attributes of the candidate keys are called prime attributes

Entity Integrity Constraint

Primary key values must not be null.

Referential Integrity Constraint

Referencing relation has foreign key attribute referencing the primary key of the referenced relation. The value can be null, and the primary key can itself be a foreign key.

Semantic Integrity Constraint

Constraints not expressed by the model and relating to the application semantics.

EER to Relation Mapping Algorithm

Simple tables: make a new table, pick any key as the primary key
Weak entity; make a new table, with the primary key being the combination of the owner and partial key. If the partial key is composite, use the components
1:N: Put the foreign key in the ‘N’ side
1:1: Put the foreign key on the side with total participation
M:N: make a new table with two foreign keys; the combination of the two form the primary key
Multivalued attributes: make a new table with a foreign key to the owner. The combination of the value of the attribute and foreign key is the primary key
N-ary relationships: make a new table, with the combination of the foreign keys being the primary key.
Specialization/generalization:
- Superclass has its own table with general attributes; subclasses have their own table with specific attributes; the primary key is a foreign key to superclass
- Total specialization: tables just for the subclasses (duplicating superclass attributes)
- Disjoint specialization: single table with attributes for every subclass, plus attribute denoting which subclass it is (called the type)
- Overlapping specializations: same as above, but Boolean for every subclass indicating which subclasses it is a member of
Categories: primary key is an artificial, automatically generated ID; all superclasses of the category have this as a foreign key

Relational Algebra

Variables

VARNAME \leftarrow operation

Select

$\sigma_{condition}(relation)$
Horizontal slices of a table
Condition can be composite; use logical operators

Project

$\pi_{attribute\_list}(relation)$
Vertical slices of a table
Returns a set, so if two rows are identical the duplicate will not be added

Rename

$\rho_{S(col1, \dots coln)}(R)$
Renames relation properties

Set

Union-compatible relations: require the same degree and pairs of corresponding attributes to have the same domain.

Union: $S \cup R$
Intersection: $S \cap R$
Difference: $S - R$
Cartesian Product: $S \times R$

Union and intersection are:

Commutative: order of arguments doesn’t matter
Associative: order of operations doesn’t matter

Joins

R \Join_{condition} S

Equijoin:

$condition \leftarrow R.col = S.col$
$R.col$ and $S.col$ are both present

Natural join:

$R * S$
Special case of equijoin where primary and foreign keys have the same name
The primary and foreign key columns are merged

$\theta$ join:

$condition \leftarrow R.col\ OP\ S.col$ where $OP\in\{<, <=, >=, >, !=\}$

The same type of joins are available for left, right and full outer joins (symbols have two horizontal lines sticking out of the top and bottom of the left/right end of the $\Join$ symbol).

Division

‘For every’ queries.

If $R(X)$ and $S(Z)$ , $R \div S = T(Y)$ where $Y = X - Z$ ; columns only in $S$ will not appear.

For a tuple $t$ to appear in $T$ , the values in $t$ must appear in $R$ in combination with every tuple in $S$ .

If $S$ contains a single key $A$ and $R$ contains an additional column $B$ , then $T$ will be a table with a single column $B$ . If a value $b$ exists in $B$ , there exists a tuple $(a, b)$ for all values of $a$ found in $S$ .

e.g. if $S = {(a1), (a2)}$ and $R={(a1, b1), (a2, b1), (a1, b2)}$ then $T$ will contain ${(a1)}$ . If $S$ is a table and $R$ is a table with the exact same columns plus a primary key:

For the id to appear, the id must appear along with every tuple in $S$
The resultant table will only have the primary key (columns in $R$ minus columns in $S$ )

Equivalent to:

\begin{aligned} T1 &\leftarrow \pi_Y(R) \text{ (getting rid of columns only in S)} \\ T2 &\leftarrow \pi_Y((S \times T1) - R) \\ T1 &\leftarrow T1 - T2 \end{aligned}

If $R \times S = T$ , then $T \div S = R$ .

Explanation of division:

Given $\delta = \pi_{A-B}(\alpha)$ :

\alpha \div \beta = \delta - \pi_{A-B}((\delta \times \beta) - \alpha)

$\delta$ is a superset of the desired result; some need to be removed
$\delta \times \beta$ gives all possible tuples that could be contained in a relation with schema $A$
$(\delta \times \beta) - \alpha$ gives all possible tuples not in $\alpha$
This can be subtracted from $\delta$ after a project operation