5: Physical Layer

Internal Schema

Good database design usually requires multiple tables, so many select operations will need to access several tables, which leads to performance complexities.

DBMS designers must ensure the database, which will be stored in persistent memory, performs well. They must take into account factors such as:

The database designers can affect performance through:

Memory Hierarchy

Primary memory - cache and main memory:

Secondary/tertiary memory - flash, HDDs, optical disks, tape:

Disk Storage Devices

Record Management

Multiple records are stored in a block:

File Organization

The DBMS:

The file descriptor/header contains metadata for the file, such as field names, data types and addresses of the file blocks on the disks. The physical disk blocks can be contiguous, linked or indexed.

Typical file operations:

Unordered files:

Ordered files:

Hashed Files (External Hashing)

Collision Resolution:

Misc. notes:
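As a rough illustration of the idea behind external hashing: each record is hashed to one of M primary buckets (one bucket per block), and records that overflow a full bucket are chained into overflow storage. The class name, bucket capacity, and use of Python's built-in `hash` below are illustrative assumptions, not the course's notation.

```python
# Minimal sketch of external (static) hashing with chained overflow.
# bucket number = hash(key) mod M; names here are illustrative only.

class HashedFile:
    def __init__(self, m=8, bucket_capacity=4):
        self.m = m                              # number of primary buckets
        self.capacity = bucket_capacity         # records per bucket (block)
        self.buckets = [[] for _ in range(m)]   # primary bucket area
        self.overflow = {}                      # bucket no. -> overflow chain

    def insert(self, key, record):
        b = hash(key) % self.m                  # hash key to a primary bucket
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append((key, record))
        else:                                   # bucket full: chain into overflow
            self.overflow.setdefault(b, []).append((key, record))

    def find(self, key):
        b = hash(key) % self.m                  # search the bucket, then its chain
        for k, rec in self.buckets[b] + self.overflow.get(b, []):
            if k == key:
                return rec
        return None
```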

Redundant Arrays of Independent Disks (RAID)

Uses parallelism across multiple disks to increase read/write performance using data striping.
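As a rough sketch of block-level striping (RAID 0-style), consecutive logical blocks can be mapped round-robin across the disks so that large sequential transfers proceed on all disks in parallel; the mapping function below is an illustrative assumption, not a description of any particular controller.

```python
def stripe_location(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)
    under simple round-robin (RAID 0-style) block striping."""
    return logical_block % num_disks, logical_block // num_disks

# With 4 disks, consecutive logical blocks land on different disks:
print([stripe_location(b, 4) for b in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```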

RAID levels:

Indexing

Indexes provide an efficient way to access records in different orders without expensive operations such as sorting or duplicating data, by creating a data structure that maps a key to an address.

Dense indexes have entries for every record, while sparse indexes contain entries for only some of the records (not a one-to-one relationship).

A non-clustered index stores the index entries in a structure separate from the actual data; each entry contains the index key value and a pointer to the row.

A clustered index physically defines the order in which records are stored. Hence, there can only be one clustered index. This course limits clustering indexes to non-key fields, so a file has either a primary index or a clustering index (not both).

Under this definition, a clustered index will have an entry for each distinct value of the field and hence, it is a sparse index.

A primary index is indexed by the primary key and contains an entry for each block. Each entry contains a pointer to the block and the key value of either the first or last record in that block. Hence, the primary index is a sparse index. To improve performance when inserting records, an overflow block may be used.

Secondary indexes are dense indexes that can be on either candidate keys or non-keys, and contain the value of the field and a pointer to either a block or record.
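To make the sparse/dense distinction concrete, a minimal sketch of a lookup through a sparse primary index, assuming one (first-key-in-block, block-number) entry per data block, might look like this; the entry layout and the use of `bisect` are illustrative assumptions.

```python
import bisect

# Sparse primary index: one (first_key_in_block, block_number) entry per block,
# kept in key order so it can be binary-searched.
index = [(100, 0), (210, 1), (355, 2), (480, 3)]   # illustrative entries
first_keys = [k for k, _ in index]

def find_block(search_key: int) -> int:
    """Return the block that could contain search_key:
    the last block whose first key is <= search_key."""
    pos = bisect.bisect_right(first_keys, search_key) - 1
    if pos < 0:
        raise KeyError(search_key)   # smaller than every key in the file
    return index[pos][1]

print(find_block(360))  # -> block 2; the record is then searched for inside that block
```

A dense secondary index would instead hold one such entry per record, so the same search runs over $r$ entries rather than $b$.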

Example: Storage/Speed Tradeoffs

$r = 30,000$ records, record size $R = 100$ B, block size $B = 1024$ B, index key size $V = 9$ B, block pointer size $P = 6$ B. Blocks are unspanned.

Case 1: Ordered File

The blocking factor $BFR = floor(\frac{block\_size}{record\_size}) = floor(\frac{B}{R}) = floor(\frac{1,024}{100}) = 10$. Hence, 10 records fit in a block.

Hence, the number of blocks required, $b = ceil(\frac{number\_of\_records}{blocking\_factor}) = ceil(\frac{30,000}{10}) = 3,000$.

For linear search:

For binary search, $log_{2}(b) \approx 12$ blocks must be read from the disk.
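These numbers can be checked with a few lines of arithmetic; the sketch below re-derives the blocking factor, block count, and search costs for Case 1 (the $b/2$ average for a successful linear search on a key field is the usual assumption).

```python
import math

r, R, B = 30_000, 100, 1024         # records, record size (B), block size (B)

bfr = B // R                        # blocking factor: records per block
b = math.ceil(r / bfr)              # blocks needed for the file

linear_avg = b / 2                  # linear search: b/2 blocks on average for a key field
binary = math.ceil(math.log2(b))    # binary search on the ordered file

print(bfr, b, linear_avg, binary)   # 10 3000 1500.0 12
```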

Case 2: Primary Index

The primary index will have an entry for each block. Each index entry has a size of $I = V + P = 9 B + 6 B = 15 B$.

Hence, the blocking factor for the index blocks, $BFRI = floor(\frac{block\_size}{index\_entry\_size}) = floor(\frac{B}{I}) = floor(\frac{1,024}{15}) = 68$, so the index requires $ceil(\frac{number\_of\_index\_entries}{index\_blocking\_factor}) = ceil(\frac{b}{BFRI}) = ceil(\frac{3,000}{68}) = 45$ blocks.

To find an individual record, binary search is done on the index, so the number of blocks read is $log_{2}(45) \approx 6$. There is an extra read required for the block the record is stored in, so it requires $7$ reads in total.

The primary index must be ordered, so update/insertion/deletion operations are costly.
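The same arithmetic for Case 2, re-deriving the index entry size, fan-out, index size, and total reads:

```python
import math

b, V, P, B = 3000, 9, 6, 1024        # data blocks, key size, block pointer size, block size

I = V + P                            # index entry size
bfri = B // I                        # index entries per index block (fan-out)
index_blocks = math.ceil(b / bfri)   # sparse index: one entry per data block

reads = math.ceil(math.log2(index_blocks)) + 1   # binary-search the index, then read the data block
print(I, bfri, index_blocks, reads)  # 15 68 45 7
```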

Case 3: Secondary Index

The secondary index is a dense index with an entry for each record.

Hence, the index requires $ceil(\frac{number\_of\_index\_entries}{index\_blocking\_factor}) = ceil(\frac{r}{BFRI}) = ceil(\frac{30,000}{68}) = 442$ blocks.

Searching the index requires $log_{2}(442) \approx 9$ block reads, plus an additional read for the block the record is stored in, so it requires $10$ reads in total.

Unlike with the primary index, the data file does not need to be ordered, so update/insertion/deletion operations are faster.
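And the same re-derivation for Case 3, the dense secondary index:

```python
import math

r, bfri = 30_000, 68                  # records; index entries per index block (as above)

index_blocks = math.ceil(r / bfri)    # dense index: one entry per record
reads = math.ceil(math.log2(index_blocks)) + 1   # binary-search the index, then read the data block

print(index_blocks, reads)            # 442 10
```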

Multi-Level Indexes

The index itself is an ordered file, so a primary index to the index can be created; the original index is called the first-level index, the index to the index the second-level index, and so on.

This process can be repeated until all entries fit in a single block.

Assuming the first-level index is a primary index, each block of the first-level index references $BFRI^1$ data blocks, each block of the second-level index $BFRI^2$ data blocks, and each block of the $n$th-level index $BFRI^n$ data blocks.

This has consequences for performance: because the indexes are ordered, update/insertion/deletion operations may require indexes at every level to be updated. Hence, this is more useful in read-heavy scenarios.
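Continuing the running example (fan-out $BFRI = 68$ and a first level of 3,000 entries, one per data block), the number of index levels and the reads per lookup can be estimated as below; one block read per level is the usual simplification.

```python
import math

fan_out = 68            # index entries per index block (BFRI)
entries = 3000          # first-level index entries (one per data block)

levels = math.ceil(math.log(entries, fan_out))   # levels until a single block remains
reads = levels + 1                               # one read per index level + the data block

print(levels, reads)    # 2 3
```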

B-Trees and B+-Trees

Due to this insertion/deletion problem, B-tree or B+-tree structures are used.

In a B-tree, a node of order $p$ has the structure $P_1, <K_1, Pr_1>, P_2, <K_2, Pr_2>, \dots, P_{q-1}, <K_{q-1}, Pr_{q-1}>, P_q$ where:

A B-tree of order m has:

The keys divide the subtrees: all values in the subtree pointed to by $P_i$ lie between $K_{i-1}$ and $K_i$; for example, all values in $P_2$ lie between $K_1$ and $K_2$.

On insertion, if a node becomes full then it is split into two nodes, with the middle value moved up to become a key in the parent. If the parent then becomes full, this may cascade upwards. This behavior ensures the tree is always balanced.

On deletion, if a node has fewer than m / 2 keys, it can be merged with a sibling.
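A minimal sketch of the split step described above, assuming the node is represented simply as a sorted list of keys (child pointers and the parent update are omitted for brevity):

```python
def split_node(keys: list, p: int):
    """Split an overfull B-tree node (more than p - 1 keys) into two nodes,
    returning (left_keys, middle_key, right_keys). The middle key is pushed
    up into the parent, which may itself overflow and split in turn."""
    mid = len(keys) // 2
    return keys[:mid], keys[mid], keys[mid + 1:]

# Inserting 45 into a full order-5 node [10, 20, 30, 40]:
left, up, right = split_node(sorted([10, 20, 30, 40, 45]), p=5)
print(left, up, right)   # [10, 20] 30 [40, 45]
```

The returned middle key is what gets inserted into the parent; if that makes the parent overfull, the same split is applied there, which is how the cascade mentioned above arises.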

In a B+-tree, pointers to data records only exist in leaf nodes:

Example: Calculating Order

Find the order $p$ of a B+-tree if the block size $B = 512$ B, index key size $V = 9$ B, data pointer size $Pr = 7$ B, and tree pointer size $P = 6$ B:

Each internal node can have up to $p$ tree pointers and $p - 1$ keys, taking up a total of $6p + 9(p - 1)$ bytes. This must be at most the block size:

$15p - 9 \leq 512 \therefore p \leq 521/15 \approx 34.7$. Hence, the order and maximum value of $p$ is $p = 34$.

To store one million records requires $log_{34}(1,000,000) \approx 4$ levels (data pointers are stored in the leaf nodes, so one million leaf entries are required).
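The order calculation can be checked numerically; the snippet below searches for the largest $p$ satisfying $pP + (p - 1)V \le B$ and then estimates the number of levels for one million records under the same simplification as above.

```python
import math

B, V, P = 512, 9, 6                  # block size, key size, tree pointer size (bytes)

# Largest p such that p tree pointers and (p - 1) keys fit in one block.
p = max(q for q in range(2, B) if q * P + (q - 1) * V <= B)
levels = math.ceil(math.log(1_000_000, p))

print(p, levels)                     # 34 4
```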