Internal Schema
Good database design usually requires multiple tables and hence, many select operations will require access to multiple tables, which lead to performance complexities.
DBMS designers must ensure the database, which will be stored in persistent memory, performs well. They must take into factors such as:
- Secondary storage
- Buffering, caching
- Indexing strategy
- Query optimization
- Networks
- Concurrency
The database designers can affect performance through:
- Data types (e.g. using
instead of ) - Degree of normalization
- Query formulation
- Overhead for constraints, triggers etc.
- Indexing strategies
- Media (never store media in a DB)
Memory Hierarchy
Primary memory - cache and main memory:
- Fast access
- Limited capacity
- Volatile
Secondary/tertiary memory - flash, HDDs, optical disks, tape:
- High capacity
- Low cost
- Non-volatile
Disk Storage Devices
- Data stored as magnetized areas
- A disk pack contains several magnetic disks connected to a rotating spindle
- Each surface of the disk is divided into concentric circular tracks, each storing 4-50 kB
- Tracks are divided into blocks/sectors - arcs of the sector
- They are typically 512 B to 4 kB in size
- The entire block is read/written to at once
- A read-write head moves to the track containing the block to be transferred
- Reading/writing is time consuming due to seek times, s, and rotational delay, rd
- A physical disk block address consists of:
- A cylinder number: the disk containing the data
- The track number
- The block number
Record Management
- A file is a sequence of records
- Each record is a collection of fields; a row
- Each field contains values for a certain data type
- A file can have fixed- or variable-length records, and records can have fixed- or variable-length fields
- If variable-length, separator characters or length fields are required
Multiple records are stored in a block:
- The blocking factor, bfr is the number of records per block
- File records can be:
- unspanned: records are always in a single block. This is preferred as it means only a single read is required to get a particular record
- spanned: records can be stored in multiple blocks
- If unspanned and the blocking factor is not an integer, there will be empty space on a block
- If a file contains fixed-length records, unspanned blocking is usually used
File Organization
The DBMS:
- Uses the OS/file system APIs to read/write physical blocks (pages) between disks and memory
- May store DB data on multiple disks; if the block size is different and the records are unspanned, this may cause headaches
- May have its own file system
- An in-memory database relies on the main memory for data storage, leading to faster access but also the potential for data loss if the system crashes
The file descriptor/header contains metadata for the file, such as field names, data types and addresses of the file blocks on the disks. The physical disk blocks can be contiguous, linked or indexed.
Typical file operations:
- Open: ready the file
- Find: search for the first record satisfying a condition, and makes it the current record
- Find next: searches for the next record satisfying a condition, and makes it a current record
- Read: read the current record into a variable
- Insert: insert a new record into the file, and make it the current record
- Delete: remove the current file record (usually by marking it as invalid)
- Modify: change the value of fields in the current record
- Close: terminate access to a file
- Reorganize: e.g. remove records marked as invalid
- Read ordered: read file blocks ordered by a certain field in the file
Unordered files:
- Called a heap/pile file
- Efficient insertion: new records inserted at the end
- Linear search required to search for a record
- Reading ordered by a particular field requires sorting
Ordered files:
- Records sorted by an ordering field
- Update is expensive; must be inserted in the correct order
- A overflow/transaction file, which is periodically merged with the main ordered file, may be used to improve performance
- If it is near the start, all records after that may need to be moved
- Binary search can be used to search for the record on its ordering field value
- Reading the records in the order of the ordering field is efficient
Hashed Files (External Hashing)
- File blocks divided into M equal-size buckets (usually corresponding to one or a fixed number of disk blocks)
- The record with hash key K is stored in bucket i where
, is the hashing function (e.g. modulo) - Search via the hash key is efficient
- When a bucket is full, a overflow bucket is used; bucket has pointer to overflow bucket
Collision Resolution:
- Open addressing: put in the next available position
- Chaining: place in overflow bucket
- Multiple-hashing: if collision occurs, use a secondary hash function etc. Fall back to open addressing
Misc. notes:
- For performance, the hash file is kept 70-80% complete
- The hash function should distribute the records uniformly; otherwise, overflow records will increase search time
- Ordered access on the hash key is inefficient; requires sorting
- Fixed number of buckets; problem if the number of records grows or shrinks
Redundant Arrays of Independent Disks (RAID)
Uses parallelism across multiple disks to increase reads/write performance using data striping.
RAID levels:
- Level 0: striping, but no redundant data; best write performance
- Level 1: mirrored disks; show as writes have to be performed on both disks
- Level 4: block-level data striping, with parity information stored on one or more disks
- Level 5: level 4, but parity information distributed across all disks
- Level 6: P + Q redundancy with Reed-Solomon codes. Two disks can fail with no data loss
- Level 10 (1 + 0): RAID 0, but each ‘disk’ is a set of RAID 1 mirrored disks
Indexing
Indexes provide an efficient way to access records in different orders without expensive operations like sorting or duplicate data by creating a data structure that maps between a key and address.
Dense indexes have entries for every record, while sparse indexes contain entries for only some of the records (not a one-to-one relationship).
A non-clustered index stores the indexes in a structure separate from the actual data, containing the index key values and a pointer to the row.
A clustered index physically defines the order in which records are stored. Hence, there can only be one clustered index. This course limits clustering indexes to be a non-key field so a file has either a primary or clustered index.
Under this definition, a clustered index will have an entry for each distinct value of the field and hence, it is a sparse index.
A primary index is indexed by the primary key and contains an entry for each block. Each entry contains a pointer to the block and either the value of the first or last record on the block. Hence, the primary index is a sparse index. To improve performance when inserting records, a overflow block may be used.
Secondary indexes are dense indexes that can be on either candidate keys or non-keys, and contain the value of the field and a pointer to either a block or record.
Example: Storage/Speed Tradeoffs
Case 1: Ordered File
The blocking factor
Hence, the number of blocks required,
For linear search:
- The best case is reading a single data block from the disk
- The worst case is reading all data blocks from the disk
- The average case is
blocks
For binary search,
Case 2: Primary Index
The primary index will have a entry for each block. Each index entry has a size of
Hence, the blocking factor for the index block,
To find an individual record, binary search is done on the index, so the the number of blocks read is
The primary index must be ordered, so update/insertion/deletion operations are costly.
Case 3: Secondary Index
The secondary index is a dense index with an entry for each entry.
Hence, the index requires
Searching the index requires
Unlike the primary index, it does not need to be ordered so update/insertion/deletion operations are faster.
Multi-Level Indexes
The index itself is an ordered file, so a primary index to the index can be created; the original index is called the first-level index, the index to the index the second-level index etc.
This process can be repeated until all entries fit in a single block.
Assuming the first-level index is a primary index, each block for the first first-level index can store references to
This has consequences for performance: as the indexes are ordered, update/insertion/deletion operations may require indexes at all levels to update. Hence, this is more useful in read-heavy scenarios.
B-Trees and B±Trees
Due to the insertion/deletion problem, B or B+ tree structures are used.
In a B-tree, a node of order p has the structure
-
is a tree pointer -
is a key value -
is a data pointer
A B-tree of order m has:
- At most m children (tree pointers)
- If a non-leaf node:
- Has at least m / 2 children (except the root node)
- If it has k children, it contains k - 1 keys
- All leaf nodes are at the same level
The keys divide the subtree: all values in
On insertion, if a node becomes full then the node is spit into two nodes, with the mid-value being used as the key for the parent. If the parent then becomes full, this may cause a cascade. This behavior ensures it is always balanced.
On deletion, if the node has less than m / 2 keys, it can be merged with a sibling.
In a B±tree, pointers to data records only exist in leaf nodes:
- Internal nodes of the form
- Keys sandwiched between block pointers
- Leaf nodes of the form
- The last node is a block pointer to the next leaf node
Example: Calculating Order
Find the order p of a B±tree if the block size
Each node can have up to p tree pointers and p - 1 keys, taking up a total of
To store one million records requires