Introduction to B Trees

B Trees are a type of self-balancing search tree in which nodes can have multiple children. They are designed to work well on systems that read and write large blocks of data, such as databases and file systems. B Trees maintain sorted data and allow searches, sequential access, insertions, and deletions in logarithmic time.

Key Characteristics of B Trees

  • Balanced Tree: B Trees are balanced, meaning that the path from the root to any leaf is of the same length.
  • Multiple Children: Each node can have more than two children.
  • Sorted Order: The keys within a node are sorted.
  • Node Capacity: Each node can contain a certain number of keys, typically defined by a parameter called the minimum degree (t).

Structure of a B Tree

A B Tree of minimum degree t has the following properties:

  • Every node has at most 2t - 1 keys.
  • Every node (except root) has at least t - 1 keys.
  • The root has at least one key.
  • All leaves appear on the same level.
  • A non-leaf node with k keys has k + 1 children.

Example of a B Tree

Consider a B Tree of minimum degree t = 3:

          [10 | 20]
         /    |    \
  [1 | 5 | 8] [12 | 15 | 18] [22 | 25 | 30]

Basic Operations on B Trees

  1. Search

Searching in a B Tree is similar to searching in a binary search tree but involves more comparisons due to multiple keys per node.

Algorithm:

  1. Start at the root node.
  2. Perform a binary search within the node.
  3. If the key is found, return it.
  4. If the key is not found and the node is a leaf, return null.
  5. If the key is not found and the node is not a leaf, recursively search the appropriate child.

Example:

class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t  # Minimum degree
        self.leaf = leaf
        self.keys = []
        self.children = []

def search(node, key):
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        return node
    if node.leaf:
        return None
    return search(node.children[i], key)

  1. Insertion

Insertion in a B Tree involves finding the appropriate leaf node and inserting the key while maintaining the B Tree properties. If a node overflows (i.e., exceeds 2t - 1 keys), it is split.

Algorithm:

  1. Start at the root.
  2. If the root is full, split it and adjust the tree.
  3. Traverse to the appropriate child node.
  4. Insert the key in the appropriate position in the leaf node.
  5. If the leaf node overflows, split it and propagate the split upwards if necessary.

Example:

def insert_non_full(node, key):
    i = len(node.keys) - 1
    if node.leaf:
        node.keys.append(0)
        while i >= 0 and key < node.keys[i]:
            node.keys[i + 1] = node.keys[i]
            i -= 1
        node.keys[i + 1] = key
    else:
        while i >= 0 and key < node.keys[i]:
            i -= 1
        i += 1
        if len(node.children[i].keys) == 2 * node.t - 1:
            split_child(node, i)
            if key > node.keys[i]:
                i += 1
        insert_non_full(node.children[i], key)

def split_child(parent, i):
    t = parent.children[i].t
    new_node = BTreeNode(t, parent.children[i].leaf)
    parent.children[i].keys, new_node.keys = parent.children[i].keys[:t-1], parent.children[i].keys[t:]
    if not parent.children[i].leaf:
        parent.children[i].children, new_node.children = parent.children[i].children[:t], parent.children[i].children[t:]
    parent.children.insert(i + 1, new_node)
    parent.keys.insert(i, parent.children[i].keys.pop(t-1))

  1. Deletion

Deletion in a B Tree is more complex than insertion and involves several cases to ensure the tree remains balanced.

Algorithm:

  1. If the key is in a leaf node, remove it directly.
  2. If the key is in an internal node:
    • If the child preceding the key has at least t keys, replace the key with the predecessor.
    • If the child following the key has at least t keys, replace the key with the successor.
    • If both children have fewer than t keys, merge the key and the two children.
  3. If the key is not in the node, ensure the appropriate child has at least t keys before descending.

Example:

def delete(node, key):
    t = node.t
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        if node.leaf:
            node.keys.pop(i)
        else:
            if len(node.children[i].keys) >= t:
                node.keys[i] = get_predecessor(node, i)
                delete(node.children[i], node.keys[i])
            elif len(node.children[i + 1].keys) >= t:
                node.keys[i] = get_successor(node, i)
                delete(node.children[i + 1], node.keys[i])
            else:
                merge(node, i)
                delete(node.children[i], key)
    else:
        if node.leaf:
            return
        if len(node.children[i].keys) < t:
            fill(node, i)
        delete(node.children[i], key)

def get_predecessor(node, i):
    current = node.children[i]
    while not current.leaf:
        current = current.children[-1]
    return current.keys[-1]

def get_successor(node, i):
    current = node.children[i + 1]
    while not current.leaf:
        current = current.children[0]
    return current.keys[0]

def merge(node, i):
    child = node.children[i]
    sibling = node.children[i + 1]
    child.keys.append(node.keys.pop(i))
    child.keys.extend(sibling.keys)
    if not child.leaf:
        child.children.extend(sibling.children)
    node.children.pop(i + 1)

def fill(node, i):
    t = node.t
    if i != 0 and len(node.children[i - 1].keys) >= t:
        borrow_from_prev(node, i)
    elif i != len(node.children) - 1 and len(node.children[i + 1].keys) >= t:
        borrow_from_next(node, i)
    else:
        if i != len(node.children) - 1:
            merge(node, i)
        else:
            merge(node, i - 1)

def borrow_from_prev(node, i):
    child = node.children[i]
    sibling = node.children[i - 1]
    child.keys.insert(0, node.keys[i - 1])
    if not child.leaf:
        child.children.insert(0, sibling.children.pop())
    node.keys[i - 1] = sibling.keys.pop()

def borrow_from_next(node, i):
    child = node.children[i]
    sibling = node.children[i + 1]
    child.keys.append(node.keys[i])
    if not child.leaf:
        child.children.append(sibling.children.pop(0))
    node.keys[i] = sibling.keys.pop(0)

Practical Exercises with B Trees

Exercise 1: Implement a B Tree Insertion

Task: Implement the insert function for a B Tree.

Solution:

class BTree:
    def __init__(self, t):
        self.root = BTreeNode(t, True)
        self.t = t

    def insert(self, key):
        root = self.root
        if len(root.keys) == 2 * self.t - 1:
            new_root = BTreeNode(self.t, False)
            new_root.children.append(self.root)
            split_child(new_root, 0)
            self.root = new_root
        insert_non_full(self.root, key)

Exercise 2: Implement a B Tree Search

Task: Implement the search function for a B Tree.

Solution:

def search(node, key):
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        return node
    if node.leaf:
        return None
    return search(node.children[i], key)

Exercise 3: Implement a B Tree Deletion

Task: Implement the delete function for a B Tree.

Solution:

def delete(node, key):
    t = node.t
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and key == node.keys[i]:
        if node.leaf:
            node.keys.pop(i)
        else:
            if len(node.children[i].keys) >= t:
                node.keys[i] = get_predecessor(node, i)
                delete(node.children[i], node.keys[i])
            elif len(node.children[i + 1].keys) >= t:
                node.keys[i] = get_successor(node, i)
                delete(node.children[i + 1], node.keys[i])
            else:
                merge(node, i)
                delete(node.children[i], key)
    else:
        if node.leaf:
            return
        if len(node.children[i].keys) < t:
            fill(node, i)
        delete(node.children[i], key)

Conclusion

B Trees are powerful data structures that provide efficient insertion, deletion, and search operations. They are particularly useful in database and file system implementations due to their ability to handle large blocks of data efficiently. Understanding B Trees and their operations is crucial for any programmer working with large datasets or developing systems that require efficient data retrieval and storage.

In the next module, we will explore Graphs, another fundamental data structure with wide-ranging applications in computer science.

© Copyright 2024. All rights reserved