[食譜好菜] 用 C# 來實作 B-Tree 資料結構

2022-10-09

1185
0
C#
2022-10-18

B-Tree 是一種資料結構，大多是應用在搜尋的場景上，B-Tree 的搜尋複雜度最差是 O(log n)，這個複雜度算是相當低，表示 B-Tree 的搜尋演算法是很有效率的，常見的資料庫系統及磁碟檔案系統都有用上它，除此之外，如果我們要自建搜尋系統的話，B-Tree 或許能派上用場。

名詞定義

我們在一棵 B-Tree 上會看到有三種節點，分別是：

根節點：B-Tree 的第一個節點，也是 B-Tree 的進入節點，要增刪查資料都從它開始。
葉節點：沒有任何子節點的節點
內節點：非根節點也非葉節點的節點

每一個節點裡面至少會包含兩種資料：

索引鍵集合：存放索引鍵
子節點集合：指向子節點

把 B-Tree 畫出來之後，大概就類似像這樣：

原則

B-Tree 是一種自平衡樹，所謂的自平衡樹，指的是為了得到更高效率的查詢，所有的葉子節點的深度會差不多，接近平衡，因此它就有一些原則需要去遵守，來維持樹的平衡。

在建立 B-Tree 的時候，會需要給定一個參數 - m 階（Degree），m 的值必須大於等於 3，小於 3 會無法遵守 B-Tree 的原則。

我假定要建立一棵 5 階的 B-Tree，下面則是它的原則：

每個節點最多只能有 5 - 1 個索引鍵
節點最少要有 Math.Ceiling(5/2) - 1 個索引鍵，根節點除外。
索引鍵必須是有序的

以上是主要的原則，這些主要的原則會衍伸出下面的原則：

每個節點的子節點最多只能有 5 個
非葉節點至少會有 Math.Ceiling(5/2) 個子節點
索引鍵以升冪排序（左小右大），則任一索引鍵以左（含子節點內）的索引鍵，皆比該索引鍵小，以右（含子節點內）的索引鍵，皆比該索引鍵大；反之亦然。

夯不啷噹大概就這六條原則，這六條原則會引領我們將 B-Tree 給建好。

實作 - 新增索引鍵

我們要實作的有兩個部分：新增索引鍵及刪除索引鍵，我這邊推薦一個舊金山大學做的一個網頁 - B-Tree Visualization，這個網頁將 B-Tree 建立的變化過程做成動畫，它可以加快或放慢動畫的播放速度，甚至還能將動畫定格在某個步驟，能成為我們在實作 B-Tree 時的小幫手，之後我使用到的 B-Tree 的圖，也都擷取自這個網頁。

接下來，我先建立一個 BTreeNode 的類別，將 B-Tree 節點的屬性先宣告出來，這些屬性分別是：

Keys：索引鍵陣列，陣列長度為 degree（階）。
KeyCount：索引鍵個數
Children：子節點陣列，陣列長度為 Math.Ceiling(degree / 2d) - 1。
Parent：父節點
IsRoot：是否為根節點？
IsLeaf：是否為葉節點？

這個會有一個實作上要調整的地方，我們每個節點原則上最多只能有 degree - 1 個索引鍵，但是在建立 B-Tree 的過程當中，為了方便運算，索引鍵陣列在設計上我們允許容納 degree 個。

internal class BTreeNode
{
    private readonly int maxKeyLength;
    private readonly int minKeyLength;

    public BTreeNode(int degree, BTreeNode parent)
    {
        this.maxKeyLength = degree;
        this.minKeyLength = (int)Math.Ceiling(degree / 2d) - 1;
        this.Keys = new int?[degree];
        this.Parent = parent;
        this.Children = new BTreeNode[degree + 1];
    }

    public int?[] Keys { get; }

    public int KeyCount { get; private set; }

    public BTreeNode[] Children { get; }

    public BTreeNode Parent { get; private set; }

    public bool IsRoot => this.Parent == null;

    public bool IsLeaf => this.Children[0] == null;
}

然後，還需要 BTree 的類別做為 BTreeNode 的容器，也是實際提供 API 給外部呼叫的類別。

public class BTree
{
    private readonly BTreeNode root;

    public BTree(int degree)
    {
        this.root = new BTreeNode(degree, null);
    }

    public int Count { get; private set; }

    // For Testing and Debugging
    internal BTreeNode root => this.root;
}

接下來，我就用亂數取 20 個 1~99 之間不重複的整數做為索引鍵，將它們一一塞進一個 4 階的 B-Tree 做為範例，這 20 個整數分別是 48,62,26,53,35,94,55,79,23,78,8,72,50,67,74,38,84,69,17,96，新增索引鍵一律先找到葉節點再執行新增，而根據原則，4 階 B-Tree 的節點最多只能有 3 個索引鍵，所以當我們新增到 53 這個整數的時候，會進行節點分割。

分割的邏輯是先找中間的索引鍵，中間位置的算法是 Math.Ceiling(索引鍵數量 / 2) - 1，然後新增左右兩個子節點，將中間索引鍵以左的索引鍵塞到左節點，以右的索引鍵塞到右節點。

public class BTree
{
    // ...

    public void Add(int key)
    {
        var node= FindLeaf(this.root, key);

        node.Add(key);

        this.Count++;
    }
    
    private static BTreeNode FindLeaf(BTreeNode node, int key)
    {
        if (node.IsLeaf) return node;

        var i = 0;

        for (; i < node.KeyCount; i++)
        {
            if (key.CompareTo(node.Keys[i]) < 0)
            {
                return FindLeaf(node.Children[i], key);
            }
        }

        return FindLeaf(node.Children[i], key);
    }
}

internal class BTreeNode
{
    // ...

    public void Add(int key)
    {
        this.Insert(key);

        this.SplitIfNecessary();
    }

    private static void SetChild(BTreeNode node, int childIndex, BTreeNode child)
    {
        if (child != null)
        {
            child.Parent = node;
        }

        node.Children[childIndex] = child;
    }

    private int Insert(int key)
    {
        var index = 0;

        for (; index < this.KeyCount; index++)
        {
            var comparison = key.CompareTo(this.Keys[index]);

            if (comparison == 0) return -1;

            if (comparison < 0) break;
        }

        if (this.KeyCount > 0 && index < this.KeyCount)
        {
            for (var i = this.KeyCount; i > index; i--)
            {
                this.Keys[i] = this.Keys[i - 1];
                this.Children[i + 1] = this.Children[i];
            }

            this.Children[index + 1] = this.Children[index];
            this.Children[index] = default;
        }

        this.Keys[index] = key;
        this.KeyCount++;

        return index;
    }

    private void SplitIfNecessary()
    {
        if (this.KeyCount < this.maxKeyLength) return;

        var medianIndex = (int)Math.Ceiling(this.KeyCount / 2d) - 1;

        var index = 0;

        var leftNode = new BTreeNode(this.maxKeyLength, this);

        SetChild(leftNode, 0, this.Children[index]);

        for (var i = index; index < medianIndex; index++)
        {
            leftNode.Keys[i] = this.Keys[index];
            leftNode.KeyCount++;

            SetChild(leftNode, i + 1, this.Children[index + 1]);

            i++;
        }

        index++;

        var rightNode = new BTreeNode(this.maxKeyLength, this);

        SetChild(rightNode, 0, this.Children[index]);

        for (var i = 0; index < this.KeyCount; index++)
        {
            rightNode.Keys[i] = this.Keys[index];
            rightNode.KeyCount++;

            SetChild(rightNode, i + 1, this.Children[index + 1]);

            i++;
        }

        var medianKey = this.Keys[medianIndex];

        if (this.IsRoot)
        {
            Array.Clear(this.Keys, 0, this.Keys.Length);
            Array.Clear(this.Children, 0, this.Children.Length);

            this.Keys[0] = medianKey;
            this.Children[0] = leftNode;
            this.Children[1] = rightNode;

            this.KeyCount = 1;
        }
        else
        {
            var keyIndexInParent = this.Parent.Insert(medianKey.Value);

            SetChild(this.Parent, keyIndexInParent, leftNode);
            SetChild(this.Parent, keyIndexInParent + 1, rightNode);

            this.Parent.SplitIfNecessary();
        }
    }
}

新增索引鍵的邏輯就是這樣滿了分割、滿了分割，直到將所有索引鍵塞進 B-Tree。

實作 - 刪除索引鍵

刪除索引鍵相較於新增索引鍵複雜許多，一般來說，索引鍵一旦新增進 B-Tree，要被刪除的機會不大，除非真的 B-Tree 裡面的索引鍵大多數已經確定沒有被搜尋的可能，那了不起將整個 B-Tree 重建就完了，不過，我們依舊把刪除的邏輯給弄出來。

刪除索引鍵的邏輯是先找到索引鍵的所在節點，將索引鍵刪除後，找到左邊子節點內最接近被刪除索引鍵的索引鍵，移上來替代。

假定我要刪除 48 這個索引鍵，可以看到下圖，把 48 刪掉之後，拿 38 上來取代原先 48 的位置。

在刪除索引鍵之後，如果索引鍵個數小於可允許的最小數量（Math.Ceiling(4/2) - 1），若左右鄰居節點其中一個的索引鍵個數大於可允許的最小數量，則將父索引鍵填補到被刪除的索引鍵位置，然後從鄰居的索引鍵當中搬移最接近的索引鍵到父節點的位置。

假定我要刪除 79，則 78 會填補到原先 79 的位置，再從左鄰居節點搬 74 上到父索引鍵的位置。

如果左右鄰居節點的索引鍵個數皆小於等於可允許的最小數量，則優先與左鄰居及父索引鍵進行節點合併。

假定我依序刪除 78,69，則最終 72,74 會合併成一個節點。

public class BTree
{
    // ...

    public void Remove(int key)
    {
        BTreeNode node;
        int index;

        if (this.root == null)
        {
            throw new ArgumentNullException(nameof(this.root));
        }
        else
        {
            node = FindNode(this.root, key, out index);
        }

        if (node != null)
        {
            node.RemoveAt(index);

            this.Count--;
        }
    }

    private static BTreeNode FindNode(BTreeNode node, int key, out int index)
    {
        var i = 0;

        for (; i < node.KeyCount; i++)
        {
            var comparison = key.CompareTo(node.Keys[i]);

            if (comparison == 0)
            {
                index = i;
                return node;
            }

            if (comparison < 0)
            {
                return FindNode(node.Children[i], key, out index);
            }
        }

        if (!node.IsLeaf)
        {
            return FindNode(node.Children[i], key, out index);
        }

        index = -1;
        return null;
    }
}

internal class BTreeNode
{
    // ...

    public void RemoveAt(int index)
    {
        if (this.IsLeaf)
        {
            this.DeleteAt(index);

            this.MergeIfNecessary();
        }
        else
        {
            this.Keys[index] = default;

            var leftLeaf = FindLeftLeaf();
            
            this.Keys[index] = leftLeaf.Keys[leftLeaf.KeyCount - 1];

            leftLeaf.RemoveAt(leftLeaf.KeyCount - 1);
        }

        BTreeNode FindLeftLeaf()
        {
            var leaf = this.Children[index];

            while (!leaf.IsLeaf)
            {
                leaf = leaf.Children[leaf.KeyCount];
            }

            return leaf;
        }
    }

    private void MergeIfNecessary()
    {
        if (this.IsRoot) return;
        if (this.KeyCount >= this.minKeyLength) return;

        var childIndexInParent = this.FindChildIndexInParent();

        var leftSibling = childIndexInParent > 0 ? this.Parent.Children[childIndexInParent - 1] : default;
        var rightSibling = childIndexInParent < this.maxKeyLength ? this.Parent.Children[childIndexInParent + 1] : default;

        if (leftSibling != null && leftSibling.KeyCount > this.minKeyLength)
        {
            var parentKeyIndex = childIndexInParent - 1;

            if (this.KeyCount == 0)
            {
                this.Children[1] = this.Children[0];
            }

            this.Insert(this.Parent.Keys[parentKeyIndex].Value);
            this.Parent.Keys[parentKeyIndex] = leftSibling.Keys[leftSibling.KeyCount - 1];

            SetChild(this, 0, leftSibling.Children[leftSibling.KeyCount]);

            leftSibling.Children[leftSibling.KeyCount] = default;
            leftSibling.DeleteAt(leftSibling.KeyCount - 1);
        }
        else if (rightSibling != null && rightSibling.KeyCount > this.minKeyLength)
        {
            var parentKeyIndex = childIndexInParent;

            this.Insert(this.Parent.Keys[parentKeyIndex].Value);
            this.Parent.Keys[parentKeyIndex] = rightSibling.Keys[0];

            SetChild(this, this.KeyCount, rightSibling.Children[0]);
            
            for (var i = 0; i < rightSibling.KeyCount; i++)
            {
                rightSibling.Children[i] = rightSibling.Children[i + 1];
            }

            rightSibling.Children[rightSibling.KeyCount] = rightSibling.Children[rightSibling.KeyCount + 1];

            rightSibling.DeleteAt(0);
        }
        else
        {
            var parentKeyIndex = leftSibling != null ? childIndexInParent - 1 : childIndexInParent;

            var mergedNode = leftSibling ?? this;
            var manureNode = mergedNode == this ? rightSibling : this;

            mergedNode.Keys[mergedNode.KeyCount] = this.Parent.Keys[parentKeyIndex];
            mergedNode.KeyCount++;

            SetChild(mergedNode, mergedNode.KeyCount, manureNode.Children[0]);

            for (var i = 0; i < manureNode.KeyCount; i++)
            {
                mergedNode.Keys[mergedNode.KeyCount] = manureNode.Keys[i];
                mergedNode.KeyCount++;

                SetChild(mergedNode, mergedNode.KeyCount, manureNode.Children[i + 1]);
            }

            for (var i = parentKeyIndex; i < this.Parent.KeyCount; i++)
            {
                this.Parent.Children[i] = this.Parent.Children[i + 1];
            }

            this.Parent.Children[this.Parent.KeyCount] = default;
            this.Parent.Children[parentKeyIndex] = mergedNode;
            this.Parent.DeleteAt(parentKeyIndex);

            if (this.Parent.IsRoot && this.Parent.KeyCount == 0)
            {
                Array.Clear(this.Parent.Keys, 0, this.Parent.Keys.Length);
                Array.Clear(this.Parent.Children, 0, this.Parent.Children.Length);

                for (var i = 0; i < mergedNode.KeyCount; i++)
                {
                    this.Parent.Keys[i] = mergedNode.Keys[i];

                    SetChild(this.Parent, i, mergedNode.Children[i]);
                }

                SetChild(this.Parent, mergedNode.KeyCount, mergedNode.Children[mergedNode.KeyCount]);

                this.Parent.KeyCount = mergedNode.KeyCount;
            }
            else
            {
                this.Parent.MergeIfNecessary();
            }
        }
    }

    private static void SetChild(BTreeNode node, int childIndex, BTreeNode child)
    {
        if (child != null)
        {
            child.Parent = node;
        }

        node.Children[childIndex] = child;
    }

    private int FindChildIndexInParent()
    {
        for (var i = 0; i < this.Parent.Children.Length; i++)
        {
            if (this.Parent.Children[i] == this) return i;
        }

        return -1;
    }

    private void DeleteAt(int index)
    {
        this.KeyCount--;

        for (var i = index; i < this.KeyCount; i++)
        {
            this.Keys[i] = this.Keys[i + 1];
        }

        this.Keys[this.KeyCount] = default;
    }
}

雖然說刪除索引鍵比新增索引鍵複雜，但終究歸納之後也只是重複著「刪除 -> 搬移 -> 合併」。

BTree.Find() vs List<T>.FirstOrDefault() vs List<T>.SingleOrDefault()

最後我們來比較一下 BTree 搜尋索引鍵的效能，跟我們常用的 List<T>.FirstOrDefault() 及 List<T>.SingleOrDefault() 比起來是否真的有比較好？

我隨機從 1 ~ 9999 取 1000 個不重複整數出來建立成一棵 7 階的 B-Tree，然後再隨機從 1 ~ 9999 取 100 個不重複整數出來逐一執行搜尋，底下是跑 Benchmark 的結果。

不過說實在的，只是要搜尋單一個索引鍵，Key-Value 系列的資料結構或許更合適一些，而 B-Tree 範圍搜尋的效能又比 List<T>.Where() 差，所以真的要使用 B-Tree 還不如用它的進化版 B+Tree（B Plus Tree），關於 B+Tree，之後有機會再來分享，以上，針對 B-Tree 的實作就分享給大家。

參考資料

< Source Code >

軟體主廚的程式料理廚房

軟體跟料理一樣，都變成了每人每天的生活必需品。