class: center, middle name: title # CS 10C - Trees ![:newline] .Huge[*Prof. Ty Feng*] ![:newline] .Large[UC Riverside - WQ 2026] ![:newline 6]  .footnote[Copyright © 2026 Joël Porquet-Lupine and Ty Feng - [CC BY-NC-SA 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/)] ??? - So far we have talked about 3 ADTs: Lists, stacks, and queues. - And we have seen that these ADTs could be implemented by different data structures: arrays, singly linked lists, doubly linked lists. - Today, we will start talking about another type of data structure, called trees. - Before we dive into the details about trees though, we need to understand why they could be useful in the first place. For that, let's talk about searching data. --- layout: true # Searching data --- ## Introduction - How to retrieve specific items from a collection? - Such as a record from a database? - Items are usually described by a **(key, value)** pair ![:newline 1] ### Example: DNS lookup - Given a domain name, find the corresponding IP address .center[ | Domain name (Key) | IP address (Value) | |--------------------|--------------------| | www.ucr.edu | 23.185.0.2 | | www.cs.ucr.edu | 169.235.30.15 | | www.ucsb.edu | 23.185.0.3 | | www.ucdavis.edu | 23.185.0.4 | | www.wikipedia.org | 185.15.59.224 | ] ??? - Note that the reason why ucr.edu, ucsb.edu, and ucdavis.edu have similar IP addresses is that they all use the same cloud provider for their website (fastly.net)! 
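- For a quick taste of (key, value) lookup in code, here is an illustrative sketch (hard-coding a few rows of the table above; `Lookup` is a hypothetical helper, not part of the slides):

```cpp
#include <map>
#include <string>

// A few rows of the DNS table, stored as (key, value) pairs
std::string Lookup(const std::string &domain)
{
    static const std::map<std::string, std::string> dns{
        {"www.ucr.edu", "23.185.0.2"},
        {"www.cs.ucr.edu", "169.235.30.15"},
        {"www.wikipedia.org", "185.15.59.224"},
    };
    auto it = dns.find(domain);
    return (it != dns.end()) ? it->second : "";  // empty string: not found
}
```

- Fittingly, `std::map` is itself typically implemented with a self-balancing binary search tree, the very kind of structure this lecture builds toward.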
--- ### Ubiquitous operation - Searching data is a fundamental operation in programming .center.footnotesize[ | Application | Purpose of search | Key | Value | |-------------------|------------------------------|---------------|----------------------| | DNS | Find IP address | Domain name | IP address | | Reverse DNS | Find domain name | IP | Domain name | | File system | Find file on disk | Filename | Location on disk | | Book index | Find relevant pages | Term | List of page numbers | | Web search | Find relevant web pages | Keyword | List of URLs | | Compiler | Find properties of variables | Variable name | Type and value | | Dean's honor list | Find best students | GPA | Student's name | ] -- ![:newline 1] .lcol[ ### Related operations - Finding maximum or minimum - Floor, ceiling, range, rank, etc. ] ??? - floor value: find largest key smaller or equal to searched key - ceil value: find smallest key greater or equal to searched key - rank: rank of key in collection -- .rcol[ ### Questions - How to make searching as efficient as possible? - What are the best *search data structures* available? ] --- ## Linear search .lcol[ ### Definition - Sequentially check each item of list until match is found - Also called *sequential search* ] .rcol[  ] ![:flush 1] ### Characteristics - Very simple to implement - Only practical when list is short or for single-search scenario ??? - Apparently for lists up to 100 items, linear search outperforms sort + binary search - Makes sense since sorting has a cost -- ![:newline 1] ### Related data structures and complexity .footnotesize[ | Data structure | Search | Insert | Delete | Find min/max | |----------------------|--------|------------------|------------------------|--------------| | Unsorted array | $O(N)$ | $O(1)$ amortized | (search time +) $O(1)$ | $O(N)$ | | Unsorted linked list | $O(N)$ | $O(1)$ | (search time +) $O(1)$ | $O(N)$ | | Sorted linked list | $O(N)$ | $O(N)$ | (search time +) $O(1)$ | $O(1)$ | ] ???
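- A minimal sketch of the linear scan above (illustrative `LinearSearch` helper, assuming `int` keys in a `std::vector`; not part of the original slides):

```cpp
#include <cstddef>
#include <vector>

// Check each item in turn until a match is found -- O(N) worst case
int LinearSearch(const std::vector<int> &v, int key)
{
    for (std::size_t i = 0; i < v.size(); i++)
        if (v[i] == key)
            return static_cast<int>(i);  // index of first match
    return -1;  // not found
}
```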
- Complexity: - unsorted array: - insert is O(1) amortized - for delete, we can swap the item to delete with last element and simply pop() last item, which is O(1) amortized ![:newline 1] - Why don't we have "sorted array" in this list of data structures? * See next slide: if we have a sorted array, then we don't want to use a linear search --- ## Sorted array and binary search ### Definition - Compare with middle item of sorted array for match; if not a match, divide search interval in half and repeat  ### Characteristics - Requires a sorted array - Either by construction, when inserting/deleting items - Or by sorting it before the search (at least $O(N \lg N)$ in the worst-case) -- ![:newline 1] ### Related data structures and complexity .footnotesize[ | Data structure | Search | Insert | Delete | Find min/max | |----------------|------------|--------|------------------------|--------------| | Sorted array | $O(\lg N)$ | $O(N)$ | (search time +) $O(N)$ | $O(1)$ | ] --- ## Summary .center.footnotesize[ | Data structure | Search | Insert | Delete | Find min/max | |----------------------|------------|--------|------------------------|--------------| | Unsorted array | $O(N)$ | $O(1)$ | (search time +) $O(1)$ | $O(N)$ | | Unsorted linked list | $O(N)$ | $O(1)$ | (search time +) $O(1)$ | $O(N)$ | | Sorted linked list | $O(N)$ | $O(N)$ | (search time +) $O(1)$ | $O(1)$ | | Sorted array | $O(\lg N)$ | $O(N)$ | (search time +) $O(N)$ | $O(1)$ | ] -- ![:newline 1] ## Problem - For large sets of items, the linear complexity of linked lists and arrays for some operations is prohibitive - Need for a new search data structure for which the **average** time of most operations is $O(\lg N)$...
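The halving search from the previous slides can be sketched iteratively (illustrative `BinarySearch` helper over a sorted `std::vector<int>`; the standard library offers `std::binary_search` and `std::lower_bound` for the same job):

```cpp
#include <vector>

// Return index of @key in sorted @v, or -1 -- O(lg N)
int BinarySearch(const std::vector<int> &v, int key)
{
    int lo = 0, hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;  // same as (lo + hi) / 2, without overflow
        if (v[mid] == key)
            return mid;
        if (v[mid] < key)
            lo = mid + 1;   // key can only be in the right half
        else
            hi = mid - 1;   // key can only be in the left half
    }
    return -1;  // interval became empty: not found
}
```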
-- ![:newline 1] .center.footnotesize[ | Data structure | Search | Insert | Delete | Find min/max | |--------------------------|------------|------------|------------|--------------| | .red[Binary search tree] | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | ] --- layout: true # Trees --- ## Recursive definition A tree can be defined recursively as - A collection of **nodes** starting at a **root** node, - Where each node is connected to zero or more **child** nodes via **edges**, - With the constraints that no reference is duplicated, and no node points back to the root.  ??? - Note that a tree represents both an ADT and a data structure - ADT - No exact ADT definition as part of std library or Boost - But a tree can be implemented using various data structures (eg linked list, array, AVL, RB tree, etc.) - Data structure - It's also the data structure for many other ADTs such as map, set, priority queues --- ## Some terminology... ### Root The root node is the tree's top node  > There can be only one! --- ### Edge An edge is a connecting link between two nodes  ![:newline 2] - A tree composed of **N** nodes has **N - 1** edges --- ### Parent A parent is a node which is the *direct predecessor* of another node  ??? - About half of all nodes are parents --- ### Child A child is a node which is the *direct descendant* of another node  ![:newline 2] - In a tree, all the nodes except the root are child nodes ??? - Nodes can be parent and child at the same time, it's not exclusive - We often refer to parent/child locally - eg E is the parent of J, J is not the child of G nor B --- ### Descendant/Ancestor Nodes connected by one edge or more  ![:newline 2] - A descendant node is also known as *subchild* ???
- Think grandparents, grandchildren here --- ### Siblings Siblings are all the nodes descending from the same parent  --- ### Leaf A leaf node is a node which has no child  ![:newline 2] - Also called *external*, *outer* or *terminal* node --- ### NIL The absence of a child node is sometimes explicitly represented as NIL  ![:newline] - When implementing trees as data structures, NIL nodes are usually represented by a NULL pointer ??? - Especially popular in books or in the literature - Can make a parallel with the end of a linked-list via a drawing --- ### Internal node An internal node is a node which has at least one child  - Opposite of leaf node and also called *inner* or *branch* node --- ### Degree The degree of a node is the number of its children  - A leaf node always has degree 0 - The degree of a tree is the maximum degree of any of its nodes ??? - When a tree has nodes with many children, we often say that it has a high "fanout" - eg B-tree can often have a couple hundred children, very high fanout --- ### Level The level of a node is one greater than the level of its parent, knowing that the level of the root node is 1  ??? - Could think of this as family generations --- ### Path A path is a sequence of nodes and edges connecting a node with a descendant  ![:newline 2] - The **length** of the path is the number of edges composing it ???
- Later, when we talk about graphs, which are a generalization of trees, we will see some algorithms to find the shortest path between two nodes - Essentially what your GPS does to give you the best directions --- ### Depth The depth of a node is the path length from the root to this node  - The depth of root is 0 - The depth of a tree is equal to the depth of the deepest leaf --- ### Height The height of a node is the length of the longest path from a leaf to this node  - All leaves are at height 0 - The height of a tree is equal to the height of its root *(which is always equal to the depth of the tree)* ??? - Opposite to depth - depth from top to bottom - height from bottom to top --- ### Subtree A subtree consists of a child node and all of its descendants  ??? - A leaf node is also a subtree - A subtree has all the same properties as a tree: root node, height, depth, etc. --- ## Typical implementations ### Array of children .lcol.small[ ```C template
<typename T>
class Tree
{
    struct Node {
        T item;
        std::vector<std::unique_ptr<Node>> children;
    };
    std::unique_ptr<Node> root;
    ...
};
```
] .rcol[  ] -- ![:flush 1] #### Characteristics - Most intuitive representation but... - Node degrees are not necessarily known in advance for the construction - Can lead to waste of space ??? - Very difficult to implement if not using vectors --- ### Linked list of siblings .lcol.small[ ```C template
<typename T>
class Tree
{
    struct Node {
        T item;
        std::unique_ptr<Node> first_child;
        std::unique_ptr<Node> next_sibling;
    };
    std::unique_ptr<Node>
root; ... }; ``` ] .rcol[  ] -- ![:flush 1] #### Characteristics - Adaptable to any degree - Predictable memory complexity --- template: title --- layout: false # Recap ## Efficient searching .center.scriptsize[ | Data structure | Search | Insert | Delete | Find min/max | |--------------------------|------------|------------|------------------------|--------------| | Unsorted array | $O(N)$ | $O(1)$ | (search time +) $O(1)$ | $O(N)$ | | Unsorted linked list | $O(N)$ | $O(1)$ | (search time +) $O(1)$ | $O(N)$ | | Sorted linked list | $O(N)$ | $O(N)$ | (search time +) $O(1)$ | $O(1)$ | | Sorted array | $O(\lg N)$ | $O(N)$ | $O(N)$ | $O(1)$ | | .red[Binary search tree] | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | ] ![:newline 1] ## Tree structure .lcol70[  ] .rcol30[ ### Terminology Node, root, edge, parent, child, siblings, leaf, internal node, degree... ] --- layout: true # Binary trees --- ## Definition A **binary tree** is a tree in which each node has **at most two** children - Usually referred to as *left* child and *right* child  ??? Keep in mind that - The depth of an average binary tree is considerably smaller than N (that's the goal!) - Although the depth can be N-1 in the worst case (before we talk about self-balancing trees) --- ## Types ### Full binary tree In a full binary tree, every node has either 0 or 2 children  ??? Full binary trees are used for: - expression tree (each leaf is an operand, internal nodes are operators) - decision tree with yes/no, true/false results - huffman coding: internal node always has two children, otherwise it would be a leaf encoding a symbol --- ### Complete binary tree In a complete binary tree, every level, except possibly the last one, is completely filled and all nodes in the last level are as far left as possible  ??? - Question: where should the next new node go to maintain a complete binary tree? 
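- Looping back to the first-child/next-sibling layout from the implementation slides, here is how the sibling chain gets walked in practice: a standalone sketch (raw pointers for brevity, unlike the slides' `unique_ptr` version) computing a node's degree:

```cpp
// Standalone first-child/next-sibling node (raw pointers for brevity,
// unlike the unique_ptr version on the implementation slide)
struct SNode {
    char item;
    SNode *first_child = nullptr;
    SNode *next_sibling = nullptr;
};

// Degree of @n = number of children = length of its first child's sibling chain
int Degree(const SNode *n)
{
    int count = 0;
    for (const SNode *c = n->first_child; c; c = c->next_sibling)
        count++;
    return count;
}
```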
--- ### Perfect binary tree In a perfect binary tree, all internal nodes have two children and all leaf nodes have the same depth  ??? - Apparently, the literature on the subject of binary trees is not well-standardized which can be confusing - Sometimes perfect binary trees are actually called complete, and complete binary trees are then called almost/nearly complete --- ## Traversal - Many algorithms require all nodes of a binary tree to be visited and their contents processed or examined - Printing, searching, updating, etc. - Unlike linear data structures (such as arrays or linked lists), trees do not inherently have a linear order in which they can be traversed ![:newline 2]  --- ### Traversal types - **Depth-first** order - From root down to leaves - **Breadth-first** order - Level by level ![:newline 2]  --- ### Depth-first order .lcol[ #### General recursive pattern From a node N - (N) Process N itself - (L) Recurse on N's left subtree - (R) Recurse on N's right subtree These steps can be done in any order. ] -- .rcol.small[ #### Pre-order (NLR)  ] ??? - pre-order: each parent is traversed before any of its children -- ![:flush] .rcol.small[ #### Post-order (LRN)  ] ??? - post-order: children are traversed before their respective parents -- .lcol.small[ #### In-order (LNR)  ] ??? - in-order: left-subtree, then node, then right-subtree ![:newline 1] - Sailboat navigating around tree. - Stops on west, east, or south. ![:newline 1] - Live quiz --- ### Breadth-first order #### Level-order .lcol[ Visit every node on a level before going to a lower level ] .rcol[  ] ??? - Once again, repeat that there's no *correct* traversal order - The fact that this one prints the letter in alphabetical order is purely coincidental --- ## Data structure implementation ### Node-based .lcol.footnotesize[ ```C template
<typename T>
class BinaryTree
{
public:
    struct Node {
        T item;
        std::unique_ptr<Node> left;
        std::unique_ptr<Node> right;
    };
    std::unique_ptr<Node>
root; }; ```  ] .rcol[  ] ??? - Ownership from parent to children - nullptr and NIL -- ![:flush 0.5] .lcol25[ #### Example  ] ??? ![:newline 1] - Static construction of tree - probably never what you want, but it's an interesting exercise -- .rcol75.scriptsize[ ```C
// Make a type alias
using Node = BinaryTree
<char>::Node;

...

// Statically build a binary tree
std::unique_ptr<Node> nj(new Node{'j'});
std::unique_ptr<Node> ni(new Node{'i'});
std::unique_ptr<Node> nh(new Node{'h'});
std::unique_ptr<Node> ng(new Node{'g'});
std::unique_ptr<Node> nf(new Node{'f', std::move(nj)});
std::unique_ptr<Node> ne(new Node{'e'});
std::unique_ptr<Node> nd(new Node{'d', std::move(nh), std::move(ni)});
std::unique_ptr<Node> nc(new Node{'c', std::move(nf), std::move(ng)});
std::unique_ptr<Node> nb(new Node{'b', std::move(nd), std::move(ne)});
std::unique_ptr<Node> na(new Node{'a', std::move(nb), std::move(nc)});
BinaryTree<char>
bin_tree { std::move(na) };
```
 ] ??? ![:newline 1] - `using` keyword for typedefs - introduced in c++11 - especially useful with templates --- ## Traversal implementations .lcol[ ### Recursive pre-order .footnotesize[ ```C
void PreorderRecur(Node *n)
{
    if (!n) return;
    std::cout << n->item << ' ';
    PreorderRecur(n->left.get());
    PreorderRecur(n->right.get());
}
```
 ```C
std::cout << "Recursive pre-order:\n";
PreorderRecur(bin_tree.root.get());
std::cout << std::endl;
```
 ] ] -- .rcol[ ### Recursive in-order .footnotesize[ ```C
void InorderRecur(Node *n)
{
    if (!n) return;
    InorderRecur(n->left.get());
    std::cout << n->item << ' ';
    InorderRecur(n->right.get());
}
```
 ```C
std::cout << "Recursive in-order:\n";
InorderRecur(bin_tree.root.get());
std::cout << std::endl;
```
 ] ] -- .lcol[ ### Recursive post-order .footnotesize[ ```C
void PostorderRecur(Node *n)
{
    if (!n) return;
    PostorderRecur(n->left.get());
    PostorderRecur(n->right.get());
    std::cout << n->item << ' ';
}
```
 ```C
std::cout << "Recursive post-order:\n";
PostorderRecur(bin_tree.root.get());
std::cout << std::endl;
```
 ] ] -- .rcol[ ![:newline 1] - All these implementations based on recursive function calls require call stack space proportional to the height of the binary tree ] ??? - What can happen if binary tree is very deep? -- .rcol[ - What can happen if the binary tree is very deep? Is there another way to traverse a tree? - Stack-based iterative traversal! ] ??? - Will do that in discussion during week 5 --- ### In-Order Traversal Produces Sorted Output .boxed[ **Critical Insight:** In-order traversal of a BST visits nodes in **ascending key order** ] -- ### Why does this work? .lcol60[ **Recursive reasoning:** 1. Visit left subtree first - By BST property: all keys < current node 2. Visit current node 3. Visit right subtree last - By BST property: all keys > current node This pattern applies **recursively** at every level: - Left < Node < Right holds everywhere - Result: sorted sequence!
] -- .rcol40[ ### Practical Applications **Sort data:** ```cpp
void Sort(Node* n, std::vector<K>
& v)
{
    if (!n) return;
    Sort(n->left.get(), v);
    v.push_back(n->key);
    Sort(n->right.get(), v);
}
```
**Find k smallest elements:** - Stop after k nodes visited **Find k largest elements:** - Reverse in-order (right-node-left) - Stop after k nodes ] --- ### Recursive level-order .lcol60.footnotesize[ ```C
void LevelRecur(Node *n, int level)
{
    if (!n) return;
    if (level == 1) {
        std::cout << n->item << ' ';
    } else {
        LevelRecur(n->left.get(), level - 1);
        LevelRecur(n->right.get(), level - 1);
    }
}

void LevelorderRecur(Node *root, int levels)
{
    for (int l = 1; l <= levels; l++)
        LevelRecur(root, l);
}
```
 ```C
std::cout << "Recursive level-order:\n";
LevelorderRecur(bin_tree.root.get(), 4);
std::cout << std::endl;
```
 ] .rcol40[  ] -- ![:flush 1] - What is the complexity of this recursive level-order traversal? -- - $O(N \lg N)$ ![:newline 0.5] - Is there another way of performing a level-order traversal? ??? - The outer loop in `LevelorderRecur()` is $O(\lg N)$, assuming that the tree is mostly complete - Each call to `LevelRecur()` traverses *all the nodes* up to a certain level: * level 0: 1 node * level 1: 1 + 2 nodes * level 2: 1 + 2 + 4 nodes * level l: $2^0 + 2^1 + 2^2 + ... + 2^{l-1}$ * Only the last call is fully $O(2^{\lg N}) = O(N)$ but each call is roughly proportional to N (it's like a `for(j = 0; j < i; j++)` kind of situation) -- - Only $O(N)$ if using a queue-based iterative traversal! ??? - Will also do that in week 5's discussion --- template: title --- layout: false # Recap .lcol[ ## Binary tree  ] -- .rcol[ ### Traversals .lcol[  .center.footnotesize[Pre-order] ] .rcol[  .center.footnotesize[In-order] ] .lcol[  .center.footnotesize[Post-order] ] .rcol[  .center.footnotesize[Level-order] ] ] ??? - Sailboat method -- .lcol[ ![:newline 3] ### Node-based implementation  ] .rcol[ ![:newline 1] .tiny[ ```C template
<typename T>
class BinaryTree
{
public:
    struct Node {
        T item;
        std::unique_ptr<Node> left;
        std::unique_ptr<Node> right;
    };
    std::unique_ptr<Node>
root; }; ```  ] .comment[ ### Array-based implementation  ] ] ??? - This node-based representation (similar to a linked list) is not the only data structure available for representing a binary tree --- # Binary trees ## Alternative data structure implementation ### Array-based data structure  .rcol40.small[ ```C template
<typename T>
class BinaryTree
{
public:
    std::unique_ptr<T[]>
items; }; ``` ] ??? - Other representation of a binary tree * Works particularly well for *complete* trees - Will see this representation again when we talk about the data structure called a heap in the context of a priority queue * In a few weeks -- .lcol60[ For node at index $i$ - Left child at index $2i + 1$ - Right child at index $2i + 2$ - Parent (if not root) at index $\lfloor{\frac{i-1}{2}}\rfloor$ ] -- ![:flush 1] #### Characteristics - Intrinsic breadth-first order for fast level-order traversal - No waste of space as long as binary tree is complete - Often used for *binary heaps* (see later) --- layout: true # Binary search trees --- ## Definition A **Binary search tree (BST)** is a special kind of binary tree in which - Each node contains one *comparable* key - (and optionally some associated data) - Each node's key is **greater** than any key stored in its **left subtree**, - Each node's key is **less than** any key stored in its **right subtree**.  --- ## Typical API ```C
public:
    // Return whether @key is found in tree
    bool Contains(const K& key);
    // Return max key in tree
    const K& Max();
    // Return min key in tree
    const K& Min();
    // Insert @key in tree
    void Insert(const K &key);
    // Remove @key from tree
    void Remove(const K &key);
    // Print tree in-order
    void Print();
```
 ??? - Very comparable to List ADT - Contains() akin to Find() - No Get() with a position in sequence since not a sequence -- ![:newline 3] Other typical methods include - `Floor(key)`: Find largest key smaller or equal to given key - `Ceil(key)`: Find smallest key greater or equal to given key - `Rank(key)`: Find key rank in collection - Etc. ??? - Today, we will study all these methods except `Remove()` * Remove is surprisingly complex while the others are quite straightforward! --- ## Data structure ```C
template
<typename K>
class BST
{
    ...
private:
    struct Node {
        K key;
        std::unique_ptr<Node> left;
        std::unique_ptr<Node> right;
    };
    std::unique_ptr<Node> root;

    // Useful recursive helper methods
    Node* Min(Node *n);
    void Insert(std::unique_ptr<Node> &n, const K &key);
    void Remove(std::unique_ptr<Node>
&n, const K &key); void Print(Node *n, int level); }; ```  ![:newline 1] - Here, simple implementation of a **set** - No associated data item, but just a key of a *comparable* type - Implementation handling pairs of key-value would be a **map** ??? - recursive version for min() - but iterative version for max() - recursive versions of insert/remove - recursive print, so that we can print the level of the node --- ## `Contains()` .lcol[ .small[ ```C template
<typename K>
bool BST<K>
::Contains(const K &key)
{
    Node *n = root.get();

    while (n) {
        if (key == n->key)
            return true;
        if (key < n->key)
            n = n->left.get();
        else
            n = n->right.get();
    }

    return false;
}
```
 ] ![:newline 1] - Iterative version - Compare given key to node's key and follow left or right subtree accordingly ] -- .rcol[ ### Example  ![:newline 2] .small[ ```C
std::cout << "Tree contains 54? " << bst.Contains(54) << "\n";
std::cout << "Tree contains 55? " << bst.Contains(55) << "\n";
```
 ```terminal Tree contains 54? 1 Tree contains 55? 0 ``` ] ] ??? - Point out to NIL idea for `Contains(55)` --- ## Max() .lcol[ .small[ ```C
template
<typename K>
const K& BST<K>
::Max(void)
{
    Node *n = root.get();

    while (n->right)
        n = n->right.get();

    return n->key;
}
```
 ] ![:newline 1] - Iterative version - Go as far right as possible to find the max key ] -- .rcol[ ### Example  ![:newline 2] .small[ ```C
std::cout << "Max value: " << bst.Max() << "\n";
```
 ```terminal Max value: 99 ``` ] ] ??? - Comparison with List ADT - Need to go through the entire sequence to find Max value --- ## Min() .lcol60[ .small[ ```C
template
<typename K>
const K& BST<K>::Min(void)
{
    return Min(root.get())->key;
}

template <typename K>
typename BST<K>::Node* BST<K>
::Min(Node *n)
{
    if (n->left)
        return Min(n->left.get());
    else
        return n;
}
```
 ] ![:newline 1] - Recursive version using a helper - Go as far left as possible to find the min key ] -- .rcol40[ ### Example  ![:newline 2] .small[ ```C
std::cout << "Min value: " << bst.Min() << "\n";
```
 ```terminal Min value: 2 ``` ] ] --- ## Insert() .lcol75.footnotesize[ ```C
template
<typename K>
void BST<K>::Insert(const K &key)
{
    Insert(root, key);
}

template <typename K>
void BST<K>::Insert(std::unique_ptr<Node> &n, const K &key)
{
    if (!n)
        n = std::unique_ptr<Node>(new Node{key});
    else if (key < n->key)
        Insert(n->left, key);
    else if (key > n->key)
        Insert(n->right, key);
    else
        std::cerr << "Key " << key << " already inserted!\n";
}
```
 ] .rcol25.small[ - Recursive version using a helper - Find the right location for the new key ] -- .lcol33.footnotesize[ ```C
std::vector<int> keys{51, 43, 93, 18, 42, 99, 54, 2, 74, 74};
BST<int>
bst; for (auto i : keys) bst.Insert(i); ```  ] .rcol66[  ] --- ## `Print()` .lcol.footnotesize[ ```C template
<typename K>
void BST<K>::Print()
{
    Print(root.get(), 1);
    std::cout << std::endl;
}

template <typename K>
void BST<K>
::Print(Node *n, int level)
{
    if (!n) return;
    Print(n->left.get(), level + 1);
    std::cout << n->key << " [" << level << "] ";
    Print(n->right.get(), level + 1);
}
```
 ] -- .rcol.small[ ### Example ```C
bst.Print();
```
 ```terminal 2 [4] 18 [3] 42 [4] 43 [2] 51 [1]... ...54 [3] 74 [4] 93 [2] 99 [3] ``` ] ??? - Anything remarkable about this in-order traversal? * Ordered values * Can reconstitute original BST from output (figure next slide but maybe show on blackboard first) -- .rcol[ ![:newline 3]  ] -- ![:flush 3] - In-order traversal coupled with level information is enough data to reconstitute BST based on output
node with no child ] .rcol[  ] -- ![:flush] .lcol[ - Internal node with one child ] .rcol[  ] -- ![:flush] .lcol[ - Internal node with two children ] .rcol[  ] --- ### Leaf node .lcol[ #### Before  ] -- .rcol[ #### Algorithm - Simply remove node to be deleted from the tree  ] -- .lcol[ ![:newline 1] #### After  ] --- ### Internal node with one child .lcol[ #### Before  ] -- .rcol[ #### Algorithm - Replace node with its child  ] -- .lcol[ ![:newline 3] #### After  ] --- ### Internal node with two children .lcol[ #### Before  ] -- .rcol[ #### Algorithm - Call *D* the node to be deleted - Find min node in *D*'s right tree - i.e. *D*'s successor key value - Copy min node's key into *D* - Recursively delete min node  ] -- .lcol[ ![:newline 3] #### After  ] ??? - Note that min node in right tree cannot have two children, when being deleted --- ### Implementation .footnotesize[ ```C template
<typename K>
void BST<K>::Remove(const K &key)
{
    Remove(root, key);
}

template <typename K>
void BST<K>::Remove(std::unique_ptr<Node>
&n, const K &key)
{
    // Key not found
    if (!n)
        return;

    if (key < n->key) {
        Remove(n->left, key);
    } else if (key > n->key) {
        Remove(n->right, key);
    } else { // Found node
        if (n->left && n->right) {
            // Two children: replace with min node in right subtree
            n->key = Min(n->right.get())->key;
            Remove(n->right, n->key);
        } else {
            // Replace with only child or with nullptr
            n = std::move((n->left) ? n->left : n->right);
        }
    }
}
```
 ] - Notice how smart pointers will do the garbage collecting for us ??? - Also notice why we needed an internal Min() function that returned a Node (unlike Max() which is implemented with no helper function) --- ## Complexity analysis ### The good .lcol40[  ] .rcol60[ - Running time of all operations is $O(d)$ - Apart from traversal, which is necessarily $O(N)$ - Where $d$ is the depth of the accessed node ] -- ![:flush 1] .lcol60[ But tree shape depends on order of insertion: - At best, if tree is complete, its height is $O(\lg N)$ - On average, if keys are inserted in random order, the height is still $O(\lg N)$ ] .rcol40[  ] --- ### The bad - Deletions (with the presented approach) tend to displace nodes from the right subtrees to the left subtrees - Tree becomes unbalanced and depth tends towards $O(\sqrt{N})$ -- ![:newline 2] .lcol[ Shape after 100 pairs of random insertion/deletion:  - Tree has become left heavy ] -- .rcol[ Some solutions: - Avoid/forbid deletions! - Alternate picking min node in right tree with max node in left tree when deleting nodes with two children *(but improvement not mathematically proven)*  ] --- ### The ugly .lcol[ Worst-case scenario - Insertions of ordered values ] .rcol[  ] -- .lcol[ - Complexity for all operations becomes $O(N)$!
] --- ## Conclusion Big disparities depending on shape of BST .lcol[ ### Average running time .center.tiny[ | Search | Insert | Delete | Find min/max | |------------|------------|------------|--------------| | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | ] ] .rcol[ ### Worst-case running time .center.tiny[ | Search | Insert | Delete | Find min/max | |--------|--------|--------|--------------| | $O(N)$ | $O(N)$ | $O(N)$ | $O(N)$ | ] ] -- ![:flush 3] ### Solution - Make BST always balanced - Each node has about the same number of descendants in its left subtree as in its right subtree - Guarantee logarithmic performance for all operations - Such binary search trees are called **self-balancing** - E.g. AVL trees, Red-Black trees, etc.
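To see the worst case concretely, here is a standalone sketch (independent of the `BST` class above, `int` keys only) that inserts keys and measures the resulting height:

```cpp
#include <algorithm>
#include <memory>

// Minimal standalone BST: just enough to measure tree height
struct BNode {
    int key;
    std::unique_ptr<BNode> left, right;
};

void Insert(std::unique_ptr<BNode> &n, int key)
{
    if (!n)
        n = std::unique_ptr<BNode>(new BNode{key});
    else if (key < n->key)
        Insert(n->left, key);
    else if (key > n->key)
        Insert(n->right, key);
}

// Height of the tree rooted at @n (single node: 0, empty tree: -1)
int Height(const BNode *n)
{
    if (!n)
        return -1;
    return 1 + std::max(Height(n->left.get()), Height(n->right.get()));
}
```

Inserting 1 through N in ascending order produces a right-leaning chain of height N - 1, while the same keys inserted in a shuffled order typically stay near lg N; closing that gap is exactly what self-balancing trees do.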