Trees

# CS 10C - Trees

![:newline]

.Huge[*Prof. Ty Feng*]

![:newline]

.Large[UC Riverside - WQ 2026]

![:newline 6]

![:scale 40%](../utils/style/ucr_cs_logo.png)

???

- So far we have talked about 3 ADTs: Lists, stacks, and queues.
- And we have seen that these ADTs could be implemented by different data
  structures: arrays, singly linked lists, doubly linked lists.
- Today, we will start talking about another type of data structure, called
  trees.
- Before we dive into the details about trees though, we need to understand why
  they could be useful in the first place. For that, let's talk about searching
  data.

---

# Searching data

---

## Introduction

- How to retrieve specific items from a collection?
    - Such as a record from a database?
    - Items are usually described by a **(key, value)** pair

![:newline 1]

### Example: DNS lookup
- Given a domain name, find the corresponding IP address

.center[
| Domain name (Key)  | IP address (Value) |
|--------------------|--------------------|
| www.ucr.edu        | 23.185.0.2         |
| www.cs.ucr.edu     | 169.235.30.15      |
| www.ucsb.edu       | 23.185.0.3         |
| www.ucdavis.edu    | 23.185.0.4         |
| www.wikipedia.org  | 185.15.59.224      |
]

???

- Note that the reason why ucr.edu, ucsb.edu, and ucdavis.edu have similar IP addresses is
  that they all use the same cloud provider for their website (fastly.net)!

---

### Ubiquitous operation

- Searching data is a fundamental operations in programming

.center.footnotesize[
| Application       | Purpose of search            | Key           | Value                |
|-------------------|------------------------------|---------------|----------------------|
| DNS               | Find IP address              | Domain name   | IP address           |
| Reverse DNS       | Find domain name             | IP            | Domain name          |
| File system       | Find file on disk            | Filename      | Location on disk     |
| Book index        | Find relevant pages          | Term          | List of page numbers |
| Web search        | Find relevant web pages      | Keyword       | List of URLs         |
| Compiler          | Find properties of variables | Variable name | Type and value       |
| Dean's honor list | Find best students           | GPA           | Student's name       |
]

![:newline 1]

???

- floor value: find largest key smaller or equal to searched key
- ceil value: find smallest key greater or equal to searched key
- rank: rank of key in collection

.rcol[
### Questions
- How to make searching as efficient as possible?
- What are the best *search data structures* available?
]

---

## Linear search

.lcol[
### Definition
- Sequentially check each item of list until match is found
- Also called *sequential search*
]

![:flush 1]

### Characteristics
- Very simple to implement
- Only practical when list is short or for single-search scenario

???

- Apparently for list up to 100 items, linear search outperforms sort + binary
  search
    - Makes sense since sorting has a cost

![:newline 1]

### Related data structures and complexity

.footnotesize[
| Data structure       | Search | Insert          | Delete                 | Find min/max |
|----------------------|--------|-----------------|------------------------|--------------|
| Unsorted array       | $O(N)$ | $O(1)$ amortzed | (search time +) $O(1)$ | $O(N)$       |
| Unsorted linked list | $O(N)$ | $O(1)$          | (search time +) $O(1)$ | $O(N)$       |
| Sorted linked list   | $O(N)$ | $O(N)$          | (search time +) $O(1)$ | $O(1)$       |
]

???
- Complexity:
    - unsorted array:
        - insert is O(1) amortized
        - for delete, we can swap the item to delete with last element and
          simply pop() last item, which is O(1) amortized

![:newline 1]

- Why don't we have "sorted array" in this list of data structures?
    * See next slide: if we have a sorted array, then we don't want to use a
      linear search

---

## Sorted array and binary search

### Definition
- Compare with middle item of sorted array for match; if not a match, divide
  search interval in half and repeat

![:scale 40%](images/binary_search.svg)

### Characteristics
- Requires to have a sorted array
    - Either by construction, when inserting/deleting items
    - Or by sorting it before the search (at least $O(N \lg N)$ in the
      worst-case)

![:newline 1]

### Related data structures and complexity

.footnotesize[
| Data structure | Search     | Insert | Delete                 | Find min/max |
|----------------|------------|--------|------------------------|--------------|
| Sorted array   | $O(\lg N)$ | $O(N)$ | (search time +) $O(N)$ | $O(1)$       |
]

---

## Summary

.center.footnotesize[
| Data structure       | Search     | Insert | Delete                 | Find min/max |
|----------------------|------------|--------|------------------------|--------------|
| Unsorted array       | $O(N)$     | $O(1)$ | (search time +) $O(1)$ | $O(N)$       |
| Unsorted linked list | $O(N)$     | $O(1)$ | (search time +) $O(1)$ | $O(N)$       |
| Sorted linked list   | $O(N)$     | $O(N)$ | (search time +) $O(1)$ | $O(1)$       |
| Sorted array         | $O(\lg N)$ | $O(N)$ | (search time +) $O(N)$ | $O(1)$       |
]

![:newline 1]

## Problem

- For large sets of items, the linear complexity of linked lists and arrays for
  some operations is prohibitive
- Need for a new search data structure for which the **average** time of most
  operations is $O(\lg N)$...

![:newline 1]

.center.footnotesize[
| Data structure           | Search     | Insert     | Delete     | Find min/max |
|--------------------------|------------|------------|------------|--------------|
| .red[Binary search tree] | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$   |
]

---

# Trees

---

## Recursive definition

A tree can be defined recursively as
- A collection of **nodes** starting at a **root** node,
- Where each node is connected to zero or more **child** nodes via **edges**,
- With the constraints that no reference is duplicated, and no node points back
  to the root.

![:scale 60%](images/tree.svg)

???

- Note that a tree represents both an ADT and a data structure
- ADT
    - No exact ADT definition as part of std library or Boost
    - But a tree can be implemented using various data structures (eg linked
      list, array, AVL, RB tree, etc.)
- Data structure
    - It's also the data structure for many other ADTs such as map, set,
      priority queues

---

## Some terminology...

### Root

The root node is the tree's top node

![:scale 70%](images/tree_root.svg)

> There can be only one!

---

### Edge

An edge is a connecting link between two nodes

![:scale 80%](images/tree_edge.svg)

![:newline 2]
- A tree composed of **N** nodes has **N - 1** edges

---

### Parent

A parent is a node which is the *direct predecessor* of another node

![:scale 80%](images/tree_parent.svg)

???

- About half of all nodes are parents

---

### Child

A child is a node which is the *direct descendant* of another node

![:scale 80%](images/tree_child.svg)

![:newline 2]
- In a tree, all the nodes except the root are child nodes

???

- Nodes can be parent and child at the same, it's not exclusive
- We often refer to parent/child locally
    - eg E is the parent of J, J is not the child of G nor B

---

### Descendant/Ancestor

Nodes connected by one edge or more

![:scale 80%](images/tree_descendant_ancestor.svg)

![:newline 2]
- A descendant node is also known as *subchild*

???

- Think grandparents, grandchildren here

---

### Siblings

Siblings are all the nodes descending from the same parent

![:scale 80%](images/tree_siblings.svg)

---

### Leaf

A leaf node is a node which has no child

![:scale 80%](images/tree_leaf.svg)

![:newline 2]
- Also called *external*, *outer* or *terminal* node

---

### NIL

The absence of child node is sometimes explicitly represented as NIL

![:scale 80%](images/tree_nil.svg)

![:newline]
- When implementing trees as data structures, NIL nodes are usually represented
  by a NULL pointer

???

- Especially popular in books or in the literature
- Can make a parallel with the end of a linked-list via a drawing

---

### Internal node

An internal node is a node which has at least one child

![:scale 80%](images/tree_internal.svg)

- Opposite of leaf node and also called *inner* or *branch* node

---

### Degree

The degree of a node is the number of its children

![:scale 80%](images/tree.svg)

- A leaf node has always degree 0
- The degree of a tree is the maximum degree of any of its nodes

???

- When a tree has nodes with many children, we often say that it has a high
  "fanout"
  - eg B-tree can often have a couple hundred children, very high fanout

---

### Level

The level of a node is one greater than the level of its parent, knowing that
the level of the root node is 1

![:scale 80%](images/tree_level.svg)

???

- Could think of this as family generations

---

### Path

A path is a sequence of nodes and edges connecting a node with a descendant

![:scale 80%](images/tree_path.svg)

![:newline 2]
- The **length** of the path is the number of edges composing it

???

- Later, when we talk about graphs, which are a generalization of trees, we will
  see some algorithms to find the shortest path between two nodes
  	- Essentially what your GPS does to give you the best directions

---

### Depth

The depth of a node is the path length from the root to this node

![:scale 80%](images/tree_depth.svg)

- The depth of root is 0
- The depth of a tree is equal to the depth of the deepest leaf

---

### Height

The height of a node is the length of the longest path from a leaf to this node

![:scale 70%](images/tree_height.svg)

- All leaves are at height 0
- The height of a tree is equal to the height of the root from the deepest leaf
  *(which is always equal to the depth of the tree)*

???

- Opposite to depth
    - depth from top to bottom
    - height from bottom to top

---

### Subtree

A subtree consists of a child node and all of its descendants

![:scale 80%](images/tree_subtree.svg)

???

- A leaf node is also a subtree
- A subtree has all the same properties as a tree: root node, height, depth,
  etc.

---

## Typical implementations

### Array of children

.lcol.small[
```C
template <typename T>
class Tree<T> {
  struct Node {
    T item;
    std::vector<Node> children;
  };
  std::unique_ptr<Node> root;
  ...
};
```
]

![:flush 1]

#### Characteristics
- Most intuitive representation but...
- Node degrees are not necessarily known in advance for the construction
- Can lead to waste of space

???

- Very difficult to implement if not using vectors

---

### Linked list of siblings

.lcol.small[
```C
template <typename T>
class Tree<T> {
  struct Node {
    T item;
    std::unique_ptr<Node> first_child;
    std::unique_ptr<Node> next_sibling;
  };
  std::unique_ptr<Node> root;
  ...
};
```
]

![:flush 1]

#### Characteristics
- Adaptable to any degree
- Predictable memory complexity

---

---

## Efficient searching
.center.scriptsize[
| Data structure           | Search     | Insert     | Delete                 | Find min/max |
|--------------------------|------------|------------|------------------------|--------------|
| Unsorted array           | $O(N)$     | $O(1)$     | (search time +) $O(1)$ | $O(N)$       |
| Unsorted linked list     | $O(N)$     | $O(1)$     | (search time +) $O(1)$ | $O(N)$       |
| Sorted linked list       | $O(N)$     | $O(N)$     | (search time +) $O(1)$ | $O(1)$       |
| Sorted array             | $O(\lg N)$ | $O(N)$     | $O(N)$                 | $O(1)$       |
| .red[Binary search tree] | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$             | $O(\lg N)$   |
]

![:newline 1]

## Tree structure
.lcol70[
![:scale 65%](images/tree.svg)
]
.rcol30[
### Terminology
Node, root, edge, parent, child, siblings, leaf, internal node, degree...
]

---

# Binary trees

---

## Definition

A **binary tree** is a tree in which each node has **at most two** children
- Usually referred to as *left* child and *right* child

![:scale 80%](images/binary_tree.svg)

???

Keep in mind that
- The depth of an average binary tree is considerably smaller than N (that's the
  goal!)
- Although the depth can be N-1 in the worst case (before we talk about
  self-balancing trees)

---

## Types

### Full binary tree

In a full binary tree, every node has either 0 or 2 children

![:scale 80%](images/binary_tree_full.svg)

???

Full binary trees are used for:

- expression tree (each leaf is an operand, internal nodes are operators)
- decision tree with yes/no, true/false results
- huffman coding: internal node always has two children, otherwise it would be a
  leaf encoding a symbol

---

### Complete binary tree

In a complete binary tree, every level, except possibly the last one, is
completely filled and all nodes in the last level are as far left as possible

![:scale 100%](images/binary_tree_complete.svg)

???

- Question: where should the next new node go to maintain a complete binary
  tree?

---

### Perfect binary tree

In a perfect binary tree, all internal nodes have two children and all leaf
nodes have the same depth

![:scale 100%](images/binary_tree_perfect.svg)

???

- Apparently, the literature on the subject of binary trees is not
  well-standardized which can be confusing
- Sometimes perfect binary trees are actually called complete, and complete
  binary trees are then called almost/nearly complete

---

## Traversal

- Many algorithms require all nodes of a binary tree to be visited and their
  contents processed or examined
    - Printing, searching, updating, etc.
- Unlike linear data structures (such as arrays or linked lists), trees do not
  inherently have a linear order in which they can be traversed

![:newline 2]

![:scale 60%](images/binary_tree2.svg)

---

### Traversal types

- **Depth-first** order
    - From root down to leaves
- **Breadth-first** order
    - Level by level

![:newline 2]

![:scale 60%](images/binary_tree_traversal.svg)

---

### Depth-first order

.lcol[
#### General recursive pattern
From a node N
- (N) Process N itself
- (L) Recurse on N's left subtree
- (R) Recurse on N's right subtree

These steps can be done in any order.
]

.rcol.small[
#### Pre-order (NLR)
![:scale 80%](images/binary_tree_preorder.svg)
]

???

- pre-order: each parent is traversed before any of its children

--
![:flush]

.rcol.small[
#### Post-order (LRN)
![:scale 80%](images/binary_tree_postorder.svg)
]

???

- post-order: children are traversed before their respective parents

.lcol.small[
#### In-order (LNR)
![:scale 80%](images/binary_tree_inorder.svg)
]

???

- in-order: left-subtree, then node, then right-subtree

![:newline 1]
- Sailboat navigating around tree.
    - Stops on west, east, or south.

![:newline 1]
- Live quiz

---

### Breadth-first order

#### Level-order

???

- Once again, repeat that there's no *correct* traversal order
- The fact that this one prints the letter in alphabetical order is purely
  coincidental

---

## Data structure implementation

### Node-based

.lcol.footnotesize[

```C
template <typename T>
class BinaryTree {
 public:
  struct Node{
    T item;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
  };
  std::unique_ptr<Node> root;
};

```

![:filename](binary_tree.h)

]

???

- Ownership from parent to children
- nullptr and NIL

![:flush 0.5]

???

![:newline 1]
- Static construction of tree
    - probably never what you want, but it's an interesting exercise

.rcol75.scriptsize[

```C
// Make a type alias
using Node = BinaryTree<char>::Node;
...
  // Statically build a binary tree
  std::unique_ptr<Node> nj(new Node{'j'});
  std::unique_ptr<Node> ni(new Node{'i'});
  std::unique_ptr<Node> nh(new Node{'h'});
  std::unique_ptr<Node> ng(new Node{'g'});
  std::unique_ptr<Node> nf(new Node{'f', std::move(nj)});
  std::unique_ptr<Node> ne(new Node{'e'});
  std::unique_ptr<Node> nd(new Node{'d', std::move(nh), std::move(ni)});
  std::unique_ptr<Node> nc(new Node{'c', std::move(nf), std::move(ng)});
  std::unique_ptr<Node> nb(new Node{'b', std::move(nd), std::move(ne)});
  std::unique_ptr<Node> na(new Node{'a', std::move(nb), std::move(nc)});
  BinaryTree<char> bin_tree { std::move(na) };

```

![:filename](binary_tree.cc)

]

???

![:newline 1]
- `using` keyword for typedefs
    - introduced in c++11
    - especially useful with templates

---

## Traversal implementations

```C
void PreorderRecur(Node *n) {
  if (!n) return;

std::cout << n->item << ' ';
  PreorderRecur(n->left.get());
  PreorderRecur(n->right.get());
}

```

![:filename](binary_tree.cc)

```C
  std::cout << "Recursive pre-order:\n";
  PreorderRecur(bin_tree.root.get());
  std::cout << std::endl;

```

![:filename](binary_tree.cc)

]
]

```C
void InorderRecur(Node *n) {
  if (!n) return;

InorderRecur(n->left.get());
  std::cout << n->item << ' ';
  InorderRecur(n->right.get());
}

```

![:filename](binary_tree.cc)

```C
  std::cout << "Recursive in-order:\n";
  InorderRecur(bin_tree.root.get());
  std::cout << std::endl;

```

![:filename](binary_tree.cc)

]
]

```C
void PostorderRecur(Node *n) {
  if (!n) return;

PostorderRecur(n->left.get());
  PostorderRecur(n->right.get());
  std::cout << n->item << ' ';
}

```

![:filename](binary_tree.cc)

```C
  std::cout << "Recursive post-order:\n";
  PostorderRecur(bin_tree.root.get());
  std::cout << std::endl;

```

![:filename](binary_tree.cc)

]
]

.rcol[
![:newline 1]
- All these implementations based on recursive function calls require a stack
  call proportional to the height of the binary tree
]

???

- What can happen if binary tree is very deep?

.rcol[
- What can happen if the binary tree is very deep? Is there another way to
  traverse a tree?
	- Stack-based iterative traversal!
]

???

- Will do that in discussion during week 5

---
### In-Order Traversal Produces Sorted Output

### Why does this work?

.lcol60[
**Recursive reasoning:**
1. Visit left subtree first
   - By BST property: all keys < current node
2. Visit current node
3. Visit right subtree last
   - By BST property: all keys > current node

This pattern applies **recursively** at every level:
- Left < Node < Right holds everywhere
- Result: sorted sequence!
]

**Sort data:**
```cpp
void Sort(Node* n, 
          vector<int>& v) {
  if (!n) return;
  Sort(n->left, v);
  v.push_back(n->key);
  Sort(n->right, v);
}
```

**Find k smallest elements:**
- Stop after k nodes visited

**Find k largest elements:**
- Reverse in-order (right-node-left)
- Stop after k nodes
]

---

### Recursive level-order

.lcol60.footnotesize[

```C
void LevelRecur(Node *n, int level) {
  if (!n) return;
  if (level == 1) {
    std::cout << n->item << ' ';
  } else {
    LevelRecur(n->left.get(), level - 1);
    LevelRecur(n->right.get(), level - 1);
  }
}

void LevelorderRecur(Node *root, int levels) {
  for (int l = 1; l <= levels; l++)
    LevelRecur(root, l);
}

```

![:filename](binary_tree.cc)

```C
  std::cout << "Recursive level-order:\n";
  LevelorderRecur(bin_tree.root.get(), 4);
  std::cout << std::endl;

```

![:filename](binary_tree.cc)

]

![:flush 1]

- What is the complexity of this recursive level-order traversal?
--

- $O(N \lg N)$

![:newline 0.5]
- Is there another way of performing a level-order traversal?

???

- The outer loop in `LevelorderRecur()` is $O(\lg N)$, assuming that the tree
  is mostly complete
- Each call to `LevelRecur()` traverses *all the nodes* up to a certain level:
    * level 0: 1 node
    * level 1: 1 + 2 nodes
    * level 2: 1 + 2 + 4 nodes
    * level l: $2^0 + 2^1 + 2^2 + ... + 2^{l-1}$
    * Only the last call is fully $O(2^{\lg N}) = O(N)$ but each call is
      roughly proportional to N (it's like a `for(j = 0; j < i; j++)` kind of
      situation)

--
    - Only $O(N)$ if using a queue-based iterative traversal!

???

- Will also do that in week 5's discussion

---

---

# Recap

.rcol[
### Traverals
.lcol[
![:scale 100%](images/binary_tree_preorder.svg)
.center.footnotesize[Pre-order]
]
.rcol[
![:scale 100%](images/binary_tree_inorder.svg)
.center.footnotesize[In-order]
]
.lcol[
![:scale 100%](images/binary_tree_postorder.svg)
.center.footnotesize[Post-order]
]
.rcol[
![:scale 100%](images/binary_tree_levelorder.svg)
.center.footnotesize[Level-order]
]
]

???

- Sailboat method

### Node-based implementation
![:scale 80%](images/binary_tree_implem.svg)
]
.rcol[
![:newline 1]
.tiny[

```C
template <typename T>
class BinaryTree {
 public:
  struct Node{
    T item;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
  };
  std::unique_ptr<Node> root;
};

```

![:filename](binary_tree.h)

]

???

- This node-based representation (similar to a linked list) is not the only data
  structure available for representing a binary tree

---

# Binary trees

## Alternative data structure implementation

### Array-based data structure

![:scale 100%](images/binary_tree_array.svg)

.rcol40.small[
```C
template <typename T>
class BinaryTree {
 public:
  std::unique_ptr<T[]> items;
};
```
]

???

- Other representation of a binary tree
    * Works particularly well for *complete* trees
- Will see this representation again when we talk about the data structure
  called a heap in the context of a priority queue
    * In a few weeks

.lcol60[
For node at index $i$
- Left child at index $2i + 1$
- Right child at index $2i + 2$
- Parent (if not root) at index $\lfloor{\frac{i-1}{2}}\rfloor$
]

![:flush 1]

#### Characteristics
- Intrinsic breadth-first order for fast level-order traversal
- No waste of space as long as binary tree is complete
- Often used for *binary heaps* (see later)

---

# Binary search trees

---

## Definition

A **Binary search tree (BST)** is a special kind of binary tree in which
- Each node contains one *comparable* key
    - (and optionally some associated data)
- Each node's key is **greater** to any key stored in its **left subtree**,
- Each node's key is **less than** any key stored in its **right subtree**.

![:scale 65%](images/bst.svg)

---

## Typical API

```C
 public:
  // Return whether @key is found in tree
  bool Contains(const K& key);
  // Return max key in tree
  const K& Max();
  // Return min key in tree
  const K& Min();
  // Insert @key in tree
  void Insert(const K &key);
  // Remove @key from tree
  void Remove(const K &key);
  // Print tree in-order
  void Print();

```

![:filename](bst.h)

???

- Very comparable to List ADT
    - Contains() akin to Find()
    - No Get() with a position in sequence since not a sequence

![:newline 3]
Other typical methods include
- `Floor(key)`: Find largest key smaller or equal to given key
- `Ceil(key)`: Find smallest key greater of equal to given key
- `Rank(key)`: Find key rank in collection
- Etc.

???

- Today, we will study all these methods except `Remove()`
    * Remove is surprisingly complex while the others are quite straightforward!

---

## Data structure

```C
template <typename K>
class BST {
...
 private:
  struct Node{
    K key;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
  };
  std::unique_ptr<Node> root;

// Useful recursive helper methods
  Node* Min(Node *n);
  void Insert(std::unique_ptr<Node> &n, const K &key);
  void Remove(std::unique_ptr<Node> &n, const K &key);
  void Print(Node *n, int level);
};

```

![:filename](bst.h)

![:newline 1]
- Here, simple implementation of a **set**
    - No associated data item, but just a key of a *comparable* type
- Implementation handling pairs of key-value would be a **map**

???

- recursive version for min()
    - but iterative version for max()
- recursive versions of insert/remove
- recursive print, so that we can print the level of the node

---

## `Contains()`

```C
template <typename K>
bool BST<K>::Contains(const K &key) {
  Node *n = root.get();

while (n) {
    if (key == n->key)
      return true;

if (key < n->key)
      n = n->left.get();
    else
      n = n->right.get();
  }

return false;
}

```

![:filename](bst.h)

]
![:newline 1]
- Iterative version
- Compare given key to node's key and follow left or right subtree accordingly
]

![:newline 2]
.small[

```C
  std::cout << "Tree contains 54? "
      << bst.Contains(54) << "\n";
  std::cout << "Tree contains 55? "
      << bst.Contains(55) << "\n";

```

![:filename](bst_tester.cc)

```terminal
Tree contains 54? 1
Tree contains 55? 0
```
]
]

???

- Point out to NIL idea for `Contains(55)`

---

## `Max()`

```C
template <typename K>
const K& BST<K>::Max(void) {
  Node *n = root.get();
  while (n->right)
	  n = n->right.get();
  return n->key;
}

```

![:filename](bst.h)

]
![:newline 1]
- Iterative version
- Go as far right as possible to find the max key
]

![:newline 2]
.small[

```C
  std::cout << "Max value: "
      << bst.Max() << "\n";

```

![:filename](bst_tester.cc)

```terminal
Max value: 99
```
]
]

???

- Comparison with List ADT
    - Need to go through the entire sequence to find Max value

---

## `Min()`

```C
template <typename K>
const K& BST<K>::Min(void) {
  return Min(root.get())->key;
}

template <typename K>
typename BST<K>::Node* BST<K>::Min(Node *n) {
  if (n->left)
    return Min(n->left.get());
  else
    return n;
}

```

![:filename](bst.h)

]
![:newline 1]
- Recursive version using a helper
- Go as far left as possible to find the min key
]

![:scale 80%](images/bst_min.svg)

![:newline 2]
.small[

```C
  std::cout << "Min value: "
      << bst.Min() << "\n";

```

![:filename](bst_tester.cc)

```terminal
Min value: 2
```
]
]

---

## `Insert()`

.lcol75.footnotesize[

```C
template <typename K>
void BST<K>::Insert(const K &key) {
  Insert(root, key);
}

template <typename K>
void BST<K>::Insert(std::unique_ptr<Node> &n, const K &key) {
  if (!n)
    n = std::unique_ptr<Node>(new Node{key});
  else if (key < n->key)
    Insert(n->left, key);
  else if (key > n->key)
    Insert(n->right, key);
  else
    std::cerr << "Key " << key << " already inserted!\n";
}

```

![:filename](bst.h)

]

.rcol25.small[
- Recursive version using a helper
- Find the right location for the new key
]

.lcol33.footnotesize[

```C
  std::vector<int> keys{51, 43, 93, 18, 42, 99, 54, 2, 74, 74};

BST<int> bst;
  for (auto i : keys)
    bst.Insert(i);

```

![:filename](bst_tester.cc)

]

---

## `Print()`

.lcol.footnotesize[

```C
template <typename K>
void BST<K>::Print() {
  Print(root.get(), 1);
  std::cout << std::endl;
}

template <typename K>
void BST<K>::Print(Node *n, int level) {
  if (!n) return;

Print(n->left.get(), level + 1);
  std::cout << n->key
      << " [" << level << "] ";
  Print(n->right.get(), level + 1);
}

```

![:filename](bst.h)

]

.rcol.small[
### Example

```C
  bst.Print();

```

![:filename](bst_tester.cc)

```terminal
2 [4] 18 [3] 42 [4] 43 [2] 51 [1]...
	...54 [3] 74 [4] 93 [2] 99 [3]
```
]

???

- Anything remarkable about this in-order traversal?
    * Ordered values
    * Can reconstitute original BST from output (figure next slide but maybe
      show on blackboard first)

![:flush 3]
- In-order traversal coupled with level information is enough data to
  reconstitute BST based on output

---

# Recap

.lcol40[
## Binary search tree
- Specific ordering of keys
]
.rcol60[
![:scale 80%](images/bst.svg)
]

![:flush 0]

### Typical API
.footnotesize[

```C
  // Return whether @key is found in tree
  bool Contains(const K& key);
  // Return max key in tree
  const K& Max();
  // Return min key in tree
  const K& Min();
  // Insert @key in tree
  void Insert(const K &key);
  // Remove @key from tree
  void Remove(const K &key);
  // Print tree in-order
  void Print();

```

![:filename](bst.h)

]

???

- Live quiz
- The last operation I wanted to talk to you about is remove() but it's a bit
  tricky so we're gonna go slowly
- The first strategy for removing a node out of a BST is kind of funny is not
  removing it.

---

# Binary search trees

---

## How to remove a node?

### Lazy node deletion

.lcol[
- Don't actually delete the node
- Keep the key to guide search, and mark node as "disabled"
    - Upon later insertion of the same key, node can easily be re-enabled
]

![:flush 3]
- Somewhat viable approach only if number of deletions is very small

---

### Actual node deletion

3 scenarios (also by order of complexity):
.lcol[
- Leaf node, i.e. node with no child
]
.rcol[
![:scale 70%](images/bst_remove_1a.svg)
]

--
![:flush]
.lcol[
- Internal node with one child
]
.rcol[
![:scale 70%](images/bst_remove_2a.svg)
]

--
![:flush]
.lcol[
- Internal node with two children
]
.rcol[
![:scale 70%](images/bst_remove_3a.svg)
]

---

### Leaf node

.rcol[
#### Algorithm
- Simply remove node to be deleted from the tree
![:scale 100%](images/bst_remove_1b.svg)
]

#### After
![:scale 100%](images/bst_remove_1c.svg)
]

---

### Internal node with one child

#### After
![:scale 80%](images/bst_remove_2c.svg)
]

---

### Internal node with two children

.rcol[
#### Algorithm
- Call *D* the node to be deleted
- Find min node in *D*'s right tree
    - i.e. *D*'s successor key value
- Copy min node's key into *D*
- Recursively delete min node
![:scale 100%](images/bst_remove_3b.svg)
]

--
.lcol[
![:newline 3]

#### After
![:scale 90%](images/bst_remove_3c.svg)
]

???

- Note that min node in right tree cannot have two children, when being deleted

---

### Implementation

```C
template <typename K>
void BST<K>::Remove(const K &key) {
  Remove(root, key);
}

template <typename K>
void BST<K>::Remove(std::unique_ptr<Node> &n, const K &key) {
  // Key not found
  if (!n) return;

if (key < n->key) {
    Remove(n->left, key);
  } else if (key > n->key) {
    Remove(n->right, key);
  } else {
    // Found node
    if (n->left && n->right) {
      // Two children: replace with min node in right subtree
      n->key = Min(n->right.get())->key;
      Remove(n->right, n->key);
    } else {
      // Replace with only child or with nullptr
      n = std::move((n->left) ? n->left : n->right);
    }
  }
}

```

![:filename](bst.h)

]
- Notice how smart pointers will do the garbage collecting for us

???

- Also notice why we needed an internal Min() function that returned a Node
  (unlike Max() which is implemented with no helper function)

---

## Complexity analysis

### The good

.rcol60[
- Running time of all operations is $O(d)$
	- Apart from traversal, which is necessarily $O(N)$
- Where $d$ is the depth of the accessed node
]

![:flush 1]

.lcol60[
But tree shape depends on order of insertion:
- At best, if tree is complete, its height is $O(\lg N)$
- On average, if keys are inserted in random order, the height is still
  proportional to $O(\lg N)$
]

---

### The bad

- Deletions (with the presented approach) tend to displace nodes from the right
  subtrees to the left subtrees
- Tree becomes unbalanced and depth tends towards $O(\sqrt{N})$

![:newline 2]

.lcol[
Shape after 100 pairs of random insertion/deletion:
![:scale 100%](images/bst_deletion_skew.svg)
- Tree has become left heavy
]

.rcol[
Some solutions:
- Avoid/forbid deletions!
- Alternate picking min node in right tree with max node in left tree when
  deleting nodes with two children *(but improvement not mathematically proven)*
![:scale 80%](images/bst_deletion_alternate.svg)
]

---

### The ugly

---

## Conclusion

Big disparities depending on shape of BST

.center.tiny[
| Search     | Insert     | Delete     | Find min/max |
|------------|------------|------------|--------------|
| $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$ | $O(\lg N)$   |
]
]

.center.tiny[
| Search | Insert | Delete | Find min/max |
|--------|--------|--------|--------------|
| $O(N)$ | $O(N)$ | $O(N)$ | $O(N)$       |
]
]

![:flush 3]

### Solution

- Make BST always balanced
    - Each node has about the same number of descendants in its left subtree as in
      its right subtree
    - Guarantee logarithmic performance for all operations
- Such binary search trees are called **self-balancing**
    - E.g. AVL trees, Red-Black trees, etc.