class: center, middle name: title # CS 10C - Sorting ![:newline] .Huge[*Prof. Ty Feng*] ![:newline] .Large[UC Riverside - WQ 2026] ![:newline 6]  .footnote[Copyright © 2026 Joël Porquet-Lupine and Ty Feng - [CC BY-NC-SA 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/)] --- layout: true # Introduction --- ## Sorting in the wild .lcol60[ - Sorting is a fundamental operation - User interface - Data organization - Also at the base of other operations... ] .rcol40[  ] ??? - Everything is sorted - Books at library - Passenger listing on plane - To some extent, food at grocery store -- .lcol60[ ![:newline 1] ## Sorting-based operations Once a collection of items is sorted: - **Searching** - Use binary search - **Element uniqueness** - Scan items for adjacent duplicates - **Frequency distribution** - Scan items and count duplicates - **Selection** - Find *kth* largest item at *kth* position - Etc. ] ??? - Could also mention "closest pair" - In a collection, find the two closest integers --- ## Definition - Sorting organizes the elements of a collection in a certain order - E.g., numerical order, lexicographical order - Elements are *comparable keys* ## History - Sorting algorithms have been a popular research topic since the beginning of computing, and for many decades - 50s: e.g., Merge sort, Bubble sort - 60s: e.g., Heapsort - 70s: e.g., Quicksort - Now somewhat considered a solved problem ??? - A couple more in the early 2000s, but nothing much - Might get some interest again once we get manycore processors, to exploit parallelism -- ![:newline 1] - Overabundance of possibilities... 
.footnotesize[ Bubble sort, Cocktail shaker sort, Odd–even sort, Comb sort, Gnome sort, Quicksort, Slowsort, Stooge sort, Bogosort, Selection sort, Heapsort, Smoothsort, Cartesian tree sort, Tournament sort, Cycle sort, Weak-heap sort, Insertion sort, Shellsort, Splaysort, Tree sort, Library sort, Patience sorting, Merge sort, Cascade merge sort, Oscillating merge sort, Polyphase merge sort, American flag sort, Bead sort, Bucket sort, Burstsort, Counting sort, Pigeonhole sort, Proxmap sort, Radix sort, Flashsort, etc. ] --- ## Classifications (1/2) ### Complexity Three main classes of complexities: - Naive algorithms (e.g., Selection sort, Insertion sort) - Average complexity of $O(n^2)$ - Relevant only for small arrays - Efficient algorithms (e.g., Heapsort, Merge sort, Quicksort) - Average complexity of $O(n\log n)$ - Default option in most cases - Non-comparison based algorithms (e.g., Bucket sort, Radix sort) - Leverage properties of the keys to be sorted - Not limited by $O(n\log n)$ bound -- ### Memory occupation - *In-place* sorting - Transform initial collection directly, without using extra space - Otherwise, extra space typically ranging from $O(\log n)$ to $O(n)$ --- ## Classifications (2/2) ### Stability - A sorting algorithm is stable if equal keys remain in the same relative order before and after the sort - Important if items have other comparable attributes - Equal keys might then not be exact duplicates - Enables chaining several levels of sorting on the same collection (using a first attribute, then another, etc.) 
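The stability guarantee can be observed directly with the C++ standard library: `std::stable_sort` preserves the relative order of equal keys, while `std::sort` makes no such promise. A minimal sketch (the `stable_by_rank` helper is illustrative, not part of the lecture code):

```C
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sort cards by rank only; the suit makes equal keys distinguishable
// (illustrative helper, not from the lecture)
std::vector<std::pair<int, std::string>>
stable_by_rank(std::vector<std::pair<int, std::string>> cards) {
    // std::stable_sort guarantees equal ranks keep their relative order
    std::stable_sort(cards.begin(), cards.end(),
                     [](const auto &x, const auto &y) { return x.first < y.first; });
    return cards;
}
```

With `{7, "heart"}, {9, "clubs"}, {3, "diamonds"}, {7, "spades"}` as input, the two 7s come out with `"heart"` still before `"spades"`; swapping in `std::sort` may or may not preserve that order.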
-- #### Example .lcol.footnotesize[ ![:newline 4] ```C { { 7, "heart" }, { 9, "clubs"}, { 3, "diamonds" }, { 7, "spades" } } ``` .center[Initial collection] ] .rcol.footnotesize[ ```C { { 3, "diamonds" }, { 7, "heart" }, { 7, "spades" }, { 9, "clubs"} } ``` .center[Result of stable sorting] ] .rcol.footnotesize[ ```C { { 3, "diamonds" }, { 7, "spades" }, { 7, "heart" }, { 9, "clubs"} } ``` .center[*Potential* result of unstable sorting] ] --- layout: true # Selection sort --- ## Principle - Assuming a collection `a` of $n$ items, requires $n$ phases (from $0$ to $n-1$) - At phase $i$, find index `min` of minimum item in the remaining items - Swap `a[i]` and `a[min]` -- ![:newline 1] ### Snapshot example .center.footnotesize[ | i | min | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:----:|:----:|:----:|:-------:|:----:|:----:| | 3 | 3 | .gray[D] | .gray[E] | .gray[E] | .red[I] | R | S | I | R | V | ] - Before phase $3$: - Range $0$ to $i-1$ (`a[0-2]`) is already sorted and in final order - At phase $3$: - Find minimum item in range $i$ to $n-1$ (`a[3-8]`) - Minimum item (`'I'`) is already at position $i$ (`a[3]`) - After phase $3$: - Range $0$ to $i$ (`a[0-3]`) is now sorted and in final order - Go to phase $4$... 
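A single phase of this find-and-swap process can also be sketched with the standard library: `std::min_element` locates the minimum of the remaining range and `std::iter_swap` performs the exchange. This is a sketch only (the `selection_phase` name is illustrative); the lecture's explicit-loop implementation follows.

```C
#include <algorithm>
#include <vector>

// Phase i of selection sort: put the minimum of a[i..n-1] at a[i]
// (illustrative sketch; the full sort simply runs this for every i)
void selection_phase(std::vector<char> &a, unsigned i) {
    auto min = std::min_element(a.begin() + i, a.end());
    std::iter_swap(a.begin() + i, min);
}
```

On `{'D','E','E','I','R','S','I','R','V'}` with `i = 3`, the minimum `'I'` of the remaining range is already at position 3, so the swap is a no-op, matching the snapshot above.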
--- ### Full example - In .red[red], minimum item at phase $i$ - In .gray[gray], items in sorted and final order ![:newline 1] .center.footnotesize[ | i | min | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:-------:| | | | R | I | V | E | R | S | I | D | E | | 0 | 7 | R | I | V | E | R | S | I | .red[D] | E | | 1 | 3 | .gray[D] | I | V | .red[E] | R | S | I | R | E | | 2 | 8 | .gray[D] | .gray[E] | V | I | R | S | I | R | .red[E] | | 3 | 3 | .gray[D] | .gray[E] | .gray[E] | .red[I] | R | S | I | R | V | | 4 | 6 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | R | S | .red[I] | R | V | | 5 | 6 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | S | .red[R] | R | V | | 6 | 7 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | .gray[R] | S | .red[R] | V | | 7 | 7 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | .gray[R] | .gray[R] | .red[S] | V | | 8 | 8 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | .gray[R] | .gray[R] | .gray[S] | .red[V] | | | | D | E | E | I | I | R | R | S | V | ] --- ## Code .footnotesize[ ```C // Generic selection sort function template
<typename T> void selection_sort(std::vector<T>
&a) { // N phases, from 0 to N-1 for (unsigned i = 0; i < a.size(); i++) { // Assume the minimum is the first element of the remaining collection unsigned min = i; // Look for a better minimum in remaining collection if any for (unsigned j = i + 1; j < a.size(); j++) if (a[j] < a[min]) min = j; // Swap minimum with current element if (min != i) std::swap(a[min], a[i]); } } ```  ] -- .lcol.footnotesize[ ![:newline 1] ```C // Vector of unordered integer numbers std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector selection_sort<int>
(vecint); ```  ] .rcol[ ![:newline 1] ### Execution .footnotesize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] ] --- ## Conclusion - One of the simplest sorting algorithms - Always $\Theta(n^2)$ time complexity - Even if collection is already sorted! - In-place .lcol[ - *Not* stable - Long distance exchanges might invert position of equal items ] .rcol.footnotesize[ | i | min | a[0] | a[1] | a[2] | |:-:|:---:|:--------:|:---------:|:--------:| | 0 | 2 | B1 | B2 | .red[A] | | 1 | 1 | .gray[A] | .red[B2] | B1 | | 2 | 2 | .gray[A] | .gray[B2] | .red[B1] | | | | A | B2 | B1 | ] ??? - Complexity does not depend on input data, contrary to most other sorting algorithms --- layout: true # Insertion sort --- ## Principle - Also $n$ phases (from $0$ to $n-1$) - At phase $i$, repeatedly swap item at `a[i]` with each larger item on its left -- ### Snapshot example .center.footnotesize[ | i | pos | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:--:|:---:|:-------:|:----:|:----:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | 2 | - | I | R | V | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 3 | 0 | .red[E] | I | R | V | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | ] - Before phase $3$ (i.e., after phase $2$, as shown above): - Range $0$ to $i-1$ (`a[0-2]`) is *locally* sorted - Range $i$ to $n-1$ (`a[3-8]`) has not been seen yet - At phase $3$: - Make item at position $i$ (`a[3] = 'E'`) find its position in range $0$ to $i$ by successive swapping - After phase $3$: - Range $0$ to $i$ (`a[0-3]`) is now *locally* sorted - Go to phase $4$... ??? 
- Very much like someone would sort a deck of cards --- ### Full example - In .gray[gray], items do not move - In .red[red], item at phase $i$ finding its proper position in the left range ![:newline 1] .center.footnotesize[ | i | pos | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | | | R | I | V | E | R | S | I | D | E | | 0 | 0 | .red[R] | .gray[I] | .gray[V] | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 1 | 0 | .red[I] | R | .gray[V] | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 2 | 2 | .gray[I] | .gray[R] | .red[V] | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 3 | 0 | .red[E] | I | R | V | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 4 | 3 | .gray[E] | .gray[I] | .gray[R] | .red[R] | V | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 5 | 4 | .gray[E] | .gray[I] | .gray[R] | .gray[R] | .red[S] | V | .gray[I] | .gray[D] | .gray[E] | | 6 | 2 | .gray[E] | .gray[I] | .red[I] | R | R | S | V | .gray[D] | .gray[E] | | 7 | 0 | .red[D] | E | I | I | R | R | S | V | .gray[E] | | 8 | 2 | .gray[D] | .gray[E] | .red[E] | I | I | R | R | S | V | | | | D | E | E | I | I | R | R | S | V | ] --- ## Code .footnotesize[ ```C // Generic insertion sort function template
<typename T> void insertion_sort(std::vector<T>
&a) { // N phases, from 0 to N-1 for (unsigned i = 0; i < a.size(); i++) { // Swap current item with already sorted items // until finding its correct position for (int j = i ; j > 0; j--) if (a[j] < a[j-1]) std::swap(a[j], a[j-1]); else break; } } ```  ] -- ![:newline 1] ### Execution .lcol.footnotesize[ ```C // Vector of unordered integer numbers std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector insertion_sort<int>
(vecint); ```  ] .rcol.footnotesize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] --- ## Conclusion - With selection sort, one of the simplest sorting algorithms - Popular on small arrays, where it often performs better than efficient sorting algorithms - Usually more efficient than selection sort - Best case of $O(n)$ if collection is already sorted - However, number of writes is $O(n^2)$ because of the pairwise swaps (versus $O(n)$ for selection sort) - In-place and stable - *Online* - Can sort collection as it receives it! ??? - In practice, faster than more advanced algos such as Quicksort for small arrays - In some environments, writes are expensive (some ROM memories that cannot be written many times because it causes physical degradation) - In that case, selection sort would be preferred --- template: title --- layout: false # Recap ## Classifications - Runtime complexity * Naive $O(n^2)$ (Selection sort, Insertion sort) * Efficient $O(n \lg n)$ (Heapsort, Merge sort, Quicksort) * Non-comparison based (Bucket sort, Radix sort) - Memory complexity * In-place vs additional space required - Stability * Equal items remain in same relative order ![:newline 1] ## Selection sort - At each phase $i$, populate `a[i]` with minimum item of remaining collection .scriptsize[ | i | min | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:----:|:----:|:----:|:-------:|:----:|:----:| | 3 | 3 | .gray[D] | .gray[E] | .gray[E] | .red[I] | R | S | I | R | V | ] - Characteristics: $\Theta(n^2)$, in-place, non-stable ## Insertion sort - At each phase $i$, repeatedly swap item at `a[i]` with each larger item on its left - Characteristics: $O(n^2)$, best case $O(n)$, in-place, stable --- layout: true # Heapsort --- ## Principle - Two-step algorithm: 1. Build heap out of collection of items ($O(n)$ using Floyd's linear approach) 1. 
Remove items in order ($n$ operations, $O(\log n)$ each) - Total time complexity of $O(n \log n)$ -- ## Naive implementation based on min-heap .ref[\*] .lcol60.footnotesize[ ```C // Generic heapsort function based on min-heap template
<typename T> void heapsort(std::vector<T>
&a) { // Make a vector copy for the heap std::vector
<T> a_heap(a); // Heapify BuildHeap(a_heap); // Extract min values in order for (unsigned i = 0; i < a.size(); i++) { a[i] = Top(a_heap); Pop(a_heap); } } ```  ] .footnote[ .ref[\*] FYI, about 50 lines of code to implement heap methods (based on binary heap from priority queue) ] -- .rcol40[ - Extra vector copy required ($O(n)$) - What very specific behavior of `Pop()` can we leverage to avoid it? ] --- ## In-place implementation based on max-heap .lcol60.footnotesize[ ```C // Generic heapsort function based on max-heap template
<typename T> void heapsort(std::vector<T>
&a) { // Heapify vector std::make_heap(a.begin(), a.end()); // Pop max values in order for (unsigned i = 0; i < a.size(); i++) std::pop_heap(a.begin(), a.end() - i); } ```  ] .rcol40[ - At each iteration, max value is moved to the end of the remaining heap - At the end, max values have been placed in descending order from the end of the vector ] -- .wide[ ![:newline 2] ] ## Conclusion - Popular sorting algorithm - E.g., default sort function in the Linux kernel - Very consistent sorting algorithm - $O(n \log n)$ in average and worst cases - In-place (if cleverly implemented) - *Not stable* --- layout: false # Towards faster sorting algorithms ## Simple sorting .center.scriptsize[ | Name | Best | Average | Worst | Memory | Stable | |----------------|-------|---------|-------|--------|--------| | Selection sort | $n^2$ | $n^2$ | $n^2$ | 1 | No | | Insertion sort | $n$ | $n^2$ | $n^2$ | 1 | Yes | ] - Need a different paradigm than repetitively and linearly scanning items... -- ![:newline 1] ## Divide-and-conquer algorithms .lcol[ 1. Divide the work into smaller chunks recursively - Make smaller pieces of the larger problem 2. Conquer the individual pieces 3. 
Combine the results back up recursively ] .rcol.footnotesize[ ```C DivideAndConquer (input) { if (input small enough to solve) conquer, solve, return results else divide input into smaller pieces DivideAndConquer(smaller piece #1) DivideAndConquer(smaller piece #2) combine both results and return } ``` ] -- .wide[ ![:newline 1] - Divide-and-conquer based sorting algorithms: - Merge sort - Quicksort ] --- layout: true # Merge sort --- ## Principle .lcol40[ ### Divide - If a sub-array has more than 1 item, recursively divide it into two halves ### Conquer - If a sub-array has 0 or 1 item, it is sorted ### Combine - Merge two adjacent sorted sub-arrays together - Recursively merge until whole array is reconstituted ] .rcol60[  ] --- ## Merging sorted halves ### Example - Merge two sorted halves `aux[lo]` to `aux[mid]`, and `aux[mid+1]` to `aux[hi]`, into a single sub-array `a[lo]` to `a[hi]`  ??? - Linear process --- ### Merging code .lcol66.footnotesize[ ```C template
<typename T> void merge_sort_merge(std::vector<T>
&a, std::vector<T>
&aux, int lo, int mid, int hi) { // Save both halves into auxiliary space for (int k = lo; k <= hi; k++) aux[k] = a[k]; // Merge the two halves: [lo,mid][mid+1,hi] int i = lo, j = mid + 1; for (int k = lo; k <= hi; k++) { // Left half has been entirely merged if (i > mid) // Just take what's remaining from right half a[k] = aux[j++]; // Right half has been entirely merged else if (j > hi) // Just take what's remaining from left half a[k] = aux[i++]; // Get smallest item from either left or right halves else if (aux[j] < aux[i]) a[k] = aux[j++]; else a[k] = aux[i++]; } } ```  ] .rcol33[ ![:newline 10]  ] --- ## Dividing into halves .footnotesize[ ```C template
<typename T> void merge_sort_recur(std::vector<T>
&a, std::vector<T>
&aux, int lo, int hi) { // Recursion stop condition: subarray is sorted if it contains 0 or 1 item if (hi <= lo) return; // Otherwise, further divide subarray into two halves int mid = lo + (hi - lo) / 2; // Recursively sort left half merge_sort_recur(a, aux, lo, mid); // Recursively sort right half merge_sort_recur(a, aux, mid + 1, hi); // Merge two halves merge_sort_merge(a, aux, lo, mid, hi); } ```  ] ![:newline 2]  --- ## Putting the pieces together .lcol60.footnotesize[ ```C template
<typename T> void merge_sort_merge(std::vector<T>
&a, std::vector<T>
&aux, int lo, int mid, int hi) { ... } template
<typename T> void merge_sort_recur(std::vector<T>
&a, std::vector<T>
&aux, int lo, int hi) { ... } // Generic merge sort function template
<typename T> void merge_sort(std::vector<T>
&a) { // Need auxiliary space of the same size std::vector
<T> aux(a.size()); merge_sort_recur(a, aux, 0, a.size() - 1); } ```  ] -- .rcol40[ ### Execution .footnotesize[ ```C std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector merge_sort<int>
(vecint); ```  ] .scriptsize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] ] --- ## Conclusion - Popular sorting algorithm - Competes directly with quicksort - Best choice in certain situations: e.g., sorting a linked list - Consistent sorting algorithm - $O(n \log n)$ in average and worst cases - Dividing phase ($O(n)$), conquering/merging phase ($O(n \log n)$) - Not in-place - Memory space occupation of $O(n)$ - Stable ??? - Usage - Used as default sorting algorithm in Perl since 5.8 - Used in Java as Arrays.sort() in some cases - Used in Linux kernel to sort linked list - Sorting linked list - Because it has good linear traversal properties (merging) compared to quicksort for example which needs random accesses - Complexity - When reconstructing, the merging phase is proportional to the level of the inverse tree - Last merge is 1 x N; previous merge phase is 2 x N; previous is 4 x N; and so on. So overall growth of $\lg N * N$ --- template: title --- layout: false # Recap ## Insertion sort .lcol[ - At phase $i$, repeatedly swap item at `a[i]` with each larger item on its left - Characteristics: $O(n^2)$, in-place, stable, and *online* ] .rcol.tiny[ | i | pos | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:--:|:---:|:-------:|:----:|:----:|:-------:|:--------:|:--------:|:--------:|:--------:|:--------:| | 3a | - | I | R | V | .red[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 3b | 0 | .red[E] | I | R | V | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | ] ![:flush 1] ## Heapsort .lcol60[ - Build max-heap and pop each item - Characteristics: $O(n\lg n)$, in-place, not stable ] .rcol40.tiny[ ```C++ std::make_heap(a.begin(), a.end()); for (unsigned i = 0; i < a.size(); i++) std::pop_heap(a.begin(), a.end() - i); ``` ] ![:flush 1] ## Mergesort .lcol70[ - Divide and conquer approach * Recursively divide array in two, until single items * Recombine adjacent pairs of sorted 
sub-arrays - Characteristics: $O(n\lg n)$, extra $O(n)$ memory space, stable ] .rcol30[  ] --- layout: true # Quicksort --- ## Principle .lcol[ ### Divide - If partition has 0 or 1 item, it is sorted - If partition has more than 1 item - Pick a pivot (e.g., first value) - Rearrange partition around pivot ] .rcol[  ] ??? - Not really a combining/merging phase in this algorithm -- .rcol[  ] .lcol[ ### Conquer - Pivot is in final position - Recursively sort right and left sub-partitions around pivot ] ??? - zoom on partitioning (first line and second line) --- ## Partitioning ### Phase 1: partition according to pivot .lcol[ - Pivot is `a[lo]` - Repeat until `i` and `j` pointers cross - Scan `i` from left to right while `a[i] < a[lo]` - Scan `j` from right to left while `a[j] > a[lo]` - Exchange `a[i]` with `a[j]` ] .rcol[  ] ??? - Maybe, mention stability here - Long swaps -- .wide[ ### Phase 2: place pivot in right position ] .lcol[ - When pointers cross - Exchange `a[lo]` with `a[j]` - Pivot is now in final position in sorted array ] .rcol[  ] ??? - Next code --- ### Partitioning code .lcol60.footnotesize[ ```C template
<typename T> int quicksort_partition(std::vector<T>
&a, int lo, int hi) { int i = lo, j = hi + 1; while (true) { // From left to right, find first item // larger than pivot while (a[++i] < a[lo]) // Prevents from going out of bounds if (i == hi) break; // From right to left, find first item // smaller than pivot while (a[lo] < a[--j]) // To keep code symmetry but unnecessary if (j == lo) break; // If pointers i and j have crossed, stop if (i >= j) break; // Otherwise swap items std::swap(a[i], a[j]); } // Put pivot in final position std::swap(a[lo], a[j]); // Return pivot position return j; } ```  ] .rcol40[  ] ??? - Partitioning is the most important operation - Next is the recursive sorting part --- ## Recursive sorting ### Code .lcol60.footnotesize[ ```C template
<typename T> void quicksort_recur(std::vector<T>
&a, int lo, int hi) { // Recursion stop condition: // Sub-array is sorted if it contains 0 or 1 item if (hi <= lo) return; // Partition array and get resulting pivot pos int p = quicksort_partition(a, lo, hi); // Recursively sort left sub-partition quicksort_recur(a, lo, p - 1); // Recursively sort right sub-partition quicksort_recur(a, p + 1, hi); } // Generic quicksort function template
<typename T> void quicksort(std::vector<T>
&a) { quicksort_recur(a, 0, a.size() - 1); } ```  ] .rcol40[ ![:newline 3]  ] -- .rcol40[ ### Execution .footnotesize[ ```C std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector quicksort<int>
(vecint); ```  ] .scriptsize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] ] --- ## Conclusion - One of the most popular sorting algorithms - Efficient but not guaranteed - On average $O(n \log n)$ - But worst case is $O(n^2)$ (rare in practice) - Choice of pivot is important for performance - Choosing leftmost item as pivot can cause worst-case behavior for already sorted arrays - Possible strategy is to actually shuffle the array before sorting it! - Memory occupation - $O(\log n)$ for the stack space - In-place but not stable - Possibility to make it stable if using auxiliary array ($O(n)$ extra space) ??? - algorithm of choice - Java for primitive types, C qsort(), Unix, MATLAB, etc. - Efficiency - Worst case happens when the pivot keeps being either the smallest or biggest item in the sublist to sort * Need to scan N items per partitioning * When recursing, one subarray is size 0, the other size N-1 - Choosing pivot - random position, middle index of partition - Shuffling can be done in linear time - Stability - extra space to append items when pivoting (instead of swapping which can change the order) --- layout: true # Bucket sort --- ## Principle - Distribute all the items of an array into buckets - Sort the buckets individually - Recombine buckets in order into original array -- ## Example .lcol[ - Array of numbers to be sorted ] .rcol[ `29, 25, 3, 49, 9, 37, 21, 43` ] -- .wide[ ![:newline 0] ] .lcol[ - Scatter into buckets ] .rcol[  ] -- .wide[ ![:newline 0] ] .lcol[ - Sort each bucket individually ] .rcol[  ] -- .wide[ ![:newline 0] ] .lcol[ - Gather sorted buckets in order ] .rcol[  ] --- ## Code .footnotesize[ ```C // Bucket sort for int between 00 and 99 void bucket_sort(std::vector
<int> &a) { // One bucket for each tens digit std::vector<std::vector<int>
> bckts(10); // Scatter items in buckets for (auto &i : a) { // Extract tens digit from item int bi = (i / 10) % 10; // Put item into corresponding bucket bckts[bi].push_back(i); } // Sort each bucket separately for (auto &b : bckts) insertion_sort(b); // Combine all buckets back together int k = 0; for (auto &b : bckts) for (auto &i : b) a[k++] = i; } ```  ] --- ## Limitations and conclusion - Can work well if - Items are positive integers - Belong to a well-defined (usually small) range - And are uniformly distributed in the range - Best-case scenario - If number of buckets is equal to the number of items - And items are uniformly distributed across the buckets - Then complexity of $O(n)$ - Worst-case scenario - All the items end up in a single bucket - Then complexity of bucket sorting - Up to $O(n^2)$ if using simple sorting algorithm --- layout: true # Radix sort --- ## Principle (*least significant digit radix sort*) - Starting with least significant digit, put each item in corresponding bucket - Repeat on next digit, until reaching most significant digit ## Example .small[ - Unordered list of numbers between 0-999: `170, 45, 75, 90, 12, 802, 2, 66` ] -- .lcol40.small[ - (1) Put each item in bucket corresponding to their unit digit ] .rcol60.scriptsize[ | 0s | 1s | 2s | 3s | 4s | 5s | 6s | 7s | 8s | 9s | |---------|----|------------|----|----|--------|----|----|----|----| | 170, 90 | | 12, 802, 2 | | | 45, 75 | 66 | | | | ] -- .wide[ ![:newline 0] ] .lcol40.small[ .footnotesize[ `170, 90, 12, 802, 2, 45, 75, 66` ] - (2) Repeat on tens digit ] .rcol60.scriptsize[ | 0s | 1s | 2s | 3s | 4s | 5s | 6s | 7s | 8s | 9s | |---------|----|----|----|----|----|----|---------|----|----| | 802, 02 | 12 | | | 45 | | 66 | 170, 75 | | 90 | ] -- .wide[ ![:newline 0] ] .lcol40.small[ .footnotesize[ `802, 2, 12, 45, 66, 170, 75, 90` ] - (3) Repeat on hundreds digit ] .rcol60.scriptsize[ | 0s | 1s | 2s | 3s | 4s | 5s | 6s | 7s | 8s | 9s | 
|------------------------------|-----|----|----|----|----|----|----|-----|----| | 002, 012, 045, 066, 075, 090 | 170 | | | | | | | 802 | | ] --- layout: false # Conclusion ## Hybrid sorting For example, *introsort*: - Starts with quicksort - Resorts to heapsort when recursion becomes too deep - Uses simple insertion sort for small partitions ??? - Introsort - Quicksort, because the fastest on average - When recursion becomes too bad (because size of partition to sort is too large), switch to heapsort (slower than quicksort on avg, but better guarantee of runtime time and memory occupation) - On small partitions, just use insertion sort - Most sorting algos used in standard libraries are now hybrid -- ## New sorting algorithms - Timsort (2002) - Used by Python, Java, Swift, Rust - Leverages the fact that most collections usually have consecutive ordered items already - Recently replaced by optimized version Powersort (2022) - Library sort (2006) - Variant of insertion sort that leaves gaps to accelerate subsequent insertions of new items ??? - New sorting algos - Not very often that they are being invented, only 2 in the last 20 years - Library sort - Comes from the analogy of a librarian who would leave gaps between sections of books for later insertions. -- ## Standard language libraries .lcol40.small[ - C++: `std::sort()` - C: `qsort()` ] .rcol60.small[ - Python: `list.sort()` - Java: `java.util.Collections.sort()` ] ??? - Language libraries usually offer efficient sorting functions as part of their API
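Returning to radix sort, the digit-by-digit passes shown in the tables earlier can be sketched in the same style as the bucket sort code. A minimal sketch for non-negative base-10 integers (`lsd_radix_sort` is an illustrative name, not from the lecture):

```C
#include <vector>

// LSD radix sort for non-negative integers with at most max_digits digits
// (illustrative sketch, base 10)
void lsd_radix_sort(std::vector<int> &a, int max_digits) {
    int div = 1;
    for (int d = 0; d < max_digits; d++, div *= 10) {
        // One bucket per digit value; scattering preserves relative order
        std::vector<std::vector<int>> bckts(10);
        for (int x : a)
            bckts[(x / div) % 10].push_back(x);
        // Gather buckets back in order before the next pass
        int k = 0;
        for (auto &b : bckts)
            for (int x : b)
                a[k++] = x;
    }
}
```

On the example list `170, 45, 75, 90, 12, 802, 2, 66`, three passes (units, tens, hundreds) produce `2, 12, 45, 66, 75, 90, 170, 802`, reproducing the tables above; the stability of each pass is what makes the method correct.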