class: center, middle name: title # CS 10C - Sorting ![:newline] .Huge[*Prof. Ty Feng*] ![:newline] .Large[UC Riverside - WQ 2026] ![:newline 6]  .footnote[Copyright © 2026 Joël Porquet-Lupine and Ty Feng - [CC BY-NC-SA 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/)] --- layout: true # Introduction --- ## Sorting in the wild .lcol60[ - Sorting is a fundamental operation - User interface - Data organization - Also at the base of other operations... ] .rcol40[  ] ??? - Everything is sorted - Books at library - Passenger listing on plane - To some extent, food at grocery store -- .lcol60[ ![:newline 1] ## Sorting-based operations Once a collection of items is sorted: - **Searching** - Use binary search - **Element uniqueness** - Scan items for adjacent duplicates - **Frequency distribution** - Scan items and count duplicates - **Selection** - Find *kth* largest item at *kth* position - Etc. ] ??? - Could also mention "closest pair" - In a collection, find the two closest integers --- ## Definition - Sorting organizes the elements of a collection in a certain order - E.g., numerical order, lexicographical order - Elements are *comparable keys* ## History - Sorting algorithms have been a popular research topic since the beginning of computing, and for many decades - 50s: e.g., Merge sort, Bubble sort - 60s: e.g., Heapsort - 70s: e.g., Quicksort - Now somewhat considered a solved problem ??? - A couple more in the early 2000s, but nothing much - Might get some interest again once we get manycore processors, to exploit parallelism -- ![:newline 1] - Overabundance of possibilities... 
.footnotesize[ Bubble sort, Cocktail shaker sort, Odd–even sort, Comb sort, Gnome sort, Quicksort, Slowsort, Stooge sort, Bogosort, Selection sort, Heapsort, Smoothsort, Cartesian tree sort, Tournament sort, Cycle sort, Weak-heap sort, Insertion sort, Shellsort, Splaysort, Tree sort, Library sort, Patience sorting, Merge sort, Cascade merge sort, Oscillating merge sort, Polyphase merge sort, American flag sort, Bead sort, Bucket sort, Burstsort, Counting sort, Pigeonhole sort, Proxmap sort, Radix sort, Flashsort, etc. ] --- ## Classifications (1/2) ### Complexity Three main classes of complexities: - Naive algorithms (e.g., Selection sort, Insertion sort) - Average complexity of $O(n^2)$ - Relevant only for small arrays - Efficient algorithms (e.g., Heapsort, Merge sort, Quicksort) - Average complexity of $O(n\log n)$ - Default option in most cases - Non-comparison based algorithms (e.g., Bucket sort, Radix sort) - Leverage properties of the keys to be sorted - Not limited by $O(n\log n)$ bound -- ### Memory occupation - *In-place* sorting - Transform initial collection directly, without using extra space - Otherwise, extra space typically ranging from $O(\log n)$ to $O(n)$ --- ## Classifications (2/2) ### Stability - A sorting algorithm is stable if equal keys remain in the same relative order before and after the sort - Important if items have other comparable attributes - Equal keys might then not be exact duplicates - Enables chaining several levels of sorting on the same collection (using a first attribute, then another, etc.) 
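The stability guarantee can be observed directly with the C++ standard library: `std::stable_sort` preserves the relative order of equal keys, while `std::sort` makes no such promise. A minimal sketch (the `stable_by_rank` helper is illustrative, not part of the lecture code):

```C
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sort cards by rank only; the suit makes equal keys distinguishable
// (illustrative helper, not from the lecture)
std::vector<std::pair<int, std::string>>
stable_by_rank(std::vector<std::pair<int, std::string>> cards) {
    // std::stable_sort guarantees equal ranks keep their relative order
    std::stable_sort(cards.begin(), cards.end(),
                     [](const auto &x, const auto &y) { return x.first < y.first; });
    return cards;
}
```

With `{7, "heart"}, {9, "clubs"}, {3, "diamonds"}, {7, "spades"}` as input, the two 7s come out with `"heart"` still before `"spades"`; swapping in `std::sort` may or may not preserve that order.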
-- #### Example .lcol.footnotesize[ ![:newline 4] ```C { { 7, "heart" }, { 9, "clubs"}, { 3, "diamonds" }, { 7, "spades" } } ``` .center[Initial collection] ] .rcol.footnotesize[ ```C { { 3, "diamonds" }, { 7, "heart" }, { 7, "spades" }, { 9, "clubs"} } ``` .center[Result of stable sorting] ] .rcol.footnotesize[ ```C { { 3, "diamonds" }, { 7, "spades" }, { 7, "heart" }, { 9, "clubs"} } ``` .center[*Potential* result of unstable sorting] ] --- layout: true # Selection sort --- ## Principle - Assuming a collection `a` of $n$ items, requires $n$ phases (from $0$ to $n-1$) - At phase $i$, find index `min` of minimum item in the remaining items - Swap `a[i]` and `a[min]` -- ![:newline 1] ### Snapshot example .center.footnotesize[ | i | min | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:----:|:----:|:----:|:-------:|:----:|:----:| | 3 | 3 | .gray[D] | .gray[E] | .gray[E] | .red[I] | R | S | I | R | V | ] - Before phase $3$: - Range $0$ to $i-1$ (`a[0-2]`) is already sorted and in final order - At phase $3$: - Find minimum item in range $i$ to $n-1$ (`a[3-8]`) - Minimum item (`'I'`) is already at position $i$ (`a[3]`) - After phase $3$: - Range $0$ to $i$ (`a[0-3]`) is now sorted and in final order - Go to phase $4$... 
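A single phase of this find-and-swap process can also be sketched with the standard library: `std::min_element` locates the minimum of the remaining range and `std::iter_swap` performs the exchange. This is a sketch only (the `selection_phase` name is illustrative); the lecture's explicit-loop implementation follows.

```C
#include <algorithm>
#include <vector>

// Phase i of selection sort: put the minimum of a[i..n-1] at a[i]
// (illustrative sketch; the full sort simply runs this for every i)
void selection_phase(std::vector<char> &a, unsigned i) {
    auto min = std::min_element(a.begin() + i, a.end());
    std::iter_swap(a.begin() + i, min);
}
```

On `{'D','E','E','I','R','S','I','R','V'}` with `i = 3`, the minimum `'I'` of the remaining range is already at position 3, so the swap is a no-op, matching the snapshot above.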
--- ### Full example - In .red[red], minimum item at phase $i$ - In .gray[gray], items in sorted and final order ![:newline 1] .center.footnotesize[ | i | min | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:-------:| | | | R | I | V | E | R | S | I | D | E | | 0 | 7 | R | I | V | E | R | S | I | .red[D] | E | | 1 | 3 | .gray[D] | I | V | .red[E] | R | S | I | R | E | | 2 | 8 | .gray[D] | .gray[E] | V | I | R | S | I | R | .red[E] | | 3 | 3 | .gray[D] | .gray[E] | .gray[E] | .red[I] | R | S | I | R | V | | 4 | 6 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | R | S | .red[I] | R | V | | 5 | 6 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | S | .red[R] | R | V | | 6 | 7 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | .gray[R] | S | .red[R] | V | | 7 | 7 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | .gray[R] | .gray[R] | .red[S] | V | | 8 | 8 | .gray[D] | .gray[E] | .gray[E] | .gray[I] | .gray[I] | .gray[R] | .gray[R] | .gray[S] | .red[V] | | | | D | E | E | I | I | R | R | S | V | ] --- ## Code .footnotesize[ ```C // Generic selection sort function template
<typename T> void selection_sort(std::vector<T>
&a) { // N phases, from 0 to N-1 for (unsigned i = 0; i < a.size(); i++) { // Assume the minimum is the first element of the remaining collection unsigned min = i; // Look for a better minimum in remaining collection if any for (unsigned j = i + 1; j < a.size(); j++) if (a[j] < a[min]) min = j; // Swap minimum with current element if (min != i) std::swap(a[min], a[i]); } } ```  ] -- .lcol.footnotesize[ ![:newline 1] ```C // Vector of unordered integer numbers std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector selection_sort<int>
(vecint); ```  ] .rcol[ ![:newline 1] ### Execution .footnotesize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] ] --- ## Conclusion - One of the simplest sorting algorithms - Always $\Theta(n^2)$ time complexity - Even if collection is already sorted! - In-place .lcol[ - *Not* stable - Long distance exchanges might invert position of equal items ] .rcol.footnotesize[ | i | min | a[0] | a[1] | a[2] | |:-:|:---:|:--------:|:---------:|:--------:| | 0 | 2 | B1 | B2 | .red[A] | | 1 | 1 | .gray[A] | .red[B2] | B1 | | 2 | 2 | .gray[A] | .gray[B2] | .red[B1] | | | | A | B2 | B1 | ] ??? - Complexity does not depend on input data, contrary to most other sorting algorithms --- layout: true # Insertion sort --- ## Principle - Also $n$ phases (from $0$ to $n-1$) - At phase $i$, repeatedly swap item at `a[i]` with each larger item on its left -- ### Snapshot example .center.footnotesize[ | i | pos | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:--:|:---:|:-------:|:----:|:----:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | 2 | - | I | R | V | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 3 | 0 | .red[E] | I | R | V | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | ] - Before phase $3$ (i.e., after phase $2$, as shown above): - Range $0$ to $i-1$ (`a[0-2]`) is *locally* sorted - Range $i$ to $n-1$ (`a[3-8]`) has not been seen yet - At phase $3$: - Make item at position $i$ (`a[3] = 'E'`) find its position in range $0$ to $i$ by successive swapping - After phase $3$: - Range $0$ to $i$ (`a[0-3]`) is now *locally* sorted - Go to phase $4$... ??? 
- Very much like someone would sort a deck of cards --- ### Full example - In .gray[gray], items do not move - In .red[red], item at phase $i$ finding its proper position in the left range ![:newline 1] .center.footnotesize[ | i | pos | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:| | | | R | I | V | E | R | S | I | D | E | | 0 | 0 | .red[R] | .gray[I] | .gray[V] | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 1 | 0 | .red[I] | R | .gray[V] | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 2 | 2 | .gray[I] | .gray[R] | .red[V] | .gray[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 3 | 0 | .red[E] | I | R | V | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 4 | 3 | .gray[E] | .gray[I] | .gray[R] | .red[R] | V | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 5 | 4 | .gray[E] | .gray[I] | .gray[R] | .gray[R] | .red[S] | V | .gray[I] | .gray[D] | .gray[E] | | 6 | 2 | .gray[E] | .gray[I] | .red[I] | R | R | S | V | .gray[D] | .gray[E] | | 7 | 0 | .red[D] | E | I | I | R | R | S | V | .gray[E] | | 8 | 2 | .gray[D] | .gray[E] | .red[E] | I | I | R | R | S | V | | | | D | E | E | I | I | R | R | S | V | ] --- ## Code .footnotesize[ ```C // Generic insertion sort function template
<typename T> void insertion_sort(std::vector<T>
&a) { // N phases, from 0 to N-1 for (unsigned i = 0; i < a.size(); i++) { // Swap current item with already sorted items // until finding its correct position for (int j = i ; j > 0; j--) if (a[j] < a[j-1]) std::swap(a[j], a[j-1]); else break; } } ```  ] -- ![:newline 1] ### Execution .lcol.footnotesize[ ```C // Vector of unordered integer numbers std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector insertion_sort<int>
(vecint); ```  ] .rcol.footnotesize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] --- ## Conclusion - With selection sort, one of the simplest sorting algorithms - Popular on small arrays, where it often performs better than efficient sorting algorithms - Usually more efficient than selection sort - Best case of $O(n)$ if collection is already sorted - However, number of writes is $O(n^2)$ because of the pairwise swaps (versus $O(n)$ for selection sort) - In-place and stable - *Online* - Can sort collection as it receives it! ??? - In practice, faster than more advanced algos such as Quicksort for small arrays - In some environments, writes are expensive (some ROM memories that cannot be written many times because it causes physical degradation) - In that case, selection sort would be preferred --- template: title --- layout: false # Recap ## Classifications - Runtime complexity * Naive $O(n^2)$ (Selection sort, Insertion sort) * Efficient $O(n \lg n)$ (Heapsort, Merge sort, Quicksort) * Non-comparison based (Bucket sort, Radix sort) - Memory complexity * In-place vs additional space required - Stability * Equal items remain in same relative order ![:newline 1] ## Selection sort - At each phase $i$, populate `a[i]` with minimum item of remaining collection .scriptsize[ | i | min | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:-:|:---:|:--------:|:--------:|:--------:|:----:|:----:|:----:|:-------:|:----:|:----:| | 3 | 3 | .gray[D] | .gray[E] | .gray[E] | .red[I] | R | S | I | R | V | ] - Characteristics: $\Theta(n^2)$, in-place, non-stable ## Insertion sort - At each phase $i$, repeatedly swap item at `a[i]` with each larger item on its left - Characteristics: $O(n^2)$, best case $O(n)$, in-place, stable --- layout: true # Heapsort --- ## Principle - Two-step algorithm: 1. Build heap out of collection of items ($O(n)$ using Floyd's linear approach) 1. 
Remove items in order ($n$ operations, $O(\log n)$ each) - Total time complexity of $O(n \log n)$ -- ## Naive implementation based on min-heap .ref[\*] .lcol60.footnotesize[ ```C // Generic heapsort function based on min-heap template
<typename T> void heapsort(std::vector<T>
&a) { // Make a vector copy for the heap std::vector
<T> a_heap(a); // Heapify BuildHeap(a_heap); // Extract min values in order for (unsigned i = 0; i < a.size(); i++) { a[i] = Top(a_heap); Pop(a_heap); } } ```  ] .footnote[ .ref[\*] FYI, about 50 lines of code to implement heap methods (based on binary heap from priority queue) ] -- .rcol40[ - Extra vector copy required ($O(n)$) - What very specific behavior of `Pop()` can we leverage to avoid it? ] --- ## In-place implementation based on max-heap .lcol60.footnotesize[ ```C // Generic heapsort function based on max-heap template
<typename T> void heapsort(std::vector<T>
&a) { // Heapify vector std::make_heap(a.begin(), a.end()); // Pop max values in order for (unsigned i = 0; i < a.size(); i++) std::pop_heap(a.begin(), a.end() - i); } ```  ] .rcol40[ - At each iteration, max value is moved to the end of the remaining heap - At the end, max values have been placed in descending order from the end of the vector ] -- .wide[ ![:newline 2] ] ## Conclusion - Popular sorting algorithm - E.g., default sort function in the Linux kernel - Very consistent sorting algorithm - $O(n \log n)$ in average and worst cases - In-place (if cleverly implemented) - *Not stable* --- layout: false # Towards faster sorting algorithms ## Simple sorting .center.scriptsize[ | Name | Best | Average | Worst | Memory | Stable | |----------------|-------|---------|-------|--------|--------| | Selection sort | $n^2$ | $n^2$ | $n^2$ | 1 | No | | Insertion sort | $n$ | $n^2$ | $n^2$ | 1 | Yes | ] - Need a different paradigm than repetitively and linearly scanning items... -- ![:newline 1] ## Divide-and-conquer algorithms .lcol[ 1. Divide the work into smaller chunks recursively - Make smaller pieces of the larger problem 2. Conquer the individual pieces 3. 
Combine the results back up recursively ] .rcol.footnotesize[ ```C DivideAndConquer (input) { if (input small enough to solve) conquer, solve, return results else divide input into smaller pieces DivideAndConquer(smaller piece #1) DivideAndConquer(smaller piece #2) combine both results and return } ``` ] -- .wide[ ![:newline 1] - Divide-and-conquer based sorting algorithms: - Merge sort - Quicksort ] --- layout: true # Merge sort --- ## Principle .lcol40[ ### Divide - If a sub-array has more than 1 item, recursively divide it into two halves ### Conquer - If a sub-array has 0 or 1 item, it is sorted ### Combine - Merge two adjacent sorted sub-arrays together - Recursively merge until whole array is reconstituted ] .rcol60[  ] --- ## Merging sorted halves ### Example - Merge two sorted halves `aux[lo]` to `aux[mid]`, and `aux[mid+1]` to `aux[hi]`, into a single sub-array `a[lo]` to `a[hi]`  ??? - Linear process --- ### Merging code .lcol66.footnotesize[ ```C template
<typename T> void merge_sort_merge(std::vector<T>
&a, std::vector<T>
&aux, int lo, int mid, int hi) { // Save both halves into auxiliary space for (int k = lo; k <= hi; k++) aux[k] = a[k]; // Merge the two halves: [lo,mid][mid+1,hi] int i = lo, j = mid + 1; for (int k = lo; k <= hi; k++) { // Left half has been entirely merged if (i > mid) // Just take what's remaining from right half a[k] = aux[j++]; // Right half has been entirely merged else if (j > hi) // Just take what's remaining from left half a[k] = aux[i++]; // Get smallest item from either left or right halves else if (aux[j] < aux[i]) a[k] = aux[j++]; else a[k] = aux[i++]; } } ```  ] .rcol33[ ![:newline 10]  ] --- ## Dividing into halves .footnotesize[ ```C template
<typename T> void merge_sort_recur(std::vector<T>
&a, std::vector<T>
&aux, int lo, int hi) { // Recursion stop condition: subarray is sorted if it contains 0 or 1 item if (hi <= lo) return; // Otherwise, further divide subarray into two halves int mid = lo + (hi - lo) / 2; // Recursively sort left half merge_sort_recur(a, aux, lo, mid); // Recursively sort right half merge_sort_recur(a, aux, mid + 1, hi); // Merge two halves merge_sort_merge(a, aux, lo, mid, hi); } ```  ] ![:newline 2]  --- ## Putting the pieces together .lcol60.footnotesize[ ```C template
<typename T> void merge_sort_merge(std::vector<T>
&a, std::vector<T>
&aux, int lo, int mid, int hi) { ... } template
<typename T> void merge_sort_recur(std::vector<T>
&a, std::vector<T>
&aux, int lo, int hi) { ... } // Generic merge sort function template
<typename T> void merge_sort(std::vector<T>
&a) { // Need auxiliary space of the same size std::vector
<T> aux(a.size()); merge_sort_recur(a, aux, 0, a.size() - 1); } ```  ] -- .rcol40[ ### Execution .footnotesize[ ```C std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector merge_sort<int>
(vecint); ```  ] .scriptsize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] ] --- ## Conclusion - Popular sorting algorithm - Competes directly with quicksort - Best choice in certain situations: e.g., sorting a linked list - Consistent sorting algorithm - $O(n \log n)$ in average and worst cases - Dividing phase ($O(n)$), conquering/merging phase ($O(n \log n)$) - Not in-place - Memory space occupation of $O(n)$ - Stable ??? - Usage - Used as default sorting algorithm in Perl since 5.8 - Used in Java as Arrays.sort() in some cases - Used in Linux kernel to sort linked list - Sorting linked list - Because it has good linear traversal properties (merging) compared to quicksort for example which needs random accesses - Complexity - When reconstructing, the merging phase is proportional to the level of the inverse tree - Last merge is 1 x N; previous merge phase is 2 x N; previous is 4 x N; and so on. So overall growth of $\lg N * N$ --- template: title --- layout: false # Recap ## Insertion sort .lcol[ - At phase $i$, repeatedly swap item at `a[i]` with each larger item on its left - Characteristics: $O(n^2)$, in-place, stable, and *online* ] .rcol.tiny[ | i | pos | a[0] | a[1] | a[2] | a[3] | a[4] | a[5] | a[6] | a[7] | a[8] | |:--:|:---:|:-------:|:----:|:----:|:-------:|:--------:|:--------:|:--------:|:--------:|:--------:| | 3a | - | I | R | V | .red[E] | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | | 3b | 0 | .red[E] | I | R | V | .gray[R] | .gray[S] | .gray[I] | .gray[D] | .gray[E] | ] ![:flush 1] ## Heapsort .lcol60[ - Build max-heap and pop each item - Characteristics: $O(n\lg n)$, in-place, not stable ] .rcol40.tiny[ ```C++ std::make_heap(a.begin(), a.end()); for (unsigned i = 0; i < a.size(); i++) std::pop_heap(a.begin(), a.end() - i); ``` ] ![:flush 1] ## Mergesort .lcol70[ - Divide and conquer approach * Recursively divide array in two, until single items * Recombine adjacent pairs of sorted 
sub-arrays - Characteristics: $O(n\lg n)$, extra $O(n)$ memory space, stable ] .rcol30[  ] --- layout: true # Quicksort --- ## Principle .lcol[ ### Divide - If partition has 0 or 1 item, it is sorted - If partition has more than 1 item - Pick a pivot (e.g., first value) - Rearrange partition around pivot ] .rcol[  ] ??? - Not really a combining/merging phase in this algorithm -- .rcol[  ] .lcol[ ### Conquer - Pivot is in final position - Recursively sort right and left sub-partitions around pivot ] ??? - zoom on partitioning (first line and second line) --- ## Partitioning ### Phase 1: partition according to pivot .lcol[ - Pivot is `a[lo]` - Repeat until `i` and `j` pointers cross - Scan `i` from left to right while `a[i] < a[lo]` - Scan `j` from right to left while `a[j] > a[lo]` - Exchange `a[i]` with `a[j]` ] .rcol[  ] ??? - Maybe, mention stability here - Long swaps -- .wide[ ### Phase 2: place pivot in right position ] .lcol[ - When pointers cross - Exchange `a[lo]` with `a[j]` - Pivot is now in final position in sorted array ] .rcol[  ] ??? - Next code --- ### Partitioning code .lcol60.footnotesize[ ```C template
<typename T> int quicksort_partition(std::vector<T>
&a, int lo, int hi) { int i = lo, j = hi + 1; while (true) { // From left to right, find first item // larger than pivot while (a[++i] < a[lo]) // Prevents from going out of bounds if (i == hi) break; // From right to left, find first item // smaller than pivot while (a[lo] < a[--j]) // To keep code symmetry but unnecessary if (j == lo) break; // If pointers i and j have crossed, stop if (i >= j) break; // Otherwise swap items std::swap(a[i], a[j]); } // Put pivot in final position std::swap(a[lo], a[j]); // Return pivot position return j; } ```  ] .rcol40[  ] ??? - Partitioning is the most important operation - Next is the recursive sorting part --- ## Recursive sorting ### Code .lcol60.footnotesize[ ```C template
<typename T> void quicksort_recur(std::vector<T>
&a, int lo, int hi) { // Recursion stop condition: // Sub-array is sorted if it contains 0 or 1 item if (hi <= lo) return; // Partition array and get resulting pivot pos int p = quicksort_partition(a, lo, hi); // Recursively sort left sub-partition quicksort_recur(a, lo, p - 1); // Recursively sort right sub-partition quicksort_recur(a, p + 1, hi); } // Generic quicksort function template
<typename T> void quicksort(std::vector<T>
&a) { quicksort_recur(a, 0, a.size() - 1); } ```  ] .rcol40[ ![:newline 3]  ] -- .rcol40[ ### Execution .footnotesize[ ```C std::vector
<int> vecint = { 74, 98, 83, 52, 66, 64, 75, 8, 1, 69 }; ... // Sort vector quicksort<int>
(vecint); ```  ] .scriptsize[ ```terminal Before : 74 98 83 52 66 64 75 8 1 69 After : 1 8 52 64 66 69 74 75 83 98 ``` ] ] --- ## Conclusion - One of the most popular sorting algorithms - Efficient but not guaranteed - On average $O(n \log n)$ - But worst case is $O(n^2)$ (rare in practice) - Choice of pivot is important for performance - Choosing leftmost item as pivot can cause worst-case behavior for already sorted arrays - Possible strategy is to actually shuffle the array before sorting it! - Memory occupation - $O(\log n)$ for the stack space - In-place but not stable - Possibility to make it stable if using auxiliary array ($O(n)$ extra space) ??? - algorithm of choice - Java for primitive types, C qsort(), Unix, MATLAB, etc. - Efficiency - Worst case happens when the pivot keeps being either the smallest or biggest item in the sublist to sort * Need to scan N items per partitioning * When recursing, one subarray is size 0, the other size N-1 - Choosing pivot - random position, middle index of partition - Shuffling can be done in linear time - Stability - extra space to append items when pivoting (instead of swapping which can change the order) --- layout: true # Bucket sort --- ## Principle - Distribute all the items of an array into buckets - Sort the buckets individually - Recombine buckets in order into original array -- ## Example .lcol[ - Array of numbers to be sorted ] .rcol[ `29, 25, 3, 49, 9, 37, 21, 43` ] -- .wide[ ![:newline 0] ] .lcol[ - Scatter into buckets ] .rcol[  ] -- .wide[ ![:newline 0] ] .lcol[ - Sort each bucket individually ] .rcol[  ] -- .wide[ ![:newline 0] ] .lcol[ - Gather sorted buckets in order ] .rcol[  ] --- ## Code .footnotesize[ ```C // Bucket sort for int between 00 and 99 void bucket_sort(std::vector
<int> &a) { // One bucket for each tens digit std::vector<std::vector<int>
> bckts(10); // Scatter items in buckets for (auto &i : a) { // Extract tens digit from item int bi = (i / 10) % 10; // Put item into corresponding bucket bckts[bi].push_back(i); } // Sort each bucket separately for (auto &b : bckts) insertion_sort(b); // Combine all buckets back together int k = 0; for (auto &b : bckts) for (auto &i : b) a[k++] = i; } ```  ] --- ## Limitations and conclusion - Can work well if - Items are positive integers - Belong to a well-defined (usually small) range - And are uniformly distributed in the range - Best-case scenario - If number of buckets is equal to the number of items - And items are uniformly distributed across the buckets - Then complexity of $O(n)$ - Worst-case scenario - All the items end up in a single bucket - Then complexity of bucket sorting - Up to $O(n^2)$ if using simple sorting algorithm --- layout: true # Radix sort --- ## Principle (*least significant digit radix sort*) - Starting with least significant digit, put each item in corresponding bucket - Repeat on next digit, until reaching most significant digit ## Example .small[ - Unordered list of numbers between 0-999: `170, 45, 75, 90, 12, 802, 2, 66` ] -- .lcol40.small[ - (1) Put each item in bucket corresponding to their unit digit ] .rcol60.scriptsize[ | 0s | 1s | 2s | 3s | 4s | 5s | 6s | 7s | 8s | 9s | |---------|----|------------|----|----|--------|----|----|----|----| | 170, 90 | | 12, 802, 2 | | | 45, 75 | 66 | | | | ] -- .wide[ ![:newline 0] ] .lcol40.small[ .footnotesize[ `170, 90, 12, 802, 2, 45, 75, 66` ] - (2) Repeat on tens digit ] .rcol60.scriptsize[ | 0s | 1s | 2s | 3s | 4s | 5s | 6s | 7s | 8s | 9s | |---------|----|----|----|----|----|----|---------|----|----| | 802, 02 | 12 | | | 45 | | 66 | 170, 75 | | 90 | ] -- .wide[ ![:newline 0] ] .lcol40.small[ .footnotesize[ `802, 2, 12, 45, 66, 170, 75, 90` ] - (3) Repeat on hundreds digit ] .rcol60.scriptsize[ | 0s | 1s | 2s | 3s | 4s | 5s | 6s | 7s | 8s | 9s | 
|------------------------------|-----|----|----|----|----|----|----|-----|----| | 002, 012, 045, 066, 075, 090 | 170 | | | | | | | 802 | | ] --- layout: false # Conclusion ## Hybrid sorting For example, *introsort*: - Starts with quicksort - Resorts to heapsort when recursion becomes too deep - Uses simple insertion sort for small partitions ??? - Introsort - Quicksort, because the fastest on average - When recursion becomes too bad (because size of partition to sort is too large), switch to heapsort (slower than quicksort on avg, but better guarantee of runtime time and memory occupation) - On small partitions, just use insertion sort - Most sorting algos used in standard libraries are now hybrid -- ## New sorting algorithms - Timsort (2002) - Used by Python, Java, Swift, Rust - Leverages the fact that most collections usually have consecutive ordered items already - Recently replaced by optimized version Powersort (2022) - Library sort (2006) - Variant of insertion sort that leaves gaps to accelerate subsequent insertions of new items ??? - New sorting algos - Not very often that they are being invented, only 2 in the last 20 years - Library sort - Comes from the analogy of a librarian who would leave gaps between sections of books for later insertions. -- ## Standard language libraries .lcol40.small[ - C++: `std::sort()` - C: `qsort()` ] .rcol60.small[ - Python: `list.sort()` - Java: `java.util.Collections.sort()` ] ??? - Language libraries usually offer efficient sorting functions as part of their API
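Returning to radix sort, the digit-by-digit passes shown in the tables earlier can be sketched in the same style as the bucket sort code. A minimal sketch for non-negative base-10 integers (`lsd_radix_sort` is an illustrative name, not from the lecture):

```C
#include <vector>

// LSD radix sort for non-negative integers with at most max_digits digits
// (illustrative sketch, base 10)
void lsd_radix_sort(std::vector<int> &a, int max_digits) {
    int div = 1;
    for (int d = 0; d < max_digits; d++, div *= 10) {
        // One bucket per digit value; scattering preserves relative order
        std::vector<std::vector<int>> bckts(10);
        for (int x : a)
            bckts[(x / div) % 10].push_back(x);
        // Gather buckets back in order before the next pass
        int k = 0;
        for (auto &b : bckts)
            for (int x : b)
                a[k++] = x;
    }
}
```

On the example list `170, 45, 75, 90, 12, 802, 2, 66`, three passes (units, tens, hundreds) produce `2, 12, 45, 66, 75, 90, 170, 802`, reproducing the tables above; the stability of each pass is what makes the method correct.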