Visualization of Sorting Algorithms
- https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html
- https://visualgo.net/bn/sorting
- https://www.toptal.com/developers/sorting-algorithms
References
- MIT 6.006, Introduction to Algorithms, Lecture 3
- CLRS, Chapter 2
Summary

| Algorithm | Best-Case $T(n)$ | Worst-Case $T(n)$ | Average-Case $T(n)$ | Space Complexity |
|---|---|---|---|---|
| Bubble Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ |
| Selection Sort | $O(n^2)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ |
| Insertion Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ |
| Merge Sort | $O(n\log n)$ | $O(n\log n)$ | $O(n\log n)$ | $O(n)$ |
| Quick Sort | $O(n\log n)$ | $O(n^2)$ | $O(n\log n)$ | $O(\log n)$ (recursion stack) |
Can We Sort Faster?
- For an array $A[1 \ldots n]$, an inversion is a pair $(i, j)$ where $i < j$ and $A[i] > A[j]$ (a brute-force counter is sketched below).
- Any algorithm that changes the input only by swapping adjacent items takes $\Omega(n^2)$ time: each adjacent swap removes at most one inversion, and a reverse-sorted array has $\frac{n(n-1)}{2}$ inversions.
- Any comparison-based sorting algorithm takes $\Omega(n\log n)$ time.
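To make the definition concrete, here is a direct $O(n^2)$ counter; the function name is mine, not from the lecture:

def countInversionsNaive(A):
    # count pairs (i, j) with i < j and A[i] > A[j] by checking all pairs
    n = len(A)
    return sum(1 for i in range(n) for j in range(i + 1, n) if A[i] > A[j])

print(countInversionsNaive([3, 1, 2]))  # 2: the pairs (3, 1) and (3, 2)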
Bubble Sort
- Repeatedly swaps adjacent elements which are out of order
- Iterates over the list multiple times, causing larger values to bubble to the end of the list
def bubbleSort(arr):
n = len(arr)
for i in range(n-1):
for j in range(n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
return arr
- Worst case:
$$ T(n) = (n-1) + (n-2)+\cdots +1=\frac{n(n-1)}{2}=O(n^2) $$
- Bubble sort can be modified to stop early if it finds that the list has become sorted, as in quickBubble below.
- List is already sorted (best case): $O(n)$
def quickBubble(arr):
n = len(arr)
unsorted = True
i = 0
while unsorted:
unsorted = False
for j in range(n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
unsorted = True
i += 1
return arr
Selection Sort
- Each pass computes the minimum of the unsorted suffix and swaps it into place.
- Selection sort improves upon bubble sort by making fewer swaps (at most $n-1$).
- Average case and worst case: $T(n)=\frac{n(n-1)}{2}=O(n^2)$
- List is already sorted (best case): $O(n^2)$, since every pass still scans the whole suffix for the minimum (worse than bubble sort!)
def selectionSort(arr):
n = len(arr)
for i in range(n-1):
minIdx = i
for j in range(i+1, n):
if arr[j] < arr[minIdx]:
minIdx = j
if minIdx != i:
arr[i], arr[minIdx] = arr[minIdx], arr[i]
return arr
Insertion Sort
- Invariant: after handling key $i$, the prefix arr[0..i] contains those elements in sorted order.
- Average case and worst case: $T(n)=\frac{n(n-1)}{2}=O(n^2)$
  - $O(n)$ steps (one per key position)
  - each step does $O(n)$ comparisons and swaps
- List is already sorted (best case): $\Theta(n)$
- "Almost sorted" case: $O(n)$
- $O(n\log n)$ versus $O(n^2)$: when the input array is small or almost sorted, insertion sort is quicker than merge sort!
def insertionSort(arr):
n = len(arr)
for i in range(1, n):
val, j = arr[i], i
while j:
if val < arr[j-1]:
arr[j], j = arr[j-1], j-1
else:
break
arr[j] = val
return arr
Binary Insertion Sort
def binarySearch(arr, target, low, high):
    # return an index in [low, high+1] at which target can be inserted
    # while keeping arr[low..high] sorted
    if target < arr[low]:
return low
elif target > arr[high]:
return high + 1
mid = (low + high)//2
if arr[mid] == target:
return mid
elif arr[mid] > target:
return binarySearch(arr, target, low, mid-1)
else:
return binarySearch(arr, target, mid+1, high)
def binaryInsertionSort(arr):
n = len(arr)
for i in range(1, n):
val, j = arr[i], i
pos = binarySearch(arr, val, 0, i-1)
while j > pos:
arr[j], j = arr[j-1], j-1
arr[pos] = val
return arr
| | Comparison Cost | Swap Cost |
|---|---|---|
| Insertion Sort | $O(n^2)$ | $O(n^2)$ |
| Binary Insertion Sort | $O(n\log n)$ | $O(n^2)$ |
- For numbers, compares cost $\approx$ swaps, so binary search does not improve the overall bound.
- When compares are much more expensive than swaps (e.g., comparing full records while swapping only pointers), Binary Insertion Sort gives a better complexity.
| Overall Cost | Best-Case | Average-Case | Worst-Case |
|---|---|---|---|
| Insertion Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ |
| Binary Insertion Sort | $O(n\log n)$ | $O(n^2)$ | $O(n^2)$ |
Inversions
- Running time of Insertion Sort is $\Theta(n + I)$, where $I$ is the number of inversions in the array [see Counting Inversions; a counter is sketched after the proof].

PROOF.
Let $\mathrm{Inv}(i)=\#\{ j < i : arr[j] > arr[i]\}$; then
$$ I = \sum_{i=1}^{n-1} \mathrm{Inv}(i). $$
Insertion Sort does exactly $\mathrm{Inv}(i)$ comparisons and swaps at step $i$ (for i = 1 to n-1), plus $O(1)$ overhead per step, so the total is $\Theta(n + I)$.
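The Counting Inversions reference is external to these notes; as a sketch of the standard divide-and-conquer approach (function name is mine), inversions can be counted in $O(n\log n)$ by augmenting merge sort:

def countInversions(arr):
    # returns (sorted copy of arr, number of inversions in arr)
    n = len(arr)
    if n < 2:
        return arr[:], 0
    m = n//2
    left, invL = countInversions(arr[:m])
    right, invR = countInversions(arr[m:])
    i = j = 0
    inv = invL + invR
    result = []
    for k in range(n):
        if j == n-m or (i < m and left[i] <= right[j]):
            result.append(left[i])
            i += 1
        else:
            # left[i..m-1] all exceed right[j]: m - i new inversions
            inv += m - i
            result.append(right[j])
            j += 1
    return result, inv

print(countInversions([3, 1, 2]))  # ([1, 2, 3], 2)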
Merge Sort
- Divide and conquer
- Time complexity: $O(n\log n)$ [see analysis]
- Space complexity: $O(n)$
def mergeSort(arr):
n = len(arr)
# base case
if n < 2:
return arr[:]
m = n//2
left, right = mergeSort(arr[:m]), mergeSort(arr[m:])
i = j = 0
result = []
    # take from left when right is exhausted, or when left's head is smaller
    for k in range(n):
if j == n-m or (i < m and left[i] < right[j]):
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
return result
- Place a sentinel $\infty$ at the end of each sorted sublist so the merge loop needs no bounds checks.
- The Python list method append() takes amortized $O(1)$ time.
def mergeSort2(arr):
n = len(arr)
# base case
if n < 2:
return arr[:]
m = n//2
    left, right = mergeSort2(arr[:m]), mergeSort2(arr[m:])
left.append(float('Inf'))
right.append(float('Inf'))
i = j = 0
result = []
for k in range(n):
if left[i] < right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
return result
Merge Sort versus Insertion Sort
- Insertion Sort is in-place; Merge Sort needs $O(n)$ auxiliary space (in-place merge sort variants exist).
- Running time (constants quoted in the lecture; a benchmark sketch follows this list):
  - Insertion Sort in Python: $0.2n^2\ \mu s$; in C: $0.01n^2\ \mu s$
  - Merge Sort in Python: $2.2n\log n\ \mu s$
  - When $n$ is sufficiently large, Merge Sort in Python beats Insertion Sort in C.
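The constants above are machine- and implementation-dependent; a rough way to measure your own, assuming the insertionSort and mergeSort defined earlier are in scope (the input size and repeat count here are arbitrary):

import random
import timeit

data = [random.random() for _ in range(4000)]
# sort a fresh copy on each run so both start from the same unsorted input
t_ins = timeit.timeit(lambda: insertionSort(data[:]), number=3)
t_mrg = timeit.timeit(lambda: mergeSort(data[:]), number=3)
print(f"insertion: {t_ins:.3f}s   merge: {t_mrg:.3f}s")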
Insertion Sort on Small Arrays in Merge Sort
CLRS, Problem 2-1
Although merge sort runs in $\Theta(n\log n)$ worst-case time and insertion sort runs in $\Theta(n^2)$ worst-case time, the constant factor in insertion sort can make it faster in practice for small problem sizes on many machines. Thus, it makes sense to coarsen the leaves of the recursion by using insertion sort within merge sort when subproblems become sufficiently small.
Consider a modification to merge sort in which $n/k$ sublists of length $k$ are sorted using insertion sort and then merged using the standard merging mechanism, where $k$ is a value to be determined.
- Insertion sort can sort the $n/k$ sublists, each of length $k$, in $\Theta(nk)$ worst-case time: $$ \frac{n}{k}\times \Theta(k^2) =\Theta(nk) $$
- Merging the sublists takes $\Theta(n\log (n/k))$ worst-case time:
  - depth of the recursion tree: $\log n - \log k + 1=\log(n/k)+1$
  - merging at each level costs $\Theta(n)$
- Thus the modified algorithm runs in $\Theta(nk + n\log(n/k))$ time. If $k=O(\log n)$, the modified algorithm has at most the running time of merge sort in terms of $\Theta$-notation.
- Choice of $k=k(n)$ in practice: minimize $$ f(n) = Ank(n) + Bn\log\left(\frac{n}{k(n)}\right) $$ where $A$ and $B$ are machine-dependent constants. A sketch of the modified algorithm follows this list.
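A minimal sketch of the coarsened merge sort, reusing the insertionSort and merge loop defined earlier; the cutoff K = 32 is an illustrative placeholder, not a value prescribed by CLRS:

K = 32  # the tuning parameter k; 32 is only a placeholder

def hybridMergeSort(arr):
    n = len(arr)
    # coarsened leaves: hand small subproblems to insertion sort
    if n <= K:
        return insertionSort(arr[:])
    m = n//2
    left, right = hybridMergeSort(arr[:m]), hybridMergeSort(arr[m:])
    i = j = 0
    result = []
    for k in range(n):
        if j == n-m or (i < m and left[i] < right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result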
Timsort
Python uses an algorithm called Timsort:
Timsort is a hybrid sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It was invented by Tim Peters in 2002 for use in the Python programming language. The algorithm finds subsets of the data that are already ordered, and uses the subsets to sort the data more efficiently. This is done by merging an identified subset, called a run, with existing runs until certain criteria are fulfilled. Timsort has been Python’s standard sorting algorithm since version 2.3. It is now also used to sort arrays in Java SE 7, and on the Android platform.
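Since Timsort backs Python's built-in sorting, using it is just a call to sorted() or list.sort():

data = [5, 2, 9, 1, 5, 6]
print(sorted(data))  # returns a new sorted list (Timsort)
data.sort()          # sorts in place (also Timsort)
print(data)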
Quick Sort
- Randomized algorithm: choosing the pivot uniformly at random gives expected $O(n\log n)$ running time, while the worst case remains $O(n^2)$.
from random import randint
def partition(arr, L, R):
    # Lomuto-style partition around pivot arr[L]; returns the pivot's final index
    pivot = arr[L]
i = L + 1
for j in range(i, R + 1):
if arr[j] < pivot:
arr[j], arr[i] = arr[i], arr[j]
i += 1
arr[L], arr[i - 1] = arr[i - 1], arr[L]
return i - 1
def quickSortHelper(arr, L, R):
    if R - L < 1:
        return
    # move a random pivot to the front, then partition around it
    pos = randint(L, R)
    arr[L], arr[pos] = arr[pos], arr[L]
    pos = partition(arr, L, R)
    quickSortHelper(arr, L, pos - 1)
    quickSortHelper(arr, pos + 1, R)

def quickSort(arr):
    quickSortHelper(arr, 0, len(arr) - 1)
    return arr
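A quick sanity check (hypothetical usage, not from the lecture):

import random
data = [random.randint(0, 99) for _ in range(20)]
assert quickSort(data[:]) == sorted(data)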