Sorting 排序算法

2018/01/01

Categories: Python algorithms sorting Tags: algorithms

Visualization of Sorting Algorithms

https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html

https://visualgo.net/bn/sorting

https://www.toptal.com/developers/sorting-algorithms

References

Summary

Best-Case $T(n)$ Worst-Case $T(n)$ Average-Case $T(n)$? Space Complexity
Bubble Sort $O(n)$ $O(n^2)$ $O(n^2)$ $O(1)$
Selection Sort $O(n^2)$ $O(n^2)$ $O(n^2)$ $O(1)$
Insertion Sort $O(n)$ $O(n^2)$ $O(n^2)$ $O(1)$
Merge Sort $O(n\log n)$ $O(n\log n)$ $O(n\log n)$ $O(n)$
Quick Sort $O(n\log n)$ $O(n^2)$ $O(n\log n)$ $O(1)$

Can We Sort Faster?

Bubble Sort

def bubbleSort(arr):
    n = len(arr)
    for i in range(n-1):
        for j in range(n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr

$$ T(n) = (n-1) + (n-2)+\cdots +1=\frac{n(n-1)}{2}=O(n^2) $$

def quickBubble(arr):
    n = len(arr)
    unsorted = True
    i = 0
    while unsorted:
        unsorted = False
        for j in range(n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
                unsorted = True
        i += 1  
    return arr

Selection Sort

def selectionSort(arr):
    n = len(arr)
    for i in range(n-1):
        minIdx = i
        for j in range(i+1, n):
            if arr[j] < arr[minIdx]:
                minIdx = j
        if minIdx != i:
            arr[i], arr[minIdx] = arr[minIdx], arr[i]
    return arr

Insertion Sort

def insertionSort(arr):
    n = len(arr)
    for i in range(1, n):
        val, j = arr[i], i
        while j:
            if val < arr[j-1]:
                arr[j], j = arr[j-1], j-1
            else:
                break
        arr[j] = val
    return arr

Binary Insertion Sort

def binarySearch(arr, target, low, high):
    if target < arr[low]:
        return low
    elif target > arr[high]:
        return high + 1
    
    mid = (low + high)//2
    
    if arr[mid] == target:
        return mid
    elif arr[mid] > target:
        return binarySearch(arr, target, low, mid-1)
    else:
        return binarySearch(arr, target, mid+1, high)

def insertionSort(arr):
    n = len(arr)
    for i in range(1, n):
        val, j = arr[i], i
        pos = binarySearch(arr, val, 0, i-1)
        while j > pos:
            arr[j], j = arr[j-1], j-1
        arr[pos] = val
    return arr
Comparison Cost Swap Cost
Insertion Sort $O(n^2)$ $O(n^2)$
Binary Insertion Sort $O(n\log n)$ $O(n^2)$
Comparison Cost Best-Case Average-Case Worst-Case
Insertion Sort $O(n)$ $O(n^2)$ $O(n^2)$
Binary Insertion Sort $O(n\log n)$ $O(n^2)$ $O(n^2)$

Inversions

PROOF.

Let $Inv(i)=\# \{ j < i : arr[j] > arr[i]\}$, then $$ \text{# of inversion} = \sum_{i=1}^{n-1} Inv(i) $$ Inversion Sort: for i = 1 to n-1: do $Inv(i)$ comparisions and swaps

Merge Sort

merge sort

def mergeSort(arr):
    n = len(arr)
    # base case
    if n < 2:
        return arr[:]

    m = n//2
    left, right = mergeSort(arr[:m]), mergeSort(arr[m:])

    i = j = 0
    result = []
    for k in range(n):
        if j == n-m or (i < m and left[i] < right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result
def mergeSort2(arr):
    n = len(arr)
    # base case
    if n < 2:
        return arr[:]

    m = n//2
    left, right = mergeSort(arr[:m]), mergeSort(arr[m:])
    left.append(float('Inf'))
    right.append(float('Inf'))
    i = j = 0
    result = []
    for k in range(n):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result

Merge Sort versus Insertion Sort

Insertion Sort on Small Arrays in Merge Sort

CLRS, Problem 2-1

Although merge sort runs in $\Theta(n\log n)$ worst-case time and insertion sort runs in $\Theta(n^2)$ worst-case time, the constant factor in insertion sort can make it faster in practice for small problem sizes on many machines.

Thus, it makes sense to coarsen the leaves of the recursion by using insertion sort within merge sort when subproblems become sufficiently small.

Consider a modification to merge sort in which $n/k$ sublists of length $k$ are sorted using insertion sort and then merged using the standard merging mechanism, where $k$ is a value to be determined.

$$ \frac{n}{k}\times \Theta(k^2) =\Theta(nk) $$

$$ f(n) = Ank(n) +Bn\log\left(\frac{n}{k(n)}\right) $$

Timsort

Python uses an algorithm called Timsort:

Timsort is a hybrid sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It was invented by Tim Peters in 2002 for use in the Python programming language. The algorithm finds subsets of the data that are already ordered, and uses the subsets to sort the data more efficiently. This is done by merging an identified subset, called a run, with existing runs until certain criteria are fulfilled. Timsort has been Python’s standard sorting algorithm since version 2.3. It is now also used to sort arrays in Java SE 7, and on the Android platform.

Quick Sort

from random import randint

def partition(arr, L, R):
    pivot = arr[L]
    i = L + 1
    for j in range(i, R + 1):
        if arr[j] < pivot:
            arr[j], arr[i] = arr[i], arr[j]
            i += 1
    arr[L], arr[i - 1] = arr[i - 1], arr[L]
    return i - 1

def quickSortHelper(arr, L, R):
    if R - L < 1:
        return arr
    pos = randint(L, R)
    arr[L], arr[pos] = arr[pos], arr[L]
    pos = partition(arr, L, R)
    quickSortHelper(arr, L, pos - 1)
    quickSortHelper(arr, pos + 1, R)

def quickSort(arr):
    return quickSortHelper(arr, 0, len(arr) - 1)