Algorithms: Design and Analysis (Stanford) Notes

2018/01/25

Categories: Python algorithms graph Tags: dataStructures algorithms sorting searching hashing bloomFilters selection graph

For the notes in PDF format, see: Notes

Divide and Conquer

  1. DIVIDE into smaller sub-problems
  2. CONQUER via recursive calls
  3. COMBINE solutions of sub-problems into one for the original problem
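A minimal runnable sketch of these three steps, using list summation as a toy problem (my own illustration, not from the course):

def dc_sum(arr):
    # base case: a single element needs no work
    if len(arr) == 1:
        return arr[0]
    m = len(arr)//2
    left = dc_sum(arr[:m])    # 1. DIVIDE and 2. CONQUER via recursive calls
    right = dc_sum(arr[m:])
    return left + right       # 3. COMBINE sub-solutions

print(dc_sum([3, 1, 4, 1, 5]))  # 14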

Master Method

$$ T(n)= \begin{cases} O(n^d\log n), &a=b^d \text{ (case 1)}\newline O(n^d), &a<b^d\text{ (case 2)}\newline O(n^{\log_b a}), &\text{otherwise (case 3)} \end{cases} $$

Case 1: the base of the logarithm does not matter.

Case 3: the base of the logarithm matters.

If $T(n)= aT(\frac{n}{b})+\Theta(n^d)$, then (with similar proof)

$$ T(n)= \begin{cases} \Theta(n^d\log n), &a=b^d \text{ (case 1)}\newline \Theta(n^d), &a<b^d\text{ (case 2)}\newline \Theta(n^{\log_b a}), &\text{otherwise (case 3)} \end{cases} $$
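As a quick aid (my own sketch, not part of the course), a helper that maps $(a, b, d)$ to the corresponding case of the master method:

import math

def master_bound(a, b, d):
    # master method bound for T(n) = a*T(n/b) + O(n^d)
    if a == b**d:                        # case 1
        return f"O(n^{d} log n)"
    if a < b**d:                         # case 2
        return f"O(n^{d})"
    return f"O(n^{math.log(a, b):.2f})"  # case 3

print(master_bound(2, 2, 1))  # merge sort: O(n^1 log n)
print(master_bound(4, 2, 1))  # naive recursive multiplication: O(n^2.00)
print(master_bound(3, 2, 1))  # Karatsuba: O(n^1.58)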

Proof (Recursion Tree Approach)

3 cases $\Leftrightarrow$ 3 types of recursion trees

For simplicity, we assume that $n$ is a power of $b$, and that the work done outside the recursive calls is at most $Cn^d$ for a constant $C$.

The recursion tree then has $(1+\log_b n)$ levels (level-0: the root; level-$\log_b n$: the leaves), and at each level $j=0,1,\ldots, \log_b n$ there are $a^j$ sub-problems, each of size $n/b^j$.

Then $$ \text{Total work at level-}j \leq a^j \times C\left(\frac{n}{b^j}\right)^d $$ Thus, $$ \text{Total work} \leq Cn^d\sum_{j=0}^{\log_bn}\left(\frac{a}{b^d}\right)^j \ \ (\Delta) $$ $\Rightarrow$ all depends on the relationship between $a$ and $b^d$

For constant $r>0$, $$ 1+r+r^2+\cdots+r^k = \frac{1-r^{k+1}}{1-r} \leq \begin{cases} \frac{1}{1-r}, &r<1 \newline \frac{r}{r-1} r^k, &r>1 \end{cases} $$

Case 1. $a=b^d$: each of the $(1+\log_b n)$ levels contributes at most $Cn^d$, so $$ (\Delta) \leq Cn^d\times (1+\log_b n)=O(n^d\log n) $$

Case 2. $a<b^d$: the geometric sum is bounded by the constant $\frac{1}{1-a/b^d}$, so $$ (\Delta) = O(n^d) $$

Case 3. $a>b^d$: the sum is within a constant factor of its last term, so $$ (\Delta)\leq Cn^d \left(\frac{a}{b^d}\right)^{\log_b n}= Ca^{\log_b n}=Cn^{\log_b a}=O(n^{\log_b a}) $$

$n^{\log_ba}=a^{\log_bn} \Leftrightarrow\log_ba\log_bn=\log_bn\log_ba$

$a^{\log_b n}=$ # of leaves

Interpretations

Upper bound on the work at level $j$: $$ Cn^d\times\left(\frac{a}{b^d}\right)^j $$

$$ a:\text{ rate of sub-problem proliferation, RSP - force of evil} \\
b^d: \text{rate of work shrinkage (per sub-problem), RWS - force of good} $$

Counting Inversions


A = input array [length = n]

D = output [length=n]

B = 1st sorted array [n/2], C = 2nd sorted array [n/2]

SortAndCount(A)
  if n=1, return (A, 0)
  else
      (B,X) = SortAndCount(1st half of A)
      (C,Y) = SortAndCount(2nd half of A)
      (D,Z) = MergeAndCountSplitInv(B, C)
  return (D, X + Y + Z)

Goal: implement MergeAndCountSplitInv in linear time $O(n)$

The number of split inversions involving an element y of the 2nd array C is precisely the number of elements left in the 1st array B when y is copied to the output D.

$$ T(n)\leq 2T\left(\frac{n}{2}\right)+O(n) \Rightarrow T(n)=O(n\log n) $$

Python Code

def sortAndCount(arr):
    n = len(arr)
    # base case
    if n < 2:
        return arr, 0

    m = n//2
    left, x = sortAndCount(arr[:m])
    right, y = sortAndCount(arr[m:])

    # merge, counting split inversions along the way
    i = j = z = 0
    for k in range(n):
        # take from left if right is exhausted, or if left's head is <=
        # (<= so that equal elements are not counted as inversions)
        if j == n - m or (i < m and left[i] <= right[j]):
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
            z += m - i  # every element still in left forms a split inversion with right[j]
    return arr, x + y + z
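A quick usage check (my own example): $[1, 3, 5, 2, 4, 6]$ contains exactly the 3 inversions $(3,2), (5,2), (5,4)$.

print(sortAndCount([1, 3, 5, 2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 3)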

Karatsuba Multiplication

$$ x = 10^{n/2}a+b, y = 10^{n/2}c+d, \text{ where }a,b,c,d\text{ are }\frac{n}{2}-\text{digit numbers} $$

$$ \Rightarrow xy = 10^nac+ 10^{n/2}(ad+bc)+bd\ \ \ \ \ \ (*) $$

Algorithm #1 (Naive method)

Running time $$ T(n) \begin{cases} =O(1), &n=1\ (\text{base case})\newline \leq 4T(\frac{n}{2}) + O(n), &n\geq 2\ (\text{4 sub-problems plus linear-time additions}) \end{cases} $$ By the master method,

$$ a = 4 > 2^1 = b^d (\text{case 3}) \Rightarrow T(n) = O(n^{\log_ba})=O(n^2) $$

Algorithm #2 (Gauss's method)

Running time $$ T(n) \begin{cases} =O(1), &n=1\ (\text{base case})\newline \leq 3T(\lceil \frac{n}{2} \rceil) + O(n), &n\geq 2\ (\text{3 sub-problems plus linear-time additions}) \end{cases} $$ By the master method,

$$ a = 3 > 2^1 = b^d (\text{case 3}) \Rightarrow T(n) = O(n^{\log_ba})=O(n^{\log_23})=O(n^{1.59}) $$ Better than the naive method!

Python Code

def karatsuba(x, y):
    nx, ny = len(str(x)), len(str(y))
    n = max(nx, ny)

    # base case: single-digit multiplication
    if n == 1:
        return x*y

    m = n//2
    bm = 10**m
    a, b = x//bm, x%bm    # x = a*10^m + b
    c, d = y//bm, y%bm    # y = c*10^m + d
    term1 = karatsuba(a, c)        # ac
    term2 = karatsuba(b, d)        # bd
    term3 = karatsuba(a+b, c+d)    # (a+b)(c+d) = ac + ad + bc + bd
    # Gauss's trick: ad + bc = term3 - term1 - term2, so 3 recursive calls suffice
    return (bm**2)*term1 + bm*(term3 - term1 - term2) + term2
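Usage check (my own example):

print(karatsuba(1234, 5678))  # 7006652
print(1234 * 5678)            # 7006652 (sanity check)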

Binary Search

$$ T(n)=T\left(\frac{n}{2}\right)+O(1),\quad a = 1=2^0=b^d \text{ (case 1) }\Rightarrow T(n)\leq O(n^d\log n)=O(\log n) $$

Python Code

See link

Strassen’s Matrix Multiplication

$$ z_{ij} = \sum_{k=1}^n x_{ik}\cdot y_{kj} $$

$$ X = \begin{pmatrix} A &B\newline C &D \end{pmatrix}, Y = \begin{pmatrix} E &F\newline G &H \end{pmatrix} $$

where $A$ through $H$ are all $\frac{n}{2}\times \frac{n}{2}$ matrices. Then $$ X\cdot Y = \begin{pmatrix} AE+BG &AF+BH\newline CE+DG &CF+DH \end{pmatrix} $$

Naive recursive algorithm: 8 recursive multiplications of $\frac{n}{2}\times \frac{n}{2}$ matrices, so $$ a = 8>2^2=b^d \text{ (case 3) }\Rightarrow T(n)\leq O(n^{\log_b a})=O(n^{\log_2 8})=O(n^{3}) $$

Strassen's algorithm: only 7 recursive multiplications, so $$ a = 7>2^2=b^d \text{ (case 3) }\Rightarrow T(n)\leq O(n^{\log_b a})=O(n^{\log_2 7})=O(n^{2.81}) $$

The 7 products: $$ P_1 =A(F-H)\\
P_2 =(A+B)H\\
P_3=(C+D)E\\
P_4=D(G-E)\\
P_5=(A+D)(E+H)\\
P_6=(B-D)(G+H)\\
P_7=(A-C)(E+F) $$

Then $$ \begin{align} X\cdot Y &= \begin{pmatrix} AE+BG &AF+BH\newline CE+DG &CF+DH \end{pmatrix}\newline &= \begin{pmatrix} P_5+P_4-P_2+P_6 &P_1+P_2\newline P_3+P_4 &P_1+P_5-P_3-P_7 \end{pmatrix} \end{align} $$

Python Code

import numpy as np
def strassen(X, Y):
    n = X.shape[0]
    if n == 1:
        return np.array([[X[0, 0]*Y[0, 0]]])  # keep the result 2-D
    
    if n%2 == 1: # padding with zeros
        Xpad = np.zeros((n + 1, n + 1))
        Ypad = np.zeros((n + 1, n + 1))
        Xpad[:n, :n], Ypad[:n, :n] = X, Y
        return strassen(Xpad, Ypad)[:n, :n]
    
    m = n//2
    A, B, C, D = X[:m, :m], X[:m, m:], X[m:, :m], X[m:, m:]
    E, F, G, H = Y[:m, :m], Y[:m, m:], Y[m:, :m], Y[m:, m:]
  
    P1 = strassen(A, F - H)
    P2 = strassen(A + B, H)
    P3 = strassen(C + D, E)
    P4 = strassen(D, G - E)
    P5 = strassen(A + D, E + H)
    P6 = strassen(B - D, G + H)
    P7 = strassen(A - C, E + F)
    
    Z = np.zeros((n, n))
    Z11 = P5 + P4 - P2 + P6
    Z12 = P1 + P2
    Z21 = P3 + P4
    Z22 = P1 + P5 - P3 - P7
    
    Z[:m, :m], Z[:m, m:], Z[m:, :m], Z[m:, m:] = Z11, Z12, Z21, Z22

    return Z
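A quick correctness check against NumPy's built-in matrix multiply (my own test, assuming square inputs):

X = np.random.randint(10, size=(5, 5))
Y = np.random.randint(10, size=(5, 5))
print(np.allclose(strassen(X, Y), X @ Y))  # True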

Closest Pair

ClosestPair $(P_x, P_y)$

  1. $Q$ = left half of $P$, $R$ = right half of $P$. Form $Q_x, Q_y, R_x, R_y$ [$O(n)$ time]
  2. $(p_1,q_1)=$ ClosestPair $(Q_x, Q_y)$
  3. $(p_2,q_2)=$ ClosestPair $(R_x, R_y)$
  4. let $\delta = \min\{d(p_1,q_1), d(p_2,q_2)\}$
  5. $(p_3,q_3)=$ ClosestSplitPair $(P_x, P_y, \delta)$. Requirements:
    • $O(n)$ time
    • correct whenever CP of $P$ is a split pair
  6. return best of $(p_1,q_1),(p_2,q_2),(p_3,q_3)$

ClosestSplitPair $(P_x, P_y, \delta)$

filtering + linear scan

  1. Let $\bar{x}$ = biggest $x$-coordinate in the left half of $P$ [$O(1)$ time]
  2. Let $S_y$ = points of $P$ with $x$-coordinate in $[\bar{x}-\delta,\bar{x}+\delta]$, sorted by $y$-coordinate
  3. Initialize best = $\delta$, best point = NULL, $m=\vert S_y\vert$.
   For i = 1 to m-1
      For j = 1 to min(7, m-i)
          Let p, q = ith, (i+j)th points of Sy
          If d(p, q) < best
              best point = (p, q)
              best = d(p, q)

[$O(n)$ time]

Claim. Let $p\in Q$, $q\in R$ be a split pair with $d(p,q)<\delta$. Then

(A) $p,q\in S_y$

(B) $p,q$ are at most 7 positions apart in $S_y$

Proof.

(A) Let $p=(x_1,y_1)\in Q, q = (x_2, y_2)\in R$ and $d(p,q)<\delta$. Thus, $|x_1-x_2|<\delta$. $$ \Rightarrow x_1 \leq \bar{x} \leq x_2 \Rightarrow \begin{cases} \bar{x}-x_1\leq x_2-x_1 <\delta &\Rightarrow x_1 >\bar{x}-\delta \newline x_2-\bar{x}\leq x_2-x_1 <\delta &\Rightarrow x_2 <\bar{x} +\delta \end{cases} $$ (B) Evenly divide $[\bar{x}-\delta, \bar{x}+\delta]\times [\tilde{y}, \tilde{y}+\delta]$ into 8 $\frac{\delta}{2}\times \frac{\delta}{2}$ boxes, where $\tilde{y}=\min\{y_1,y_2\}$.

Then at most 1 point in each of the 8 boxes.

Proof by contradiction.

Let $a,b$ be in the same box. Then $a,b\in Q$ or $a,b\in R$. $$ d(a,b)\leq \frac{\delta}{\sqrt{2}}<\delta. \ \ \text{contradiction to definition of }\delta. $$

Corollary 1. If the closest pair of $P$ is a split pair, then ClosestSplitPair finds it.

Corollary 2. ClosestPair is correct, and runs in $O(n\log n)$ time.

Python Code

import numpy as np
import numpy.linalg as LA
# find the closest pair among n 2-dim points
# input: n * 2 dim np array P
# each row: 1 point

def CP(P):
    Px = P[P[:, 0].argsort(kind = 'mergesort')]
    Py = P[P[:, 1].argsort(kind = 'mergesort')]
    return closestPair(Px, Py)

def closestPair(Px, Py):
    n = Px.shape[0] # no of points
    if n < 4:
        return bruteForceCP(Px)

    mid = n//2
    Qx, Rx = Px[:mid], Px[mid:]
    midx = Px[mid, 0]
    # split Py by the median x-coordinate (assumes distinct x-coordinates;
    # ties at midx could make the y-sorted halves disagree with Qx, Rx)
    idx = (Py[:, 0] < midx)
    Qy, Ry = Py[idx], Py[~idx]
    (p1, q1), r1 = closestPair(Qx, Qy)
    (p2, q2), r2 = closestPair(Rx, Ry)

    if r1 < r2:
        best = r1
        bestpoint = (p1, q1)
    else:
        best = r2
        bestpoint = (p2, q2)

    (p3, q3), r3 = closestSplitPair(Px, Py, best, bestpoint)
    if r3 < best:
        best = r3
        bestpoint = (p3, q3)
    return bestpoint, best

def bruteForceCP(P):
    best = float('Inf')
    bestpoint = None
    n = P.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = LA.norm(P[i] - P[j])
            if d < best:
                best = d
                bestpoint = (P[i], P[j])
    return bestpoint, best

def closestSplitPair(Px, Py, best, bestpoint):
    n = Px.shape[0]
    mid = n//2
    midx = Px[mid, 0]
    idx = np.logical_and(midx - best <= Py[:, 0], Py[:, 0] <= midx + best)
    Sy = Py[idx]

    m = Sy.shape[0] # no of points in Sy
    for i in range(m - 1):
        for j in range(i + 1, min(i + 8, m)):
            p, q = Sy[i], Sy[j]
            d = LA.norm(p - q)
            if d < best:
                best = d
                bestpoint = (p, q)
    return bestpoint, best

# Test
import time
a = np.random.randint(1e5, size = (1000, 2))
tic = time.time()
b = bruteForceCP(a)
toc = time.time()
print("Brute Force takes time:", toc - tic)
print("ans:", b)
# Brute Force takes time: 4.079403400421143
# ans: ((array([29326, 36462]), array([29246, 36561])), 127.283148923964)

tic = time.time()
c = CP(a)
toc = time.time()
print("D&C takes time:", toc - tic)
print("ans:", c)
# D&C takes time: 0.14911723136901855
# ans: ((array([29326, 36462]), array([29246, 36561])), 127.283148923964)

Fictitious Example

$$ T(n)=2T\left(\frac{n}{2}\right) + O(n^2) \\
\Rightarrow a=2<2^2=b^d (\text{case 2}) \Rightarrow T(n)=O(n^2) $$

A naive recursion-tree bound (depth $O(\log n)$, at most $O(n^2)$ work per level) would give $O(n^2\log n)$. But the work per level decreases geometrically, so $T(n)=O(n^2)=o(n^2\log n)$: the master method gives a tighter upper bound!

Sorting & Selection

|  | SORTING | SELECTION |
| --- | --- | --- |
| Randomized | Quick Sort | RSelect |
| Deterministic | Merge Sort | DSelect |

Merge Sort versus Quick Sort

| Running Time | Average Case | Worst Case |
| --- | --- | --- |
| Merge Sort | $O(n\log n)$ | $O(n\log n)$ |
| Quick Sort | $O(n\log n)$ | $O(n^2)$ |

Merge Sort (D&C)

C = output [length=n]

A = 1st sorted array [n/2], B = 2nd sorted array [n/2]

i = 1, j = 1

for k = 1 to n
  if A(i)<B(j)
      C(k)=A(i)
      i++
  else [B(j)<A(i)]
      C(k)=B(j)
      j++
end 

(ignores end cases)

Running Time

i = 1, j = 1 [2 operations]

Each loop iteration: a comparison, an assignment, an increment of i or j, an increment of k, and a comparison of k to the upper bound n $\Rightarrow$ at most 4-5 operations per iteration, so the merge takes $\leq 6n$ operations overall (for $n\geq 1$).

Proof by Recursion Tree Method

Proof by Master Method

$$ a = 2=2^1=b^d \text{ (case 1) }\Rightarrow T(n)\leq O(n^d\log n)=O(n\log n) $$

Python Code

def mergeSort(arr):
    n = len(arr)
    # base case
    if n < 2:
        return arr[:]

    m = n//2
    left, right = mergeSort(arr[:m]), mergeSort(arr[m:])
    left.append(float('Inf'))
    right.append(float('Inf'))
    i = j = 0
    result = []
    for k in range(n):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result
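Usage check (my own example):

print(mergeSort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]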

Quick Sort

Partition (In-Place Implementation)

Partition(A, l, r)

p := A[l]
i := l + 1
for j = l + 1 to r:
  if A[j] < p:
      swap A[j], A[i]
      i += 1
swap A[l] and A[i-1]

Correctness of Quicksort

$P(n)$ = “Quick Sort correctly sorts every input array of length $n$”

Claim: $P(n)$ holds for every $n\geq 1$. [no matter how pivot is chosen]

Proof by induction.

Need to show: if $P(k)$ holds $\forall k< n$, then $P(n)$ holds as well.

Quick Sort first partitions A around some pivot p.

Pivot winds up in correct position.

Let $k_1, k_2$ be lengths of 1st, 2nd parts of partitioned array.

Since $k_1<n, k_2<n$, by inductive hypothesis, 1st, 2nd parts get sorted correctly by recursive calls.

So after recursive calls, entire array is correctly sorted.

Choice of Pivot

Quick Sort Theorem (Use Decomposition Principle)

Theorem: for every input array of length $n$, the average running time of Quick Sort (with random pivots) is $O(n\log n)$. [“average” is over the random choices made by the algorithm, i.e. pivot choices]

Proof.


Fix input array A of length $n$.

Sample space $\Omega$ = all possible outcomes of random choices in Quick Sort. (i.e. pivot sequences)

Key random variable: for $\sigma\in\Omega$, $$ C(\sigma)=\text{# of comparisons between 2 input elements made by Quick Sort given random choices }\sigma $$

Lemma. Running time of Quick Sort is dominated by comparisons. $$ \Rightarrow \exists \text{ constant } d, \ \forall \sigma\in\Omega, \ RT(\sigma) \leq d\cdot C(\sigma) $$ Remaining goal: prove $E[C]=O(n\log n)$.

Notation: $z_i= ith$ smallest element of A.

For $\sigma\in \Omega$, indices $i<j$, let $$ X_{ij}=\text{ # of times } z_i, z_j \text{ get compared in Quick Sort with pivot sequence }\sigma $$

$$ \Rightarrow X_{ij} = 0 \text{ or } 1 \text{ (indicator variable)} = \mathbb{1}(z_i, z_j \text{ get compared with }\sigma) $$

Then, $\forall \sigma$, $$ C(\sigma) = \sum_{i=1}^{n-1}\sum_{j=i+1}^n X_{ij}(\sigma) $$ By linearity of expectation, $$ E[C] = \sum_{i=1}^{n-1}\sum_{j=i+1}^n E[X_{ij}]= \sum_{i=1}^{n-1} \sum_{j=i+1}^n\Pr(X_{ij}=1)\ \ \ (*) $$ Key claim: $\forall i<j$, $\Pr(X_{ij}=1)=\Pr(z_i, z_j \text{ get compared})=\frac{2}{j-i+1}$

As long as none of $z_i,z_{i+1},\ldots,z_j$ are chosen as a pivot, all are passed to the same recursive call.

Consider the first among $z_i,z_{i+1},\ldots,z_j$ that gets chosen as a pivot: $z_i$ and $z_j$ get compared if and only if this first pivot is $z_i$ or $z_j$ (otherwise the pivot splits them into different recursive calls and they are never compared).

Note: since pivots are always chosen uniformly at random, each of $z_i,z_{i+1},\ldots,z_j$ is equally likely to be the first. $$ \Rightarrow \Pr(z_i, z_j \text{ get compared}) = \frac{2}{j-i+1} $$

$$ \Rightarrow E[C]=2\sum_{i=1}^{n-1}\sum_{j=i+1}^n \frac{1}{j-i+1}\leq 2n\sum_{k=2}^n\frac{1}{k}\leq 2n\log n $$

$$ \sum_{k=2}^n\frac{1}{k}=\sum_{k=2}^n\int_{k-1}^k\frac{1}{k}dx< \sum_{k=2}^n\int_{k-1}^k\frac{1}{x}dx=\int_{1}^n\frac{1}{x}dx=\log n. $$
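As an illustrative sanity check (my own sketch, not from the course), the following counts pivot comparisons made by a randomized quicksort and compares the average against the $2n\ln n$ bound; `qs_comparisons` is a hypothetical helper name:

import math
import random

def qs_comparisons(arr):
    # count comparisons against the pivot in quicksort with uniform random pivots
    if len(arr) < 2:
        return 0
    pivot = arr[random.randrange(len(arr))]
    left = [x for x in arr if x < pivot]
    right = [x for x in arr if x > pivot]
    # each non-pivot element is compared with the pivot once at this level
    return len(arr) - 1 + qs_comparisons(left) + qs_comparisons(right)

n = 10000
avg = sum(qs_comparisons(random.sample(range(n), n)) for _ in range(5)) / 5
print(avg, "<=", 2 * n * math.log(n))  # roughly 155000 <= 184207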


Python Code

from random import randint

def partition(arr, L, R):
    pivot = arr[L]
    i = L + 1
    for j in range(i, R + 1):
        if arr[j] < pivot:
            arr[j], arr[i] = arr[i], arr[j]
            i += 1
    arr[L], arr[i - 1] = arr[i - 1], arr[L]
    return i - 1

def quickSortHelper(arr, L, R):
    # base case: sub-array of length 0 or 1
    if R - L < 1:
        return
    pos = randint(L, R)                    # choose pivot uniformly at random
    arr[L], arr[pos] = arr[pos], arr[L]    # move pivot to the front
    pos = partition(arr, L, R)
    quickSortHelper(arr, L, pos - 1)
    quickSortHelper(arr, pos + 1, R)

def quickSort(arr):
    quickSortHelper(arr, 0, len(arr) - 1)  # sorts in place
    return arr
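Usage check (my own example):

print(quickSort([3, 8, 2, 5, 1, 4, 7, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8]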

RSelect
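The notes leave this section empty. Below is a minimal sketch of randomized selection (find the $i$th smallest element, 0-indexed), reusing `partition` and `randint` from the quicksort code above; expected running time $O(n)$. This is my own sketch of the standard algorithm, not the course's code.

def rselect(arr, i, L=None, R=None):
    # return the i-th smallest element (0-indexed) of arr, expected O(n) time
    if L is None:
        L, R = 0, len(arr) - 1
    if L == R:
        return arr[L]
    pos = randint(L, R)                 # random pivot, as in quicksort
    arr[L], arr[pos] = arr[pos], arr[L]
    pos = partition(arr, L, R)          # pivot lands in its final position
    if pos == i:
        return arr[pos]
    elif pos > i:
        return rselect(arr, i, L, pos - 1)   # recurse on the left part only
    else:
        return rselect(arr, i, pos + 1, R)   # recurse on the right part only

print(rselect([9, 1, 7, 3, 5], 2))  # 5 (the median)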

DSelect

Graph

Graph Representation

Adjacency Matrix

$$ A_{ij}=1 \Leftrightarrow G\text{ has an }i-j \text{ edge} $$

Adjacency List
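For example, the contraction code later in these notes uses a dict-of-lists adjacency representation; a toy instance (my own example):

# adjacency list as a dictionary: node -> list of adjacent nodes
# (parallel edges appear as repeated entries in the lists)
g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}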

Minimum Cuts Problem

Input: undirected graph $G=(V,E)$ [parallel edges allowed]

Output: a cut with the fewest crossing edges

Random Contraction Algorithm

While $\exists > 2$ vertices:

  1. pick a remaining edge $(u,v)$ uniformly at random
  2. merge (“contract”) $u$ and $v$ into a single vertex
  3. remove self-loops

Return the cut represented by the final 2 vertices.


Fix a graph $G=(V,E)$ with $n$ vertices and $m$ edges.

Fix a minimum cut $(A,B)$.

Let $k$= # of edges crossing $(A,B)$. Call these edges $F$.

  1. Suppose an edge of $F$ is contracted at some point $\Rightarrow$ algorithm will not output $(A,B)$.
  2. Suppose only edges inside $A$ or inside $B$ get contracted $\Rightarrow$ algorithm will output $(A,B)$.

Thus, $$ \Pr(\text{output is } (A,B)) = \Pr(\text{never contract an edge of }F) $$ Let $S_i=$ event that an edge of $F$ is contracted in iteration $i$.

Goal: compute $\Pr\left(\overline{S_1}\cap \overline{S_2}\cap\cdots\cap \overline{S_{n-2}}\right)$ $$ \Pr(S_1)=\frac{\text{# of crossing edges}}{\text{# of edges}}=\frac{k}{m} $$ Key observation: degree of each vertex $\geq k$.

Reason: each vertex $v$ defines a cut $(\{v\}, V-\{v\})$.

Since $\sum_{v\in V}\deg(v)=2m$, $$ \Rightarrow 2m\geq kn\Rightarrow \Pr(S_1)\leq \frac{2}{n}. $$

$$ \begin{aligned} \Pr(\overline{S_1}\cap\overline{S_2})&=\Pr(\overline{S_2}\vert \overline{S_1})\Pr(\overline{S_1})\newline &\geq \left[ 1-\Pr(S_2\vert \overline{S_1}) \right]\left(1-\frac{2}{n}\right)\newline &= \left[ 1-\frac{k}{\text{# of remaining edges}} \right]\left(1-\frac{2}{n}\right) \end{aligned} $$

Note: all nodes in contracted graph define cuts in $G$ (with at least $k$ crossing edges)

$\Rightarrow$ all degrees in contracted graph$\geq k$

$\Rightarrow$ # of remaining edges $\geq \frac{1}{2}k(n-1)$

$\Rightarrow \Pr(\overline{S_2}\vert \overline{S_1})\geq 1-\frac{2}{n-1}$

In general, $$ \begin{aligned} \Pr\left(\overline{S_1}\cap \overline{S_2}\cap\cdots\cap \overline{S_{n-2}}\right)&= \Pr(\overline{S_1})\Pr(\overline{S_2}|\overline{S_1})\Pr(\overline{S_3}|\overline{S_1}\cap\overline{S_2}) \cdots\Pr(\overline{S_{n-2}}|\overline{S_1}\cap\overline{S_2}\cap\cdots\cap\overline{S_{n-3}})\newline &\geq \left(1-\frac{2}{n}\right) \left(1-\frac{2}{n-1}\right) \cdots \left(1-\frac{2}{4}\right) \left(1-\frac{2}{3}\right)\newline &=\frac{n-2}{n} \frac{n-3}{n-1} \frac{n-4}{n-2} \cdots \frac{2}{4} \frac{1}{3}\newline &=\frac{2}{n(n-1)}=\frac{1}{\binom{n}{2}}\geq \frac{1}{n^2} \end{aligned} $$


Run the basic algorithm a large number $N$ of times, and remember the smallest cut found.

Let $T_i$ = event that min cut $(A,B)$ is found on the $i$th trial.

By construction, the $T_i$ are i.i.d., with $\Pr(T_i)\geq\frac{1}{n^2}$. $$ \begin{aligned} \Pr(\text{all trials fail}) &=\Pr(\overline{T_1}\cap\overline{T_2}\cap\cdots\overline{T_N})\newline &=\prod_{i=1}^N\Pr(\overline{T_i})\leq\left(1-\frac{1}{n^2}\right)^N \end{aligned} $$

$\forall x\in\mathbb{R},\ 1+x\leq e^x$

$N=n^2$: $$ \Pr(\text{all trials fail})\leq e^{-\frac{1}{n^2}n^2}=\frac{1}{e} $$

$N=n^2\log n$: $$ \Pr(\text{all trials fail})\leq e^{-\frac{1}{n^2}n^2\log n}=\frac{1}{n} $$

Running time: $\Omega(n^2m)$ [$n^2$ trials, each taking $O(m)$ time]

Better than the brute-force method, which examines all $2^{n-1}-1$ cuts, i.e. $\Omega(2^n)$ time.

Counting Minimum Cuts

Note: a graph can have multiple min cuts.

Claim. The largest # of min cuts that a graph with $n$ vertices can have is $\binom{n}{2}$ (achieved by the $n$-cycle).

Let $\{(A_j,B_j)\}_{j=1}^J$ be min cuts of a graph with $n$ vertices.

Event $E_j=$ output is $(A_j,B_j)$. By the contraction algorithm, $$ \Pr(\text{output}=(A_j,B_j))\geq\frac{1}{\binom{n}{2}} $$ Since the $E_j$ are disjoint, $\sum_{j=1}^J\Pr(E_j)\leq 1 \Rightarrow J\leq\binom{n}{2}$.

Python Code

import random
def findMinCut(g):
    """
    input: graph g, 
           adjacency list representation using dictionary
           keys: nodes
           values: list of adjacent nodes
    """
    if len(g) < 3:
        # 2 nodes left: cut size = number of remaining parallel edges
        return len(g[list(g.keys())[0]])
    v1, v2 = chooseEdge(g)
    contract(g, v1, v2)
    return findMinCut(g)

# note: picking a random vertex, then a random neighbor, is NOT uniform
# over edges (edges at low-degree vertices are favored) — a deviation
# from the analysis above
def chooseEdge(g):
    """
    choose random edge (v1, v2)
    """
    v1 = random.choice(list(g.keys()))
    v2 = random.choice(g[v1])
    return v1, v2

def contract(g, v1, v2):
    """
    merge node v2 with v1
    """
    g[v1].extend(g[v2])
    for k in g[v2]:
        g[k] = [v1 if x == v2 else x for x in g[k]]
    del g[v2]
    g[v1] = [x for x in g[v1] if x != v1]
    

from copy import deepcopy

# example graph (my own toy input): a 4-cycle, whose min cut = 2
g = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}

# the analysis calls for ~n^2 log n trials; 10 is just a demo
mincuts = []
for i in range(10):
    mincuts.append(findMinCut(deepcopy(g)))
print("min cuts = ", min(mincuts))

BFS
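The notes leave this section empty. A minimal BFS sketch on the same dict-of-lists representation (my own code, not the course's):

from collections import deque

def bfs(g, s):
    # return the set of nodes reachable from s, exploring layer by layer
    explored = {s}
    queue = deque([s])
    while queue:
        v = queue.popleft()          # FIFO order => breadth-first
        for w in g[v]:
            if w not in explored:
                explored.add(w)
                queue.append(w)
    return explored

print(bfs({1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}, 1))  # {1, 2, 3, 4}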

DFS

Hashing

Bloom Filters