Monday, July 12, 2010

Algorithms - Sorting

Adapted from the Wikipedia article "Sorting algorithm".

Bubble sort

Bubble sort is a straightforward and simplistic method of sorting data that is used in computer science education. The algorithm starts at the beginning of the data set. It compares the first two elements, and if the first is greater than the second, then it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. This algorithm is highly inefficient, and is rarely used, except as a simplistic example.
Bubble sort's average-case and worst-case running times are both O(n²).
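
All of the routines below return a ResultCode, which the post never defines; a minimal assumed definition (not from the original) is:

// Assumed: the post uses ResultCode, SORT_OK, and SORT_FAILED without
// showing their definitions.
typedef enum { SORT_FAILED = 0, SORT_OK = 1 } ResultCode;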

ResultCode BubbleSort(int a[], int nSize)
{
    ResultCode res = SORT_FAILED;
    int i, j = 0;
    bool bSwap = true;
    int tmp;
    if (a)
    {
        while (bSwap)
        {
            bSwap = false;
            // each pass bubbles the largest remaining element into place,
            // so the last j positions are already sorted and can be skipped
            j++;
            for (i = 0; i < (nSize - j); i++)
            {
                if (a[i] > a[i + 1])
                {
                    tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    bSwap = true;
                }
            }
        }
        res = SORT_OK;
    }
    return res;
}

Selection sort

Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n²) complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it also has performance advantages over more complicated algorithms in certain situations.


ResultCode SelectionSort(int numbers[], int array_size)
{
    int i, j;
    int min, temp;
    ResultCode res = SORT_FAILED;
    if (numbers)
    {
        // for each number, find the min from the list of i to the end
        // and then swap the min with i;
        for (i = 0; i < array_size-1; i++)
        {
            // set min to i
            min = i;
            // check each one from i + 1 to the end of the array
            // to find the minimum number from the list of numbers
            // from i+1 to the end
            for (j = i+1; j < array_size; j++)
            {
                // if any number is less than the min, set min to the new index
                if (numbers[j] < numbers[min])
                {
                    min = j;
                }
            }
            // swap the minimum with i
            temp = numbers[i];
            numbers[i] = numbers[min];
            numbers[min] = temp;
        }
        res = SORT_OK;
    }
    return res;
}

Insertion sort

Insertion sort is a simple sorting algorithm that is relatively efficient for small lists and mostly-sorted lists, and often is used as part of more sophisticated algorithms. It works by taking elements from the list one by one and inserting them in their correct position into a new sorted list. In arrays, the new list and the remaining elements can share the array's space, but insertion is expensive, requiring shifting all following elements over by one. Shell sort (see below) is a variant of insertion sort that is more efficient for larger lists.

ResultCode InsertionSort(int arr[], int length)
{
    int i, j, tmp;
    ResultCode res = SORT_FAILED;
    if (arr)
    {
        for (i = 1; i < length; i++)
        {
            // shift larger elements one slot to the right,
            // then drop the saved value into the gap
            tmp = arr[i];
            for (j = i; j > 0 && arr[j - 1] > tmp; j--)
            {
                arr[j] = arr[j - 1];
            }
            arr[j] = tmp;
        }
        res = SORT_OK;
    }
    return res;
}


Shell sort

Shell sort was invented by Donald Shell in 1959. It improves upon bubble sort and insertion sort by moving out-of-order elements more than one position at a time. One implementation can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort.

ResultCode ShellSort(int num[], int length)
{
    int i, temp, flag = 1;
    int d = length;
    ResultCode res = SORT_FAILED;
    if (num)
    {
        while (flag || (d > 1))
        {
            flag = 0;
            d = (d + 1) / 2; // halve the gap (rounding up) each pass
            // for i from 0 to length - d - 1, compare num[i] and num[i + d]
            for (i = 0; i < (length - d); i++)
            {
                // out of order across the gap (ascending sort): swap
                if (num[i + d] < num[i])
                {
                    temp = num[i + d];
                    num[i + d] = num[i];
                    num[i] = temp;
                    // record that a swap occurred
                    flag = 1;
                }
            }
        }
        res = SORT_OK;
    }
    return res;
}


Binary tree sort

Binary tree sort builds a binary search tree from the elements to be sorted and then traverses the tree in order, visiting the values in sorted sequence.

typedef struct tree_el {
   int val;
   struct tree_el * right, * left;
} node;

void BinaryTreeSort(node **tree, node *item)
{
   // empty subtree: hang the item here
   if (!(*tree)) {
      *tree = item;
      return;
   }
   // ties go left, so equal keys keep their insertion order
   if (item->val <= (*tree)->val)
      BinaryTreeSort(&(*tree)->left, item);
   else
      BinaryTreeSort(&(*tree)->right, item);
}

void printout(node * tree) {
   if (!tree) return; // also handles an empty tree
   if (tree->left) printout(tree->left);
   printf("%d\n", tree->val);
   if (tree->right) printout(tree->right);
}
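
The post gives no driver for this one; a hypothetical usage sketch (main, the sample values, and the malloc-per-node allocation are mine, not from the original) could look like:

#include <stdio.h>
#include <stdlib.h>

// Hypothetical driver: build the tree from an array, then print the
// values in sorted order via the in-order traversal above.
int main(void)
{
    int values[] = { 5, 1, 4, 2, 3 };
    int n = sizeof(values) / sizeof(values[0]);
    node *root = NULL;
    int i;
    for (i = 0; i < n; i++)
    {
        node *item = (node *)malloc(sizeof(node));
        item->val = values[i];
        item->left = item->right = NULL;
        BinaryTreeSort(&root, item);
    }
    printout(root); // prints 1 through 5, one per line
    return 0;
}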


Merge sort

Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on; until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical implementations, being used for the standard sort routine in the programming languages Perl[5], Python (as timsort[6]), and Java (also uses timsort as of JDK7[7]), among others. Merge sort has been used in Java at least since 2000 in JDK1.3.[8][9]


void Merge(int a[], int low, int high, int mid)
{
    int i, j, k, c[high - low + 1]; // inclusive bounds: high - low + 1 slots
    i = low;
    j = mid + 1;
    k = 0;

    while((i <= mid) && (j <= high))
    {
        if (a[i] <= a[j]) // <= keeps the merge stable
        {
            c[k] = a[i];
            k++;
            i++;
        }
        else
        {
            c[k] = a[j];
            k++;
            j++;
        }
    }
    while (i <= mid)
    {
        c[k] = a[i];
        k++;
        i++;
    }
    while(j <= high)
    {
        c[k] = a[j];
        k++;
        j++;
    }
    for(i = low; i < (high + 1); i++)
    {
        a[i] = c[i - low];
    }
}


ResultCode MergeSort(int a[], int low, int high)
{
    ResultCode res = SORT_FAILED;
    int mid;

    if (a)
    {
        if (low < high)
        {
            mid = low + (high - low) / 2; // avoids overflow of low + high
            MergeSort (a, low, mid);
            MergeSort(a, mid + 1, high);
            Merge(a, low, high, mid);
        }
        res = SORT_OK;
    }
    return res;
}
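
Note the calling convention: unlike the size-based routines above, MergeSort takes inclusive low and high indices, so sorting a whole array looks like this (array contents here are just an illustration):

int a[] = { 9, 4, 7, 1 };
MergeSort(a, 0, 3); // low = 0, high = last index, both inclusive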

Heapsort

Heapsort is a much more efficient version of selection sort. It also works by determining the largest (or smallest) element of the list, placing that at the end (or beginning) of the list, then continuing with the rest of the list, but it accomplishes this task efficiently by using a data structure called a heap, a special type of binary tree. Once the data list has been made into a heap, the root node is guaranteed to be the largest (or smallest) element. When it is removed and placed at the end of the list, the heap is rearranged so the largest element remaining moves to the root. Using the heap, finding the next largest element takes O(log n) time, instead of O(n) for a linear scan as in simple selection sort. This allows heapsort to run in O(n log n) time, which is also its worst-case complexity.

// node index starts at 0; parentheses guard against expression arguments
#define left(i) (2*(i) + 1)
#define right(i) (2*(i) + 2)

// In computer science, a heap is a specialized tree-based data structure
// that satisfies the heap property: if B is a child node of A, then
// key(A) >= key(B). This implies that an element with the greatest key is
// always in the root node, and so such a heap is sometimes called a max-heap.
void max_heapify(int a[],int i,int size)
{
    int l,r,largest,temp;
    l=left(i);
    r=right(i);
    if(l<=size-1 && a[l]>a[i])
        largest = l;
    else largest = i;
    if(r<=size-1 && a[r]>a[largest])
        largest = r;
    if(largest != i)
    {
        temp = a[i];
        a[i]=a[largest];
        a[largest]=temp;
        // since we change the value of the "largest" (index) node,
        // we have to make sure there is still a max heap below that.
        max_heapify(a,largest,size);
    }
}
void build_max_heap(int a[],int size)
{
    int i;
    // starting from rightmost parent going upward.
    for(i=size/2 - 1 ;i>=0;i--)
        max_heapify(a,i,size);
}
ResultCode HeapSort(int a[],int size)
{
    int i,temp;
    ResultCode res = SORT_FAILED;
    if (a)
    {

        build_max_heap(a,size);
        for(i=size-1;i>0;i--)
        {
            // swap the root (largest value) and the end
            temp=a[i];
            a[i]=a[0];
            a[0]=temp;
            size=size - 1;
            max_heapify(a,0,size);
        }
        res = SORT_OK;
    }
    return res;
}

Quicksort

Quicksort is a divide and conquer algorithm which relies on a partition operation: to partition an array, we choose an element, called a pivot, move all smaller elements before the pivot, and move all greater elements after it. This can be done efficiently in linear time and in-place. We then recursively sort the lesser and greater sublists. Efficient implementations of quicksort (with in-place partitioning) are typically unstable sorts and somewhat complex, but are among the fastest sorting algorithms in practice. Together with its modest O(log n) space usage, this makes quicksort one of the most popular sorting algorithms, available in many standard libraries. The most complex issue in quicksort is choosing a good pivot element; consistently poor choices of pivots can result in drastically slower O(n²) performance, but if at each step we choose the median as the pivot then it works in O(n log n). Finding the median, however, is an O(n) operation on unsorted lists, and therefore exacts its own penalty.

ResultCode QuickSort(int arr[], int left, int right)
{

    int i = left, j = right;
    int tmp;
    ResultCode res = SORT_FAILED;
    if (arr)
    {
        int pivot = arr[(left + right) / 2];
        /* partition */
        while (i <= j)
        {
            while (arr[i] < pivot)
            {
                i++;
            }
            while (arr[j] > pivot)
            {
                j--;
            }
            if (i <= j)
            {
                tmp = arr[i];
                arr[i] = arr[j];
                arr[j] = tmp;
                i++;
                j--;
            }
        }
        /* recursion */
        if (left < j)
        {
            QuickSort(arr, left, j);
        }
        if (i < right)
        {
            QuickSort(arr, i, right);
        }
        res = SORT_OK;
    }
    return res;
}
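
The paragraph above points out that pivot choice matters, while the code simply takes the middle element. A common compromise is median-of-three; this helper is a sketch of mine, not part of the original:

// Hypothetical helper: the median of the first, middle, and last elements
// is a cheap guard against already-sorted or reverse-sorted input.
static int MedianOfThree(int arr[], int left, int right)
{
    int mid = left + (right - left) / 2;
    int a = arr[left], b = arr[mid], c = arr[right];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

// In QuickSort, the line
//     int pivot = arr[(left + right) / 2];
// would then become
//     int pivot = MedianOfThree(arr, left, right);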

Counting sort

Counting sort is applicable when each input is known to belong to a particular set, S, of possibilities. The algorithm runs in O(|S| + n) time and O(|S|) memory where n is the length of the input. It works by creating an integer array of size |S| and using the ith bin to count the occurrences of the ith member of S in the input. Each input is then counted by incrementing the value of its corresponding bin. Afterward, the counting array is looped through to arrange all of the inputs in order. This sorting algorithm cannot often be used because S needs to be reasonably small for it to be efficient, but the algorithm is extremely fast and demonstrates great asymptotic behavior as n increases. It also can be modified to provide stable behavior.

ResultCode CountingSort (int array[], int size)
{
    int i, min, max;
    ResultCode res = SORT_FAILED;
    if (array && size > 0)
    {
        min = max = array[0];
        for(i = 1; i < size; i++)
        {
            if (array[i] < min)
                min = array[i];
            else if (array[i] > max)
                max = array[i];
        }
        // assumes the range is not much larger than size; otherwise the
        // count array costs more than the sort saves.
        int range = max - min + 1;
        int *count = (int *)malloc(range * sizeof(int));

        for(i = 0; i < range; i++)
        {
            count[i] = 0;
        }
        // index i in count corresponds to the value i + min;
        // each entry counts the frequency of that value.
        for(i = 0; i < size; i++)
        {
            count[ array[i] - min ]++;
        }
        int j, z = 0;
        for(i = min; i <= max; i++)
        {
            for(j = 0; j < count[ i - min ]; j++)
            {
                array[z++] = i;
            }
        }
        free(count);
        res = SORT_OK;
    }
    return res;
}
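
The last sentence of the description mentions a stable variant. One standard way to get it (sketched here under the same min/max setup; this function is not in the original post) is to turn the counts into prefix sums and fill an output array from right to left:

#include <stdlib.h>

// Sketch of the stable variant: prefix sums turn count[v] into the position
// just past the last occurrence of v, and a right-to-left pass preserves
// the relative order of equal keys.
ResultCode StableCountingSort(int array[], int size)
{
    int i, min, max;
    if (!array || size <= 0) return SORT_FAILED;
    min = max = array[0];
    for (i = 1; i < size; i++)
    {
        if (array[i] < min) min = array[i];
        else if (array[i] > max) max = array[i];
    }
    int range = max - min + 1;
    int *count = (int *)calloc(range, sizeof(int));
    int *out = (int *)malloc(size * sizeof(int));
    if (!count || !out) { free(count); free(out); return SORT_FAILED; }
    for (i = 0; i < size; i++)
        count[array[i] - min]++;
    for (i = 1; i < range; i++)
        count[i] += count[i - 1]; // prefix sums
    for (i = size - 1; i >= 0; i--)
        out[--count[array[i] - min]] = array[i];
    for (i = 0; i < size; i++)
        array[i] = out[i];
    free(count);
    free(out);
    return SORT_OK;
}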

Bucket sort

Bucket sort is a divide and conquer sorting algorithm that generalizes counting sort by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying the bucket sorting algorithm. It is therefore most effective on data whose values are limited (e.g., sorting a million integers ranging from 1 to 1000). A variation of this method, called single buffered count sort, is reported to be faster than quicksort and to take about the same time to run on any set of data.


// Here a is the array to be sorted and n is its length. Bucket sort
// assumes the input is distributed roughly uniformly over a known range;
// in this code the values are assumed to lie in [0, 999], so a[i]/100
// selects one of ten buckets.
ResultCode BucketSort(int *a, int n)
{
    // note: this routine is C++ (it needs #include <vector>)
    std::vector<int> bucket[10];
    int i, j;
    int idx = 0;

    ResultCode res = SORT_FAILED;
    if (a)
    {
        for (i = 0; i < n; i++)
        {
            bucket[(a[i]/100)].push_back(a[i]);
        }
        for (i = 0; i < 10; i++)
        {
            if (bucket[i].empty())
            {
                continue; // nothing to sort; also avoids a zero-length array
            }
            // copy the vector data into a plain array, because our
            // InsertionSort accepts an array (a variable-length array
            // like this is a compiler extension in C++).
            int arr[bucket[i].size()];
            for (j = 0; j < bucket[i].size(); j++)
            {
                arr[j] = bucket[i][j];
            }

            InsertionSort(arr, bucket[i].size());
            // put the sorted bucket to the original array.
            for (j = 0; j < bucket[i].size(); j++)
            {
                a[idx++] = arr[j];
            }
        }
        res = SORT_OK;
    }
    return res;
}

Radix sort

Radix sort is an algorithm that sorts a list of fixed-size numbers of length k in O(n · k) time by treating them as bit strings. We first sort the list by the least significant bit while preserving their relative order using a stable sort. Then we sort them by the next bit, and so on from right to left, and the list will end up sorted. Most often, the counting sort algorithm is used to accomplish the bitwise sorting, since the number of values a bit can have is minimal - only '1' or '0'.

ResultCode RadixSort(int *a,int n)
{
    int i;
    int b[n], m=0, exp=1;
    ResultCode res = SORT_FAILED;

    if (a)
    {
        // Find the maximum number (to know when to stop shifting digits)
        for (i = 0; i < n; i++)
        {
            if (a[i] > m)
            {
                m = a[i];
            }
        }

        // process digits from least significant to most significant
        while ((m/exp) > 0)
        {
            int bucket[10]={0};

            // put each number in the bucket according to the value in the 
            // inspected digit
            for(i = 0; i < n; i++)
            {
                bucket[(a[i]/exp) % 10]++;
            }

            // accumulate the values to the buckets
            for(i = 1;i < 10; i++)
            {
                bucket[i] += bucket[i-1];
            }

            // walk the input backwards, using the running counts as
            // positions: decrementing first turns each count into the index
            // of that value's last unfilled slot, which keeps the sort stable
            for(i = n-1;i >= 0; i--)
            {
                b[--bucket[a[i]/exp%10]] = a[i];
            }
            for(i = 0;i < n;i++)
            {
                a[i] = b[i];
            }
            exp*=10;
        }
        res = SORT_OK;

    }

    return res;
}
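
None of the snippets in this post came with a test harness; a minimal hypothetical one (the array contents are chosen arbitrarily) might be:

#include <stdio.h>

int main(void)
{
    int a[] = { 170, 45, 75, 90, 802, 24, 2, 66 };
    int b[] = { 5, 2, 8, 1, 9, 3 };
    int n = sizeof(a) / sizeof(a[0]);
    int m = sizeof(b) / sizeof(b[0]);
    int i;

    // exercise two of the routines; the others follow the same pattern
    if (RadixSort(a, n) == SORT_OK && QuickSort(b, 0, m - 1) == SORT_OK)
    {
        for (i = 0; i < n; i++) printf("%d ", a[i]);
        printf("\n");
        for (i = 0; i < m; i++) printf("%d ", b[i]);
        printf("\n");
    }
    return 0;
}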


Name             | Average | Worst   | Memory | Stable  | Method       | Notes
Bubble sort      | n²      | n²      | 1      | Yes     | Exchanging   | Tiny code
Selection sort   | n²      | n²      | 1      | Depends | Selection    | Its stability depends on the implementation. Used to sort this table in Safari or other WebKit web browsers
Insertion sort   | n²      | n²      | 1      | Yes     | Insertion    | Average case is also O(n + d), where d is the number of inversions
Shell sort       | -       | n·log²n | 1      | No      | Insertion    |
Binary tree sort | n·log n | n·log n | n      | Yes     | Insertion    |
Merge sort       | n·log n | n·log n | n      | Yes     | Merging      | Used to sort this table in Firefox
Heapsort         | n·log n | n·log n | 1      | No      | Selection    |
Quicksort        | n·log n | n²      | log n  | Depends | Partitioning | Can be implemented as a stable sort depending on how the pivot is handled. Naïve variants use O(n) space

The following table describes sorting algorithms that are not comparison sorts. As such, they are not limited by the Ω(n log n) lower bound. Complexities below are in terms of n, the number of items to be sorted, k, the size of each key, and s, the chunk size used by the implementation. Many of them are based on the assumption that the key size is large enough that all entries have unique key values, and hence that n ≪ 2^k, where ≪ means "much less than."


Name           | Average | Worst | Memory  | Stable | n ≪ 2^k | Notes
Bucket sort    | n·k     | n²·k  | n·k     | Yes    | No      | Assumes uniform distribution of elements from the domain in the array
Counting sort  | n + k   | n + k | n + 2^k | Yes    | Yes     | If k = O(n) then the running time is O(n)
LSD radix sort | n·k/s   | n·k/s | n       | Yes    | No      |
