Daniel's Blog: Algorithms and Data Structures

Topics:

Binary search
Quicksort, Heapsort, Introsort, Mergesort
Random number generators
Array, Linked List, Stacks and queues, Hashtable
Binary search tree, Red-black tree, AVL tree, B-tree, B+ tree
See also Algorithms and Data Structures - Part 2

Approaches and strategies:
The divide-and-conquer algorithm design paradigm divides the problem into a number of subproblems. Then it conquers the subproblems by solving them recursively. Finally, it combines the solutions to the subproblems into a solution for the original problem.
This algorithm design method is e.g. used by Quicksort and Mergesort.

Recursion in computer engineering is a method where the solution to a problem depends on solutions to smaller instances of the same (self-similar) problem.
A classic example of recursion is the definition of the factorial function, given here in Java:

public static int factorial(int n) {
    if (n <= 1) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

Search algorithms:
Binary search finds the position of a specified value within a sorted array.
Worst case performance: O(log n)

Java source code:

public static int search(final char character,
  final char[] alphabet) {
  int leftIndex = 0;
  int rightIndex = alphabet.length - 1;        
 
  while (leftIndex <= rightIndex) {
    final int middleIndex = leftIndex + 
      ((rightIndex - leftIndex) / 2);
    if (alphabet[middleIndex] < character) {
      leftIndex = middleIndex + 1;
    } else if (alphabet[middleIndex] > character) {
      rightIndex = middleIndex - 1;
    } else {
      return middleIndex;
    }
  }
 
  return -1;
}

Sorting algorithms:
Quicksort is a divide-and-conquer sorting algorithm developed by Tony Hoare that, on average, makes O(n log n) comparisons to sort n items. In the worst case, it makes O(n^2) comparisons, though this behavior is rare. Quicksort is often faster in practice than other O(n log n) algorithms. Quicksort can be implemented as an in-place sort, requiring only O(log n) additional space. The recursion call stack has height n in the worst case and height log n in the best case.

Quicksort selects a random pivot element from the n items to sort. (Here the pivot element is always the rightmost element of the pile.) Then Quicksort separates the n - 1 other elements into two piles: a low pile containing all elements that appear before the pivot element and a high pile that contains all elements that appear after the pivot element. By doing that recursively for smaller and smaller piles the whole pile is being sorted.

Java source code:

public static void quicksort(char[] string,
    int leftIndex, int rightIndex) {
    if (leftIndex < rightIndex) {
        int pivotIndex = partition(string,
            leftIndex, rightIndex);
        quicksort(string, leftIndex, pivotIndex-1);
        quicksort(string, pivotIndex+1, rightIndex);
    }
}
 
static int partition(char[] string, int leftIndex,
    int rightIndex) {
    int pivotIndex = rightIndex;
    // divider index for the pivot element
    int storageIndex = leftIndex;
  
    for (int i = leftIndex; i < rightIndex; i++) {
        if (string[i] < string[pivotIndex]) {
            swap(string, i, storageIndex);
            storageIndex++;
        }
    }
    swap(string, pivotIndex, storageIndex);
  
    return storageIndex;
}

Heapsort is a comparison-based sorting algorithm. Although somewhat slower in practice on most machines than a well implemented Quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort. The Java source code for Heapsort is available here.

Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with Quicksort and switches to Heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted. It is the best of both worlds, with a worst-case O(n log n) runtime and a practical performance comparable to Quicksort on typical data sets. Introsort is used by the GCC C++ STL and the SGI C++ STL.

Mergesort is an O(n log n) divide-and-conquer sorting algorithm. Most implementations produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. Mergesort was invented by John von Neumann in 1945.
Mergesort does not sort in place but it has the advantage that it may be distributed across multiple machines to sort parts of really huge data set. Therefore mergesort is used a lot by web search engines. Mergesort is a typical recursive algorithms that reduces large problems into smaller ones. The recursion call stack always has height log n.

Java source code:

public static void mergesort(char[] string,
  int leftIndex, int rightIndex) {
  if (leftIndex < rightIndex) {
    int middleIndex = (leftIndex + rightIndex) / 2;
    mergesort(string, leftIndex, middleIndex);
    mergesort(string, middleIndex + 1, rightIndex);
    merge(string, leftIndex, middleIndex,
      rightIndex);
  }
}

static void merge(char[] string, int leftIndex,
  int middleIndex, int rightIndex) {
  Queue<Character> string1 = 
    new LinkedList<Character>();
  Queue<Character> string2 = 
    new LinkedList<Character>();
 
  for (int i=leftIndex; i<=middleIndex; i++) {
    string1.add(string[i]);
  }  
  for (int i=middleIndex+1; i<=rightIndex; i++) {
    string2.add(string[i]);
  }
 
  int i = leftIndex;
  while (!string1.isEmpty() && !string2.isEmpty()) {
    if (string1.peek() <= string2.peek()) {
      string[i++] = string1.poll();
    } else {
      string[i++] = string2.poll();
    }
  }
 
  while (!string1.isEmpty()) {
    string[i++] = string1.poll();
  }
  while (!string2.isEmpty()) {
    string[i++] = string2.poll();
  }
}

Random numbers
Random number generators
x(t+1) = R * x(t) * (1 - x(t))

With the initial values of R = 4.0 and x(0) = 0.2 you will get the chaotic trajectory as shown in the map above. The values for x(t+1) are chaotic and therefore like random numbers.
Dynamical systems theory characterizes the behavior of the above equation as chaotic attractor.
For more details see chapter 2 of "Complexity: A Guided Tour" by Melanie Mitchell.

Data Structures
Array
Indexing performance: O(1)
Adding or deleting values: O(1)
Search performance (unsorted array): O(n)

Linked list
Indexing performance: O(n)
Adding or deleting items: O(1)
Search performance: O(n)

Java source code (singly-linked list):

public class LinkedList<T> {
    public void put(T item) {    
        Node node = new Node(item);
        Node curNode = mHeadNode;
        if (curNode == null) {
            node.nextNode = curNode;
            mHeadNode = node;             
        } else {
            Node prevNode = null;
            while (curNode != null) {
                prevNode = curNode;
                curNode = curNode.nextNode;
            }
            node.nextNode = prevNode.nextNode;
            prevNode.nextNode = node;            
        }     
    }

    public T take() {
        Node node = getHeadNode();
        if (node != null) {
            T item = node.item;                    
            return item;
        } else {
            return null;
        }  
    }
    
    class Node
    {
        public T item;
        public Node nextNode;
        public Node(T t) { item = t; }
    };

    Node getHeadNode() {
        Node node = mHeadNode;
        if (node != null) {
            mHeadNode = node.nextNode;
            return node;
        }
        return null;
    }

    private Node mHeadNode;
}

Stacks (LIFO) and queues (FIFO)
Stacks support retrieval of data items by last-in, first-out (LIFO) order.
Queues support retrieval of data items in first-in, first out (FIFO) order.
Both stack and queue implementations are normally based on arrays or linked lists.

Hashtable / Dictionary
In computer science, a hash table or dictionary is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought. For further details also see hashing with chaining, hashing with open addressing and hash functions.
The best case performance for adding or deleting items into or from hash tables is O(1). The search (indexing) performance is also O(1). You get the best case performance e.g. if the hash table is implemented using an array and if there are no collisions when inserting items.
If there are collisions the performance depends on the data structure that is used for managing the data items of a hash table. E.g. if a self-balancing tree is used to manage them you will get a worst-case performance of O(log n) for adding, deleting and searching items.
For more details about which underlying data structure should be used to implement hash tables see Skiena's "The Algorithm Design Manual" page 368.

Java source code (simple Hashtable with open addressing):

public class Hashtable {
    private static final int SIZE = 16;
    private Object mKeys[] = new Object[SIZE];
    private Object mObjects[] = new Object[SIZE];
 
    public void add(Object key, Object object)
        throws Exception {
        int i = 0;
        int index;
        do {
            index = hashCode(key, i);
            if (mKeys[index] == null) {
                mKeys[index] = key;
                mObjects[index] = object;
                return;
            } else {
                i++;
            }            
        } while (i < SIZE); 
        throw new Exception("hash table overflow");
    }
 
    public Object get(Object key) {
        int i = 0;
        int index;
        do {
            index = hashCode(key, i);
            if (mKeys[index].equals(key)) {
                return mObjects[index];
            }
            i++;
        } while (i < SIZE); 
        return null;
    }
 
    int hashCode(Object key, int i) {
        int k = Math.abs(key.hashCode());
        int hashCode1 = k % 701;
        int hashCode2 = 1 + (k % 700);
        int hastCode =
            (hashCode1 + i * hashCode2) % 701;
        return hastCode % SIZE;
    }
}

Binary search tree
A binary search tree is a node-based binary tree data structure which has the following properties:

The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.

Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.
The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient.
Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

Adding or deleting items: O(h), where h is the height of the tree.
Search performance: O(h), where h is the height of the tree.

Java source code (see also "Introduction to Algorithms", chapter 12):

public class BinarySearchTree {
    class Node {
        public int mKey;
        public Object mObject;
        public Node mParentNode;
        public Node mLeftNode;
        public Node mRightNode;
    }
 
    private Node mRootNode;
 
    // Recursive add method.
    public void add(Node node, Node parentNode,
        int key, Object object) {
        if (node == null) {
            Node newNode = new Node();
            newNode.mKey = key;
            newNode.mObject = object;
            newNode.mParentNode = parentNode;
            if (parentNode != null) {
                if (key < parentNode.mKey) {
                    parentNode.mLeftNode = newNode;
                } else {
                    parentNode.mRightNode = newNode;
                }
            } else {
                mRootNode = newNode;
            }
            return;
        }
  
        if (key < node.mKey) {
            add(node.mLeftNode, node, key, object);
        } else {
            add(node.mRightNode, node, key, object);
        }
    }
 
    public void add(int key, Object object) {
        add(mRootNode, null, key, object);
    }
 
    // Iterative add method.
    public void add2(Node node, int key,
        Object object) {
        Node prevNode = null;  
        while (node != null) {
            prevNode = node;
            if (key < node.mKey) {
                node = node.mLeftNode;
            } else {
                node = node.mRightNode;
            }
        }
        Node newNode = new Node();
        newNode.mKey = key;
        newNode.mObject = object;
        newNode.mParentNode = prevNode;
        if (prevNode == null) {
            mRootNode = newNode;
        } else {
            if (key < prevNode.mKey) {
                prevNode.mLeftNode = newNode;
            } else {
                prevNode.mRightNode = newNode;
            }
        }
    }
 
    public void add2(int key, Object object) {
        add2(mRootNode, key, object);
    } 
 
    // Recursive search method.
    public Object search(Node node, int key) {
        if(node == null) {
            return null;
        }
        if (node.mKey == key) {
            return node.mObject;
        }
  
        if (key < node.mKey) {
            return search(node.mLeftNode, key);
        } else {
            return search(node.mRightNode, key);
        }
    }
 
    public Object search(int key) {
        return search(mRootNode, key);
    }
 
    // Iterative search method.
    public Object search2(Node node, int key) {
        if(node == null) {
            return null;
        }
  
        while (node != null && node.mKey != key) {
            if (key < node.mKey) {
                node = node.mLeftNode;
            } else {
                node = node.mRightNode;
            }
        }
        if (node != null) {
            return node.mObject;
        } else {
            return null;
        }
    }
 
    public Object search2(int key) {
        return search2(mRootNode, key);
    }
 
    // Inorder walk over the tree.
    String printBST(Node node) {
        String string = "";
        if (node != null) {
            string += printBST(node.mLeftNode);
            string += node.mObject + ", ";
            string += printBST(node.mRightNode);
        }
        return string;
    }
 
    public String toString() {
        return printBST(mRootNode);
    }
}

Red-black tree is a type of self-balancing binary search tree, a data structure that is typically used to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer and named "symmetric binary B-tree," but acquired its modern name in a paper in 1978 by Leonidas J. Guibas and Robert Sedgewick. It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the total number of elements in the tree. Put very simply, a red–black tree is a binary search tree that inserts and deletes in such a way that the tree is always reasonably balanced. For further details on red-black trees I suggest chapter 13 of the "Introduction to Algorithms" book.

AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
The AVL tree is named after its two Soviet inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."
AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. Because AVL trees are more rigidly balanced, they are faster than red-black trees for lookup intensive applications. However, red-black trees are faster for insertion and removal.

B-tree
A B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time [O(log n)]. The B-tree is a generalization of a binary search tree in that a node can have more than two children. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.
In B-trees, nodes can have a variable number of keys (elements) and children. The keys of a node are stored in non-decreasing order. Each node either is a leaf node or it has some associated children that are the root nodes of subtrees. The left child node of a node's element contains all nodes (elements) with keys less than or equal to the node element's key but greater than the preceding node element's key. When data is inserted to or removed from a node, its number of keys (elements) or child nodes changes. In order to maintain the pre-defined range, nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation.
Each node of a B-tree will contain a number of keys. Usually, the number of keys is chosen to vary between t-1 and 2t-1. In practice, the keys (elements) take up the most space in a node. The factor of 2 will guarantee that nodes can be split or combined. If a node has 2t-1 keys, then adding a key to that node can be accomplished by splitting the 2t-1 key node into two t-1 key nodes and adding the median (middle) key of the original node to the parent node. Each splitted node has the required minimum number of keys.
If a new element is added into a B-tree it will always be inserted into a leaf node while the median key of the original node is shifted up into the parent node when the original node is already full.
A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more node further away from the root.
B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes, because then the cost of accessing the node may be amortized over multiple operations within the node. This usually occurs when the nodes are in secondary storage such as disk drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and the size of a full disk block or an analogous size in secondary storage. Practical B-trees using secondary storage want a large number of child nodes to improve performance.

Java source code (including delete) (see also "Introduction to Algorithms", chapter 18):

public class BTree {
  class Node {
    public int mNumKeys = 0;
    public int[] mKeys = new int[2*T-1];
    public Object[] mObjects = new Object[2*T-1];
    public Node[] mChildNodes = new Node[2*T];
    public boolean mIsLeafNode;
  }

  private Node mRootNode;
  private static final int T = 4;

  public BTree() {
    mRootNode = new Node();
    mRootNode.mIsLeafNode = true;
  }

  public void add(int key, Object object) {
    Node rootNode = mRootNode;
    if (rootNode.mNumKeys == (2 * T - 1)) {
      Node newRootNode = new Node();
      mRootNode = newRootNode;
      newRootNode.mIsLeafNode = false;
      mRootNode.mChildNodes[0] = rootNode;
      // Split rootNode and move its median
      // key up into newRootNode.
      splitChildNode(newRootNode, 0, rootNode);
      // Insert the key into the B-Tree
      // with root newRootNode.
      insertIntoNonFullNode(newRootNode, key,
        object);
    } else {
      // Insert the key into the B-Tree
      // with root rootNode.
      insertIntoNonFullNode(rootNode, key, object);
    }
  }

  // Split the node, node, of a B-Tree into
  // two nodes that both contain T-1 elements
  // and move node's median key up
  // to the parentNode. This method will
  // only be called if node is full; node is the
  // i-th child of parentNode.
  void splitChildNode(Node parentNode, int i,
    Node node) {
    Node newNode = new Node();
    newNode.mIsLeafNode = node.mIsLeafNode;
    newNode.mNumKeys = T - 1;
    // Copy the last T-1 elements of node
    // into newNode.
    for (int j = 0; j < T - 1; j++) {
      newNode.mKeys[j] = node.mKeys[j + T];
      newNode.mObjects[j] = node.mObjects[j + T];
    }
    if (!newNode.mIsLeafNode) {
      // Copy the last T pointers of node
      // into newNode.
      for (int j = 0; j < T; j++) {
        newNode.mChildNodes[j] =
          node.mChildNodes[j + T];
      }
      for (int j = T; j <= node.mNumKeys; j++) {
        node.mChildNodes[j] = null;
      }
    }
    for (int j = T; j < node.mNumKeys; j++) {
      node.mKeys[j] = 0;
      node.mObjects[j] = null;
    }
    node.mNumKeys = T - 1;

    // Insert a (child) pointer to node newNode
    // into the parentNode, moving other keys
    // and pointers as necessary.
    for (int j = parentNode.mNumKeys; j >= i + 1;
      j--) {
      parentNode.mChildNodes[j + 1] =
        parentNode.mChildNodes[j];
    }
    parentNode.mChildNodes[i + 1] = newNode;
    for (int j = parentNode.mNumKeys - 1; j >= i;
        j--) {
      parentNode.mKeys[j + 1] =
        parentNode.mKeys[j];
      parentNode.mObjects[j + 1] =
        parentNode.mObjects[j];
    }
    parentNode.mKeys[i] = node.mKeys[T - 1];
    parentNode.mObjects[i] = node.mObjects[T - 1];
    node.mKeys[T - 1] = 0;
    node.mObjects[T - 1] = null;
    parentNode.mNumKeys++;
  }

  // Insert an element into a B-Tree. (The element
  // will ultimately be inserted into a leaf node).
  void insertIntoNonFullNode(Node node, int key,
    Object object) {
    int i = node.mNumKeys - 1;
    if (node.mIsLeafNode) {
      // Since node is not a full node insert the
      // new element into its proper place
      // within node.
      while (i >= 0 && key < node.mKeys[i]) {
        node.mKeys[i + 1] = node.mKeys[i];
        node.mObjects[i + 1] = node.mObjects[i];
        i--;
      }
      i++;
      node.mKeys[i] = key;
      node.mObjects[i] = object;
      node.mNumKeys++;
    } else {
      // Move back from the last key of node until
      // we find the child pointer to the node
      // that is the root node of the subtree
      // where the new element should be placed.
      while (i >= 0 && key < node.mKeys[i]) {
        i--;
      }
      i++;
      if (node.mChildNodes[i].mNumKeys ==
        (2 * T - 1)) {
        splitChildNode(node, i,
          node.mChildNodes[i]);
        if (key > node.mKeys[i]) {
          i++;
        }
      }
      insertIntoNonFullNode(node.mChildNodes[i],
        key, object);
    }
  }

  // Recursive search method.
  public Object search(Node node, int key) {
    int i = 0;
    while (i < node.mNumKeys && key >
      node.mKeys[i]) {
      i++;
    }
    if (i < node.mNumKeys && key ==
      node.mKeys[i]) {
      return node.mObjects[i];
    }
    if (node.mIsLeafNode) {
      return null;
    } else {
      return search(node.mChildNodes[i], key);
    }
  }

  public Object search(int key) {
    return search(mRootNode, key);
  }

  // Iterative search method.
  public Object search2(Node node, int key) {
    while (node != null) {
      int i = 0;
      while (i < node.mNumKeys && key >
        node.mKeys[i]) {
        i++;
      }
      if (i < node.mNumKeys && key ==
        node.mKeys[i]) {
        return node.mObjects[i];
      }
      if (node.mIsLeafNode) {
        return null;
      } else {
        node = node.mChildNodes[i];
      }
    }
    return null;
  }

  public Object search2(int key) {
    return search2(mRootNode, key);
  }
}

B+ tree or B plus tree is a type of balanced tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in each index segment (usually called a "block" or "node"). In a B+ tree, in contrast to a B-tree, all records are stored at the leaf level of the tree; only keys are stored in interior nodes.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context—in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have a very high fanout (typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.
The NTFS, XFS, and JFS filesystems all use this type of tree for metadata indexing. Relational database management systems such as IBM DB2, Microsoft SQL Server, MySQL and SQLite support this type of tree for table indices. Key-value database management systems such as CouchDB support this type of tree for data access.
The leaves of the B+ tree are often linked to one another in a linked list; this makes range queries or an (ordered) iteration through the blocks simpler and more efficient.

Java source code (for differences to the B-tree data structure take a look at the splitChildNode method):

public class BPlusTree {
  class Node {
    public int mNumKeys = 0;
    public int[] mKeys = new int[2*T-1];
    public Object[] mObjects = new Object[2*T-1];
    public Node[] mChildNodes = new Node[2*T];
    public boolean mIsLeafNode;
    public Node mNextNode;
  }

  private Node mRootNode;
  private static final int T = 4;

  public BPlusTree() {
    mRootNode = new Node();
    mRootNode.mIsLeafNode = true;
  }

  public void add(int key, Object object) {
    Node rootNode = mRootNode;
    if (rootNode.mNumKeys == (2 * T - 1)) {
      Node newRootNode = new Node();
      mRootNode = newRootNode;
      newRootNode.mIsLeafNode = false;
      mRootNode.mChildNodes[0] = rootNode;
      // Split rootNode and move its median
      // key up into newRootNode.
      splitChildNode(newRootNode, 0, rootNode);
      // Insert the key into the B+-Tree
      // with root newRootNode.
      insertIntoNonFullNode(newRootNode, key,
        object);
    } else {
      // Insert the key into the B+-Tree
      // with root rootNode.
      insertIntoNonFullNode(rootNode, key, object);
    }
  }

  // Split the node, node, of a B+-Tree into two
  // nodes that contain T-1 (and T) elements
  // and move node's median key up to the
  // parentNode.
  // This method will only be called if node is full;
  // node is the i-th child of parentNode.
  // All internal keys (elements) will have
  // duplicates within the leaf nodes.
  void splitChildNode(Node parentNode, int i,
    Node node) {
    Node newNode = new Node();
    newNode.mIsLeafNode = node.mIsLeafNode;
    newNode.mNumKeys = T;
    // Copy the last T elements of node
    // into newNode. Keep the median key
    // as duplicate in the first key of newNode.
    for (int j = 0; j < T; j++) {
      newNode.mKeys[j] = node.mKeys[j + T - 1];
      newNode.mObjects[j] = node.mObjects[j + T - 1];
    }
    if (!newNode.mIsLeafNode) {
      // Copy the last T + 1 pointers of node
      // into newNode.
      for (int j = 0; j < T + 1; j++) {
        newNode.mChildNodes[j] =
          node.mChildNodes[j + T - 1];
      }
      for (int j = T; j <= node.mNumKeys; j++) {
        node.mChildNodes[j] = null;
      }
    } else {
      // Manage the linked list that is used e.g.
      // for doing fast range queries.
      newNode.mNextNode = node.mNextNode;
      node.mNextNode = newNode;
    }
    for (int j = T - 1; j < node.mNumKeys; j++) {
      node.mKeys[j] = 0;
      node.mObjects[j] = null;
    }
    node.mNumKeys = T - 1;

    // Insert a (child) pointer to node newNode
    // into the parentNode, moving other keys
    // and pointers as necessary.
    for (int j = parentNode.mNumKeys; j >= i + 1;
      j--) {
      parentNode.mChildNodes[j + 1] =
        parentNode.mChildNodes[j];
    }
    parentNode.mChildNodes[i + 1] = newNode;
    for (int j = parentNode.mNumKeys - 1; j >= i;
        j--) {
      parentNode.mKeys[j + 1] =
        parentNode.mKeys[j];
      parentNode.mObjects[j + 1] =
        parentNode.mObjects[j];
    }
    parentNode.mKeys[i] = newNode.mKeys[0];
    parentNode.mObjects[i] = newNode.mObjects[0];
    parentNode.mNumKeys++;
  }

  // Insert an element into a B-Tree. (The element
  // will ultimately be inserted into a leaf node).
  void insertIntoNonFullNode(Node node, int key,
    Object object) {
    int i = node.mNumKeys - 1;
    if (node.mIsLeafNode) {
      // Since node is not a full node insert the
      // new element into its proper place
      // within node.
      while (i >= 0 && key < node.mKeys[i]) {
        node.mKeys[i + 1] = node.mKeys[i];
        node.mObjects[i + 1] = node.mObjects[i];
        i--;
      }
      i++;
      node.mKeys[i] = key;
      node.mObjects[i] = object;
      node.mNumKeys++;
    } else {
      // Move back from the last key of node until
      // we find the child pointer to the node
      // that is the root node of the subtree
      // where the new element should be placed.
      while (i >= 0 && key < node.mKeys[i]) {
        i--;
      }
      i++;
      if (node.mChildNodes[i].mNumKeys ==
        (2 * T - 1)) {
        splitChildNode(node, i,
          node.mChildNodes[i]);
        if (key > node.mKeys[i]) {
          i++;
        }
      }
      insertIntoNonFullNode(node.mChildNodes[i],
        key, object);
    }
  }

  // Recursive search method.
  public Object search(Node node, int key) {
    int i = 0;
    while (i < node.mNumKeys && key >
      node.mKeys[i]) {
      i++;
    }
    if (i < node.mNumKeys && key ==
      node.mKeys[i]) {
      return node.mObjects[i];
    }
    if (node.mIsLeafNode) {
      return null;
    } else {
      return search(node.mChildNodes[i], key);
    }
  }

  public Object search(int key) {
    return search(mRootNode, key);
  }

  // Iterative search method.
  public Object search2(Node node, int key) {
    while (node != null) {
      int i = 0;
      while (i < node.mNumKeys && key >
        node.mKeys[i]) {
        i++;
      }
      if (i < node.mNumKeys && key ==
        node.mKeys[i]) {
        return node.mObjects[i];
      }
      if (node.mIsLeafNode) {
        return null;
      } else {
        node = node.mChildNodes[i];
      }
    }
    return null;
  }

  public Object search2(int key) {
    return search2(mRootNode, key);
  }

  // Inorder walk over the tree.
  public String toString() {
    String string = "";
    Node node = mRootNode;  
    while (!node.mIsLeafNode) {   
      node = node.mChildNodes[0];
    }  
    while (node != null) {
      for (int i = 0; i < node.mNumKeys; i++) {
        string += node.mObjects[i] + ", ";
      }
      node = node.mNextNode;
    }
    return string;
  }
 
  // Inorder walk over parts of the tree.
  public String toString(int fromKey, int toKey) {
    String string = "";
    Node node = getLeafNodeForKey(fromKey);
    while (node != null) {
      for (int j = 0; j < node.mNumKeys; j++) {
        string += node.mObjects[j] + ", ";
        if (node.mKeys[j] == toKey) {
          return string;
        }
      }
      node = node.mNextNode;
    }
    return string;
  }
 
  Node getLeafNodeForKey(int key) {
    Node node = mRootNode;
    while (node != null) {
      int i = 0;
      while (i < node.mNumKeys &&
        key > node.mKeys[i]) {
        i++;
      }
      if (i < node.mNumKeys &&
        key == node.mKeys[i]) {
        node = node.mChildNodes[i + 1];
        while (!node.mIsLeafNode) {   
          node = node.mChildNodes[0];
        }
        return node;
      }
      if (node.mIsLeafNode) {
        return null;
      } else {
        node = node.mChildNodes[i];
      }
    }
    return null;
  }
}

Books about algorithms and data structures:

The Algorithm Design Manual by Steven S. Skiena.
Introduction to Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein.

The source code of the examples above can be found here in the blog repository.

A lot of texts have been taken from Wikipedia, the free encyclopedia.

Daniel's Blog

Wednesday, November 2, 2011

Algorithms and Data Structures

3 comments:

About Me

Latest Tweets

Popular Posts

Blog Archive

Repos