Wednesday, November 2, 2011

Algorithms and Data Structures

Approaches and strategies:
The divide-and-conquer algorithm design paradigm divides a problem into a number of subproblems. It then conquers the subproblems by solving them recursively. Finally, it combines the solutions to the subproblems into a solution for the original problem.
This design method is used, for example, by Quicksort and Mergesort.

Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same (self-similar) problem.
A classic example of recursion is the definition of the factorial function, given here in Java:
public static int factorial(int n) {
    if (n <= 1) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

Search algorithms:
Binary search finds the position of a specified value within a sorted array.
Worst case performance: O(log n)

Java source code:
public static int search(final char character,
  final char[] alphabet) {
  int leftIndex = 0;
  int rightIndex = alphabet.length - 1;        
 
  while (leftIndex <= rightIndex) {
    final int middleIndex = leftIndex + 
      ((rightIndex - leftIndex) / 2);
    if (alphabet[middleIndex] < character) {
      leftIndex = middleIndex + 1;
    } else if (alphabet[middleIndex] > character) {
      rightIndex = middleIndex - 1;
    } else {
      return middleIndex;
    }
  }
 
  return -1;
}

Sorting algorithms:
Quicksort is a divide-and-conquer sorting algorithm developed by Tony Hoare that, on average, makes O(n log n) comparisons to sort n items. In the worst case, it makes O(n^2) comparisons, though this behavior is rare. Quicksort is often faster in practice than other O(n log n) algorithms. Quicksort can be implemented as an in-place sort, requiring only O(log n) additional space. The recursion call stack has height n in the worst case and height log n in the best case.

Quicksort selects a pivot element from the n items to sort; often a random element is chosen, though the implementation below always uses the rightmost element of the pile. Quicksort then separates the n - 1 other elements into two piles: a low pile containing all elements that come before the pivot element and a high pile containing all elements that come after it. Recursing on smaller and smaller piles sorts the whole pile.

Java source code:
public static void quicksort(char[] string,
    int leftIndex, int rightIndex) {
    if (leftIndex < rightIndex) {
        int pivotIndex = partition(string,
            leftIndex, rightIndex);
        quicksort(string, leftIndex, pivotIndex-1);
        quicksort(string, pivotIndex+1, rightIndex);
    }
}
 
static int partition(char[] string, int leftIndex,
    int rightIndex) {
    int pivotIndex = rightIndex;
    // divider index for the pivot element
    int storageIndex = leftIndex;
  
    for (int i = leftIndex; i < rightIndex; i++) {
        if (string[i] < string[pivotIndex]) {
            swap(string, i, storageIndex);
            storageIndex++;
        }
    }
    swap(string, pivotIndex, storageIndex);
  
    return storageIndex;
}
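
The partition method relies on a small swap helper that is not shown above; a minimal version could look like this:
static void swap(char[] string, int i, int j) {
    char tmpValue = string[i];
    string[i] = string[j];
    string[j] = tmpValue;
}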

Heapsort is a comparison-based sorting algorithm. Although somewhat slower in practice on most machines than a well implemented Quicksort, it has the advantage of a more favorable worst-case O(n log n) runtime. Heapsort is an in-place algorithm, but is not a stable sort. The Java source code for Heapsort is available here.
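
Since the linked source code is not reproduced here, the following is a minimal in-place Heapsort sketch of my own (sorting a char array like the other examples), built on the usual sift-down operation:
public static void heapsort(char[] array) {
    // Build a max-heap by sifting down all non-leaf nodes.
    for (int i = array.length / 2 - 1; i >= 0; i--) {
        siftDown(array, i, array.length);
    }
    // Repeatedly move the maximum to the end and shrink the heap.
    for (int end = array.length - 1; end > 0; end--) {
        char tmpValue = array[0];
        array[0] = array[end];
        array[end] = tmpValue;
        siftDown(array, 0, end);
    }
}

static void siftDown(char[] array, int i, int heapSize) {
    while (2 * i + 1 < heapSize) {
        int childIndex = 2 * i + 1;
        // Pick the greater of the two children.
        if (childIndex + 1 < heapSize &&
            array[childIndex + 1] > array[childIndex]) {
            childIndex++;
        }
        if (array[i] >= array[childIndex]) {
            return;
        }
        char tmpValue = array[i];
        array[i] = array[childIndex];
        array[childIndex] = tmpValue;
        i = childIndex;
    }
}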

Introsort or introspective sort is a sorting algorithm designed by David Musser in 1997. It begins with Quicksort and switches to Heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted. It is the best of both worlds, with a worst-case O(n log n) runtime and a practical performance comparable to Quicksort on typical data sets. Introsort is used by the GCC C++ STL and the SGI C++ STL.
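
A sketch of the idea, reusing the partition method from the Quicksort example above (heapsortRange is a hypothetical helper, e.g. a subrange variant of the Heapsort sketch above):
public static void introsort(char[] a) {
    if (a.length < 2) {
        return;
    }
    // The depth limit is based on the logarithm of n.
    int depthLimit = 2 * (int) (Math.log(a.length) / Math.log(2));
    introsort(a, 0, a.length - 1, depthLimit);
}

static void introsort(char[] a, int leftIndex, int rightIndex,
    int depthLimit) {
    if (leftIndex >= rightIndex) {
        return;
    }
    if (depthLimit == 0) {
        // The recursion got too deep: fall back to Heapsort for
        // this subrange to guarantee O(n log n) overall.
        // (heapsortRange is a hypothetical helper.)
        heapsortRange(a, leftIndex, rightIndex);
        return;
    }
    int pivotIndex = partition(a, leftIndex, rightIndex);
    introsort(a, leftIndex, pivotIndex - 1, depthLimit - 1);
    introsort(a, pivotIndex + 1, rightIndex, depthLimit - 1);
}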

Mergesort is an O(n log n) divide-and-conquer sorting algorithm. Most implementations produce a stable sort, meaning that the implementation preserves the input order of equal elements in the sorted output. Mergesort was invented by John von Neumann in 1945.
Mergesort does not sort in place, but it has the advantage that it can be distributed across multiple machines, each sorting part of a really huge data set. That is why web search engines use mergesort a lot. Mergesort is a typical recursive algorithm that reduces large problems into smaller ones. Its recursion call stack always has height log n.

Java source code (the merge method below uses java.util.Queue and java.util.LinkedList):
public static void mergesort(char[] string,
  int leftIndex, int rightIndex) {
  if (leftIndex < rightIndex) {
    int middleIndex = (leftIndex + rightIndex) / 2;
    mergesort(string, leftIndex, middleIndex);
    mergesort(string, middleIndex + 1, rightIndex);
    merge(string, leftIndex, middleIndex,
      rightIndex);
  }
}

static void merge(char[] string, int leftIndex,
  int middleIndex, int rightIndex) {
  Queue<Character> string1 = 
    new LinkedList<Character>();
  Queue<Character> string2 = 
    new LinkedList<Character>();
 
  for (int i=leftIndex; i<=middleIndex; i++) {
    string1.add(string[i]);
  }  
  for (int i=middleIndex+1; i<=rightIndex; i++) {
    string2.add(string[i]);
  }
 
  int i = leftIndex;
  while (!string1.isEmpty() && !string2.isEmpty()) {
    if (string1.peek() <= string2.peek()) {
      string[i++] = string1.poll();
    } else {
      string[i++] = string2.poll();
    }
  }
 
  while (!string1.isEmpty()) {
    string[i++] = string1.poll();
  }
  while (!string2.isEmpty()) {
    string[i++] = string2.poll();
  }
}

Random numbers
Random number generators
x(t+1) = R * x(t) * (1 - x(t))
With the initial values R = 4.0 and x(0) = 0.2 the equation produces a chaotic trajectory. The values of x(t+1) are chaotic and can therefore serve as pseudo-random numbers.
Dynamical systems theory characterizes the behavior of the above equation as a chaotic attractor.
For more details see chapter 2 of "Complexity: A Guided Tour" by Melanie Mitchell.
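
A minimal sketch of iterating the logistic map as a crude pseudo-random source, using the initial values given above:
public class LogisticMapRandom {
    private static final double R = 4.0;
    private double mX = 0.2;

    // Returns the next value of the chaotic trajectory in [0, 1].
    public double next() {
        mX = R * mX * (1.0 - mX);
        return mX;
    }

    public static void main(String[] args) {
        LogisticMapRandom random = new LogisticMapRandom();
        for (int i = 0; i < 10; i++) {
            System.out.println(random.next());
        }
    }
}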

Data Structures
Array
Indexing performance: O(1)
Overwriting a value at a given index: O(1) (inserting or deleting at an arbitrary position requires shifting the following elements: O(n))
Search performance (unsorted array): O(n)

Linked list
Indexing performance: O(n)
Adding or deleting items (given a reference to the node): O(1)
Search performance: O(n)

Java source code (singly-linked list):
public class LinkedList<T> {
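    // Appends the item at the end of the list; O(n), as it walks to the tail.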
    public void put(T item) {    
        Node node = new Node(item);
        Node curNode = mHeadNode;
        if (curNode == null) {
            node.nextNode = curNode;
            mHeadNode = node;             
        } else {
            Node prevNode = null;
            while (curNode != null) {
                prevNode = curNode;
                curNode = curNode.nextNode;
            }
            node.nextNode = prevNode.nextNode;
            prevNode.nextNode = node;            
        }     
    }

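    // Removes and returns the first item, or null if the list is empty; O(1).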
    public T take() {
        Node node = getHeadNode();
        if (node != null) {
            T item = node.item;                    
            return item;
        } else {
            return null;
        }  
    }
    
    class Node
    {
        public T item;
        public Node nextNode;
        public Node(T t) { item = t; }
    };

    Node getHeadNode() {
        Node node = mHeadNode;
        if (node != null) {
            mHeadNode = node.nextNode;
            return node;
        }
        return null;
    }

    private Node mHeadNode;
}

Stacks (LIFO) and queues (FIFO)
Stacks support retrieval of data items by last-in, first-out (LIFO) order.
Queues support retrieval of data items in first-in, first out (FIFO) order.
Both stack and queue implementations are normally based on arrays or linked lists.
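
A small sketch using java.util.ArrayDeque, which can serve as both a stack and a queue:
import java.util.ArrayDeque;
import java.util.Deque;

public class StackAndQueueExample {
    public static void main(String[] args) {
        // Stack: last-in, first-out.
        Deque<Integer> stack = new ArrayDeque<Integer>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop()); // prints 3

        // Queue: first-in, first-out.
        Deque<Integer> queue = new ArrayDeque<Integer>();
        queue.add(1);
        queue.add(2);
        queue.add(3);
        System.out.println(queue.poll()); // prints 1
    }
}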

Hashtable / Dictionary
In computer science, a hash table or dictionary is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought. For further details also see hashing with chaining, hashing with open addressing and hash functions.
The best-case performance for adding or deleting items into or from hash tables is O(1). The search (indexing) performance is also O(1). You get the best-case performance, for example, if the hash table is implemented using an array and there are no collisions when inserting items.
If there are collisions, the performance depends on the data structure that is used for managing the data items of a hash table. For example, if a self-balancing tree is used to manage them, you get a worst-case performance of O(log n) for adding, deleting and searching items.
For more details about which underlying data structure should be used to implement hash tables see Skiena's "The Algorithm Design Manual" page 368.

Java source code (simple Hashtable with open addressing):
public class Hashtable {
    private static final int SIZE = 16;
    private Object mKeys[] = new Object[SIZE];
    private Object mObjects[] = new Object[SIZE];
 
    public void add(Object key, Object object)
        throws Exception {
        int i = 0;
        int index;
        do {
            index = hashCode(key, i);
            if (mKeys[index] == null) {
                mKeys[index] = key;
                mObjects[index] = object;
                return;
            } else {
                i++;
            }            
        } while (i < SIZE); 
        throw new Exception("hash table overflow");
    }
 
    public Object get(Object key) {
        int i = 0;
        int index;
        do {
            index = hashCode(key, i);
            if (mKeys[index] == null) {
                // Empty slot: the key is not present.
                return null;
            }
            if (mKeys[index].equals(key)) {
                return mObjects[index];
            }
            i++;
        } while (i < SIZE); 
        return null;
    }
 
    int hashCode(Object key, int i) {
        int k = Math.abs(key.hashCode());
        int hashCode1 = k % 701;
        int hashCode2 = 1 + (k % 700);
        int hashCode =
            (hashCode1 + i * hashCode2) % 701;
        return hashCode % SIZE;
    }
}
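
A short usage sketch of the class above:
public static void main(String[] args) throws Exception {
    Hashtable hashtable = new Hashtable();
    hashtable.add("Alice", "555-0100");
    hashtable.add("Bob", "555-0101");
    System.out.println(hashtable.get("Alice")); // prints 555-0100
    System.out.println(hashtable.get("Carol")); // prints null
}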

Binary search tree
A binary search tree is a node-based binary tree data structure which has the following properties:
  • The left subtree of a node contains only nodes with keys less than the node's key.
  • The right subtree of a node contains only nodes with keys greater than the node's key.
  • Both the left and right subtrees must also be binary search trees.
Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.
The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient.
Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

Adding or deleting items: O(h), where h is the height of the tree.
Search performance: O(h), where h is the height of the tree.

Java source code (see also "Introduction to Algorithms", chapter 12):
public class BinarySearchTree {
    class Node {
        public int mKey;
        public Object mObject;
        public Node mParentNode;
        public Node mLeftNode;
        public Node mRightNode;
    }
 
    private Node mRootNode;
 
    // Recursive add method.
    public void add(Node node, Node parentNode,
        int key, Object object) {
        if (node == null) {
            Node newNode = new Node();
            newNode.mKey = key;
            newNode.mObject = object;
            newNode.mParentNode = parentNode;
            if (parentNode != null) {
                if (key < parentNode.mKey) {
                    parentNode.mLeftNode = newNode;
                } else {
                    parentNode.mRightNode = newNode;
                }
            } else {
                mRootNode = newNode;
            }
            return;
        }
  
        if (key < node.mKey) {
            add(node.mLeftNode, node, key, object);
        } else {
            add(node.mRightNode, node, key, object);
        }
    }
 
    public void add(int key, Object object) {
        add(mRootNode, null, key, object);
    }
 
    // Iterative add method.
    public void add2(Node node, int key,
        Object object) {
        Node prevNode = null;  
        while (node != null) {
            prevNode = node;
            if (key < node.mKey) {
                node = node.mLeftNode;
            } else {
                node = node.mRightNode;
            }
        }
        Node newNode = new Node();
        newNode.mKey = key;
        newNode.mObject = object;
        newNode.mParentNode = prevNode;
        if (prevNode == null) {
            mRootNode = newNode;
        } else {
            if (key < prevNode.mKey) {
                prevNode.mLeftNode = newNode;
            } else {
                prevNode.mRightNode = newNode;
            }
        }
    }
 
    public void add2(int key, Object object) {
        add2(mRootNode, key, object);
    } 
 
    // Recursive search method.
    public Object search(Node node, int key) {
        if(node == null) {
            return null;
        }
        if (node.mKey == key) {
            return node.mObject;
        }
  
        if (key < node.mKey) {
            return search(node.mLeftNode, key);
        } else {
            return search(node.mRightNode, key);
        }
    }
 
    public Object search(int key) {
        return search(mRootNode, key);
    }
 
    // Iterative search method.
    public Object search2(Node node, int key) {
        if(node == null) {
            return null;
        }
  
        while (node != null && node.mKey != key) {
            if (key < node.mKey) {
                node = node.mLeftNode;
            } else {
                node = node.mRightNode;
            }
        }
        if (node != null) {
            return node.mObject;
        } else {
            return null;
        }
    }
 
    public Object search2(int key) {
        return search2(mRootNode, key);
    }
 
    // Inorder walk over the tree.
    String printBST(Node node) {
        String string = "";
        if (node != null) {
            string += printBST(node.mLeftNode);
            string += node.mObject + ", ";
            string += printBST(node.mRightNode);
        }
        return string;
    }
 
    public String toString() {
        return printBST(mRootNode);
    }
}
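
A short usage sketch of the class above:
BinarySearchTree tree = new BinarySearchTree();
tree.add(5, "five");
tree.add(2, "two");
tree.add(8, "eight");
System.out.println(tree.search(2)); // prints two
System.out.println(tree);           // prints two, five, eight,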

Red-black tree is a type of self-balancing binary search tree, a data structure that is typically used to implement associative arrays. The original structure was invented in 1972 by Rudolf Bayer and named "symmetric binary B-tree," but acquired its modern name in a paper in 1978 by Leonidas J. Guibas and Robert Sedgewick. It is complex, but has good worst-case running time for its operations and is efficient in practice: it can search, insert, and delete in O(log n) time, where n is the total number of elements in the tree. Put very simply, a red–black tree is a binary search tree that inserts and deletes in such a way that the tree is always reasonably balanced. For further details on red-black trees I suggest chapter 13 of the "Introduction to Algorithms" book.

AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
The AVL tree is named after its two Soviet inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."
AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. Because AVL trees are more rigidly balanced, they are faster than red-black trees for lookup intensive applications. However, red-black trees are faster for insertion and removal.

B-tree
A B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time [O(log n)]. The B-tree is a generalization of a binary search tree in that a node can have more than two children. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.
In B-trees, nodes can have a variable number of keys (elements) and children. The keys of a node are stored in non-decreasing order. Each node either is a leaf node or it has some associated children that are the root nodes of subtrees. The left child node of a node's element contains all nodes (elements) with keys less than or equal to the node element's key but greater than the preceding node element's key. When data is inserted to or removed from a node, its number of keys (elements) or child nodes changes. In order to maintain the pre-defined range, nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation.
Each node of a B-tree will contain a number of keys. Usually, the number of keys is chosen to vary between t-1 and 2t-1. In practice, the keys (elements) take up the most space in a node. The factor of 2 guarantees that nodes can be split or combined: if a node has 2t-1 keys, then adding a key to it can be accomplished by splitting the 2t-1-key node into two t-1-key nodes and moving the median (middle) key of the original node up into the parent node. Each split node then has the required minimum number of keys.
A new element is always inserted into a leaf node; whenever a node on the way down is already full, it is split first and its median key is shifted up into the parent node.
A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more node further away from the root.
B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes, because then the cost of accessing the node may be amortized over multiple operations within the node. This usually occurs when the nodes are in secondary storage such as disk drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and the size of a full disk block or an analogous size in secondary storage. Practical B-trees using secondary storage want a large number of child nodes to improve performance.

Java source code (the full version in the blog repository also includes delete; see also "Introduction to Algorithms", chapter 18):
public class BTree {
  class Node {
    public int mNumKeys = 0;
    public int[] mKeys = new int[2*T-1];
    public Object[] mObjects = new Object[2*T-1];
    public Node[] mChildNodes = new Node[2*T];
    public boolean mIsLeafNode;
  }

  private Node mRootNode;
  private static final int T = 4;

  public BTree() {
    mRootNode = new Node();
    mRootNode.mIsLeafNode = true;
  }

  public void add(int key, Object object) {
    Node rootNode = mRootNode;
    if (rootNode.mNumKeys == (2 * T - 1)) {
      Node newRootNode = new Node();
      mRootNode = newRootNode;
      newRootNode.mIsLeafNode = false;
      mRootNode.mChildNodes[0] = rootNode;
      // Split rootNode and move its median
      // key up into newRootNode.
      splitChildNode(newRootNode, 0, rootNode);
      // Insert the key into the B-Tree
      // with root newRootNode.
      insertIntoNonFullNode(newRootNode, key,
        object);
    } else {
      // Insert the key into the B-Tree
      // with root rootNode.
      insertIntoNonFullNode(rootNode, key, object);
    }
  }

  // Split the node, node, of a B-Tree into
  // two nodes that both contain T-1 elements
  // and move node's median key up
  // to the parentNode. This method will
  // only be called if node is full; node is the
  // i-th child of parentNode.
  void splitChildNode(Node parentNode, int i,
    Node node) {
    Node newNode = new Node();
    newNode.mIsLeafNode = node.mIsLeafNode;
    newNode.mNumKeys = T - 1;
    // Copy the last T-1 elements of node
    // into newNode.
    for (int j = 0; j < T - 1; j++) {
      newNode.mKeys[j] = node.mKeys[j + T];
      newNode.mObjects[j] = node.mObjects[j + T];
    }
    if (!newNode.mIsLeafNode) {
      // Copy the last T pointers of node
      // into newNode.
      for (int j = 0; j < T; j++) {
        newNode.mChildNodes[j] =
          node.mChildNodes[j + T];
      }
      for (int j = T; j <= node.mNumKeys; j++) {
        node.mChildNodes[j] = null;
      }
    }
    for (int j = T; j < node.mNumKeys; j++) {
      node.mKeys[j] = 0;
      node.mObjects[j] = null;
    }
    node.mNumKeys = T - 1;

    // Insert a (child) pointer to node newNode
    // into the parentNode, moving other keys
    // and pointers as necessary.
    for (int j = parentNode.mNumKeys; j >= i + 1;
      j--) {
      parentNode.mChildNodes[j + 1] =
        parentNode.mChildNodes[j];
    }
    parentNode.mChildNodes[i + 1] = newNode;
    for (int j = parentNode.mNumKeys - 1; j >= i;
        j--) {
      parentNode.mKeys[j + 1] =
        parentNode.mKeys[j];
      parentNode.mObjects[j + 1] =
        parentNode.mObjects[j];
    }
    parentNode.mKeys[i] = node.mKeys[T - 1];
    parentNode.mObjects[i] = node.mObjects[T - 1];
    node.mKeys[T - 1] = 0;
    node.mObjects[T - 1] = null;
    parentNode.mNumKeys++;
  }

  // Insert an element into a B-Tree. (The element
  // will ultimately be inserted into a leaf node).
  void insertIntoNonFullNode(Node node, int key,
    Object object) {
    int i = node.mNumKeys - 1;
    if (node.mIsLeafNode) {
      // Since node is not a full node insert the
      // new element into its proper place
      // within node.
      while (i >= 0 && key < node.mKeys[i]) {
        node.mKeys[i + 1] = node.mKeys[i];
        node.mObjects[i + 1] = node.mObjects[i];
        i--;
      }
      i++;
      node.mKeys[i] = key;
      node.mObjects[i] = object;
      node.mNumKeys++;
    } else {
      // Move back from the last key of node until
      // we find the child pointer to the node
      // that is the root node of the subtree
      // where the new element should be placed.
      while (i >= 0 && key < node.mKeys[i]) {
        i--;
      }
      i++;
      if (node.mChildNodes[i].mNumKeys ==
        (2 * T - 1)) {
        splitChildNode(node, i,
          node.mChildNodes[i]);
        if (key > node.mKeys[i]) {
          i++;
        }
      }
      insertIntoNonFullNode(node.mChildNodes[i],
        key, object);
    }
  }

  // Recursive search method.
  public Object search(Node node, int key) {
    int i = 0;
    while (i < node.mNumKeys && key >
      node.mKeys[i]) {
      i++;
    }
    if (i < node.mNumKeys && key ==
      node.mKeys[i]) {
      return node.mObjects[i];
    }
    if (node.mIsLeafNode) {
      return null;
    } else {
      return search(node.mChildNodes[i], key);
    }
  }

  public Object search(int key) {
    return search(mRootNode, key);
  }

  // Iterative search method.
  public Object search2(Node node, int key) {
    while (node != null) {
      int i = 0;
      while (i < node.mNumKeys && key >
        node.mKeys[i]) {
        i++;
      }
      if (i < node.mNumKeys && key ==
        node.mKeys[i]) {
        return node.mObjects[i];
      }
      if (node.mIsLeafNode) {
        return null;
      } else {
        node = node.mChildNodes[i];
      }
    }
    return null;
  }

  public Object search2(int key) {
    return search2(mRootNode, key);
  }
}

B+ tree or B plus tree is a type of balanced tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in each index segment (usually called a "block" or "node"). In a B+ tree, in contrast to a B-tree, all records are stored at the leaf level of the tree; only keys are stored in interior nodes.
The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context—in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have a very high fanout (typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.
The NTFS, XFS, and JFS filesystems all use this type of tree for metadata indexing. Relational database management systems such as IBM DB2, Microsoft SQL Server, MySQL and SQLite support this type of tree for table indices. Key-value database management systems such as CouchDB support this type of tree for data access.
The leaves of the B+ tree are often linked to one another in a linked list; this makes range queries or an (ordered) iteration through the blocks simpler and more efficient.

Java source code (for differences to the B-tree data structure take a look at the splitChildNode method):
public class BPlusTree {
  class Node {
    public int mNumKeys = 0;
    public int[] mKeys = new int[2*T-1];
    public Object[] mObjects = new Object[2*T-1];
    public Node[] mChildNodes = new Node[2*T];
    public boolean mIsLeafNode;
    public Node mNextNode;
  }

  private Node mRootNode;
  private static final int T = 4;

  public BPlusTree() {
    mRootNode = new Node();
    mRootNode.mIsLeafNode = true;
  }

  public void add(int key, Object object) {
    Node rootNode = mRootNode;
    if (rootNode.mNumKeys == (2 * T - 1)) {
      Node newRootNode = new Node();
      mRootNode = newRootNode;
      newRootNode.mIsLeafNode = false;
      mRootNode.mChildNodes[0] = rootNode;
      // Split rootNode and move its median
      // key up into newRootNode.
      splitChildNode(newRootNode, 0, rootNode);
      // Insert the key into the B+-Tree
      // with root newRootNode.
      insertIntoNonFullNode(newRootNode, key,
        object);
    } else {
      // Insert the key into the B+-Tree
      // with root rootNode.
      insertIntoNonFullNode(rootNode, key, object);
    }
  }

  // Split the node, node, of a B+-Tree into two
  // nodes that contain T-1 (and T) elements
  // and move node's median key up to the
  // parentNode.
  // This method will only be called if node is full;
  // node is the i-th child of parentNode.
  // All internal keys (elements) will have
  // duplicates within the leaf nodes.
  void splitChildNode(Node parentNode, int i,
    Node node) {
    Node newNode = new Node();
    newNode.mIsLeafNode = node.mIsLeafNode;
    newNode.mNumKeys = T;
    // Copy the last T elements of node
    // into newNode. Keep the median key
    // as duplicate in the first key of newNode.
    for (int j = 0; j < T; j++) {
      newNode.mKeys[j] = node.mKeys[j + T - 1];
      newNode.mObjects[j] = node.mObjects[j + T - 1];
    }
    if (!newNode.mIsLeafNode) {
      // Copy the last T + 1 pointers of node
      // into newNode.
      for (int j = 0; j < T + 1; j++) {
        newNode.mChildNodes[j] =
          node.mChildNodes[j + T - 1];
      }
      for (int j = T; j <= node.mNumKeys; j++) {
        node.mChildNodes[j] = null;
      }
    } else {
      // Manage the linked list that is used e.g.
      // for doing fast range queries.
      newNode.mNextNode = node.mNextNode;
      node.mNextNode = newNode;
    }
    for (int j = T - 1; j < node.mNumKeys; j++) {
      node.mKeys[j] = 0;
      node.mObjects[j] = null;
    }
    node.mNumKeys = T - 1;

    // Insert a (child) pointer to node newNode
    // into the parentNode, moving other keys
    // and pointers as necessary.
    for (int j = parentNode.mNumKeys; j >= i + 1;
      j--) {
      parentNode.mChildNodes[j + 1] =
        parentNode.mChildNodes[j];
    }
    parentNode.mChildNodes[i + 1] = newNode;
    for (int j = parentNode.mNumKeys - 1; j >= i;
        j--) {
      parentNode.mKeys[j + 1] =
        parentNode.mKeys[j];
      parentNode.mObjects[j + 1] =
        parentNode.mObjects[j];
    }
    parentNode.mKeys[i] = newNode.mKeys[0];
    parentNode.mObjects[i] = newNode.mObjects[0];
    parentNode.mNumKeys++;
  }

  // Insert an element into a B+-Tree. (The element
  // will ultimately be inserted into a leaf node.)
  void insertIntoNonFullNode(Node node, int key,
    Object object) {
    int i = node.mNumKeys - 1;
    if (node.mIsLeafNode) {
      // Since node is not a full node insert the
      // new element into its proper place
      // within node.
      while (i >= 0 && key < node.mKeys[i]) {
        node.mKeys[i + 1] = node.mKeys[i];
        node.mObjects[i + 1] = node.mObjects[i];
        i--;
      }
      i++;
      node.mKeys[i] = key;
      node.mObjects[i] = object;
      node.mNumKeys++;
    } else {
      // Move back from the last key of node until
      // we find the child pointer to the node
      // that is the root node of the subtree
      // where the new element should be placed.
      while (i >= 0 && key < node.mKeys[i]) {
        i--;
      }
      i++;
      if (node.mChildNodes[i].mNumKeys ==
        (2 * T - 1)) {
        splitChildNode(node, i,
          node.mChildNodes[i]);
        if (key > node.mKeys[i]) {
          i++;
        }
      }
      insertIntoNonFullNode(node.mChildNodes[i],
        key, object);
    }
  }

  // Recursive search method.
  public Object search(Node node, int key) {
    int i = 0;
    while (i < node.mNumKeys && key >
      node.mKeys[i]) {
      i++;
    }
    if (i < node.mNumKeys && key ==
      node.mKeys[i]) {
      return node.mObjects[i];
    }
    if (node.mIsLeafNode) {
      return null;
    } else {
      return search(node.mChildNodes[i], key);
    }
  }

  public Object search(int key) {
    return search(mRootNode, key);
  }

  // Iterative search method.
  public Object search2(Node node, int key) {
    while (node != null) {
      int i = 0;
      while (i < node.mNumKeys && key >
        node.mKeys[i]) {
        i++;
      }
      if (i < node.mNumKeys && key ==
        node.mKeys[i]) {
        return node.mObjects[i];
      }
      if (node.mIsLeafNode) {
        return null;
      } else {
        node = node.mChildNodes[i];
      }
    }
    return null;
  }

  public Object search2(int key) {
    return search2(mRootNode, key);
  }

  // Inorder walk over the tree.
  public String toString() {
    String string = "";
    Node node = mRootNode;  
    while (!node.mIsLeafNode) {   
      node = node.mChildNodes[0];
    }  
    while (node != null) {
      for (int i = 0; i < node.mNumKeys; i++) {
        string += node.mObjects[i] + ", ";
      }
      node = node.mNextNode;
    }
    return string;
  }
 
  // Inorder walk over parts of the tree.
  public String toString(int fromKey, int toKey) {
    String string = "";
    Node node = getLeafNodeForKey(fromKey);
    while (node != null) {
      for (int j = 0; j < node.mNumKeys; j++) {
        string += node.mObjects[j] + ", ";
        if (node.mKeys[j] == toKey) {
          return string;
        }
      }
      node = node.mNextNode;
    }
    return string;
  }
 
  Node getLeafNodeForKey(int key) {
    Node node = mRootNode;
    while (node != null) {
      int i = 0;
      while (i < node.mNumKeys &&
        key > node.mKeys[i]) {
        i++;
      }
      if (i < node.mNumKeys &&
        key == node.mKeys[i]) {
        node = node.mChildNodes[i + 1];
        while (!node.mIsLeafNode) {   
          node = node.mChildNodes[0];
        }
        return node;
      }
      if (node.mIsLeafNode) {
        return null;
      } else {
        node = node.mChildNodes[i];
      }
    }
    return null;
  }
}

The source code of the examples above can be found here in the blog repository.

A lot of texts have been taken from Wikipedia, the free encyclopedia.

Tuesday, November 1, 2011

Algorithms and Data Structures - Part 2

Data Structures
Heap is a complete (fully populated except possibly for the last level) tree data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap. (Alternatively, if the comparison is reversed, the smallest element is always in the root node, which results in a min-heap.) There is no restriction on how many children each node may have, although in practice heaps are usually binary. The heap is one maximally efficient implementation of an abstract data type called a priority queue, with a worst-case performance of O(log n) for adding and removing elements and O(1) for getting the maximum or minimum element. Heaps are crucial in several efficient graph algorithms such as Dijkstra's algorithm, and in the sorting algorithm Heapsort.
A heap data structure should not be confused with the heap which is a common name for dynamically allocated memory. The term was originally used only for the data structure. Some early popular languages such as LISP provided dynamic memory allocation using heap data structures, which gave the memory area its name.

The binary tree data structure of the Heap in the following Java source code is organized as an array (see also "Introduction to Algorithms", chapter 6):
public class BinaryHeap {
    private Integer[] mArray;
    private int mHeapSize;
 
    public BinaryHeap(int maxSize) {
        mArray = new Integer[maxSize];
        mHeapSize = 0;
    }
 
    int parentIndex(int i) {
        return (i + 1) / 2 - 1;
    }
 
    int leftChildIndex(int i) {
        return 2 * i + 1;
    }
 
    int rightChildIndex(int i) {
        return 2 * i + 2;
    }
 
    // When maxHeapify is called, it is assumed that
    // the binary tree rooted at leftChildIndex(i)
    // and rightChildIndex(i) are max-heaps.
    // Worst-case performance: O(log n).
    void maxHeapify(int i) {
        int leftChildIndex = leftChildIndex(i);
        int rightChildIndex = rightChildIndex(i);
        int largestElementIndex;
        if (leftChildIndex < mHeapSize &&
            mArray[leftChildIndex] > mArray[i]) {
            largestElementIndex = leftChildIndex;
        } else {
            largestElementIndex = i;
        }
        if (rightChildIndex < mHeapSize &&
            mArray[rightChildIndex] >
            mArray[largestElementIndex]) {
            largestElementIndex = rightChildIndex;
        }
        if (largestElementIndex != i) {
            int tmpValue = mArray[i];
            mArray[i] = mArray[largestElementIndex];
            mArray[largestElementIndex] = tmpValue;
            maxHeapify(largestElementIndex);
        }
    }
 
    void buildMaxHeap() {
        int heapSize = mArray.length;
        for (int i = heapSize / 2; i >= 0; i--) {
            maxHeapify(i);
        }
    } 
 
    public int max() {
        return mArray[0];
    }
 
    // Worst-case performance: O(log n).
    public int extractMax() {
        int max = mArray[0];
        mArray[0] = mArray[mHeapSize - 1];
        mArray[mHeapSize - 1] = null;
        mHeapSize--;
        maxHeapify(0);
        return max;
    }
 
    // Worst-case performance: O(log n).
    void increaseKey(int i, int newValue)
        throws Exception {
        if (newValue < mArray[i]) {
            throw new Exception("New value is smaller"
                + " than current value");
        }
        mArray[i] = newValue;
        while (i > 0 && mArray[parentIndex(i)] <
            mArray[i]) {
            int tmpValue = mArray[parentIndex(i)];
            mArray[parentIndex(i)] = mArray[i];
            mArray[i] = tmpValue;
            i = parentIndex(i);
        }  
    }
 
    // Worst-case performance: O(log n).
    public boolean insert(int value) {
        if (mHeapSize < mArray.length) {
            mHeapSize++;
            mArray[mHeapSize - 1] = value;
            try {
                increaseKey(mHeapSize - 1, value);
            } catch (Exception e) {
                return false;
            }
            return true;
        } else {
            return false;
        }
    }
 
    public int size() {
        return mHeapSize;
    }
}
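
A short usage sketch of the heap as a priority queue:
BinaryHeap heap = new BinaryHeap(16);
heap.insert(3);
heap.insert(42);
heap.insert(7);
System.out.println(heap.max());        // prints 42
System.out.println(heap.extractMax()); // prints 42
System.out.println(heap.extractMax()); // prints 7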

Graphs
A graph data structure consists of a finite (and possibly mutable) set of ordered pairs, called edges, arcs or links, of certain entities called nodes or vertices. As in mathematics, an edge (x,y) is said to point or go from x to y. The nodes may be part of the graph structure, or may be external entities represented by integer indices or references. A graph data structure may also associate to each edge some edge value, such as a symbolic label or a numeric attribute (cost, capacity, length, etc.).
Graph algorithms are a significant field of interest within computer science. Typical higher-level operations associated with graphs are: finding a path between two nodes, like depth-first search and breadth-first search and finding the shortest path from one node to another, like Dijkstra's algorithm.
Graphs are normally represented as adjacency lists or by an adjacency matrix.
An adjacency list (which I used in my examples) stores every vertex as a record or object, and every vertex stores a list of its adjacent vertices. This data structure allows the storage of additional data on the vertices.
An adjacency matrix is a two-dimensional matrix, in which the rows represent source vertices and columns represent destination vertices. Data on edges and vertices must be stored externally. Only the cost for one edge can be stored between each pair of vertices.
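
The search methods below assume an adjacency-list graph representation roughly like the following sketch. The names mNodes, getLinks, mTime, mTopologicalOrdering and the NodeAttributes fields are taken from the code below; the rest (class names, addLink, addNode) is my assumption:
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class Graph {
    public static class Node {
        private List<Node> mLinks = new ArrayList<Node>();

        public void addLink(Node node) {
            mLinks.add(node);
        }

        // The adjacent vertices of this vertex.
        public List<Node> getLinks() {
            return mLinks;
        }
    }

    // Per-node bookkeeping used by the searches below.
    public static class NodeAttributes {
        public static final int WHITE = 0; // undiscovered
        public static final int GRAY = 1;  // discovered but not finished
        public static final int BLACK = 2; // finished
        public int color;
        public int distance;
        public Node predecessor;
        public int startTime;
        public int finishTime;
    }

    // All vertices of the graph.
    private List<Node> mNodes = new ArrayList<Node>();
    // Bookkeeping for the depth-first search below.
    private int mTime;
    private LinkedList<Node> mTopologicalOrdering =
        new LinkedList<Node>();

    public void addNode(Node node) {
        mNodes.add(node);
    }

    // The breadthFirstSearch and depthFirstSearch methods
    // below belong here.
}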

Java source code for breadth-first search [O(|V| + |E|)] (see also "Introduction to Algorithms", chapter 22):
public Dictionary<Node, NodeAttributes>
  breadthFirstSearch(Node s) {
  Hashtable<Node, NodeAttributes> nodeAttributes =
    new Hashtable<Node, NodeAttributes>();
  
  for (Node u : mNodes) {   
    NodeAttributes attributes = new NodeAttributes();
    attributes.color = NodeAttributes.WHITE;
    attributes.distance = Integer.MAX_VALUE;
    attributes.predecessor = null;
    nodeAttributes.put(u, attributes);
  }
  NodeAttributes sAttributes = nodeAttributes.get(s);
  sAttributes.color = NodeAttributes.GRAY;
  sAttributes.distance = 0;
  sAttributes.predecessor = null;
  Queue<Node> queue = new ArrayDeque<Node>();
  queue.add(s);
  while (!queue.isEmpty()) {
    Node u = queue.poll();
    NodeAttributes uAttributes =
      nodeAttributes.get(u);
    for (Node v : u.getLinks()) {
      NodeAttributes vAttributes =
        nodeAttributes.get(v);
      if (vAttributes.color == NodeAttributes.WHITE) {
        vAttributes.color = NodeAttributes.GRAY;
        vAttributes.distance =
          uAttributes.distance + 1;
        vAttributes.predecessor = u;
        queue.add(v);
      }
    }
    uAttributes.color = NodeAttributes.BLACK;
  }
  
  return nodeAttributes;
}

Java source code for depth-first search [O(|V| + |E|)]. For more details see also "Introduction to Algorithms", chapter 22.
public Dictionary<Node, NodeAttributes>
  depthFirstSearch() {
  Hashtable<Node, NodeAttributes> nodeAttributes =
    new Hashtable<Node, NodeAttributes>();
  
  for (Node u : mNodes) {   
    NodeAttributes attributes = new NodeAttributes();
    attributes.color = NodeAttributes.WHITE;
    attributes.predecessor = null;
    nodeAttributes.put(u, attributes);
  }
  mTime = 0;
  for (Node u : mNodes) {
    if (nodeAttributes.get(u).color ==
      NodeAttributes.WHITE) {
      dfsVisit(u, nodeAttributes);
    }
  }
  return nodeAttributes;
}
 
private void dfsVisit(Node u,
  Dictionary<Node, NodeAttributes> nodeAttributes) {
  NodeAttributes uAttributes = nodeAttributes.get(u);
  uAttributes.color = NodeAttributes.GRAY; 
  mTime++;
  uAttributes.startTime = mTime;
  
  for (Node v : u.getLinks()) {
    NodeAttributes vAttributes =
      nodeAttributes.get(v);
    if (vAttributes.color == NodeAttributes.WHITE) {
      vAttributes.predecessor = u;
      dfsVisit(v, nodeAttributes);
    }
  }
  
  uAttributes.color = NodeAttributes.BLACK;
  mTime++;
  uAttributes.finishTime = mTime;
  mTopologicalOrdering.addFirst(u);
}

What is also very interesting is the topological ordering of a directed graph. A topological sort linearly orders the nodes of a dependency graph so that every node comes before the nodes that depend on it. That is very useful, e.g., if you want to implement your own dependency injection container that automatically resolves dependencies between objects to create them in the right order.
The topological sort is just a depth-first search within a graph that constructs a linked list of the nodes in order of their finish times (see mTopologicalOrdering in dfsVisit above). The Java source code containing the topological sort is available here in my blog repository. The Java source code also provides an example of Dijkstra's algorithm.

Algorithms
Dynamic programming
Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems. Divide-and-conquer algorithms partition the problem into independent subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem. In contrast, dynamic programming is applicable when the subproblems are not independent.
A dynamic programming algorithm solves every subproblem just once and then saves its answers in a table, thereby avoiding the work of recomputing the answer every time the subproblem is encountered.
Dynamic programming is typically applied to optimization problems. In such problems there can be many possible solutions. Each solution has a value, and we wish to find a solution with the optimal (minimum or maximum) value.
Top-down dynamic programming means solving the problem recursively while storing (memoizing) the results of the subproblems, which are later reused because each completed calculation is a subproblem of a larger calculation. Bottom-up dynamic programming solves the simplest subproblems first and combines their results iteratively to build up the solutions to larger and larger subproblems.
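
A tiny sketch illustrating both styles on the Fibonacci numbers (my own example, not from the book):
public class Fibonacci {
    // Top-down: recursion plus a memo table.
    static long fibMemoized(int n, long[] memo) {
        if (n <= 1) {
            return n;
        }
        if (memo[n] == 0) {
            memo[n] = fibMemoized(n - 1, memo)
                + fibMemoized(n - 2, memo);
        }
        return memo[n];
    }

    // Bottom-up: solve the smallest subproblems first.
    static long fibBottomUp(int n) {
        if (n <= 1) {
            return n;
        }
        long previous = 0;
        long current = 1;
        for (int i = 2; i <= n; i++) {
            long next = previous + current;
            previous = current;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(fibMemoized(40, new long[41])); // prints 102334155
        System.out.println(fibBottomUp(40));               // prints 102334155
    }
}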

One of the computer science areas where dynamic programming is used a lot is Bioinformatics.
An example is the longest common subsequence algorithm that is used to compare DNA strands. A DNA strand consists of a string of molecules called bases, where the possible bases are adenine, guanine, cytosine and thymine. The goal of comparing two DNA strands is to determine how similar the two strands are, as some measure of how closely related the two organisms are. So this is not an exact substring method but one that allows for gaps while comparing the two input DNA strands.

Java source code for the longest common subsequence problem (see also "Introduction to Algorithms", chapter 15)
public class LongestCommonSubsequence {
  char[] mSeqA;
  char[] mSeqB;
  int[][] mC;
  int[][] mB;
  private static int UP = 0;
  private static int DIAG = 1;
  private static int LEFT = 2;
 
  public LongestCommonSubsequence(char[] seqA,
    char[] seqB) {
    mSeqA = new char[seqA.length + 1];
    for (int i = 0; i < seqA.length; i++) {
      mSeqA[i+1] = seqA[i];
    }  
    mSeqB = new char[seqB.length + 1];
    for (int i = 0; i < seqB.length; i++) {
      mSeqB[i+1] = seqB[i]; 
    }
    mC = new int[mSeqA.length][mSeqB.length];
    mB = new int[mSeqA.length][mSeqB.length];
    for (int i = 0; i < mSeqA.length; i++) {
      mC[i][0] = 0;
    }  
    for (int j = 0; j < mSeqB.length; j++) {
      mC[0][j] = 0;
    }
  }
 
  // O(m * n) -> O(mSeqA.length * mSeqB.length)
  public void lcsLength() {
    for (int i = 1; i < mSeqA.length; i++) {
      for (int j = 1; j < mSeqB.length; j++) {
        if (mSeqA[i] == mSeqB[j]) {
          mC[i][j] = mC[i-1][j-1] + 1;
          mB[i][j] = DIAG;
        } else if (mC[i-1][j] >= mC[i][j-1]) {
          mC[i][j] = mC[i-1][j];
          mB[i][j] = UP;
        } else {
          mC[i][j] = mC[i][j-1];
          mB[i][j] = LEFT;
        }
      }
    }
  }
 
  public void backtrack() {
    backtrack(mSeqA.length - 1, mSeqB.length - 1);
  }
 
  void backtrack(int i, int j) {
    if (i == 0 || j == 0) {
      return;
    }
    if (mB[i][j] == DIAG) {
      backtrack(i-1, j-1);
      System.out.print(mSeqA[i] + ", ");
    } else if (mB[i][j] == UP) {
      backtrack(i-1, j);   
    } else {
      backtrack(i, j-1);
    }
  }
}
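
A short usage sketch of the class above (the example strands are arbitrary):
char[] seqA = "GCCCTAGCG".toCharArray();
char[] seqB = "GCGCAATG".toCharArray();
LongestCommonSubsequence lcs =
    new LongestCommonSubsequence(seqA, seqB);
lcs.lcsLength();
lcs.backtrack(); // Prints one longest common subsequence.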

There are also protein sequence comparison methods in Bioinformatics that are more sophisticated than the longest common subsequence, like the Needleman-Wunsch algorithm, the Smith-Waterman algorithm and HHsearch. My blog repository currently contains implementations of the Needleman-Wunsch algorithm and the Smith-Waterman algorithm.

Memoization is a variation of dynamic programming. A memoized recursive algorithm maintains an entry in a table for the solution to each subproblem. When a subproblem is first encountered during the execution of the recursive algorithm, its solution is computed and then stored in the table. Each subsequent time that the subproblem is encountered, the value stored in the table is simply looked up and returned.

Greedy algorithms always make the choice that looks best at the moment. That is, they make a locally optimal choice in the hope that this choice will lead to a globally optimal solution.
For example, applying the greedy strategy to the traveling salesman problem yields the following algorithm: "At each stage visit the unvisited city nearest to the current city".
For many optimization problems, using dynamic programming to determine the best choice is overkill; often a simpler and more efficient greedy algorithm will do. Greedy algorithms do not always yield optimal solutions, but for many problems they do.
The following problem is about activity scheduling, where several competing activities require exclusive use of a common resource. The goal is to select a maximum number of mutually compatible activities. The greedy algorithm solves the problem by sorting the activities by their finish times and then going once through the sorted list, selecting each activity whose start time is no earlier than the finish time of the previously selected activity.

Java source code for the activity selector problem (see also "Introduction to Algorithms", chapter 16)
public class ActivitySelector {
  private List<Activity> mActivities =
    new ArrayList<Activity>();
 
  public static class Activity implements 
    Comparable<Activity> {
    private int mStartTime;
    private int mFinishTime;
  
    public Activity(int startTime, int finishTime) {
      mStartTime = startTime;
      mFinishTime = finishTime;
    }
  
    public int getStartTime() {
      return mStartTime;
    }
  
    public int getFinishTime() {
      return mFinishTime;
    }

    @Override
    public int compareTo(Activity activity) {
      if (this.mFinishTime == activity.mFinishTime) {
        return 0;
      } else if (this.mFinishTime >
        activity.mFinishTime) {
        return 1;
      } else {
        return -1;
      }
    }
  }

  public void addActivity(Activity activity) {
    mActivities.add(activity);
  }
 
  public List<Activity> greedyActivitySelector() {
    Collections.sort(mActivities);
    List<Activity> activities =
      new ArrayList<Activity>();
    if (mActivities.isEmpty()) {
      return activities;
    }
    activities.add(mActivities.get(0));
    int i = 0;
    for (int m = 1; m < mActivities.size(); m++) {
      if (mActivities.get(m).getStartTime() >=
        mActivities.get(i).mFinishTime) {
        activities.add(mActivities.get(m));
        i = m;
      }
    }
    return activities;
  }
}

Greedy algorithms versus dynamic programming
The 0-1 knapsack problem is a good way to compare the greedy strategy with dynamic programming.
A thief robbing a store finds n items; the i-th item is worth vi dollars and weighs wi pounds, where vi and wi are integers. He wants to take as valuable a load as possible, but he can carry at most W pounds in his knapsack for some integer W. Which items should he take? (This is called the 0-1 knapsack problem because each item must either be taken or left behind; the thief cannot take a fractional amount of an item.)
While the 0-1 knapsack problem can only be solved optimally with dynamic programming and not with a greedy algorithm, the fractional knapsack problem is also solvable using a greedy strategy.
To solve the fractional knapsack problem, you first have to compute the value per pound for each item. The thief then begins by taking as much as possible of the item with the greatest value per pound. If the supply of that item is exhausted, the thief takes as much as possible of the item with the next greatest value per pound and so forth.
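
A minimal sketch of the greedy strategy for the fractional knapsack problem (my own example):
import java.util.Arrays;
import java.util.Comparator;

public class FractionalKnapsack {
    static class Item {
        final double mValue;
        final double mWeight;

        Item(double value, double weight) {
            mValue = value;
            mWeight = weight;
        }
    }

    // Returns the maximum value achievable with capacity maxWeight.
    static double solve(Item[] items, double maxWeight) {
        // Sort by value per pound, greatest first.
        Arrays.sort(items, new Comparator<Item>() {
            public int compare(Item a, Item b) {
                return Double.compare(b.mValue / b.mWeight,
                    a.mValue / a.mWeight);
            }
        });
        double value = 0.0;
        for (Item item : items) {
            if (maxWeight <= 0.0) {
                break;
            }
            // Take as much of the most valuable remaining item as fits.
            double weight = Math.min(item.mWeight, maxWeight);
            value += weight * (item.mValue / item.mWeight);
            maxWeight -= weight;
        }
        return value;
    }
}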

String matching algorithms
See this tutorial with great explanations and examples by Christian Charras and Thierry Lecroq.

Further reading
A lot of texts have been taken from Wikipedia, the free encyclopedia.

Thursday, August 25, 2011

Virtual Machines

One of the best resources about virtual machines (both high-level language VMs and system VMs) is Jim Smith's and Ravi Nair's book Virtual Machines: Versatile Platforms for Systems and Processes.

The TinyVM is a small, fast and lightweight virtual machine written in pure ANSI C. The source code of this toy virtual machine can easily be read and understood in just a few minutes.
The create_vm (tvm.c) function creates a virtual machine instance by allocating its memory and stack. It also parses a TinyVM program that contains TinyVM assembly language opcodes from a text file. The interpret_program (tvm_program.c) function reads the TinyVM program into a list of instructions and into a further list containing the arguments for all instructions.
The TinyVM has some registers which are mapped to memory (see field registers in struct tvm_memory_t of file tvm_memory.h).
After parsing a program the TinyVM runs it using run_vm (tvm.c). This function contains the interpreter loop of the virtual machine which executes the program by interpreting the list of instructions.

Sunday, August 21, 2011

Android messaging and concurrency (for native code development)

Android's messaging and concurrency framework (together with the Binder IPC mechanism) forms the basis of all Android applications and services. The messaging and concurrency framework is mainly based on the Thread, Looper, Message, MessageQueue and Handler classes. For convenience there is also the AsyncTask class, and for inter-process communication there are some other classes like Binder, Parcel and Messenger.

A Looper is used to run a message loop for a thread. Threads by default do not have a message loop associated with them; to create one, call Looper.prepare() in the thread that is to run the loop, and then Looper.loop() to have it process messages until the loop is stopped. Most interaction with a message loop is through the Handler class. A Handler allows you to send and process Message and Runnable objects associated with a thread's MessageQueue. Each Handler instance is associated with a single thread and that thread's message queue. When you create a new Handler, it is bound to the thread / message queue of the thread that is creating it - from that point on, it will deliver messages and runnables to that message queue and execute them as they come out of the message queue. Furthermore, one can bind as many Handlers as he or she wants to a single thread / message queue for message handling.
There are two main uses for a Handler. First a Handler allows you to enqueue an action to be performed on a different thread than your own and furthermore it also enables you to schedule messages and runnables to be executed at some point in the future.
The AsyncTask class enables proper and easy use of concurrency in a multithreaded environment. For more information about this class see Google's "Painless Threading" article. (Paragraph source: Google Android Developer Reference.)

Since the Android messaging and concurrency classes are only available to Java developers, I rewrote (parts of) them in C++ for native code development. The native Android messaging and concurrency framework is available here under the Apache 2.0 license. The C++ classes not only work for Android but also for Linux and QNX Neutrino RTOS. I haven't done a Windows port up till now, but it should be pretty easy to do so (e.g. using Win32 Events and the WaitForSingleObject function).
For native code development I also added closures (from Google's protocol buffers project) and delegates. Here are some examples about how to use the native C++ classes:

Basic Android messaging and concurrency example:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include "android/os/Looper.h"
#include "android/os/Handler.h"
#include "android/os/Thread.h"
#include "android/os/Message.h"
#include "android/os/CondVar.h"
#include "android/os/Closure.h"

using namespace android::os;

template<class T /*extends Handler*/>
class LooperThread :
    public Thread
{
public:
    LooperThread() :
        mLooper(NULL),
        mHandler(NULL),
        mCondVar(mLock),
        mIsDone(false) {  
    }

    virtual ~LooperThread() {
    }

    virtual void run() {
        Looper::prepare();
        mLock.lock();
        mLooper = Looper::myLooper();
        mHandler = new T();
        mCondVar.notifyAll();
        mLock.unlock();
        Looper::loop();
        mLock.lock();
        mIsDone = true;
        mHandler->removeCallbacksAndMessages();        
        mLooper = NULL;
        mHandler = NULL;
        mLock.unlock();
    }

    Looper* getLooper() {
        AutoLock autoLock(mLock);
        if (!mIsDone && mLooper == NULL) {
            mCondVar.wait();
        }
        return mLooper;
    }

    sp<T> getHandler() {
        AutoLock autoLock(mLock);
        if (!mIsDone && mHandler == NULL) {
            mCondVar.wait();
        }
        return mHandler;
    }

private:
    Looper* mLooper;
    sp<T> mHandler;
    Lock mLock;
    CondVar mCondVar;
    bool mIsDone;
};

class ExampleHandler : public Handler
{
public:
    virtual void handleMessage(const sp<Message>& message) {
        printf("ExampleHandler::handleMessage with"
            " ID %d by Looper %p\n",
            message->what, Looper::myLooper());
    }
};

class Test
{
public:
    void test(int32_t value) {
        printf("Test::test() called with value"
            " %d\n", value);
    }
};

static LooperThread<ExampleHandler> sLooperThread;
static Test sTest;

int main() {
    sLooperThread.start();
    sp<Handler> handler = sLooperThread.getHandler();

    handler->obtainMessage(1234)->sendToTarget();
    handler->postDelayed(newRunnable(sTest,
        &Test::test, 42), 1000);

    Thread::sleep(2000);

    sLooperThread.getLooper()->quit();
    sLooperThread.join();

    return 0;
}


AsyncTask example:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include "android/os/Looper.h"
#include "android/os/Handler.h"
#include "android/os/Thread.h"
#include "android/os/Message.h"
#include "android/os/Closure.h"
#include "android/os/LooperThread.h"
#include "android/os/AsyncTask.h"

using namespace android::os;

class ExampleAsyncTask :
    public AsyncTask<int32_t, int32_t, int32_t>
{
public:
    virtual void onPreExecute() {
        printf("ExampleAsyncTask::onPreExecute"
            " on thread %d\n",
            (int32_t) pthread_self());
    }

    virtual int32_t doInBackground(int32_t param) {
        printf("ExampleAsyncTask::doInBackground"
            " on thread %d with param %d\n",
            (int32_t) pthread_self(), param);
        Thread::sleep(250);
        int32_t sum = 0;
        int32_t i;
        for (i = 0; i < param / 2; i++) {
            sum++;
        }
        publishProgress(sum);
        Thread::sleep(250);
        for (; i < param; i++) {
            sum++;
        }
        return sum;
    }

    virtual void onProgressUpdate(int32_t value) {
        printf("ExampleAsyncTask::onProgressUpdate"
            " on thread %d with value %d\n",
            (int32_t) pthread_self(), value);
    }

    virtual void onPostExecute(int32_t result) {
        printf("ExampleAsyncTask::onPostExecute"
            " on thread %d with result %d\n",
            (int32_t) pthread_self(), result);
    }

    virtual void onCancelled() {
        printf("ExampleAsyncTask::onCancelled"
            " on thread %d\n",
            (int32_t) pthread_self());
    }
};

class AsyncTaskStarter : public Runnable
{
    virtual void run() {
        // AsyncTasks must be executed by a
        // Looper thread because of the
        // callback message handling
        ExampleAsyncTask* at1 =
            new ExampleAsyncTask();
        at1->execute(1234567);

        ExampleAsyncTask* at2 =
            new ExampleAsyncTask();
        at2->executeOnExecutor(AsyncTaskBase::
            THREAD_POOL_EXECUTOR, 123);        
    }
};

static LooperThread<Handler> sLooperThread;

int main() {
    sLooperThread.start();
    sp<Handler> handler = sLooperThread.getHandler();

    handler->post(new AsyncTaskStarter());

    Thread::sleep(2000);

    sLooperThread.getLooper()->quit();
    sLooperThread.join();

    return 0;
}

Click here to see some more sample code demonstrating the usage of the native Android messaging and concurrency framework.

Here is the link to the repository: Android messaging and concurrency framework for native code development

Here is the Android messaging and concurrency framework with automatic reference counting.

Here is another link to some Android native Binder IPC example code.

Update:
The Android messaging and concurrency framework is now named Mindroid and is hosted on GitHub.

Sunday, July 17, 2011

Build your own operating system

Ten years ago (around the year 2001) I wrote a simple x86 protected-mode OS just to find out how operating systems work. That was really great fun and I learned a lot about OS concepts and paradigms (and also a lot about computer hardware, programming, software architecture, data structures etc.). Ever since, the experience and knowledge gathered by building my own operating system have been extremely valuable.

I began by reading Andrew Tanenbaum's book "Operating Systems: Design and Implementation" and studied a lot of Minix source code along the way. Later I also read his other books about modern operating systems and structured computer organization.
To get started with this challenging task, one needs profound knowledge of the memory management unit (MMU) of the x86 CPU and its protected mode. The best tutorial on the web that I know of is the "Protected Mode Tutorial" by Jens Hohmuth of the Westsächsische Hochschule Zwickau. Sadly, this tutorial is only available in German. If you do not speak German you may try to translate it with Google Translate. The best parts of the tutorial are its great assembly language examples for Borland's TASM and Microsoft's MASM. I typed in all the examples by hand and played with them until I understood every single line of code :-). Also helpful for understanding the MMU and the protected mode of the x86 CPU are Intel's Architecture Software Developer's Manual and Hans-Peter Messmer's book The Indispensable PC Hardware Book.

The protected-mode tutorial starts by explaining how virtual addresses are mapped to physical addresses using the x86 MMU's two-level page table. From this you will understand how a modern OS implements process address spaces and how it protects processes from each other. You will also get the idea of how the OS lays out executables and shared libraries within the process address space and how it creates shared memory regions.
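Here is a minimal C++ sketch of this two-level lookup for classic 4 KiB pages (the function name and types are made up for illustration; it ignores the present/access flag checks and assumes the page tables can be read directly at their physical addresses):

#include <stdint.h>

// A 32-bit virtual address splits into a 10-bit page directory index,
// a 10-bit page table index and a 12-bit offset into the page.
uint32_t virtualToPhysical(const uint32_t* pageDirectory,
    uint32_t virtualAddress) {
    uint32_t directoryIndex = (virtualAddress >> 22) & 0x3FF;
    uint32_t tableIndex = (virtualAddress >> 12) & 0x3FF;
    uint32_t offset = virtualAddress & 0xFFF;

    // A page directory entry stores the physical base address of a
    // page table in its upper 20 bits; the lower 12 bits hold flags.
    uint32_t directoryEntry = pageDirectory[directoryIndex];
    const uint32_t* pageTable =
        (const uint32_t*) (directoryEntry & 0xFFFFF000);

    // A page table entry stores the physical base address of a
    // 4 KiB page frame, again in its upper 20 bits.
    uint32_t pageFrame = pageTable[tableIndex] & 0xFFFFF000;

    return pageFrame | offset;
}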
Two core abstractions of every modern OS are processes and threads. The multitasking chapter of the tutorial gives you some insights into how an OS may implement them for the x86 CPU. To this end, the examples of the tutorial use segment descriptors and task state segments. The Global Descriptor Table (GDT) contains segment descriptors for the code, data and stack segments of a process. A Task State Segment (TSS) is used by the examples to implement a process (with one thread) on the x86 CPU. Each process or thread has a context which includes the code, data and stack segments, its program counter, a reference to its page table, etc.
Be aware that modern OSes do not use disjoint segments anymore. They use a flat memory model where all x86 segments refer to the same virtual address space. The virtual address space is built using the MMU's two-level page table as explained above. Also be aware that Linux and Windows, for example, do not do hardware context switches using different task state segments. Instead they create just one TSS per CPU and perform the context switches themselves in software (see [1] and [2]). But if you understand the tutorial examples you will also get what Linux and Windows are doing and how they are doing it.
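To make the descriptor format just mentioned more concrete, here is a hedged C++ sketch of the 8-byte x86 segment descriptor layout as stored in the GDT (the field names are my own):

#include <stdint.h>

// x86 segment descriptor: base and limit are scattered across several
// fields for historical (80286 compatibility) reasons.
struct SegmentDescriptor {
    uint16_t limitLow;    // limit bits 0..15
    uint16_t baseLow;     // base bits 0..15
    uint8_t  baseMiddle;  // base bits 16..23
    uint8_t  access;      // present bit, privilege level, segment type
    uint8_t  granularity; // limit bits 16..19 plus flags
    uint8_t  baseHigh;    // base bits 24..31
} __attribute__((packed));

// In a flat memory model every segment spans the whole 4 GiB address
// space: base 0 and limit 0xFFFFF with 4 KiB granularity.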
A modern OS kernel securely offers its features to user-space programs via the system call interface. Since the system calls are the only valid entry points into the kernel for user-space programs, they can only be invoked through special CPU mechanisms. Depending on the CPU architecture this may be done via call gates, special assembler instructions like syscall, or software interrupts.
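As an illustration, here is a small sketch of how a user-space program on 32-bit x86 Linux can enter the kernel through software interrupt 0x80 (on that ABI the syscall number for write is 4; compile with gcc -m32):

#include <stdint.h>

// Invoke the write system call directly: the syscall number goes into
// eax, the file descriptor, buffer and count into ebx, ecx and edx.
static int32_t sysWrite(int32_t fd, const char* buffer, int32_t count) {
    int32_t result;
    asm volatile("int $0x80"
        : "=a"(result)
        : "a"(4), "b"(fd), "c"(buffer), "d"(count));
    return result;
}

int main() {
    sysWrite(1, "Hello, kernel!\n", 15);
    return 0;
}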
Monolithic operating systems usually have a lot of system calls to offer many features to user-mode programs, while microkernel operating systems normally have only a minimal set of system calls. Hence, microkernel operating systems implement many features in supplementary user-mode OS services and provide them to user-mode programs via inter-process communication (IPC) mechanisms.
Interrupts and exceptions are managed and handled on the x86 CPU using the Interrupt Descriptor Table (IDT). E.g. the timer interrupt periodically triggers the scheduler to switch from one process to another to achieve true multitasking. Another example is the page-fault exception, which is the starting point for the paging mechanism available in every modern desktop or server OS. This exception is raised every time a program tries to access a memory page that is currently swapped out to disk. The OS handles it by mapping the missing memory page back into the process address space before resuming the process.
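The page-fault handling path could look roughly like this minimal sketch (all types and helper functions are hypothetical placeholders for what a real kernel provides):

#include <stdint.h>
#include <stddef.h>

struct Process;
struct Page;

// Hypothetical kernel helpers.
Page* findPageOnDisk(Process* process, uint32_t virtualAddress);
void* allocatePageFrame();
void readPageFromDisk(Page* page, void* frame);
void mapIntoAddressSpace(Process* process, uint32_t virtualAddress,
    void* frame);
void terminate(Process* process);
void resume(Process* process);

void onPageFault(Process* process, uint32_t faultingAddress) {
    Page* page = findPageOnDisk(process, faultingAddress);
    if (page == NULL) {
        terminate(process); // invalid access, e.g. SIGSEGV on Unix
        return;
    }
    void* frame = allocatePageFrame(); // may first swap out another page
    readPageFromDisk(page, frame);     // swap the missing page back in
    mapIntoAddressSpace(process, faultingAddress, frame);
    resume(process);                   // re-execute the faulting instruction
}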

The protected-mode tutorial closes with a really great multitasking example where four processes draw pretty cool animations onto the display in VGA graphics mode (click here for the source code). Here is what this example x86 protected-mode OS looks like when run from the MS-DOS mode of Windows 98 SE:

With this profound knowledge of the x86 protected mode you should be able to start writing your own hobby OS, or to understand what is going on under the hood of a Linux, Mac OS X, Windows or QNX Neutrino operating system.
I have put all the source code from the protected-mode tutorial and a simple boot sector (bootloader) program into my blog repository.
Modern operating systems offer lots of other features we haven't touched on yet, like dynamic memory management, dynamic linking, file systems, networking protocol stacks, etc. But once you have understood and implemented the core OS concepts you can evolve your OS step by step, or you may start developing code for one of the well-known operating systems.

To dig deep into modern OS design and architecture I suggest the QNX Neutrino OS Architecture Guide. It provides a lot of insights into microkernel operating systems, IPC, the process manager, dynamic linking, resource managers and device drivers, networking, priority inheritance and much more interesting stuff. You may also take a look at Microsoft's Singularity research OS.
If you want to get familiar with the Linux operating system I recommend the books Professional Linux Kernel Architecture by Wolfgang Mauerer and Linux Kernel Development by Robert Love.
For Mac OS X enthusiasts I would suggest Mac OS X Internals by Amit Singh. People who are interested in Windows might like Windows Internals from Mark Russinovich and David A. Solomon.
You should also take a look at Google's Android operating system. It has a neat and scalable architecture in which a monolithic Linux kernel is extended by a really nice inter-process communication framework (Binder IPC) and a software component framework. Above the Linux kernel, Android looks very much like a microkernel OS and resembles the QNX Neutrino RTOS. Another very interesting microkernel OS, likewise similar to QNX Neutrino, is Miray's Symobi. I worked at Miray Software on the Symobi project some years ago, where I built the networking protocol stacks and the USB device management OS services from scratch, so I really like the Symobi OS.

Oh, by the way, you now also know how the two OS security concepts Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR) work. DEP is a feature of the x86 MMU where each page table entry has an additional NX bit, e.g. to mark stack and data pages as non-executable.
ASLR randomizes the layout of executable programs and shared libraries within the process address space.


Some more links about operating systems and related topics:
My Blog Repository
OSDev.org
Assembler Programming Topics
"Get into protected mode, enable paging, do something useful, get out again" by David Lindauer
Protected Mode Basics by Robert Collins
The LOADALL Instruction by Robert Collins
A Memory Allocator by Doug Lea
Doing a Kernel in C++
Making plain binary files using a C compiler (i386+)
Skelix OS Tutorial (Protected Mode)
Intel Architecture Software Developer’s Manual
i386 Atomic Operations
Semaphores
Mixing Assembly and C
NASM Docs
Writing a Kernel in C
Writing Your Own Toy OS (Part I)
Writing Your Own Toy OS (Part II)
Writing Your Own Toy OS (Part III)
BareMetal OS
MMURTL aka Developing Your own 32-Bit Operating System by Richard A. Burgess
Miray Symobi
Can We Make Operating Systems Reliable and Secure? by Andrew S. Tanenbaum
Executable and Linkable (ELF) File Format
Portable Executable (PE) File Format
ARM Architecture Reference Manual
ARM Processor MMU

Friday, July 1, 2011

TED Talks: How great leaders inspire action

The author of the open letter to the BlackBerry bosses linked to this superb video about how great leaders inspire action.

See also: The surprising truth about what motivates us.

Purpose (in The surprising truth about what motivates us) == Why are you doing what you do (in How great leaders inspire action).

Saturday, June 11, 2011

Hidden Champions

Hidden Champions are relatively unknown small or medium-sized companies that are nevertheless market leaders in their markets. Here are a (very) few excerpts from the book "Hidden Champions des 21. Jahrhunderts: Die Erfolgsstrategien unbekannter Weltmarktführer" (Hidden Champions of the 21st Century: The Success Strategies of Unknown World Market Leaders) by Hermann Simon:


Hidden Champions (in the field of software development):
Growth and market leadership
  • Ambitious goals are the foundation on which great successes are built.
  • Hidden Champions not only know what they want, they also have the willpower and energy, sometimes even the obsession, to turn their goals into action.
  • The dedication and perseverance this requires, not a lack of market opportunities, are the true limiting factors of growth in our affluent society and in many companies.
  • "If you want to build a ship, don't drum up people to procure wood, hand out tasks and divide up the work, but teach them to long for the vast and endless sea."
  • Vision is what is just barely feasible!
  • The goals are set for the long term, oriented toward generations rather than quarters.
  • I imagine the diversified corporation of the future as a group of companies that share very few key resources (e.g. finance, management development, brand) but otherwise let their units operate like independent Hidden Champions.
Market and focus
  • Hidden Champions usually define their markets narrowly and build strong market positions within them.
  • Ordinary people and companies that want to achieve something can only be urgently advised to concentrate on one field. "Only that, but that done properly."
  • Only by focusing your resources will you realize ambitious goals.
Globalization
  • Globalization is the second pillar of the Hidden Champions' strategy. It makes the narrow markets large.
  • Nothing will change the world more in our time and in the coming decades than globalization.
Customers and offerings
  • The close and interactive customer relationship is often a consequence of the complex products that are typical of Hidden Champions.
  • A project is run like a small, relatively autonomous company.
  • The Hidden Champions teach us that customer orientation is more important than competitor orientation.
Innovation
  • In the long run, innovation is the only effective means of holding one's own in the competition. Innovation is first and foremost a matter of creativity and quality, by no means merely a matter of money.
  • Vanities and power games between departments are not allowed.
Competition
  • Simultaneously fulfilling the three criteria "important to the customer", "actually perceived by the customer" and "durable/not easily imitable" poses a great challenge.
  • The markets of the Hidden Champions are predominantly oligopolistic.
Financing, organization and environment
  • Self-financing is and remains the most important source of funding.
  • The functional organization is the natural organizational form for many Hidden Champions.
  • The lean staffing of the Hidden Champions acts as a considerable simplifier of processes.
  • Optimal depth of value creation: behind the Hidden Champions' preference for doing things themselves lies a deep, general truth: uniqueness and competitive superiority can only be created internally. Anything bought on the open market is also available to others and therefore does not establish a special position.
  • Uniqueness therefore requires depth and a certain restraint toward outsourcing.
  • Hidden Champions respond to expansions of their business fields consistently and quickly by decentralizing those areas. Only small, decentralized units guarantee optimal closeness to the customer and thus form a basis for successfully implementing the Hidden Champions strategy.
Employees
  • High performance can only be achieved with a team that is strongly motivated and identifies with the company.
  • High performance requires intolerance of shirking and parting early with employees who do not pull their weight.
  • The Hidden Champions make sure that they have "more work than heads". This condition minimizes unproductive activities and busywork and proves to be an extremely effective driver of productivity. The reverse holds as well: overstaffing acts as a productivity killer and a driver of dissatisfaction.
  • Because of their growth, the Hidden Champions are great job creators at home and abroad.
  • People are less and less willing to work for money alone. They increasingly look for meaning, fun and the fulfillment of higher goals and values in their work. See also "The surprising truth about what motivates us".
  • Selecting the right employees is more important than all organizational, process and training measures.
Leadership
  • The entrepreneurs of the build-up phase are characterized by a unity of person and task, by focused determination, fearlessness, vitality and stamina, and by inspiring others.
  • Company founders live what they are and what they want to be. This attitude toward work means that money is not the main driving force of these people. Their main motivation results from identification with the company and from the satisfaction their work provides; economic success is likely to play a secondary role by comparison. Their complete dedication and sense of responsibility lend such leaders enormous credibility with employees and customers.
  • "They exemplify to me the importance of being single-minded. The single-minded ones, the monomaniacs, are the only true achievers. The rest, the ones like me, may have more fun, but they fritter themselves away. The achievers carry out a mission; the rest of us have interests. Whenever anything is being accomplished, it is being done by a monomaniac with a mission." (Peter Drucker)
  • "Nothing energizes an individual or a company more than clear goals and a grand purpose."
  • The leadership styles of the Hidden Champions are ambivalent: leadership is both authoritarian and participative.
  • To create a world market leader you need many fellow campaigners - from all over the world. The entrepreneur must kindle the fire that burns within him in many people of different cultures. That is leadership!
  • The business leaders of the Hidden Champions do not have the same inhibitions and fears that normal people feel. They can therefore apply their abilities more effectively.
  • Hidden Champions presumably correct mistakes that have occurred faster and more decisively than other companies.
  • The task of management is to place itself between revenue and costs and to make sure that the two keep their distance from each other.
Strategy development
  • Strategy development is not a one-off decision but a process that should run both top-down and bottom-up. Only creativity, originality and lateral thinking produce superior strategies.
  • "Find out what everybody else is doing, then do it differently."


"Die Hidden Champions beweisen, dass Management auch im 21. Jahrhundert vor allem dem gesunden Menschenverstand folgt. Indem man viele kleine Dinge etwas besser tut, kann man sogar Weltmarktführer werden." (Reinhold Würth)

While reading this fascinating book I identified some further Hidden Champions, such as Lauterbach GmbH and QNX Software Systems.
I also find that Apple exhibits very many characteristics of a Hidden Champion and thus counts among the Big Champions.
Google, too, apparently runs the Android project like a small, relatively autonomous company, as described under "Customers and offerings".

Saturday, May 7, 2011

Build your own internet search engine - Part 2

After starting to build my own internet search engine as described in a previous blog post, I have now read some papers and books about web search engine architecture and information retrieval in order to complete my hobby project. Here is a list of papers and books that I highly recommend to anybody who is interested in this topic:

1. Google: data structures and algorithms by Petteri Huuhka
2. The Anatomy of a Large-Scale Hypertextual Web Search Engine by the Google founders Sergey Brin and Lawrence Page
3. Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
4. Effect of inverted index partitioning schemes on performance of query processing in parallel text retrieval systems by B. Barla Cambazoglu, Aytul Catal and Cevdet Aykanat
5. Distributed Web Crawling, Indexing, and Search by Ricardo Baeza-Yates and B. Barla Cambazoglu
6. Web Search for a Planet by Luiz André Barroso, Jeffrey Dean and Urs Hölzle
7. Building a Search Engine by David Evans and Sebastian Thrun

As described in my previous blog post, I built the whole search engine using Erlang technologies. This worked out extremely well for the search bots. But using CouchDB to store all the web documents, the forward search index and the inverted search index was a bad idea.
A NoSQL database like CouchDB is great for building a web store like Amazon's, but it is definitely not well suited to building a highly scalable web search engine.
The problem is that CouchDB is simply not specialized enough for this task. For every search query you have to look at millions of documents and rank them appropriately, and if you use something like CouchDB (which has a JSON interface) you need too many resources of everything (CPU time, memory and network bandwidth) while merging and ranking the documents for multiple search keywords. Now I know that :-).

So I have to remove CouchDB from my search engine software stack and implement the required data structures and algorithms myself, just as explained in [1] and [2].
One extremely important point is to have compact data structures for storing the web documents, the lexicon, the forward index and the inverted index, because you have to keep a lot of these data structures in memory for efficiency reasons. Compactness also makes merging the documents of multiple search keywords by DocID much easier and faster.
My data structures will be similar to those in [1] and [2]. There will be an inverted search index sorted by WordID (keyword). For every keyword the inverted index contains a list of matching documents sorted by DocID. The lexicon contains an entry for every searchable word and links to the corresponding list of documents in the inverted index. The lexicon and inverted index are generated from the web documents using a MapReduce framework.
If the inverted index is queried for the keywords "earth" and "energy", the lexicon is first asked for the two lists of documents containing these words. Then these two lists are merged, mergesort-style (see the sketch below). The merge phase generates a new temporary search result index that is sorted by the rank of the contained documents. E.g. when a document (DocID) is included in both lists it gets a higher rank in the temporary search result index for that search query, so it may appear further ahead in the list of search results than documents that match only a single keyword. Besides the number of matching keywords, other information such as keyword proximity is also used for calculating the final ranking.
The temporary index for the search keywords is then used to generate the search results page and therefore does not need to be larger than 1000 documents. It may also make sense to cache the temporary index of a search query for some minutes.
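Here is a minimal C++ sketch of that merge step for two keywords (the Posting and Result types, the boost constant and the simple additive scoring are made up for illustration; the real ranking would also factor in keyword proximity):

#include <stdint.h>
#include <stddef.h>
#include <algorithm>
#include <vector>

struct Posting { uint64_t docId; double rank; };  // inverted index entry
struct Result { uint64_t docId; double score; };  // temporary index entry

static const double BOTH_KEYWORDS_BOOST = 10.0; // hypothetical boost value

static bool byScoreDescending(const Result& x, const Result& y) {
    return x.score > y.score;
}

static void addResult(std::vector<Result>& results, uint64_t docId,
    double score) {
    Result result = { docId, score };
    results.push_back(result);
}

// Merges two DocID-sorted posting lists into a temporary search result
// index; documents that contain both keywords get a higher score.
std::vector<Result> mergePostingLists(const std::vector<Posting>& a,
    const std::vector<Posting>& b) {
    std::vector<Result> results;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].docId < b[j].docId) {
            addResult(results, a[i].docId, a[i].rank); i++;
        } else if (b[j].docId < a[i].docId) {
            addResult(results, b[j].docId, b[j].rank); j++;
        } else { // the document matches both keywords
            addResult(results, a[i].docId,
                a[i].rank + b[j].rank + BOTH_KEYWORDS_BOOST);
            i++; j++;
        }
    }
    while (i < a.size()) { addResult(results, a[i].docId, a[i].rank); i++; }
    while (j < b.size()) { addResult(results, b[j].docId, b[j].rank); j++; }

    // Sort the temporary index by score, best matches first, and keep
    // only the top 1000 documents for generating the results page.
    std::sort(results.begin(), results.end(), byScoreDescending);
    if (results.size() > 1000) results.resize(1000);
    return results;
}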

Now I have put together all the data structures and algorithms needed to build a working web search engine. However, to build a highly scalable and fast search engine I have to distribute the lexicon and the inverted search index across multiple computers, so that each computer holds a part of the lexicon and of the inverted search index. To achieve this, one may do either a term-based or a document-based partitioning of the inverted index, as described in [3], [4] and [5]. I will use the document-based partitioning approach. The overall search result quality must not suffer from partitioning the inverted search index, which makes the partitioning algorithm a little tricky.
With a distributed inverted search index, a single search query is performed on multiple computers simultaneously. E.g. when the complete inverted index is distributed across thousands of computers, one search query may be executed on hundreds of them. Each of these computers is able to perform the search query on its local part of the inverted index (as explained above) very quickly. The temporary search result index (e.g. containing the top-ranked 1000 local documents) of each worker computer is then sent to a master computer, which requires only minimal network bandwidth. The master computer merges the temporary search result indexes of all worker computers participating in that search query and generates the overall list of best matching documents. This architecture makes the search engine very fast and fault tolerant.
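On the master computer the per-worker top lists can be merged with a heap, as in this hedged C++ sketch (the types and names are again invented; the real system would of course receive the worker results over the network):

#include <stdint.h>
#include <stddef.h>
#include <queue>
#include <vector>

struct Result { uint64_t docId; double score; }; // as sent by a worker

// Head of one worker's result list inside the merge heap.
struct Head {
    double score;
    size_t worker;   // index of the worker's list
    size_t position; // position within that list
};

static bool operator<(const Head& x, const Head& y) {
    return x.score < y.score; // std::priority_queue is a max-heap
}

// Merges the per-worker result lists (each sorted by score, best
// first) into the overall list of the k best matching documents.
std::vector<Result> mergeWorkerResults(
    const std::vector<std::vector<Result> >& workerResults, size_t k) {
    std::priority_queue<Head> heap;
    for (size_t w = 0; w < workerResults.size(); w++) {
        if (!workerResults[w].empty()) {
            Head head = { workerResults[w][0].score, w, 0 };
            heap.push(head);
        }
    }
    std::vector<Result> merged;
    while (!heap.empty() && merged.size() < k) {
        Head head = heap.top();
        heap.pop();
        merged.push_back(workerResults[head.worker][head.position]);
        size_t next = head.position + 1;
        if (next < workerResults[head.worker].size()) {
            Head nextHead = { workerResults[head.worker][next].score,
                head.worker, next };
            heap.push(nextHead);
        }
    }
    return merged;
}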

What is really interesting is that processing a single search query becomes highly concurrent within the search engine backend, in order to achieve low response times and utilize the available hardware resources efficiently.
I found a website that quotes Marissa Mayer saying that a single search query on Google is performed by up to 1000 computers. This website also contains Marissa Mayer's Google I/O 2008 keynote about "How Google Works", which gives some interesting insights into the Google search engine.

One interesting question that comes to my mind is whether one could save energy by using Erlang technologies instead of Python and C++ for the search engine backend. Of course Erlang will not help to save energy on the I/O-bound tasks of the search engine backend. But maybe with Erlang technologies one could achieve the degree of distribution and concurrency needed to run the internet search engine backend with fewer computers and therefore less energy. I really don't know whether that is possible, but it would be nice to try it out...

Thursday, April 14, 2011

Scaling the Social Graph: Infrastructure at Facebook

There was a really interesting talk about Facebook's infrastructure at InfoQ some days ago. Jason Sobel presented the evolution of Facebook’s infrastructure over time, from the original LAMP stack to the present multi-datacenter configuration, the challenges faced and plans for the future.

Scaling the Social Graph: Infrastructure at Facebook @ InfoQ

The most interesting part of the talk is about Facebook's fbobj and assoc abstractions. Facebook places all information in Facebook objects (fbobj) that have IDs and then interlinks them using typed associations (assoc). E.g. there are associations (typed links) to friends, events, photos, etc. That is really great for doing queries. I think HTML <a href> links should also be extended to allow for types and maybe properties. This would help a lot in building the semantic web!
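A toy C++ sketch of that object/association model might look as follows (all names are my own invention, just to illustrate the idea of typed links):

#include <stdint.h>
#include <stddef.h>
#include <map>
#include <string>
#include <vector>

// Every piece of information is an object with an ID...
struct FbObject {
    uint64_t id;
    std::map<std::string, std::string> properties;
};

// ...and objects are interlinked by typed associations.
struct Association {
    uint64_t sourceId;
    uint64_t targetId;
    std::string type; // e.g. "friend", "event", "photo"
};

// Queries all objects linked to sourceId by associations of the given
// type, e.g. all friends of a user.
std::vector<uint64_t> queryAssociations(
    const std::vector<Association>& associations,
    uint64_t sourceId, const std::string& type) {
    std::vector<uint64_t> targets;
    for (size_t i = 0; i < associations.size(); i++) {
        if (associations[i].sourceId == sourceId &&
            associations[i].type == type) {
            targets.push_back(associations[i].targetId);
        }
    }
    return targets;
}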

Sunday, April 3, 2011

Build your own internet search engine

If you are interested in how to really build a web search engine, I suggest reading the second part of this article ("Build your own internet search engine - Part 2") and the section about Apache's search engine software stack at the end of this article.
The CouchDB attempt at the web search engine backend didn't work out, but I nevertheless think this article is quite interesting :-).

A few weeks ago I started building my own (naive) internet search engine using Erlang technologies. I chose Erlang projects for this because I think they are perfectly suited for internet backend systems. Now I am stuck at ranking the search results and will have to read some papers about that before going on :-). Still, up to this point everything has worked out extremely well.
The first part of the puzzle was to build the search bots that bring home the websites for generating the search index. The search bots were built using the Erlang OTP, ibrowse, mochiweb, mochiweb_xpath and couchbeam projects.
The search engine starts by sending out the first search bot to some website, e.g. http://www.nytimes.com. A search bot downloads a website, forks off new search bots for any links that are found on it and then processes the website. After processing a website, each search bot creates (or updates) a CouchDB document that represents the original website along with some keywords, a page rank, etc. This process is repeated over and over again by each search bot (and for each website). You may imagine that the whole thing gets massively parallel in a very short time.
This massive parallelism caused my home router some headaches because it was only able to handle a few thousand concurrent HTTP connections. So I limited the concurrency using an Erlang supervisor process. Maybe I will try out Amazon's EC2 in the future; I'm pretty sure it will perform better in this respect :-).
The next part of the puzzle was to build the search index from the CouchDB documents that were brought home by the search bots. This is done using a CouchDB design document. Here is a simplified design document that shows how to generate the search index:

"map": "function(doc) {
    var tokens;
    if (doc.keywords) {
        tokens = doc.keywords.split(/[^A-Z0-9_]+/i);
        tokens.map(function(token) {
            for (i = 1; i <= token.length; i += 1) {
             emit(token.slice(0, i), doc);
            }
        });
    }
}"
"reduce": "function(keys, values, rereduce) {
    var output = {};
    if (!rereduce) {
        for (var i in values) {
            output[i] = values[i].url;
        }
    }
    return output;
}"

Now I was able to query the search index using HTTP requests. For example, the following HTTP POST request queries the search index for the keywords "earth" and "energy". As a result you get links to all documents that match these keywords.

curl -X POST http://127.0.0.1:5984/search_index/_design/search_queries/_view/query?group=true -d '{"keys": ["earth", "energy"]}' -H "Content-Type: application/json"

At that point I got stuck due to insufficient knowledge about how to appropriately merge and rank the documents retrieved from the CouchDB inverted search index. But exactly this ranking is the crux of a good search engine. My idea is to sort the suggested websites first by the number of matching keywords and second by a page rank derived from Albert-Laszlo Barabasi's book "Linked": the more links refer to a specific website, the higher that website's page rank.

The web interface for the search engine will simply be a little Couch app.

One thing I have learned from this hobby project so far is that scalability really means specialization when you build huge systems like search engines. That is exactly what CouchDB does in order to be fast and scalable. I am sure the same is true for Google's search engine infrastructure.

The story continues in this blog post containing part II on the topic.


Update: The Apache search engine software stack
Before going on with the Erlang approach to building a search engine, I have now looked at the Apache software stack for building search engines. It looks pretty complete and scalable. The information below is taken from Apache's project websites.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. It includes HDFS, a distributed file system that provides high-throughput access to application data. It also provides a MapReduce software framework for distributed processing of large data sets on compute clusters. The MapReduce framework can work on top of HDFS. Check this link for a Hadoop MapReduce example.
Hadoop also contains the Hive framework, which provides a mechanism to project structure onto data and to query the data using a SQL-like language called HiveQL. It does so by transforming HiveQL statements into jobs for Hadoop's MapReduce framework. Hive was developed by Facebook.

Apache Nutch is a highly scalable and relatively feature-rich (web) crawler. It contains search bots and other tools. Nutch offers features like politeness (it obeys robots.txt rules), robustness and scalability (Nutch runs on Apache Hadoop, so you can run it on a single machine or on a cluster of 100 machines), quality (you can bias the crawling to fetch "important" pages first) and extensibility. One of the most important features Nutch provides out of the box is a link database: Nutch tracks the links between pages, so the relevancy of search results within a collection of interlinked documents goes well beyond the naive case of indexing documents without link information and anchor texts.

Apache Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.

When combining Nutch with Solr, Solr is used as the only source for serving search results (including snippets). This way you can totally decouple your search application from Nutch and still use Nutch where it is at its best: crawling and extracting content. Using Solr as the search backend, on the other hand, allows you to use all of the advanced features of a Solr server, like query spell checking, "more like this" suggestions, data replication, easy query-time relevancy tuning, etc.
So Nutch collects the data and Solr serves it via its search index.
See also this blog entry by Sami Siren for more details.

Wednesday, March 23, 2011

The Music of Life

What is Life? This is the question asked by Denis Noble in this very personal and at times deeply lyrical book. Noble, a renowned physiologist and pioneer of the field of systems biology, argues that we must look beyond the reductionist gene's-eye view of life to answer the question. The genome is not life itself. To understand what life is, we must make a radical switch of perception and view it at a variety of different levels, with interaction and feedback between gene, cell, organ, system, body and environment. Life emerges as a process, no more and no less than the ebb and flow of activity in this intricate web of connections. This, Noble argues, is the music of life. [Noble, The Music of Life]

Monday, March 21, 2011

The Evolution of the Erlang VM

The Evolution of the Erlang VM @ InfoQ
A really really great talk about the internals of the Erlang VM and how it has evolved over time.
That's Joe Armstrong all over :-).