Quick Sort, oh Quick Sort! Perhaps one of the harder algorithms to get right, and boy was that my experience. The first stumbling block I hit while trying to understand how it works was that there are different implementations of it. Some online resources choose the pivot from the left (array[0]), some from the extreme right (array[array.length - 1]), and some sample randomly from the array. Nonetheless, what a truly enjoyable process, so let’s begin.

Quick Sort is an in-place algorithm, meaning the input array is transformed using no auxiliary data structure. Said differently, memory is overwritten: the input space itself is overwritten. Another characteristic of Quick Sort is that it’s a divide-and-conquer algorithm. It breaks the problem space into smaller and smaller chunks, applying the same algorithm at each step to get closer and closer to the solution. Quick Sort is more efficient than Bubble Sort and Insertion Sort (in most cases).

Quick Sort is the simple process of ordering an array. Its method involves three steps.

  1. Partition the array.
  2. Recurse on the left sub-array to sort it.
  3. Recurse on the right sub-array to sort it.

Some added terminology is that of the “pivot”. This is a value we choose in the array that breaks the array down into two parts (partitioning). Part A (the left sub-array) holds the values that are less than the pivot. Part B (the right sub-array) holds the values greater than or equal to the pivot. As an example:

[1, 21, 33, 14, 65, 6, 16 ]

If we selected 16 as our pivot, then the numbers partitioned would look like this:

1, 14, 6 | 16 | 21, 33, 65 # Note values on left smaller, and values on right greater than 16.

With a little more clarity:

irb(main):005:0> test = [1, 21, 33, 14, 65, 6]
=> [1, 21, 33, 14, 65, 6]
irb(main):006:0> test.partition { |x| x < 16 }
=> [[1, 14, 6], [21, 33, 65]]

Now, if you were paying attention, you will have realised that we have just completed step 1, partitioning. The approach I’ve taken is to use the last element as the pivot, though as I said there are different and more efficient strategies online.

Steps 2 and 3 are basically recursive calls, partitioning again and again until there are no more values to partition. If you’re still struggling with the concept, have a look at this absolute gem of a resource where you can literally and visually walk through the Quick Sort process. The code will instantly make sense with this, and you can also walk the code through the diagram above.
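To make that concrete, here is a worked trace of the recursion on the earlier example with pivot 16 (my own trace, following the last-element pivot approach used in the code below):

# quicksort([1, 21, 33, 14, 65, 6, 16])
#   pivot 16 -> partition the rest: left [1, 14, 6], right [21, 33, 65]
#   quicksort([1, 14, 6])   -> pivot 6  -> [1] + [6] + [14]     => [1, 6, 14]
#   quicksort([21, 33, 65]) -> pivot 65 -> [21, 33] + [65] + [] => [21, 33, 65]
#   result: [1, 6, 14] + [16] + [21, 33, 65] => [1, 6, 14, 16, 21, 33, 65]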

Rspec:

describe Sorting::QuickSort do
  before :each do
    #@array = [ 3, 1, 4, 1, 5, 9, 2, 6, 5, 3 ]
    @array = [ 3, 7, 8, 5, 2, 1, 9, 5, 4 ]
    @qs = Sorting::QuickSort.new
  end

  context 'quick sort' do
    it 'should return an ordered array' do
      expect(@qs.quicksort(@array)).to eq [1, 2, 3, 4, 5, 5, 7, 8, 9]
    end
  end
end

Code:

# Amazing piece of code, tweaked a little
# Thank you - http://rosettacode.org/wiki/Sorting_algorithms/Quicksort#Ruby
# 
module Sorting
  class QuickSort
    def quicksort(array)
      return array if array.length <= 1 # empty or single element: already sorted
      pivot = array[array.length - 1]   # last element as the pivot
      left, right = array[0..-2].partition { |x| x < pivot }
      quicksort(left) + [pivot] + quicksort(right)
    end
  end
end
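One caveat worth flagging: this Ruby version trades the in-place property mentioned earlier for brevity, since partition and array concatenation each allocate new arrays. For the curious, a minimal sketch of a genuinely in-place variant, using the same last-element pivot choice (the Lomuto partition scheme; my own illustration, not the Rosetta Code version), might look like this:

module Sorting
  class InPlaceQuickSort
    # Sorts the array in place by swapping elements, recursing on index
    # ranges instead of allocating sub-arrays.
    def quicksort(array, low = 0, high = array.length - 1)
      return array if low >= high
      p = partition(array, low, high)
      quicksort(array, low, p - 1)
      quicksort(array, p + 1, high)
      array
    end

    private

    # Lomuto partition: uses the last element of the range as the pivot
    # and returns the pivot's final index.
    def partition(array, low, high)
      pivot = array[high]
      i = low
      (low...high).each do |j|
        if array[j] < pivot
          array[i], array[j] = array[j], array[i]
          i += 1
        end
      end
      array[i], array[high] = array[high], array[i]
      i
    end
  end
end

Run against the spec’s input, Sorting::InPlaceQuickSort.new.quicksort([3, 7, 8, 5, 2, 1, 9, 5, 4]) returns [1, 2, 3, 4, 5, 5, 7, 8, 9] without creating intermediate arrays.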

This time we delve into the merry world of ‘Sorting’ algorithms. First up on the menu is Bubble Sort. By no means is this an efficient sorting algorithm, but it definitely is one of the easiest to translate from theory to code.

Definition: Bubble Sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list to be sorted, compares each pair of adjacent items and swaps them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. ~Wikipedia

To get a better understanding, consider an array of 10 elements, i.e. 3,0,1,8,7,2,5,4,6,9. One can clearly see that it is not in (ascending) order at all. Following the bubble sort approach, we compare each element with its neighbour to the right, and when the element on the right is smaller than the element on the left, we swap them. This is rinse and repeat until the sorting is complete. But not quite! There is an optimization: after every pass of comparing and swapping, the largest remaining value has bubbled to the extreme right, so the sorted suffix at the end of the array grows by one; call its boundary ‘k’. On each subsequent pass we shrink k by 1, ignoring the already-ordered values on the right-hand side that we no longer need to consider for swapping.
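Here is a quick worked trace of the first pass over that array (my own trace; the diagrams that follow show the full run):

# Pass 1 over [3, 0, 1, 8, 7, 2, 5, 4, 6, 9]:
#   compare (3, 0) -> swap -> [0, 3, 1, 8, ...]
#   compare (3, 1) -> swap -> [0, 1, 3, 8, ...]
#   compare (3, 8) -> no swap; then 8 keeps bubbling right past 7, 2, 5, 4, 6
# After pass 1: [0, 1, 3, 7, 2, 5, 4, 6, 8, 9] -- 9 (and now 8) sit in the
# sorted suffix, so the next pass stops one position earlier (k shrinks by 1).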

From “Start” to “Iteration X” you will see the flow of Bubble Sort as it makes its way. If you are still unclear, then I’ve done a terrible job of explaining something rather simple, and perhaps I can offer you this as a further explanation, which is tremendous fun.

In terms of code, I’ve used the algorithm in Christopher Fox’s Concise Notes, which is publicly available on the Internet. It’s elegant and just darn efficient.

Before we dive into the code: Bubble Sort, again, is not efficient, but for small input sizes it may be just the thing you need. By way of Big O Notation, this algorithm is Big O(n*n), or O(n²).

Rspec:

describe Sorting::BubbleSort do
  before :each do
    @some_array = [3,2,1]
    @some_other_array = [2,4,51,23,45,1,17,20]
    @bs = Sorting::BubbleSort.new
  end

  context 'sort an array of numbers' do 
    it 'should array of numbers' do
      expect(@bs.bubble_sort(@some_array)).to eq [1,2,3]
      expect(@bs.bubble_sort(@some_other_array)).to eq @some_other_array.sort
    end
  end
end

Code:

module Sorting
  class BubbleSort
    def bubble_sort(array)
      # k is the boundary of the unsorted region; values beyond it are in place
      (array.size - 1).downto(1).each do |k|
        1.upto(k).each do |n|
          # swap adjacent values that are out of order
          if array[n] < array[n - 1]
            array[n], array[n - 1] = array[n - 1], array[n]
          end
        end
      end
      array
    end
  end
end
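The Wikipedia definition above also mentions stopping once a pass needs no swaps; the implementation above always runs every pass regardless. A hedged sketch of that extra optimization (my own tweak, not from Fox’s notes) could look like this:

module Sorting
  class BubbleSort
    # Variant that stops as soon as a full pass makes no swaps, which
    # turns the best case (already-sorted input) into O(n).
    def bubble_sort_with_early_exit(array)
      (array.size - 1).downto(1).each do |k|
        swapped = false
        1.upto(k).each do |n|
          if array[n] < array[n - 1]
            array[n], array[n - 1] = array[n - 1], array[n]
            swapped = true
          end
        end
        return array unless swapped # no swaps means the array is sorted
      end
      array
    end
  end
end

On an already-sorted input this performs a single pass of n - 1 comparisons and returns immediately.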

Code as always can be found here.

In the first part of the topic of the Trie, we laid out the fundamentals of what it is and the semantics of the source code. But you can’t get the maximum benefit from the Trie without the functionality of searching.

The first implementation of a search will simply look for an exact match of the word in the trie. Consider searching for the word ‘aces’. To begin the search, we break the word down into individual characters and, in an indexed loop, check if our Hash contains the matching key. So our first character would be ‘a’, and we determine whether, at the highest level, there is a key ‘a’; which we can see is true from the diagram above. We then retrieve the child of ‘a’ and iterate in our loop to the next character in ‘aces’, which is ‘c’. As before, we determine if at that height there is a matching key ‘c’. Again, in light of the diagram, that proves true, and we rinse and repeat, finding ‘aces’ character by character. However, if in the process, instead of ‘e’ (in ‘aces’), we only find a key of ‘k’, that would mean the word does not exist, and we would return nil.

Here’s the code, with which this should make more sense.

private

def search_node(string)
  children = @root.children
  trienode = TrieNode.new
  string.split('').each do |char|
    if children.has_key?(char)
      trienode = children[char]
      children = trienode.children
    else
      return nil # mismatch: no such word or prefix in the trie
    end
  end
  trienode
end

We can also apply another form of search, ‘starts_with’, to find a matching prefix in a Trie of words. For example, if our Trie holds ‘aces’, ‘acer’ and ‘actor’, we may want to determine whether the prefix ‘act’ exists. The concept is exactly the same as the word search above, as you will see in the code below. I’ve included all the pieces, including the RSpec.

context 'search' do
    before :each do
      @trie.insert('ace') # initial prefix
      @trie.insert('acer')
      @trie.insert('aces')
      @trie.insert('actor')
    end

    it 'should determine if a word exists in the trie' do
      expect(@trie.search('aces')).to eq true
    end

    it 'should return false if the word does not exist in the trie' do
      expect(@trie.search('ant')).to eq false
    end

    it 'should determine if there is any word in the trie that starts with the given prefix' do
      expect(@trie.starts_with('act')).to eq true
    end
  end

The full code:

def search(word)
  trienode = search_node(word)
  return true if !trienode.nil? && trienode.is_leaf
  false
end

def starts_with(prefix)
  return false if search_node(prefix).nil?
  true
end
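Putting the pieces together, here is roughly how the two searches behave (a small sketch assuming the Trie class from the first part, with insert, search and the private search_node defined on it):

trie = Trees::Trie.new
%w[ace acer aces actor].each { |word| trie.insert(word) }

trie.search('aces')      # => true  (exact word; the final node is marked is_leaf)
trie.search('ant')       # => false (no 'n' child under 'a')
trie.search('ac')        # => false (the path exists, but 'c' is not a word ending)
trie.starts_with('act')  # => true  (the prefix path exists)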

Firstly, a “trie” is pronounced tree or, in some circles, “try”.

It’s a particularly interesting and often overlooked data structure. To put it into context, it is a tree, and it is known by a few other names: digital tree, radix tree or, as I like to think of it, prefix tree.

Definition:
It is an ordered tree structure that is used to store an associative array. An associative array in Ruby is simply a hash, {} or { ‘friend’ => ‘andre’ }. According to NIST it is a tree for storing strings in which there is one node for every common prefix.

A helpful mental picture for the benefit of a Trie is that of a ‘router’ for an MVC web framework. A router applies regular-expression-style matching to find the correct endpoint to route the request to.

What Does A Trie Structure Do?

So let’s begin by inserting the word ‘ace’. This will result in the tree below.


Now this is our base – the “prefix”. Let’s see what the picture looks like if we insert another word, this time ‘acer’. Immediately you can see there is no duplication, but a clear recognition of an already existing prefix and then an insert which now results in a new leaf node “r”.


To push our understanding even further, consider if we had to add the word ‘aces’. What do you think the Trie would look like?

Excellent, as you can see it detected a prior node, and created a new leaf node on the right.
This is basically a Trie. One can immediately see the efficiency here: there are three words consisting of 11 characters, of which we only store 5.
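To make that count concrete, here is a throwaway helper (not part of the Trie class, purely an illustration; it assumes the TrieNode children hash shown in the code later on) that tallies the stored nodes:

# Counts every node reachable below the given node.
def count_nodes(node)
  node.children.values.sum { |child| 1 + count_nodes(child) }
end

trie = Trees::Trie.new
%w[ace acer aces].each { |word| trie.insert(word) }
count_nodes(trie.root) # => 5 nodes ('a', 'c', 'e', 'r', 's') for 11 inserted characters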

Again, as always, the RSpec tests with accompanying code.

Rspec

# Pronounced "tree"
require 'byebug'

describe Trees::Trie do
  before :each do
    @trie = Trees::Trie.new
  end

  context 'initialization' do
    it 'should initialize with a new TrieNode' do
      expect(@trie.root.class).to eq Trees::Trie::TrieNode
    end

    it 'should initialize with a property isLeaf' do
      expect(@trie.root.respond_to?(:is_leaf)).to eq true
    end

    it 'should initialize with a property children' do
      expect(@trie.root.respond_to?(:children)).to eq true
    end

    it 'isLeaf should be boolean' do
      expect(@trie.root.is_leaf).to eq false
    end

    it 'children should be a hash' do
      expect(@trie.root.children.class).to eq Hash
    end
  end

  context 'insertion' do
    before :each do
      @trie.insert('ace') # initial prefix
      @trie.insert('acer')
      @trie.insert('aces')
      @trie.insert('actor')
    end
    it 'should insert a word' do
      # (a)
      #  |
      # (c)
      #  |
      # (e)
      expect(@trie.root.children.keys[0]).to eq 'a'
      expect(@trie.root.children['a'].children.keys[0]).to eq 'c'
      expect(@trie.root.children['a'].children['c'].children.keys[0]).to eq 'e'
    end

    it 'should only append if word already contains prefix' do
      #   (a)
      #    |
      #   (c)
      #    |
      #   (e)
      #   /
      # (r)
      expect(@trie.root.children['a'].children['c'].children['e'].children.keys[0]).to eq 'r'
    end

    it 'should add a second key if there is an existing key at same height' do
      #   (a)
      #    |
      #   (c)
      #    |
      #   (e)
      #   / \
      # (r) (s)
      expect(@trie.root.children['a'].children['c'].children['e'].children.keys[1]).to eq 's'
    end

    it 'should add a new word only where the prefix matches' do
      #   (a)
      #    |
      #   (c)
      #    |      \
      #   (e)    (t)
      #   / \     |
      # (r) (s)  (o)
      #           |
      #          (r)
      expect(@trie.root.children['a'].children['c'].children['t'].children['o'].children.keys[0]).to eq 'r'
    end
  end
end

Code:

module Trees
  class Trie

    attr_accessor :root

    class TrieNode
      attr_accessor :is_leaf, :children

      def initialize
        @children = Hash.new
        @is_leaf = false
      end
    end

    def initialize
      @root = TrieNode.new
    end

    def insert(word)
      children = @root.children

      word.split('').each_with_index do |char, index|
        if children.has_key?(char)
          trienode = children[char] # prefix already exists; reuse the node
        else
          trienode = TrieNode.new   # new branch for this character
          children[char] = trienode
        end

        children = trienode.children
        trienode.is_leaf = true if index == (word.length - 1) # mark end of word
      end
    end
  end
end


All code can be found here.


Based on our previous post, we are at a prime point to deal with InOrder and PostOrder traversals for Binary Trees.

The beauty of unlocking one of these Depth-first search methods is that you quickly see the pattern and recognize that cutting and pasting existing methods, with some minor re-adjustment of order, will result in InOrder and PostOrder traversals. Now what I am saying may sound a little abstract, but it will make more sense when you see the code. The important thing is not to recognize the pattern and memorize the restructuring of the code, but rather to understand what the ordering process of traversing the tree actually entails.

Definition: InOrder

  1. Traverse the left tree
  2. Display the value of root
  3. Traverse the right tree

Said differently, first visit the left sub-tree of the root, from the extreme left, and traverse all its children. Then visit the root node and get its value. Then visit the right sub-tree of the root, from the extreme left, and traverse all its children. By “extreme” I mean: keep going left until there is no left node anymore.

Seeing this visually, in relation to the tree (image) that features in this article, InOrder results in the following traversal:
m, b, p, k, t, d, a, g, c, f, h

Here is the code, and what you will immediately notice is that it’s exactly like PreOrder except for:

  • When the root node is visited (in the inorder method itself)
  • When each node’s value is read (in the private inorder_subtree method)

Unit Test:

it 'should return values in inorder' do
  expect(@bst.inorder(@bst)).to eq ['m', 'b', 'p', 'k', 't', 'd', 'a', 'g', 'c', 'f', 'h']
end

Implementation:

def inorder(node)
  result = []
  result << inorder_subtree(node.root.left)  # traverse left sub-tree
  result << @root.value                      # then read the root's value
  result << inorder_subtree(node.root.right) # traverse right sub-tree
  result.flatten
end

def inorder_subtree(node, result = [])
  inorder_subtree(node.left, result) if node.left != nil
  result << node.value # value read between the left and right traversals
  inorder_subtree(node.right, result) if node.right != nil
  result
end

Excellent.

Definition: PostOrder

  1. Traverse the left tree
  2. Traverse the right tree
  3. Display the value of root

I’m not going to go into much more detail than that, as the pattern should now be rather obvious. Here’s the code.

Unit Test:

it 'should return values in postorder' do
  expect(@bst.postorder(@bst)).to eq ['m', 'p', 't', 'k', 'b', 'g', 'a', 'h', 'f', 'c', 'd']
end

Implementation:

def postorder(node)
  result = []
  result << postorder_subtree(node.root.left)  # traverse left sub-tree
  result << postorder_subtree(node.root.right) # traverse right sub-tree
  result << @root.value                        # root's value is read last
  result.flatten
end

def postorder_subtree(node, result = [])
  postorder_subtree(node.left, result) if node.left != nil
  postorder_subtree(node.right, result) if node.right != nil
  result << node.value # value read after the left and right leaves are reached
  result
end

And there you have it.

Depth First

Today we are looking at Binary Tree traversal, that is, the strategy we will use to move up and down the tree to update or check the value at a node. There are two primary categories of traversal strategy: Depth-first search and Breadth-first search. Depth-first is basically about going deeper, descending into the tree, while Breadth-first search is level-order traversal, visiting the nodes that sit at the same height spatially. For the purposes of narrowing our scope and not overloading our approach to learning, let’s just focus on one of the Depth-first search approaches, PreOrder. So if that intro felt like an overload, skip right ahead, collect your R200 and let’s just plod on……

Definition of PreOrder Traversal:

  1. Display value of root
  2. Traverse the left tree 
  3. Traverse the right tree
 
Said differently, first visit the root node (and get its value), then traverse its left sub-tree; starting from the left, traverse all the children. Then switch to the right sub-tree; starting from the left, traverse all the children.
Make sense? Ok, let’s go visual.
If we applied PreOrder traversal to the Binary Tree listed above by following the 3 rules in its definition; our end result is
d, b, m, k, p, t, c, a, g, f, h
 
Now, an important note if you’re still struggling: when you switch over to the right sub-tree, notice the imbalance at node ‘a’, which has no left child; similarly, on the right, node ‘f’ has no left child. This is the one real cause of confusion that may leave you with a different order.
Onto the meat, a peek into the code. Please note, the code is entirely my own and is not aimed at being performant; rather, through clear and clean code, it aims to enable the reader to understand PreOrder Traversal.
Unit Test:
describe Trees::BinaryTree do
  before :each do
    @bst = Trees::BinaryTree.new('d') # set root

    # Left Subtree
    @bst.root.insert_left('b')
    @bst.root.left.insert_left('m')
    @bst.root.left.insert_right('k')
    @bst.root.left.right.insert_left('p')
    @bst.root.left.right.insert_right('t')

    # Right Subtree
    @bst.root.insert_right('c')

    @bst.root.right.insert_left('a')
    @bst.root.right.left.insert_right('g')

    @bst.root.right.insert_right('f')
    @bst.root.right.right.insert_right('h')

  end
  context 'traverse tree in specific order' do
    it 'should return values in preorder' do
      expect(@bst.preorder(@bst)).to eq ['d', 'b', 'm', 'k', 'p', 't', 'c', 'a', 'g', 'f', 'h']
    end
  end
end
Ruby Code:
module Trees
  class BinaryTree
    attr_reader :root

    class Node
      attr_reader :value, :left, :right

      def initialize(value)
        @value = value
        @left = nil
        @right = nil
      end

      def insert_left(value)
        @left = Node.new(value)
      end

      def insert_right(value)
        @right = Node.new(value)
      end
    end

    def initialize(root_value)
      @root = Node.new(root_value)
    end

    def preorder(node)
      result = []
      result << @root.value # start at root and add its value
      result << preorder_subtree(node.root.left) # traverse left sub-tree
      result << preorder_subtree(node.root.right) # traverse right sub-tree
      result.flatten 
    end

    private 
    def preorder_subtree(node, result = [])
      result << node.value
      preorder_subtree(node.left, result) if node.left != nil
      preorder_subtree(node.right, result) if node.right != nil
      result
    end
  end
end


Balanced tree

This is perhaps a truly interesting and mind-numbing aspect of Binary Trees. Note, I am not saying ‘Binary Search Tree’ but ‘Binary Tree’. If you’re unsure of the differences, they are explained here.

Ok, how do we determine if a binary tree is balanced? A tree is height-balanced (balanced) if the heights of the left and right subtrees of each node differ by at most 1. Before we apply that, let’s look at some examples to make sure we get the definition.

Perfectly Balanced:

Perfect balance

This is perfectly balanced as the right and left subtrees of any node are the same height.

Height Balanced:

Height balanced

Consider we label the last two leaves, C and D, as height 0. They are thus ‘height balanced’. However, on review, B does not have a counterpart node, or a fellow pair if you want to put it rather loosely. Going back to our definition of ‘balanced’, though, a tree is height-balanced if the heights of the left and right subtrees of each node differ by at most 1. Voilà! This is balanced.

Not Balanced:

Un balanced tree

Just for extra clarity, this is not balanced. Why? Just looking at node B alone will show you that this left child sub-tree does not have a right child (and no, not C, as C equally needs its own to the left).

Ok, now that we have some rules in place, here’s the code. I’ve included RSpec tests for extra clarity. The algorithm was taken from Cracking the Coding Interview, where it is written in C++, and ported to Ruby.

Spec File:

def build_balanced_tree
  bst = Trees::BinarySearchTree.new
  bst.insert(50) #root
  bst.insert(40) # left
  bst.insert(39) # left of 40
  bst.insert(41) # right of 40
  bst.insert(60) # right
  bst.insert(59) # left of 60
  bst.insert(61) # right of 60
  bst
end

def build_unbalanced_tree
  bst = Trees::BinarySearchTree.new
  bst.insert(50) #root
  bst.insert(60) #right
  bst.insert(12) # left
  bst.insert(40) # right
  bst.insert(13) # left
  bst.insert(44) # right
  bst.insert(55) # right
  bst
end

# -- snip snip from Cracking the Coding Interview **great book**
# The idea is very simple: the difference of min depth and max depth 
# should not exceed 1, since the difference of the min and the max depth 
# is the maximum distance difference possible in the tree.

describe Trees::BalancedTree  do
  before :each do
    @bst_balanced = build_balanced_tree
    @bst_unbalanced = build_unbalanced_tree
    @bst = Trees::BinarySearchTree.new
  end

  context 'is the tree balanced' do
    it 'should return true if tree is empty' do
      expect(Trees::BalancedTree.is_balanced?(@bst)).to be true
    end

    it 'should validate if the tree is balanced' do
      expect(Trees::BalancedTree.is_balanced?(@bst_balanced)).to be true
    end

    it 'should validate if the tree is unbalanced' do
      expect(Trees::BalancedTree.is_balanced?(@bst_unbalanced)).to be false
    end
  end
end

Actual Code (based on a prior article):

module Trees
  module BalancedTree
    def self.is_balanced?(tree)
      (max_depth(tree.root) - min_depth(tree.root) <= 1)
    end

    # 'private' has no effect on class-level (def self.) methods,
    # so hide the helpers explicitly instead.
    private_class_method def self.max_depth(tree)
      return 0 unless tree.respond_to?(:left) && tree.respond_to?(:right)
      1 + [max_depth(tree.left), max_depth(tree.right)].max
    end

    private_class_method def self.min_depth(tree)
      return 0 unless tree.respond_to?(:left) && tree.respond_to?(:right)
      1 + [min_depth(tree.left), min_depth(tree.right)].min
    end
  end
end
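Running the predicate against the two spec helpers above shows the min/max depth idea in action (depths worked out by hand from the trees those helpers build):

Trees::BalancedTree.is_balanced?(build_balanced_tree)   # => true  (max_depth 3, min_depth 3)
Trees::BalancedTree.is_balanced?(build_unbalanced_tree) # => false (max_depth 4, min_depth 2)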

References:

  1. Images taken from these Lecture Notes

Binary search algorithm

Today’s post is a follow-on from the original Part 1, an introduction to the Binary Search Tree. For the sake of simplicity and coherency, I will use the same values in the nodes. As a different means of writing out content in a blog post, I’ve chosen to use RSpec to communicate intent and elaborate on the code, as a form of self-blog.

So to kick off, I will present the actual code for the Binary Search Tree, supporting only two main sets of functionality: insertion and search. Thereafter, the spec file.

# Kudos to Phil McClure (rubyalgorithms.com)

module Trees
  class BinarySearchTree
    attr_reader :root

    class Node
      attr_reader :parent, :left, :right

      def initialize(value)
        @parent = value # note: 'parent' here holds the node's own value
        @left = nil
        @right = nil
      end

      def insert(value)
        if value <= @parent
          @left.nil? ? @left = Node.new(value) : @left.insert(value)
        elsif value > @parent
          @right.nil? ? @right = Node.new(value) : @right.insert(value)
        end
      end
    end

    def insert(value)
      if @root.nil?
        @root = Node.new(value)
      else
        @root.insert(value)
      end
    end

    def search(value, node=@root)
      return nil if node.nil?
      if value < node.parent
        search(value, node.left)
      elsif value > node.parent
        search(value, node.right)
      else
        return node
      end
    end

    def initialize
      @root = nil
    end
  end
end

And the spec file:

describe Trees::BinarySearchTree do
  before :each do
    @bst = Trees::BinarySearchTree.new
  end

  context 'During initialization' do
    it 'should initialize with a nil root value' do
      expect(@bst.root).to eq nil
    end
  end

  context 'Interface' do
    it 'should allow a new value to be inserted as a node' do
      @bst.insert(15)
      @bst.insert(10)
      bottom_left_of_root = @bst.root.left
      expect(bottom_left_of_root.parent).to eq 10
      expect(bottom_left_of_root.left).to be nil
      expect(bottom_left_of_root.right).to be nil
    end

    it 'should add values less than parent to the left as nodes' do
      @bst.insert(15)
      @bst.insert(10)
      bottom_left_of_root = @bst.root.left
      expect(bottom_left_of_root.parent).to eq 10
      expect(bottom_left_of_root.left).to be nil
      expect(bottom_left_of_root.right).to be nil
    end

    it 'should add values greater than parent to the right as nodes' do
      @bst.insert(15)
      @bst.insert(20)
      bottom_right_of_root = @bst.root.right
      expect(bottom_right_of_root.parent).to eq 20
      expect(bottom_right_of_root.left).to be nil
      expect(bottom_right_of_root.right).to be nil
    end
  end

  context 'Inserting' do
    it 'should assign first value to root, if root is nil' do
      @bst.insert(1)
      expect(@bst.root.parent).to eq 1
    end

    it 'should build a binary search tree' do
      @bst.insert(15)
      @bst.insert(10)
      @bst.insert(8)
      @bst.insert(12)
      @bst.insert(20)
      @bst.insert(17)
      @bst.insert(25)
      expect(@bst.root.left.left.parent).to eq 8
      expect(@bst.root.left.right.parent).to eq 12
      expect(@bst.root.right.left.parent).to eq 17
      expect(@bst.root.right.right.parent).to eq 25
    end
  end

  context 'Search' do
    it 'should traverse the tree and find a matching leaf on the right' do
      @bst.insert(15)
      @bst.insert(10)
      @bst.insert(8)
      @bst.insert(12)
      @bst.insert(20)
      @bst.insert(17)
      @bst.insert(25)
      expect(@bst.search(17).parent).to eq 17
    end

    it 'should traverse the tree and find a matching leaf on the left' do
      @bst.insert(15)
      @bst.insert(10)
      @bst.insert(8)
      @bst.insert(12)
      @bst.insert(20)
      @bst.insert(17)
      @bst.insert(25)
      expect(@bst.search(12).parent).to eq 12
    end

    it 'should traverse the tree and find the root node if it is the search value' do
      @bst.insert(15)
      @bst.insert(10)
      @bst.insert(8)
      @bst.insert(12)
      @bst.insert(20)
      @bst.insert(17)
      @bst.insert(25)
      expect(@bst.search(15).parent).to eq 15
    end
  end
end

Binary search basics


What is a Binary Search Tree?

Stop. Before we even commence looking at what this is, let’s build up a mental model first, so that when the definition is presented we firstly have some context and secondly have a pool of terms that we can use and that make sense. The funky diagram above is a basic tree. It has a root node (parent node) and child nodes (A and F). Obviously, based on context, the role can change: if we traversed the tree to node A, it would be a parent node (not the root node), with two child nodes, node B on the left and node C on the right. We can call A and F sub-trees if you will. Now that’s pretty basic.


Binary search numeric

Now consider the same tree as above, but instead of letters we insert numbers. What’s interesting to note is that the value of all nodes in the left subtree is less than or equal to the value of all nodes in the right subtree.

LEFT => Let’s check quickly: 10 (parent) + 8 (left child) + 12 (right child) = 30.

RIGHT => 20 (parent) + 17 (left child) + 25 (right child) = 62.

Going back to what we said above, is that true? (value of all nodes in left subtree, less or equal to value of all nodes on the right?)

30 < 62 = TRUE.

Excellent, we’re making progress! But that principle should also apply to the subtrees. If we traverse left to node 10, it is parent to 8 and 12; 8 < 12 = TRUE, so there is no violation. When we traverse right to node 20, it is parent to 17 and 25; 17 < 25 = TRUE. This principle is the important attribute that defines a Binary Search Tree. But there is one more!

Binary search trees, in order to use the binary search algorithm, require “ordering”: each parent node has a value, its children on the left are smaller than it, and its children on the right are greater. So if we traverse to and visit node 10, it has a left node of 8 (which is < 10) and a right node of 12 (which is > 10). The same applies if we visit node 20 on the right: being a parent, it has a left node of 17, which is less than itself (20), and a right child node of 25, which is greater than itself. Awesome!

These two principles, applied together, make a Binary Search Tree.

Now let’s take in Wikipedia’s definition:

“In computer science, binary search trees (BST), sometimes called ordered or sorted binary trees, are a particular type of containers: data structures that store “items” (such as numbers, names etc.) in memory. They allow fast lookup, addition and removal of items, and can be used to implement either dynamic sets of items, or lookup tables that allow finding an item by its key (e.g., finding the phone number of a person by name).


Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, based on the comparison, to continue searching in the left or right subtrees. On average, this means that each comparison allows the operations to skip about half of the tree, so that each lookup, insertion or deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables.”


Before we leave off, binary search trees have a best-case running time of O(log n) per lookup. That is a really efficient algorithm: as the input grows, the time taken grows ever more slowly.
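To see that with the implementation above, consider the seven-node tree built in the spec: finding 17 touches only three nodes (15, then 20, then 17), roughly log₂ of the tree size, rather than scanning all seven values:

bst = Trees::BinarySearchTree.new
[15, 10, 8, 12, 20, 17, 25].each { |value| bst.insert(value) }

# search descends root -> right -> left: 15, 20, 17
bst.search(17).parent # => 17, found in 3 comparisons across 7 nodes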



If you are a self-taught programmer, you hit a point in your career where Big-O Notation bites you, either in a programming interview or in a problem in the workplace, like N+1 SQL queries. Having fallen into this trap, if you will, I’ve taken some strides towards grappling with the beast (Big-O) without having to delve into the intricacies of the math involved. This is my journey and a dissemination of information which I hope could help you too, dear reader.

So the first question: what is Big-O Notation?

Basically, it’s used to determine how well a computer algorithm scales as the data increases. More explicitly, it measures the efficiency of an algorithm by how its running time grows as a function of the input size.

Big-O notation has a symbol notation, listed below in tabular form:

Notation        Name
O(1)            constant
O(log(n))       logarithmic
O((log(n))^c)   polylogarithmic
O(n)            linear
O(n^2)          quadratic
O(n^c)          polynomial
O(c^n)          exponential

Don’t be thrown off just yet, just get your eye used to the format. 

O(1) is said to be “Order of 1”.

O(n) is said to be “Order of n”, and so on.

What is the benefit to knowing Big-O Notation?

We just mentioned two reasons at the start of this discussion, but a driving factor is being able to compare algorithms and decide on which one is more efficient and thus the best to implement. Big-O Notation gives us some basis for measuring the efficiency.

Applying Big-O Notation, with some loose rules of thumb:

  1. Big-O Notation cares about the worst-case scenario in which the algorithm is run.
  2. Single operations, or a single extra iteration, usually have little bearing on the worst case.
  3. Get to the heart of the complexity (a comparison or the loops), then use that to determine the worst case.
  4. n could be the actual input, or the size of the input.
  5. When comparing algorithms, throw away the constants (things that don’t matter as the input grows).
  6. Big-O Notation is concerned with three ideas: count the steps, convert them to a formula, then keep only the highest power of n to represent the entire algorithm.
  7. Determine what you are trying to optimise for; for example, time or memory.
  8. Time can also be considered as the number of steps it takes to complete a problem.
  9. When we are trying to find the complexity of a function/procedure/algorithm, we are not interested in the exact number of operations performed. Instead, we are interested in the relation of the number of operations to the problem size.

Working examples:

O(1) – execution time of the algorithm is constant.

FIELD_LIMIT = 24

def valid_length?(input_array)
  input_array.length == FIELD_LIMIT
end

In the example, you can see that whatever the input size of the array may be, the operation is independent of it, focusing only on the length of the input as opposed to its actual contents.

O(n) – execution time will grow linearly with the input size.

def dump_it(customers_array)
  for customer_name in customers_array do    
    puts "#{customer_name}"
  end
end

In this example, you can see that there is a direct relationship between the input and the number of steps, or rather the time taken. This growth is linear. If the input is an array of 100 elements, then the algorithm (however simple) will print 100 lines of output. If the input is an array of 100 000 elements, the output will match likewise.

*Hint – there is only a single loop over the input

O(n^2) – as the input size grows, the time of the algorithm grows quadratically

def print_chessboard(size)
  for i in (1..size)
    for j in (1..size)
      print '*'
    end
    puts
  end
end

In this example, you can see a nested loop used to print out the general form of a chessboard (no indicators for black or white). If the size of the chessboard to be printed is 2, there will be 4 asterisks (*). If the size is 6, there will be 36 asterisks. If the requested size of the board is 8, there will be 64 asterisks. The pattern of n^2 should be clear here: 2^2 = 4, 6^2 = 36, 8^2 = 64.

O(2n) – the input is traversed twice, so the time grows at twice the linear rate; since constants are thrown away (rule 5 above), this still reduces to O(n)

def dump_it(customers_array)
  for customer in customers_array do    
    puts "#{customer[:name]}"
  end

  for customer in customers_array do
    puts "#{customer[:country]}"
  end
end

Lastly, in this example, we see two separate loops: one printing the customer names, and the other printing the customers’ countries.
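The table near the top also lists O(log(n)), which none of the examples above covers. The classic logarithmic example is binary search over a sorted array, where each step halves the remaining input (a minimal sketch of my own, not taken from the links below):

def binary_search(sorted_array, target)
  low = 0
  high = sorted_array.length - 1
  while low <= high
    mid = (low + high) / 2 # halve the search space on every pass
    return mid if sorted_array[mid] == target
    sorted_array[mid] < target ? low = mid + 1 : high = mid - 1
  end
  nil # target not present
end

binary_search([1, 3, 5, 8, 13, 21], 13) # => 4, reached in 2 steps rather than 5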

The one thing about Big-O Notation is that it takes some time to get, and as such some things here have been repeated with intent. But if this has not assisted, here are some really good links:

  1. History and a more mathematical perspective of Big-O
  2. Brief overview of Big-O from a Rubyist
  3. Programmer Interview Big-O Notation
  4. Big-O explained beautifully
  5. Beginners Guide To Big-O
  6. StackOverflow Explanation
  7. Very handy common cases
  8. Big-O cheatsheet
  9. Ruby Algorithms and Data Structures