Using heaps to solve computer science problems

Complete guide along with c++ code examples

Heap is a truly elegant data structure discovered by W. Williams and R. W. Floyd. It facilitates many use cases with low computational complexity. Heap handles sequences of data objects by partially ordering its elements.

Per dictionary, heap is a disorderly collection of objects placed haphazardly on top of each other … heap of stones or fruits.

Photo by Artem Beliaikin on Unsplash

Heap data structure use cases

Heap is used to find solutions in wide variety of problems and it’s one of the most versatile data structure. Following are some of the use cases of heap to develop little bit appreciation before we deep dive into details. Also there are many variants of heap depending on optimizations required in specific use cases such as Fibonacci heap: uses Fibonacci numbers in its structure to improve run time efficiency, Radix heap: monotone priority queues when arranging events in order of time such as network timeouts, K-D heap: efficient multi-dimensional priority queue … so on.

  1. Priority queue or task scheduler: As name defines, scheduler selects the tasks to work on next as per the priority. Priority scheduling involves assigning priority to every task or process. The processes with higher priorities are executed out first. While the tasks can be ready to be executed in any order, higher order tasks are chosen before lower priority tasks. The tasks are maintained in task priority queue in the form of heap and scheduler picks the next task from heap root node.
  2. Sorting: Heapsort is a comparison based sorting technique that divides its elements into a fully sorted and unsorted regions. It iteratively shrinks the unsorted region by extracting the largest or smallest element and inserting it into its sorted region. Heapsort does not perform linear-time search of the non sorted region, rather it maintains the unsorted data in a heap to efficiently find the next element in O(1) step.
  3. Graph algorithms: One of the algorithm for searching minimum cost paths in graphs is called Dijkstra’s algorithm. Dijkstra’s algorithm involves visiting lowest cost unvisited nodes, then updating a set of records of information about the lowest-cost path to reach each node. Total cost is tracked by which each the vertex can be reached and the previous vertex to get there. Dijkstra’s algorithm uses min heap to extract minimum cost node while transversing through the graph.
  4. K-way merge: The k-way merge consists of combining k unsorted lists into a single sorted list. This problem can be solved by heapifying individual lists and selecting the minimum or maximum element from top of all the lists. Generally binary or 2-way merges are solved by merge sort algorithm where heap can be effective to deal with k-way merge that can be scaled easily across number of lists to be merged together.
  5. Ordered statistics: Order statistics involves finding the k-th largest value given an input of n values. This information can be used track tail latency or quality of service. Finding the k-th order statistic when values are sorted is of O(1) complexity while using direct access, however it requires at least O(n log n) time to sort all of the elements in prior the lookup. Heap can be efficient way to avoid sorting of all elements. Insertion complexity can be further reduced by just tracking k largest or smallest elements.
  6. Compute median: Keeping track of median for a data stream can be achieved by maintaining two heaps, where a max-heap stores the smaller half of the input number and a min-heap stores the larger half of the input. Adding elements to both heap takes logarithmic order of time and gains direct access to the maximal and minimal elements in the groups. Elements can be inserted to keep both the heaps balanced thus maintaining median based on top elements from both heaps.

Heap value proposition

Heap provides quick access to largest or smallest element without explicitely sorting all elements. Heap is partially sorted and insertion/deletion of the elements can be performed with O(log n) complexity. A highest or lowest priority element can be obtained with O(1) complexity.

Heap terminology

  • Root: For min heap, root node holds smallest element. For max heap, root node holds largest element. Root node index is always 0 in a heap. Root position element is removed when heap size needs to be reduced and last node is moved at the position of root. After the last node is moved to the root position, it is trickled down to the correct position. Every heap includes only one root node.
  • Parent: Heap ordering property states that parent always precedes children. There is no precedence requirement between the children. The parent precedence must be an order relationship depending on type of heap. Parent node have at least one child and a node at the index of left or right child before or at the last node index.
  • Children: For max heap, child node must contain smaller value or lower priority than parent and vice versa for min heap. As stated above, within child nodes of the same parent there is no ordering requirement present within a heap.
  • Leaf: When a node do not have both left and right children then it is called as leaf node. Number of leaf nodes in a complete binary tree can be calculated by (n+1)/2 where n is the total number of nodes in the tree.
  • Last node: Within heap, node is always inserted at (last position + 1) index and then its trickled up at the correct position. Also fully sorted elements stored in a heap are stored beyond last index. This is because last node sometimes may not signify completion of heap data structure, especially during sorting process.

Heap data structure definition

Heap is a specialized form of complete binary tree where parent node is compared with children nodes and arranged accordingly the type of heap.

Heap types

  • Min Heap: value, priority or weight of the parent node is less than or equal to either of its children.
  • Max heap: value, priority or weight of the parent node is greater than or equal to either of its children.

Heap characteristics

  • Level order traversal: nodes accessed on each level similar to breadth first search except heap arrange nodes in array format and increments of node numbers follows level order access pattern.
  • Parent to children distance: parent node number can be computed based on following formula. Brackets around the formula denote floor and floor function takes real number as input and provides greatest integer less than or equal to its input value.
  • Children to parent distance: left and right child node numbers can be computed based on following formulas
  • Complete binary tree: Heaps are represented by complete binary tree where all levels must be completely filled except the lowest one. Node level order transversal incremented from left to right. Complete binary tree don’t need be a full binary tree.
  • Height or depth: height for a heap with n nodes can be computed based on following formula

Heap primary algorithmic operations

  • Trickle up: place element to the correct position within the heap by swapping the nodes upwards. Trickle up is performed during addition of the node at the last position. When a node needs to be added in the heap, it’s always gets added at the last node position.

Following code snip demonstrates the trickle up operation. If node position is root then trickle up operation is not required. Otherwise parent position is computed based on above formula. When position that is getting trickle up is larger than parent, then its swapped with parent node. Recursively trickle up is called for the parent position.

  • Trickle down: place element to the correct position within the heap by swapping the nodes down. Trickle down is performed during removal of the root node from heap. When a node needs to be removed from heap, it’s always the root node that gets removed.

Following code snip demonstrates the trickle down operation. First left and right child positions are computed, then position that needs to be trickled down is compared to its left and right child. Largest child is swapped at parent position. Trickle down is recursively called at its largest child position.

  • Heapify: conversion randomly ordered elements into heap. Heapify is efficient way to create heap rather than using trickle up or down process. Because trickle up or down needs to loop through all elements of the heap to ensure all elements are at correct position. Heapify loops backwards from last parent, computes child positions of the parent and if children exists then swaps parent with largest child. Then recursively calls heapify or trickle down at the largest position. So basically heapify and trickle down are similar except heapify gets called only at the parent nodes.

Heap user driven operation

  • Node insertion: New node is inserted at last position and then its trickled up to the correct position. Pre-decrement in following code is done to convert heap array length to the last position for 0 based index.
  • Node deletion: Node deletion is performed at root node, then last node is swapped to root position. Pre-decrement done to convert heap array length to the last position for 0 based index. Then replaced root node is trickled down to correct position.

Heap complexity

  1. Get min or max node: O(1)
  2. Node insertion: O(log n)
  3. Node deletion: O(log n)
  4. Sort: O(n log n + log n)
  5. Trickle up: O(log n)
  6. Trickle down: O(log n)
  7. Heapify: O(n)

Use cases with sample implementation

  • Sorting using max heap: Heapsort first converts random number array into a heap by looping through all parent nodes and executing heapify to trickle down child nodes at correct position. Once heap is generated then root node is swapped to last position and help length is decremented.

Heapsort complexity comparison with other sorting algorithms:

  • Task scheduler using heap: Following code demonstrates priority scheduler using heap data structure. Priority is hard coded to keep the example simple, however it can be computed as a function of virtual time the function consumes CPU time. When any task is ready to be inserted in priority queue, it gets inserted to the correct position within the heap. When next task can be scheduled, scheduler executes the task at root position. Last position task is moved to root and trickled down to correct position.
  • Compute median using heap: Median of the unsorted data stream can be computed using min and max heaps. To start with, input stream index 0 can be assigned to median as for the data length of 1, median is same as available data. If next incoming data is smaller than median and if max heap length is not larger than min heap by some limit then data is inserted in max heap. Similarly, if next incoming data is larger than median and if min heap length is not larger than max heap by some limit then data is inserted in max heap. Once insertion is complete at the last nodes within min or max heap, data is trickled up the correct position. After this, median value is compared to the root of both node and ensured that median is larger than max heap root and smaller than min heap root. If swapped, root node is trickled down to correct position. This process is repeated for every incoming data. Average value of the median can be computed in case of mix and max heap lengths are not same by averaging median and root of heap that have longer length.
  • Find shortest path using heap: Code implementation flow to find shortest path to all nodes from source node using Dijkstra’s Algorithm
  1. First all nodes are marked as unvisited.
  2. Define source node current minimum distance to 0 and all rest of the nodes to infinity and set the source node as current node.
  3. For current node, loop through all of its unvisited neighbors and compute distances by adding current distance to the weight of the edge that connects the current node to the neighboring node.
  4. Compare newly calculated distance and current distance assigned to neighboring node. If it is smaller, then set it as new current distance otherwise, keep the previous distance.
  5. When done looping through all of the unvisited neighbors of the current node, the mark the current node as visited.
  6. Then, select next unvisited node that is marked with the smallest distance, set it as the new current node, and continue the process.

References:

  1. Heap video lecture by Abdul Bari
  2. Data structure and algorithm video lecture series by Dr. Rob Edwards, San Diego State University
  3. Educative tutorials on data structure and algorithms

Curious to understand reality and even more curious to understand if there is such thing as reality