Algorithms included: Prim's Algorithm, Kruskal Algorithm, Path Compression, Huffman coding, Set Cover Problem
A minimum spanning tree is a spanning tree in which the sum of the weight of the edges is as minimum as possible. If we have a linked undirected graph with a weight (or cost) combine with each edge. Then the cost of spanning tree would be the sum of the cost of its edges. Prim's algorithm is a minimum spanning tree algorithm that takes a graph as input and finds the subset of the edges of that graph which form a tree that includes every vertex and has the minimum sum of weights among all the trees that can be formed from the graph. We start from one vertex and keep adding edges with the lowest weight until we reach our goal. Algorithm of the minimum spanning tree is as follows:
INF = 9999999
# number of vertices in graph
V = 5
# create a 2d array of size 5x5
# for adjacency matrix to represent graph
G = [[0, 9, 75, 0, 0],
[9, 0, 95, 19, 42],
[75, 95, 0, 51, 66],
[0, 19, 51, 0, 31],
[0, 42, 66, 31, 0]]
# create a array to track selected vertex
# selected will become true otherwise false
selected = [0, 0, 0, 0, 0]
# set number of edge to 0
no_edge = 0
# the number of egde in minimum spanning tree will be
# always less than(V - 1), where V is number of vertices in
# graph
# choose 0th vertex and make it true
selected[0] = True
# print for edge and weight
print("Edge : Weight\n")
while (no_edge < V - 1):
# For every vertex in the set S, find the all adjacent vertices
#, calculate the distance from the vertex selected at step 1.
# if the vertex is already in the set S, discard it otherwise
# choose another vertex nearest to selected vertex at step 1.
minimum = INF
x = 0
y = 0
for i in range(V):
if selected[i]:
for j in range(V):
if ((not selected[j]) and G[i][j]):
# not in selected and there is an edge
if minimum > G[i][j]:
minimum = G[i][j]
x = i
y = j
print(str(x) + "-" + str(y) + ":" + str(G[x][y]))
selected[y] = True
no_edge += 1
Time Complexity of Prim's algo is O(V^2). If the input graph is represented using adjacency list, then the time complexity of Prim’s algorithm can be reduced to O(E log V) with the help of binary heap.
Kruskal's algorithm is another popular minimum spanning tree algorithm that uses a different logic to find the MST of a graph. Instead of starting from a vertex, Kruskal's algorithm sorts all the edges from low weight to high and keeps adding the lowest edges, ignoring those edges that create a cycle. We start from the edges with the lowest weight and keep adding edges until we reach our goal. Algorithm of the minimum spanning tree is as follows:
Any minimum spanning tree algorithm revolves around checking if adding an edge creates a loop or not. The most common way to find this out is an algorithm called Union FInd. The Union-Find algorithm divides the vertices into clusters and allows us to check if two vertices belong to the same cluster or not and hence decide whether adding an edge creates a cycle.
class Graph:
def __init__(self, vertices):
self.V = vertices
self.graph = []
def add_edge(self, u, v, w):
self.graph.append([u, v, w])
# Search function
def find(self, parent, i):
if parent[i] == i:
return i
return self.find(parent, parent[i])
def apply_union(self, parent, rank, x, y):
xroot = self.find(parent, x)
yroot = self.find(parent, y)
if rank[xroot] < rank[yroot]:
parent[xroot] = yroot
elif rank[xroot] > rank[yroot]:
parent[yroot] = xroot
else:
parent[yroot] = xroot
rank[xroot] += 1
# Applying Kruskal algorithm
def kruskal_algo(self):
result = []
i, e = 0, 0
self.graph = sorted(self.graph, key=lambda item: item[2])
parent = []
rank = []
for node in range(self.V):
parent.append(node)
rank.append(0)
while e < self.V - 1:
u, v, w = self.graph[i]
i = i + 1
x = self.find(parent, u)
y = self.find(parent, v)
if x != y:
e = e + 1
result.append([u, v, w])
self.apply_union(parent, rank, x, y)
for u, v, weight in result:
print("%d - %d: %d" % (u, v, weight))
The overall time complexity of Kruskal's Algorithm is O(ElogE) or O(ElogV).
Path compression` is a way of flattening the structure of the tree whenever Find is used on it. Since each element visited on the way to a root is part of the same set, all of these visited elements can be reattached directly to the root. The resulting tree is much flatter, speeding up future operations not only on these elements, but also on those referencing them.
function Find(x)
if x.parent != x
x.parent := Find(x.parent)
return x.parent
Huffman Coding is a technique of compressing data to reduce its size without losing any of the details. It is generally useful to compress the data in which there are frequently occurring characters. Huffman coding first creates a tree using the frequencies of the character and then generates code for each character. Once the data is encoded, it has to be decoded. Decoding is done using the same tree. Algorithm for Huffman coding is as follows:
string = 'BCAADDDCCACACAC'
class NodeTree(object):
def __init__(self, left=None, right=None):
self.left = left
self.right = right
def children(self):
return (self.left, self.right)
def nodes(self):
return (self.left, self.right)
def __str__(self):
return '%s_%s' % (self.left, self.right)
# Main function implementing huffman coding
def huffman_code_tree(node, left=True, binString=''):
if type(node) is str:
return {node: binString}
(l, r) = node.children()
d = dict()
d.update(huffman_code_tree(l, True, binString + '0'))
d.update(huffman_code_tree(r, False, binString + '1'))
return d
# Calculating frequency
freq = {}
for c in string:
if c in freq:
freq[c] += 1
else:
freq[c] = 1
freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
nodes = freq
while len(nodes) > 1:
(key1, c1) = nodes[-1]
(key2, c2) = nodes[-2]
nodes = nodes[:-2]
node = NodeTree(key1, key2)
nodes.append((node, c1 + c2))
nodes = sorted(nodes, key=lambda x: x[1], reverse=True)
huffmanCode = huffman_code_tree(nodes[0][0])
print(' Char | Huffman code ')
print('----------------------')
for (char, frequency) in freq:
print(' %-4r |%12s' % (char, huffmanCode[char]))
The time complexity of the Huffman algorithm is O(nlogn). Using a heap to store the weight of each tree, each iteration requires O(logn) time to determine the cheapest weight and insert the new weight.
In the set cover problem we have the input as a set of elements B where sets S1,....,Sm ⊆ B. The ouput that we have to get is a selection of the Si whose union is B and the cost should be the number of sets picked. Here, we took an example where we had a set of letters as follows: {arid, dash, drain, heard, lost, nose, shun, slate, snare, thread, lid, roast} and we had to cover a set B which included {a,d,e,h,i,l,n,o,r,s,t,u}. The first approach we discussed was to pick the element in set Si with the largest number of uncovered elements and repeat this step until all elements of B are covered. So, First we pick thread which covers {t,h,r,e,a,d,s}. So, We are left with {i,l,n,o,s,u}. Next we picked lost which covers {l,o,s}. So, We are now left with {i,n,u}. Next we pick drain which covers {i,n}. Now we are only left with {u}. So, At last we will pick shun which covers the last remaining letter {u}. So, Now as we have covered all the letters, Is this approach optimum? We thought of another approach of picking the elements where we observed that for {u} we will have to pick shun. So, The uncovered elements are {a,d,e,i,l,o,r,t}. So, We need atleast three more picks since {e} and {i} aren't together and if we pick arid/drain/lead for covering {i} still we require two more picks to cover all elements.
The complexity of Set Cover Problem is mn where m is the size of the universe and n is the number of sets in the collection.