Graph:
Definitions:
- A nonlinear data structure consists of nodes connected by vertices.
- A graph can be undirected, directed, or weighted.
Graph is the generalization of linked list and tree.
The most common representation of graph in a coding interview is
- Matrix: Basically a list of lists storing all the nodes in a graph. The vertices are implicitly present between two nodes in the horizontal and vertical directions.
- Adjacency matrix: A matrix storing the information whether a vertex exists between two nodes. This is usually the case for directed graph, because the same pair is repeated in (row, col) then (col, row). This is less frequently encountered.
- Adjacency list: A node will store its value and a list of all the nodes that it can go to. Commonly encountered.
Similar to tree, graph solution relies heavily on depth-first search and breadth-first search, at least for matrix, though what qualifies as DFS and BFS for matrix is different from tree.
Problems
1. Number of Islands
Intuition
From an example that I have, I thought of maintaining a count of the island count
and a visited-or-not set visited
. The meat will be a nested for
loop (simplest way to search for the possible next land after finishing with an island). Of course, a helper function i.e. dfs
will be implemented with the purpose of adding visited nodes to the set.
Depends on the requirement, if the input array can be modified, the set can be aborted and I can modify the array in-place, which saves both time and space.
Algorithm
Helper function: 1. Base case: 1. Out of bound: row or column is less than 0, row or column passes the correponding length. 2. Position is water i.e. not '1'
. (3. Position already visited.) 2. Mark the position as visited. 3. Call the helper function on four adjacent tiles.
Then iterate through the tiles, call the helper function where appropriate and return the count.
Complexity
Time complexity: \(O(m \times n)\): This is the amortized complexity. The time complexity will be at worst in a map full of lands. The first call will prompt the algorithm to crawl over every other nodes. This operation’s cost will equal to the number of edges, which is less than \(O(4 \times m \times n)\). For the rest, it will be just \(O(1)\). There are \(m \times n\) tiles, so the cost spread out will be roughly \(4\) i.e. \(O(1)\) for each call. And each call will happen inside a nested for loop of \(m \times n\), hence the overall cost.
Space complexity: \(O(m \times n)\): The maximum size of the recursive call stack.
Code
In-place modification
class Solution:
def numIslands(self, grid: List[List[str]]) -> int:
= len(grid), len(grid[0])
ROWS, COLS = 0
count
def dfs(row, col):
if min(row, col) < 0 or row >= ROWS or \
>= COLS or grid[row][col] != '1':
col return
= 'X'
grid[row][col]
- 1, col)
dfs(row + 1, col)
dfs(row - 1)
dfs(row, col + 1)
dfs(row, col
for row in range(ROWS):
for col in range(COLS):
if grid[row][col] == '1':
dfs(row, col)+= 1
count
return count
With another set
class Solution:
def numIslands(self, grid: List[List[str]]) -> int:
= len(grid), len(grid[0])
ROWS, COLS = 0
count = set()
visited
def dfs(row, col):
if min(row, col) < 0 or row >= ROWS or col >= COLS or\
!= '1' or (row, col) in visited:
grid[row][col] return
visited.add((row, col))
- 1, col)
dfs(row + 1, col)
dfs(row - 1)
dfs(row, col + 1)
dfs(row, col
for row in range(ROWS):
for col in range(COLS):
if grid[row][col] == '1' and (row, col) not in visited:
dfs(row, col)+= 1
count
return count
2. Flood Fill
Intuition
This can be called “the easier Number of Islands”. Following the same approach, but this time the starting point for depth-first search is given. Afterwards, it is performing search in all 4 directions, replacing the satisfying tiles with the value given.
Algorithm
For the helper function: 1. Base case: out of bound, the tile does not have the original value, the tile has already been filled. 2. Recursive: Replace the tile, then call the function on all 4 directions.
Complexity
Time complexity: \(O(m \times n)\): The work at every pixel is \(O(1)\), and the algorithm may visit every pixel.
Space complexity: \(O(m \times n)\): The maximum size of the recursive call stack.
Note: No. of rows is \(m\), no. of columns is \(n\).
Code
class Solution:
def floodFill(self, image: List[List[int]], sr: int, sc: int, color: int) -> List[List[int]]:
= len(image), len(image[0])
ROWS, COLS = image[sr][sc]
original_color def dfs(row, col):
if min(row, col) < 0 or row >= ROWS or col >= COLS or \
!= original_color or image[row][col] == color:
image[row][col] return
= color
image[row][col]
- 1, col)
dfs(row + 1, col)
dfs(row - 1)
dfs(row, col + 1)
dfs(row, col
dfs(sr, sc)return image
3. Max Area of Island
Intuition
Another variation of Number of Islands. The least memory intensive way is to return the area of each island in the recursive function and modify the input matrix in-place. However, if the input matrix needs to be intact, a set should be used to check whether a node is visited or not.
Algorithm
For the helper function: 1. Base case: out of bound, the tile is water, the tile has already been filled. Return 0 2. Recursive: Replace the tile value, then call the function on all 4 directions. Return 1 + the value of each direction.
Complexity
Time complexity: \(O(m \times n)\): The work at each tile is \(O(1)\), and the algorithm will visit every tile.
Space complexity: \(O(m \times n)\): The maximum size of the recursive call stack. Also the size of the set to keep track of nodes.
Code
- Modify in-place
class Solution:
def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
= len(grid), len(grid[0])
ROWS, COLS = 0
max_area
def searchArea(row, col):
if min(row, col) < 0 or row >= ROWS or \
>= COLS or grid[row][col] != 1:
col return 0
= 0
grid[row][col] return 1 + searchArea(row + 1, col) + searchArea(row - 1, col) + searchArea(row, col + 1) + searchArea(row, col - 1)
for row in range(ROWS):
for col in range(COLS):
if grid[row][col] == 1:
= searchArea(row, col)
area = max(max_area, area)
max_area return max_area
- Another set:
class Solution:
def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
= len(grid), len(grid[0])
ROWS, COLS = 0
max_area = set()
visited
def searchArea(row, col):
if min(row, col) < 0 or row >= ROWS or col >= COLS \
or grid[row][col] != 1 or (row, col) in visited:
return 0
visited.add((row, col))return 1 + searchArea(row + 1, col) + searchArea(row - 1, col) + searchArea(row, col + 1) + searchArea(row, col - 1)
for row in range(ROWS):
for col in range(COLS):
if grid[row][col] == 1:
= searchArea(row, col)
area = max(max_area, area)
max_area return max_area
4. Rotting Oranges
Intuition
The introduction to breadth-first search in matrix. I still need to expand on all 4 directions. The difference here is I need to do so in the context of layer. Simply put, the \(n^{th}\) layer from the first node is all nodes that I can reach by walking \(n\) steps from the first node (counting the first node as the \(0^{th}\) layer). So instead of expand in each direction as far as possible, I expand all 4 nodes for each node in a layer.
Algorithm
- Initialize variables:
minute
to 0,rotten
queue for the oranges, andfresh
to count the fresh oranges. - Iterate the matrix, count the number of fresh oranges and append rotten oranges to the queue.
- While there is still fresh orange and the queue is not empty, iterate through one layer of the queue.
- For each node, expand to all 4 adjacent nodes.
- If the node is within bound and contains a fresh orange, turn it rotten, add to the queue, and decrement the number of fresh oranges.
- At the end, increment
minute
.
- Return per condition.
Complexity
Time complexity: \(O(m \times n)\): Iterate through all the nodes in matrix roughly twice.
Space complexity: \(O(m \times n)\): The maximum size of the queue.
Code
from collections import deque
from typing import List
class Solution:
def orangesRotting(self, grid: List[List[int]]) -> int:
= len(grid), len(grid[0])
ROWS, COLS = 0
fresh = deque()
rotten = 0
minute
for row in range(ROWS):
for col in range(COLS):
if grid[row][col] == 1:
+= 1
fresh if grid[row][col] == 2:
rotten.append((row, col))
= [[0, 1], [0, -1], [1, 0], [-1, 0]]
neighbors
while fresh > 0 and rotten:
for _ in range(len(rotten)):
= rotten.popleft()
row, col
for dr, dc in neighbors:
= row + dr, col + dc
new_row, new_col if (
0 <= new_row < ROWS
and 0 <= new_col < COLS
and grid[new_row][new_col] == 1
):= 2
grid[new_row][new_col]
rotten.append((new_row, new_col))-= 1
fresh
+= 1
minute return minute if fresh == 0 else -1
5. Shortest Path in Binary Matrix
Intuition
Another breadth-first search problem. This time, I can move in all 8 directions. However, the principle and boilerplate is still the same.
Algorithm
- Edge cases: The start and destination cannot be reached or the length of the matrix is 1.
- Initialize variables: the size of the matrix, the length of the shortest path, the set of visited nodes, the queue for breadth-first search, the possible directions at a node.
- For the helper function:
- Expand in 8 adjacent tiles for each nodes in one layer of the queue.
- If encounter the destination, return immediately.
- Else if the tile is within bound, accessible, and yet to visit, add it to the queue and mark it as visisted.
- Increment the length at the end.
- Return -1 as no path can be found.
Complexity
Time complexity: \(O(n^2)\) - for \(n\) the size of the matrix. At worst, the algorithm visits every node and it does \(O(1)\) work at each node.
Space complexity: \(O(n^2)\) - the size of the visited set in the worst case.
Code
from collections import deque
class Solution:
def shortestPathBinaryMatrix(self, grid: List[List[int]]) -> int:
if max(grid[0][0], grid[-1][-1]) == 1:
return -1
if len(grid) == 1 and grid[0][0] == 0:
return 1
= len(grid)
SIZE = 1
shortest = set()
visit = deque()
possible_steps
0, 0))
visit.add((0, 0))
possible_steps.append((
= [(0, 1), (0, -1), (1, 0), (-1, 0), (-1, -1), (1, 1), (-1, 1), (1, -1)]
directions
while possible_steps:
for _ in range(len(possible_steps)):
= possible_steps.popleft()
row, col
for dr, dc in directions:
= row + dr, col + dc
new_row, new_col if new_row == new_col == SIZE - 1:
return shortest + 1
if (0 <= new_row < SIZE and
0 <= new_col < SIZE and
== 0 and
grid[new_row][new_col] not in visit):
(new_row, new_col)
possible_steps.append((new_row, new_col))
visit.add((new_row, new_col))+= 1
shortest
return -1
6. Course Schedule
Intuition
As the number of courses I can take and the number of courses offered are the same, the only case when I cannot take the courses is when some form a cycle. You know it: to sign up for driver license, you need identity card and passport; to sign up for identity card you need driver license and passport; to sign up for passport, of course you need identity card and driver license…
The problem is thus reduced to the problem of detecting cycle within a directed graph. Kahn’s algorithm for topological sorting is the go to answer, unless I have not studied it (I haven’t). So let’s go to a more familiar solution: depth-first search.
A cycle can be detected when one of a node’s (A) neighbors (B) has a back edge connected to it (A). This means that A should already have been visited when the algorithm tries to expand to the neighbors of B. But just encountering a visited node is not complete to detect a cycle - that node must also still be in the recursive call stack (recall that the node is only popped from the recursive call stack once all of its neighbors have been processed). A set can be used to keep up with this.
Algorithm
- Translate the list of prerequisites into a dictionary mapping prerequisite → course. Initialize the set of visited node.
- Helper function to detect cycle:
- If the node is still in the stack, return
True
. Else if the node has been visited, returnFalse
. - Add the node to the set of in-stack nodes and visited nodes.
- Expand to its neighbors and return
True
if needed. - Remove the node from the in-stack set and return
False
.
- If the node is still in the stack, return
- Call the helper function on each of the course. Return
False
if a cycle is detected. - Return
True
at the end.
Complexity
- Time complexity: \(O(n + m)\) - with \(n\) the number of courses and \(m\) the length of
prerequisites
. - Space complexity: \(O(n + m)\) - the combined size of the visited set and the prerequisite dictionary.
Code
from collections import defaultdict
class Solution:
def canFinish(self, numCourses: int, prerequisites: list[list[int]]) -> bool:
= defaultdict(list)
coursePreDict
for course, pre in prerequisites:
coursePreDict[pre].append(course)
= set()
visited
for course in range(numCourses):
if self.checkCycle(course, coursePreDict, visited, set()):
return False
return True
def checkCycle(self, node, coursePreDict, visited, inStack):
if node in inStack:
return True
if node in visited:
return False
inStack.add(node)
visited.add(node)for course in coursePreDict[node]:
if self.checkCycle(course, coursePreDict, visited, inStack):
return True
inStack.remove(node)return False
7. Clone Graph
Intuition
This one is breadth-first search. Store the old node and corresponding node inside a dictionary. Perform breadth first search with the adjacency list to copy new node and add node to new adjacency list.
Algorithm
- Edge case: node is
None
. - Initialize the dictionary and the queue with the first node.
- While queue is not empty:
- Dequeue the node.
- Iterate through the list of neighbors:
- Copy the new node if not already and add that node to the queue.
- Append the neighbor to the list of neighbors of the copied node.
- Return the copied node of the input node.
Complexity
Time complexity: \(O(n + m)\) - with \(n\) the number of nodes and \(m\) the number of vertices
Space complexity: \(O(n)\) - the maxmum size of the queue.
Code
from lintcode import (
UndirectedGraphNode,
)
from collections import (
deque
)
"""
Definition for a UndirectedGraphNode:
class UndirectedGraphNode:
def __init__(self, label):
self.label = label
self.neighbors = []
"""
class Solution:
"""
@param node: A undirected graph node
@return: A undirected graph node
"""
def clone_graph(self, node: UndirectedGraphNode) -> UndirectedGraphNode:
# write your code here
if not node:
return
= {}
copied = UndirectedGraphNode(node.label)
copied[node.label] = deque()
queue
queue.append(node)
while queue:
for _ in range(len(queue)):
= queue.popleft()
temp
for neighbor in temp.neighbors:
if neighbor.label not in copied:
= UndirectedGraphNode(neighbor.label)
copied[neighbor.label]
queue.append(neighbor)
copied[temp.label].neighbors.append(copied[neighbor.label])
return copied[node.label]
8. Word Search
Intuition
Depth-first search in matrix. This is similar to other problems. There are interesting follow-up questions about search pruning. Basically, I can store the character frequencies for both board
and word
, and then return False
immediately when I a character does not appear enough in board
. This will consume more memory and run longer for small board
but reduce the need for running DFS at all for invalid cases, which can cauce a lot for large board
. This is also reflected in the runtime at the end.
Algorithm
- Prune: return
False
whereboard
has less characters thanword
.- Some character in
word
does not appear enough times inboard
.
- DFS nested function:
- Parameters:
row
,col
,current_char
inword
(pointer, not the value), and avisited
set for each search. - Base case:
current_char
pointer has moved pass the length ofword
. This returnsTrue
i.e., the word has been found.row
orcol
moves past the limits, the cell has already been visited, or the character does not match. This returnsFalse
.
- Add the current position to
visited
. - Perform DFS on 4 adjacent cells (change
row
orcol
, and incrementcurrent_cahr
). The result (found or not) is stored infound
. - Remove the position from
visited
. - Return
found
.
- Parameters:
- Call the helper function on every cell in the matrix. Once the word has been found, return
True
immediately. - If the loop finishes executing, return
False
. #### Complexity
Time complexity: \(O(4^{m \times n})\) with \(m\) rows, \(n\) columns. This is the number of helper function calls in the worst-case.
Memory complexity: \(O(k)\) for \(k\) the size of the recursion call stack.
Code
from collections import Counter
class Solution:
def exist(self, board: List[List[str]], word: str) -> bool:
= len(board), len(board[0])
NUM_ROWS, NUM_COLS if len(word) > NUM_ROWS * NUM_COLS:
return False
= Counter(sum(board, []))
count
for c, countWord in Counter(word).items():
if count[c] < countWord:
return False
def searchWord(row, col, current_char, visited):
if current_char >= len(word):
return True
if (not (-1 < row < NUM_ROWS and -1 < col < NUM_COLS)) or \
in visited or board[row][col] != word[current_char]:
(row, col) return False
visited.add((row, col))# print("Visited so far", visited)
# print("Current progress", current_char)
= (searchWord(row + 1, col, current_char + 1, visited) or
found - 1, col, current_char + 1, visited) or
searchWord(row + 1, current_char + 1, visited) or
searchWord(row, col - 1, current_char + 1, visited))
searchWord(row, col
visited.discard((row, col))return found
for row in range(NUM_ROWS):
for col in range(NUM_COLS):
if searchWord(row, col, 0, set()):
return True
return False
9. Minimum Height Trees
Intuition
This is an instance of topological sorting. I was livid to learn that this is a oft-tested algorithm but I have not learnt. The algorithm gives me the impression that I will go about removing the leaf nodes i.e., nodes with only 1 edge of the graph. This is logically done in this case layer-by-layer i.e., with BFS.
One insight to solve the problem is that at the end, there can be at most 2 nodes that satisfy the requirement.
Algorithm
- Construct the adjacency list from the inputs.
- Initialize the queue with nodes with only 1 edge.
- Perform BFS while there are more than 2 untrimmed nodes:
- An array
leaves
will contain the nodes to be trimmed at the level. Hence, the number of remaining leaves can be calculated by subtracting the length of this array. - Prepare a new array
new_leaves
to store the leaves at the next level. - Iterate
leaves
by popping from it:- Get the node sole neighbor and use that to remove the node itself from the adjacency list of the neighbor.
- If the neighbor becomes a leaf, add it to
new_leaves
.
- Reassign
new_leaves
toleaves
.
- An array
- At the end, return
leaves
.
Complexity
Time complexity: \(O(n)\) with \(n\) the number of nodes. Almost all nodes will be trimmed, which takes \(O(n)\). \(O(n)\) is also the cost to construct the adjacency list.
Memory complexity: \(O(n)\) is the size of the adjacency list and also the worst-case for the queue.
Code
from collections import defaultdict
class Solution:
def findMinHeightTrees(self, n: int, edges: List[List[int]]) -> List[int]:
# Edge case
if n < 3:
return list(range(n))
= defaultdict(set)
neighbors = defaultdict(list)
depthToNode
for node1, node2 in edges:
neighbors[node1].add(node2)
neighbors[node2].add(node1)
# Initialize the first layer of leaves
= []
leaves for i in range(n):
if len(neighbors[i]) == 1:
leaves.append(i)
# Trim the leaves until reaching the centroids
= n
remaining_nodes while remaining_nodes > 2:
-= len(leaves)
remaining_nodes = []
new_leaves
while leaves:
= leaves.pop()
leaf = neighbors[leaf].pop()
neighbor
neighbors[neighbor].discard(leaf)if len(neighbors[neighbor]) == 1:
new_leaves.append(neighbor)
= new_leaves
leaves
return leaves
10. Accounts Merge
Intuition
Another algorithm/data structure, less frequently featured in interviews but featured nonetheless - Union Find, which allows for quick querying and joining of disjoint sets. The key operations it supports are:
- Find: Determine which set a particular element is in.
- Union: Join two disjoint sets into a single set.
I should go through the Leetcode Explore for this section.
Algorithm
- Define the Union Find data structure, completed with union by rank and path compression.
Complexity
With \(n\) the number of accounts and \(k\) the maximum length of an account.
Time complexity: \(O(n \times k \times \log(n \times k))\) - there is a sorting step, and if all emails belong to one person, sorting will cost this much.
Memory complexity: \(O(n \times k)\) - the size of the intermediate
result
.
Code
from collections import defaultdict
class UnionFind:
def __init__(self, size):
self.root = list(range(size))
self.rank = [1] * size
# Union by rank
def union(self, x, y):
= self.find(x), self.find(y)
rootX, rootY if rootX != rootY:
if self.rank[rootX] > self.rank[rootY]:
self.root[rootY] = rootX
if self.rank[rootX] < self.rank[rootY]:
self.root[rootX] = rootY
else:
self.root[rootY] = rootX
self.rank[rootX] += 1
# Path compression
def find(self, child):
if child != self.root[child]:
self.root[child] = self.find(self.root[child])
return self.root[child]
class Solution:
def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
= UnionFind(len(accounts))
union_find
# Create connections betwene indices
= {}
email_to_index for i, (_, *emails) in enumerate(accounts):
for email in emails:
if email in email_to_index:
union_find.union(i, email_to_index[email])= i
email_to_index[email]
# Append emails to correct indices
= defaultdict(list)
result for email, index in email_to_index.items():
result[union_find.find(index)].append(email)
return [[accounts[i][0]] + sorted(emails) for i, emails in result.items()]