Graph

Author

Pham Nguyen Hung

Published

July 21, 2023

Graph:

Definitions:

  • A nonlinear data structure consists of nodes connected by vertices.
  • A graph can be undirected, directed, or weighted.

Graph is the generalization of linked list and tree.

The most common representation of graph in a coding interview is

  • Matrix: Basically a list of lists storing all the nodes in a graph. The vertices are implicitly present between two nodes in the horizontal and vertical directions.
  • Adjacency matrix: A matrix storing the information whether a vertex exists between two nodes. This is usually the case for directed graph, because the same pair is repeated in (row, col) then (col, row). This is less frequently encountered.
  • Adjacency list: A node will store its value and a list of all the nodes that it can go to. Commonly encountered.

Similar to tree, graph solution relies heavily on depth-first search and breadth-first search, at least for matrix, though what qualifies as DFS and BFS for matrix is different from tree.

Problems

1. Number of Islands

Intuition

From an example that I have, I thought of maintaining a count of the island count and a visited-or-not set visited. The meat will be a nested for loop (simplest way to search for the possible next land after finishing with an island). Of course, a helper function i.e. dfs will be implemented with the purpose of adding visited nodes to the set.

Depends on the requirement, if the input array can be modified, the set can be aborted and I can modify the array in-place, which saves both time and space.

Algorithm

Helper function: 1. Base case: 1. Out of bound: row or column is less than 0, row or column passes the correponding length. 2. Position is water i.e. not '1'. (3. Position already visited.) 2. Mark the position as visited. 3. Call the helper function on four adjacent tiles.

Then iterate through the tiles, call the helper function where appropriate and return the count.

Complexity

  • Time complexity: \(O(m \times n)\): This is the amortized complexity. The time complexity will be at worst in a map full of lands. The first call will prompt the algorithm to crawl over every other nodes. This operation’s cost will equal to the number of edges, which is less than \(O(4 \times m \times n)\). For the rest, it will be just \(O(1)\). There are \(m \times n\) tiles, so the cost spread out will be roughly \(4\) i.e. \(O(1)\) for each call. And each call will happen inside a nested for loop of \(m \times n\), hence the overall cost.

  • Space complexity: \(O(m \times n)\): The maximum size of the recursive call stack.

Code

In-place modification

class Solution:
    def numIslands(self, grid: List[List[str]]) -> int:
        ROWS, COLS = len(grid), len(grid[0])
        count = 0
        
        def dfs(row, col):
            if min(row, col) < 0 or row >= ROWS or \
                col >= COLS or grid[row][col] != '1':
                return
            
            grid[row][col] = 'X'
            
            dfs(row - 1, col)
            dfs(row + 1, col)
            dfs(row, col - 1)
            dfs(row, col + 1)

        for row in range(ROWS):
            for col in range(COLS):
                if grid[row][col] == '1':
                    dfs(row, col)
                    count += 1

        return count

With another set

class Solution:
    def numIslands(self, grid: List[List[str]]) -> int:
        ROWS, COLS = len(grid), len(grid[0])
        count = 0
        visited = set()
        
        def dfs(row, col):
            if min(row, col) < 0 or row >= ROWS or col >= COLS or\
                grid[row][col] != '1' or (row, col) in visited:
                return
            
            visited.add((row, col))
            
            dfs(row - 1, col)
            dfs(row + 1, col)
            dfs(row, col - 1)
            dfs(row, col + 1)

        for row in range(ROWS):
            for col in range(COLS):
                if grid[row][col] == '1' and (row, col) not in visited:
                    dfs(row, col)
                    count += 1

        return count

2. Flood Fill

Intuition

This can be called “the easier Number of Islands”. Following the same approach, but this time the starting point for depth-first search is given. Afterwards, it is performing search in all 4 directions, replacing the satisfying tiles with the value given.

Algorithm

For the helper function: 1. Base case: out of bound, the tile does not have the original value, the tile has already been filled. 2. Recursive: Replace the tile, then call the function on all 4 directions.

Complexity

  • Time complexity: \(O(m \times n)\): The work at every pixel is \(O(1)\), and the algorithm may visit every pixel.

  • Space complexity: \(O(m \times n)\): The maximum size of the recursive call stack.

Note: No. of rows is \(m\), no. of columns is \(n\).

Code

class Solution:

    def floodFill(self, image: List[List[int]], sr: int, sc: int, color: int) -> List[List[int]]:
        ROWS, COLS = len(image), len(image[0])
        original_color = image[sr][sc]
        def dfs(row, col):
            if min(row, col) < 0 or row >= ROWS or col >= COLS or \
                image[row][col] != original_color or image[row][col] == color:
                return
            
            image[row][col] = color

            dfs(row - 1, col)
            dfs(row + 1, col)
            dfs(row, col - 1)
            dfs(row, col + 1)
        
        dfs(sr, sc)
        return image

3. Max Area of Island

Intuition

Another variation of Number of Islands. The least memory intensive way is to return the area of each island in the recursive function and modify the input matrix in-place. However, if the input matrix needs to be intact, a set should be used to check whether a node is visited or not.

Algorithm

For the helper function: 1. Base case: out of bound, the tile is water, the tile has already been filled. Return 0 2. Recursive: Replace the tile value, then call the function on all 4 directions. Return 1 + the value of each direction.

Complexity

  • Time complexity: \(O(m \times n)\): The work at each tile is \(O(1)\), and the algorithm will visit every tile.

  • Space complexity: \(O(m \times n)\): The maximum size of the recursive call stack. Also the size of the set to keep track of nodes.

Code

  • Modify in-place
class Solution:
    def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
        ROWS, COLS = len(grid), len(grid[0])
        max_area = 0

        def searchArea(row, col):
            if min(row, col) < 0 or row >= ROWS or \
                col >= COLS or grid[row][col] != 1:
                return 0
            grid[row][col] = 0
            return 1 + searchArea(row + 1, col) + searchArea(row - 1, col) + searchArea(row, col + 1) + searchArea(row, col - 1)
        
        for row in range(ROWS):
            for col in range(COLS):
                if grid[row][col] == 1:
                    area = searchArea(row, col)
                    max_area = max(max_area, area)
        return max_area
  • Another set:
class Solution:
    def maxAreaOfIsland(self, grid: List[List[int]]) -> int:
        ROWS, COLS = len(grid), len(grid[0])
        max_area = 0
        visited = set()

        def searchArea(row, col):
            if min(row, col) < 0 or row >= ROWS or col >= COLS \
            or grid[row][col] != 1 or (row, col) in visited:
                return 0
            visited.add((row, col))
            return 1 + searchArea(row + 1, col) + searchArea(row - 1, col) + searchArea(row, col + 1) + searchArea(row, col - 1)
        
        for row in range(ROWS):
            for col in range(COLS):
                if grid[row][col] == 1:
                    area = searchArea(row, col)
                    max_area = max(max_area, area)
        return max_area

4. Rotting Oranges

Intuition

The introduction to breadth-first search in matrix. I still need to expand on all 4 directions. The difference here is I need to do so in the context of layer. Simply put, the \(n^{th}\) layer from the first node is all nodes that I can reach by walking \(n\) steps from the first node (counting the first node as the \(0^{th}\) layer). So instead of expand in each direction as far as possible, I expand all 4 nodes for each node in a layer.

Algorithm

  1. Initialize variables: minute to 0, rotten queue for the oranges, and fresh to count the fresh oranges.
  2. Iterate the matrix, count the number of fresh oranges and append rotten oranges to the queue.
  3. While there is still fresh orange and the queue is not empty, iterate through one layer of the queue.
    1. For each node, expand to all 4 adjacent nodes.
    2. If the node is within bound and contains a fresh orange, turn it rotten, add to the queue, and decrement the number of fresh oranges.
    3. At the end, increment minute.
  4. Return per condition.

Complexity

  • Time complexity: \(O(m \times n)\): Iterate through all the nodes in matrix roughly twice.

  • Space complexity: \(O(m \times n)\): The maximum size of the queue.

Code

from collections import deque
from typing import List

class Solution:
    def orangesRotting(self, grid: List[List[int]]) -> int:
        ROWS, COLS = len(grid), len(grid[0])
        fresh = 0
        rotten = deque()
        minute = 0
        
        for row in range(ROWS):
            for col in range(COLS):
                if grid[row][col] == 1:
                    fresh += 1
                if grid[row][col] == 2:
                    rotten.append((row, col))
        
        neighbors = [[0, 1], [0, -1], [1, 0], [-1, 0]]

        while fresh > 0 and rotten:
            for _ in range(len(rotten)):
                row, col = rotten.popleft()

                for dr, dc in neighbors:
                    new_row, new_col = row + dr, col + dc
                    if (
                        0 <= new_row < ROWS
                        and 0 <= new_col < COLS
                        and grid[new_row][new_col] == 1
                    ):
                        grid[new_row][new_col] = 2
                        rotten.append((new_row, new_col))
                        fresh -= 1
            
            minute += 1
        return minute if fresh == 0 else -1

5. Shortest Path in Binary Matrix

Intuition

Another breadth-first search problem. This time, I can move in all 8 directions. However, the principle and boilerplate is still the same.

Algorithm

  1. Edge cases: The start and destination cannot be reached or the length of the matrix is 1.
  2. Initialize variables: the size of the matrix, the length of the shortest path, the set of visited nodes, the queue for breadth-first search, the possible directions at a node.
  3. For the helper function:
    1. Expand in 8 adjacent tiles for each nodes in one layer of the queue.
    2. If encounter the destination, return immediately.
    3. Else if the tile is within bound, accessible, and yet to visit, add it to the queue and mark it as visisted.
    4. Increment the length at the end.
  4. Return -1 as no path can be found.

Complexity

  • Time complexity: \(O(n^2)\) - for \(n\) the size of the matrix. At worst, the algorithm visits every node and it does \(O(1)\) work at each node.

  • Space complexity: \(O(n^2)\) - the size of the visited set in the worst case.

Code

from collections import deque

class Solution:
    def shortestPathBinaryMatrix(self, grid: List[List[int]]) -> int:
        if max(grid[0][0], grid[-1][-1]) == 1:
            return -1
        if len(grid) == 1 and grid[0][0] == 0:
            return 1

        SIZE = len(grid)
        shortest = 1
        visit = set()
        possible_steps = deque()
        
        visit.add((0, 0))
        possible_steps.append((0, 0))

        directions = [(0, 1), (0, -1), (1, 0), (-1, 0), (-1, -1), (1, 1), (-1, 1), (1, -1)]

        while possible_steps:
            for _ in range(len(possible_steps)):
                row, col = possible_steps.popleft()
                
                for dr, dc in directions:
                    new_row, new_col = row + dr, col + dc
                    if new_row == new_col == SIZE - 1:
                        return shortest + 1
                    if (0 <= new_row < SIZE and
                        0 <= new_col < SIZE and
                        grid[new_row][new_col] == 0 and
                        (new_row, new_col) not in visit):
                        possible_steps.append((new_row, new_col))
                        visit.add((new_row, new_col))
            shortest += 1
        
        return -1

6. Course Schedule

Intuition

As the number of courses I can take and the number of courses offered are the same, the only case when I cannot take the courses is when some form a cycle. You know it: to sign up for driver license, you need identity card and passport; to sign up for identity card you need driver license and passport; to sign up for passport, of course you need identity card and driver license…

The problem is thus reduced to the problem of detecting cycle within a directed graph. Kahn’s algorithm for topological sorting is the go to answer, unless I have not studied it (I haven’t). So let’s go to a more familiar solution: depth-first search.

A cycle can be detected when one of a node’s (A) neighbors (B) has a back edge connected to it (A). This means that A should already have been visited when the algorithm tries to expand to the neighbors of B. But just encountering a visited node is not complete to detect a cycle - that node must also still be in the recursive call stack (recall that the node is only popped from the recursive call stack once all of its neighbors have been processed). A set can be used to keep up with this.

Algorithm

  1. Translate the list of prerequisites into a dictionary mapping prerequisite → course. Initialize the set of visited node.
  2. Helper function to detect cycle:
    1. If the node is still in the stack, return True. Else if the node has been visited, return False.
    2. Add the node to the set of in-stack nodes and visited nodes.
    3. Expand to its neighbors and return True if needed.
    4. Remove the node from the in-stack set and return False.
  3. Call the helper function on each of the course. Return False if a cycle is detected.
  4. Return True at the end.

Complexity

  • Time complexity: \(O(n + m)\) - with \(n\) the number of courses and \(m\) the length of prerequisites.
  • Space complexity: \(O(n + m)\) - the combined size of the visited set and the prerequisite dictionary.

Code

from collections import defaultdict

class Solution:
    def canFinish(self, numCourses: int, prerequisites: list[list[int]]) -> bool:
        coursePreDict = defaultdict(list)

        for course, pre in prerequisites:
            coursePreDict[pre].append(course)
        
        visited = set()

        for course in range(numCourses):
            if self.checkCycle(course, coursePreDict, visited, set()):
                return False
        
        return True
    
    def checkCycle(self, node, coursePreDict, visited, inStack):
        if node in inStack:
            return True
        if node in visited:
            return False
        
        inStack.add(node)
        visited.add(node)
        for course in coursePreDict[node]:
            if self.checkCycle(course, coursePreDict, visited, inStack):
                return True
        
        inStack.remove(node)
        return False

7. Clone Graph

Intuition

This one is breadth-first search. Store the old node and corresponding node inside a dictionary. Perform breadth first search with the adjacency list to copy new node and add node to new adjacency list.

Algorithm

  1. Edge case: node is None.
  2. Initialize the dictionary and the queue with the first node.
  3. While queue is not empty:
    1. Dequeue the node.
    2. Iterate through the list of neighbors:
      1. Copy the new node if not already and add that node to the queue.
      2. Append the neighbor to the list of neighbors of the copied node.
  4. Return the copied node of the input node.

Complexity

  • Time complexity: \(O(n + m)\) - with \(n\) the number of nodes and \(m\) the number of vertices

  • Space complexity: \(O(n)\) - the maxmum size of the queue.

Code

from lintcode import (
    UndirectedGraphNode,
)

from collections import (
    deque
)

"""
Definition for a UndirectedGraphNode:
class UndirectedGraphNode:
    def __init__(self, label):
        self.label = label
        self.neighbors = []
"""

class Solution:
    """
    @param node: A undirected graph node
    @return: A undirected graph node
    """
    def clone_graph(self, node: UndirectedGraphNode) -> UndirectedGraphNode:
        # write your code here
        if not node:
            return
        copied = {}
        copied[node.label] = UndirectedGraphNode(node.label)
        queue = deque()
        queue.append(node)

        while queue:
            for _ in range(len(queue)):
                temp = queue.popleft()

                for neighbor in temp.neighbors:
                    if neighbor.label not in copied:
                        copied[neighbor.label] = UndirectedGraphNode(neighbor.label)
                        queue.append(neighbor)
                    copied[temp.label].neighbors.append(copied[neighbor.label])
        
        return copied[node.label]

9. Minimum Height Trees

Intuition

This is an instance of topological sorting. I was livid to learn that this is a oft-tested algorithm but I have not learnt. The algorithm gives me the impression that I will go about removing the leaf nodes i.e., nodes with only 1 edge of the graph. This is logically done in this case layer-by-layer i.e., with BFS.

One insight to solve the problem is that at the end, there can be at most 2 nodes that satisfy the requirement.

Algorithm

  1. Construct the adjacency list from the inputs.
  2. Initialize the queue with nodes with only 1 edge.
  3. Perform BFS while there are more than 2 untrimmed nodes:
    1. An array leaves will contain the nodes to be trimmed at the level. Hence, the number of remaining leaves can be calculated by subtracting the length of this array.
    2. Prepare a new array new_leaves to store the leaves at the next level.
    3. Iterate leaves by popping from it:
      1. Get the node sole neighbor and use that to remove the node itself from the adjacency list of the neighbor.
      2. If the neighbor becomes a leaf, add it to new_leaves.
    4. Reassign new_leaves to leaves.
  4. At the end, return leaves.

Complexity

  • Time complexity: \(O(n)\) with \(n\) the number of nodes. Almost all nodes will be trimmed, which takes \(O(n)\). \(O(n)\) is also the cost to construct the adjacency list.

  • Memory complexity: \(O(n)\) is the size of the adjacency list and also the worst-case for the queue.

Code

from collections import defaultdict

class Solution:
    def findMinHeightTrees(self, n: int, edges: List[List[int]]) -> List[int]:
        # Edge case
        if n < 3:
            return list(range(n))

        neighbors = defaultdict(set)
        depthToNode = defaultdict(list)

        for node1, node2 in edges:
            neighbors[node1].add(node2)
            neighbors[node2].add(node1)

        # Initialize the first layer of leaves
        leaves = []
        for i in range(n):
            if len(neighbors[i]) == 1:
                leaves.append(i)
        
        # Trim the leaves until reaching the centroids
        remaining_nodes = n
        while remaining_nodes > 2:
            remaining_nodes -= len(leaves)
            new_leaves = []

            while leaves:
                leaf = leaves.pop()
                neighbor = neighbors[leaf].pop()
                neighbors[neighbor].discard(leaf)
                if len(neighbors[neighbor]) == 1:
                    new_leaves.append(neighbor)
            
            leaves = new_leaves
        
        return leaves

10. Accounts Merge

Intuition

Another algorithm/data structure, less frequently featured in interviews but featured nonetheless - Union Find, which allows for quick querying and joining of disjoint sets. The key operations it supports are:

  • Find: Determine which set a particular element is in.
  • Union: Join two disjoint sets into a single set.

I should go through the Leetcode Explore for this section.

Algorithm

  1. Define the Union Find data structure, completed with union by rank and path compression.

Complexity

With \(n\) the number of accounts and \(k\) the maximum length of an account.

  • Time complexity: \(O(n \times k \times \log(n \times k))\) - there is a sorting step, and if all emails belong to one person, sorting will cost this much.

  • Memory complexity: \(O(n \times k)\) - the size of the intermediate result.

Code

from collections import defaultdict

class UnionFind:
    def __init__(self, size):
        self.root = list(range(size))
        self.rank = [1] * size
    
    # Union by rank
    def union(self, x, y):
        rootX, rootY = self.find(x), self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.root[rootY] = rootX
            if self.rank[rootX] < self.rank[rootY]:
                self.root[rootX] = rootY
            else:
                self.root[rootY] = rootX
                self.rank[rootX] += 1
    
    # Path compression
    def find(self, child):
        if child != self.root[child]:
            self.root[child] = self.find(self.root[child])
        return self.root[child]

class Solution:
    def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
        union_find = UnionFind(len(accounts))

        # Create connections betwene indices
        email_to_index = {}
        for i, (_, *emails) in enumerate(accounts):
            for email in emails:
                if email in email_to_index:
                    union_find.union(i, email_to_index[email])
                email_to_index[email] = i
        
        # Append emails to correct indices
        result = defaultdict(list)
        for email, index in email_to_index.items():
            result[union_find.find(index)].append(email)
        
        return [[accounts[i][0]] + sorted(emails) for i, emails in result.items()]