Array & String

Author

Pham Nguyen Hung

Published

July 21, 2023

Array:

Definition:

The easiest, linear data structure, consisting of elements stored in contiguous memory slots. Elements are access by index, usually a number. The dictionary, or HashMap is a related data structure, consisting of key-value pairs that will be used in dealing array question.

Problem:

1. Two Sum:

The basic of basic problem, the first problem everybody encounters when he/she starts LeetCoding.

Intuition

Traversing the list seems like the most logical thing to do. The brute-force way is to check every pair of numbers. The time complexity is O(n^2). To optimize this, it is better to store the information I have already encountered when I traverse the list. I need to use a data structure that supports fast searching - HashMap. Either way works, but the HashMap can be used for storing the index of the element as well. If I store an element as the key and the index as the value, I can quickly search and return the indices.

Algorithm

The algorithm can be described as:

  1. Traverse the array from the beginning.
  2. Store the element encountered in a set/dictionary.
  3. Check if the sum - element is in the dictionary.
  4. Return the indices.

Complexity

  • Time complexity: \(O(n)\): Traversing the whole list once.

  • Space complexity: \(O(n)\): In the worst case, I will have a dictionary with a size equal to the array.

Code

from collections import defaultdict
class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        numsDict = defaultdict(int)
        for ind, num in enumerate(nums):
            if target-num in numsDict:
                return [ind, numsDict[target-num]]
            numsDict[num] = ind

2. Contains Duplicate

Intuition

Again, traverse the array, storing information about what I have encountered. This time, I only need to store the element itself, so a HashSet suffices.

Algorithm

  1. Traverse the array.
  2. Check if the current element is in the HashSet. If it is, return True for duplicate.
  3. If I reach the end of the array, return False for no duplicate.

Complexity

  • Time complexity: \(O(n)\): Traversing the whole list once.

  • Space complexity: \(O(n)\): In the worst case, I will have a set with a size equal to the array.

Code

class Solution:
    def containsDuplicate(self, nums: List[int]) -> bool:
        numSet = set()
        for num in nums:
            if num in numSet:
                return True
            numSet.add(num)
        return False

3. Valid Anagram:

Intuition

Again, traverse the array, storing information about what I have encountered. This time, I only need to store the element itself, so a HashSet suffices.

Algorithm

  1. Traverse the array.
  2. Check if the current element is in the HashSet. If it is, return True for duplicate.
  3. If I reach the end of the array, return False for no duplicate.

Complexity

  • Time complexity: \(O(n)\): Traversing the whole list once.

  • Space complexity: \(O(n)\): In the worst case, I will have a set with a size equal to the array.

Code

from collections import defaultdict
class Solution:
    def isAnagram(self, s: str, t: str) -> bool:
        if len(s) != len(t):
            return False
        hashS, hashT = defaultdict(int), defaultdict(int)
        for index in range(len(s)):
            hashS[s[index]] += 1
            hashT[t[index]] += 1
        return hashS == hashT

4. Majority Element

Intuition

There are many approaches to the problem. The simplest way except brute-force the whole thing is to use a HashMap to count the number of occurences of each element (in Python can be done faster with collections.Counter). The solution provided is to deal with the follow-up question of linear time with \(O(1)\) memory.

Algorithm

This is a named algorithm: Boyer-Moore Voting Algorithm. The procedure is this: while I traverse the array, if I choose the first element as a candidate and consider three things:

  1. An encounter with the same element a vote for the candidate
  2. An encounter with a difference element a vote against the candidate
  3. Change the candidate to the current element if it is different than the candidate and the vote for the candidate has become 0

then at the end, I should be left with the candidate.

This is because the majority element will occur more than n//2 times with n the length of the array. Because of this, the majority element will be the only possible candidate with a positive vote at the end.

Complexity

  • Time complexity: \(O(n)\): I need to traverse the array once.
  • Space complexity: \(O(1)\): I only need to keep track of the count and the candidate, which requires constant memory.

Code

class Solution:
    def majorityElement(self, nums):
        count = 0
        candidate = None

        for num in nums:
            if count == 0:
                candidate = num
            count += (1 if num == candidate else -1)

        return candidate

5. Group Anagrams

Intuition

From Valid Anagram, I know that one word is an anagram of another if character frequencies are the same. I can check everything in one go with a dictionary. The key information is lowercase letter only. This means that character frequency can be capture in a tuple of 26 number, each one the number of character a, b, etc. in the word. I can append each word to the correct key in the dictionary. Afterwards, I simply return the values of the dictionary.

Algorithm

  1. Initialize a dictionary, better with Python defaultdict.
  2. Iterate the list of words. For each word, initialize an array of 26 0s and increment the correct count for each character. ord() is useful in this case.
  3. Append the word to the value array of the correct key.
  4. Return the values of the dictionary.

Complexity

  • Time complexity: \(O(nk)\): I need to traverse the array once. \(k\) is the maximum length of a string in the array.
  • Space complexity: \(O(nk)\): The information stored in the dictionary.

Code

from collections import defaultdict
class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        countDict = defaultdict(list)
        for s in strs:
            count = [0] * 26
            for c in s:
                count[ord(c) - ord('a')] += 1
            countDict[tuple(count)].append(s)
        return countDict.values()

6. Top K Frequent Elements

Intuition

To satisfy the \(O(nlogn)\) time constraint, computer science students will think of Quickselect. If you know how to implement Quickselect, it’s fine.

For someone who doesn’t, the approach is to use a dictionary the frequency as the key, and the value an array of elements with that key. To return, I can just iterate the dictionary from the maximum value to the minimum, append the value to an array and return the k elements. #### Algorithm

  1. Take frequency count of the elements in the array, better with Python Counter
  2. Initialize a dictionary, better with Python defaultdict.
  3. Iterate the array. Append each element to the value of the correct key.
  4. Iterate the dictionary from the maximum value down for the top-k elements and return it.

Complexity

  • Time complexity: \(O(n)\): I need to traverse the array once and the dictionary once. Worst-case, the dictionary has the same size as the array.
  • Space complexity: \(O(n)\): Worst-case, I will need to store triple the memory of the original array.

Code

from collections import defaultdict, Counter
class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = Counter(nums)
        ans = defaultdict(list)
        for num in count:
            ans[count[num]].append(num)
        top = max(ans)
        res = []
        for i in range(top,-1,-1):
            res.extend(ans[i])
        return res[:k]

Optimized solution from NeetCode

class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Bucket Sort
        countDict = {}
        for i in nums:
            countDict[i] = countDict.get(i, 0) + 1
        freqList = [[] for i in range(len(nums))]
        for num, count in countDict.items():
            freqList[count-1].append(num)
        res = []
        for i in range(len(freqList) - 1, 0, -1):
            for n in freqList[i]:
                res.append(n)
                if len(res) == k:
                    return res

7. Encode and Decode Strings

Intuition

There are many approaches to this problem. My approach is the simplest: I will use an usual combination of character to insert between strings in the list. To minimize memory use, I choose a 3 combo: '*-*'. I reckon that I can get the work done with '*-', but '*-* is safer and cuter.

A more seasoned programmer will give a more complex approach. He will use ':' to insert between, and inserting '::' when there is a ':'. This is a general method, which takes care of the case you need to impress the guys by showing that you can think generally, given that you may need to learn another language at the company. But in Python, simpler is better - it is just an one-liner with fast, optimized method for both functions. But of course, it is not so impressive, making you look so dependent on Python.

NeetCode has a middle approach - Use the length and '#' to insert before each character. This is much better.

Algorithm

One-liner: Use built-in method .join() and .split() to do the heavy work.

General:

  • Encode the string by concatenating the length + '#' + the string into a big one and return.

  • Decode the string by creating a for loop and deal with phrases consequentially.

    1. Extract the character length of the next phrase from the string.
    2. Use slicing to get the phrase and append to the result list.
    3. Move on to the next position.
    4. Return the result.

Complexity

  • Time complexity: \(O(n)\): I need to traverse the array and string once for each.
  • Space complexity: \(O(1)\): I will need to store some pointers for both functions.

Code

One-liner

class Solution:
    """
    @param: strs: a list of strings
    @return: encodes a list of strings to a single string.
    """
    def encode(self, strs):
        # write your code here
        return '*-*'.join(strs)

    """
    @param: str: A string
    @return: dcodes a single string to a list of strings
    """
    def decode(self, str):
        # write your code here
        return str.split('*-*')

Optimized NeetCode

class Solution:
    """
    @param: strs: a list of strings
    @return: encodes a list of strings to a single string.
    """

    def encode(self, strs):
        res = ""
        for s in strs:
            res += str(len(s)) + "#" + s
        return res

    """
    @param: s: A string
    @return: decodes a single string to a list of strings
    """

    def decode(self, s):
        res, i = [], 0

        while i < len(s):
            j = i
            while s[j] != "#":
                j += 1
            length = int(s[i:j])
            res.append(s[j + 1 : j + 1 + length])
            i = j + 1 + length
        return res

8. Product of Array Except Self

Intuiton

The trick is to use “prefix” and “postfix” arrays, respectively storing the product of all elements coming before and after the current values. These arrays have the properties of being able to be calculated iteratively in a for loop from the calculated value before. The answer is simply the element-wise product of the two arrays.

To get to the \(O(1)\) memory solution, it is about converting one array into just a pointer to cache the product.

Algorithm

  1. Traverse the array to calculate the prefix (postfix array).
  2. Traverse the array again to update the postfix product and also the result.
  3. Return the result.

Complexity

  • Time complexity: \(O(n)\): I need to traverse the original array twice.
  • Space complexity: \(O(1)\): I will need to store the postfix product while calculating.

Code

class Solution:
    def productExceptSelf(self, nums: List[int]) -> List[int]:
        # To achieve O(1) space, will need to perform the multiplication 
        # directly on the prefix array
        # postfix array needs to become just a pointer to the cached product 
        lenNum = len(nums)
        prefix, postfix = [1] * lenNum, 1
        for i in range(1, lenNum):
            prefix[i] = prefix[i-1]*nums[i-1]
        for i in range(lenNum-1, -1, -1):
            prefix[i] *= postfix
            postfix *= nums[i]
        return prefix

9. Valid Sudoku

Intuiton

The nested for loop to check every element is inevitable. The trick is to realize that the box can be arranged and accessed with modulo 3.

Algorithm

  1. Initialized rows, cols, boxes to keep track of the respective factor. The most elegant way to do this in Python is with collections.defaultdict(set)
  2. Traverse the board with a nested for loop and update the respective factor.
  3. Return False whenever a condition is violated, or True if the for loop executes successfully.

Complexity

  • Time complexity: \(O(n^2)\): I need to traverse the original array with a nested for loop. Of course, if I am dealing with a standard Sudoku every time then it is \(O(1)\).
  • Space complexity: \(O(n)\): I will need to store 3 dictionaries with worse case the same size as the board.

Code

from collections import defaultdict

class Solution:
    def isValidSudoku(self, board: List[List[str]]) -> bool:
        rows, cols, boxes = defaultdict(set), defaultdict(set), defaultdict(set)
        for row in range(9):
            for col in range(9):
                cur = board[row][col]
                if cur != '.':
                    if (cur in rows[row] or
                        cur in cols[col] or
                        cur in boxes[(row//3, col//3)]):
                        return False
                    rows[row].add(cur)
                    cols[col].add(cur)
                    boxes[(row//3, col//3)].add(cur)
        return True

10. Longest Consecutive Sequence

Intuiton

I need to perform a lot of searches and there is the possibility of duplicates. Hence, it is better if I convert the array to a set. I can also create safeguard by NOT counting the number in the sequence if the one preceding it is in the set.

Algorithm

  1. Convert the whole array into a set.
  2. Traverse the set. For each element, count and update the longest consecutive sequence if the one preceding it is not in the array.
  3. Return the longest length.

Complexity

  • Time complexity: \(O(n)\): I need to traverse the original array once
  • Space complexity: \(O(n)\): I will need to store a set with the same size as the array.

Code

class Solution:
    def longestConsecutive(self, nums):
        longest_streak = 0
        num_set = set(nums)

        for num in num_set:
            if num - 1 not in num_set:
                current_num = num
                current_streak = 1

                while current_num + 1 in num_set:
                    current_num += 1
                    current_streak += 1

                longest_streak = max(longest_streak, current_streak)

        return longest_streak

11. Sort Colors

Intuiton

First simple solution: I need the number of elements of each color. I can do that easily with a dictionary. Even better, because the number of colors i.e., the values of each element in the array is fixed, the memory cost is \(O(1)\). After constructing the dictionary, I can reassign values to the original array one by one. This solution is good enough for an interview.

In the follow-up question, the solution needs to modify the values in-place in just one pass instead of two, and memory complexity should be \(O(1)\). This can be done with a three-pointer approach a.k.a rainbow sort a.k.a Dutch national flag algorithm. The idea is having 3 pointers: low and mid initialized at the start, and high initialized at the end. mid will be used to traverse the array. The objective is keeping all 0s before low and all 2s after high. Hence, at each iteration, values at mid and low will be switched if mid is pointing to a 0, and values at mid and high will be switched if mid is pointing to a 2. The thing that caught me off guard was the pointers only need decrementing/incrementing after switching the actual values. I confused this with switching the index.

Algorithm

  1. Initialize 3 pointers.
  2. Traverse the array.
    1. If mid points to 1, increment mid.
    2. If mid points to 0, switch values of mid and low. Increment both mid and low.
    3. If mid points to 2, switch values of mid and high. Decrement only high.

Complexity

  • Time complexity: \(O(n)\): I need to traverse the original array once
  • Space complexity: \(O(1)\):

Code

Acceptable solution for an interview:

from collections import defaultdict

class Solution:
    def sortColors(self, nums: List[int]) -> None:
        """
        Do not return anything, modify nums in-place instead.
        """
        color_counter = defaultdict(int)
        for num in nums:
            color_counter[num] += 1
        ind = 0
        for key in sorted(color_counter):
            for _ in range(color_counter[key]):
                nums[ind] = key
                ind += 1

One-pass solution:

class Solution:
    def sortColors(self, nums: List[int]) -> None:
        """
        Do not return anything, modify nums in-place instead.
        """
        low, mid, high = 0, 0, len(nums) - 1
        while mid <= high:
            if nums[mid] == 0:
                nums[low], nums[mid] = nums[mid], nums[low]
                low += 1
                mid += 1
            elif nums[mid] == 2:
                nums[high], nums[mid] = nums[mid], nums[high]
                high -= 1
            else:
                mid += 1

12. Longest Palindrome

Intuition

A palindrome can have one character that has an odd number of occurrences while all other characters must have an even number of occurrences. This means that I will need to count the frequency of each character, add the largest even number that is smaller or equal that frequency, and add 1 if there exists odd frequency in the phrase.

Algorithm

  1. Count the number of each character in the string, with collections.Counter or manually with a dictionary.
  2. Iterate through the frequencies:
    1. If the count is odd, add count - 1 to the maximum length.
    2. If the count is even, add it to the maximum length.
  3. Return maximum length, plus 1 if there exists odd count in the dictionary.

Complexity

  • Time complexity: \(O(n)\) - Iterate through the array.

  • Space complexity: \(O(k)\) - with \(k\) the number of distinct charactesr in the string.

Code

from collections import Counter

class Solution:
    def longestPalindrome(self, s: str) -> int:
        charCount = Counter(s)
        maxLength, hasOdd = 0, False
        for count in charCount.values():
            if count % 2:
                maxLength += (count - 1)
                hasOdd = True
            else:
                maxLength += count
        return maxLength + 1 if hasOdd else maxLength

13. String to Integer (atoi)

Intuition

An interesting problem. For some reason NeetCode does not do this? Anyway, back to it. After some tinkering, I discovered some rules for the inputs:

Rules Solutions
Containing leading and trailing spaces Use .strip() to, well, strip them
The first character after needs to be a sign or a number Check and return 0 immediately for None input and input with inappropriate first character (use ord() to check)
The sign is '+' by default but can be changed Have a variable to keep the sign. I used 1 and -1 and multiply with the later to create the sign
Two signs cannot appear in one string Have a variable to check if the sign has already been set. If so, on encountering next sign, return 0 immediately
A period ends reading immediately So be it in the reading loop
Containing leading zeros Check for this after the reading loop

Complexity

  • Time complexity: \(O(n)\): Iterate through the input string.

  • Space complexity: \(O(n)\): I use variables that are essentially integers. However, in stripping the string at the beginning, I create a new copy of the string, possibly with the same length.

Code

class Solution:
    def myAtoi(self, s: str) -> int:
        s = s.strip()
        
        if not s:
            return 0
        if not 42 < ord(s[0]) < 58:
            return 0
        
        start = end = 0
        sign, signSet = 1, False
        
        while start < len(s):
            if signSet and not 47 < ord(s[start]) < 58:
                return 0

            if s[start] == '-':
                sign = -1
                signSet = True
            
            elif s[start] == '+':
                signSet = True
            
            elif s[start] == '.':
                break
            
            elif 47 < ord(s[start]) < 58:
                end = start + 1
                while end < len(s):
                    if 47 < ord(s[end]) < 58:
                        end += 1
                    else:
                        break
                break
            start += 1
        
        while start < end and s[start] == '0':
            start += 1

        result = 0 if start >= end else sign * eval(s[start:end])

        return max(-2**31, result) * (result < 0) + min(result, 2**31 - 1) * (result > 0)

14. Longest Palindromic Substring

Intuition

I could not solve the problem in time.

First, the brute-force solution of checking every substring. Looping over all substrings take \(O(n^2)\) time, and checking for palindrome of each one takes up to \(O(n)\) time. The final solution is \(O(n^3)\). However, you can 1 optimization: instead of iterating from small to large substrings, you can iterate from large to small. This way, you can immediately return at the first substring you find, and start with the checking that likely takes up the least iteration to terminate if wrong. Implement it with best coding practice and you can still pass the interview.

For a step up from this, I can move from considering the bounds to considering the centers. Each letter is a palindrome on its own. A longer palindrome can be constructed if the two letters surrounding it are the same, and the next pair, next pair, so on for the odd-length palindrome case. For the even-length palindrome, the algorithm will expand from a pair of letters as a center. A general recursive helper function can be defined to use in both cases. Some math is required to manipulate the pointers, but that part is okay.

Dynamic programming can be used, but it will be included in the correct post.

Algorithm

Complexity

  • Time complexity: \(O(n^3)\) to \(O(n^2)\)

  • Space complexity: \(O(1)\) - both ways only manipulate pointers.

Code

Naive

class Solution:
    def longestPalindrome(self, string: str) -> str:
        def checkPalindrome(i, j):
            left = i
            right = j - 1
            
            while left < right:
                if string[left] != string[right]:
                    return False
                
                left += 1
                right -= 1
            
            return True
        
        for length in range(len(string), 0, -1):
            for start in range(len(string) - length + 1):
                if checkPalindrome(start, start + length):
                    return string[start:start + length]

        return ""

Optimized

class Solution:
    def longestPalindrome(self, s: str) -> str:
        def expand(i, j):
            left = i
            right = j
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            
            return right - left - 1
        
        ans = [0, 0]

        for i in range(len(s)):
            odd_length = expand(i,i)
            if odd_length > ans[1] - ans[0] + 1:
                dist = odd_length // 2
                ans = [i - dist, i + dist]
            
            even_length = expand(i, i + 1)
            if even_length > ans[1] - ans[0] + 1:
                dist = (even_length // 2) - 1
                ans = [i - dist, i + dist + 1]
        
        i, j = ans
        return s[i:j+1]

15. Find All Anagrams in a String

Intuition

Hashing is your friend when you need to check the similarity between 2 strings, whether it is exact match or character-frequency match. This is again true for this problem.

The question asks for a list of first indices of substrings that are anagrams i.e., have the same character frequency as another string. This is asking for sliding window character count.

Algorithm

  1. Check that the length of s is larger than p.
  2. Get the dictionary with character count for p. This can be done succintly with Counter or the same way as below.
  3. Initialize the first window character count of s and check if it is an anagram of p.
  4. Iterate through all other windows of string p, check for anagram, and add to the result list if necessary.

Complexity

  • Time complexity: \(O(n + m)\) - with \(n\) the length of s and \(m\) the length of p

  • Space complexity: \(O(m)\) - the size of each character count dictionary.

Code

from collections import Counter, defaultdict
from typing import List

class Solution:
    def findAnagrams(self, s: str, p: str) -> List[int]:
        if len(s) < len(p):
            return []

        p_char_count, p_length = Counter(p), len(p)
        window_char_count, s_length = defaultdict(int), len(s)
        result = []

        for ind in range(p_length):
            window_char_count[s[ind]] += 1

        if window_char_count == p_char_count:
            result.append(0)

        for ind in range(s_length - p_length):

            if window_char_count[s[ind]] > 1:
                window_char_count[s[ind]] -= 1
            else:
                window_char_count.pop(s[ind])
            
            window_char_count[s[ind + p_length]] += 1
            if window_char_count == p_char_count:
                result.append(ind + 1)
        
        return result

16. Spiral Matrix

Intuition

This is just moving around the matrix with 4 pointers.

Algorithm

Complexity

  • Time complexity: \(O(m \times n)\) - visit every cell.

  • Memory complexity: \(O(1)\)

Code

class Solution:
    def spiralOrder(self, matrix: List[List[int]]) -> List[int]:
        result = []
        left, right = 0, len(matrix[0])
        top, bottom = 0, len(matrix)

        while left < right and top < bottom:
            # left -> right at current top row
            for i in range(left, right):
                result.append(matrix[top][i])
            top += 1

            # top -> bottom at current rightmost column
            for i in range(top, bottom):
                result.append(matrix[i][right - 1])
            right -= 1

            if not (left < right and top < bottom):
                break

            # right -> left at current bottom row
            for i in range(right - 1, left - 1, -1):
                result.append(matrix[bottom - 1][i])
            bottom -= 1

            # bottom -> top at current leftmost column
            for i in range(bottom - 1, top - 1, -1):
                result.append(matrix[i][left])
            left += 1

        return result