Sliding window

Author

Pham Nguyen Hung

Published

July 21, 2023

Sliding window

Definitions

Sliding window at its core is processing the data in an array in chunk, with limits set by 2 pointers at both sides. Sliding window is particularly suitable if I need to performf repeated operations on a sub-array. Instead of going crazy over all of the possible chunks, I can just change the result (e.g. sum of elements in the window) by discarding the leaving and accounting for the entering element(s).

Problems

1. Best Time to Buy and Sell Stock

Intuition

The first problem in sliding window. The brute-force way of doing this is to check every single pair fo number for the largest right-left pair. The time complexity will be \(O(n^2)\). How do I modify this solution to be easier?

First, the brute-force algorithm involves keeping track of the maximum number so far. This is a good thing that I can keep.

Second, let’s assume this array of [2, ..., 1]. Let’s say that the maximum of the subarray between 2 and 1 is 6: [2, ...6..., 1] (it does not matter where the 6 is as long as it occurs before 1). The max profit I can get so far is \(6-2=4\). To increase the profit, there are two ways: shift the buy date to a day with smaller price, or shift the sell date to a day with a larger price. Let’s say there is a 7 after 1. I can increase the profit if I shift the sell date from 6 to 7: [2, ...6..., 1, ...7...]. However, the best profit is when I shift the buy date from 2 to 1 as well. Now, because I already keep track of the max profit so far, so it cannot be worse if I shift the buy date every single time I find a better buy date.

Algorithm

The algorithm: 1. Initialize the left and right sides of the window + the max profit so far. 2. Traverse the array with the right side. 3. If I encounter a better buy date (right < left), update the buy date. If I encounter a potential sell date (right >= left), check and update the profit so far. 4. Return the max profit at the end. #### Complexity - Time complexity: \(O(n)\): I am traversing the array once. - Space complexity: \(O(1)\): I am only keeping track of a bunch of integers. #### Code

class Solution:
    def maxProfit(self, prices: List[int]) -> int:
        buyDay = maxProfit = 0
        for sellDay in range(len(prices)):
            if prices[sellDay] < prices[buyDay]:
                buyDay = sellDay
            else:
                currentProfit = prices[sellDay] - prices[buyDay]
                if currentProfit > maxProfit:
                    maxProfit = currentProfit
        return maxProfit

2. Maximum Subarray:

Intuition

This is one problem in the class of “if you know, you know”, though of course you can discover for yourself from keen observation.

The named algorithm is Kadane’s algorithm. I did not know the exact formal definition for it, but its pattern is that of sliding window (though the class will be dynamic programming). The intuition here is that if I keep track of the sum of a subarray, one that sum reaches a negative number, the whole subarray can be discarded from consideration, because whatever the value of the next element, the subarray sum will always be worse if I include the negative-sum subarray. Incredible intuition, though I could not see this in the first time I did it.

Algorithm

  1. Traverse the array, keeping track of the current sum curSum and the maximum sum so far maxSum of the array.
  2. At each element, if curSum becomes negative, I reset it to 0 first. Then I add the element to curSum. If curSum becomes larger than maxSum, I update maxSum.
  3. If I reach the end of the array, return maxSum.

Complexity

  • Time complexity: \(O(n)\): Traversing the whole array once.

  • Space complexity: \(O(1)\): I just need to keep track of a bunch of integers.

Code

class Solution:
    def maxSubArray(self, nums: List[int]) -> int:
        curSum, maxSum = 0, float('-inf')
        for num in nums:
            if curSum < 0:
                curSum = 0
            curSum += num
            if curSum > maxSum:
                maxSum = curSum
        return maxSum

3. Repeated DNA Sequences

Intuition

The problem requires us to check for repeated 10-character substrings in a string. An immediate solution is checking every pair of substrings and append the repeated ones to a list and return. It takes \(O(n^2)\) time in this case (whether I check every pair, or discard the previous pairs of substrings that I have checked).

To do better, I ca think of hashing, which was invented to reduce search time from \(O(n)\) to \(O(1)\). I can hash every substring of length 10, and store them in a hash map. I return every substrings that that appear more than 1. This way, I can reduce the time complexity to \(O(n)\). I can optimize this further by dropping the hash map and use one hash set to store the substrings and one hash set to store the repeated values.

In this case, I use Python built-in data structure. I can implement the hashing function ourselves. The most suitable one for this problem is the polynomial rolling hash algorithm. You can read more about it here.

Algorithm

  1. Initialize two hash sets: one to store the substrings, one to store the repeated substrings.
  2. Traverse the string, and at each index, check if the substring of length 10 starting from that index is in the hash set of substrings. If it is, add it to the hash set of repeated substrings. If it is not, add it to the hash set of substrings.
  3. Return the list of repeated substrings.

Complexity

  • Time complexity: \(O(n)\): I am traversing the string once.

  • Space complexity: \(O(n)\): I am storing the substrings in the hash set.

Code

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        store, result = set(), set()
        for i in range(len(s)-9):

            segment = s[i:i+10]

            if segment in store:
                result.add(segment)
            else:
                store.add(segment)
        return list(result)

Rolling hashes version:

from typing import List

class Solution:
    def findRepeatedDnaSequences(self, s: str) -> List[str]:
        # Initialize variables
        base = 4
        window_size = 10
        mapping = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
        highest_value = pow(base, window_size-1)
        store, result = set(), set()
        hashing = 0

        # Convert the string into an array of number:
        numbers = [mapping[nuc] for nuc in s]
        
        for start in range(len(s) - window_size + 1):
            if start != 0:
                hashing = (hashing - numbers[start - 1] * highest_value) * base \
                    + numbers[start + window_size - 1]
            else:
                for end in range(window_size):
                    hashing = hashing * base + numbers[end]
            if hashing in store:
                result.add(s[start:start + window_size])
            store.add(hashing)
        return result

4. Sliding Window Maximum

Intuition

The problem is explicitly sliding window. The brute-force solution is calculating the maximum of each window. This takes \(O(nk)\) time, where \(n\) is the length of the array and \(k\) is the size of the window. I can do better by using a data structure that can keep track of the maximum of each window in \(O(1)\) time. This data structure is called deque (double-ended queue). I can use a deque to store the indices of the elements in the window in the descending order. At each step, I check if the first element of the deque is still in the window. If it is not, I remove it. Then I check if the current element is larger than the last element of the deque. If it is, I remove the last element of the deque. Then I append the current element to the deque. The first element of the deque is the index of the maximum element in the window. I append it to the result array.

Algorithm

  1. Initialize a deque and a result array.
  2. Traverse the array. At each step:
    1. Check if the first element of the deque is still in the window. If it is not, remove it.
    2. Check if the current element is larger than the last element of the deque. If it is, remove the last element of the deque.
    3. Append the current element to the deque.
    4. Append the first element of the deque to the result array.
  3. Return the result array.

Complexity

  • Time complexity: \(O(n)\): There are 4 cases for the input array:
  1. The array is strictly increasing: At the first step, I will pop from the initialized deque \(k\) times before appending. In subsequent steps, I will pop once and append once. This takes \(O(n)\) time in total.
  2. The array is monotonically decreasing: At each step, I will need to pop once and append once. This takes \(O(n)\) time in total.
  3. The array is mixed: Intuitively, I can think that since the two cases above the runtime is \(O(n)\), the runtime for this mixed case is also \(O(n)\). However, I did not factor in the cost of switching. The worst case is the next value increases after a monotonically decreasing window, or decreases after a strictly increasing window. In either case, I will need to pop \(k\) times before appending, plus filling up the queue \(k\) times before. This can happen at most \(\frac{n}{k}\) times, leading to a runtime of \(O((k+k)\frac{n}{k})\) or \(O(n)\).
  • Space complexity: \(O(k)\): I am storing the indices of the elements in the window in the deque.

Code

from collections import deque

class Solution:
    def maxSlidingWindow(self, nums: List[int], k: int) -> List[int]:
        if len(nums) == k:
            return [max(nums)]

        window = deque()
        result = []
        
        for i in range(k):
            self._clean(i, window, nums)
            window.append(i)
        result.append(nums[window[0]])

        for i in range(k, len(nums)):
            self._clean(i, window, nums)
            if window and window[0] <= (i-k):
                window.popleft()
            window.append(i)
            result.append(nums[window[0]])
        
        return result
    def _clean(self, i, window, nums):
        while window and nums[i] >= nums[window[-1]]:
            window.pop()

NeetCode version: He used two explicit pointers to iterate the array. The runtime is the same.

# Quoted verbatim from NeetCode
from collections import deque

class Solution:
    def maxSlidingWindow(self, nums: List[int], k: int) -> List[int]:
        output = []
        q = collections.deque()  # index
        l = r = 0
        # O(n) O(n)
        while r < len(nums):
            # pop smaller values from q
            while q and nums[q[-1]] < nums[r]:
                q.pop()
            q.append(r)

            # remove left val from window
            if l > q[0]:
                q.popleft()

            if (r + 1) >= k:
                output.append(nums[q[0]])
                l += 1
            r += 1

        return output

5. Minimum Window Subsequence

Intuition

The problem asked us to find the shortest or earliest-occurred substring of a longer string s where a shorter string t appears as a subsequence (substring is contiguous, subsequence does not have to). The idea is I iterate s until I find the last character of t as the right pane. Here, I go back and iterate in reverse the window slide and t to find the real left pane. If the length of this substring is smaller than the current minimum substring, I update it.

Algorithm

  1. Initialize: one pointer for each string, one variable to keep track of the minimum length, one variable to keep track of the subsequence
  2. Iterate string s until I find the last character of string t. Upon finding, I iterate backward string s and compare until I find the first character of string t. This is a subsequence.
  3. I check for the length of the current subsequent and update if necessary (because I iterate from left to right, I do not need to care about the earliest-occurred condition).
  4. At the end, return the minimum subsequence

Complexity

  • Time complexity: \(O(n \times m)\): With n and m the length of each string. This is due to the nested loop.
  • Space complexity: \(O(1)\): I only need to store pointers.

Code

Lintcode version - because LeetCode’s is premium

class Solution:
    """
    @param s: a string
    @param t: a string
    @return: the minimum substring of S
    """
    def min_window(self, s: str, t: str) -> str:
        # Write your code here
        minimum_subsequence = ''
        min_sub_len = float('inf')
        index_s = index_t = 0
        while index_s < len(s):
            if s[index_s] == t[index_t]:
                index_t += 1
                if index_t == len(t):
                    start = end = index_s
                    index_t -= 1
                    while index_t >= 0:
                        if s[start] == t[index_t]:
                            index_t -= 1
                        start -= 1
                    start += 1
                    if end - start < min_sub_len:
                        min_sub_len = end - start
                        minimum_subsequence = s[start:end + 1]
                    index_s = start
                    index_t = 0
            index_s += 1
        return minimum_subsequence

6. Minimum Window Substring

Intuition

There are questions meant to be solved in an order. This is one of them. The problem resembles the one above. The idea is the same: slide a window until the second string is contained inside that window, and then iterate backward to find the minimum subarray. Here, the condition is all characters present and unordered, so a dictionary or (collections.Counter) is perfect.

However, it is inefficient to check the two dictionaries for every increment. Instead, it is better if I check just the character I make change, and have another variable storing the number of chracters with requirement met in the window.

I can optimize a bit with the filtering trick from LeetCode where I filter s for relevant characters only.

Algorithm

  1. Initialize: a character count dictionary for t, a filtered list of s, two pointers, a character count dictionary for a window, two variables to check for satisfaction between the window and string t, and a tuple to store the return variable
  2. Iterate the filter list by incrementing the right pointer. At each character, I update the window dictionary accordingly.
  3. Once condition is met, I repeatedly update answer and shrink the window until condition is no longer met.
  4. Return the answer.

Complexity

  • Time complexity: \(O(m+n)\): For \(m\) and \(n\) the length of each string.

  • Space complexity: \(O(n)\): Both dictionaries store the character count in string t.

Code

from collections import Counter

class Solution:
    def minWindow(self, s: str, t: str) -> str:
        if len(t) > len(s):
            return ''
        
        t_count = Counter(t)
        required = len(t_count)
        filtered_s = [(index, char) for index, char in enumerate(s) if char in t_count]

        window_count = {}
        formed = 0
        left = right = 0
        ans = float('inf'), None, None

        while right < len(filtered_s):
            char = filtered_s[right][1]
            window_count[char] = window_count.get(char, 0) + 1

            if window_count[char] == t_count[char]:
                formed += 1

            while left <= right and formed == required:
                char = filtered_s[left][1]

                end = filtered_s[right][0]
                start = filtered_s[left][0]
                if end - start + 1 < ans[0]:
                    ans = (end - start + 1, start, end)
                
                window_count[char] -= 1
                if window_count[char] < t_count[char]:
                    formed -= 1
                left += 1
            
            right += 1
        return '' if ans[0] == float('inf') else s[ans[1]: ans[2] + 1]

7. Longest Substring Without Repeating Characters

Intuition

“Repeating” suggests that this problem has something to do with Hash Map/Hash Table. Couple that with sliding window, I can think of sliding the pointers from left to right, storing the characters found into a Hash Table. When a repeated character is found at the right pointer, shift the left pointer to right until past the repeated character, discarding elements from the Hash Table in the process.

Approach

  1. Initialize two pointers, character Hash Table, and longest variable.
  2. Iterate the array with the right pointer, adding character to the Hash Table if it is not inside. Upon encountering a repeated character, update longest and shift the left pointer while discarding elements from the Hash Table.
  3. Update longest and return.

Complexity

  • Time complexity: \(O(n)\): Exactly is \(O(2n)\), as each pointer iterates the string at most once. LeetCode suggests a better runtime solution with Hash Map, which helps by being able to shift the left pointer all the way to the right. The key is storing the the index of the next element for every value as the key.

  • Space complexity: \(O(\min (m,n))\): For \(m\) the length of the alphabet and \(n\) the length of the string. This determines the size of the Hash Table.

Code

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        if len(s) < 2:
            return len(s)
        longest = 0
        char_set = set()
        left = 0
        for right in range(len(s)):
            if s[right] not in char_set:
                char_set.add(s[right])
            else:
                longest = max(longest, right - left)
                while s[left] != s[right]:
                    char_set.discard(s[left])
                    left += 1
                left += 1
        
        return max(longest, right - left)

LeetCode:

class Solution:
    def lengthOfLongestSubstring(self, s: str) -> int:
        longest = 0
        # mp stores the current index of a character
        mp = {}

        i = 0
        # try to extend the range [i, j]
        for j in range(len(s)):
            if s[j] in mp:
                i = max(mp[s[j]], i)

            ans = max(ans, j - i + 1)
            mp[s[j]] = j + 1

        return ans

8. Minimum Size Subarray Sum

Intuition

Another sliding window problem. I can slide the window (iterate the array) and calculate the its sum total until total >= target and then record the length if it is the smallest so far. I can then slide the window forward, and repeat the steps.

There are two ways to slide the window. I need one pointer to increment with the array, while the other being pushed or dragged along. Both implementations are below, but they are fundamentally the same.

Approach

  1. Initialize two pointers, a variable to store window sum, a variable to store the minimum length.
  2. Iterate the array. At each step, extend the window if necessary. If total >= target, update the minimum if necessary. Subtract the element from the intermediate sum as I slide the window.
  3. Return answer accordingly.

Complexity

  • Time complexity: \(O(n)\): Each pointer iterates the array at most once.

  • Space complexity: \(O(1)\): I only store 3 numbers.

Code

class Solution:
    def minSubArrayLen(self, target: int, nums: List[int]) -> int:
        total = right = 0
        ans = float('inf')
        for left in range(len(nums)):
            while right < len(nums) and total < target:
                total += nums[right]
                right += 1
            if total < target:
                break
            ans = min(ans, right - left)
            total -= nums[left]
        return ans if ans != float('inf') else 0

LeetCode:

class Solution:
    def minSubArrayLen(self, target: int, nums: List[int]) -> int:
        left = 0
        sumOfCurrentWindow = 0
        res = float('inf')

        for right in range(0, len(nums)):
            sumOfCurrentWindow += nums[right]

            while sumOfCurrentWindow >= target:
                res = min(res, right - left + 1)
                sumOfCurrentWindow -= nums[left]
                left += 1

        return res if res != float('inf') else 0