Binary search
Definitions:
In a sorted array, there is a faster way to find a specific element than visiting every element. The intuition: if the element I am looking at is smaller than the target, all of the elements before it are also smaller and can be discarded from the search, and likewise the elements after it can be discarded when it is larger. The optimal way to take advantage of this property is striking in the middle every time, hence the name binary search.
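The standard library already captures this idea; as a quick illustration (a minimal sketch of my own, not tied to any specific problem below), bisect.bisect_left locates where a value would sit in a sorted list using the same halving strategy:

from bisect import bisect_left

def contains(sorted_nums, target):
    # Leftmost position where target could be inserted while keeping order
    i = bisect_left(sorted_nums, target)
    # The value is present only if that position holds an equal element
    return i < len(sorted_nums) and sorted_nums[i] == target

print(contains([1, 3, 5, 7, 9], 7))   # True
print(contains([1, 3, 5, 7, 9], 4))   # False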
Four levels of binary search
1. Binary search on totally sorted array
Binary Search and First Bad Version fit this level. Here, I just need to make sure that there will be no infinite loop and I am good. The problems can be more complex, but Search a 2D Matrix is about the limit.
2. Binary search on partially sorted array
This is the (OOXX) pattern. The most straightforward way to get such an array is rotating a sorted array, e.g., [1, 2, 3, 4, 5, 6, 7] → [5, 6, 7, 1, 2, 3, 4]. Here, the first 3 are Os, the last 4 are Xs. Examples are Search in Rotated Sorted Array and Find Minimum in Rotated Sorted Array. Here, the key is paying attention to the update condition (see the sketch below).
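A generic sketch of the OOXX boundary search (my own illustration, assuming some predicate is False for a prefix of Os and True for the remaining Xs): keep right on a known X and move left past known Os.

def first_true(arr, pred):
    # Index of the first element satisfying pred, assuming pred is False
    # for a prefix and True afterwards; returns len(arr) if nothing qualifies.
    left, right = 0, len(arr)
    while left < right:
        mid = (left + right) // 2
        if pred(arr[mid]):
            right = mid        # mid might be the first X, so keep it in range
        else:
            left = mid + 1     # mid is an O, discard it
    return left

print(first_true([2, 5, 8, 13, 21], lambda v: v >= 10))   # 3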
3. Binary search on unsorted array
Distilled to its core, binary search is about finding a half that must contain the solution after each iteration. Hence, if the data is unsorted but has some structure that enables this, binary search can still be performed. An example is Find Peak Element. The array is not sorted, but it has a rule: the left side up to a peak is increasing, the right side from a peak is decreasing. Hence, if the mid element is larger than the previous one, there must be a peak from it rightward. If the mid element is smaller than the previous one, there must be a peak from it leftward.
4. Binary search on possible answer range
As it is a range, it is necessarily sorted. The problem requires you to come up with the search range for the answer yourself. Examples are Koko Eating Bananas and Wood Cut. A generic template for this level is sketched below.
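The sketch (my own naming, with a hypothetical feasible() predicate that flips from False to True exactly once over the candidate range):

def min_feasible_answer(lo, hi, feasible):
    # Smallest value in [lo, hi] for which feasible(value) is True,
    # assuming feasible is False for small values and True afterwards.
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works; something smaller might too
        else:
            lo = mid + 1      # mid fails; the answer must be larger
    return lo

# Demo: smallest x in [0, 100] with x*x >= 300
print(min_feasible_answer(0, 100, lambda x: x * x >= 300))   # 18

A Koko-style use would call it as min_feasible_answer(1, max(piles), can_finish_within_h) with a suitable (hypothetical) feasibility check.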
Problem
1. Binary search
Intuition
The most basic problem of binary search: it simply asks you to implement the algorithm.
Algorithm
- Initialize two pointers - start and end - and traverse the array from both ends.
- Calculate middle = (start + end)//2 and compare array[middle] with the target.
- If array[middle] is the target, return middle. If it is smaller than the target, move start to middle + 1. If it is larger, move end to middle - 1.
- If the pointers cross without finding the target, return -1.
Complexity
Time complexity: \(O(\log n)\): I am cutting the array in half repeatedly, so it takes just \(\log n\) steps to search.
Space complexity: \(O(1)\): Pointers are essentially integers.
Code
class Solution:
    def search(self, nums: List[int], target: int) -> int:
        # Edge case
        if nums[0] > target or nums[-1] < target:
            return -1
        # General case
        if nums[0] == target:
            return 0
        if nums[-1] == target:
            return len(nums) - 1
        start, end = 0, len(nums) - 1
        while start <= end:
            mid = (start + end)//2
            if nums[mid] == target:
                return mid
            elif nums[mid] < target:
                start = mid + 1
            else:
                end = mid - 1
        return -1
2. First Bad Version
Intuition
For any call isBadVersion(version): if the version is good, the first bad version will be a later version; if the version is bad, the first bad version might be this version or an older one. Given that versions come in non-decreasing order of badness, that sounds exactly like a binary search problem. Notice the difference between will be and might be - it affects the way I implement the algorithm.
Algorithm
- Initialize two pointers - first and last (or whatever) - to the first and the last version, denoted by numbers.
- The terminating condition for this specific implementation is first == last, so the loop condition is while first < last.
- I will check the mid version. If it is bad, I will shift last to mid. If it is good, I will shift first to mid + 1.
- Return first (or last - does not matter) when the loop terminates. I cannot return mid here as the variable does not exist outside the loop. You can define it so, but there's no need to.
Complexity
Time complexity: \(O(\log n)\)
Space complexity: \(O(1)\)
Code
# The isBadVersion API is already defined for you.
# def isBadVersion(version: int) -> bool:

class Solution:
    def firstBadVersion(self, n: int) -> int:
        low, high = 1, n
        while low < high:
            mid = (low + high)//2
            if isBadVersion(mid):
                high = mid
            else:
                low = mid + 1
        return low  # or high
3. Search a 2D Matrix
Intuition
The matrix is sorted + time complexity requirement of \(O(log(m*n))\) → a binary search problem where I need to do binary search two times.
The first binary search is for the row target is in. The condition to satisfy is that the first element of the row <= target <= the last element of the row. I move start up if the last element of the row is smaller than target, and move end down if the first element of the row is larger than target.
The next binary search within the row is trivial.
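An alternative worth noting (not the approach used below, just a sketch of my own): treat the m×n matrix as a single sorted array of length m*n and run one binary search, mapping each flat index back to a cell with divmod.

from typing import List

def search_matrix_flat(matrix: List[List[int]], target: int) -> bool:
    m, n = len(matrix), len(matrix[0])
    left, right = 0, m * n - 1
    while left <= right:
        mid = (left + right) // 2
        row, col = divmod(mid, n)          # flat index -> 2D cell
        if matrix[row][col] == target:
            return True
        elif matrix[row][col] < target:
            left = mid + 1
        else:
            right = mid - 1
    return False

print(search_matrix_flat([[1, 3, 5, 7], [10, 11, 16, 20], [23, 30, 34, 60]], 3))    # True
print(search_matrix_flat([[1, 3, 5, 7], [10, 11, 16, 20], [23, 30, 34, 60]], 13))   # False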
Algorithm
- Initialize pointers. I need to store the index of the row so initialize an extra one.
- Perform binary search based on conditions above to find the row.
- Reinitialize pointers.
- Perform a normal binary search within the row and return result accordingly.
Complexity
Time complexity: \(O(log(m*n))\): Two binary searches were performed, one to find the row, and one within the row.
Space complexity: \(O(1)\): I only need to store pointers.
Code
class Solution:
    def searchMatrix(self, matrix: List[List[int]], target: int) -> bool:
        # O(log(m*n)) = O(logm) + O(logn)
        start, mid, end = 0, 0, len(matrix) - 1
        while start <= end:
            mid = (start + end)//2
            if matrix[mid][0] == target or matrix[mid][-1] == target:
                return True
            elif matrix[mid][0] < target < matrix[mid][-1]:
                break
            elif matrix[mid][-1] < target:
                start = mid + 1
            elif matrix[mid][0] > target:
                end = mid - 1
        # Return False immediately if start > end
        # as it signifies no valid row found
        if start > end:
            return False
        start, end = 0, len(matrix[0]) - 1
        while start <= end:
            middle = (start + end)//2
            if matrix[mid][middle] == target:
                return True
            elif matrix[mid][middle] < target:
                start = middle + 1
            else:
                end = middle - 1
        return False
4. Koko Eating Bananas
Intuition
The problem is discovering what I am supposed to perform binary search on. For this purpose, I need to lean on the brute-force method: iterate over all possible values of k from 1 to max(piles). The rest is realizing that this range is effectively a sorted array, hence I can perform binary search on it.
Algorithm
- Initialize pointers and a variable to store the result, e.g., res.
- Binary search the range from 1 to max(piles) for k.
- At each value, calculate the hours taken for such a k. If the hours are smaller than or equal to h, it is a likely candidate, so I update res if necessary and move the right pointer down. Else, I move the left pointer up.
- Return res at the end.
Complexity
Time complexity: \(O(log(max(piles))*len(piles))\): For each iteration of binary search, I need to calculate the hours, which requires iterating through the array once each time.
Space complexity: \(O(1)\): I need to store a bunch of variables.
Code
from math import ceil

class Solution:
    def minEatingSpeed(self, piles: List[int], h: int) -> int:
        # Easy case:
        if len(piles) == h:
            return max(piles)
        # General cases:
        left, right = 1, max(piles)
        res = right
        while left <= right:
            k = (left + right)//2
            hours = 0
            for pile in piles:
                hours += ceil(pile/k)
            if hours <= h:
                right = k - 1
                res = min(res, k)
            else:
                left = k + 1
        return res
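A small side note on the hours calculation: ceil(pile/k) goes through float division. For positive integers, the same value can be computed purely with integer arithmetic, which sidesteps any floating-point concerns on very large piles (a sketch with my own helper name):

def hours_needed(piles, k):
    # (pile + k - 1) // k equals ceil(pile / k) for positive integers
    return sum((pile + k - 1) // k for pile in piles)

print(hours_needed([3, 6, 7, 11], 4))   # 8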
5. Median of Two Sorted Arrays
Intuition
The result median is the element that has roughly \((m+n)//2\) elements to both sides. I can get to this element by partitioning a certain number of elements from each array that sums up to \((m+n)//2\). The median can then be selected or calculated from the last element in the left subarray and the first element in the right subarray of each array.
The good thing is the safeguard that NeetCode came up with. The index chosen can be out of bounds. For such cases, I can set the variable to \(-\infty\) for a left value or \(\infty\) for a right value, as I only care about the minimum of the right values and the maximum of the left values.
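A quick worked illustration of the partition (a tiny example of my own): with nums1 = [1, 2] and nums2 = [3, 4], half = 2. The first try is i = 0, j = 0, giving Aleft = 1, Aright = 2, Bleft = 3, Bright = 4; since Bleft > Aright, too few elements are taken from A, so the pointer moves right. The next try is i = 1, j = -1, giving Aleft = 2, Bleft = \(-\infty\), Aright = \(\infty\), Bright = 3; both cross-checks pass, and because the total length is even the median is (max(2, \(-\infty\)) + min(\(\infty\), 3)) / 2 = 2.5.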
Algorithm
- Initialize variables - the arrays, total length, half point.
- Perform binary search on the shorter array by selecting its left partition every time (all elements up to index (start+end)//2) and taking the rest of the half from the other array. Pick out the last left element and the first right element of each array.
- If each last left element is smaller than or equal to the other array's first right element, calculate the median. Otherwise, move a pointer to decrease or increase the number of elements taken from the searched array, depending on which cross-check failed.
Complexity
Time complexity: \(O(\log(\min(m, n)))\): the binary search runs over partitions of the shorter array only, which is within the required \(O(\log(m+n))\).
Space complexity: \(O(1)\): only pointers and the four boundary values are stored; A and B are references to the input arrays, not copies.
Code
class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        A, B = nums1, nums2
        total = len(A) + len(B)
        half = total // 2

        if len(B) < len(A):
            A, B = B, A

        l, r = 0, len(A) - 1
        while True:
            i = (l + r)//2
            j = half - i - 2

            Aleft = A[i] if i >= 0 else float("-infinity")
            Bleft = B[j] if j >= 0 else float("-infinity")
            Aright = A[i+1] if (i+1) < len(A) else float("infinity")
            Bright = B[j+1] if (j+1) < len(B) else float("infinity")

            # partition is correct
            if Aleft <= Bright and Bleft <= Aright:
                if total % 2:
                    return min(Aright, Bright)
                return (max(Aleft, Bleft) + min(Aright, Bright)) / 2
            elif Aleft > Bright:
                r = i - 1
            else:
                l = i + 1
6. Find the Duplicate Number
Intuition
This can be solved with fast and slow pointers. However, there exists an interesting binary search solution. It has a worse time complexity, but it is interesting to see how the problem can be framed as a binary search problem.
If an array contained each of \(1\) to \(n\) exactly once, the number of elements less than or equal to \(k\) would be exactly \(k\). Here there are \(n+1\) elements drawn from \(1\) to \(n\), so some value \(d\) is duplicated, and the number of elements less than or equal to \(k\) exceeds \(k\) for every \(k \ge d\). I can use this property to perform binary search on the range \([1, n]\), searching for the first value whose count of elements less than or equal to it is greater than the value itself.
Algorithm
- Initialize the left and right pointers.
- Perform binary search on the range \([1, n]\). At each iteration, calculate the number of elements smaller or equal to the middle element. If the number is greater than the middle element, I know the duplicate is in the left half. Else, I know the duplicate is in the right half.
- Return the left pointer.
Complexity
Time complexity: \(O(n \log n)\)
Space complexity: \(O(1)\)
Code
Update: The solution cannot pass the time limit check on LeetCode. The solution is correct, but it is not efficient enough.
class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        left, right = 1, len(nums) - 1
        while left <= right:
            mid = (left + right)//2
            less = sum(1 for num in nums if num <= mid)

            if less > mid:
                right -= 1
            else:
                left += 1

        return left
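The time limit issue most likely comes from the pointer updates above moving one step at a time, which degrades toward a linear number of counting passes. A variant that halves the range on every iteration (a sketch of the same counting idea, function name mine; not the fast-and-slow-pointer solution) keeps the intended \(O(n \log n)\) bound:

from typing import List

def find_duplicate_by_counting(nums: List[int]) -> int:
    left, right = 1, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        # Count of values in [1, mid]; exceeding mid pins the duplicate in that half
        if sum(1 for num in nums if num <= mid) > mid:
            right = mid
        else:
            left = mid + 1
    return left

print(find_duplicate_by_counting([1, 3, 4, 2, 2]))   # 2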
7. Search in Rotated Sorted Array
Intuition
A rotated sorted array follows the (OOXX) structure illustrated in level 2 above: the array is divided into two sorted portions, with the smallest element of the first portion greater than the largest element of the second. This doubles the considerations when I need to update one of the pointers.
Before updating, I need to find the portion the mid pointer is currently in by comparing its value to the left pointer's value. If mid is in the left portion, I move left to the right if target is larger than mid's value (target is further to the right of the left portion) or if target is smaller than left's value (target is in the right portion). Otherwise, update right. The same analysis can be used when mid is in the right portion.
Algorithm
- Initialize the left and right pointers.
- Perform binary search on nums. At each iteration, check the portion mid is currently in before updating left or right according to the rules.
- Return mid if target is found, else -1.
Complexity
- Time complexity: \(O(\log n)\) - each iteration discards half of the remaining range.
- Space complexity: \(O(1)\)
Code
class Solution:
    def search(self, nums: List[int], target: int) -> int:
        left, right = 0, len(nums) - 1

        while left <= right:
            mid = (left + right)//2
            if nums[mid] == target:
                return mid
            # Left sorted part
            if nums[left] <= nums[mid]:
                if target > nums[mid] or target < nums[left]:
                    left = mid + 1
                else:
                    right = mid - 1
            # Right sorted part
            else:
                if target < nums[mid] or target > nums[right]:
                    right = mid - 1
                else:
                    left = mid + 1
        return -1
8. Random Pick with Weight
Intuition
This is a question in statistics. I did not realize it at first but then my mind was blown. It was statistics.
The question asks me to implement a method that returns an index at random. If the distribution were uniform, random.randint would be enough. But each index is assigned a weight, making it relatively more likely to be picked, and I need to take this into account. Here's an example with the weights w = [10, 200, 30, 40]:
- Index 0 has a probability of \(\frac{10}{10+200+30+40}=\frac{10}{280}\) to be picked at any turn.
- Index 1 has a probability of \(\frac{200}{280}\) to be picked at any turn.
- Index 2 has a probability of \(\frac{30}{280}\) to be picked at any turn.
- Index 3 has a probability of \(\frac{40}{280}\) to be picked at any turn.
To achieve this in reality, I could put 10 pieces of paper with \(0\) on them + 200 pieces of paper with \(1\) on them + 30 pieces of paper with \(2\) on them + 40 pieces of paper with \(3\) on them all in a jar, and then have a blind person pick from the jar and read the number to me at each turn. In code, this means creating an array of 10 \(0\)s, 200 \(1\)s, and so on (concatenated or at random positions), and then using random.randint to sample from a uniform distribution over \([0, \text{len(array)}-1]\) (the "blind person" part). But this consumes a lot of space.
Statistics has this nice thing called the cumulative distribution: give it a value \(X\), and it tells you the probability that a sample drawn from the distribution is less than or equal to \(X\). This has an interesting property: if you draw a random number uniformly from \([0,1]\), the probability that \(X\) is the value with the smallest cumulative probability exceeding that number equals \(X\)'s own probability of being picked when sampling the distribution directly. The property still holds when the random number is picked from a uniform discrete range. To illustrate, the running cumulative weights from above are [10, 210, 240, 280]. Let's pick a random integer from 1 to 280 and find the first index whose cumulative weight is greater than or equal to that integer:
- Index 0 covers 1 to 10, has an effective probability of \(\frac{10}{280}\) to be picked at any turn.
- Index 1 covers 11 to 210, has an effective probability of \(\frac{200}{280}\) to be picked at any turn.
- Index 2 covers 211 to 240, has an effective probability of \(\frac{30}{280}\) to be picked at any turn.
- Index 3 covers 241 to 280, has an effective probability of \(\frac{40}{280}\) to be picked at any turn.
Which is the same as above.
For binary search to come into play, the key is that the running cumulative weight array is naturally sorted in the ascending order. Hence, I can search for the index with binary search instead of iterating from left to right.
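For reference, the standard library can do both halves of this (a sketch under my own class name, using itertools.accumulate for the running sums and bisect.bisect_left for the lookup; the implementation below builds the same pieces by hand):

from bisect import bisect_left
from itertools import accumulate
from random import randint

class PrefixPicker:
    def __init__(self, w):
        self.prefix_totals = list(accumulate(w))   # e.g., [10, 210, 240, 280]

    def pickIndex(self) -> int:
        # randint is inclusive on both ends, so draw from [1, total]
        target = randint(1, self.prefix_totals[-1])
        # First index whose running sum is >= target
        return bisect_left(self.prefix_totals, target)

# PrefixPicker([10, 200, 30, 40]).pickIndex() returns 1 about 200/280 of the time.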
Algorithm
The answer is asked to be implemented as a data structure.
1. Store the input array w when initialized. Calculate the running sum of weights if the array has more than one element.
2. If the array has more than one element, pick a random integer and search for the first index whose running sum is strictly greater than that number (with the random integer drawn from [0, total - 1]; drawing from [1, total] would pair with a greater-than-or-equal check instead).
3. Return the result.
Complexity
- Time complexity:
  - __init__(): \(O(n)\) - Iterate the array to calculate the running sum of weights.
  - pickIndex(): \(O(\log n)\) - Perform binary search to pick the index.
- Memory complexity:
  - __init__(): \(O(n)\) - The size of the running sum of weights array.
  - pickIndex(): \(O(1)\) - Store the pointers.
Code
from random import randint

class Solution:
    def __init__(self, w: List[int]):
        self.weights = w
        if len(w) > 1:
            self.prefix_totals = []
            prefix_total = 0
            for weight in self.weights:
                prefix_total += weight
                self.prefix_totals.append(prefix_total)

    def pickIndex(self) -> int:
        if len(self.weights) < 2:
            return 0
        random_number = randint(0, self.prefix_totals[-1] - 1)
        left, right = 0, len(self.prefix_totals)
        res = float('inf')
        while left <= right:
            mid = (left + right) // 2
            if self.prefix_totals[mid] > random_number:
                res = min(res, mid)
                right = mid - 1
            else:
                left = mid + 1
        return res
# Your Solution object will be instantiated and called as such:
# obj = Solution(w)
# param_1 = obj.pickIndex()
9. Find K Closest Elements
Intuition
The baseline solution is \(O(n\log n)\) in Python: sort the array according to the criteria, slice the array, sort the slice, and return.
class Solution:
    def findClosestElements(self, arr: List[int], k: int, x: int) -> List[int]:
        # Edge case
        if k == len(arr):
            return arr
        # You will be kicked out of the company soon enough
        return sorted(sorted(arr, key=lambda num: (abs(num-x), num))[:k])
I know immediately I can do better: efficiently search for the closest element that is greater than or equal to x with binary search. Afterwards it is a question of expanding from this element. I was thinking about expanding, appending, and sorting, which makes it \(O(k\log k)\). However, with a sliding window, I can do even better by just keeping the left and right pointers and then returning the correct slice, which is already sorted. The time complexity is reduced to \(O(\max(\log n, k))\).
One thing that requires a bit of thinking to realize is that the left pointer returned by binary search always lands on a value greater than or equal to x (the reverse is the case if the right pointer is returned). Working immediately with this returned pointer is a bit awkward - it's better to start from the element right before it (or right after), which is the closest element smaller (or larger) than x. You can see it in the implementation below.
Algorithm
- Use binary search to assign to left the position of the largest element that is smaller than x, and set right to left + 1. The objective slice to return is arr[left+1:right].
- While the window has not grown enough, move the right pointer rightward if the right value is closer than the left value, and vice versa. Note that if either pointer reaches an end of the array, the slice to return must start from the beginning or end of the array, so return immediately.
- Return the correct slice.
Complexity
Time complexity: \(O(\max(\log n, k))\)
Space complexity: \(O(1)\)
Code
class Solution:
    def findClosestElements(self, arr: List[int], k: int, x: int) -> List[int]:
        # Edge case
        if k == len(arr):
            return arr
        # General case
        left = self.binarySearch(arr, x) - 1
        right = left + 1

        while right - left - 1 < k:
            # Start from first element case
            if left < 0:
                return arr[:k]
            # Start from the end case
            elif right == len(arr):
                return arr[-k:]
            # Normal update case
            elif abs(arr[right] - x) < abs(arr[left] - x):
                right += 1
            else:
                left -= 1
        return arr[left + 1:right]

    def binarySearch(self, arr: List[int], target: int) -> int:
        left, right = 0, len(arr) - 1
        while left <= right:
            mid = (left + right) // 2
            if arr[mid] == target:
                return mid
            elif arr[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return left
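A side note: the hand-rolled binarySearch helper returns the insertion point when x is absent, so bisect.bisect_left could likely replace it (a sketch; the exact-match early return only changes which of several equal elements the pointer starts on, and I have not exhaustively checked duplicate-heavy inputs):

import bisect
from typing import List

def initial_window(arr: List[int], x: int):
    # left sits just before where x would be inserted, right just after
    left = bisect.bisect_left(arr, x) - 1
    return left, left + 1

print(initial_window([1, 2, 4, 5], 3))   # (1, 2)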
10. Single Element in a Sorted Array
Intuition
Some math is required - that was what I thought when I first saw the problem. And indeed, you need some math. In an array made purely of pairs, nums[i] == nums[i+1] for every even index i. Once the single element appears at or before an even index i, this observation no longer holds at i. Hence, when the check fails at an even index i, I know that the single element lies at or to the left of i, and I shrink the search window so that its right end is i.
Developing this idea, the problem says that the array is sorted and the required time complexity is \(O(\log n)\). This means binary search. However, there are three modifications:
- mid needs to fall on an even index, hence I decrement it if it is odd.
- When the pattern breaks at mid, mid should also be considered as a candidate, hence right is updated to mid only.
- When the pattern holds at mid, the pair can be dropped, so left is updated to mid + 2.
Note that these are my specific implementation details. There are other ways.
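A tiny illustration (my own example): in [1, 1, 2, 2, 3, 4, 4], the pairs start at even indices 0 and 2, so nums[0] == nums[1] and nums[2] == nums[3]; at even index 4 the check fails (nums[4] = 3 while nums[5] = 4), which signals that the single element sits at or before index 4 - and indeed it is nums[4].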
Algorithm
- Perform the modified binary search:
  - For an odd mid pointer, decrement it to the even index below.
  - Check nums[mid] == nums[mid+1]. If this is true, left is updated to mid + 2. Else, right is updated to mid.
- Return left's value, i.e., nums[left].
Complexity
Time complexity: \(O(\log n)\)
Space complexity: \(O(1)\)
Code
class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        # Edge case
        if len(nums) < 2 or nums[0] < nums[1]:
            return nums[0]
        elif nums[-2] < nums[-1]:
            return nums[-1]
        # General case
        left, right = 2, len(nums) - 3

        while left < right:
            mid = (right + left) // 2
            if mid % 2:
                mid -= 1
            if nums[mid] != nums[mid+1]:
                right = mid
            else:
                left = mid + 2
        return nums[left]
11. Find Peak Element
Intuition
This is level 3 binary search: the input array is not sorted, but there exists a rule to bisect the array each time. The intuition is as if you are on a mountain: at a position, if the value is smaller than the next one, there must be a peak to the right of it; if the value is larger than the next one, there must be a peak at it or to the left of it. Hence, by checking the middle value one step at a time, you know the direction to move to find a peak. One note here is that if the mid value is smaller than the next one, the algorithm can safely eliminate it as a possible peak during the update, i.e., left = mid + 1. But if it is larger than the next one, the peak could be mid itself, so right = mid in the update.
Algorithm
- Perform binary search on the array with two pointers: mid = (left + right) // 2.
- If nums[mid] < nums[mid + 1], the value at mid can be eliminated as a possible peak: left = mid + 1.
- Else, the value at mid is one possible peak, so update right = mid.
- At the end, left will be equal to right, both pointing at a peak.
Complexity
Time complexity: \(O(\log n)\) - Usual binary search time complexity.
Space complexity: \(O(1)\)
Code
class Solution:
    def findPeakElement(self, nums: List[int]) -> int:
        left, right = 0, len(nums) - 1
        while left < right:
            mid = (left + right) // 2
            if nums[mid] < nums[mid + 1]:
                left = mid + 1
            else:
                right = mid
        return left
12. Wood Cut
Intuition
This is level 4 binary search, where you need to search for the answer within a continuous range of integers. The range for this problem is the range of possible lengths of each piece of wood, going from 1 to the total length divided by k, the required minimum number of pieces. The feasibility check is sketched below.
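The quantity being tested is monotone: as the candidate piece length grows, the number of pieces can only shrink, which is what makes binary search on the range valid. A sketch of the check (helper name mine, using a small example input):

from typing import List

def can_cut(l: List[int], k: int, length: int) -> bool:
    # True if cutting every log into pieces of `length` yields at least k pieces
    return sum(piece // length for piece in l) >= k

print(can_cut([232, 124, 456], 7, 114))   # True  -> 114 is feasible
print(can_cut([232, 124, 456], 7, 115))   # False -> longer pieces are not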
Algorithm
- Initialize the two limits left and right as 1 and sum(l) // k, and max_length as 0.
- Perform binary search on the range between left and right:
  - For each mid_length, iterate through the array to calculate the total number of wood pieces.
  - If the total number of wood pieces is larger than or equal to k, update max_length if needed and increase left.
  - Else, decrease right.
Complexity
Time complexity: \(O(n \log(\text{sum}(l)/k))\) - Iterate over the array once for every iteration of the binary search over the answer range.
Space complexity: \(O(1)\) - Only store some integers.
Code
from typing import (
List,
)
class Solution:
"""
@param l: Given n pieces of wood with length L[i]
@param k: An integer
@return: The maximum length of the small pieces
"""
def wood_cut(self, l: List[int], k: int) -> int:
# write your code here
# Edge case
if not l:
return 0
# General case
= 1, sum(l) // k
left, right = 0
max_length while left <= right:
= (left + right) // 2
mid_length = 0
num_pieces for piece in l:
+= piece // mid_length
num_pieces if num_pieces >= k:
= max(max_length, mid_length)
max_length = mid_length + 1
left else:
= mid_length - 1
right return max_length
13. Find Minimum in Rotated Sorted Array
Intuition
This is level 2, i.e., the (OOXX) pattern. The array is sorted in ascending order but rotated, i.e., the last number is smaller than the first number (unless the rotation lands back on the fully sorted order). The answer asked for is the first number of the right portion. For binary search, I can check which portion of the array the middle number is in by comparing it to the right limit's value. If it is larger, I am in the left portion and need to increase the left limit. Else, I am in the right portion and need to decrease the right limit. There is no harm in going past the answer, as another variable keeps track of the current minimum. In the end, the minimum will either be recorded or be sitting at the left limit, so I return the smaller of the two.
Algorithm
- Initialize two limits and current min (to infinity).
- Perform binary search on the array.
- Update the current min with the middle number.
- If the middle number is larger than the right limit, increase the left limit.
- Else, decrease the right limit.
- Return the smaller between the current min and the value at the left limit.
Complexity
Time complexity: \(O(\log n)\)
Space complexity: \(O(1)\)
Code
class Solution:
    def findMin(self, nums: List[int]) -> int:
        # Edge case
        if not nums:
            return -1
        # General case
        left, right = 0, len(nums) - 1
        curr_min = float('inf')
        while left <= right:
            mid = (left + right) // 2
            curr_min = min(curr_min, nums[mid])

            if nums[mid] > nums[right]:
                left = mid + 1
            else:
                right = mid - 1

        return min(curr_min, nums[left])
14. Maximum Profit in Job Scheduling
Intuition
For this problem, the Python library bisect is brought in to reduce the code written. Of course, another method could be written to handle the search. In any case, the three arrays startTime, endTime, and profit first need to be zipped together and sorted by startTime. startTime itself also needs sorting, as the binary search is performed on it.
This is an instance of a dynamic programming problem. In an array, each cell stores the maximum profit attainable using only jobs from that index onward. This array is filled in reverse. At each iteration, bisect.bisect_left is called on startTime, looking for the leftmost insertion point of the current job's end time. The return value is an index, which could be one past the last job (indicating no later job can start after this one ends). That's why the dp array is initialized with an extra cell of 0. The value of the current cell is then the larger of the cell right after it (skip the current job) and the current job's profit plus the cell at the index found by binary search (take the current job and chain the best schedule that starts no earlier than its end time).
Algorithm
- Zip startTime, endTime, and profit together and sort the jobs by start time; sort startTime itself as well.
- Initialize dp with len(startTime) + 1 zeros.
- Iterate i from the last job to the first: find j = bisect.bisect_left(startTime, end time of job i), then set dp[i] = max(dp[i + 1], profit of job i + dp[j]).
- Return dp[0].
Complexity
Time complexity: \(O(n \log n)\) - the cost to sort the startTime array.
Space complexity: \(O(n)\) - the cost to create the new combined array.
Code
from typing import List
import bisect

class Solution:
    def jobScheduling(self, startTime: List[int], endTime: List[int], profit: List[int]) -> int:
        jobs = sorted(zip(startTime, endTime, profit))

        startTime.sort()
        dp = [0] * (len(startTime) + 1)
        for i in range(len(startTime) - 1, -1, -1):
            j = bisect.bisect_left(startTime, jobs[i][1])
            dp[i] = max(dp[i + 1], dp[j] + jobs[i][2])
        return dp[0]