Binary search
Definitions:
In a sorted array, there is a faster way to find a specific element than visiting every element. The intuition: if the element I am looking at is smaller than the target, all of the elements before it are also smaller and can be discarded from the search, and likewise the elements after it can be discarded when it is larger. The optimal way to take advantage of this property is striking in the middle every time, hence the name binary search.
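The standard library already captures this idea; as a quick illustration (a minimal sketch of my own, not tied to any specific problem below), bisect.bisect_left locates where a value would sit in a sorted list using the same halving strategy:

from bisect import bisect_left

def contains(sorted_nums, target):
    # Leftmost position where target could be inserted while keeping order
    i = bisect_left(sorted_nums, target)
    # The value is present only if that position holds an equal element
    return i < len(sorted_nums) and sorted_nums[i] == target

print(contains([1, 3, 5, 7, 9], 7))   # True
print(contains([1, 3, 5, 7, 9], 4))   # False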
Four levels of binary search
1. Binary search on totally sorted array
Binary Search and First Bad Version fit this level. Here, I just need to make sure that there will be no infinite loop and I am good. The problems can be more complex, but Search a 2D Matrix is about the limit.
2. Binary search on partially sorted array
This is the (OOXX) pattern. The most straightforward way to get such an array is rotating a sorted array, e.g., [1, 2, 3, 4, 5, 6, 7] → [5, 6, 7, 1, 2, 3, 4]. Here, the first 3 are Os, the last 4 are Xs. Examples are Search in Rotated Sorted Array and Find Minimum in Rotated Sorted Array. Here, the key is paying attention to the update condition (see the sketch below).
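A generic sketch of the OOXX boundary search (my own illustration, assuming some predicate is False for a prefix of Os and True for the remaining Xs): keep right on a known X and move left past known Os.

def first_true(arr, pred):
    # Index of the first element satisfying pred, assuming pred is False
    # for a prefix and True afterwards; returns len(arr) if nothing qualifies.
    left, right = 0, len(arr)
    while left < right:
        mid = (left + right) // 2
        if pred(arr[mid]):
            right = mid        # mid might be the first X, so keep it in range
        else:
            left = mid + 1     # mid is an O, discard it
    return left

print(first_true([2, 5, 8, 13, 21], lambda v: v >= 10))   # 3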
3. Binary search on unsorted array
Distilled to its core, binary search is about finding a half that must contain the solution after each iteration. Hence, if the data is unsorted but has some structure that enables this, binary search can still be performed. An example is Find Peak Element. The array is not sorted, but it has a rule: the left side up to a peak is increasing, the right side from a peak is decreasing. Hence, if the mid element is larger than the previous one, there must be a peak from it rightward. If the mid element is smaller than the previous one, there must be a peak from it leftward.
4. Binary search on possible answer range
As it is a range, it is necessarily sorted. The problem requires you to come up with the search range for the answer yourself. Examples are Koko Eating Bananas and Wood Cut. A generic template for this level is sketched below.
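The sketch (my own naming, with a hypothetical feasible() predicate that flips from False to True exactly once over the candidate range):

def min_feasible_answer(lo, hi, feasible):
    # Smallest value in [lo, hi] for which feasible(value) is True,
    # assuming feasible is False for small values and True afterwards.
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid          # mid works; something smaller might too
        else:
            lo = mid + 1      # mid fails; the answer must be larger
    return lo

# Demo: smallest x in [0, 100] with x*x >= 300
print(min_feasible_answer(0, 100, lambda x: x * x >= 300))   # 18

A Koko-style use would call it as min_feasible_answer(1, max(piles), can_finish_within_h) with a suitable (hypothetical) feasibility check.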
Problem
1. Binary search
Intuition
The most basic problem of binary search: it simply asks you to implement the algorithm.
Algorithm
- Initialize two pointers - start and end - and traverse the array from both ends.
- Calculate middle = (start + end)//2 and compare array[middle] with the target.
- If array[middle] is the target, return middle. If it is smaller than the target, move start to middle + 1. If it is larger, move end to middle - 1.
- If the pointers cross without finding the target, return -1.
Complexity
Time complexity: \(O(\log n)\): I am cutting the array in half repeatedly, so it takes just \(\log n\) steps to search.
Space complexity: \(O(1)\): Pointers are essentially integers.
Code
class Solution:
    def search(self, nums: List[int], target: int) -> int:
        # Edge case
        if nums[0] > target or nums[-1] < target:
            return -1
        # General case
        if nums[0] == target:
            return 0
        if nums[-1] == target:
            return len(nums) - 1
        start, end = 0, len(nums) - 1
        while start <= end:
            mid = (start + end)//2
            if nums[mid] == target:
                return mid
            elif nums[mid] < target:
                start = mid + 1
            else:
                end = mid - 1
        return -1
2. First Bad Version
Intuition
For any call isBadVersion(version): if the version is good, the first bad version will be a later version; if the version is bad, the first bad version might be this version or an older one. Given that versions come in non-decreasing order of badness, that sounds exactly like a binary search problem. Notice the difference between will be and might be - it affects the way I implement the algorithm.
Algorithm
- Initialize two pointers - first and last (or whatever) - to the first and the last version, denoted by numbers.
- The terminating condition for this specific implementation is first == last, so the loop condition is while first < last.
- I will check the mid version. If it is bad, I will shift last to mid. If it is good, I will shift first to mid + 1.
- Return first (or last - does not matter) when the loop terminates. I cannot return mid here as the variable does not exist outside the loop. You can define it so, but there's no need to.
Complexity
Time complexity: \(O(\log n)\)
Space complexity: \(O(1)\)
Code
# The isBadVersion API is already defined for you.
# def isBadVersion(version: int) -> bool:

class Solution:
    def firstBadVersion(self, n: int) -> int:
        low, high = 1, n
        while low < high:
            mid = (low + high)//2
            if isBadVersion(mid):
                high = mid
            else:
                low = mid + 1
        return low  # or high
3. Search a 2D Matrix
Intuition
The matrix is sorted + time complexity requirement of \(O(log(m*n))\) → a binary search problem where I need to do binary search two times.
The first binary search is for the row target is in. The condition to satisfy is that the first element of the row <= target <= the last element of the row. I move start up if the last element of the row is smaller than target, and move end down if the first element of the row is larger than target.
The next binary search within the row is trivial.
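An alternative worth noting (not the approach used below, just a sketch of my own): treat the m×n matrix as a single sorted array of length m*n and run one binary search, mapping each flat index back to a cell with divmod.

from typing import List

def search_matrix_flat(matrix: List[List[int]], target: int) -> bool:
    m, n = len(matrix), len(matrix[0])
    left, right = 0, m * n - 1
    while left <= right:
        mid = (left + right) // 2
        row, col = divmod(mid, n)          # flat index -> 2D cell
        if matrix[row][col] == target:
            return True
        elif matrix[row][col] < target:
            left = mid + 1
        else:
            right = mid - 1
    return False

print(search_matrix_flat([[1, 3, 5, 7], [10, 11, 16, 20], [23, 30, 34, 60]], 3))    # True
print(search_matrix_flat([[1, 3, 5, 7], [10, 11, 16, 20], [23, 30, 34, 60]], 13))   # False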
Algorithm
- Initialize pointers. I need to store the index of the row so initialize an extra one.
- Perform binary search based on conditions above to find the row.
- Reinitialize pointers.
- Perform a normal binary search within the row and return result accordingly.
Complexity
Time complexity: \(O(log(m*n))\): Two binary searches were performed, one to find the row, and one within the row.
Space complexity: \(O(1)\): I only need to store pointers.
Code
class Solution:
    def searchMatrix(self, matrix: List[List[int]], target: int) -> bool:
        # O(log(m*n)) = O(logm) + O(logn)
        start, mid, end = 0, 0, len(matrix) - 1
        while start <= end:
            mid = (start + end)//2
            if matrix[mid][0] == target or matrix[mid][-1] == target:
                return True
            elif matrix[mid][0] < target < matrix[mid][-1]:
                break
            elif matrix[mid][-1] < target:
                start = mid + 1
            elif matrix[mid][0] > target:
                end = mid - 1
        # Return False immediately if start > end
        # as it signifies no valid row found
        if start > end:
            return False
        start, end = 0, len(matrix[0]) - 1
        while start <= end:
            middle = (start + end)//2
            if matrix[mid][middle] == target:
                return True
            elif matrix[mid][middle] < target:
                start = middle + 1
            else:
                end = middle - 1
        return False
4. Koko Eating Bananas
Intuition
The problem is discovering what I am supposed to perform binary search on. For this purpose, I need to lean on the brute-force method: iterate over all possible values of k from 1 to max(piles). The rest is realizing that this range is effectively a sorted array, hence I can perform binary search on it.
Algorithm
- Initialize pointers and a variable to store the result, e.g., res.
- Binary search the range from 1 to max(piles) for k.
- At each value, calculate the hours taken for such a k. If the hours are smaller than or equal to h, it is a likely candidate, so I update res if necessary and move the right pointer down. Else, I move the left pointer up.
- Return res at the end.
Complexity
Time complexity: \(O(log(max(piles))*len(piles))\): For each iteration of binary search, I need to calculate the hours, which requires iterating through the array once each time.
Space complexity: \(O(1)\): I need to store a bunch of variables.
Code
from math import ceil

class Solution:
    def minEatingSpeed(self, piles: List[int], h: int) -> int:
        # Easy case:
        if len(piles) == h:
            return max(piles)
        # General cases:
        left, right = 1, max(piles)
        res = right
        while left <= right:
            k = (left + right)//2
            hours = 0
            for pile in piles:
                hours += ceil(pile/k)
            if hours <= h:
                right = k - 1
                res = min(res, k)
            else:
                left = k + 1
        return res
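A small side note on the hours calculation: ceil(pile/k) goes through float division. For positive integers, the same value can be computed purely with integer arithmetic, which sidesteps any floating-point concerns on very large piles (a sketch with my own helper name):

def hours_needed(piles, k):
    # (pile + k - 1) // k equals ceil(pile / k) for positive integers
    return sum((pile + k - 1) // k for pile in piles)

print(hours_needed([3, 6, 7, 11], 4))   # 8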
5. Median of Two Sorted Arrays
Intuition
The result median is the element that has roughly \((m+n)//2\) elements to both sides. I can get to this element by partitioning a certain number of elements from each array that sums up to \((m+n)//2\). The median can then be selected or calculated from the last element in the left subarray and the first element in the right subarray of each array.
The good thing is the safeguard that NeetCode came up with. The index chosen can be out of bounds. For such cases, I can set the variable to \(-\infty\) for a left value or \(\infty\) for a right value, as I only care about the minimum of the right values and the maximum of the left values.
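A quick worked illustration of the partition (a tiny example of my own): with nums1 = [1, 2] and nums2 = [3, 4], half = 2. The first try is i = 0, j = 0, giving Aleft = 1, Aright = 2, Bleft = 3, Bright = 4; since Bleft > Aright, too few elements are taken from A, so the pointer moves right. The next try is i = 1, j = -1, giving Aleft = 2, Bleft = \(-\infty\), Aright = \(\infty\), Bright = 3; both cross-checks pass, and because the total length is even the median is (max(2, \(-\infty\)) + min(\(\infty\), 3)) / 2 = 2.5.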
Algorithm
- Initialize variables - the arrays, total length, half point.
- Perform binary search on the shorter array by selecting its left partition every time (all elements up to index (start+end)//2) and taking the rest of the half from the other array. Pick out the last left element and the first right element of each array.
- If each last left element is smaller than or equal to the other array's first right element, calculate the median. Otherwise, move a pointer to decrease or increase the number of elements taken from the searched array, depending on which cross-check failed.
Complexity
Time complexity: \(O(\log(\min(m, n)))\): the binary search runs over partitions of the shorter array only, which is within the required \(O(\log(m+n))\).
Space complexity: \(O(1)\): only pointers and the four boundary values are stored; A and B are references to the input arrays, not copies.
Code
class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        A, B = nums1, nums2
        total = len(A) + len(B)
        half = total // 2

        if len(B) < len(A):
            A, B = B, A

        l, r = 0, len(A) - 1
        while True:
            i = (l + r)//2
            j = half - i - 2

            Aleft = A[i] if i >= 0 else float("-infinity")
            Bleft = B[j] if j >= 0 else float("-infinity")
            Aright = A[i+1] if (i+1) < len(A) else float("infinity")
            Bright = B[j+1] if (j+1) < len(B) else float("infinity")

            # partition is correct
            if Aleft <= Bright and Bleft <= Aright:
                if total % 2:
                    return min(Aright, Bright)
                return (max(Aleft, Bleft) + min(Aright, Bright)) / 2
            elif Aleft > Bright:
                r = i - 1
            else:
                l = i + 1
6. Find the Duplicate Number
Intuition
This can be solved with fast and slow pointers. However, there exists an interesting binary search solution. It has a worse time complexity, but it is interesting to see how the problem can be framed as a binary search problem.
If an array contained each of \(1\) to \(n\) exactly once, the number of elements less than or equal to \(k\) would be exactly \(k\). Here there are \(n+1\) elements drawn from \(1\) to \(n\), so some value \(d\) is duplicated, and the number of elements less than or equal to \(k\) exceeds \(k\) for every \(k \ge d\). I can use this property to perform binary search on the range \([1, n]\), searching for the first value whose count of elements less than or equal to it is greater than the value itself.
Algorithm
- Initialize the left and right pointers.
- Perform binary search on the range \([1, n]\). At each iteration, calculate the number of elements smaller or equal to the middle element. If the number is greater than the middle element, I know the duplicate is in the left half. Else, I know the duplicate is in the right half.
- Return the left pointer.
Complexity
Time complexity: \(O(n \log n)\)
Space complexity: \(O(1)\)
Code
Update: The solution cannot pass the time limit check on LeetCode. The solution is correct, but it is not efficient enough.
class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        left, right = 1, len(nums) - 1
        while left <= right:
            mid = (left + right)//2
            less = sum(1 for num in nums if num <= mid)

            if less > mid:
                right -= 1
            else:
                left += 1

        return left
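The time limit issue most likely comes from the pointer updates above moving one step at a time, which degrades toward a linear number of counting passes. A variant that halves the range on every iteration (a sketch of the same counting idea, function name mine; not the fast-and-slow-pointer solution) keeps the intended \(O(n \log n)\) bound:

from typing import List

def find_duplicate_by_counting(nums: List[int]) -> int:
    left, right = 1, len(nums) - 1
    while left < right:
        mid = (left + right) // 2
        # Count of values in [1, mid]; exceeding mid pins the duplicate in that half
        if sum(1 for num in nums if num <= mid) > mid:
            right = mid
        else:
            left = mid + 1
    return left

print(find_duplicate_by_counting([1, 3, 4, 2, 2]))   # 2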
7. Search in Rotated Sorted Array
Intuition
A rotated sorted array follows the (OOXX) structure illustrated in level 2 above: the array is divided into two sorted portions, with the smallest element of the first portion greater than the largest element of the second. This doubles the considerations when I need to update one of the pointers.
Before updating, I need to find the portion the mid pointer is currently in by comparing its value to the left pointer's value. If mid is in the left portion, I move left to the right if target is larger than mid's value (target is further to the right of the left portion) or if target is smaller than left's value (target is in the right portion). Otherwise, update right. The same analysis can be used when mid is in the right portion.
Algorithm
- Initialize the left and right pointers.
- Perform binary search on nums. At each iteration, check the portion mid is currently in before updating left or right according to the rules.
- Return mid if target is found, else -1.
Complexity
- Time complexity: \(O(\log n)\) - each iteration discards half of the remaining range.
- Space complexity: \(O(1)\)
Code
class Solution:
    def search(self, nums: List[int], target: int) -> int:
        left, right = 0, len(nums) - 1

        while left <= right:
            mid = (left + right)//2
            if nums[mid] == target:
                return mid
            # Left sorted part
            if nums[left] <= nums[mid]:
                if target > nums[mid] or target < nums[left]:
                    left = mid + 1
                else:
                    right = mid - 1
            # Right sorted part
            else:
                if target < nums[mid] or target > nums[right]:
                    right = mid - 1
                else:
                    left = mid + 1
        return -1
8. Random Pick with Weight
Intuition
This is a question in statistics. I did not realize it at first but then my mind was blown. It was statistics.
The question asks me to implement a method that returns an index at random. If the distribution were uniform, random.randint would be enough. But each index is assigned a weight, making it relatively more likely to be picked, and I need to take this into account. Here's an example with the weights w = [10, 200, 30, 40]:
- Index 0 has a probability of \(\frac{10}{10+200+30+40}=\frac{10}{280}\) to be picked at any turn.
- Index 1 has a probability of \(\frac{200}{280}\) to be picked at any turn.
- Index 2 has a probability of \(\frac{30}{280}\) to be picked at any turn.
- Index 3 has a probability of \(\frac{40}{280}\) to be picked at any turn.
To achieve this in reality, I could put 10 pieces of paper with \(0\) on them + 200 pieces of paper with \(1\) on them + 30 pieces of paper with \(2\) on them + 40 pieces of paper with \(3\) on them all in a jar, and then have a blind person pick from the jar and read the number to me at each turn. In code, this means creating an array of 10 \(0\)s, 200 \(1\)s, and so on (concatenated or at random positions), and then using random.randint to sample from a uniform distribution over \([0, \text{len(array)}-1]\) (the "blind person" part). But this consumes a lot of space.
Statistics has this nice thing called the cumulative distribution: give it a value \(X\), and it tells you the probability that a sample drawn from the distribution is less than or equal to \(X\). This has an interesting property: if you draw a random number uniformly from \([0,1]\), the probability that \(X\) is the value with the smallest cumulative probability exceeding that number equals \(X\)'s own probability of being picked when sampling the distribution directly. The property still holds when the random number is picked from a uniform discrete range. To illustrate, the running cumulative weights from above are [10, 210, 240, 280]. Let's pick a random integer from 1 to 280 and find the first index whose cumulative weight is greater than or equal to that integer:
- Index 0 covers 1 to 10, has an effective probability of \(\frac{10}{280}\) to be picked at any turn.
- Index 1 covers 11 to 210, has an effective probability of \(\frac{200}{280}\) to be picked at any turn.
- Index 2 covers 211 to 240, has an effective probability of \(\frac{30}{280}\) to be picked at any turn.
- Index 3 covers 241 to 280, has an effective probability of \(\frac{40}{280}\) to be picked at any turn.
Which is the same as above.
For binary search to come into play, the key is that the running cumulative weight array is naturally sorted in the ascending order. Hence, I can search for the index with binary search instead of iterating from left to right.
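For reference, the standard library can do both halves of this (a sketch under my own class name, using itertools.accumulate for the running sums and bisect.bisect_left for the lookup; the implementation below builds the same pieces by hand):

from bisect import bisect_left
from itertools import accumulate
from random import randint

class PrefixPicker:
    def __init__(self, w):
        self.prefix_totals = list(accumulate(w))   # e.g., [10, 210, 240, 280]

    def pickIndex(self) -> int:
        # randint is inclusive on both ends, so draw from [1, total]
        target = randint(1, self.prefix_totals[-1])
        # First index whose running sum is >= target
        return bisect_left(self.prefix_totals, target)

# PrefixPicker([10, 200, 30, 40]).pickIndex() returns 1 about 200/280 of the time.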
Algorithm
The answer is asked to be implemented as a data structure.
1. Store the input array w when initialized. Calculate the running sum of weights if the array has more than one element.
2. If the array has more than one element, pick a random integer and search for the first index whose running sum is strictly greater than that number (with the random integer drawn from [0, total - 1]; drawing from [1, total] would pair with a greater-than-or-equal check instead).
3. Return the result.
Complexity
- Time complexity:
  - __init__(): \(O(n)\) - Iterate the array to calculate the running sum of weights.
  - pickIndex(): \(O(\log n)\) - Perform binary search to pick the index.
- Memory complexity:
  - __init__(): \(O(n)\) - The size of the running sum of weights array.
  - pickIndex(): \(O(1)\) - Store the pointers.
Code
from random import randint

class Solution:
    def __init__(self, w: List[int]):
        self.weights = w
        if len(w) > 1:
            self.prefix_totals = []
            prefix_total = 0
            for weight in self.weights:
                prefix_total += weight
                self.prefix_totals.append(prefix_total)

    def pickIndex(self) -> int:
        if len(self.weights) < 2:
            return 0
        random_number = randint(0, self.prefix_totals[-1] - 1)
        left, right = 0, len(self.prefix_totals)
        res = float('inf')
        while left <= right:
            mid = (left + right) // 2
            if self.prefix_totals[mid] > random_number:
                res = min(res, mid)
                right = mid - 1
            else:
                left = mid + 1
        return res
# Your Solution object will be instantiated and called as such:
# obj = Solution(w)
# param_1 = obj.pickIndex()
9. Find K Closest Elements
Intuition
The baseline solution is \(O(n\log n)\) in Python: sort the array according to the criteria, slice the array, sort the slice, and return.
class Solution:
    def findClosestElements(self, arr: List[int], k: int, x: int) -> List[int]:
        # Edge case
        if k == len(arr):
            return arr
        # You will be kicked out of the company soon enough
        return sorted(sorted(arr, key=lambda num: (abs(num-x), num))[:k])
I know immediately I can do better: efficiently search for the closest element that is greater than or equal to x with binary search. Afterwards it is a question of expanding from this element. I was thinking about expanding, appending, and sorting, which makes it \(O(k\log k)\). However, with a sliding window, I can do even better by just keeping the left and right pointers and then returning the correct slice, which is already sorted. The time complexity is reduced to \(O(\max(\log n, k))\).
One thing that requires a bit of thinking to realize is that the left pointer returned by binary search always lands on a value greater than or equal to x (the reverse is the case if the right pointer is returned). Working immediately with this returned pointer is a bit awkward - it's better to start from the element right before it (or right after), which is the closest element smaller (or larger) than x. You can see it in the implementation below.
Algorithm
- Use binary search to assign to left the position of the largest element that is smaller than x, and set right to left + 1. The objective slice to return is arr[left+1:right].
- While the window has not grown enough, move the right pointer rightward if the right value is closer than the left value, and vice versa. Note that if either pointer reaches an end of the array, the slice to return must start from the beginning or end of the array, so return immediately.
- Return the correct slice.
Complexity
Time complexity: \(O(\max(\log n, k))\)
Space complexity: \(O(1)\)
Code
class Solution:
    def findClosestElements(self, arr: List[int], k: int, x: int) -> List[int]:
        # Edge case
        if k == len(arr):
            return arr
        # General case
        left = self.binarySearch(arr, x) - 1
        right = left + 1

        while right - left - 1 < k:
            # Start from first element case
            if left < 0:
                return arr[:k]
            # Start from the end case
            elif right == len(arr):
                return arr[-k:]
            # Normal update case
            elif abs(arr[right] - x) < abs(arr[left] - x):
                right += 1
            else:
                left -= 1
        return arr[left + 1:right]

    def binarySearch(self, arr: List[int], target: int) -> int:
        left, right = 0, len(arr) - 1
        while left <= right:
            mid = (left + right) // 2
            if arr[mid] == target:
                return mid
            elif arr[mid] < target:
                left = mid + 1
            else:
                right = mid - 1
        return left
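A side note: the hand-rolled binarySearch helper returns the insertion point when x is absent, so bisect.bisect_left could likely replace it (a sketch; the exact-match early return only changes which of several equal elements the pointer starts on, and I have not exhaustively checked duplicate-heavy inputs):

import bisect
from typing import List

def initial_window(arr: List[int], x: int):
    # left sits just before where x would be inserted, right just after
    left = bisect.bisect_left(arr, x) - 1
    return left, left + 1

print(initial_window([1, 2, 4, 5], 3))   # (1, 2)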
10. Single Element in a Sorted Array
Intuition
Some math is required - that was what I thought when I first saw the problem. And indeed, you need some math. In an array made purely of pairs, nums[i] == nums[i+1] for every even index i. Once the single element appears at or before an even index i, this observation no longer holds at i. Hence, when the check fails at an even index i, I know that the single element lies at or to the left of i, and I shrink the search window so that its right end is i.
Developing this idea, the problem says that the array is sorted and the required time complexity is \(O(\log n)\). This means binary search. However, there are three modifications:
- mid needs to fall on an even index, hence I decrement it if it is odd.
- When the pattern breaks at mid, mid should also be considered as a candidate, hence right is updated to mid only.
- When the pattern holds at mid, the pair can be dropped, so left is updated to mid + 2.
Note that these are my specific implementation details. There are other ways.
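A tiny illustration (my own example): in [1, 1, 2, 2, 3, 4, 4], the pairs start at even indices 0 and 2, so nums[0] == nums[1] and nums[2] == nums[3]; at even index 4 the check fails (nums[4] = 3 while nums[5] = 4), which signals that the single element sits at or before index 4 - and indeed it is nums[4].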
Algorithm
- Perform the modified binary search:
  - For an odd mid pointer, decrement it to the even index below.
  - Check nums[mid] == nums[mid+1]. If this is true, left is updated to mid + 2. Else, right is updated to mid.
- Return left's value, i.e., nums[left].
Complexity
Time complexity: \(O(\log n)\)
Space complexity: \(O(1)\)
Code
class Solution:
    def singleNonDuplicate(self, nums: List[int]) -> int:
        # Edge case
        if len(nums) < 2 or nums[0] < nums[1]:
            return nums[0]
        elif nums[-2] < nums[-1]:
            return nums[-1]
        # General case
        left, right = 2, len(nums) - 3

        while left < right:
            mid = (right + left) // 2
            if mid % 2:
                mid -= 1
            if nums[mid] != nums[mid+1]:
                right = mid
            else:
                left = mid + 2
        return nums[left]
11. Find Peak Element
Intuition
This is level 3 binary search: the input array is not sorted, but there exists a rule to bisect the array each time. The intuition is as if you are on a mountain: at a position, if the value is smaller than the next one, there must be a peak to the right of it; if the value is larger than the next one, there must be a peak at it or to the left of it. Hence, by checking the middle value one step at a time, you know the direction to move to find a peak. One note here is that if the mid value is smaller than the next one, the algorithm can safely eliminate it as a possible peak during the update, i.e., left = mid + 1. But if it is larger than the next one, the peak could be mid itself, so right = mid in the update.
Algorithm
- Perform binary search on the array with two pointers: mid = (left + right) // 2.
- If nums[mid] < nums[mid + 1], the value at mid can be eliminated as a possible peak: left = mid + 1.
- Else, the value at mid is one possible peak, so update right = mid.
- At the end, left will be equal to right, both pointing at a peak.
Complexity
Time complexity: \(O(\log n)\) - Usual binary search time complexity.
Space complexity: \(O(1)\)
Code
class Solution:
    def findPeakElement(self, nums: List[int]) -> int:
        left, right = 0, len(nums) - 1
        while left < right:
            mid = (left + right) // 2
            if nums[mid] < nums[mid + 1]:
                left = mid + 1
            else:
                right = mid
        return left
12. Wood Cut
Intuition
This is level 4 binary search, where you need to search for the answer within a continuous range of integers. The range for this problem is the range of possible lengths of each piece of wood, going from 1 to the total length divided by k, the required minimum number of pieces. The feasibility check is sketched below.
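The quantity being tested is monotone: as the candidate piece length grows, the number of pieces can only shrink, which is what makes binary search on the range valid. A sketch of the check (helper name mine, using a small example input):

from typing import List

def can_cut(l: List[int], k: int, length: int) -> bool:
    # True if cutting every log into pieces of `length` yields at least k pieces
    return sum(piece // length for piece in l) >= k

print(can_cut([232, 124, 456], 7, 114))   # True  -> 114 is feasible
print(can_cut([232, 124, 456], 7, 115))   # False -> longer pieces are not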
Algorithm
- Initialize the two limits left and right as 1 and sum(l) // k, and max_length as 0.
- Perform binary search on the range between left and right:
  - For each mid_length, iterate through the array to calculate the total number of wood pieces.
  - If the total number of wood pieces is larger than or equal to k, update max_length if needed and increase left.
  - Else, decrease right.
Complexity
Time complexity: \(O(n \log(\text{sum}(l)/k))\) - Iterate over the array once for every iteration of the binary search over the answer range.
Space complexity: \(O(1)\) - Only store some integers.
Code
from typing import (
List,
)
class Solution:
"""
@param l: Given n pieces of wood with length L[i]
@param k: An integer
@return: The maximum length of the small pieces
"""
def wood_cut(self, l: List[int], k: int) -> int:
# write your code here
# Edge case
if not l:
return 0
# General case
= 1, sum(l) // k
left, right = 0
max_length while left <= right:
= (left + right) // 2
mid_length = 0
num_pieces for piece in l:
+= piece // mid_length
num_pieces if num_pieces >= k:
= max(max_length, mid_length)
max_length = mid_length + 1
left else:
= mid_length - 1
right return max_length
13. Find Minimum in Rotated Sorted Array
Intuition
This is level 2, i.e., the (OOXX) pattern. The array is sorted in ascending order but rotated, i.e., the last number is smaller than the first number (unless the rotation lands back on the fully sorted order). The answer asked for is the first number of the right portion. For binary search, I can check which portion of the array the middle number is in by comparing it to the right limit's value. If it is larger, I am in the left portion and need to increase the left limit. Else, I am in the right portion and need to decrease the right limit. There is no harm in going past the answer, as another variable keeps track of the current minimum. In the end, the minimum will either be recorded or be sitting at the left limit, so I return the smaller of the two.
Algorithm
- Initialize two limits and current min (to infinity).
- Perform binary search on the array.
- Update the current min with the middle number.
- If the middle number is larger than the right limit, increase the left limit.
- Else, decrease the right limit.
- Return the smaller between the current min and the value at the left limit.
Complexity
Time complexity: \(O(\log n)\)
Space complexity: \(O(1)\)
Code
class Solution:
    def findMin(self, nums: List[int]) -> int:
        # Edge case
        if not nums:
            return -1
        # General case
        left, right = 0, len(nums) - 1
        curr_min = float('inf')
        while left <= right:
            mid = (left + right) // 2
            curr_min = min(curr_min, nums[mid])

            if nums[mid] > nums[right]:
                left = mid + 1
            else:
                right = mid - 1

        return min(curr_min, nums[left])
14. Maximum Profit in Job Scheduling
Intuition
For this problem, the Python library bisect is brought in to reduce the code written. Of course, another method could be written to handle the search. In any case, the three arrays startTime, endTime, and profit first need to be zipped together and sorted by startTime. startTime itself also needs sorting, as the binary search is performed on it.
This is an instance of a dynamic programming problem. In an array, each cell stores the maximum profit attainable using only jobs from that index onward. This array is filled in reverse. At each iteration, bisect.bisect_left is called on startTime, looking for the leftmost insertion point of the current job's end time. The return value is an index, which could be one past the last job (indicating no later job can start after this one ends). That's why the dp array is initialized with an extra cell of 0. The value of the current cell is then the larger of the cell right after it (skip the current job) and the current job's profit plus the cell at the index found by binary search (take the current job and chain the best schedule that starts no earlier than its end time).
Algorithm
- Zip startTime, endTime, and profit together and sort the jobs by start time; sort startTime itself as well.
- Initialize dp with len(startTime) + 1 zeros.
- Iterate i from the last job to the first: find j = bisect.bisect_left(startTime, end time of job i), then set dp[i] = max(dp[i + 1], profit of job i + dp[j]).
- Return dp[0].
Complexity
Time complexity: \(O(n \log n)\) - the cost to sort the startTime array.
Space complexity: \(O(n)\) - the cost to create the new combined array.
Code
from typing import List
import bisect

class Solution:
    def jobScheduling(self, startTime: List[int], endTime: List[int], profit: List[int]) -> int:
        jobs = sorted(zip(startTime, endTime, profit))

        startTime.sort()
        dp = [0] * (len(startTime) + 1)
        for i in range(len(startTime) - 1, -1, -1):
            j = bisect.bisect_left(startTime, jobs[i][1])
            dp[i] = max(dp[i + 1], dp[j] + jobs[i][2])
        return dp[0]