### Sorted arrays

**Source:**Just made it up!

**Problem:**

Easy: Given 2 sorted arrays of size n, give an efficient algorithm to find the kth largest number.

Hard: Given m sorted arrays of size n each, give an efficient algorithm to find the kth largest number.

Update (04 December 2010)

**Solution:**Posted by Gaurav Sinha (chera) (CSE IITK 1996 Graduate, Now working at Indian Revenue Service) in comments! Another solution posted by me in comments!

May be I am jumping two quick on the problem to get the solution, (i.e. there is efficient way to do this besides mine)

ReplyDeleteBoth can be considered equivalent problems so solving second part,

Make a min heap of smallest element of m arrays. And call extract min k times, and after each extract min, insert the next element of the array in you just evicted.

So order is m + 2klog(m) interesting it is independent of n which shud be obv as we are not concerned how big an individual list is.

Pratik please tell whether it is right or wrong

@sumit : k can be O(mn). We might have to devise a strategy independent of k, but dependent on n.

ReplyDeleteA simple solution is to merge the k largest elements of the m arrays. This can be done in o(mk) time, if n>k, and o(mn) otherwise.

ReplyDeleteI think sumit's algorithm in fact very efficiently merges m sorted arrays [not nessarily equal in size] containing a total of N elements in time N log m i.e. mn log m in present problem.

ReplyDeleteA naive approach to merging the arrays sequentially would yield time complexity of Nm i.e. nm^2.

Suppose the arrays [1....n] are A1,A2,...Am.

ReplyDeletestart with array A1. consider A1 [i], There are (i-1) numbers more than A1[i] in A1. find the number of elements more than A1 [i], in array A[2]. This can be done in order log(n) time using binary search. So,by iteration, in order (m-1) log n time we can find n(1,i) = number of elements more than A1[i] in each of arrays A1,A2,...,Am.

Now do binary search in Array A1to search for i, such that n(1,i)=k-1. The binary search will require at most log n steps with each step taking (m-1)log n time. So, total time to search in Array A1 is (m-1) ((log n)^2). If search in Array A1 fails, it means kth largest element is not in A1,so repeat this process for A2,A3..Am . So, total time taken is order m (m-1) (log n)^2.

@sumit. You raised an interesting point that the solution is independent of n. Your solution is correct. But I don't think its efficient.

ReplyDelete@chera. Merging the "trimmed" arrays in O(mk) would solve the problem. But its not efficient. Interesting solution is your last comment! Thanks a lot.

Let me provide my algorithm. I think its more efficient for the easy part. You have two arrays A[1..k] and B[1..k]. The arrays are sorted. Compare A[k/2] and B[k/2]. If B[k/2]>A[k/2], we can remove all elements from B[k/2..k] and all elements from A[1..k/2]. So, we are left with two arrays of size k/2 and we need to find k/2th largest element (we need to take care of int.floor and int.ceiling but you get the idea).

For m=2, this algorithm would give the kth largest element in O(log k).

Can someone please generalize this for the general m case?

One generalization that I can think of, works when m < (log k) - typical case in real world

ReplyDeleteDivide each array of size k into two halves. Get the centre points of all arrays. Find the minimum centre point. Except this array, remove the second half of the other m-1 arrays. So, we are left with (m-1) arrays of size k/2 each and one array of size k. And we need to find the kth largest element. This is like we have m+1 arrays of size k/2 each and we need to find the kth largest element. So, in each iteration, time taken is O(m), and we reduce the array size by half and increase the number of arrays by 1. Now I am not sure when to stop and how to calculate the complexity. But this looks like a reasonable approach.

Choose a random Pivot. Now binary search this pivot in each of the m arrays. By this we get the total number of elements less than the pivot and total number of elements greater than pivot. With proper pivot choosing policy (repeat if not good, etc..) we can be sure to eliminate a "good" fraction of total number of elements.

ReplyDeleteTime taken in each iteration is O(m log n) and total number of iterations is O(log mn). SO final time taken is O(m.log n.log mn)

Choose a random Pivot. Now binary search this pivot in each of the m arrays. By this we get the total number of elements less than the pivot and total number of elements greater than pivot. With proper pivot choosing policy (repeat if not good, etc..) we can be sure to eliminate a "good" fraction of total number of elements.

ReplyDeleteTime taken in each iteration is O(m log n) and total number of iterations is O(log mn). SO final time taken is O(m.log n.log mn)

first we devise a method to find the median of the numbers in O [m log(n)] time. We can then use this method to find kth largest element by padding extra elements in arrays, so that the problem is reduced to finding the median of the resultant arrays.

ReplyDeletecompare the center points of all arrays. suppose the minimum and maximum centerpoints are in arrrays Ai and Aj, respectively. suppose sizes of the arrays are si and sj. let s= min(si,sj). Truncate the lower half and upper half of Ai and Aj , by (s/2)elements. This will keep the median unchanged because equal number of elements are removed from both sides of the median. Now iteratively apply the above procedure to find the median.

Total time taken is O(m log n) because in each step, at least one array is reduced to half size.

in the above method we need to take care of floor and ceiling issues etc. but i think these missing gaps can be filled. for example if s=1, then we can check whether sole element is the median directly, in O(m) time. If yes then we stop otherwise we remove 1 element from both arrays Ai and Aj, and at least one array becomes empty.

Now to find kth largest element, we can use above method by padding [not more tham mn] extra elements in the arrays so that the problem reduces to finding median. total time should be O(m log n).