### Equal zeroes and ones

Source: Asked to me by Dinesh Dharme (CSE IITB Fifth year student)

Problem:
You have a array of 0's and 1's. e.g. 0001011101001010100101010101100111
Find the maximum contiguous subsequence of the above sequence so the number of 0's and 1's in that subsequence are equal

Update (21 Dec 2010):
Solution: Posted by Nikhil Garg (Third year student, CSE IIT Delhi) in comments!

1. O(N^2) is trivial.
I propose a O(N) solution.
First look at array having 1 in place of 1 & -1 in place of 0. So our objective is to find max sequence whose sum is 0.

Now lets keep a Hashmap along which stores pairs of form ( S, i) which says that prefix [1...i] of sequence has a sum of S. This map is indexed in S.

Also keep a rolling variable for sum seen so far. Suppose you reach j & sum of [1..j] is S. You see in map if there is some entry for S. If there is say it was i ( i.e prefix [1...i] had sum of S ). Then [i+1...j] is 0 sum seq. So update your ans with this and move ahead.
If S was not in map, put pair ( S, j) in map & keep moving.

You take exactly N steps in moving & at each step you do constant work. So this is O(N) solution. If you don't want to use HashMap, use a balanced tree instead to get time of O(N log N)

2. Correct solution. But I don't see any reason to use hash map.

You are saying that checking duplicate in a hashmap is O(1). To be precise, if the number of buckets you have is k, then your algorithm is O(n)*O(n/k)

Use an O(n) size array. So, you can have the algorithm in O(n) space and O(n) time.

In the algorithm using a tree instead of hashmap, you take O(n) space but O(nlog n) time.

Infact, better than tree algorithm would be stable inplace sorting. This will take O(1) space and O(nlog n) time. :)

Cheers! Hoping all my statements are correct. Check! :)

3. Indeed. You are correct , as always :)

I proposed HashMaps because most modern languages today provide easy to use O(1) average access time maps ( and that I am strongly biased to short implementable solutions :D )

Of course in theory as you correctly pointed, this is not the case. We can certainly use O(N) size array to index into sums as range of possible sums is only form [-N,N].

What I don't get is how to solve problem with O(1) extra storage using sorting. What I think is - you propose to store pairs of form ( S, i) for all prefixes [1..i] of seq and sort on first key breaking ties by second. Then all candidate solutions appear as consecutive here, so we can find the solution in a single scan.

But this is O(N) extra storage.

4. Aah. I missed that. Since we need to store the pairs (S,i), the algorithm would still be O(n) space. Thanks.