Skip to main content

Open Ended Search Problem

Disclaimer: It is a made up problem. Not to be attempted by light hearted.

Problem:
I have a 300 word text. I have a large list of indexed strings (Length of string ~ 20, Number of strings ~ 1M). I need to figure out phrases in the 300 word text that match exactly to one of the strings in the large list of strings I have.

A naive approach:
Taking all 45000 (300 C 2) phrases, search in the large list of strings. Can we do better than this? We need to minimize calls to list of indexed strings!


Comments

  1. approach 1: Compute Hash values of all 300 texts, run any string matching algo with that and the list of strings you have such that only if the hash values match, comparison is made.

    approach 2: If no memory constraints, then run counting sort on the 300 words, and use binary search. Or create 300X300 matrix and run binary search on the 2D matrix. That will will complexity o(log(300square))

    ReplyDelete
  2. We can make trie using 20 length word window size (20 length window will traverse over the entire 300 word text). Then we can traverse the list and search for strings in the trie.

    ReplyDelete

Post a Comment

Popular posts from this blog

Asking a girl out

This is not a puzzle. So, for those of you who follow this puzzle blog, please bear with me for just one post. Interesting Math in this article though :P

Most of my friends already read an article that I wrote more than an year back - "Speak Up"


Here, inspired by the movie, The Beautiful Mind, I give a mathematical analysis of asking a girl out. Nice time it is. Feb 10. No plans for Feb 14 and I am sure this article makes me look even more geekier and all the more reason for me to believe that I will be alone, yet again. But what the hell, lets do it!

Note: This is not an independent analysis. There are many "mathematics sites" which does "similar" analysis.

@Consultants, correct me if I am wrong in my estimates. :P

Why is there a need to be selective?

From the age of 15, I guess there are approximately 3,600 girls I have liked (On average days, I don't see new girls. But going outside, I like about 30 girls. Saying that I go out once every week right …

Consecutive Heads

Let's say A keep tossing a fair coin, until he get 2 consecutive heads, define X to be the number of tosses for this process; B keep tossing another fair coin, until he get 3 consecutive heads, define Y to be the number of the tosses for this process.

1) Calculate P{X>Y}
2) What's the expected value of X
3) What's the expected value of Y

This is probably the hardest puzzle I have ever put on my blog. Hence, I will post its solution in the post directly rather than on comment.

Solution:
1)
(Solved by me finally after 13 months :))

Make a state diagram. Let the state be (a,b) where a is the number of consecutive heads streak "A" is on currently and b is the number of consecutive heads streak "B" is on currently.

So, (0,3) (1,3) are final accepted states and (2,0) (2,1) (2,2) (2,3) are failure states. If you get tails, your contribution to the state reaches to "0"

f(State) = P(X>Y | "State" configuration initially)

f(0,0) = 1/4[f(…

Fraction Brainteaser

Source:
Sent to me by Gaurav Sinha

Problem:
Siddhant writes a Maths test and correctly answers 5 out of 6 Arithmetic questions and 20 out of 28 Geometry questions. In total, Siddhant scores 25 out of 34. 

Vaibhav writes another Maths test and correctly answers 20 out of 25 Arithmetic questions and 6 out of 9 Geometry questions. in total, Vaibhav scores 26 out of 34.

Note that
a) Vaibhav scores more than Siddhant
b) Siddhant score better than Vaibhav in both individual topics - 5/6 > 20/25 and 20/28 > 6/9

How is it possible?