Oct 10, 2009

A Scenario to Ponder #14

Lots of things happened since the last post I made in this blog. I got married. Didn't do much justice to my blog followers since then. My wife couldn't stand me being interested in ONLY her :) So, she pushed me to get this blog started again and I hope she keeps pushing me to continue this blog.

Now, Here is the scenario.
I had been playing Chicktionary (google it if you had not heard of it) in Club Bing. There are some other versions of the same game - Word Warp, Text Twist, Jumble, etc. In a gist, its a game where you are given a chain of letters and you need to come up with as many words as you can with these letters in a given time. It's a pretty addictive game.

Within a few days, I thought it will be even more interesting if I can write a query which will generate the words for me when I give the chain of letters.

To begin with that, I had to set the environment:
1. I need the list of words that my query will be searching from. On googling I found that the following link had a good list of words:

http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt

2. I loaded this list to a table [WordList] ([Word] varchar(50)) in a SQL Server database.

3. I know that none of these games accept any word that is less than 3 characters long. So I deleted those words from the table.

4. When you do all the above steps, you should have 109441 words in your table.

With the environment set, I need to write a query (or a stored procedure) that will accept a string of letters and get me all the words in the WordList table that can be formed using the string of letters and it should be displayed in order of longest word to the shortest word.

There is always a brute force method to solve these kind of scenarios. But, since this game is timed, I want the most optimal solution to get the result generated as soon as possible.

So, Get me the best solution you can think of. Post as many solutions as you want/can. It can be in SQL Server 2000 or SQL Server 2005. The solution that I have is for SQL Server 2005.
Happy Querying!!
Here is an example Output I am looking for the input string: USILDF

















This is most optimized solution I can come up with for this scenario. If you can find a better solution, please post in the comments section. I would love to see it.


DECLARE @INPUT_WORD           VARCHAR(10)

SET         @INPUT_WORD =     'USILDF';

WITH NUMBERS AS
(
SELECT 1 AS NUM
UNION ALL
SELECT NUM+1 FROM NUMBERS WHERE NUM+1 <= LEN(@INPUT_WORD)
),
WRDLIST AS
(
SELECT WORD,SUBSTRING(WORD,NUM,1) CHR,COUNT(*) CNT
FROM WORDLIST, NUMBERS
GROUP BY WORD,SUBSTRING(WORD,NUM,1)
),
INPUTWORD AS
(
SELECT SUBSTRING(@INPUT_WORD,NUM,1) CHR,COUNT(*) CNT
FROM NUMBERS
GROUP BY SUBSTRING(@INPUT_WORD,NUM,1)
)
SELECT
A.WORD
FROM
WRDLIST A, INPUTWORD B
WHERE
A.CHR = B.CHR
AND A.CNT <=B.CNT
GROUP BY
A.WORD
HAVING
SUM(A.CNT) = LEN(A.WORD)
ORDER BY
LEN(A.WORD) DESC,A.WORD DESC

0 comments:

Post a Comment