tag:blogger.com,1999:blog-292867782024-03-07T04:39:39.992-05:00SQL Garbage Collector<b>Collecting SQL pointers . . .</b>Unknownnoreply@blogger.comBlogger60125tag:blogger.com,1999:blog-29286778.post-34958242699555403442013-02-27T19:22:00.000-05:002013-03-06T23:02:47.069-05:00SQL to ECL - Metamorphosis<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
As the old saying goes, it can be figured out whether you are a data guy or not just by the way you solve the problem:</div>
<div class="MsoNormal">
An imperative programmer goes: for each order in the orders table, find the corresponding order details. . . </div>
<div class="MsoNormal">
And the SQL programmer goes: take the orders table and join with the order details on the order ID key. . . </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As a SQL programmer, you look at tables as sets and not as individual records. You would not want to bother about whether the join happens as a nested loop, hash, merge or map-reduce for that matter. You want the join to happen the way you intend to (functionally) and let your query engine find the best way to do it based on the data distribution, size etc. That is the SQL programming style. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Now, <b>ECL is not SQL but has the SQL programming style.</b> I find it to be a pseudo functional-declarative programming style with some object oriented concepts tossed in. Don’t pull your hair out. Not just yet!!!</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In ECL, there are only 2 types of statements that I have come across.</div>
<div class="MsoNormal">
</div>
<ol style="text-align: left;">
<li><b style="text-indent: -0.25in;">Action:</b><span style="text-indent: -0.25in;"> Something that produces an output.</span></li>
<li><b style="text-indent: -0.25in;">Declaration:</b><span style="text-indent: -0.25in;"> This is a single assignment (or, it cannot be re-assigned). These are called attributes in the ECL world. </span></li>
</ol>
In my experiments with ECL, I have created attributes to:<br />
- hold a value (string, int, set, record set, table, file etc)<br />
- a function or action <br />
- a definition (More like a table definition)<br />
<br />
Most of the ECL code are definitions. And your action can call a definition or pipe your definitions to cascade to a result. More on this later.<br />
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Of course, you did notice that there is no concept of a variable. And <b>that </b>is the only thing to “get” in ECL. If you really think about it, ECL gives you amazing abstraction over all the threading and grid-ding that happens behind the scenes. How is it able to do that? By signing a contract with you that says, a definition (or an attribute) does not change its state, ever, <b>ever</b>. So, the attribute points to the same thing whoever wants, or more specifically whichever machine in the grid wants. There, it eliminated any custom race condition you may introduce in your code (which is common in parallel computing world). As long as we adhere to that contract, you can tell HPCC what you need and it will get you the result. With me, so far?? </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
It’s not as hard as you think to code without variables. Purely functional languages like Haskell do not allow you to re-assign.<br />
<br />
So, how do you work around this constraint? </div>
<div class="MsoNormal">
<b>Blunt answer:</b> Create a new definition to hold the mutated value.</div>
<div class="MsoNormal">
<span style="font-family: Wingdings; mso-ascii-font-family: Calibri; mso-char-type: symbol; mso-hansi-font-family: Calibri; mso-symbol-font-family: Wingdings;"><br /></span></div>
<div class="MsoNormal">
Think about it, you don’t “really” use variables in the SQL world either. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Now, why definitions? If it's SQL programmer friendly, why not use SQL? When you try to answer this question, you will uncover the brilliance of this powerful language. Here is my attempt.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Rewind a few years, SQL Server 2005 introduced the common table expressions (CTE) or the “WITH” clause. What problem was it really trying to solve? Help generate number table using recursion, No!!!</div>
<div class="MsoNormal">
In production applications, the biggest use of CTE was to remove code clutter. I was able to remove inline views and move it to the top of the query. Improved code readability. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Take this cooked-up query in SQL Server 2000:</div>
<span style="font-family: Courier New;"><span style="color: blue;">SELECT</span> <span style="color: maroon;">*</span><br /><span style="color: blue;">FROM</span> OrderDetails <span style="color: maroon;">A</span><br /><span style="color: blue;">LEFT</span> <span style="color: blue;">OUTER</span> <span style="color: blue;">JOIN</span> <span style="color: maroon;">(</span><span style="color: blue;">SELECT</span> <span style="color: maroon;">productid</span><span style="color: silver;">,</span><br /> <span style="color: magenta;"><i>Max</i></span><span style="color: maroon;">(</span><span style="color: maroon;">transactiondate</span><span style="color: maroon;">)</span> <span style="color: blue;">AS</span> <span style="color: maroon;">LatestDate</span><br /> <span style="color: blue;">FROM</span> <span style="color: maroon;">transactions</span><br /> <span style="color: blue;">WHERE</span> <span style="color: maroon;">orderid</span> <span style="color: blue;"><> ''</span><br /> <span style="color: blue;">AND</span> <span style="color: maroon;">type</span> <span style="color: silver;">=</span> <span style="color: red;">'Sell'</span><br /> <span style="color: blue;">GROUP</span> <span style="color: blue;">BY</span> <span style="color: maroon;">productid</span><br /> <span style="color: blue;">HAVING</span> <span style="color: magenta;"><i>Year</i></span><span style="color: maroon;">(</span><span style="color: maroon;">latestdate</span><span style="color: maroon;">)</span> <span style="color: silver;">></span> <span style="color: black;">2011</span><span style="color: maroon;">)</span> <span style="color: maroon;">B</span><br /><span style="color: blue;">ON</span> </span><br />
<span style="font-family: Courier New;"><span style="color: maroon;"> B</span><span style="color: silver;">.</span><span style="color: maroon;">productid</span> <span style="color: silver;">=</span> <span style="color: maroon;">A</span><span style="color: silver;">.</span><span style="color: maroon;">productid</span><br /><span style="color: blue;">WHERE</span> </span><br />
<span style="font-family: Courier New;"><span style="color: maroon;"> A</span><span style="color: silver;">.</span><span style="color: maroon;">status</span> <span style="color: silver;">=</span> <span style="color: red;">'Active'</span> </span><br />
<br />
With CTE, you can change it to:</div>
<span style="font-family: Courier New;"><span style="color: blue;">WITH</span> <span style="color: maroon;">B</span><br /> <span style="color: blue;">AS</span> <span style="color: maroon;">(</span><span style="color: blue;">SELECT</span> <span style="color: maroon;">productid</span><span style="color: silver;">,</span><br /> <span style="color: magenta;"><i>Max</i></span><span style="color: maroon;">(</span><span style="color: maroon;">transactiondate</span><span style="color: maroon;">)</span> <span style="color: blue;">AS</span> <span style="color: maroon;">LatestDate</span><br /> <span style="color: blue;">FROM</span> <span style="color: maroon;">transactions</span><br /> <span style="color: blue;">WHERE</span> <span style="color: maroon;">orderid</span> <span style="color: blue;"><> ''</span><br /> <span style="color: blue;">AND</span> <span style="color: maroon;">type</span> <span style="color: silver;">=</span> <span style="color: red;">'Sell'</span><br /> <span style="color: blue;">GROUP</span> <span style="color: blue;">BY</span> <span style="color: maroon;">productid</span><br /> <span style="color: blue;">HAVING</span> <span style="color: magenta;"><i>Year</i></span><span style="color: maroon;">(</span><span style="color: maroon;">latestdate</span><span style="color: maroon;">)</span> <span style="color: silver;">></span> <span style="color: black;">2011</span><span style="color: maroon;">)</span><br /><span style="color: blue;">SELECT</span> <span style="color: maroon;">*</span><br /><span style="color: blue;">FROM</span> <span style="color: maroon;">orderdetails</span> <span style="color: maroon;">A</span><br /><span style="color: blue;">LEFT</span> <span style="color: blue;">OUTER</span> <span style="color: blue;">JOIN</span> <span style="color: maroon;">b</span><br /><span style="color: blue;">ON</span> </span><br />
<span style="font-family: Courier New;"><span style="color: maroon;"> b</span><span style="color: silver;">.</span><span style="color: maroon;">productid</span> <span style="color: silver;">=</span> <span style="color: maroon;">A</span><span style="color: silver;">.</span><span style="color: maroon;">productid</span><br /><span style="color: blue;">WHERE</span> </span><br />
<span style="font-family: Courier New;"><span style="color: maroon;"> A</span><span style="color: silver;">.</span><span style="color: maroon;">status</span> <span style="color: silver;">=</span> <span style="color: red;">'Active'</span><span style="font-size: x-small;"> </span></span><br />
<span style="font-family: Courier New; font-size: x-small;"><br /></span>This is cleaner, I am now looking at two queries. One that generates the table B and the other one that uses it. Coming to think of it, Wouldn't it be great if I can move each of the complexities to its own shell (or definition). So that each of definition can be reused as you need. Like this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://1.bp.blogspot.com/-sHrv89dj514/US-tDX0G-PI/AAAAAAAAAaU/lsWp-NDGUwY/s1600/ECL.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://1.bp.blogspot.com/-sHrv89dj514/US-tDX0G-PI/AAAAAAAAAaU/lsWp-NDGUwY/s1600/ECL.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
Voila!!! This is exactly how the ECL code looks (Well, its a about 95% close and the compiler will help you fix the rest). But that's about it. This will run just fine for a single node with gigabytes of data or 100 nodes petabytes of data. Welcome to the world of big data.<br />
<br />
Once you get past this stage, you will quickly move beyond SQL scope. There is a world of constructs to handle any type or size of data. It's got some powerful detergents built-in to clean up the dirtiest of data you have, that will blow your mind.<br />
<br />
And, just so you know you can write the above query as a single line of code by substituting your definitions inline (recursively). And you will be able to build your SQL Server 2000 query in ECL. But, why you would want to do that is a different discussion!!!<br />
<br /></div>
Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-29286778.post-76507995703099072882013-02-26T17:00:00.001-05:002013-03-06T23:02:47.072-05:00ECL - Big data for the SQLly Inclined<div dir="ltr" style="text-align: left;" trbidi="on">
The last project I worked, we pushed the limits of RDBMS (misfit for our requirements) and I decided that the next time, I will consider beyond SQL for my data needs. I started exploring the NOSQL world - MongoDB,Neo4j, REDIS etc and I understood that had we been open to these technologies when we started our last project, we might have had a lot easier life.<br />
<br />
Eventually, I started seeing Hadoop everywhere, our company was talking, customers were talking, my friends were talking. And I started learning the jargons around Hadoop (map reduce, hive, sqoop, HDFS,HBase) and I used to throw these words in my conversation along with a few zoo animals and figured out most of my friends were doing the same and we had a happy ecosystem going on. But, deep down I knew that I was ignoring the elephant in the room that was staring at me. I read through the famed map reduce research paper and I was able to get the concept. But, I was not able to get to start playing with hadoop. Setting up was easy and you can get the word count sample working in an hour. But, after that I was stuck. I understood the power of what it can do. But, I felt I did not know the right language to communicate with it. It's like someone asked me to write a web server in SQL. Of course you can do it, but I don't want to. To me, SQL is "the" reference implementation of a Domain Specific Language. And the ease at which you can instruct your RDBMS to do a complex task was mind blowing as long as you are operating within the problem domain. <br />
<br />
In retrospect, I understood that the reason why I had so much reluctance to get into hadoop was because I am not a technology guy (there, I said it). I like to solve logical problems (puzzles or problems, I don't care). From a problem solver perspective, my problem statement does not change whether I am working with 1 record or 1gazellion records. It does not change if I have 1 line of text or the entire world wide web to process. I wanted that abstraction. SQL was giving me that (almost), till the data spilled over to the next machine. So, I started looking for languages on top of Hadoop that can help me out. Looked at Pig and Hive and a few others but, i felt this was like LINQ for SQL. You can change the programming languange but you cannot change the fundamental building blocks. I don't want to come out wrong. I love LINQ, but not so much when i have to write complex SQL queries in LINQ.<br />
<br />
And, so I started exploring options that were outside Hadoop. Come on, big data is such a "big" pie, and it will not be monopolized. Anyways, my search ended with ECL. A programing language for taming the super computing grid called High-Performance Computing Cluster. It was open source, installation was exactly like how it was for SQL Server. <br />
"Download the VM and download an IDE (looks like your SQL Server management studio). Connect to the server and get going."<br />
<br />
Played with it for a few days. They have some tutorial videos in their site (<strong>google HPCC systems</strong>). My interest was mainly because the programming style was so different. Not similar to any language that I knew. But, I was able to relate to it. I didn't have to skew my thinking to fit to their programing style.<br />
<br />
Also, it's been "the" programming language for HPCC for the last 10 years or so and it has undergone a lot of refinement over the years. So, I knew that I can take a deep breath and give some time to understand this language.<br />
<br />
Fast forward a few months... I am still in love with ECL. And some day, someone may write ECL for Hadoop. But, till then, I am taming big data, the ECL way.<br />
<br />
If you got an hour or so, give ECL a try and let me know what you think. <br />
<br />
There a mental switch that you need to turn on to be able to easily starting thinking in ECL and with that, it becomes pretty much like SQL, actually even more elegant in a few cases. I will anyway write about it in my next post...breaking down a complex SQL query and building it in ECL. It should be a lot more easy to understand, I hope. </div>
Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-29286778.post-84545693684786313302010-04-16T06:26:00.002-04:002010-04-16T06:33:42.099-04:00Custom SQL Stored Procedure Best Practices Analyzer - SQL Cop, Maybe?<span style="font-family: inherit;">Recently, I moved to Seattle as a Data Architect for a product our company was developing. I also started taking up most of the DBX roles in that project. <br />
<br />
Though our architecture is predominantly CRUD based and we have our in-house CRUD proc generator, some business logic invariably seeps into the SPs. And we had let it to the discretion of the developers on whether to do the processing at Data or the Business Layer. I thought it would make sense to keep a rudimentary check on what logic was going into the SP and make sure our developers don't misuse the liberty and continue to follow the guidelines defined.</span><br />
<a name='more'></a><br />
After a futile search for a tool like FXCop for SQL Server, I thought it would be fun to build a simple Stored Procedure validation tool that will solve the purpose. What follows is the process on how I went about building the tool. The code is given at the end of this post. Feel free to use it, extend it and share it.<br />
<br />
To start with, I wanted the validation to be done on 4 entities<br />
1. Proc Name<br />
2. Proc Definition<br />
3. Input parameters<br />
4. Output parameters<br />
<br />
I decided to use the EVENTDATA generated by the DDL trigger to do the validations on the proc name and the definition. And I plan to query the sys.parameters table to do any validations on the proc parameters.<br />
<br />
We will be doing 3 types of validations on the entities:<br />
1. Verify that entity contains a specific text<br />
2. Verify that entity doesn't contain a specific text<br />
3. For names, verify the length of the value is within a min and max length<br />
<br />
With the primary requirements stated, lets define the other constraints and requirements:<br />
1. Version: SQL Server 2005.<br />
2. The validation rules have to run every time a SP is compiled to check for best practices.<br />
3. Existing validation rules can be updated and new ones added.<br />
4. The administrator should be able to set whether non-compliance to a validations will fail the compile or not (IsCriticalValidation).<br />
5. Log all compiled non-compliance for reference.<br />
6. There should be an option for the admin to over-ride any validation for any SP<br />
7. CLR is disabled in the server (Just didn't want the solution to mandate CLR)<br />
<br />
The above requirements compelled me to follow this design approach:<br />
1. Use a DDL Trigger (Requirements: 1,2)<br />
2. Have a table to store validation rules (Reqirements: 2,3,4)<br />
3. Have a table for logging non-compliance (Requirement: 5)<br />
4. Have a table for storing SP exceptions (Requirement: 6<br />
5. Everything in T-SQL (Requirement: 7)<br />
<br />
Lets start with building the tables: <br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>-- Table for proc validation rules
CREATE TABLE dbo.__SYS_ProcValidationRules(
RuleID smallint NOT NULL,
RuleDescription nvarchar(256) NOT NULL,
IsActive bit NOT NULL,
SearchableEntity nvarchar(32) NULL,
SearchType nvarchar(32) NULL,
IsCaseSensitive bit NULL,
RuleParameter sql_variant NULL,
IsCriticalValidation bit NOT NULL,
PRIMARY KEY CLUSTERED (RuleID ASC)
)
</code></pre><br />
Now, I added a few best practices that you want your developers to follow into this rules table. The table now looks like this:<br />
<a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/S8gxmXo2_9I/AAAAAAAAAUk/PLzzOsl-3Qk/s1600/Table+Data.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="242" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/S8gxmXo2_9I/AAAAAAAAAUk/PLzzOsl-3Qk/s640/Table+Data.PNG" width="640" wt="true" /></a><br />
<br />
<span style="font-family: inherit;">Now we create the other tables for the sake of completeness. We will not be filling any data now.</span><br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>--Table to hold SP exceptions
CREATE TABLE dbo.__SYS_ProcExceptions(
SchemaName nvarchar(256) NOT NULL DEFAULT ('dbo'),
ProcName nvarchar(1000) NOT NULL,
ExceptionReason nvarchar(2000) NULL,
ExceptionRuleID smallint NULL,
CreateDate datetime NOT NULL DEFAULT (getdate()),
CreateUser nvarchar(100) NOT NULL DEFAULT (suser_sname())
)
GO
--Table to maintain log on non-compliant SPs created
CREATE TABLE dbo.__SYS_ProcNonCompliance(
SchemaName nvarchar(256) NOT NULL DEFAULT ('dbo'),
ProcName nvarchar(1000) NOT NULL,
RuleDescription nvarchar(256) NOT NULL,
IsException bit NOT NULL,
CreateDate datetime NOT NULL DEFAULT (getdate())
)
</code></pre><br />
<span style="font-family: inherit;">We now create the function that will accept the rule and entity value and returns whether the rule passed or not.</span><br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>-- Function to evaluate any rule
CREATE FUNCTION dbo.__SYS_ValidateRule(@in_SearchText varchar(MAX),
@in_SearchType varchar(32),
@in_IsCaseSensitive bit,
@RuleParameter sql_Variant)
RETURNS BIT
WITH ENCRYPTION
AS
BEGIN
IF @in_SearchType = 'MAX LEN' AND (LEN(@in_SearchText) <= CAST(@RuleParameter AS INT))
RETURN(1);
IF(@in_SearchType = 'MIN LEN' AND LEN(@in_SearchText) >= CAST(@RuleParameter AS INT))
RETURN(1);
IF(@in_SearchType = 'LIKE')
BEGIN
IF(@in_IsCaseSensitive = 0 AND @in_SearchText LIKE CAST(@RuleParameter AS VARCHAR(256)))
RETURN(1);
IF(@in_IsCaseSensitive = 1 AND @in_SearchText COLLATE Latin1_General_BIN LIKE
CAST(@RuleParameter AS VARCHAR(256)) COLLATE Latin1_General_BIN)
RETURN(1);
END
IF(@in_SearchType = 'NOT LIKE' AND @in_SearchText NOT LIKE CAST(@RuleParameter AS VARCHAR(256)))
RETURN(1);
RETURN(0);
END
</code></pre><br />
<span style="font-family: inherit;">The only one thats remaining now is the Database Trigger for create and alter procedure. You can get the definition of that trigger from the link below.</span><br />
<br />
<span style="font-family: inherit;">This <a href="http://docs.google.com/leaf?id=0B5fC3kDDhdp7MzBjYWUzNGMtMjVjMi00MTI4LTgyMTgtMDE0YWI5ODU4ZWYx&hl=en">link</a> holds all the necessary scripts. After running the scripts, here is what happens when I try to compile a sub-standard procedure.</span><br />
<br />
<a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/S8gxi13IS6I/AAAAAAAAAUc/OITVph6DH3I/s1600/Output.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="412" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/S8gxi13IS6I/AAAAAAAAAUc/OITVph6DH3I/s640/Output.PNG" width="640" wt="true" /></a><br />
<br />
Suggestions and feedback welcome!!Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-29286778.post-88629768323351713292010-03-14T12:11:00.005-04:002010-05-04T19:36:32.026-04:00A Scenario to Ponder #16During my school days, I used to play this game that I would call as "word mutation". The problem statement gives the starting and ending 4-letter words. <br />
<br />
The game is to find the path from the starting to the ending word by changing one alphabet at a time and ,of course, each of the words in the path should be a valid word.<br />
<a name='more'></a><br />
Given below is one example:<br />
<br />
Start: WORD <br />
End : COME<br />
<br />
One solution is<br />
WORD <br />
WORE <br />
CORE<br />
COME<br />
<br />
So, here is the SQL puzzle. Given the starting and ending 4-letter words, Can you write a query to find the shortest path, between these two words. If you have multiple shortest paths, just show one.<br />
<br />
You need to have a table of words and you can build it using the list available here: <br />
http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt<br />
<br />
Please post you answers in the comments section.<br />
Happy Querying!!!<br />
<br />
Given below is my take at solving this puzzle. Not very proud of the solution. I feel I could have as well written it in C#. <br />
All you SQL gurus... I am really looking forward to a see a better looking/performing solution.<br />
<br />
<pre style="font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; color: #000000; background-color: #eee;font-size: 12px;border: 1px dashed #999999;line-height: 14px;padding: 5px; overflow: auto; width: 100%"><code>/*
I created a view VW_WORD that will only
select 4 letter words from the Words table.
I am using that as my Word Dictionary for
this solution.
*/
SET NOCOUNT ON;
DECLARE @StartWord CHAR(4);
DECLARE @EndWord CHAR(4);
SET @StartWord = 'WORD';
SET @EndWord = 'COME';
DECLARE @WordsTable as TABLE
(
Word char(4),
WordsBranch varchar(max),
Closeness tinyint,
IterationCounter int
)
DECLARE @IterationCounter int;
SET @IterationCounter = 1;
;WITH Positions AS
(
SELECT 1 AS Pos
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
)
INSERT INTO @WordsTable (Word, WordsBranch, Closeness, IterationCounter)
SELECT
UPPER(Word),
CAST(UPPER(@StartWord) + '-->' + UPPER(Word) AS VARCHAR(MAX)) AS WordsBranch,
(SELECT COUNT(*) FROM Positions WHERE SUBSTRING(Word, Pos,1) = SUBSTRING(@EndWord,Pos,1)) as Closeness,
@IterationCounter as IterationCounter
FROM
VW_WORD
WHERE
DIFFERENCE(Word,@StartWord) >= 3
AND (SELECT COUNT(*) FROM Positions WHERE SUBSTRING(Word, Pos,1) = SUBSTRING(@StartWord,Pos,1)) >= 3
AND (SELECT COUNT(*) FROM Positions WHERE SUBSTRING(Word, Pos,1) = SUBSTRING(@EndWord,Pos,1)) >= 1
DELETE FROM @WordsTable WHERE Closeness <> (SELECT MAX(Closeness) FROM @WordsTable)
WHILE NOT EXISTS (SELECT 1 FROM @WordsTable WHERE Word = @EndWord)
BEGIN
SET @IterationCounter = @IterationCounter + 1
IF(@IterationCounter = 10)
BREAK; /* No solution Found */
;WITH Positions AS
(
SELECT 1 AS Pos
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 4
)
INSERT INTO @WordsTable (Word, WordsBranch, Closeness, IterationCounter)
SELECT
UPPER(A.Word),
WordsBranch + '-->' + UPPER(A.Word) AS WordsBranch,
(SELECT COUNT(*) FROM Positions WHERE SUBSTRING(A.Word, Pos,1) = SUBSTRING(@EndWord,Pos,1)),
@IterationCounter as IterationCounter
FROM
VW_WORD AS A,
(SELECT * FROM @WordsTable WHERE IterationCounter = @IterationCounter - 1) AS B
WHERE
DIFFERENCE(A.Word,B.Word) >= 3
AND (SELECT COUNT(*) FROM Positions WHERE SUBSTRING(A.Word, Pos,1) = SUBSTRING(B.Word,Pos,1)) = 3
AND (SELECT COUNT(*) FROM Positions WHERE SUBSTRING(A.Word, Pos,1) = SUBSTRING(@EndWord,Pos,1)) >= B.Closeness
AND NOT EXISTS (SELECT 1 FROM @WordsTable C where C.Word = A.Word)
DELETE FROM @WordsTable WHERE IterationCounter < @IterationCounter - 1
END
IF EXISTS(SELECT 1 FROM @WordsTable WHERE Closeness = 4)
SELECT WordsBranch FROM @WordsTable WHERE Closeness = 4
ELSE
PRINT 'No solution found'
</code></pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-40233323882122860232009-11-14T03:06:00.006-05:002009-11-23T09:55:19.573-05:00Solving Sudoku using SQL Server 2005 - Step by Step - Part #6<b>Implementation of RunSolveAlgorithm4:</b><br />
<br />
This is the last post in this <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005.html">series</a>. The <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_7764.html">previous algorithm</a> is the last one that tries to solve the puzzle logically. This one will take the latest unsolved sudoku board that we get after running through the first three algorithms and use brute force to solve the puzzle. I am not going to reinvent the wheel here. <br />
<br />
<a name='more'></a>This is a SQL Server implementation of <a href="http://technology.amis.nl/blog/6404/oracle-rdbms-11gr2-solving-a-sudoku-using-recursive-subquery-factoring">"solving sudoku using recursive subquery factoring"</a> written by Anton Scheffer for Oracle 11gR2. <br />
<br />
Here is the logic that was used. Using recursive CTEs, start filling the blank cells (one cell at a time) with all possible valid values for the cell, starting from left to right ,top to bottom. Each empty cell will be a root node with subsequent solutions branching under it. Each valid value filled in that cell will be a new solution node in the branch, which will become the root node for the next blank cell. Now, fill the second cell the same way for each of the node and continue. <br />
<br />
The branching of a node will terminate when one of the following conditions are met:<br />
1. If any node cannot no longer have a valid value for the next empty cell, we stop branching that node - It can no longer lead us to the valid solution. <br />
2. When all cells in the sudoku board are filled, stop processing that node. The node holds the solution.<br />
<br />
Here is the implementation: <br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>ALTER PROC RunSolveAlgorithm4
AS
SET NOCOUNT ON
BEGIN
WITH SUDOKU_BOARD_AS_STRING AS /*Convert our latest Sudoku Board as a continous string*/
(
SELECT CAST((
SELECT ISNULL(VAL,0) FROM SUDOKU_BOARD ORDER BY YPOS,XPOS FOR XML PATH('')
) AS VARCHAR(81)) AS SUDOKU_PROBLEM
),
ALL_SOLNS AS
( SELECT SUDOKU_PROBLEM AS SOLN, CHARINDEX('0',SUDOKU_PROBLEM,1) CUR_POSN FROM SUDOKU_BOARD_AS_STRING
UNION ALL /*Recursive member - Replace the next blank cell with a possible value*/
SELECT CAST(LEFT(SOLN,CUR_POSN-1) + CAST(A.NUM AS CHAR(1)) + RIGHT(SOLN,81-CUR_POSN) AS VARCHAR(81)),
CHARINDEX( '0',SOLN,CUR_POSN+1)
FROM ALL_SOLNS,NUMBERS A
WHERE CUR_POSN > 0 /*Branching termination condition 2. No more cells to fill. SOLUTION!!! */
AND A.NUM NOT IN (/*Pick the solution only when the current value is not in any other cell of
the same row, column and 3X3 block. Branching termination condition 1*/
SELECT substring(SOLN,(B.NUM-1)*9 + C.NUM,1) FROM NUMBERS B, NUMBERS C
WHERE B.NUM = (CUR_POSN-1)/9+1
OR C.NUM = (CUR_POSN-1)%9 + 1
OR ((B.NUM-1)/3)*3 + (C.NUM-1)/3 = (((CUR_POSN-1)/27)*3 + ((CUR_POSN-1)%9)/3))
),
SUDOKU_SOLUTION AS
( SELECT A.NUM AS XPOS, B.NUM AS YPOS, SUBSTRING(SOLN,(B.NUM-1)*9 + A.NUM,1) AS VAL
FROM NUMBERS A,NUMBERS B,(SELECT TOP 1 SOLN FROM ALL_SOLNS WHERE CUR_POSN = 0) C
)
UPDATE SUD
SET SUD.VAL = SOL.VAL
FROM SUDOKU_BOARD SUD, SUDOKU_SOLUTION SOL
WHERE SUD.XPOS = SOL.XPOS
AND SUD.YPOS = SOL.YPOS
/* Update the solution board too so that its in sync with sudoku board*/
UPDATE SOL
SET SOL.VAL = SUD.VAL
FROM SUDOKU_BOARD SUD, SOLUTION_BOARD SOL
WHERE SUD.XPOS = SOL.XPOS
AND SUD.YPOS = SOL.YPOS
END
</code></pre><br />
This algorithm can be run directly after loading the data in the sudoku board, without running the other three algorithms. However, due to the recursive nature of this algorithm, the number of branches can be exponentially reduced by filling a few extra cells in the sudoku board. So, I am having this as the last algorithm to be used.<br />
<br />
That brings this series to an end (atleast for now). <br />
<br />
<b>One final note:</b> If you are planning to build more algorithms on this, add them before this one, for obvious reasons - so that you don't work on a solved puzzle :) You can get the source of all the code in this series from <a href="http://docs.google.com/leaf?id=0B5fC3kDDhdp7NmViMDMxNGYtNjIyMi00NmYxLWIwMzEtY2FhMjk5ZmNhNGUx&hl=en">here</a>.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-77880444884216960302009-11-14T02:13:00.011-05:002009-11-23T10:06:31.796-05:00Solving Sudoku using SQL Server 2005 - Step by Step - Part #5<strong>Implementation of RunSolveAlgorithm3:</strong><br />
<br />
The <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_14.html">last algorithm</a> that we implemented was able to solve an easy puzzle. Now, lets take a hard one and see if the solution we have built till now in this <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005.html">series</a> can solve it.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: rgb(153,153,153) 1px dashed; border-left: rgb(153,153,153) 1px dashed; border-right: rgb(153,153,153) 1px dashed; border-top: rgb(153,153,153) 1px dashed; color: black; font-family: Andale Mono,Lucida Console,Monaco,fixed,monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code> EXEC SolveSudoku
'030,001,000,,006,000,050,,500,000,983,,080,006,302,,000,050,000,,903,800,060,,714,000,009,,020,000,800,,000,400,030'
</code></pre><br />
<a name='more'></a>post-solve sudoku board - before implementing algorithm 3 <br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/Sv5XtrMrAjI/AAAAAAAAAS0/wNhhlmc3J1c/s1600-h/step5+-+PreSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/Sv5XtrMrAjI/AAAAAAAAAS0/wNhhlmc3J1c/s320/step5+-+PreSolve.PNG" /></a><br />
</div><br />
post-solve solution board - before implementing algorithm 3<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/Sv5Xv7MhylI/AAAAAAAAAS8/jStSoUSy_Jk/s1600-h/step5+-+PreSolveSoln.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/Sv5Xv7MhylI/AAAAAAAAAS8/jStSoUSy_Jk/s400/step5+-+PreSolveSoln.PNG" /></a><br />
</div><br />
We see that the current solve methods cannot handle this puzzle. Lets build the next algorithm.<br />
<br />
<b>Implementation of RunSolveAlgorithm3:</b><br />
The next algorithm is again an implementation from <a href="http://www.sudokusolver.co.uk/solvemethods.html">sudoku solver </a>which they call the solve method B.<br />
It is a little complex and I enjoyed implementing this one. You see that any row in the board will belong to three 3X3 block and the row will have 3 cells in each block. Now, check if there are any values in the row occur only in one of the 3 blocks. If it does, it means that for that particular block, the value exists in that row. So, that value can be removed from the other 6 cells in the block. We do the same for the columns. <br />
<br />
We can do the check the other way round too, where any block will belong to 3 rows and will have 3 cells in each row. Now, check if there are any values in the block occur only in one of the 3 rows. If it does, it means that for that particular row, the value exists in that block. So, that value can be removed from the other 6 cells in the row. We can do it similarly for columns. <br />
<br />
With the algorithm explained, here is the implementation:<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: rgb(153,153,153) 1px dashed; border-left: rgb(153,153,153) 1px dashed; border-right: rgb(153,153,153) 1px dashed; border-top: rgb(153,153,153) 1px dashed; color: black; font-family: Andale Mono,Lucida Console,Monaco,fixed,monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>ALTER PROC RunSolveAlgorithm3
AS
SET NOCOUNT ON
BEGIN
DECLARE @RowCount SMALLINT,
@Counter SMALLINT
SET @RowCount = 1
SET @Counter = 1
WHILE(@RowCount > 0 AND dbo.VerifySolve() = 0)
BEGIN
SET @RowCount = 0;
SET @Counter = 0;
WHILE (@Counter <=9)
BEGIN
/*For each row, check if any number occur only in a specific block, then the number will be
in that row, remove it from all other rows in that block */
WITH YSOL AS
(
SELECT YPOS,(XPOS-1)/3 + 1 AS XBLK, SUBSTRING(VAL,NUM,1) AS VAL FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY YPOS,(XPOS-1)/3 + 1 ,SUBSTRING(VAL,NUM,1)
),
YSOL_DEL AS
(
SELECT D.NUM AS XPOS,C.NUM AS YPOS,A.VAL FROM YSOL A LEFT OUTER JOIN YSOL B ON A.YPOS = B.YPOS
AND A.XBLK <> B.XBLK AND A.VAL= B.VAL
INNER JOIN NUMBERS C ON (A.YPOS -1)/3 =(C.NUM-1)/3
INNER JOIN NUMBERS D ON A.XBLK =(D.NUM-1)/3+1
WHERE B.YPOS IS NULL
AND A.YPOS <> C.NUM
AND A.VAL = @Counter
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,YDEL.VAL,'')
FROM SOLUTION_BOARD SOL, YSOL_DEL YDEL
WHERE SOL.XPOS = YDEL.XPOS
AND SOL.YPOS = YDEL.YPOS
AND LEN(SOL.VAL) > 1;
/*For each column, check if any number occur only in a specific block, then the number will be
in that column, remove it from all other columns in that block */
WITH XSOL AS
(
SELECT XPOS,(YPOS-1)/3 + 1 AS YBLK, SUBSTRING(VAL,NUM,1) AS VAL FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY XPOS,(YPOS-1)/3 + 1 ,SUBSTRING(VAL,NUM,1)
),
XSOL_DEL AS
(
SELECT D.NUM AS YPOS,C.NUM AS XPOS,A.VAL FROM XSOL A LEFT OUTER JOIN XSOL B ON A.XPOS = B.XPOS
AND A.YBLK <> B.YBLK AND A.VAL= B.VAL
INNER JOIN NUMBERS C ON (A.XPOS -1)/3 =(C.NUM-1)/3
INNER JOIN NUMBERS D ON A.YBLK =(D.NUM-1)/3+1
WHERE B.XPOS IS NULL
AND A.XPOS <> C.NUM
AND A.VAL = @Counter
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,XDEL.VAL,'')
FROM SOLUTION_BOARD SOL, XSOL_DEL XDEL
WHERE SOL.YPOS = XDEL.YPOS
AND SOL.XPOS = XDEL.XPOS
AND LEN(SOL.VAL) > 1;
/*For each block, check if any number occur only in a specific row, then the number will be
in that block, remove it from all other blocks in that row*/
WITH YBLK AS
(
SELECT ((YPOS-1)/3)*3 + (XPOS-1)/3 + 1 AS BPOS, YPOS,
SUBSTRING(VAL,NUM,1) AS VAL
FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY
((YPOS-1)/3)*3 + (XPOS-1)/3,YPOS,SUBSTRING(VAL,NUM,1)
),
YBLK_DEL AS
(
SELECT A.YPOS,C.NUM AS XPOS ,A.VAL
FROM YBLK A LEFT OUTER JOIN YBLK B
ON A.BPOS = B.BPOS
AND A.YPOS <> B.YPOS
AND A.VAL= B.VAL
INNER JOIN NUMBERS C ON 1 = 1
WHERE B.YPOS IS NULL
AND A.BPOS = 8
AND (C.NUM-1)/3 + 1 <> A.BPOS - ((A.YPOS-1)/3)*3
AND A.VAL = @Counter
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,YDEL.VAL,'')
FROM SOLUTION_BOARD SOL, YBLK_DEL YDEL
WHERE SOL.XPOS = YDEL.XPOS
AND SOL.YPOS = YDEL.YPOS
AND LEN(SOL.VAL) > 1;
/*For each block, check if any number occur only in a specific column, then the number will be
in that block, remove it from all other blocks in that column*/
WITH XBLK AS
(
SELECT ((XPOS-1)/3)*3 + (YPOS-1)/3 + 1 AS BPOS, XPOS,
SUBSTRING(VAL,NUM,1) AS VAL
FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY
((XPOS-1)/3)*3 + (YPOS-1)/3,XPOS,SUBSTRING(VAL,NUM,1)
),
XBLK_DEL AS
(
SELECT A.XPOS, C.NUM AS YPOS ,A.VAL
FROM XBLK A LEFT OUTER JOIN XBLK B
ON A.BPOS = B.BPOS
AND A.XPOS <> B.XPOS
AND A.VAL= B.VAL
INNER JOIN NUMBERS C ON 1 = 1
WHERE B.XPOS IS NULL AND
A.BPOS = 8
AND (C.NUM-1)/3 + 1 <> A.BPOS - ((A.XPOS-1)/3)*3
AND A.VAL = @Counter
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,XDEL.VAL,'')
FROM SOLUTION_BOARD SOL, XBLK_DEL XDEL
WHERE SOL.YPOS = XDEL.YPOS
AND SOL.XPOS = XDEL.XPOS
AND LEN(SOL.VAL) > 1;
SET @Counter = @Counter + 1
END
/* If the above updates led to a determining the value of any cell in the solution
board (Only one digit exists in the cell), then we update the sudoku_board */
UPDATE SUD
SET VAL = SOL.VAL
FROM SUDOKU_BOARD SUD, SOLUTION_BOARD SOL
WHERE SUD.XPOS = SOL.XPOS
AND SUD.YPOS = SOL.YPOS
AND LEN(SOL.VAL) = 1
AND SUD.VAL IS NULL
SET @RowCount = @@ROWCOUNT
/* We rerun SolveAlgorithms 1 and 2 to see if there are any more solves possible */
EXEC RunSolveAlgorithm1;
EXEC RunSolveAlgorithm2;
END
END
GO
</code></pre><br />
post-solve sudoku board - after implementing Algorithm 3 (Solved)<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/Sv5YHV94Y9I/AAAAAAAAATM/djsEp143oZE/s1600-h/step5+-+PostSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/Sv5YHV94Y9I/AAAAAAAAATM/djsEp143oZE/s320/step5+-+PostSolve.PNG" /></a><br />
</div><br />
With this implementation, we should be able to solve most medium and some hard puzzles. There are a lot more well known algorithms available to solve the puzzle logically, which should be equally interesting to implement. If you come up with an implementation other than the ones given here, please feel free to post the link or the actual implementation in the comments section. But, I am done with my implementations for now. The <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_3068.html">next post</a>, our last algorithm, will be a brute force algorithm, which will be a fall back in case our first 3 algorithms are not able to solve the puzzle.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-4819437514894498082009-11-14T01:37:00.006-05:002009-11-23T10:02:58.317-05:00Solving Sudoku using SQL Server 2005 - Step by Step - Part #4<strong>Implementation of RunSolveAlgorithm2:</strong><br />
<br />
We implemented <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_12.html">RunSolveAlgorithm1</a> in previous post of this <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005.html">series</a> . The next algorithm is the implementation of Solve Method A from <a href="http://www.sudokusolver.co.uk/solvemethods.html">sudoku solver</a>.<br />
<br />
In this algorithm, we check all the cells (having mutiple values) in each row and see if a particular value occurs only once in that row. Then update that as the solution for the cell having that value. We do the similar check for column and the 3X3 block. <br />
<br />
<a name='more'></a>This can solve the easy to medium puzzles. Here goes the implementation.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>ALTER PROC RunSolveAlgorithm2
AS
SET NOCOUNT ON
BEGIN
DECLARE @RowCount int,
@UpdateRowCount int
SET @RowCount = 1
SET @UpdateRowCount = 0
WHILE(@RowCount > 0 AND dbo.VerifySolve() = 0)
BEGIN
SET @RowCount = 0;
/* Take all the cells, having mutiple values, in each row and see if a particular value occurs only
once in that row. Then update that as the solution for the cell */
WITH XSOL AS
(
SELECT XPOS,SUBSTRING(VAL,NUM,1) AS VAL FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY XPOS,SUBSTRING(VAL,NUM,1) HAVING COUNT(*) = 1
)
UPDATE SOL
SET VAL = XSOL.VAL
FROM SOLUTION_BOARD SOL, XSOL
WHERE
SOL.XPOS = XSOL.XPOS
AND LEN(SOL.VAL) > 1
AND CHARINDEX(XSOL.VAL,SOL.VAL) > 0;
SET @UpdateRowCount = @@ROWCOUNT;
SET @RowCount = @RowCount + @UpdateRowCount;
IF(@UpdateRowCount > 0) /* Need to rerun algorithm 1 for clean up if any cell was updated */
EXEC RunSolveAlgorithm1;
/* Take all the cells, having mutiple values, in each column and see if a particular value occurs only
once in that column. Then update that as the solution for the cell */
WITH YSOL AS
(
SELECT YPOS,SUBSTRING(VAL,NUM,1) AS VAL FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY YPOS,SUBSTRING(VAL,NUM,1) HAVING COUNT(*) = 1
)
UPDATE SOL
SET VAL = YSOL.VAL
FROM SOLUTION_BOARD SOL, YSOL
WHERE
SOL.YPOS = YSOL.YPOS
AND LEN(SOL.VAL) > 1
AND CHARINDEX(YSOL.VAL,SOL.VAL) > 0;
SET @UpdateRowCount = @@ROWCOUNT;
SET @RowCount = @RowCount + @UpdateRowCount;
IF(@UpdateRowCount > 0) /* Need to rerun algorithm 1 for clean up if any cell was updated */
EXEC RunSolveAlgorithm1;
/* Take all the cells, having mutiple values, in each 3X3 block and see if a particular value occurs only
once in that block. Then update that as the solution for the cell */
WITH BSOL AS
(
SELECT ((YPOS-1)/3)*3 + (XPOS-1)/3 AS BPOS,SUBSTRING(VAL,NUM,1) AS VAL FROM SOLUTION_BOARD A, NUMBERS B
WHERE B.NUM <=LEN(A.VAL)
AND LEN(VAL) > 1
GROUP BY ((YPOS-1)/3)*3 + (XPOS-1)/3,SUBSTRING(VAL,NUM,1) HAVING COUNT(*) = 1
)
UPDATE SOL
SET VAL = BSOL.VAL
FROM SOLUTION_BOARD SOL, BSOL
WHERE
((SOL.YPOS-1)/3)*3 + (SOL.XPOS-1)/3 = BSOL.BPOS
AND LEN(SOL.VAL) > 1
AND CHARINDEX(BSOL.VAL,SOL.VAL) > 0;
SET @UpdateRowCount = @@ROWCOUNT;
SET @RowCount = @RowCount + @UpdateRowCount;
IF(@UpdateRowCount > 0) /* Need to rerun algorithm 1 for clean up if any cell was updated */
EXEC RunSolveAlgorithm1;
END
END
GO
</code></pre><br />
When I call the proc SolveSudoku now, you can see that the problem is solved and when solved, the solution board and sudoku board are in sync.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>EXEC SolveSudoku
'790,000,300,,000,006,900,,800,030,076,,000,005,002,,005,418,700,,400,700,000,,610,090,008,,002,300,000,,009,000,054'
</code></pre><br />
post-solve sudoku board - before implementing Algorithm 2<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/Sv5Om-9GUtI/AAAAAAAAASU/5zBgTTbEHW4/s1600-h/step3+-+PostSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/Sv5Om-9GUtI/AAAAAAAAASU/5zBgTTbEHW4/s320/step3+-+PostSolve.PNG" /></a><br />
</div><br />
post-solve sudoku board - after implementing Algorithm 2 (Solved)<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/_sD0X6TOq9Vw/Sv5OpSxmyZI/AAAAAAAAASk/2efZ_KlTFiw/s1600-h/step4+-+PostSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://4.bp.blogspot.com/_sD0X6TOq9Vw/Sv5OpSxmyZI/AAAAAAAAASk/2efZ_KlTFiw/s320/step4+-+PostSolve.PNG" /></a><br />
</div><br />
post-solve solution board - before implementing Algorithm 2<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/Sv5OoNRwU0I/AAAAAAAAASc/JLcWSC818Ts/s1600-h/step3+-+PostSolveSoln.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/Sv5OoNRwU0I/AAAAAAAAASc/JLcWSC818Ts/s400/step3+-+PostSolveSoln.PNG" /></a><br />
</div><br />
post-solve solution board - after implementing Algorithm 2 (Same as the sudoku board)<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/Sv5OqTmHWJI/AAAAAAAAASs/LiXq_T-47ug/s1600-h/step4+-+PostSolveSoln.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/Sv5OqTmHWJI/AAAAAAAAASs/LiXq_T-47ug/s320/step4+-+PostSolveSoln.PNG" /></a><br />
</div><br />
For the <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_7764.html">next algorithm</a> will take up a harder puzzle and see how well we fare.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-58642252400743734952009-11-12T15:14:00.007-05:002009-11-23T10:06:00.570-05:00Solving Sudoku using SQL Server 2005 - Step by Step - Part #3<strong>Implementation of RunSolveAlgorithm1:</strong><br />
<br />
In the previous <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_10.html">post</a> of this <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005.html">series</a>, we created the procedure stub for each algorithm. This post will implement RunSolveAlgorithm1.<br />
<br />
The first algorithm will do the primary clean up on the solution board. It will implement the basic rules of Sudoku. The rule is that a number cannot appear in a cell if it already appears in any other cell in the same row or column or the block in the sudoku board. It will remove those numbers from the possible cadidate values of each cell in the solution board. <br />
<br />
<a name='more'></a>This cannot usually solve the puzzle completely unless the problem is extremely easy. But, this will make sure our other algorithms need not worry about figuring out the obvious.<br />
<br />
So, here goes the implementation. Once you understand the row level update, the column and block level update are pretty much the same. I have added comments in place, so it should be fairly easy to understand.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>ALTER PROC RunSolveAlgorithm1
AS
SET NOCOUNT ON
BEGIN
DECLARE @RowCount SMALLINT,
@Counter SMALLINT
SET @RowCount = 1
SET @Counter = 1
/* We will keep running this algorithm till there are no more
cells left to update in the sudoku board */
WHILE(@RowCount > 0 AND dbo.VerifySolve() = 0)
BEGIN
/* We need to use the counter logic to make sure we don't do
mutiple updates on the same value such that it over writes
each other */
SET @Counter = 0;
WHILE (@Counter <=9)
BEGIN
/*Remove the number from each cell in solution board if it exists in any of the cells
in the same column from the sudoku board */
WITH SUD_XPOS AS
(
SELECT XPOS,VAL,
ROW_NUMBER() OVER (PARTITION BY XPOS ORDER BY VAL ) AS COUNTER
FROM SUDOKU_BOARD
WHERE VAL IS NOT NULL
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,SUD.VAL,'')
FROM SOLUTION_BOARD SOL, SUD_XPOS SUD
WHERE SOL.XPOS = SUD.XPOS
AND LEN(SOL.VAL) > 1
AND COUNTER = @Counter;
/*Remove the number from each cell in solution board if it exists in any of the cells
in the same row from the sudoku board */
WITH SUD_YPOS AS
(
SELECT YPOS,VAL,
ROW_NUMBER() OVER (PARTITION BY YPOS ORDER BY VAL ) AS COUNTER
FROM SUDOKU_BOARD
WHERE VAL IS NOT NULL
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,SUD.VAL,'')
FROM SOLUTION_BOARD SOL, SUD_YPOS SUD
WHERE SOL.YPOS = SUD.YPOS
AND LEN(SOL.VAL) > 1
AND COUNTER = @Counter;
/*Remove the number from each cell in solution board if it exists in any of the cells
in the same 3X3 block from the sudoku board */
WITH SUD_BLOCK AS
(
SELECT ((YPOS-1)/3)*3 + (XPOS-1)/3 AS BPOS,VAL,
ROW_NUMBER() OVER (PARTITION BY ((YPOS-1)/3)*3 + (XPOS-1)/3 ORDER BY VAL ) AS COUNTER
FROM SUDOKU_BOARD
WHERE VAL IS NOT NULL
)
UPDATE SOL SET VAL = REPLACE(SOL.VAL,SUD.VAL,'')
FROM SOLUTION_BOARD SOL, SUD_BLOCK SUD
WHERE ((SOL.YPOS-1)/3)*3 + (SOL.XPOS-1)/3 = SUD.BPOS
AND LEN(SOL.VAL) > 1
AND COUNTER = @Counter;
SET @Counter = @Counter + 1
END /* WHILE (@Counter <=9) */
/*If the above updates led to a determining the value of any cell in the solution
board (Only one digit exists in the cell), then we update the sudoku_board */
UPDATE SUD
SET VAL = SOL.VAL
FROM SUDOKU_BOARD SUD, SOLUTION_BOARD SOL
WHERE SUD.XPOS = SOL.XPOS
AND SUD.YPOS = SOL.YPOS
AND LEN(SOL.VAL) = 1
AND SUD.VAL IS NULL
SET @RowCount = @@ROWCOUNT
END /* WHILE(@RowCount > 0 AND dbo.VerifySolve() = 0) */
END
</code></pre><br />
When I call the proc SolveSudoku, you can see that though the problem is not solved, it does fill a few more cells in the sudoku board and the solution board has a lot less choices in the unsolved cells.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>EXEC SolveSudoku
'790,000,300,,000,006,900,,800,030,076,,000,005,002,,005,418,700,,400,700,000,,610,090,008,,002,300,000,,009,000,054'
</code></pre><br />
post-solve sudoku board - before implementing Algorithm 1<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvxpK1yjLcI/AAAAAAAAAR0/gouRu0akKdM/s1600-h/step2+-+PreSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvxpK1yjLcI/AAAAAAAAAR0/gouRu0akKdM/s320/step2+-+PreSolve.PNG" /></a><br />
</div><br />
post-solve sudoku board - after implementing Agorithm 1 <br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SvxrVjJhQNI/AAAAAAAAASE/bN8-OpxHfVQ/s1600-h/step3+-+PostSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SvxrVjJhQNI/AAAAAAAAASE/bN8-OpxHfVQ/s320/step3+-+PostSolve.PNG" /></a><br />
</div><br />
post-solve solution board - before implementing Algorithm 1<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvxrRDAaOVI/AAAAAAAAAR8/nxhDm_NpBnM/s1600-h/step2+-+PostSolveSoln.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvxrRDAaOVI/AAAAAAAAAR8/nxhDm_NpBnM/s640/step2+-+PostSolveSoln.PNG" /></a><br />
</div><br />
post-solve solution board - after implementing Agorithm 1<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvxrY2UDOJI/AAAAAAAAASM/BjFrXrJi4MA/s1600-h/step3+-+PostSolveSoln.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvxrY2UDOJI/AAAAAAAAASM/BjFrXrJi4MA/s400/step3+-+PostSolveSoln.PNG" /></a><br />
</div><br />
That ends this post. Lets see if the <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_14.html">next algorithm</a> solves this puzzle.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-18354419306807989152009-11-10T13:05:00.008-05:002009-11-23T10:05:02.249-05:00Solving Sudoku using SQL Server 2005 - Step by Step - Part #2In the <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005.html">first part</a> of this series, we created the base objects needed to work on our solution. Now, there are two parts to building the solution:<br />
<ul><li>The core algorithms that will solve the puzzle for us</li>
<li>The surrounding objects that will facilitate the solve and execute the core algorithms</li>
</ul>The core algorithms are the crux of this whole exercise. Each of these algorithm will ideally take the unsolved sudoku board and try to fill as many cells as possible using the logic implemented in the specific alogorithm.<br />
<br />
<a name='more'></a>We will be creating the procedure stubs for the core algorithms (right now I assumed we will have 4 of them) which will be implemented later. All my future posts in these series will be about implementing each of the algorithm stubs given below.<br />
<div><br />
</div><pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* YET TO IMPLEMENT */
CREATE PROC RunSolveAlgorithm1
AS
BEGIN
PRINT 'SOLVE ALGORITHM 1 NOT IMPLEMENTED'
RETURN 0
END
GO
/* YET TO IMPLEMENT */
CREATE PROC RunSolveAlgorithm2
AS
BEGIN
PRINT 'SOLVE ALGORITHM 2 NOT IMPLEMENTED'
RETURN 0
END
GO
/* YET TO IMPLEMENT */
CREATE PROC RunSolveAlgorithm3
AS
BEGIN
PRINT 'SOLVE ALGORITHM 3 NOT IMPLEMENTED'
RETURN 0
END
GO
/* YET TO IMPLEMENT */
CREATE PROC RunSolveAlgorithm4
AS
BEGIN
PRINT 'SOLVE ALGORITHM 4 NOT IMPLEMENTED'
RETURN 0
END
GO
</code></pre><br />
Here is the main procedure that will be called to solve the sudoku puzzle. It prints the sudoku board immediately after loading the puzzle (pre-solve) and again after running all the algorithms (post-solve).<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* This proc is the main proc that will accept the problem as an input and do the following:
Verify that the input is valid
Load the problem into the sudoku and solution board
Call the algorithms to solve the problem
Sample Inputs:
EXEC SolveSudoku
'030,001,000,,006,000,050,,500,000,983,,080,006,302,,000,050,000,,903,800,060,,714,000,009,,020,000,800,,000,400,030'
EXEC SolveSudoku
'790,000,300,,000,006,900,,800,030,076,,000,005,002,,005,418,700,,400,700,000,,610,090,008,,002,300,000,,009,000,054'
*/
CREATE PROC SolveSudoku
(@in_szProblem varchar(120))
AS
SET NOCOUNT ON
BEGIN
declare @szProblem varchar(81)
set @szProblem = replace(@in_szProblem,',','')
/* CHECK IF THE DATA IS VALID */
IF(PATINDEX('%[^0-9]%', @szProblem) > 0)
BEGIN
UPDATE SUDOKU_BOARD SET VAL = NULL;
PRINT 'BAD DATA'
RETURN 0
END
/* PROCEDURE TO LOAD THE INPUT DATA INTO SUDOKU AND THE SOLUTION BOARD */
EXEC LoadInputData @szProblem;
/* RUN SOLVE ALGORITHM 1 */
EXEC RunSolveAlgorithm1
IF(dbo.VerifySolve() = 1) /*SOLVED*/
BEGIN
EXEC PrintBoard
PRINT 'SOLVED'
RETURN 0
END
/* RUN SOLVE ALGORITHM 2 */
EXEC RunSolveAlgorithm2
IF(dbo.VerifySolve() = 1) /*SOLVED*/
BEGIN
EXEC PrintBoard
PRINT 'SOLVED'
RETURN 0
END
/* RUN SOLVE ALGORITHM 3 */
EXEC RunSolveAlgorithm3
IF(dbo.VerifySolve() = 1) /*SOLVED*/
BEGIN
EXEC PrintBoard
PRINT 'SOLVED'
RETURN 0
END
/* RUN SOLVE ALGORITHM 4 */
EXEC RunSolveAlgorithm4
IF(dbo.VerifySolve() = 1) /*SOLVED*/
BEGIN
EXEC PrintBoard
PRINT 'SOLVED'
RETURN 0
END
EXEC PrintBoard /*UNSOLVED*/
PRINT 'UNSOLVED'
RETURN 0
END
GO
</code></pre><br />
<div> The main procedure accepts the input problem as a string. It verifies to make sure the input string contains only numeric data and loads the data into the solution and sudoku board using the following procedure:<br />
</div><div><br />
</div><pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* This procedure will place the problem in the sudoku board and
the starting solution used for processing in the solution board.
Input data is a contiguous string filling cells from left to right, top to bottom.
Blank cells will have 0
*/
CREATE PROC LoadInputData
(@in_szData varchar(81))
AS
SET NOCOUNT ON
BEGIN
UPDATE SUDOKU_BOARD
SET VAL = CASE WHEN substring(@in_szData,(YPOS-1)*9 + XPOS,1) = '0'
THEN NULL
ELSE substring(@in_szData,(YPOS-1)*9 + XPOS,1) END
UPDATE SOLUTION_BOARD
SET VAL = CASE WHEN substring(@in_szData,(YPOS-1)*9 + XPOS,1) = '0'
THEN '123456789'
ELSE substring(@in_szData,(YPOS-1)*9 + XPOS,1) END
EXEC PrintBoard
END
GO
</code></pre><br />
You will see that the main procedure calls a function VerifySolve() after running each algorithm. This will check for the following in the sudoku board:<br />
<ul><li>the sum of value of cells across each row (summing up values grouping by YPOS) = 45</li>
<li>the sum of value of cells across each column (summing up values grouping by XPOS) = 45</li>
<li>the sum of value of cells across each 3X3 block(summing up values grouping by (YPOS-1)/3)*3 + (XPOS-1)/3) = 45</li>
<li>There are nine unique values filling the board (An over kill to make sure any scenario that escapes the above 3 checks is caught here). For example, filling the board with all cells as 5 will pass the first 3 checks</li>
</ul>Here is the implementation of the verification function: <br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* This function can be called anytime to verify if the sudoku board has a complete solution */
CREATE FUNCTION dbo.VerifySolve()
RETURNS BIT
AS
BEGIN RETURN(
SELECT CASE WHEN COUNT(DISTINCT SUM_VAL) = 1
AND MAX(SUM_VAL)=45
AND (SELECT COUNT(DISTINCT VAL) FROM SUDOKU_BOARD) = 9
THEN 1 ELSE 0 END
FROM
(
SELECT SUM(VAL) AS SUM_VAL FROM SUDOKU_BOARD GROUP BY ((YPOS-1)/3)*3 + (XPOS-1)/3
UNION ALL
SELECT SUM(VAL) FROM SUDOKU_BOARD GROUP BY YPOS
UNION ALL
SELECT SUM(VAL) FROM SUDOKU_BOARD GROUP BY XPOS
) AS GROUP_TOTAL)
END
GO
</code></pre><br />
Now, we have the basic solution in place. As you can see, we are no closer to solving the puzzle than we were when we started this series. We are just making sure that when we implement an algorithm, we will concentrate on just the core algorithm and nothing else. Since none of the algorithms are implemented, when we call the procedure SolveSudoku, it will just print the sudoku board without solving it. As we start implementing each algorithm, SolveSudoku will start filling more and more cells in the board. Right now, the pre-solve (before running the algorithms) and post-solve (after running the algorithms) sudoku board is the same. Here is an example:<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>EXEC SolveSudoku
'790,000,300,,000,006,900,,800,030,076,,000,005,002,,005,418,700,,400,700,000,,610,090,008,,002,300,000,,009,000,054'
</code></pre><br />
pre-solve sudoku board:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://3.bp.blogspot.com/_sD0X6TOq9Vw/Svwxcc-DVNI/AAAAAAAAARc/6b5by6bjPkU/s1600-h/step2+-+PreSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://3.bp.blogspot.com/_sD0X6TOq9Vw/Svwxcc-DVNI/AAAAAAAAARc/6b5by6bjPkU/s320/step2+-+PreSolve.PNG" /></a><br />
</div>post-solve sudoku board:<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvwxdywbQ3I/AAAAAAAAARk/xfi_lB7Y7I8/s1600-h/step2+-+PostSolve.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvwxdywbQ3I/AAAAAAAAARk/xfi_lB7Y7I8/s320/step2+-+PostSolve.PNG" /></a><br />
</div><br />
post solve - solution board (EXEC PrintSolution)<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvwzRog8UmI/AAAAAAAAARs/NAeLasQXtHc/s1600-h/step2+-+PostSolveSoln.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" sr="true" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SvwzRog8UmI/AAAAAAAAARs/NAeLasQXtHc/s640/step2+-+PostSolveSoln.PNG" /></a><br />
</div><br />
All that is left to do now is implementing the core algorithms. I will be giving the the pre-solve and post solve after implementing each core algorithms. The algorithms will start knocking off the invalid numbers from each cell in the solution board. We have to keep building better algorithms till each cell in the solution board is left with only one number. We have then solved the problem. Long way to go, huh?<br />
<br />
We will start with building the first algorithm in the <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_12.html">next post</a>.Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-70386161540685693542009-11-09T18:31:00.008-05:002009-11-23T10:11:01.139-05:00Solving Sudoku using SQL Server 2005 - Step by Step - Part #1This one is going to be a series. I thought I was going to come up with a single query to solve Sudoku (without choosing the brute force method). When I started creating the tables I needed, I figured out there there are way too many aspects to solving Sudoku logically. I thought it would be a good idea to give a continous update as I go about building the solution. <br />
<br />
This post is all about creating the necessary tables and basic procedures which I think we need to build the solution. <br />
<br />
<a name='more'></a>First, I am creating a database called Sudoku which will hold all our objects and the numbers table which I hope we will use frequently.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>CREATE DATABASE SUDOKU
GO
USE SUDOKU
GO
/* NUMBERS TABLE USED TO LOAD DEFAULT TABLE DATA*/
CREATE TABLE NUMBERS
(
NUM INT NOT NULL
)
GO
/* FILL THE NUMBER TABLE */
DECLARE @COUNT INT
SET @COUNT = 0
WHILE (@COUNT <9)
BEGIN
SET @COUNT = @COUNT + 1
INSERT INTO NUMBERS SELECT @COUNT
END
GO
</code></pre><br />
We need 2 main tables to start with. One is the SUDOKU_BOARD which will hold the original sudoku problem and will be updated with the confirmed correct value of each cell that we identify in our solve. <br />
The second table is SOLUTION_BOARD, which is like a work table we will play around with to solve the problem. Solution board will hold all possible candidate values for each cell. Both the tables will be prefilled with 81 rows, one row for each cell in the sudoku board. Each row will hold the xy position of the cell and the value of the cell.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* THE PROBLEM TO BE SOLVED WILL BE PLACED IN THIS BOARD */
CREATE TABLE SUDOKU_BOARD
(
XPOS SMALLINT NOT NULL CHECK (XPOS BETWEEN 1 AND 9),
YPOS SMALLINT NOT NULL CHECK (YPOS BETWEEN 1 AND 9),
VAL SMALLINT NULL CHECK(VAL BETWEEN 1 AND 9)
)
GO
/* BUILD THE SUDOKU BOARD WITH ALL POSITIONS. THIS IS A ONE TIME OPERATION*/
INSERT INTO SUDOKU_BOARD(XPOS,YPOS)
SELECT X.NUM,Y.NUM FROM NUMBERS X, NUMBERS Y
GO
/* THE SOLUTION WILL BE DERIVED IN THIS BOARD FOR THE GIVEN PROBLEM*/
CREATE TABLE SOLUTION_BOARD
(
XPOS SMALLINT NOT NULL CHECK (XPOS BETWEEN 1 AND 9),
YPOS SMALLINT NOT NULL CHECK (YPOS BETWEEN 1 AND 9),
VAL VARCHAR(9) DEFAULT '123456789'
)
GO
/* BUILD THE SOLUTION BOARD WITH ALL POSITIONS AND DEFAULT VALUES*/
INSERT INTO SOLUTION_BOARD(XPOS,YPOS)
SELECT X.NUM,Y.NUM FROM NUMBERS X, NUMBERS Y
GO
</code></pre><br />
Though, the two tables are effective for our solve, it is not easily to comprehend the data. So, I am creating 2 procedures that will display the board in the way we are used to seeing it - one for viewing the sudoku board and the other for viewing the solution board. We can call these procedures whenever we might need to view either of the tables, during our solves.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* CALL THIS PROCEDURE TO VIEW THE SUDOKU BOARD AT ANY TIME*/
CREATE PROC PrintBoard
AS
SET NOCOUNT ON
BEGIN
SELECT TOP 100 PERCENT 'Y' + CAST(YPOS AS CHAR) AS 'Y/X',
ISNULL(CAST([1] AS VARCHAR(1)),'') AS [X1],
ISNULL(CAST([2] AS VARCHAR(1)),'') AS [X2],
ISNULL(CAST([3] AS VARCHAR(1)),'') AS [X3],
ISNULL(CAST([4] AS VARCHAR(1)),'') AS [X4],
ISNULL(CAST([5] AS VARCHAR(1)),'') AS [X5],
ISNULL(CAST([6] AS VARCHAR(1)),'') AS [X6],
ISNULL(CAST([7] AS VARCHAR(1)),'') AS [X7],
ISNULL(CAST([8] AS VARCHAR(1)),'') AS [X8],
ISNULL(CAST([9] AS VARCHAR(1)),'') AS [X9]
FROM SUDOKU_BOARD
PIVOT (SUM(VAL) FOR XPOS IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) AS SB
ORDER BY YPOS
END
GO
/* CALL THIS PROCEDURE TO VIEW THE SOLUTION BOARD AT ANY TIME*/
CREATE PROC PrintSolution
AS
SET NOCOUNT ON
BEGIN
SELECT TOP 100 PERCENT 'Y' + CAST(YPOS AS CHAR) AS 'Y/X',
[1] AS [X1],
[2] AS [X2],
[3] AS [X3],
[4] AS [X4],
[5] AS [X5],
[6] AS [X6],
[7] AS [X7],
[8] AS [X8],
[9] AS [X9]
FROM SOLUTION_BOARD
PIVOT (MIN(VAL) FOR XPOS IN ([1],[2],[3],[4],[5],[6],[7],[8],[9])) AS SB
ORDER BY YPOS
END
GO
</code></pre><br />
I guess, this should setup the environment that we need to start building our solution. Please feel free to use the scripts above if you want to come up with your own solution. <br />
<br />
If you plan to write a solution, I suggest you read through <a href="http://www.sudokusolver.co.uk/solvemethods.html">this link</a>. They have got a good read on solving sudoku by logic and have a javascript implementation of the same. The solution I plan to write, if not a direct implementation of their logic, will atleast be based on theirs. And I wish to give them the due credit.<br />
<br />
And if you do come up with a solution, please post it in the comments section. I would love to see it.<br />
<br />
The next part of this series is available <a href="http://omnibuzz-sql.blogspot.com/2009/11/solving-sudoku-using-sql-server-2005_10.html">here</a>.Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-29286778.post-1149685638801779732009-11-09T03:56:00.004-05:002009-11-23T10:12:21.833-05:00Object Oriented SQL Programming with SQL Server 2005In this post, I have attempted a crude implementation of Object Oriented SQL Programming using the APPLY operator in SQL Server 2005. I feel that giving some thought in these lines, we can bring in more flexibility, abstraction and reusability to the way we query the database and, may be, create a new style for data access. <br />
<br />
There is only one rule that I am going to follow here. All table access will be done using an INLINE TABLE VALUED function. No query will directly access the table. I would like to illustrate it with a small example. <br />
<br />
<a name='more'></a>I am going to use the AdventureWorks database and write queries to fetch sales order information. To follow the rules defined, I will be creating two functions to get the order header and detail information respectively.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>CREATE FUNCTION GetOrderHeader(@OrderID int)
RETURNS TABLE
AS
RETURN(
SELECT * FROM Sales.SalesOrderHeader
WHERE SalesOrderID = coalesce(@OrderID,SalesOrderID)
)
CREATE FUNCTION GetOrderDetail(@OrderID int,@DetailID int)
RETURNS TABLE
AS
RETURN(
SELECT * FROM Sales.SalesOrderDetail
WHERE SalesOrderID = coalesce(@OrderID,SalesOrderID)
AND SalesOrderDetailID = coalesce(@DetailID,SalesOrderDetailID)
)
</code></pre><br />
With the above 2 functions in place, here are some of the queries I can perform to fetch Order Information<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>--Fetch all order header information
select * from GetOrderHeader(NULL) A
--Fetch order header information for order ID 43659
select * from GetOrderHeader(43659) A
--Fetch all order header/detail information
select * from GetOrderHeader(NULL) A
CROSS APPLY GetOrderDetail(A.SalesOrderID,NULL) B
--Fetch order header/detail information for Order ID 43659
select * from GetOrderHeader(43659) A
CROSS APPLY GetOrderDetail(A.SalesOrderID,NULL) B
--Fetch information for Order ID 43659 and Detail ID 1
select * from GetOrderHeader(43659) A
CROSS APPLY GetOrderDetail(A.SalesOrderID,1) B
</code></pre><br />
Adding another function to fetch the product information<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>CREATE FUNCTION GetProductInfo(@ProductID int,@Name nvarchar(50))
RETURNS TABLE
AS
RETURN(
SELECT * FROM Production.Product
WHERE ProductID = coalesce(@ProductID,ProductID)
AND Name like '%' + isnull(@Name,'') + '%'
)
</code></pre><br />
I can now search for orders which has a particular product using this query<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>select * from GetOrderHeader(43659) A
CROSS APPLY GetOrderDetail(A.SalesOrderID,NULL) B
CROSS APPLY GetProductInfo(B.ProductID,'Mountain') C
</code></pre><br />
According to me, the above query is easier to read and much more maintainable than the one that we will usually write:<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>SELECT * FROM Sales.SalesOrderHeader A
INNER JOIN Sales.SalesOrderDetail B
ON A.SalesOrderID = B.SalesOrderID
INNER JOIN Production.Product
ON B.ProductID = C.ProductID
WHERE SalesOrderID = 43659
AND Name like '%Mountain%'
</code></pre><br />
This type of querying through functions has its own advantages. <br />
<ul><li>For instance, there might be different types of data access that can happen in a particular table(filter, check for existence, etc). Each of this requirement can be a implemented as a seperate function and functions will be written around a particular object, like <strong>Orders</strong>.</li>
<li>Sometimes, a table may be normalized for better data access. The function can join the normalized tables, code tables, if any. All these can be abstracted from the query </li>
<li>Query will be focus on business implementation while the function will focus on data access.</li>
</ul>Having said all these, SQL Server engine performs joins better than apply. <strong>So, none of the above suggestions can actually be implemented in a production system.</strong> This post is just an attempt to show that there is a scope for changing the way data access is done and if it is a better option or not is something that I leave as an open question. So, what do you think?Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-16818482594474153632009-11-06T01:58:00.003-05:002009-11-23T09:54:43.322-05:00A Scenario to Ponder #15I guess, by now, most of you are familiar with the SQL Server 2005 sample database - <strong>AdventureWorks</strong>. We will use one of the tables from this database to create our scenario - <strong>Production.BillOfMaterials</strong>. To simplify our requirements, we will be selecting only those records in this table, that has the EndDate as NULL.<br />
<br />
<a name='more'></a>Before I go ahead and explain the scenario, lets see how this table is built. Its a hierarchical table. It shows relationship of each component (Column: ComponentID) with its parent component (Column: ProductAssemblyID). There can be mutiple levels of hierarchy. <br />
<br />
For example, take ComponentID 749. It is the ultimate parent, since the ProductAssemblyID for this component is NULL. Finding the components which has the parent (ProductAssemblyID) as 749, we see that we need 14 other components to build 749. One such child component is 519 which inturn needs 4 other components for it to be built and so on.<br />
<br />
<strong>Now, here is the scenario:</strong><br />
Our company uses this table Production.BillOfMaterials to identify the bill of materials for any component. Lets say that the company has decided to discontinue production/use of a particular Component A. <br />
<br />
To optimize the inventory, we need to come up with a query using this table that will generate the list of components that can also be discontinued along with this component A based on the following rules:<br />
<ul><li>All parent components (at any level) which uses component A can be discontinued.</li>
<li>All child components (at any level) which is used to build component A but is not used to build any other component can be discontinued.</li>
<li>Child component (at any level) which is used to build a parent (dentified in rule 1) but is not used to build any other component can also be discontinued.</li>
</ul>Can you come up with a query to achieve this? The query will accept one input - the discontinued ComponentID.<br />
<br />
Please post your answer in the comments section.Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-29286778.post-4716273244483114662009-10-28T15:14:00.003-04:002009-11-23T10:14:43.155-05:00Understanding Transaction Isolation Levels in SQL ServerOne of the most important concepts pertaining to any DBMS that every database programmer must know is the <strong>Transaction Isolation Levels</strong>. Sometimes, even the most seasoned database developers get confused on how multiple connections running in different isolation levels affect each other. This post is an attempt to explain through examples on how the isolation levels differ from each other. <br />
<br />
<a name='more'></a>The following table (taken from SQL Server Books Online) shows the different isolation levels (SQL-99 Standard) and its effect on concurrency. <br />
<a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SukIctrcMtI/AAAAAAAAAQE/xV_ufr1UYz0/s1600-h/Isolation+Levels.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5397854917771408082" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SukIctrcMtI/AAAAAAAAAQE/xV_ufr1UYz0/s400/Isolation+Levels.PNG" style="display: block; height: 107px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a> Read uncommitted is the lowest isolation level and allows highest concurrency with least locking. Serializable is highest isolation level with lowest concurrency and holds exclusive range lock on the data in transaction.<br />
Rather than looking at each Isolation Level and figuring out what it does, I thought it will make sense to understand how each concurrency problem is addressed as isolation level goes higher:<br />
<ul><li><strong>DIRTY READ</strong>: Ability to read uncommitted changes made by another transaction. </li>
<li><strong>NON-REPEATABLE READ</strong>: Read locks are released immediately after the data is read. If the first transaction, that allows non-repeatable read, selects a row from a table, a second transaction can update (and commit) the same row even before the first transaction is complete. It means that if the first transaction selects the data again after the second transaction has completed, it will read the updated data. </li>
<li><strong>PHANTOM</strong>: Read locks are held till the end of the transaction but no range locks are acquired. If the first transaction selects all the records in a table, a second transaction can insert (and commit) into the table. If the first transaction, reads through the table again, it will see the inserted (phantom) record. </li>
</ul>The table below is a different view of the isolation levels. We will be using this table as a reference for our testing. We will take each scenario and compare the highest isolation level that allows against the lowest isolation level that denies.<br />
<a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SukKC0FeOEI/AAAAAAAAAQM/HFvQJdKZ-Jo/s1600-h/Isolation+Level+2.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5397856671837861954" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SukKC0FeOEI/AAAAAAAAAQM/HFvQJdKZ-Jo/s640/Isolation+Level+2.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a> Let's first set the environment for testing. <br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>/* TEST_TABLE IS THE TABLE THAT WILL PARTICIPATE IN THE TRANSACTION */
CREATE TABLE TEST_TABLE
(
ID INT PRIMARY KEY,
NAME VARCHAR(10)
)
/* SOME TEST DATA FOR OUR TEST_TABLE */
INSERT INTO TEST_TABLE
SELECT 1, 'SMITH'
UNION ALL
SELECT 3, 'ADAMS'
UNION ALL
SELECT 5, 'SANDERS'
/* THIS TABLE WILL HOLD THE LOG OF WHAT IS HAPPENING. WILL BE EXPLAINED LATER */
CREATE TABLE TRANSACTION_STATUS
(
EXEC_ORDER INT IDENTITY(1,1),
PROCESS_DESCRIPTION VARCHAR(100),
ABSOLUTE_TIME DATETIME
)
/* A BETTER VIEW OF THE LOG TABLE FOR OUR ANALYSIS */
CREATE VIEW VW_TRANSACTION_STATUS_LOG
AS
SELECT TOP 100 PERCENT
TS1.PROCESS_DESCRIPTION,
DATEDIFF(SS,TS2.ABSOLUTE_TIME,TS1.ABSOLUTE_TIME) AS RELATIVE_TIME
FROM
TRANSACTION_STATUS TS1,
TRANSACTION_STATUS TS2
WHERE
TS2.EXEC_ORDER = 1
ORDER BY
TS1.EXEC_ORDER
</code></pre><br />
Now, we create a transaction (Connection 1) that will select from the TEST_TABLE twice, with a delay of 10 seconds between them. We will also be logging each step happening in the transaction into our TRANSACTION_STATUS table. Given below is a sample screen shot of connection 1. Taking the logging part out, only on the script within the selected area is pertinent to our testing.<br />
<br />
<a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SuoTSYys42I/AAAAAAAAAQU/abxmh7Gn1pg/s1600-h/3Connection1.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398148309970641762" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SuoTSYys42I/AAAAAAAAAQU/abxmh7Gn1pg/s640/3Connection1.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
We create another transaction (connection 2), that will start 5 seconds after the first transaction (connection 1). The transaction will update a record in the test table and wait for 10 seconds and rollback the transaction.<br />
<br />
<a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SuoTShkfWUI/AAAAAAAAAQc/h8iJZtsb7Io/s1600-h/4Connection2.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398148312326953282" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SuoTShkfWUI/AAAAAAAAAQc/h8iJZtsb7Io/s640/4Connection2.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
Now, if we start both the transactions simltaneously and look at the log, we will know the sequence of execution of commands across the two transactions.<br />
<strong>Note: </strong>TEST_TABLE will be refreshed to have the three rows as given below, before starting each test scenario.<br />
<a href="http://3.bp.blogspot.com/_sD0X6TOq9Vw/SuoX-_rbFgI/AAAAAAAAARU/NjMhVs5y1Ro/s1600-h/Table.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398153474369852930" src="http://3.bp.blogspot.com/_sD0X6TOq9Vw/SuoX-_rbFgI/AAAAAAAAARU/NjMhVs5y1Ro/s400/Table.PNG" style="cursor: hand; display: block; height: 119px; margin: 0px auto 10px; text-align: center; width: 284px;" /></a>With, the setup done, lets start with the first scenario:<br />
<strong>Scenario 1 - Dirty Read:</strong> Comparing READ UNCOMMITTED and READ COMMITTED isolation level.<br />
READ UNCOMMITTED: The below screen shot shows 3 connections. I tried to keep the image pretty self-explanatory, but I will walk through the screen for the first one:<br />
<ul><li>Connection 1 runs at READ UNCOMMITTED isolation level. Connection 1 and 2 were started simultaneously. </li>
<li>You can see from the result set of Connection 1 that the second query of connection 1 reads through the uncommitted data of Connection 2. </li>
<li>Connection 3 shows the event log of the sequence of the commands between the first two connections. </li>
</ul><a href="http://3.bp.blogspot.com/_sD0X6TOq9Vw/SuoTS-1vzKI/AAAAAAAAAQk/E-vBsQPhN5Q/s1600-h/5Read+Uncommitted+1.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398148320183962786" src="http://3.bp.blogspot.com/_sD0X6TOq9Vw/SuoTS-1vzKI/AAAAAAAAAQk/E-vBsQPhN5Q/s640/5Read+Uncommitted+1.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a> <br />
READ COMMITTED: Here is what happens when the first connection runs in READ COMMITTED isolation level.<br />
<a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SuoXDkZagSI/AAAAAAAAAQs/I-tepYxrbWU/s1600-h/6Read+committed+1.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398152453434278178" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SuoXDkZagSI/AAAAAAAAAQs/I-tepYxrbWU/s640/6Read+committed+1.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a><strong></strong><br />
<strong>Scenario 2 - Non-repeatable Read: </strong>Comparing READ COMMITTED and REPEATABLE READ isolation level.<br />
READ COMMITTED: The second connection updates the table and commits. And connection 1 reads the updated data in the second select.<br />
<br />
<a href="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SuoXEKyxjZI/AAAAAAAAAQ0/HzwHOpEahLg/s1600-h/7Read+committed+2.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398152463741193618" src="http://2.bp.blogspot.com/_sD0X6TOq9Vw/SuoXEKyxjZI/AAAAAAAAAQ0/HzwHOpEahLg/s640/7Read+committed+2.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a> <br />
<br />
REPEATABLE READ: And the same in REPEATABLE READ isolation level.<br />
<br />
<a href="http://3.bp.blogspot.com/_sD0X6TOq9Vw/SuoXEYHdUeI/AAAAAAAAAQ8/YZN4uF1y8nk/s1600-h/8Repeatable+Read+1.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398152467317608930" src="http://3.bp.blogspot.com/_sD0X6TOq9Vw/SuoXEYHdUeI/AAAAAAAAAQ8/YZN4uF1y8nk/s640/8Repeatable+Read+1.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<strong>Phantom: </strong>Comparing REPEATABLE READ and SERIALIZABLE isolation level<br />
REPEATABLE READ: Connection 2 inserts a record and connection 1 reads the inserted record in the second select.<br />
<br />
<a href="http://4.bp.blogspot.com/_sD0X6TOq9Vw/SuoXEkBpTvI/AAAAAAAAARE/M1QHrtWkHKU/s1600-h/9Repeatable+Read+2.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398152470514454258" src="http://4.bp.blogspot.com/_sD0X6TOq9Vw/SuoXEkBpTvI/AAAAAAAAARE/M1QHrtWkHKU/s640/9Repeatable+Read+2.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<br />
SERIALIZABLE: Connection 2 had to wait for connection 1 to commit before it can insert.<br />
<a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SuoXE_epAzI/AAAAAAAAARM/mONr8zu7lyI/s1600-h/9Serializable+1.PNG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5398152477883826994" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SuoXE_epAzI/AAAAAAAAARM/mONr8zu7lyI/s640/9Serializable+1.PNG" style="display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
You can see that SERIALIZABLE isolation level gives the best data consistency within a transaction. However, any attempt to update the table is put on hold till the transaction is complete. Though, we would love to have all our transactions running in SERIALIZABLE isolation level, we have to go for the lowest isolation level (that will work for the transaction) to gain performance and allow concurrency.<br />
<br />
The default isolation level (READ COMMITTED) of SQL Server, should work in most scenarios as it makes sure you read only clean (committed) data. However, specific scenarios might require you to go for a higher level of isolation. I would try avoiding READ UNCOMMITTED isolation level (or using the NOLOCK table hint). Instead, as a best practice, try to keep the duration of transactions as short as possible. <br />
<br />
That ends this post. For further reading, there is a wealth of information on isolation levels on the web and BOL (of course). <br />
<br />
In the above post, I haven't considered READ COMMITTED SNAPSHOT and DATABASE SNAPSHOT introduced in SQL Server 2005. I hope I can write a post on that later. Till then, happy querying!!Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-19769173809044697412009-10-23T12:23:00.002-04:002009-11-23T10:23:24.819-05:00Custom Sorting and Paging in SQL Server Stored ProcedureThis post is about achieving pagination using stored procedure, without using dynamic SQL. The solution presented should work for both SQL Server 2000 and 2005.<br />
<br />
Most of the web application displaying transactional data will have a data grid with sorting and paging functionality and some filter criteria.<br />
<br />
We usually use the default sorting/paging functionality available in the datagrid. But, this requires the entire search resultset to be cached in the client side, which is fine for a few hundered records but can be a drag as the number of records increase.<br />
<br />
<a name='more'></a>I was pulled into a situation where there were millions of records in the table and the filter criteria was not good enough to bring down the number of records being sent to the client side.<br />
<br />
To fix this issue, we had to implement the sorting and paging in the database and send only the required data to be viewed in the page.<br />
<br />
A quite common solution is to use dynamic SQL. I however wanted to come up with a solution without using dynamic SQL. So here it goes.<br />
<br />
These are my requirements:<br />
1) In AdventureWorks database I need to write a stored procedure to select data from Sales.SalesOrderDetail<br />
2) The table should be joined with the Production.Product table to get the product name for the product code<br />
3) The SP should allow sorting (ASC or DESC) on the following columns<br />
<br />
<ul><li>ORDER_ID </li>
<li>TRACKING_NBR</li>
<li>PRODUCT_NAME </li>
<li>MODIFIED_DATE </li>
</ul>4) The default sorting will be based on ORDER_ID, DETAIL_ID<br />
5)The filtering can be done on columns available for sorting<br />
6) The SP will accept the page number and page size as input and return only that page based on the sort and filter codition<br />
7) The SP will also have to return the total records that are fetched based on the filter condition without pagination. This will be returned as an output parameter.<br />
<br />
The above requirements should encompass all the functionality expected out of a database side pagination.<br />
<br />
Below is the solution without using dynamic SQL.<br />
Please Note: I am not the guy who came out with all the techniques used to build the solution SP. Many of the techniques had been commonly used even before I knew what SQL Server was. I am just collating all the ideas in one place.<br />
<br />
Please feel free to give your comments on how to improve this. If you think I missed a common functionality required in the SP, let me know, I will be happy to add it.<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>
CREATE PROC SalesOrderDetailCustomSort
(
@in_SalesOrderID INT = NULL, /* FILTER - ORDER ID */
@in_CarrierTrackingNumber VARCHAR(25) = '', /* FILTER - LIKE SEARCH ON TRACKING NUMBER */
@in_ProductName NVARCHAR(50) = '', /* FILTER - PRODUCT NAME */
@in_ModifiedDateFrom DATETIME = '1753-01-01', /* FILTER - MODIFIED DATE FROM */
@in_ModifiedDateTo DATETIME = '9999-12-31', /* FILTER - MODIFIED DATE TO */
@PageSize SMALLINT = 25, /* NUMBER OF RECORDS PER PAGE */
@TargetPage SMALLINT = 1, /* PAGE THAT NEEDS TO BE RETURNED */
@OrderBy VARCHAR(50) = '', /* ORDER BY COLUMN. WILL HAVE THE SAME NAME
AS THE RESULT SET COLUMN. IN OUR EXAMPLE
WE ALLOW SORTING ON THE FOLLOWING COLUMNS
ORDER_ID
TRACKING_NBR
PRODUCT_NAME
MODIFIED_DATE
*/
@SortOrder VARCHAR(4) = '', /* SORT ORDER - ASC OR DESC */
@TotalRecCount INT OUTPUT /* TOTAL NUMBER OF RECORDS FETCHED
BASED ON THE FILTERS INCLUDING
ALL PAGES */
)
AS
BEGIN
/* DECLARE LOCAL VARIABLES FOR FILTER CONDITIONS */
DECLARE @CarrierTrackingNumber VARCHAR(27), /* 2 CHARACTERS MORE THAN THE INPUT BECAUSE
WE WILL BE DOING A LIKE SEARCH */
@ProductName NVARCHAR(52),
@ModifiedDateFrom DATETIME,
@ModifiedDateTo DATETIME
SELECT @CarrierTrackingNumber = '%' + @in_CarrierTrackingNumber + '%',
@ProductName = '%' + @in_ProductName + '%',
@ModifiedDateFrom = @in_ModifiedDateFrom,
@ModifiedDateTo = @in_ModifiedDateTo
/* THE TABLE BELOW WILL STORE THE PRIMARY KEYS TO ALL THE RECORDS THAT SATISFY THE FILTER
CRITERIA, SORTED IN THE ORDER REQUIRED. THIS IS NEEDED TO FIND THE TOTAL RECORDS FETCHED
BY THE FILTER CRITERIA */
DECLARE @SORTED_DETAILS_LIST TABLE
(
ID INT IDENTITY(1,1) PRIMARY KEY,
ORDER_ID INT,
DETAIL_ID INT
)
INSERT INTO @SORTED_DETAILS_LIST
(
ORDER_ID,
DETAIL_ID
)
SELECT
SalesOrderID,
SalesOrderDetailID
FROM
Sales.SalesOrderDetail OD
INNER JOIN Production.Product PR
ON
OD.ProductID = PR.ProductID
WHERE
OD.SalesOrderID = ISNULL(@in_SalesOrderID,OD.SalesOrderID)
AND OD.CarrierTrackingNumber LIKE @CarrierTrackingNumber
AND PR.Name LIKE @ProductName
AND OD.ModifiedDate BETWEEN @ModifiedDateFrom AND @ModifiedDateTo
ORDER BY
/* WE NEED TO CONVERT ALL THE COLUMNS THAT CAN BE QUALIFY FOR
A SORT POSITION TO HAVE THE SAME DATATYPE. SO CONVERTING
ALL OF THEM TO VARCHAR */
CASE WHEN @OrderBy = 'PRODUCT_NAME' AND @SortOrder <> 'DESC'
THEN CAST(PR.Name AS VARCHAR(50))
WHEN @OrderBy = 'TRACKING_NBR' AND @SortOrder <> 'DESC'
THEN OD.CarrierTrackingNumber
WHEN @OrderBy = 'MODIFIED_DATE' AND @SortOrder <> 'DESC'
THEN CONVERT(VARCHAR,OD.ModifiedDate,102)
ELSE NULL
END ASC,
/* THE SAME HAS TO BE REPEATED FOR DESCENDING SORT ORDER */
CASE WHEN @OrderBy = 'PRODUCT_NAME' AND @SortOrder = 'DESC'
THEN CAST(PR.Name AS VARCHAR(50))
WHEN @OrderBy = 'TRACKING_NBR' AND @SortOrder = 'DESC'
THEN OD.CarrierTrackingNumber
WHEN @OrderBy = 'MODIFIED_DATE' AND @SortOrder = 'DESC'
THEN CONVERT(VARCHAR,OD.ModifiedDate,102)
ELSE NULL
END DESC,
/* WE WILL ALWAYS SORT BY ORDER_ID AND DETAIL_ID EVEN IF NO SORT CONDITIONS ARE SPECIFIED */
CASE WHEN @SortOrder <> 'DESC'
THEN OD.SalesOrderID ELSE NULL END ASC,
CASE WHEN @SortOrder = 'DESC'
THEN OD.SalesOrderID ELSE NULL END DESC,
CASE WHEN @SortOrder <> 'DESC'
THEN OD.SalesOrderDetailID ELSE NULL END ASC,
CASE WHEN @SortOrder = 'DESC'
THEN OD.SalesOrderDetailID ELSE NULL END DESC
/* STORE THE TOTAL RECORDS SELECTED BY THE LAST QUERY INTO THE OUTPUT PARAMETER */
SET @TotalRecCount = @@ROWCOUNT
/* FETCH THE COMPLETE DATA TO BE RETURNED BACK ONLY FOR THOSE ORDERS THAT ARE SELECTED BASED
ON FILTER AND PAGINATION REQUIREMENT */
SELECT
SDL.ID AS ROW_ID,
OD.SalesOrderID AS ORDER_ID,
OD.SalesOrderDetailID AS DETAIL_ID,
OD.CarrierTrackingNumber AS TRACKING_NBR,
OD.OrderQty AS ORDERED_QTY,
PR.Name AS PRODUCT_NAME,
OD.SpecialOfferID AS SPECIAL_OFFER,
OD.UnitPrice AS UNIT_PRICE,
OD.UnitPriceDiscount AS DISCOUNT,
OD.LineTotal AS TOTAL_COST,
OD.ModifiedDate AS MODIFIED_DATE
FROM
@SORTED_DETAILS_LIST SDL
INNER JOIN Sales.SalesOrderDetail OD
ON
SDL.ORDER_ID = OD.SalesOrderID
AND SDL.DETAIL_ID = OD.SalesOrderDetailID
INNER JOIN Production.Product PR
ON
OD.ProductID = PR.ProductID
WHERE
SDL.ID BETWEEN (@TargetPage-1)*@PageSize + 1 AND (@TargetPage)*@PageSize
ORDER BY
SDL.ID ASC
END
</code></pre>When I execute the SP with the following parameters:<br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>DECLARE @OutTotalRecCount int
EXEC SalesOrderDetailCustomSort @in_ModifiedDateFrom = '2003-01-01',
@in_ProductName = 'Tour',
@OrderBy = 'TRACKING_NBR',
@PageSize = 25,
@TargetPage = 8,
@TotalRecCount = @OutTotalRecCount OUTPUT
SELECT @OutTotalRecCount AS TOTAL_RECORDS_FETCHED
</code></pre><br />
This is the result set that is returned back:<br />
<br />
<a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SuJpUFTQV6I/AAAAAAAAAPE/EfTGEiJplOY/s1600-h/Output.bmp"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5395991097284515746" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/SuJpUFTQV6I/AAAAAAAAAPE/EfTGEiJplOY/s400/Output.bmp" style="cursor: hand; display: block; height: 192px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-71901409510324633932009-10-10T11:44:00.001-04:002009-11-23T10:24:23.151-05:00A Scenario to Ponder #14Lots of things happened since the last post I made in this blog. I got married. Didn't do much justice to my blog followers since then. My wife couldn't stand me being interested in ONLY her :) So, she pushed me to get this blog started again and I hope she keeps pushing me to continue this blog.<br />
<br />
Now, Here is the scenario.<br />
I had been playing Chicktionary (google it if you had not heard of it) in Club Bing. There are some other versions of the same game - Word Warp, Text Twist, Jumble, etc. In a gist, its a game where you are given a chain of letters and you need to come up with as many words as you can with these letters in a given time. It's a pretty addictive game. <br />
<br />
<a name='more'></a>Within a few days, I thought it will be even more interesting if I can write a query which will generate the words for me when I give the chain of letters.<br />
<br />
To begin with that, I had to set the environment:<br />
1. I need the list of words that my query will be searching from. On googling I found that the following link had a good list of words:<br />
<br />
<a href="http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt">http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt</a><br />
<br />
2. I loaded this list to a table [WordList] ([Word] varchar(50)) in a SQL Server database.<br />
<br />
3. I know that none of these games accept any word that is less than 3 characters long. So I deleted those words from the table.<br />
<br />
4. When you do all the above steps, you should have 109441 words in your table.<br />
<br />
With the environment set, I need to write a query (or a stored procedure) that will accept a string of letters and get me all the words in the WordList table that can be formed using the string of letters and it should be displayed in order of longest word to the shortest word.<br />
<br />
There is always a brute force method to solve these kind of scenarios. But, since this game is timed, I want the most optimal solution to get the result generated as soon as possible.<br />
<br />
So, Get me the best solution you can think of. Post as many solutions as you want/can. It can be in SQL Server 2000 or SQL Server 2005. The solution that I have is for SQL Server 2005. <br />
Happy Querying!! <br />
Here is an example Output I am looking for the input string: USILDF<br />
<br />
<a href="http://1.bp.blogspot.com/_sD0X6TOq9Vw/StC41pPsBYI/AAAAAAAAAO8/z1SIroXgYP0/s1600-h/Output.JPG"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5391011985706714498" src="http://1.bp.blogspot.com/_sD0X6TOq9Vw/StC41pPsBYI/AAAAAAAAAO8/z1SIroXgYP0/s400/Output.JPG" style="cursor: hand; float: left; height: 219px; margin: 0px 10px 10px 0px; width: 400px;" /></a><br />
<span style="font-size: 78%;"></span><br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<span style="font-size: 100%;"></span><br />
<span style="font-size: 100%;"></span><br />
<span style="font-size: 100%;">This is most optimized solution I can come up with for this scenario. If you can find a better solution, please post in the comments section. I would love to see it.<br />
</span><br />
<br />
<pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"><code>DECLARE @INPUT_WORD VARCHAR(10)
SET @INPUT_WORD = 'USILDF';
WITH NUMBERS AS
(
SELECT 1 AS NUM
UNION ALL
SELECT NUM+1 FROM NUMBERS WHERE NUM+1 <= LEN(@INPUT_WORD)
),
WRDLIST AS
(
SELECT WORD,SUBSTRING(WORD,NUM,1) CHR,COUNT(*) CNT
FROM WORDLIST, NUMBERS
GROUP BY WORD,SUBSTRING(WORD,NUM,1)
),
INPUTWORD AS
(
SELECT SUBSTRING(@INPUT_WORD,NUM,1) CHR,COUNT(*) CNT
FROM NUMBERS
GROUP BY SUBSTRING(@INPUT_WORD,NUM,1)
)
SELECT
A.WORD
FROM
WRDLIST A, INPUTWORD B
WHERE
A.CHR = B.CHR
AND A.CNT <=B.CNT
GROUP BY
A.WORD
HAVING
SUM(A.CNT) = LEN(A.WORD)
ORDER BY
LEN(A.WORD) DESC,A.WORD DESC
</code></pre>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-72869571141836287022007-10-09T11:26:00.002-04:002009-11-23T16:05:18.456-05:00A Scenario to Ponder #13<span style="font-family: arial;">Last February, I had</span> <span style="font-family: arial;">come over to Atlanta, Georgia from India for a new project and had been quite overwhelmed by the new way of life, the work and the country that I wasn't able to do justice to this blog. I apologize for those who had been following this mini series of scenarios. Hope I can get some time now to revive this.</span><br />
<br />
<span style="font-family: arial;">Here goes the scenario:</span><br />
<br />
<a name='more'></a><span style="font-family: arial;">I have a response/resolution time reporting requirement for a Service Call Management System:<br />
</span><span style="font-family: arial;">My main table is #ServiceCalls. </span><span style="font-family: arial;">The table definition goes like this:<br />
</span><span style="font-family: Arial;"></span><br />
<strong><span style="font-family: verdana;">create table #ServiceCalls</span></strong><br />
<strong><span style="font-family: verdana;">(</span></strong><br />
<strong><span style="font-family: verdana;">SvcCallNo int,</span></strong><br />
<strong><span style="font-family: verdana;">SvcCallDesc varchar(100),</span></strong><br />
<strong><span style="font-family: verdana;">CreateTime datetime,</span></strong><br />
<strong><span style="font-family: Verdana;">ResolutionTime datetime</span></strong><br />
<strong><span style="font-family: verdana;">)</span></strong><br />
<strong><span style="font-family: Verdana;"></span></strong><br />
<u><span style="font-family: arial;">Here is the data for this table</span><br />
</u><br />
<span style="font-family: times new roman; font-size: 85%;">insert into #ServiceCalls<br />
</span><span style="font-family: times new roman;"><span style="font-size: 78%;"><span style="font-size: 85%;">select 4709,'I have the complaint #4709','Oct 9 2007 6:59AM','Oct 10 2007 12:59PM' union all<br />
select 4716,'I have the complaint #4716','Oct 8 2007 12:23PM','Oct 10 2007 6:23PM' union all<br />
select 4685,'I have the complaint #4685','Oct 7 2007 12:04PM','Oct 9 2007 6:04PM' union all<br />
select 4695,'I have the complaint #4695','Oct 6 2007 10:59AM','Oct 9 2007 4:59PM' union all<br />
select 4654,'I have the complaint #4654','Oct 5 2007 10:29AM','Oct 8 2007 4:29PM' union all<br />
select 4692,'I have the complaint #4692','Oct 4 2007 10:00AM','Oct 8 2007 4:00PM' union all<br />
select 4637,'I have the complaint #4637','Oct 3 2007 9:27AM','Oct 8 2007 3:27PM' union all<br />
select 4674,'I have the complaint #4674','Oct 2 2007 1:52PM','Oct 6 2007 7:52PM' union all<br />
select 4689,'I have the complaint #4689','Oct 1 2007 9:36AM','Oct 5 2007 3:36PM' union all<br />
select 4700,'I have the complaint #4700','Oct 1 2007 4:43AM','Oct 4 2007 10:43AM'</span> </span></span><br />
With the above table, I need to generate a report having these fields.<br />
SvcCallNo, ActualCreateTime, ActualResolutionTime, Effort<br />
<span style="font-family: arial;"><strong><u>Points to Note:</u></strong></span><br />
<span style="font-family: arial;">1. Service calls are worked upon between 9:00 AM and 6:00 PM Monday thru Friday (Working Hours).</span><br />
<span style="font-family: arial;"><strong>2. ActualCreateTime </strong>is the closest working time on or after the CreateTime.</span><br />
<span style="font-family: arial;">For Example, for SvcCallNo 4709, the ActualCreateTime is 'Oct 9 2007 9:00AM'</span><br />
<span style="font-family: arial;">3. <strong>ActualResolutionTime</strong> is the closest working time on or before the ResolutionTimeFor Example, for SvcCallNo 4716 the ActualResolutionTime is 'Oct 10 2007 6:00PM' </span><br />
<span style="font-family: arial;">4. Effort is the number of working hours between the actualCreateTime and ActualResolutionTime</span><br />
<span style="font-family: arial;">How do I go about writing the query. The solution can be in SQL Server 2000 or 2005.</span><br />
<span style="font-family: arial;"><strong>Hint:</strong> You can create a calendar table and solve this (its still a challenge). If you can come up with an efficient query without a calendar table for SQL Server 2000 that would be pure genius.</span> <br />
<br />
<span style="font-family: times new roman;"><span style="font-size: 78%;"><br />
</span></span><span style="font-family: times new roman;"><span style="font-size: 78%;"></span></span><span style="font-family: Arial;"></span>Unknownnoreply@blogger.com7tag:blogger.com,1999:blog-29286778.post-82481465028383724502007-02-07T00:22:00.002-05:002009-11-23T10:57:39.094-05:00A Scenario to Ponder #12<span style="font-family: arial;">Its been a long time since I had used this medium of communication. Without getting into the excuses part of the dormancy I will get on with the question...</span><br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">Here is the scenario. Say, I am in a business consulting company and I am given a table EmployeeCustomerOrders.</span><br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">The table can be generated from the orders table in the Northwind database using this query.</span><br />
<br />
<a name='more'></a><strong><span style="font-family: trebuchet ms;">use Northwind<br />
go<br />
select employeeid, customerid, count(orderid) as OrderCount </span></strong><strong><span style="font-family: trebuchet ms;">into #EmployeeCustomerOrders<br />
from orders<br />
group by employeeID, customerID<br />
order by employeeid, customerid</span></strong><br />
<br />
<span style="font-family: arial;">Here you can see that an employee can process multiple customers and similarly a customer can be processed by multiple employees. The consulting experts had told that, to improve the sales, the following changes have to be made:</span><br />
<ul><li><span style="font-family: arial;">A customer should be tied to only one employee</span></li>
<li><span style="font-family: arial;">An employee can process for multiple customers</span></li>
<li><span style="font-family: arial;">The employee who has created most orders should be mapped to the customers who have made the least orders and vise-versa</span></li>
</ul><span style="font-family: arial;">My job here is to come up with that mapping between customers and employees. Which customers goes to which employees. </span><br />
<span style="font-family: arial;"><strong><u>For example:</u></strong> say there are 100 customers and 10 employees. The employee with the maximum orders will be mapped to the 10 customers who have made the least orders and the employees with the minimum orders will be mapped to the top 10 customers who gave the maximum orders.</span><br />
<span style="font-family: arial;">The result will have two columns: EmployeeID, CustomerID</span><br />
<span style="font-family: arial;">How do I go about getting the query. The solution can be in SQL Server 2000 or 2005.</span>Unknownnoreply@blogger.com5tag:blogger.com,1999:blog-29286778.post-62606520586950846312006-12-04T02:02:00.009-05:002009-11-23T10:45:27.366-05:00Obvious SQL Tip #2<strong><u><span style="font-family: arial;">Thinking in sets:</span></u></strong><br />
<span style="font-family: arial;">The thought process that goes into set- based programming is slightly different from writing procedural code.<br />
<br />
Say, I have 2 tables -one holding order header information and the other holding the details of the products ordered. I want to get the items ordered for all the orders:<br />
</span><br />
<ul><li><span style="font-family: arial;">The procedural way of looking at it is "for each order in the orders table get the order details from the [order details] table". </span></li>
<li><span style="font-family: arial;">In DB terms, its "for all orders in the orders table get the order details from the [order details] table".</span></li>
</ul><span style="font-family: arial;"></span><br />
<a name='more'></a>Its the subtleties that makes the difference.<br />
<br />
<span style="font-family: arial;"><u><strong>Tip:</strong></u> When it comes to database programming, you no longer look at each row. You should understand that a table is a collection of rows with similar characterstics. Or its more like, "you seen one row.. you've seen 'em all".<br />
<br />
Say, for example, you need only the orders placed in US and not the rest of the world.</span> <br />
<ul><li><span style="font-family: arial;">In procedural way "for each order check if the order is placed in US. If so, then get the corresponding order details else ignore the order".</span></li>
<li><span style="font-family: arial;">In sets "Get the order details for all the orders placed in US"</span></li>
</ul><span style="font-family: arial;">Thinking the procedural way will only lead to one conclusion for any requirement- CURSORS. Stop thinking the procedural way, if not possible, stop thinking :)<br />
<br />
<u><strong>Tip:</strong></u> With your procedural code, you build your result set from scratch one row at a time. Don't do that. Take all the rows and eliminate whichever is not necessary.</span><br />
<span style="font-family: arial;"><br />
<u>Here is an analogy:</u> Say you have a bag with 100 coins. Of that, 10 coins are silver and 90 are gold. You want to take the 10 silver coins out as fast as you can. How will you do it? If you think you will take one coin out at a time and check whether its silver or gold, then database programming is not the <span style="font-family: arial;">best option for you.<br />
<br />
<u><strong>Get the procedural code out of your database:</strong></u></span><br />
</span><br />
<ul><li><span style="font-family: arial;">CURSORs with IF.. ELSE usually can be replaced with CASE statements in SELECT.</span></li>
<li><span style="font-family: arial;">Scalar user defined functions (S-UDFs) in SQL Server are the bridge between procedural code and T-SQL code. Or, to be precise, its just procedural code in T-SQL. Avoid it if possible.</span></li>
<li><span style="font-family: arial;">In a S-UDFs it might be tempting to use recursion for scenarios like converting integer to binary or getting the factorial. It might be an excellent code, but while loop performs better than recursion, CLR functions (SQL Server 2005) performs better than while loop, doing the calculation in the front end is definitely better than using a CLR function. </span></li>
</ul>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-29286778.post-63248072876724341392006-11-29T06:31:00.001-05:002009-11-23T11:37:03.292-05:00A Scenario to Ponder #11<span style="font-family: arial;">This time it's a pretty interesting scenario to solve. I have a table called the distance_tbl holding distance between different cities. </span><br />
<span style="font-family: arial;">Here is the table definition:</span><br />
<br />
<strong><span style="font-family: trebuchet ms;">create table #distance_tbl (city1 char(1), city2 char(1), distance int)</span></strong><br />
<br />
<span style="font-family: arial;">And the sample data:</span><br />
<br />
<a name='more'></a><strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('A','B',200)</span></strong> <br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('A','C',100)</span></strong><br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('C','B',90)</span></strong><br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('C','D',300)</span></strong><br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('C','E',200)</span></strong><br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('E','F',50)</span></strong><br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('F','D',50)</span></strong><br />
<strong><span style="font-family: trebuchet ms;">insert into #distance_tbl values('F','C',220)</span></strong><br />
<br />
<span style="font-family: arial;">Lets assume the information is pretty comprehensive and so the table is pretty huge.</span><br />
<span style="font-family: arial;">I want to put this information to good use. So, I am planning to build a site where people can come and search for the shortest distance between two cities and also the route to go to the city.</span><br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">The site will inturn use a query to get the information. Two paramters come from the user to the query. The start city and the end city, say @Start and @end and both are of type char(1).</span><br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">How the result is presented, I leave it to your imagination. But here are a few examples on the information required for a given input.</span><br />
<br />
<strong><span style="font-family: arial;">Example 1:</span></strong><br />
<span style="font-family: arial;"><u>Input:</u> @Start = 'A', @end 'B'</span><br />
<span style="font-family: arial;"><u>Output:</u> Distance = 190 and you go from A to C and C to B</span><br />
<span style="font-family: arial;"></span><br />
<strong><span style="font-family: arial;">Example 2:</span></strong><br />
<span style="font-family: arial;"><u>Input:</u> @Start = 'C', @End 'D'</span><br />
<span style="font-family: arial;"><u>Output:</u> Distance = 270 and you go from C to F and F to D</span><br />
<br />
<u><strong><span style="font-family: arial;">A few rules to note:</span></strong></u><br />
<span style="font-family: arial;">In the table, A to C is 100 miles also means C to A is 100 miles and will not be stored as a seperate row.</span><br />
<span style="font-family: arial;">If two different paths have the same shortest distance, then I need to get the path that touches the least number of cities.</span><br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">The solution can be given in SQL Server 2000 or SQL Server 2005 and you may use temp tables, cursors or while loops, if necessary.</span>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-29286778.post-54522388311766175522006-11-27T08:46:00.001-05:002009-11-23T11:39:33.919-05:00Obvious SQL Tip #1<span style="font-family: arial;">I don't really know why I am starting this series, if it will be really useful for anyone and whether I will be able to keep it running for long. But, its here because I am convinced that sometimes the most obvious escapes our mind. And as someone rightly said "common sense is not so common".<br />
<br />
I just intend to illustrate few common issues(one per post) where we try to come up with complicate queries and realise, in the end, that we have a painfully obvious alternative. But, the question is - Is it obvious and is it really an alternative? Read ahead... </span><br />
<span style="font-family: arial;"><br />
</span><br />
<a name='more'></a>My requirement is this. I have SQL Server 2000. In the <strong>northwind</strong> database, I need to select some information for the order (in the <strong>orders</strong> table) which has the maximum freight charge.<br />
<br />
The query that strikes our mind immediately is this:<br />
<br />
<strong><span style="font-family: trebuchet ms;">select OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
where freight = (select max(freight) from orders)</span></strong><br />
<br />
<span style="font-family: arial;">On running the above query, I get this result:</span><br />
<br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/586265/Tips11.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/148789/Tips11.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<span style="font-family: arial;">What we are doing here is a self-join which, if you notice carefully, can be avoided. Can I use this query to get the result?</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select top 1 OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
order by freight desc</strong></span><br />
<br />
<span style="font-family: arial;">Of course, I can. And I reduced one join. Isn't this great? Yes, it is.<br />
<br />
Happily basking in the glory of ineptitude, seldom do we realise that there can be more than one order having the same max freight charge and we got the same result just by chance.<br />
<br />
May be, If I wanted to find the last order by orderdate, my safe query will look this way:<br />
</span><br />
<strong><span style="font-family: trebuchet ms;">select OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
where orderdate = (select max(orderdate) from orders)</span></strong><br />
<br />
<span style="font-family: arial;">This is the result:</span><br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/961592/Tips12.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/553367/Tips12.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<span style="font-family: arial;">And my seemingly better query:</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select top 1 OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders order by orderdate desc</strong></span><br />
<br />
<span style="font-family: arial;">This gives me the wrong result (of course):</span><br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/760073/Tips13.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/802438/Tips13.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<span style="font-family: arial;">Now, I have realised the problem, but I don't want to abandon my approach. I would write my query this way to get the correct result:</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select top 1 with ties OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
order by orderdate desc</strong></span><br />
<br />
<span style="font-family: arial;">It got me the result. Let's see how well it performs when compared to my safe query. So, I run the following queries in parallel:</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>--Query #1<br />
select top 1 with ties OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
order by orderdate desc<br />
<br />
--Query #2<br />
select OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
where orderdate = (select max(orderdate) from orders)</strong></span><br />
<br />
<span style="font-family: arial;">and when I check the query execution plan, I am in for a surprise:</span><br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/30590/Tips14.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/688565/Tips14.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<span style="font-family: arial;">I find that Query #1 incurs almost double the cost of Query #2 (or the safe query) and Query #2 doesn't have a self-join (The reason I started the exercise was to avoid the self join which is not there). I still don't lose hope. I want to make my Query #1 performing. Assuming that my orderdate will always be lesser than or equal to the current date and I can convert my scan to seek. So I add up a useless filter to it and my query now, is this:</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select top 1 with ties OrderID, CustomerID, EmployeeID, OrderDate, freight<br />
from orders<br />
where orderdate <= getdate() order by orderdate desc</strong></span> <br />
<span style="font-family: arial;">Here is the query execution plan for the above query:</span> <br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/44966/Tips15.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/458375/Tips15.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<span style="font-family: arial;">Now, I am able to bring the query to execute as fast as my safe query. Good that I did it without a join. But, I see now that my query looks more complicated, has an assumption which might fail anytime and is giving me the same execution plan and cost as my safe query. So, which query do you choose - <strong>good and safe</strong> or just <strong>good</strong>?<br />
<br />
<strong><u>Moral of the story:</u></strong><br />
"Don't stop thinking too hard for better alternatives, but remember that it doesn't become a better alternative just because you thought too hard" :)</span>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-29286778.post-67319174378378792902006-11-22T10:49:00.000-05:002009-11-05T10:00:00.131-05:00A Scenario to Ponder #10<span style="font-family:arial;">Say,I have a requirement. My business demands an auto-generated key (or an identity) for a table called <strong>CharTbl</strong>. But, it needs to be a character identity column. The sequence generation should be as given in the figure below (partial result displayed).</span><br /><br /><a href="http://photos1.blogger.com/blogger2/378/3575/1600/Scenario10.jpg"><img style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://photos1.blogger.com/blogger2/378/3575/400/Scenario10.jpg" border="0" /></a><br /><span style="font-family:arial;">How would I go about creating the table definition?</span>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-29286778.post-42538187270260581012006-11-20T06:26:00.001-05:002009-11-23T11:40:38.412-05:00Pivot Query Generator for SQL Server 2000<span style="font-family: arial;">This post is written with reference to the post </span><a href="http://omnibuzz-sql.blogspot.com/2006/07/pivot-query-generator-for-sql-server.html"><span style="font-family: arial;">Pivot Query Generator for SQL Server 2005</span></a><span style="font-family: arial;">. Please read through the post to get the context and the data setup before you read any further.<br />
<br />
Now, that you have read through the other post, you will realise that in SQL Server 2005, we had the PIVOT operator to generate the crosstab reports. But in case of SQL Server 2000, we will have to use an indirect approach to generate the cross-tab. Here is the SQL Server 2000 counter part of the Pivot Query Generator of SQL Server 2005.<br />
<br />
</span><br />
<a name='more'></a>This SP is still in a rudimentary stage and does not handle join conditions. You may use it to see how to generate cross-tab reports and extend it to your requirements. Please feel free to modify the SP to suit to your requirements. And if you can see a way by which it can be improved, leave a comment and I will try to update it if possible.<br />
<br />
Though, the implementation of this SP is different from the one written for SQL Server 2005, the end result should be the same. And you can get the query generated from the <strong>messages </strong>tab of your query analyzer.<br />
<br />
Here is the stored procedure definition:<br />
<span style="font-family: trebuchet ms;"><strong><br />
create proc GeneratePivotQuery (<br />
@TableName varchar(100), -- Table to select from<br />
@AggregateFunction varchar(10), -- Aggregate to be done on the value column<br />
@ValueColumn varchar(100),<br />
@PivotColumn varchar(100), -- Column for which the values needs to be transposed<br />
@FilterCondition varchar(1000), -- Filtering to be done on choosing the pivot values<br />
@OtherColumns varchar(1000)) -- Columns selected other than the pivot columns<br />
as<br />
begin<br />
declare @List varchar(2000)<br />
set @List = '';<br />
<br />
declare @ConcatQuery varchar(4000)<br />
declare @CurColumn varchar(100)<br />
<br />
set @ConcatQuery = 'DECLARE C1 CURSOR global FAST_FORWARD FOR select distinct '+<br />
@PivotColumn + ' from ' + @TableName + isnull(' where ' + @FilterCondition,'')<br />
<br />
exec(@ConcatQuery)<br />
<br />
open C1<br />
<br />
fetch next from C1 into @CurColumn<br />
<br />
set @List = @list + @AggregateFunction + '( case when [' + @PivotColumn +<br />
'] = ''' + @CurColumn + ''' then ' +<br />
@ValueColumn + ' else null end ) as [' + @CurColumn + ']'<br />
<br />
while (@@fetch_status = 0)<br />
begin<br />
<br />
fetch next from C1 into @CurColumn<br />
if(@@fetch_status = 0)<br />
set @List = @list + ',' + char(10) + char(9) + @AggregateFunction + '( case when [' + @PivotColumn +<br />
'] = ''' + @CurColumn + ''' then ' +<br />
@ValueColumn + ' else null end ) as [' + @CurColumn + ']'<br />
<br />
end<br />
<br />
close c1<br />
deallocate c1<br />
<br />
/* this will print the query used to generate the pivoted result */<br />
print '/*query begin*/' + char(10) + char(10) + 'select '+ char(10) + char(9) + @OtherColumns + ',' + char(10)+ char(9) + @List + char(10) + ' from ' + @TableName +<br />
char(10) + ' group by ' + char(10)+ char(9) + @OtherColumns + char(10) + char(10) + '/*query end*/' + replicate(char(10),5)<br />
<br />
/* this will generate the result set */<br />
exec ('select ' + @OtherColumns + ',' + @List + ' from ' + @TableName +<br />
' group by ' + @OtherColumns )<br />
<br />
end<br />
</strong></span><br />
<span style="font-family: arial;">Here is the call to the stored procedure, that will pivot those values in city2 where city2 is between 'B' and 'D' and display the result. You can get the query, used to generate the result, from the messages tab.</span><br />
<span style="font-family: Trebuchet MS;"><strong><br />
DECLARE @TableName varchar(100)<br />
DECLARE @AggregateFunction varchar(10)<br />
DECLARE @ValueColumn varchar(100)<br />
DECLARE @PivotColumn varchar(100)<br />
DECLARE @FilterCondition varchar(1000)<br />
DECLARE @OtherColumns varchar(1000)<br />
<br />
SELECT @TableName = 'distance_tbl'<br />
SELECT @AggregateFunction = 'sum'<br />
SELECT @ValueColumn = 'distance'<br />
SELECT @PivotColumn = 'city2'<br />
SELECT @FilterCondition = 'city2 between ''B'' and ''D'''<br />
SELECT @OtherColumns = 'city1'<br />
<br />
EXEC [dbo].[GeneratePivotQuery]<br />
@TableName,<br />
@AggregateFunction,<br />
@ValueColumn,<br />
@PivotColumn,<br />
@FilterCondition,<br />
@OtherColumns</strong></span><span style="font-family: Trebuchet MS;"><br />
</span><span style="font-family: Trebuchet MS;"></span><br />
<strong></strong>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-29286778.post-8594450224677692962006-11-17T08:23:00.002-05:002009-11-23T11:45:12.204-05:00Parameter Sniffing & Stored Procedures Execution Plan<span style="font-family: arial;">This post is an attempt to explain what parameter sniffing is all about and how it affects the performance of a stored procedure. I would be walking through an example to demonstrate what the effect of parameter sniffing is. I will be showing the query execution plan generated for SQL Server 2005. To my knowledge, there is no change, in how this process works, between the two versions and you should be able to relate it to SQL Server 2000 as well.</span><br />
<br />
<a name='more'></a><span style="font-family: arial;">According to the white paper,</span><a href="http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx"><span style="font-family: arial;">Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 </span></a><span style="font-family: arial;">published in the Microsoft Site:</span><br />
<br />
<span style="font-family: arial;"><em>"Parameter sniffing" refers to a process whereby SQL Server's execution environment "sniffs" the current parameter values during compilation or recompilation, and passes it along to the query optimizer so that they can be used to generate potentially faster query execution plans. The word "current" refers to the parameter values present in the statement call that caused a compilation or a recompilation.</em> </span><br />
<br />
<span style="font-family: arial;">Before I go ahead with the example, you need to understand that the query execution plan generated by the query optimizer depends on a lot of factors and parameter sniffing is just one of them. So the execution plans I show here might not be the execution plan you get if you run the same query on your server.</span><br />
<br />
<span style="font-family: arial;">Lets consider the <strong>Orders </strong>table in the <strong>Northwind</strong> database.</span><br />
<br />
<span style="font-family: arial;">Say, I create a stored procedure with the definition as:</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>create procedure GetOrderForCustomers(@CustID varchar(20))<br />
as<br />
begin<br />
select * from orders<br />
where customerid = @CustID<br />
end</strong></span><br />
<br />
<span style="font-family: arial;">Remember, that the query execution plan is not generated when you create the procedure. It gets created and cached the first time you run it.</span><br />
<br />
<span style="font-family: arial;">First lets look at the distribution of the number of orders for each customer</span><br />
<br />
<span style="font-family: arial;">Running the following query in the Northwind database:</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select top 100 percent customerid,count(*) as OrderCount </strong></span><span style="font-family: trebuchet ms;"><strong>from orders<br />
group by customerid </strong></span><span style="font-family: trebuchet ms;"><strong>order by count(*) </strong></span><br />
<br />
<span style="font-family: arial;">We get the number of orders placed by each customer (Only partial result shown)</span><br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/985309/PSResult.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/214300/PSResult.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<br />
<span style="font-family: arial;">The first and the last customer are the ones we are interested in: </span><br />
<br />
<span style="font-family: arial;">CENTC has OrderCount (min) = 1</span><br />
<br />
<span style="font-family: arial;">SAVEA has OrderCount (max) = 31</span><br />
<br />
<span style="font-family: arial;">Lets say, I want to find all the orders for CustomerID = 'SAVEA' by calling the stored procedure that we created before.</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>exec GetOrderForCustomers 'SAVEA'</strong></span><br />
<br />
<span style="font-family: arial;">Since this is the first time the stored procedure is called, it will create an optimized query execution plan and execute it.</span><br />
<br />
<span style="font-family: arial;">Now, the stored proc will return me 31 rows. Let's look at the query execution plan.</span><br />
<br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/771604/PSSAVEA.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/230758/PSSAVEA.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a> <span style="font-family: arial;">Two values in the Clustered Index Scan information are of interest to us.</span><br />
<ul><li><span style="font-family: arial;">Actual Number of Rows 31</span></li>
<li><span style="font-family: arial;">Estimated Number of Rows 31.0747</span></li>
</ul><br />
<span style="font-family: arial;">How did the optimizer correctly estimate the actual number of rows?</span><br />
<br />
<span style="font-family: arial;">It's because of what we call "parameter sniffing". The optimizer created the plan knowing the fact that it was going to get the information for the customerID 'SAVEA' and hence retrieve 31 rows. </span><br />
<br />
<span style="font-family: arial;">Then how did the optimizer know that 'SAVEA' has 31 orders?</span><br />
<br />
<span style="font-family: arial;">SQL Server internally maintains the statistics and distribution of the values in the columns used for filtering. Which, inturn, is nothing but the information in the result of this query (which we used above)</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select top 100 percent customerid,count(*) as OrderCount<br />
from orders </strong></span><span style="font-family: trebuchet ms;"><strong>group by customerid<br />
order by count(*)</strong></span><br />
<br />
<span style="font-family: arial;">Then, what is the problem with parameter sniffing, if its helping the query optimizer do better optimization?<br />
<br />
Check out the query execution plan, again, for a different input (CustomerID = 'CENTC'):</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>exec GetOrderForCustomers 'CENTC'<br />
</strong></span><br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/919700/PSSAVEA_centc.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/32311/PSSAVEA_centc.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a> <span style="font-family: arial;">Did you notice that the estimated number of rows is still 31 when the actual number of rows is 1? This plan was optimized for retriving 31 rows (in the first run) and the plan stays in the cache for reuse till server is restarted or the procedure is recompiled or if the proc cache is removed because of any other reason. And so, we need to understand that this plan may or may not work with the same efficiency for retrieving 1 row.</span><br />
<br />
<span style="font-family: arial;">Now, lets see what is the efficient plan (in my server) for retrieving the orders for 'CENTC'. For that I need to clear the current execution plan. Let's drop and recreate the stored procedure so that the plan cache is removed.</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>drop proc GetOrderForCustomers</strong></span><span style="font-family: trebuchet ms;"><strong><br />
go<br />
create procedure GetOrderForCustomers(@CustID varchar(20))<br />
as<br />
begin<br />
select * from orders<br />
where customerid = @CustID<br />
end</strong></span><br />
<br />
<span style="font-family: arial;">Now that the plan cache is cleared, a fresh query execution plan will be generated for the input I give in the following procedure call.</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>exec GetOrderForCustomers 'CENTC'</strong></span><br />
<br />
<span style="font-family: arial;">Let's look at the query execution plan now.</span><br />
<br />
<br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/633254/PSCENTC.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/980024/PSCENTC.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a> <span style="font-family: arial;">Now, do you notice that the plan has changed and the estimates also have changed</span> <br />
<ul><li><span style="font-family: arial;">Actual Number of Rows 1</span></li>
<li><span style="font-family: arial;">Estimated Number of Rows 1.00241</span></li>
</ul><br />
<span style="font-family: arial;">If you call the procedure again with 'SAVEA' as the input, you will see that the plan will be the same and estimated number of rows will still be 1 though the actual rowcount will be 31.</span><br />
<br />
<span style="font-family: arial;">So, either way it all depends on which input I give the first time I call the stored procedure.</span><br />
<br />
<span style="font-family: arial;">But, It's impossible to keep track of when the plan was created and what was the input it used. In that case, we have an option here to disable the dependency on the input parameter. </span><br />
<br />
<span style="font-family: arial;">We can create local variables and assign the input parameter to the local variables and use the local variables in the query. And in that case, since we didn't use the procedural parameter directly in the query, it will generate a generic query execution plan.</span><br />
<br />
<span style="font-family: arial;">The same stored procedure can be written this way to avoid parameter sniffing:</span> <br />
<br />
<span style="font-family: trebuchet ms;"><strong>create procedure GetOrderForCustomersWithoutPS(@CustID varchar(20))<br />
as<br />
begin<br />
declare @LocCustID varchar(20)<br />
set @LocCustID = @CustID</strong></span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>select * from orders<br />
where customerid = @LocCustID<br />
end</strong></span><br />
<br />
Now, when you run the stored procedure with the input as 'SAVEA'<br />
<br />
<span style="font-family: trebuchet ms;"><strong>exec GetOrderForCustomersWithoutPS 'SAVEA'</strong></span><br />
<br />
<span style="font-family: arial;">This is the query execution plan that I get.</span><br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/671076/WOPS.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/127836/WOPS.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><span style="font-family: arial;"> Here you see that the actual number of rows is 31 whereas the estimated number of rows is not 31, but its close to 9. Its clear in this case that the query optimizer did not use the parameter value for generating the query execution plan. Then, from where did it get the value 9?</span><br />
<span style="font-family: arial;">You may guess, just like I did. We saw the distribution of the number of orders for each customer. Now, lets find the average number of orders per customer.</span><span style="font-family: arial;">This query finds the average of ordercount of all customers we got from the previous query:</span> <br />
<span style="font-family: trebuchet ms;"><strong>select avg(OrderCount) from<br />
(select top 100 percent customerid,count(*) as OrderCount<br />
from orders<br />
group by customerid<br />
order by count(*)) b</strong></span> <br />
<span style="font-family: arial;">Or it can be simply written this way:</span> <br />
<span style="font-family: trebuchet ms;"><strong>select 1.0*count(*)/count(distinct customerid) from orders</strong></span> <br />
<span style="font-family: arial;">You will see that the output is:</span><br />
<a href="http://photos1.blogger.com/x/blogger2/378/3575/1600/197223/PSResult2.jpg"><img alt="" border="0" src="http://photos1.blogger.com/x/blogger2/378/3575/400/400727/PSResult2.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><span style="font-family: arial;"></span><br />
<span style="font-family: arial;">Well, this is the estimated rowcount that we saw in the execution plan.</span><span style="font-family: arial;">If you look at it, for the current data, 67 out of 89 distinct customers have ordercount between 4 (9-5) and 14 (9+5) orders. </span><span style="font-family: arial;">So, execution plan generated should work good for the majority of the customers. Hence, disabling parameter sniffing is a good choice in this case.</span><br />
<u><span style="font-family: arial;">A few points to note:</span></u> <br />
<ul><li><span style="font-family: arial;">Parameter sniffing can be enabled or disabled at the stored procedure level. </span></li>
<li><span style="font-family: arial;">We need not use local variables (to disable parameter sniffing) if the amount of data you will retrieve from the table is evenly distributed for all values of the filtered column (example, search by primary key or unique key)</span></li>
<li><span style="font-family: arial;">Parameter sniffing can be disabled (by using local variables) if we see a bell curve distribution in the number of rows retrieved for the filtered column.</span></li>
</ul>Unknownnoreply@blogger.com70tag:blogger.com,1999:blog-29286778.post-28883636832600138142006-11-15T08:48:00.001-05:002009-11-23T11:42:13.803-05:00A Scenario to Ponder #9<span style="font-family: arial;">This scenario is a slightly modifed (simplified) version of a question posted in a discussion forum.</span><br />
<span style="font-family: arial;">Lets say I have a <strong>SQL Server 2000 </strong>box. And I have the following table.</span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>create table #runningavg(id int identity(1,1), curval decimal(16,10))</strong></span><br />
<br />
<span style="font-family: arial;">And the following data:</span><br />
<br />
<a name='more'></a><strong><span style="font-family: trebuchet ms;">insert into #runningavg(curval) values(12)<br />
insert into #runningavg(curval) values(14)<br />
insert into #runningavg(curval) values(20)<br />
insert into #runningavg(curval) values(30)<br />
insert into #runningavg(curval) values(10)<br />
insert into #runningavg(curval) values(6)<br />
insert into #runningavg(curval) values(16)<br />
insert into #runningavg(curval) values(8)<br />
insert into #runningavg(curval) values(4)<br />
insert into #runningavg(curval) values(2)<br />
insert into #runningavg(curval) values(32)</span></strong> <br />
<span style="font-family: arial;">Now, with the identity column filled, here is the table data:</span><br />
<br />
<br />
<a href="http://photos1.blogger.com/blogger2/378/3575/1600/result91.jpg"><img alt="" border="0" src="http://photos1.blogger.com/blogger2/378/3575/400/result91.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<span style="font-family: arial;">I would like to come up with a query that will give me the following result:</span> <br />
<br />
<a href="http://photos1.blogger.com/blogger2/378/3575/1600/result92.jpg"><img alt="" border="0" src="http://photos1.blogger.com/blogger2/378/3575/400/result92.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<span style="font-family: arial;">In the result, there is one calculated column called the <strong>average</strong>.</span><br />
<br />
<span style="font-family: arial;">The formula goes this way:</span><br />
<span style="font-family: arial;"><strong>average(n) = (average(n-1) + curval(n))/2 </strong></span><br />
<span style="font-family: arial;">where n is the id column. </span><br />
<span style="font-family: arial;">And </span><br />
<strong><span style="font-family: arial;">average(0) = 0</span></strong><br />
<br />
<span style="font-family: arial;">The SQL Server 2005 query for the above result goes like this:</span><br />
<br />
<strong><span style="font-family: trebuchet ms;">with cte as<br />
(select *,cast(curval/2 as decimal(16,10)) as average from #runningavg where id = 1<br />
union all<br />
select a.*, cast((a.curval + b.average)/2 as decimal(16,10)) from #runningavg a, cte b<br />
where a.id = b.id + 1<br />
)<br />
select * from cte</span></strong><br />
<br />
<span style="font-family: arial;">Though, this seems easy in SQL Server 2005, I wouldn't recommend this solution for the following reasons:</span><br />
<ul><li><span style="font-family: arial;">Its a performance intensive query and near implementation of a cursor solution (since it processes one row at a time).</span></li>
<li><span style="font-family: arial;">It won't work if the number of rows is more than 32767.</span></li>
</ul><br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">What is the best possible solution (performance wise) that you can come up with? </span><br />
<span style="font-family: arial;">The implementation can either be for SQL Server 2000 or 2005.</span>Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-29286778.post-15987948598015136112006-11-13T23:14:00.001-05:002009-11-23T11:47:02.943-05:00Recursive Self Join - A futile endeavor!!<span style="font-family: arial;">This post is just a walkthrough of my attempt to come up with an elegant solution in SQL Server 2005 to a (yet) seemingly impossible problem. </span><span style="font-family: arial;">I don't really know if I am trying to reinvent the wheel but I did realise that the approach I followed was terribly flawed.</span><br />
<br />
<span style="font-family: arial;"></span><br />
<span style="font-family: arial;">What I wanted (and yet want) to achieve might be best explained with an example. Take the table Employees in the Northwind database. With the recursive common table expressions in SQL Server 2005, I can get all the subordinates of an employee (say employeeID = 2) with this query.</span><br />
<br />
<a name='more'></a><strong><span style="font-family: trebuchet ms;">--Query 1<br />
with cte as<br />
(<br />
select lastname,firstname, employeeID, reportsto<br />
from employees where employeeID = 2<br />
union all<br />
select a.lastname,a.firstname,a.employeeID,a.reportsto<br />
from employees a, cte b where a.reportsto=b.employeeid<br />
)<br />
select * from cte</span></strong><br />
<br />
<span style="font-family: arial;">Here is the result for the above query:</span><br />
<br />
<br />
<a href="http://photos1.blogger.com/blogger2/378/3575/1600/Result1.jpg"><img alt="" border="0" src="http://photos1.blogger.com/blogger2/378/3575/320/Result1.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<span style="font-family: arial;">Now, I wanted the result to be flattened to have a horizontal roll down from the manager to the subordinates (at the leaf level), like this:</span> <br />
<br />
<a href="http://photos1.blogger.com/blogger2/378/3575/1600/Result3.3.jpg"><img alt="" border="0" src="http://photos1.blogger.com/blogger2/378/3575/400/Result3.jpg" style="cursor: hand; display: block; margin: 0px auto 10px; text-align: center;" /></a><br />
<span style="font-family: arial;">Now, I can write a query that will get all the subordinates upto the second level (the above result) this way:</span><br />
<br />
<strong><span style="font-family: trebuchet ms;">--Query 2<br />
select<br />
c.lastname as ln_mgr2,c.firstname as fn_mgr2,c.employeeID as mgr2,<br />
b.lastname as ln_mgr,b.firstname as fn_mgr,b.employeeID as mgr,<br />
a.lastname as ln_emp,a.firstname as fn_emp,a.employeeID as emp<br />
from employees a<br />
right outer join employees b on a.reportsto = b.employeeid<br />
right outer join employees c on b.reportsto = c.employeeid<br />
where<br />
c.employeeid = 2<br />
</span></strong><br />
<span style="font-family: arial;">The above query cannot be used if the depth in the heriarchy increases or decreases.<br />
<br />
I was wondering whether I could come up with a depth-agnostic implementation with all the new features that we have in SQL Server 2005. I realised that the only T-SQL feature (to my knowledge) that can get me close to the solution was the APPLY operator.<br />
<br />
So, I started by creating a recursive user-defined table valued function (parameterized view) like this: </span><br />
<br />
<span style="font-family: trebuchet ms;"><strong>create function GetAllSubordinates(@EmpId int)<br />
returns table<br />
as<br />
return(<br />
SELECT a.EmployeeID,a.LastName,a.FirstName, b.*<br />
FROM northwind.dbo.Employees a<br />
outer apply dbo.GetAllSubordinates(a.EmployeeId) as b<br />
where a.ReportsTo = @EmpId<br />
)<br />
</strong></span><br />
<span style="font-family: arial;">The above function wasn't getting created since it was referencing itself. Ignoring the signs, I went about creating a work around, by creating a dummy function under the same name and using an alter function to overwrite it, like this:<br />
</span><br />
<span style="font-family: trebuchet ms;"><strong>create function GetAllSubordinates(@EmpId int)<br />
returns table<br />
as<br />
return(select 1 as dummy)<br />
<br />
go<br />
<br />
alter function GetAllSubordinates(@EmpId int)<br />
returns table<br />
as<br />
return(<br />
SELECT a.EmployeeID,a.LastName,a.FirstName, b.*<br />
FROM northwind.dbo.Employees a<br />
outer apply dbo.GetAllSubordinates(a.EmployeeID) as b<br />
where a.ReportsTo = @EmpId<br />
)<br />
</strong></span><br />
<span style="font-family: arial;">Now, in an ideal case, this function, when called this way:</span><br />
<br />
<strong><span style="font-family: trebuchet ms;">select * from GetAllSubordinates(2)</span></strong><br />
<br />
<span style="font-family: arial;">will find all the subordinates of 2 and will pass them to the function again to get their subordinates and will go on till you don't get any subordinates for an employee (termination condition) and return the result.<br />
<br />
Didn't find a reason why it should fail. But when I actually executed it, the following error was thrown:<br />
"<strong>View or function 'dbo.GetAllSubordinates' contains a self-reference. Views or functions cannot reference themselves directly or indirectly.</strong>"<br />
<br />
With a misconception that this too can be circumvented, I went ahead and tried to use OPENROWSET function and somewhere in the middle of the implementation it struck me why my approach can never work.<br />
<br />
The definition from BOL "<u>The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query.</u>"<br />
<br />
In other words, its a union of all the cross joins between each row in the outer table with the result from the table valued function for that row.<br />
<br />
It means that all the rows returned after the cross apply should follow the same schema, which, in turn, means that the schema of any function should be fixed during the design time.<br />
<br />
The schema for the table returned by the function <strong>GetAllSubordinates </strong>will vary based on the depth of the leaf level employee from the input employeeID. And, you can't UNION result sets with different schema. Maybe, we can make the table returned by the function GetAllSubordinates to have the same schema (which I haven't figured out how, yet) by making the rest of the columns to be null. But, then, to get that I don't have to go for a recursive function, I can do it by just using <strong>query 2. </strong><br />
<br />
I have a feeling (though I cannot put my finger on it now) that the answer to the question "Why the PIVOT operator cannot take a select sub-query in the IN clause?" is much more complex and very much related to the issue we have now (though, at the database engine level).<br />
<br />
Or is there a simple explanation that I miss? </span>Unknownnoreply@blogger.com1