Nov 27, 2006

Obvious SQL Tip #1

I don't really know why I am starting this series, if it will be really useful for anyone and whether I will be able to keep it running for long. But, its here because I am convinced that sometimes the most obvious escapes our mind. And as someone rightly said "common sense is not so common".

I just intend to illustrate few common issues(one per post) where we try to come up with complicate queries and realise, in the end, that we have a painfully obvious alternative. But, the question is - Is it obvious and is it really an alternative? Read ahead...



My requirement is this. I have SQL Server 2000. In the northwind database, I need to select some information for the order (in the orders table) which has the maximum freight charge.

The query that strikes our mind immediately is this:

select OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
where freight = (select max(freight) from orders)


On running the above query, I get this result:




What we are doing here is a self-join which, if you notice carefully, can be avoided. Can I use this query to get the result?

select top 1 OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
order by freight desc


Of course, I can. And I reduced one join. Isn't this great? Yes, it is.

Happily basking in the glory of ineptitude, seldom do we realise that there can be more than one order having the same max freight charge and we got the same result just by chance.

May be, If I wanted to find the last order by orderdate, my safe query will look this way:

select OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
where orderdate = (select max(orderdate) from orders)


This is the result:



And my seemingly better query:

select top 1 OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders order by orderdate desc


This gives me the wrong result (of course):



Now, I have realised the problem, but I don't want to abandon my approach. I would write my query this way to get the correct result:

select top 1 with ties OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
order by orderdate desc


It got me the result. Let's see how well it performs when compared to my safe query. So, I run the following queries in parallel:

--Query #1
select top 1 with ties OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
order by orderdate desc

--Query #2
select OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
where orderdate = (select max(orderdate) from orders)


and when I check the query execution plan, I am in for a surprise:



I find that Query #1 incurs almost double the cost of Query #2 (or the safe query) and Query #2 doesn't have a self-join (The reason I started the exercise was to avoid the self join which is not there). I still don't lose hope. I want to make my Query #1 performing. Assuming that my orderdate will always be lesser than or equal to the current date and I can convert my scan to seek. So I add up a useless filter to it and my query now, is this:

select top 1 with ties OrderID, CustomerID, EmployeeID, OrderDate, freight
from orders
where orderdate <= getdate() order by orderdate desc

Here is the query execution plan for the above query:


Now, I am able to bring the query to execute as fast as my safe query. Good that I did it without a join. But, I see now that my query looks more complicated, has an assumption which might fail anytime and is giving me the same execution plan and cost as my safe query. So, which query do you choose - good and safe or just good?

Moral of the story:
"Don't stop thinking too hard for better alternatives, but remember that it doesn't become a better alternative just because you thought too hard" :)

1 comments:

Anonymous said...

select top 1
OrderID,
CustomerID,
EmployeeID,
MAX(OrderDate),
MAX(freight)
from orders
group by OrderID,
CustomerID,
EmployeeID
Order by MAX(freight) DESC, MAX(OrderDate) DESC

Post a Comment