Jun 5, 2006

Recursive function with cross apply vs CTE in SQL Server 2005

Common Table Expressions (CTE) may not be the best way, performance wise, (in all cases) for recursive output. I don't want to come out too strong.
But I guess it can be proved with a small test.

Lets take the database AdventureWorks.
Say, I want to find all the subordinates for a given manager (here I take it as null for the top most manager) from the table HumanResources.Employee.

So the query goes like this with the Common Table Expression (CTE) feature of SQL Server 2005.

with EmpMgr(emp_id,mgr_id) as
(
select EmployeeID, ManagerID from HumanResources.Employee
where ManagerID is null
union all
select EmployeeID, ManagerID from EmpMgr, HumanResources.Employee a
where a.ManagerID = EmpMgr.emp_id
)
select * from EmpMgr

The above query gets the result, Its the easiest to write and maintain.

But lets try to do it a bit differently. Suppose I create a function like this, which will recursively call itself till there are no more rows returned.

Create function f1(@in_iMgr int, @level int)
returns @tbl table
(emp_id int,
mgr_id int,
level int,
primary key (level,emp_id))
begin

insert into @tbl select EmployeeID, ManagerID,@level
from HumanResources.Employee a
where (@in_iMgr = 0 AND a.ManagerID IS NULL )
OR a.ManagerID = @in_iMgr

insert into @tbl select b.emp_id,b.mgr_id,b.level
from @tbl a cross apply f1(a.emp_id,@level + 1) b
return

end

And I call this function

select emp_id, mgr_id from dbo.f1(0,0)
order by level asc

I get the same result as the CTE and I see that the cost is less than half of what the CTE takes.

Again, we cannot trust the Execution Plan to give us the actual cost.

But the time taken on the same machine for these two queries were consistently different and the function with cross apply was faster.

But there is definitely a breakeven point. There is a performance degradation on using the recursive function as the number of rows returned or the depth of recursion increases.
And frankly CROSS APPLY doesn't seem to use indexes effectively (Correct me if I am wrong)

But, the function approach beats CTE hands down if the number of rows returned is lesser than ,say, round about 5000. This number is what I found out with some experimentation.
I have given the approach. Implement it to your requirements and check it for yourself.

Let me know if you have a different opinion.


2 comments:

Bill Graefe Jr said...

Thanks for the handy code. One critical point: The primary key definition in the output table is critical or SS05 will complain about using direct or indirect or direct usage of the function within itself.

Anonymous said...

CTE approach works beautifully when you have the recursive part of your CTE perfectly indexed. Mind posting your query plans? Honestly, the test you have conducted is too shallow to come to a conclusion.

Post a Comment