Nov 13, 2006

Recursive Self Join - A futile endeavor!!

This post is just a walkthrough of my attempt to come up with an elegant solution in SQL Server 2005 to a (yet) seemingly impossible problem. I don't really know if I am trying to reinvent the wheel but I did realise that the approach I followed was terribly flawed.


What I wanted (and yet want) to achieve might be best explained with an example. Take the table Employees in the Northwind database. With the recursive common table expressions in SQL Server 2005, I can get all the subordinates of an employee (say employeeID = 2) with this query.

--Query 1
with cte as
(
select lastname,firstname, employeeID, reportsto
from employees where employeeID = 2
union all
select a.lastname,a.firstname,a.employeeID,a.reportsto
from employees a, cte b where a.reportsto=b.employeeid
)
select * from cte


Here is the result for the above query:



Now, I wanted the result to be flattened to have a horizontal roll down from the manager to the subordinates (at the leaf level), like this:


Now, I can write a query that will get all the subordinates upto the second level (the above result) this way:

--Query 2
select
c.lastname as ln_mgr2,c.firstname as fn_mgr2,c.employeeID as mgr2,
b.lastname as ln_mgr,b.firstname as fn_mgr,b.employeeID as mgr,
a.lastname as ln_emp,a.firstname as fn_emp,a.employeeID as emp
from employees a
right outer join employees b on a.reportsto = b.employeeid
right outer join employees c on b.reportsto = c.employeeid
where
c.employeeid = 2

The above query cannot be used if the depth in the heriarchy increases or decreases.

I was wondering whether I could come up with a depth-agnostic implementation with all the new features that we have in SQL Server 2005. I realised that the only T-SQL feature (to my knowledge) that can get me close to the solution was the APPLY operator.

So, I started by creating a recursive user-defined table valued function (parameterized view) like this:


create function GetAllSubordinates(@EmpId int)
returns table
as
return(
SELECT a.EmployeeID,a.LastName,a.FirstName, b.*
FROM northwind.dbo.Employees a
outer apply dbo.GetAllSubordinates(a.EmployeeId) as b
where a.ReportsTo = @EmpId
)

The above function wasn't getting created since it was referencing itself. Ignoring the signs, I went about creating a work around, by creating a dummy function under the same name and using an alter function to overwrite it, like this:

create function GetAllSubordinates(@EmpId int)
returns table
as
return(select 1 as dummy)

go

alter function GetAllSubordinates(@EmpId int)
returns table
as
return(
SELECT a.EmployeeID,a.LastName,a.FirstName, b.*
FROM northwind.dbo.Employees a
outer apply dbo.GetAllSubordinates(a.EmployeeID) as b
where a.ReportsTo = @EmpId
)

Now, in an ideal case, this function, when called this way:

select * from GetAllSubordinates(2)

will find all the subordinates of 2 and will pass them to the function again to get their subordinates and will go on till you don't get any subordinates for an employee (termination condition) and return the result.

Didn't find a reason why it should fail. But when I actually executed it, the following error was thrown:
"View or function 'dbo.GetAllSubordinates' contains a self-reference. Views or functions cannot reference themselves directly or indirectly."

With a misconception that this too can be circumvented, I went ahead and tried to use OPENROWSET function and somewhere in the middle of the implementation it struck me why my approach can never work.

The definition from BOL "The APPLY operator allows you to invoke a table-valued function for each row returned by an outer table expression of a query."

In other words, its a union of all the cross joins between each row in the outer table with the result from the table valued function for that row.

It means that all the rows returned after the cross apply should follow the same schema, which, in turn, means that the schema of any function should be fixed during the design time.

The schema for the table returned by the function GetAllSubordinates will vary based on the depth of the leaf level employee from the input employeeID. And, you can't UNION result sets with different schema. Maybe, we can make the table returned by the function GetAllSubordinates to have the same schema (which I haven't figured out how, yet) by making the rest of the columns to be null. But, then, to get that I don't have to go for a recursive function, I can do it by just using query 2.

I have a feeling (though I cannot put my finger on it now) that the answer to the question "Why the PIVOT operator cannot take a select sub-query in the IN clause?" is much more complex and very much related to the issue we have now (though, at the database engine level).

Or is there a simple explanation that I miss?

1 comments:

Anonymous said...

Recursive SQL may let a table grow vertically.
The main reason why a table cannot "grow" horizontally is that both a table function and a CTE must decide on its number of columns in advance.
It's a conceptual aspect: tables have fixed column structure but varying row content.

The only way around that I can see is to let the last column "grow" by concatenation new values to it.

So in your example you could create a table with all employees (first column), each with a comma-separated list of their subordinates, by using a recursive CTE.

Post a Comment