Nov 15, 2006

A Scenario to Ponder #9

This scenario is a slightly modifed (simplified) version of a question posted in a discussion forum.
Lets say I have a SQL Server 2000 box. And I have the following table.

create table #runningavg(id int identity(1,1), curval decimal(16,10))

And the following data:

insert into #runningavg(curval) values(12)
insert into #runningavg(curval) values(14)
insert into #runningavg(curval) values(20)
insert into #runningavg(curval) values(30)
insert into #runningavg(curval) values(10)
insert into #runningavg(curval) values(6)
insert into #runningavg(curval) values(16)
insert into #runningavg(curval) values(8)
insert into #runningavg(curval) values(4)
insert into #runningavg(curval) values(2)
insert into #runningavg(curval) values(32)

Now, with the identity column filled, here is the table data:



I would like to come up with a query that will give me the following result:


In the result, there is one calculated column called the average.

The formula goes this way:
average(n) = (average(n-1) + curval(n))/2
where n is the id column.
And
average(0) = 0

The SQL Server 2005 query for the above result goes like this:

with cte as
(select *,cast(curval/2 as decimal(16,10)) as average from #runningavg where id = 1
union all
select a.*, cast((a.curval + b.average)/2 as decimal(16,10)) from #runningavg a, cte b
where a.id = b.id + 1
)
select * from cte


Though, this seems easy in SQL Server 2005, I wouldn't recommend this solution for the following reasons:

  • Its a performance intensive query and near implementation of a cursor solution (since it processes one row at a time).
  • It won't work if the number of rows is more than 32767.


What is the best possible solution (performance wise) that you can come up with?
The implementation can either be for SQL Server 2000 or 2005.

6 comments:

Anonymous said...

Here you go:

select r.id, sum(r.vals)
from
(select r1.id, r2.curval / power(2.,r1.id - r2.id+1) as vals
from #runningavg r1
join
#runningavg r2
on r2.id <= r1.id
) r
group by r.id

Works in either system. It works by having a think about what you're actually asking for.

Mitch Wheat said...

Hi

Are you sure you want an average calculated that way? It's not really the running average...

Omnibuzz said...

Very true, its not a running average. I was just coming up with a question. Not really a real time scenario..

Basically, all these questions are for the drive against using cursors :)

Omnibuzz said...

Rob, beautiful dude. Just answer I had.. Here is my solution..

select a.id,a.curval,sum(b.curval*power(cast(2 as decimal(16,10)),b.id-a.id-1))
from #runningavg a, #runningavg b
where a.id >=b.id
group by a.id,a.curval
order by a.id

But, I think it can be improved. I guess we will be better off using a temporary table than this query.

I tried doing a load test on this..
added a few more rows to the table with this batch..

declare @a int
set @a = 1
while (@a < 15)
begin
insert into #runningavg (curval) select curval from #runningavg
set @a = @a + 1
end

Now I tried to run the CTE solution I gave and the other query with the join, and cte was performing much better. Though I would attribute it to the calculations in the latter, it calls for some serious thinking on when to go for cursors.
Your opinions please!

Anonymous said...

A traditional cursor would be your best option. It's not limited to the recursion level.

And yes, whenever you're doing running totals, a cursor becomes the best option for larger datasets.

Unknown said...

I came up with this solution, rather similar to Rob's:

select runl.id
, runl.curval
, sum(power(2,runr.id-1)*runr.curval)/power(2,runl.id) average

from #runningavg runl
join #runningavg runr on runl.id >= runr.id
group by runl.id, runl.curval
order by runl.id

Post a Comment