Nov 15, 2006

A Scenario to Ponder #9

This scenario is a slightly modifed (simplified) version of a question posted in a discussion forum.
Lets say I have a SQL Server 2000 box. And I have the following table.

create table #runningavg(id int identity(1,1), curval decimal(16,10))

And the following data:

insert into #runningavg(curval) values(12)
insert into #runningavg(curval) values(14)
insert into #runningavg(curval) values(20)
insert into #runningavg(curval) values(30)
insert into #runningavg(curval) values(10)
insert into #runningavg(curval) values(6)
insert into #runningavg(curval) values(16)
insert into #runningavg(curval) values(8)
insert into #runningavg(curval) values(4)
insert into #runningavg(curval) values(2)
insert into #runningavg(curval) values(32)

Now, with the identity column filled, here is the table data:



I would like to come up with a query that will give me the following result:


In the result, there is one calculated column called the average.

The formula goes this way:
average(n) = (average(n-1) + curval(n))/2
where n is the id column.
And
average(0) = 0

The SQL Server 2005 query for the above result goes like this:

with cte as
(select *,cast(curval/2 as decimal(16,10)) as average from #runningavg where id = 1
union all
select a.*, cast((a.curval + b.average)/2 as decimal(16,10)) from #runningavg a, cte b
where a.id = b.id + 1
)
select * from cte


Though, this seems easy in SQL Server 2005, I wouldn't recommend this solution for the following reasons:
  • Its a performance intensive query and near implementation of a cursor solution (since it processes one row at a time).
  • It won't work if the number of rows is more than 32767.


What is the best possible solution (performance wise) that you can come up with?
The implementation can either be for SQL Server 2000 or 2005.

6 comments:

  1. Here you go:

    select r.id, sum(r.vals)
    from
    (select r1.id, r2.curval / power(2.,r1.id - r2.id+1) as vals
    from #runningavg r1
    join
    #runningavg r2
    on r2.id <= r1.id
    ) r
    group by r.id

    Works in either system. It works by having a think about what you're actually asking for.

    ReplyDelete
  2. Hi

    Are you sure you want an average calculated that way? It's not really the running average...

    ReplyDelete
  3. Very true, its not a running average. I was just coming up with a question. Not really a real time scenario..

    Basically, all these questions are for the drive against using cursors :)

    ReplyDelete
  4. Rob, beautiful dude. Just answer I had.. Here is my solution..

    select a.id,a.curval,sum(b.curval*power(cast(2 as decimal(16,10)),b.id-a.id-1))
    from #runningavg a, #runningavg b
    where a.id >=b.id
    group by a.id,a.curval
    order by a.id

    But, I think it can be improved. I guess we will be better off using a temporary table than this query.

    I tried doing a load test on this..
    added a few more rows to the table with this batch..

    declare @a int
    set @a = 1
    while (@a < 15)
    begin
    insert into #runningavg (curval) select curval from #runningavg
    set @a = @a + 1
    end

    Now I tried to run the CTE solution I gave and the other query with the join, and cte was performing much better. Though I would attribute it to the calculations in the latter, it calls for some serious thinking on when to go for cursors.
    Your opinions please!

    ReplyDelete
  5. A traditional cursor would be your best option. It's not limited to the recursion level.

    And yes, whenever you're doing running totals, a cursor becomes the best option for larger datasets.

    ReplyDelete
  6. I came up with this solution, rather similar to Rob's:

    select runl.id
    , runl.curval
    , sum(power(2,runr.id-1)*runr.curval)/power(2,runl.id) average

    from #runningavg runl
    join #runningavg runr on runl.id >= runr.id
    group by runl.id, runl.curval
    order by runl.id

    ReplyDelete