Jul 11, 2006

Calculating the running total in a table

The scenario I want to discuss is a table that gives you the running total of its values when you select from it. What I am not discussing is how to generate a running total inside a select query.

Okay, so here is the table definition.

create table calculation
(a int identity(1,1),
b int)

Let me insert a few rows.

insert into calculation(b) values(100)
insert into calculation(b) values(200)
insert into calculation(b) values(300)
insert into calculation(b) values(150)
insert into calculation(b) values(180)
insert into calculation(b) values(190)


One way to generate the running total is to create a view on the table. (A word of caution: the top 100 percent ... order by trick shown below does not guarantee ordered output from a view in SQL Server 2005 and later, so add an explicit order by when you select from the view.)

create view v1
as
select top 100 percent t1.a, t1.b, sum(t2.b) as c
from calculation t1
inner join calculation t2 on t2.a <= t1.a
group by t1.a, t1.b
order by t1.a


Now a select on this view will give me the running total along with the value.

a           b           c
----------- ----------- -----------
1           100         100
2           200         300
3           300         600
4           150         750
5           180         930
6           190         1120



I can also get the running total at the table level without creating the view, by using a function like this.

create function fn1(@a int, @b int)
returns int
as
begin
    declare @result int
    set @result = @b
    set @result = @result + (select isnull(sum(b),0) from calculation where a < @a)
    return @result
end


Now I drop the original table and re-create it with the running total as a computed column.

drop table calculation

create table calculation
(a int identity(1,1),
b int,
c as dbo.fn1(a,b))


After I run the same insert script, I can do a direct select on the table.
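For example, assuming the same six rows were re-inserted after the table was re-created, a direct select returns the running total in the computed column c, matching the view output above:

select a, b, c from calculation order by a

a           b           c
----------- ----------- -----------
1           100         100
2           200         300
3           300         600
4           150         750
5           180         930
6           190         1120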

The issue doesn't end here. If I run a select from the view (the first solution) and from the table (the second) and compare the query execution plans, the second one (using the function) appears cheaper. Why? Because the optimizer doesn't account for the cost of executing the function once per row, so the plans don't tell me which approach is actually better. My guess is that the view will scale better (with an index on the identity column, of course). What do you think? And is there a better way of handling this?

Updated on 9th Nov 2009:
After looking at Jeff Moden's comments, I realized that although the solution presented does the job, it is by no means a worthy solution, and this post cannot be complete without addressing them. So I filled the table with 10,000 rows as Jeff suggested and ran a select from the view. It took 25 seconds on my machine. Not good enough. I didn't want to fall back on the obvious solution of using cursors, so I played around.

I then figured out a way. It looked alright at first. Here is (not) the solution:

create table #temp (id int identity(1,1) primary key, b int, running_total bigint)

insert into #temp (b) select b from calculation order by a

declare @current_total bigint
set @current_total = 0

update #temp set @current_total = @current_total + b,
                            running_total   = @current_total
from #temp

select * from #temp

drop table #temp 

It gave me the answer that I wanted, and it ran in 0 seconds. However, I cannot use this solution (what a waste). The result of my query depends on the execution plan chosen for the update, and I am making two assumptions here (both bad ones):

  • The update will happen in the order of the primary key (which is our sort order)
  • The @current_total variable gets updated for each row
This takes us back to using cursors as the best solution for this scenario (at least until I can think of anything else). You can check here for the solution using cursors and other variations of the cross join.
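For completeness, here is a minimal sketch of what the cursor approach looks like. This is my own illustration of the idea, not the exact solution linked above; it accumulates the total into a temp table as it walks the rows in order:

create table #totals (a int primary key, b int, running_total bigint)

declare @a int, @b int, @total bigint
set @total = 0

declare running_cursor cursor fast_forward for
    select a, b from calculation order by a

open running_cursor
fetch next from running_cursor into @a, @b
while @@fetch_status = 0
begin
    set @total = @total + @b
    insert into #totals (a, b, running_total) values (@a, @b, @total)
    fetch next from running_cursor into @a, @b
end
close running_cursor
deallocate running_cursor

select * from #totals order by a

drop table #totals

(As an aside: SQL Server 2012 and later support ordered window aggregates, sum(b) over (order by a rows unbounded preceding), which make all of these workarounds unnecessary.)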

5 comments:

han said...

Is there also a possibility to calculate a running total in the following situation?

Column 1 Location
Column 2 Date
Column 3 Number

I want a running total by location, ordered by date.

Jeff Moden said...

Ugh... correlated sub-queries make for "triangular joins" which are about half of a full Cartesian join. Try either solution with a tiny little 20,000 rows and see what I mean...

Jeff Moden said...

So... did you guys ever try either solution with the 20k rows?

Omnibuzz said...

Good point, Jeff. I just looked at your comment. I am thinking of a better way to do the calculation. Do you have any ideas? Please do share your thoughts.
-Omni

Anonymous said...

My very sincere apologies for not posting back. The code you have for the UPDATE method that relies on the order of the clustered index is tried, true, and very old (dating back to Sybase, which is where SQL Server originated).

Please see the following article for the simple rules you need to follow.
http://www.sqlservercentral.com/articles/T-SQL/68467/

--Jeff Moden
