Oct 28, 2009

Understanding Transaction Isolation Levels in SQL Server

One of the most important concepts pertaining to any DBMS that every database programmer must know is the Transaction Isolation Levels. Sometimes, even the most seasoned database developers get confused on how multiple connections running in different isolation levels affect each other. This post is an attempt to explain through examples on how the isolation levels differ from each other.

The following table (taken from SQL Server Books Online) shows the different isolation levels (SQL-99 Standard) and its effect on concurrency.
Read uncommitted is the lowest isolation level and allows highest concurrency with least locking. Serializable is highest isolation level with lowest concurrency and holds exclusive range lock on the data in transaction.
Rather than looking at each Isolation Level and figuring out what it does, I thought it will make sense to understand how each concurrency problem is addressed as isolation level goes higher:

  • DIRTY READ: Ability to read uncommitted changes made by another transaction.
  • NON-REPEATABLE READ: Read locks are released immediately after the data is read. If the first transaction, that allows non-repeatable read, selects a row from a table, a second transaction can update (and commit) the same row even before the first transaction is complete. It means that if the first transaction selects the data again after the second transaction has completed, it will read the updated data.
  • PHANTOM: Read locks are held till the end of the transaction but no range locks are acquired. If the first transaction selects all the records in a table, a second transaction can insert (and commit) into the table. If the first transaction, reads through the table again, it will see the inserted (phantom) record.
The table below is a different view of the isolation levels. We will be using this table as a reference for our testing. We will take each scenario and compare the highest isolation level that allows against the lowest isolation level that denies.
Let's first set the environment for testing.
/* TEST_TABLE IS THE TABLE THAT WILL PARTICIPATE IN THE TRANSACTION */
CREATE TABLE TEST_TABLE
(
ID   INT         PRIMARY KEY,
NAME VARCHAR(10)
)

/* SOME TEST DATA FOR OUR TEST_TABLE */
INSERT INTO TEST_TABLE
SELECT 1, 'SMITH'
UNION ALL
SELECT 3, 'ADAMS'
UNION ALL
SELECT 5, 'SANDERS'

/* THIS TABLE WILL HOLD THE LOG OF WHAT IS HAPPENING. WILL BE EXPLAINED LATER */
CREATE TABLE TRANSACTION_STATUS
(
EXEC_ORDER          INT          IDENTITY(1,1),
PROCESS_DESCRIPTION VARCHAR(100),
ABSOLUTE_TIME       DATETIME
)

/* A BETTER VIEW OF THE LOG TABLE FOR OUR ANALYSIS */
CREATE VIEW VW_TRANSACTION_STATUS_LOG
AS
SELECT TOP 100 PERCENT
TS1.PROCESS_DESCRIPTION,
DATEDIFF(SS,TS2.ABSOLUTE_TIME,TS1.ABSOLUTE_TIME) AS RELATIVE_TIME
FROM
TRANSACTION_STATUS TS1,
TRANSACTION_STATUS TS2
WHERE
TS2.EXEC_ORDER = 1 
ORDER BY
TS1.EXEC_ORDER

Now, we create a transaction (Connection 1) that will select from the TEST_TABLE twice, with a delay of 10 seconds between them. We will also be logging each step happening in the transaction into our TRANSACTION_STATUS table. Given below is a sample screen shot of connection 1. Taking the logging part out, only on the script within the selected area is pertinent to our testing.


We create another transaction (connection 2), that will start 5 seconds after the first transaction (connection 1). The transaction will update a record in the test table and wait for 10 seconds and rollback the transaction.


Now, if we start both the transactions simltaneously and look at the log, we will know the sequence of execution of commands across the two transactions.
Note: TEST_TABLE will be refreshed to have the three rows as given below, before starting each test scenario.
With, the setup done, lets start with the first scenario:
Scenario 1 - Dirty Read: Comparing READ UNCOMMITTED and READ COMMITTED isolation level.
READ UNCOMMITTED: The below screen shot shows 3 connections. I tried to keep the image pretty self-explanatory, but I will walk through the screen for the first one:
  • Connection 1 runs at READ UNCOMMITTED isolation level. Connection 1 and 2 were started simultaneously.
  • You can see from the result set of Connection 1 that the second query of connection 1 reads through the uncommitted data of Connection 2.
  • Connection 3 shows the event log of the sequence of the commands between the first two connections.

READ COMMITTED: Here is what happens when the first connection runs in READ COMMITTED isolation level.

Scenario 2 - Non-repeatable Read: Comparing READ COMMITTED and REPEATABLE READ isolation level.
READ COMMITTED: The second connection updates the table and commits. And connection 1 reads the updated data in the second select.



REPEATABLE READ: And the same in REPEATABLE READ isolation level.



Phantom: Comparing REPEATABLE READ and SERIALIZABLE isolation level
REPEATABLE READ: Connection 2 inserts a record and connection 1 reads the inserted record in the second select.




SERIALIZABLE: Connection 2 had to wait for connection 1 to commit before it can insert.

You can see that SERIALIZABLE isolation level gives the best data consistency within a transaction. However, any attempt to update the table is put on hold till the transaction is complete. Though, we would love to have all our transactions running in SERIALIZABLE isolation level, we have to go for the lowest isolation level (that will work for the transaction) to gain performance and allow concurrency.

The default isolation level (READ COMMITTED) of SQL Server, should work in most scenarios as it makes sure you read only clean (committed) data. However, specific scenarios might require you to go for a higher level of isolation. I would try avoiding READ UNCOMMITTED isolation level (or using the NOLOCK table hint). Instead, as a best practice, try to keep the duration of transactions as short as possible.

That ends this post. For further reading, there is a wealth of information on isolation levels on the web and BOL (of course).

In the above post, I haven't considered READ COMMITTED SNAPSHOT and DATABASE SNAPSHOT introduced in SQL Server 2005. I hope I can write a post on that later. Till then, happy querying!!

0 comments:

Post a Comment