Editorials

Immutable Data

Is there such a thing as immutable data in a relational database? Immutable data is not a term usually applied to a database, it is more of an Object oriented term. Immutable objects are those objects where he values (properties) of an object may not be modified after it is instantiated. The properties may only be set through construction. Once the properties are set, if you need to modify one, you must create a new object with the new values.

In SQL that would be the equivalent of not allowing an update to occur on data. If you wanted to modify the values in a row, you must create a new row, replacing the values from the previous instance that have been modified with new values, and copying the values from the previous instance that have not changed. So the columns behave as properties, and the rows behave as immutable objects.

Immutable objects are not always used in object oriented programming due to the overhead of constantly cloning an object, simply to modify one or more properties. They become very popular when you are working with multi-threaded applications. You don’t want one thread to modify a property of an object that is being modified by another thread, all at the same time. Using immutable objects allows for the coordination of multiple threads to make sure things are modified in the correct order, and changes from one thread do not corrupt the changes from another thread.

The thing that prompted me to make this comparison was the need to resolve some database deadlocks on tables that are essentially immutable. The tables have versioning. Each time a change is made to a record it is performed on a new row, with a new version. Therefore any record with an existing version is in essence immutable. This relates to deadlocks in this fashion. One of the problems that can occur on a database having deadlocks is that there are some fast and slow processes accessing the same table. If the slow process requires locking that may be blocking other faster or slower processes, it can bring the system to a stop during high peak activity. Even if the slow process is only using a read lock, with the purpose to guarantee a repeatable read of the data, it can block other processes wanting to create new records.

Since the table data is versioned, it could be considered immutable from a logical standpoint. If so, it can’t (technically) be modified any further. Any changes result in a new version. So, when you want to get a snapshot in time, you could feasibly ignore locking by using a NO LOCK or READUNCOMMITTED query hint.

There is some risk in this process. The reason is that there may be other code which is allowed to modify existing records. The only thing making the data immutable is your application logic. There is no such concept as immutable in a SQL database by default. However, if you are confident in your database access processes, you could reasonable remove the locking from your long running process that is read only, and reduce contention.

There are many schools of thought on the use of locks, or even not using locks at all. Is this an instance you would consider using No Locking? Feel free to leave a comment with your position.

Cheers,

Ben