Getting Started with Multiversion Concurrency Control (MVCC) in PostgreSQL
Introduction to MVCC in PostgreSQL
MVCC is one of the magic pieces that makes Postgres and other relational databases what they are. It is a key aspect of PostgreSQL's framework and is used to implement transaction partitioning and concurrency control. As such, it is important for developers and database administrators to have a strong understanding of how MVCC works and how it can impact database performance and scalability.
A database user has the option of reading from or writing to a database, or doing both. However, a problem while creating (updating) databases is that our databases lock the data causing concurrency issues and making it challenging for readers to view the updated data. The readers might only see a partially updated version of the data or a version with errors. These concurrent problems could result in deadlocks and unsuccessful transactions. PostgreSQL uses Multi-Version Concurrency Control to control access to data in a database and ensures consistency while allowing multiple transactions to happen simultaneously in the database. In simple terms, MVCC creates several versions of a single database record, enabling various transactions to access different versions of one database record without conflicting with one another. This allows several transactions to run simultaneously, and eliminates the need for locks or blocking operations to enhance the database's performance and responsiveness.
The Behavior of MVCC
The most important aspect in Multiversion Concurrency Control in terms of user-visible behavior is that “readers never block writers” and “writers never block readers.” This behavior is primarily controlled by snapshots which determine what a user can see in a database. A snapshot in MVCC is a virtual read-only, point-in-time copy of the data recorded at the start of each SQL statement in a READ COMMITTED transaction isolation mode, and at the transaction start in a SERIALIZABLE translation isolation mode. MVCC provides a consistent view of the data to each transaction, even as other transactions are modifying the data. Each transaction has its own snapshot, which represents the state of the database at the beginning of the transaction. The snapshot comprises all information about all the table rows (tuples) in the database and controls the table rows that are visible to the transaction at the time the snapshot is taken.
When a new snapshot is taken, the following information is gathered:
- the highest-numbered committed transaction
- the transaction numbers currently executing
Using this information from a snapshot, Postgres can determine if a transaction’s actions should be visible to an executing statement.
How Does PostgreSQL Handle Concurrency?
MVCC allows multiple transactions to read and write to the database at the same time without interfering with each other while each transaction operates on a "snapshot" of the database which is a consistent view of the data at a specific time.
For an example illustrating how MVCC works in PostgreSQL, let’s assume we have a database and a table called Inventory within it - the table has the following columns:
- Product ID (unique key)
- Product Description
- Amount(USD)
- Customer Name
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Let’s say that two of the database transactions (Transaction 1 and Transaction 2) above are trying to update the amount of the product which has an ID of 100121 at the same time. The two transactions are taking place simultaneously so Transaction 1 begins.
If you are looking for an easy and powerful SQL client and database manager, then you've got to try DbVisualizer. It connects to nearly any database.
Creating a Snapshot of the Database Table
When Transaction 1 (T1) starts, it creates a snapshot of the "Inventory" table as shown below:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Concurrent Transactions Creating Their Own Snapshots
Now, when Transaction 2 begins, it also creates its own snapshot of the database table as shown below:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Utilizing Multiversion Concurrency Control
If the Transaction 1 tries to update the amount of product with an ID of 100121 to $550.00 and Transaction 2 tries to update the amount of the same product to $580.00 at the same time, PostgreSQL will utilize the technique of Multiversion Concurrency Control to create a new version of the record for each transaction as shown below:
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 550.00 | Raymond Ryan |
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 580.00 | Raymond Ryan |
100121 | Apple Headphones | 500.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Execution and Commit of Transactions
After T1 executes its updates, the new version of the database record with an amount of $550.00 becomes the current version.
Product ID | Product Description | Amount (USD) | Customer Name |
---|---|---|---|
100121 | Apple Headphones | 550.00 | Raymond Ryan |
100122 | T-shirt | 102.00 | Raymond Ryan |
100122 | C-Type USB Cable | 68.00 | Harry Abubakar |
100124 | Pair of Spanner | 121.00 | Steven Snipes |
Now, when T2 tries to query the database for the row to commit changes, it sees the old value of 500.00 for the amount of product with an ID of 100121. This is because T2's snapshot of the data was taken at the time it started, which was before T1 committed its changes. PostgreSQL will recognize that the record it is trying to update has already been modified by T1 and hence cannot update it. Here, T2's transaction will create a serialization error, indicating that it conflicted with T1's transaction because T2’s snapshot is outdated.
Refreshing the Snapshot
T2 will have to refresh its snapshot by starting a new transaction and taking a new snapshot of the database which will include the row of the product with the ID of 100121 and a price of 550.00.
Creating a New Version of the Row with a New Transaction ID
T2 will now modify the row of the product with an ID of 100121 by changing the amount of the product from 550.00 to 580.00 and then commit its changes. PostgreSQL will then create a new version of the row with the new product amount value and an internal transaction ID of T2.
PostgreSQL MVCC Internal Process Flow
MVCC maintains multiple versions of each row in the database, so that different transactions can see different versions of the same data without blocking each other. Here, we'll explore the internal process flow of MVCC in PostgreSQL.
Suppose we have a table named inventory with columns product id, product description, amount and customer name:
Here's how the MVCC behavior in PostgreSQL works for transactions T1 and T2:
Transaction T1
- Transaction T1 begins by taking a snapshot of the database using the SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; statement. This tells PostgreSQL to use MVCC to ensure that T1 sees a consistent view of the data at the time it started.
- T1 then reads the current version of the row with ID 100121 using the SELECT amount FROM inventory WHERE id = 100121; statement. T1 sees the current amount value of the product with ID 100121 at this point.
- T1 now modifies the row with ID 100121 using the UPDATE inventory SET amount = 550 WHERE id = 100121; statement. This creates a new version of the row with a transaction ID of T1 and a new amount value of 550.
- T1 then commits the transaction using the COMMIT; statement.
- Transaction T2 begins by taking a snapshot of the database using the SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; statement. This tells PostgreSQL to use MVCC to ensure that T2 sees a consistent view of the data at the time it started.
- T2 then reads the current version of the row with ID 100121 using the SELECT amount FROM inventory WHERE id = 100121; statement. However, here, T2 sees the version of the row that was current when T1 began, which has an amount value of 500.
- T2 then modifies the row with ID 100121 using the UPDATE inventory SET amount = 580 WHERE id = 100121; statement. This creates a new version of the row with a transaction ID of T2 and a new amount value of 580.
- T2 commits the transaction using the COMMIT; statement.
As a result, the inventory table will now have two versions of the row with ID 100121: one with an amount of 550 and a transaction ID of T1, and another with an amount of 580 and a transaction ID of T2. Each transaction sees a consistent view of the data at the time it started, and PostgreSQL ensures that transactions do not interfere with each other, even during modification of the same data.
Pros & Cons Of MVCC
Pros
- MVCC allows multiple transactions to access the same database objects concurrently, without locking them. This can result in high concurrency and scalability and hence improve the performance of database applications in a multiuser environment.
- It helps in the prediction of concurrent query behavior.
- MVCC is more often faster than other concurrency control techniques, especially in circumstances with high read-to-write ratios.
Cons
- When there are numerous transactions under processing, MVCC will require more memory as compared to other concurrency control techniques.
- There is a risk of transaction ID wraparound. In PostgreSQL, MVCC uses transaction IDs to track committed updates to data. These transaction IDs are like counters that increase by one every time a new transaction is started. However, the IDs have a maximum value they can reach. If too many transactions happen and the ID counters reach their maximum value, they will reset to zero which can lead to errors.
Conclusion
In this article, you have learnt about the general idea of what MVCC is and how it functions in PostgreSQL by using snapshots to manage the visibility of transactions to avoid the need for locking. MVCC permits higher levels of concurrency, which reduces contention in a database application and improves performance. By understanding the principles and best practices of MVCC, developers can design more efficient and scalable applications that take full advantage of the power of PostgreSQL.
Make sure to follow the DbVisualizer Table blog to learn more about database development, and until next time.