Chanler

Chanler

"Dark Horse MySQL" II. InnoDB Engine

Logical Storage Structure#

image.png|500

In the InnoDB storage engine, each row of data contains not only the values of all fields defined in the table but also several important components:

  1. 6-byte transaction ID trx id
    In InnoDB, each transaction is assigned a unique transaction ID, and each row of data contains the value of this transaction ID, which is used to implement transaction-level multi-version concurrency control (MVCC).
  2. 7-byte rollback pointer roll pointer
    To support transaction rollback operations, each row of data includes a rollback pointer that points to the corresponding record in the rollback log. When a transaction needs to roll back, the rollback pointer can quickly locate the corresponding rollback record and restore the data.
  3. Non-null column bitmap
    The non-null column bitmap is a mechanism used by the InnoDB storage engine to improve storage efficiency. Since many columns in a table often contain NULL values, the information of these columns can be represented using a bitmap, thereby compressing storage space as much as possible when storing data.
  4. Record header
    Each row of data contains a record header, which includes some metadata, such as whether the data has been deleted or locked.
  5. Version number
    In InnoDB's MVCC implementation, each row of data also contains a version number to indicate the current version of the row. When a transaction updates a row of data, it actually creates a new record based on the original row of data and distinguishes between the two records using the version number.

In summary, in addition to the values of the fields defined in the table, InnoDB rows also contain many other metadata, which play an important role in implementing transaction support, MVCC, storage efficiency, and data recovery in the InnoDB storage engine.

Architecture#

Below is the InnoDB architecture diagram, with the left side showing the memory structure and the right side showing the disk structure.

image.png|500

Memory Structure - Buffer Pool#

Operate on the data in the buffer pool and refresh it to the disk at a certain frequency to reduce disk IO.

image.png|500

The buffer pool is managed at the page level and is divided into three types:

  1. free page: unused page
  2. clean page: page that has been used but not modified
  3. dirty page: modified page that has been used, at this point the data is inconsistent with the data on the disk and needs to be flushed to the disk.

Memory Structure - Change Buffer#

Cache data changes in the change buffer. When data is read in the future, the data is merged and restored to the buffer pool, and the merged data will be flushed to the disk.

image.png|500

The significance of the change buffer:
Secondary indexes are usually non-unique and inserted randomly. The range of delete change operations is relatively large, causing a lot of disk IO. Merging data in the buffer pool can reduce disk IO.

Memory Structure - Adaptive Hash Index#

Optimize data queries on the buffer pool.

image.png|500

Memory Structure - Log Buffer#

The log buffer stores log data such as redo logs and undo logs, with a default size of 16MB, and is periodically flushed to the disk to reduce frequent disk IO.

image.png|500

Disk Structure - System Tablespace#

image.png|500

Disk Structure - File-Per-Table Tablespaces:#

image.png|500

Disk Structure - General Tablespaces#

image.png|500

Disk Structure - Undo Tablespaces Temporary Tablespaces#

image.png|500

Disk Structure - Doublewrite Buffer Files Redo Log#

image.png|500

Background Threads#

The role of background threads: flushing memory to disk.

image.png|500

Transaction Principles#

A transaction is a set of operations that form an indivisible unit of work. A transaction submits or rolls back all operations as a whole to the system, meaning these operations either all succeed or all fail.

ACID Properties

  • Atomicity: A transaction is the smallest indivisible unit of operation; it either succeeds entirely or fails entirely.
  • Consistency: When a transaction is completed, all data must remain in a consistent state.
  • Isolation: The isolation mechanism provided by the database system ensures that transactions run in an independent environment unaffected by external concurrent operations.
  • Durability: Once a transaction is committed or rolled back, its changes to the data in the database are permanent.

Atomicity - undo log
Durability - redo log
Consistency - undo log + redo log
Isolation - locks + MVCC

redo log - Durability#

When a dirty page is flushed to disk and an error occurs, data recovery is performed based on the redo log.

image.png|500

WAL (write-ahead logging) writes logs first, with logs being appended sequentially to disk IO rather than flushing dirty pages with random disk IO.

undo log - Atomicity#

The rollback log (undo log) records information before data is modified, providing rollback and MVCC.

The undo log is mainly used for rollback, so it records logical logs and the opposite operation statements, allowing rollback to read the corresponding content and roll back.

image.png|500

MVCC#

RU: Directly read the most recent data, no MVCC, no locks, nothing.
RC: Each read generates a snapshot read, allowing access to the latest committed data.
RR: The first select statement after the transaction starts generates a snapshot read, which continues to use this read view to ensure repeatable reads.
SE: Each read is a current read, which will be blocked and also locked.

Current read:
Reads the latest version of the record, ensuring that other concurrent transactions cannot modify the current record during the read, locking the read record, such as:
select..lock in share mode (shared lock), select..for update, update, insert, delete (exclusive locks) are all forms of current reads.

Snapshot read:
A simple select (without locking) is a snapshot read, which reads the visible version of the record data, which may be historical data. It is a non-blocking read without locks.

  • Read Committed: Each select generates a snapshot read.
  • Repeatable Read: The first select statement after the transaction starts is the point of snapshot read.
  • Serializable: Snapshot reads degrade to current reads.

MVCC:
Full name Multi-Version Concurrency Control, refers to maintaining multiple versions of data to ensure that read and write operations do not conflict. Snapshot reads provide a non-blocking read feature for MySQL's implementation of MVCC. The specific implementation of MVCC relies on three implicit fields in the database records, undo log, and readView.

Implementation Principle - Three Implicit Fields#

DB_TRX_ID, DB_ROLL_PTR, DB_ROW_ID

image.png|500

Implementation Principle - undo log#

The rollback log (undo log) is generated during insert, update, and delete operations, facilitating data rollback.
The undo log generated during insert is only needed during rollback and can be immediately deleted after the transaction is committed.
The undo log generated during update and delete is needed not only during rollback but also during snapshot reads and will not be immediately deleted.

image.png|500

Different transactions or the same transaction modifying the same record will lead to the generation of a version chain for that record's undo log, with the head of the chain being the latest old record and the tail being the earliest old record.

Implementation Principle - readView#

Each transaction generates a different readView when reading data.

image.png|500

readView version chain access rules

  1. First check if the current transaction trx_id equals creator_trx_id; if so, access it; if not, go to 2.
  2. Check if the current transaction trx_id is between min_trx_id and max_trx_id (maximum trx_id + 1) and is no longer active (committed); if so, access it; if not, go to 3.
  3. Find the maximum version less than min_trx_id (the earliest committed version).

image.png|500

Read Committed

In the first snapshot read of transaction 5, the version with trx_id = 2 meets the requirements of condition 2, trx_id < min_trx_id = 3, allowing read of transaction 2 which has been committed.

In the second snapshot read of transaction 5, the version with trx_id = 3 meets the requirements of condition 2, trx_id < min_trx_id = 4, allowing read of transaction 3 which has been committed.

image.png|500

Repeatable Read

The RR transaction isolation level generates a readView only during the first snapshot read, and subsequently reuses that readView.

image.png|500

This article was synchronized and updated to xLog by Mix Space. The original link is https://blog.0xling.cyou/posts/mysql/mysql-2

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.