Sunday 26 November 2023

Learning PostgreSQL - Architecture

I have started learning a new database, i.e. PostgreSQL, which is a powerful and open source relational database system. 

This is something new that I am trying, so I will start this series where I will try to publish a post every week with what I learned. Basically, these posts are nothing but notes that I am taking during the learning journey of PostgreSQL database.

This can also help those who are also in the process of learning this database system, and to those who already learned PostgreSQL, it can be a refresher.

Without wasting more time in introduction, let's start with the Architecture of the PostgreSQL. For Oracle DBAs (someone like me 😉), I will try to compare it with Oracle terminology so that it may help or speedup the learning.

PostgreSQL is a relational database management system with a client-server architecture. At server side, the PostgreSQL's processes and shared memory will work together and build an instance, which will be handling the access to the data. Client programs connect to the instance and request read and write operations. 

Let's start with Memory Architecture,

Example of Memory Architecture

Memory architecture of PostgreSQL can be divided into 2 areas:
  1. Local memory: it is allocated by each backend process for query processing. In Oracle terminology, we can consider this as Private Global Area (PGA) which is allocated per process. This memory area is further divided into sub-areas:
    • work_mem: Whenever a process uses sorting or table join operation, then the process will utilize this area.
    • maintenance_work_mem: As the name suggests, it will be used for maintenance operation (e.g. VACUUM, REINDEX)
    • temp_buffers: This is used when the process requires sorting of temporary tables
  2. Shared memory: it is allocated when the PostgreSQL instance is started, and it is used by all the processes of a PostgreSQL server.
    • Shared buffer pool: This area is used for processing of pages, PostgreSQL will load pages within the tables and indexes from datafiles and make operates on them in this area. In Oracle terminology, we can consider this as Shared Global Area (SGA).
    • WAL buffer: This buffer holds WAL data (transactional log) before writing to a persistent area. This is to ensure that no data is lost in case of server failure or crash. This is just like Redo Logs, which we have in Oracle.
    • Commit log: It keeps the state of all transactions (e.g. In Progress, Committed or Aborted) for the concurrency control.
These are important memory area that are allocated when PostgreSQL is started, apart from this there are several other areas which are allocated which handles access control mechanisms (e.g. shared and exclusive locks, lightweight locks, etc.), for background processes like checkpointer and autovacuum and to keep track of transaction processing, such as checkpoints and two-phase commit.

Now let's see how Process architecture is for PostgreSQL,
Example of Process Architecture

A collection of multiple processes cooperatively managing one database cluster is usually referred to as a ‘PostgreSQL Server’.  Below are the types of processes in PostgreSQL:
  1. postgres (server process): It is the parent process of all other processes related to database cluster management.
  2. postgres (backend process): This process handles all the queries which are issued by connected client. It communicates with the client using a single TCP connection and gets terminated when the client disconnects.
  3. Background processes: There are various background processes which are responsible for different tasks. The list of background processes is:
    • background writer: This writes dirty buffers from shared buffer pool to a persistent storage on a regular basis. (In versions 9.1 or earlier, this process was also responsible for the checkpoint) In Oracle, this is achieved by BWnn process.
    • checkpointer: Performs the checkpoint process
    • autovacuum launcher: Responsible for periodically invoking autovacuum worker process to perform vacuum.
    • WAL writer: Writes WAL data from wal buffers to persistent storage periodically, the file is usually called as WAL file. In Oracle, LGWR is responsible for flushing the redo data into the redo log file.
    • statistics collector: As the name suggests, it collects statistics information and stores in pg_stat_activity and pg_stat_database, etc. (This process is deprecated in PostgreSQL 15 onwards)
    • logger: Writes error messages into log files.
    • archiver: Executes archiving of WAL log files. (ARCn in Oracle)
    • logical replication launcher: Launches logical replication workers for every enabled subscription (Used for replication purpose, more on this in later posts)
    • wal sender: Runs on a primary server, sends WAL data to standby server.
    • wal receiver: Runs on secondary server, receives and replays the WAL data.
In the next post, we will discuss installation types and database versioning, PostgreSQL follows.

No comments :

Post a Comment