Wednesday 20 December 2023

Learning PostgreSQL - Database Layouts (Logical and Physical)

As you might already know that I have started writing about my journey in learning of PostgreSQL as posts, which are sort of notes to me and professionals like me. If you are following along with the series, then there are 2 posts which I have already written about Architecture and Installation of PostgreSQL database.

In this post, I will share Logical and Physical structure of the PostgreSQL database. This will be an overview of the database layouts, we will go deep like discussing internal layout of a page in future posts.

Logical Structure of Database cluster

In other RDBMS databases, the clusters usually mean a collection of database servers, but in PostgreSQL it is a collection of databases managed by a PostgreSQL server. The below figure will illustrate what do I mean by this:

Let's breakdown the figure in points that we can understand:
  • Database cluster: As I mentioned earlier in this post, in PostgreSQL, this means collection of databases.
  • Databases: In simple terms, a database is a collection of database objects like tables, indexes, etc.

If you are from a database background then you will know the other terms in the figure, like Table/Relation, Indexes, Views, Sequences, Functions, etc. Which are database objects.

All the database objects in PostgreSQL are internally managed by respective object identifiers (OID), in Oracle terms we call it as Object ID. The OID in PostgreSQL is unsigned 4-byte integers. In PostgreSQL, a database will also have an OID. Like Oracle, in PostgreSQL we have data dictionary which we call system catalogs. This will also be discussed in a separate post, as this is a different topic altogether.

Physical Structure of Database cluster

In Oracle, we have our datafiles stored at filesystem level either in OMF (Oracle Managed Filesystem) or NON-OMF, and if you are using ASM (Automatic Storage Management) then it will be stored in an integrated file system and volume manager build specifically for Oracle database files.

The configuration and data files used by PostgreSQL database cluster (group of databases) are stored together within a directory, which is called data directory. This is commonly referred as PGDATA (this is basically a name of environmental variable that used to define data directory). Common location for PGDATA is /var/lib/pgsql/data.

The above figure gives you a glimpse of what a data directory could look like, this does not contain all the folders that are present. So let's start discussing the folders and files one by one, first we will start with files:

  • PG_VERSION: Contains the major version of PostgreSQL. This is something like Oracle's oratab which contains the database name and the ORACLE_HOME they are running from.
  • pg_hba.conf: This file control the authentication to the PostgreSQL databases.
  • pg_ident.conf: Used to control PostgreSQL's username mapping.
  • postgresql.conf: This file will have all the configuration parameters and their values. This is like Oracle's init.ora but unlike init.ora, it has all the parameters that are present for the version with their default values.
  • postgresql.auto.conf: When we set some dynamic parameters using ALTER SYSTEM (version 9.4+), this file will be updated.
  • postmater.opts: Contains information about how postgres server process was started.

This is just an overview of what these files are used for, I will make a separate post to discuss these files in more detail. Now lets discuss the folders present in the data directory.

  • base/: This contains subdirectories containing per-database subdirectories, which are used to store database objects.
  • current_logfiles/: Has log file(s) which are written by the logging collector. 
  • global/: Contains cluster-wide tables, like pg_database and pg_control
  • pg_commit_ts/: Has transaction commit timestamp data (versions 9.5+)
  • pg_xact/: Contains transaction commit state data, this folder was introduced in version 10+, earlier this subdirectory was called as pg_clog.
  • pg_dynshmem/: Contains files used by the dynamic shared memory subsystem.
  • pg_logical/: Subdirectory containing status of data for logical decoding.
  • pg_multixact/: Contains multitransaction status data which is used for shared row locks
  • pg_repslot/: Has replication slot data
  • pg_stat/: Holds permanent files for statistics subsystem
  • pg_tblspc/: Has symbolic links to tablespaces
  • pg_wal/: This subdirectory keeps segments files for WAL (Write Ahead Logging).
The same layout has been described in the official document, you can also read that document for more information if you want to have on some of the folders.

I know you might have been overwhelmed with information about configuration files/folders, but don't worry, the important files and folders will be having a separate post in which it will be discussed in great detail.

This is all for this post, I will write a new post covering either configuration file information or some other interesting topic about PostgreSQL.

No comments :

Post a Comment