Skip to main content

Optimal data alignment in tables (PG110)

Alignment and Padding in PostgreSQL

When creating a new table, columns of different data types take up different amounts of disk space. The order in which the columns are written determines the amount of space that is used by the rows in the table. Postgres may add padding to columns with smaller sizes to fulfill alignment requirements of the data type of the consecutive column with a larger size.

For example, let’s consider a table with four smallint and one bigint type columns. When a column with a smaller size, such as a smallint (2 bytes), has a consecutive column with a larger size, such as bigint (8 bytes), Postgres adds 6 bytes of padding to the column of type smallint in order to fulfill an alignment requirement of 8 bytes for the consecutive bigint column. However, the space used in padding (6 bytes) can be utilized by rearranging the three other smallint type columns from the table before the bigint type column:

wrong

correct

info

The catalog pg_type stores information about alignment requirements for various data types. You can access the typealign column of this catalog table in order to determine alignment requirements for data types. To learn more about the catalog pg_type and interpretation of the typealign column, visit the official documentation here.

Save up on precious space with optimal data alignment

Creating a table with optimal data alignment can help minimize the amount of required disk space. This document describes how we can save space in a database just by changing the order of columns. Let’s see it in action.

Example

Consider the following table on a 64-bit system.

CREATE TABLE users (
user_id bigint PRIMARY KEY,
active_status boolean,
data_balance integer,
plan_period smallint
);

Here is what a portion of the table might look like after inserting values:

SELECT
*
FROM
users
Output
 user_id | active_status | data_balance | plan_period
---------+---------------+--------------+-------------
0 | f | 731147 | 267
1 | t | 901448 | 26
2 | t | 496823 | 24
3 | t | 985877 | 280
4 | f | 857124 | 83
.
.
.
1999995 | f | 993259 | 83
1999996 | f | 926033 | 332
1999997 | f | 889133 | 280
1999998 | t | 943793 | 7
1999999 | t | 550192 | 146

Each tuple in the table takes up 24 bytes of total memory without the header. The user_id occupies the first 8 bytes.

Here, the active_status column is of type boolean and takes up only one byte. However, because the next consecutive column data_balance is an integer with size of 4 bytes, PostgreSQL automatically adds 3 bytes of padding to active_status so its size can match with the 4 bytes size of the consecutive data_balance column.

The second 8 bytes of the row are occupied by 1 byte of active_status, 3 bytes of padding, and then 4 bytes of data_balance.

Lastly, the plan_period column has a size of only 2 bytes. However, PostgreSQL adds 6 bytes of padding in order to complete the 8 bytes for the end of the row. The following table displays the data:

ColumnTypeSizePadding
user_idbigint8 bytesNone
active_statusboolean1 byte3 bytes
data_balanceinteger4 bytesNone
plan_periodsmallint2 bytes6 bytes

example1

Now, let’s see how much space the table takes with the following command:

SELECT pg_size_pretty(pg_relation_size('users'));
Output
 pg_size_pretty
----------------
100 MB
(1 row)

Our table occupies 100 MB of space. Let’s see how we can optimize the occupied space by rearranging the columns.

Let’s create the same table with the rearranged rows as follows:

CREATE TABLE users_optimized (
user_id bigint PRIMARY KEY,
data_balance integer,
plan_period smallint,
active_status boolean
);

Here is what a portion of the table might look like after inserting values:

SELECT
*
FROM
users_optimized
Output
 user_id | data_balance | plan_period | active_status
---------+--------------+-------------+---------------
0 | 730878 | 267 | f
1 | 410081 | 329 | t
2 | 207715 | 181 | t
3 | 332717 | 360 | f
4 | 967756 | 313 | t
.
.
.
1999995 | 692890 | 363 | t
1999996 | 878947 | 338 | f
1999997 | 525196 | 325 | f
1999998 | 170783 | 344 | t
1999999 | 97850 | 201 | t

Now, let’s see how much space the table takes with the following command:

SELECT pg_size_pretty(pg_relation_size(users_optimized));
Output
 pg_size_pretty
----------------
84 MB
(1 row)

Awesome! We have saved almost 16 MB of space (which is also 16% in our example), just by changing the order of the columns.

Let’s understand how that happened:

The user_id occupies the first 8 bytes. Next, data_balance is an integer that takes up 4 bytes, followed by the plan_period column which has a size of only 2 bytes, followed by 1 byte of active_status and then 1 byte of padding for the end of the row.

In this case, each tuple in the table takes 16 bytes of total memory without the header, and only 1 byte of padding is used, in contrast to 24 bytes of total memory with 9 bytes of total padding involved in our previous example. Here is the data:

ColumnTypeSizePadding
user_idbigint8 bytesNone
data_balanceinteger4 bytesNone
plan_periodsmallint2 bytesNone
active_statusboolean1 byte1 byte

example2

Determining optimal data alignment for a table is easy with Atlas CI

As we have observed in the previous section, creating a table with non-optimal data alignment can cost more space, and thus more money. At the same time, calculating the disk space and determining the best arrangement of columns manually can be confusing and error-prone.

With the help of Atlas CI, you can now easily check whether your table alignment is optimal. If the alignment is not optimal, Atlas CI will suggest the best possible order of columns to make the alignment optimal and thus help you save maximum space (and money!).

info

Teams using GitHub that wish to ensure all changes to their database schema are safe can use the Atlas GitHub Action. If you’re just getting started, you can visit the documentation to set-up Atlas CI GitHub action

Let’s create a migration file in Atlas format with the following SQL command:

CREATE TABLE users (
user_id bigint PRIMARY KEY,
active_status boolean,
data_balance integer,
plan_period smallint
);

Once the Atlas CI check is completed, you should see the following code suggestion in your pull request:

atlasci

Awesome! The Atlas CI check has analyzed the migration file and suggested the best possible arrangement of columns in order to optimize the space taken by padding per row.

info

Atlas CI provides different Analyzer implementations that are useful for determining the safety of migration scripts. To learn more about other checks currently supported by Atlas CI, visit the documentation about analyzers here.

Conclusion

In this section, we learned about how we can reduce disk space costs without removing any data, just by arranging the table columns in an efficient order to reduce padding space.

Additionally, we have also learned how using Atlas CI can help us determine the best optimal order of columns in a table, removing the hassle of manually calculating the disk cost before and after rearranging columns.

Need More Help?​

Join the Ariga Discord Server for early access to features and the ability to provide exclusive feedback that improves your Database Management Tooling.

Sign up to our newsletter to stay up to date about Atlas, our OSS database schema management tool, and our cloud platform Atlas Cloud.