PostgreSQL is already very fast with loading loading large amounts of data. You can follow this post to get some recommendations for loading data as fast as possible. In addition you can create unlogged tables, but this on the table level and not the whole cluster. With this patch there will be another option: wal_level=none. With this, only minimal WAL is written, but of course at the cost of losing durability. If the cluster crashes in that mode, the whole cluster is corrupted and can not anymore be started. If you accept that risks, this can be something for you, especially when do data warehousing and load time is one of the most important factors.
To have a baseline to start with lets create a simple file we’ll use for loading a table:
postgres=# create table t1 ( a int, b text, c date ); CREATE TABLE postgres=# insert into t1 select x, md5(x::text), now() from generate_series(1,6000000) x; INSERT 0 6000000 postgres=# copy t1 to '/var/tmp/demo.txt'; COPY 6000000 postgres=# \! ls -lha /var/tmp/demo.txt -rw-r--r--. 1 postgres postgres 297M Nov 23 15:51 /var/tmp/demo.txt postgres=#
My current wal_level is replica, so lets change that to minimal:
postgres=# alter system set wal_level = minimal; ALTER SYSTEM postgres=# \! pg_ctl restart -m fast waiting for server to shut down.... done server stopped waiting for server to start....2020-11-23 15:53:23.424 CET - 1 - 209537 - - @ LOG: redirecting log output to logging collector process 2020-11-23 15:53:23.424 CET - 2 - 209537 - - @ HINT: Future log output will appear in directory "pg_log". done server started postgres=# show wal_level; FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. postgres=# show wal_level; wal_level ----------- minimal (1 row)
How long does it take to load that file into a new table with wal_level=minimal and how much WAL was generated?
postgres=# create table t2 ( like t1 ); CREATE TABLE postgres=# \timing Timing is on. postgres=# select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/39872628 (1 row) Time: 0.757 ms postgres=# copy t2 from '/var/tmp/demo.txt'; COPY 6000000 Time: 10008.335 ms (00:10.008) postgres=# select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/4C693DD8 (1 row) Time: 0.857 ms postgres=# select pg_wal_lsn_diff('0/4C693DD8','0/39872628'); pg_wal_lsn_diff ----------------- 316807088 (1 row) Time: 2.714 ms
The time is around 10 second and we have generated around 316MB of WAL. How does that change if we go with wal_level=none?
ALTER SYSTEM Time: 28.625 ms postgres=# \! pg_ctl restart -m fast waiting for server to shut down.... done server stopped waiting for server to start....2020-11-23 16:00:25.648 CET - 1 - 209599 - - @ LOG: redirecting log output to logging collector process 2020-11-23 16:00:25.648 CET - 2 - 209599 - - @ HINT: Future log output will appear in directory "pg_log". done server started postgres=# show wal_level; FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. Time: 21.286 ms postgres=# show wal_level; wal_level ----------- none (1 row) Time: 1.251 ms
Same test as above:
postgres=# create table t3 ( like t1 ); CREATE TABLE Time: 44.676 ms postgres=# select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/4CA0A550 (1 row) Time: 7.053 ms postgres=# copy t3 from '/var/tmp/demo.txt'; COPY 6000000 Time: 7968.204 ms (00:07.968) postgres=# select pg_current_wal_lsn(); pg_current_wal_lsn -------------------- 0/4CA0A550 (1 row) Time: 0.948 ms postgres=# select pg_wal_lsn_diff('0/4CA0A550','0/4CA0A550'); pg_wal_lsn_diff ----------------- 0 (1 row) Time: 3.857 ms
We come down to 7 seconds and no WAL generated at all. That means faster loading and no space consumption in the pg_wal directory. Really nice, but be aware that the cluster gets corrupted when it crashes during loading:
postgres=# \! ps -ef | grep "postgres -D" postgres 209599 1 0 16:00 ? 00:00:00 /u01/app/postgres/product/DEV/db_1/bin/postgres -D /u02/pgdata/DEV postgres 209644 209534 0 16:04 pts/1 00:00:00 sh -c ps -ef | grep "postgres -D" postgres 209646 209644 0 16:04 pts/1 00:00:00 grep postgres -D postgres=# create table t4 ( like t1 ); CREATE TABLE Time: 3.731 ms postgres=# copy t4 from '/var/tmp/demo.txt'; COPY 6000000 Time: 8070.995 ms (00:08.071)
In another session kill the postmaster while the load is running:
postgres@centos8pg:/home/postgres/ [pgdev] kill -9 209599
If you try to restart the cluster afterwards this is the result:
2020-11-23 16:05:17.441 CET - 1 - 210089 - - @ LOG: database system was interrupted; last known up at 2020-11-23 16:00:25 CET 2020-11-23 16:05:17.441 CET - 2 - 210089 - - @ FATAL: detected an unexpected server shutdown when WAL logging was disabled 2020-11-23 16:05:17.441 CET - 3 - 210089 - - @ HINT: It looks like you need to deploy a new cluster from your full backup again. 2020-11-23 16:05:17.444 CET - 7 - 210087 - - @ LOG: startup process (PID 210089) exited with exit code 1 2020-11-23 16:05:17.444 CET - 8 - 210087 - - @ LOG: aborting startup due to startup process failure 2020-11-23 16:05:17.449 CET - 9 - 210087 - - @ LOG: database system is shut dow
If you can accept that, this can be a huge step to load data faster than it is possible now.
Cet article Even faster data loading with PostgreSQL 14? wal_level=none est apparu en premier sur Blog dbi services.