When I read Franck's post about Variations on 1M rows insert (1): bulk insert, I thought doing much the same in PostgreSQL might be interesting. Let's start by using the same test tables, one with a primary key and the other one without:
create table DEMO ("id" int , "text" varchar(15), "number" int);
create table DEMO_PK ("id" int , "text" varchar(15), "number" int, constraint demo_pk_pk primary key (id) ) ;

postgres=# \d demo*
            Table "public.demo"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               |
 text   | character varying(15) |
 number | integer               |

           Table "public.demo_pk"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 id     | integer               | not null
 text   | character varying(15) |
 number | integer               |
Indexes:
    "demo_pk_pk" PRIMARY KEY, btree (id)

 Index "public.demo_pk_pk"
 Column |  Type   | Definition
--------+---------+------------
 id     | integer | id
primary key, btree, for table "public.demo_pk"
I am using a VirtualBox VM with Oracle Linux 7.1 (64-bit), 512 MB of memory and one vCPU for the tests:
cat /etc/oracle-release
Oracle Linux Server release 7.1
cat /proc/meminfo | head -1
MemTotal: 502612 kB
cat /proc/cpuinfo | grep proc
processor : 0
Similar to Franck's PL/SQL block, I'll use a PL/pgSQL block for the first tests:
\timing on
show autocommit;
truncate table DEMO;
DO $$DECLARE
  l_people_array varchar(15)[12] := '{"Marc", "Bill", "George", "Eliot", "Matt", "Trey", "Tracy","Greg", "Steve", "Kristina", "Katie", "Jeff"}';
  n int;
BEGIN
  for i in 0..1e6 loop
    n:=trunc(random()*1000+1);
    insert into DEMO values( i , l_people_array[floor((random()*11))+1::int] , n );
  end loop;
END$$;
(The code for the demo_pk table is exactly the same, except for the table name.) It does not do exactly the same as Franck's PL/SQL block, but it does comparable things, I believe :) Note that the loop runs from 0 to 1e6 inclusive, so it inserts 1000001 rows.
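For comparison, the same kind of data could also be generated with a single set-based statement instead of a row-by-row loop. The following is only a sketch and was not part of the timed tests in this post: the use of generate_series and the inline ARRAY constructor are my assumptions about how one might write it, and unlike the block above it picks from all twelve names.

```sql
-- Hypothetical set-based variant (not timed in this post).
-- generate_series(0, 1000000) yields 1000001 rows, matching the loop 0..1e6.
truncate table demo;

insert into demo ("id", "text", "number")
select i,
       -- random() is volatile, so it is evaluated once per generated row;
       -- floor(random()*12)::int + 1 gives a subscript from 1 to 12
       (array['Marc','Bill','George','Eliot','Matt','Trey',
              'Tracy','Greg','Steve','Kristina','Katie','Jeff'])[floor(random()*12)::int + 1],
       trunc(random()*1000 + 1)::int
from generate_series(0, 1000000) as s(i);
```

The same statement should work for demo_pk by changing the table name; how it compares in elapsed time to the loop would need to be measured.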
The PostgreSQL version I'll use is 9.4.1:
postgres=# select version();
                                                         version
--------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)
Let's start by executing the test without a primary key and thus without an index:
postgres=# \i 1mio_rows_no_pk.sql
Timing is on.
 autocommit
------------
 on
(1 row)

Time: 0.183 ms
TRUNCATE TABLE
Time: 71.339 ms
DO
Time: 6861.547 ms
Not so bad. Doing the same with a primary key and thus an index to maintain:
postgres-# \i 1mio_rows_pk.sql
Timing is on.
 autocommit
------------
 on
(1 row)

Time: 0.494 ms
TRUNCATE TABLE
Time: 37.314 ms
DO
Time: 10900.631 ms
Around 4 seconds more (10.9 seconds compared to 6.9 seconds), but again: not so bad. As there is no "bulk collect" or "bulk insert" in PostgreSQL, I cannot do the same tests as Franck. But PostgreSQL has a copy command, so let's look at that. First I'll unload the table to a flat file so I have something to load:
postgres=# copy demo to '/home/postgres/demo.txt';
COPY 1000001
Time: 239.161 ms
Believe it or not: it took about 240 ms to unload 1000001 rows from a table to a flat file. Now let's do the opposite and load the data back into the table using the copy command. The script used for this looks like this for both variations (with and without primary key):
\timing on
show autocommit;
truncate table demo;
copy DEMO from '/home/postgres/demo.txt';
First the one without the primary key:
postgres=# \i 1mio_rows_no_pk_copy.sql
Timing is on.
 autocommit
------------
 on
(1 row)

Time: 0.663 ms
TRUNCATE TABLE
Time: 32.026 ms
COPY 1000001
Time: 1877.480 ms
The second one with the primary key:
postgres=# \i 1mio_rows_pk_copy.sql
Timing is on.
TRUNCATE TABLE
Time: 33.370 ms
COPY 1000001
Time: 6227.844 ms
This makes a big difference: about 1.9 seconds compared to 6.2 seconds. But it is still amazingly fast. For those who know PostgreSQL a little and might wonder about fsync:
postgres=# show fsync;
 fsync
-------
 on
(1 row)
If I turn this off and rerun both tests, first without and then with the primary key:
postgres=# alter system set fsync='off';
ALTER SYSTEM
Time: 62.066 ms
postgres=# select pg_reload_conf();
LOG: received SIGHUP, reloading configuration files
LOG: parameter "fsync" changed to "off"
 pg_reload_conf
----------------
 t
(1 row)

Time: 7.253 ms
postgres=# show fsync;
 fsync
-------
 off
(1 row)

Time: 0.601 ms
postgres=# \i 1mio_rows_no_pk_copy.sql
Timing is on.
 autocommit
------------
 on
(1 row)

Time: 0.277 ms
TRUNCATE TABLE
Time: 10.064 ms
COPY 1000001
Time: 611.455 ms
postgres=# \i 1mio_rows_pk_copy.sql
Timing is on.
TRUNCATE TABLE
Time: 11.768 ms
COPY 1000001
Time: 4674.273 ms
Around 0.6 seconds without the primary key and around 4.7 seconds with the primary key. But remember that you might lose data if fsync is off and the server crashes during the load.
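Because fsync was switched off with alter system, the setting stays in place until it is removed again. A minimal sketch of how it could be switched back on after the test, assuming the same superuser session and that nothing else in the configuration overrides fsync:

```sql
-- Remove the fsync override that "alter system set" wrote;
-- the value then falls back to postgresql.conf or the built-in default
alter system reset fsync;

-- fsync can be changed with a configuration reload, no restart required
select pg_reload_conf();

-- Check the value that is now active
show fsync;
```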
Let's see what Franck will do in the next post of his series.