AWS RDS for PostgreSQL comes with an extension that allows you to fetch data from AWS S3 and to write data back to AWS S3. The use case for this is obvious: either you use other AWS services that write data to S3 and you want to further process that data in PostgreSQL, or you want other AWS services to consume data from PostgreSQL by providing that data in S3. Let’s have a look at how that works.
The extension AWS provides for working with S3 from inside PostgreSQL is called “aws_s3”:
postgres=> select * from pg_available_extensions where name like '%aws%';
    name     | default_version | installed_version |                   comment
-------------+-----------------+-------------------+---------------------------------------------
 aws_commons | 1.0             |                   | Common data types across AWS services
 aws_s3      | 1.0             |                   | AWS S3 extension for importing data from S3
(2 rows)
If you try to install the extension you’ll notice that there is a dependency on the “aws_commons” extension:
postgres=> create extension aws_s3;
ERROR:  required extension "aws_commons" is not installed
HINT:  Use CREATE EXTENSION ... CASCADE to install required extensions too.
You can install both extensions in one step using the “CASCADE” option:
postgres=> create extension aws_s3 cascade;
NOTICE:  installing required extension "aws_commons"
CREATE EXTENSION
These extensions provide a couple of helper functions (aws_commons) and the function to import a file from S3 (aws_s3):
postgres=> \dx+ aws_commons
              Objects in extension "aws_commons"
                      Object description
-------------------------------------------------------------
 function aws_commons.create_aws_credentials(text,text,text)
 function aws_commons.create_s3_uri(text,text,text)
 schema aws_commons
 type aws_commons._aws_credentials_1
 type aws_commons._s3_uri_1
(5 rows)

postgres=> \dx+ aws_s3
                                          Objects in extension "aws_s3"
                                              Object description
-----------------------------------------------------------------------------------------------------------
 function aws_s3.table_import_from_s3(text,text,text,aws_commons._s3_uri_1,aws_commons._aws_credentials_1)
 function aws_s3.table_import_from_s3(text,text,text,text,text,text,text,text,text)
 schema aws_s3
(3 rows)
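The credentials helper from aws_commons is not needed in this post because we will work with an IAM role below, but for completeness this is roughly how it would be used (the key values are placeholders, the third parameter is the session token and can stay empty when you are not using temporary credentials):

SELECT aws_commons.create_aws_credentials( 'AKIA...'   -- access key id (placeholder)
                                         , '...'       -- secret access key (placeholder)
                                         , ''          -- session token, empty without temporary credentials
                                         ) AS credentials \gset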
Having the extensions ready we need a file we can import, so let’s create one (exactly the same file as in the previous post, but with fewer rows):
dwe@dwe:~/Downloads$ cat gen_data.sh
#!/bin/bash

FILE="/home/dwe/Downloads/sample.csv"
rm -rf ${FILE}

for i in {1..1000000}; do
    echo "${i},firstname${i},lastname${i},xxx${i}@xxx.com,street${i},country${i},description${i}" >> ${FILE}
done

dwe@dwe:~/Downloads$ chmod +x gen_data.sh
dwe@dwe:~/Downloads$ ./gen_data.sh
dwe@dwe:~/Downloads$ head -5 sample.csv
1,firstname1,lastname1,xxx1@xxx.com,street1,country1,description1
2,firstname2,lastname2,xxx2@xxx.com,street2,country2,description2
3,firstname3,lastname3,xxx3@xxx.com,street3,country3,description3
4,firstname4,lastname4,xxx4@xxx.com,street4,country4,description4
5,firstname5,lastname5,xxx5@xxx.com,street5,country5,description5
dwe@dwe:~/Downloads$ ls -lha sample.csv
-rw-rw-r-- 1 dwe dwe 96M Nov 10 11:11 sample.csv
We’ll be using a new bucket for this demo, so let’s create one and then upload the file we just generated:
dwe@dwe:~/Downloads$ aws s3 mb s3://s3-rds-demo --region eu-central-1
make_bucket: s3-rds-demo
dwe@dwe:~/Downloads$ aws s3 cp sample.csv s3://s3-rds-demo/
upload: ./sample.csv to s3://s3-rds-demo/sample.csv
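If you want to double check that the file really made it into the bucket, a simple listing does the job:

aws s3 ls s3://s3-rds-demo/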
Before we can do anything against S3 from RDS for PostgreSQL we need to set up the required permissions. You can use security credentials for this, but it is recommended to use IAM roles and policies. The first step is to create a policy that allows listing the bucket as well as reading and writing objects (write access is required for writing data to S3 later on):
dwe@dwe:~$ aws iam create-policy \
>    --policy-name rds-s3-policy \
>    --policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Sid": "s3import",
>          "Action": [
>            "s3:GetObject",
>            "s3:ListBucket",
>            "s3:PutObject"
>          ],
>          "Effect": "Allow",
>          "Resource": [
>            "arn:aws:s3:::s3-rds-demo",
>            "arn:aws:s3:::s3-rds-demo/*"
>          ]
>        }
>      ]
>    }'
{
    "Policy": {
        "PolicyName": "rds-s3-policy",
        "PolicyId": "ANPA2U57KX3NFH4HU4COG",
        "Arn": "arn:aws:iam::xxxxxxxx:policy/rds-s3-policy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-11-10T12:04:34+00:00",
        "UpdateDate": "2020-11-10T12:04:34+00:00"
    }
}
Once the policy is in place we create an IAM role that RDS is allowed to assume; the policy created above will be attached to this role in the next step:
dwe@dwe:~$ aws iam create-role \
>    --role-name rds-s3-role \
>    --assume-role-policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Effect": "Allow",
>          "Principal": {
>            "Service": "rds.amazonaws.com"
>          },
>          "Action": "sts:AssumeRole"
>        }
>      ]
>    }'
{
    "Role": {
        "Path": "/",
        "RoleName": "rds-s3-role",
        "RoleId": "AROA2U57KX3NP2XWVCELI",
        "Arn": "arn:aws:iam::xxxxxxxxxx:role/rds-s3-role",
        "CreateDate": "2020-11-10T12:07:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "rds.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}
Attaching the policy to the role (you will need the ARN of the policy from above):
dwe@dwe:~$ aws iam attach-role-policy \
>    --policy-arn arn:aws:iam::xxxxxxxxxx:policy/rds-s3-policy \
>    --role-name rds-s3-role
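If you want to confirm that the policy is now attached to the role, you can list the attached policies:

aws iam list-attached-role-policies --role-name rds-s3-role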
Finally you need to attach the IAM role to the RDS instance by providing the ARN of the role and the identifier of your RDS instance:
aws rds add-role-to-db-instance \
   --db-instance-identifier dwe-postgres-helvetia \
   --feature-name s3Import \
   --role-arn arn:aws:iam::xxxxxxxx:role/rds-s3-role \
   --region eu-central-1
Your RDS instance needs to be running to do that, otherwise you’ll get this:
An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The status for the dwe-postgres DB instance is stopped. The DB instance is not available for s3Import feature.
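Once the instance is running and the role has been added, you can verify the association on the instance (a quick check; the query path follows the structure of the describe-db-instances output):

aws rds describe-db-instances \
   --db-instance-identifier dwe-postgres-helvetia \
   --region eu-central-1 \
   --query 'DBInstances[0].AssociatedRoles'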
Having the IAM role attached to the RDS instance we can load the CSV file, but first the S3 URI needs to be defined (we do not want to use access keys and credentials):
postgres=> SELECT aws_commons.create_s3_uri('s3-rds-demo'
postgres(>                                 ,'sample.csv'
postgres(>                                 ,'eu-central-1'
postgres(>                                 ) AS s3_uri \gset
postgres=> select :'s3_uri';
               ?column?
---------------------------------------
 (s3-rds-demo,sample.csv,eu-central-1)
(1 row)
Now we are ready to load the file:
postgres=> create table sample ( id int primary key
postgres(>                     , firstname varchar(20)
postgres(>                     , lastname varchar(20)
postgres(>                     , email varchar(20)
postgres(>                     , street varchar(20)
postgres(>                     , country varchar(20)
postgres(>                     , description varchar(20)
postgres(>                     );
CREATE TABLE
postgres=> SELECT aws_s3.table_import_from_s3 ( 'sample', '', '(format csv)', :'s3_uri' );
                                 table_import_from_s3
---------------------------------------------------------------------------------------
 1000000 rows imported into relation "sample" from file sample.csv of 100222272 bytes
(1 row)

postgres=> select * from sample limit 5;
 id |  firstname  |  lastname  |     email     |  street  |  country  |  description
----+-------------+------------+---------------+----------+-----------+---------------
 77 | firstname77 | lastname77 | xxx77@xxx.com | street77 | country77 | description77
 78 | firstname78 | lastname78 | xxx78@xxx.com | street78 | country78 | description78
 79 | firstname79 | lastname79 | xxx79@xxx.com | street79 | country79 | description79
  1 | firstname1  | lastname1  | xxx1@xxx.com  | street1  | country1  | description1
  2 | firstname2  | lastname2  | xxx2@xxx.com  | street2  | country2  | description2
(5 rows)
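As a side note: the second signature of aws_s3.table_import_from_s3 listed in the extension objects above does not use the helper types but takes bucket, file path, region and credentials as plain text. A minimal sketch of that variant (the key values are placeholders, the last parameter is the session token):

SELECT aws_s3.table_import_from_s3 ( 'sample'          -- target table
                                   , ''                -- column list, empty imports all columns
                                   , '(format csv)'    -- options passed to COPY
                                   , 's3-rds-demo'     -- bucket
                                   , 'sample.csv'      -- file path in the bucket
                                   , 'eu-central-1'    -- region
                                   , 'AKIA...'         -- access key id (placeholder)
                                   , '...'             -- secret access key (placeholder)
                                   , ''                -- session token
                                   );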
And we’re done. The follow-up post will show the opposite: writing back to S3 from RDS for PostgreSQL.