AWS RDS for PostgreSQL comes with an extension that allows you to fetch data from AWS S3 and to write data back to AWS S3. The use case for this is obvious: either you use other AWS services that write data to S3 and you want to further process that data in PostgreSQL, or you want other AWS services to consume data from PostgreSQL by providing that data in S3. Let’s have a look at how that works.
The extension AWS provides for working with S3 from inside PostgreSQL is called “aws_s3”:
postgres=> select * from pg_available_extensions where name like '%aws%';
    name     | default_version | installed_version |                   comment
-------------+-----------------+-------------------+---------------------------------------------
 aws_commons | 1.0             |                   | Common data types across AWS services
 aws_s3      | 1.0             |                   | AWS S3 extension for importing data from S3
(2 rows)
If you try to install the extension you’ll notice that there is a dependency on the “aws_commons” extension:
postgres=> create extension aws_s3;
ERROR:  required extension "aws_commons" is not installed
HINT:  Use CREATE EXTENSION ... CASCADE to install required extensions too.
You can install both extensions in one step using the “CASCADE” option:
postgres=> create extension aws_s3 cascade;
NOTICE:  installing required extension "aws_commons"
CREATE EXTENSION
These extensions provide a couple of helper functions (aws_commons) and the function to import a file from S3 (aws_s3):
postgres=> \dx+ aws_commons
              Objects in extension "aws_commons"
                      Object description
-------------------------------------------------------------
 function aws_commons.create_aws_credentials(text,text,text)
 function aws_commons.create_s3_uri(text,text,text)
 schema aws_commons
 type aws_commons._aws_credentials_1
 type aws_commons._s3_uri_1
(5 rows)

postgres=> \dx+ aws_s3
                                          Objects in extension "aws_s3"
                                              Object description
-----------------------------------------------------------------------------------------------------------
 function aws_s3.table_import_from_s3(text,text,text,aws_commons._s3_uri_1,aws_commons._aws_credentials_1)
 function aws_s3.table_import_from_s3(text,text,text,text,text,text,text,text,text)
 schema aws_s3
(3 rows)
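The credentials helper from aws_commons is not needed in this post because we will work with an IAM role below, but for completeness this is roughly how it would be used (the key values are placeholders, the third parameter is the session token and can stay empty when you are not using temporary credentials):

SELECT aws_commons.create_aws_credentials( 'AKIA...'   -- access key id (placeholder)
                                         , '...'       -- secret access key (placeholder)
                                         , ''          -- session token, empty without temporary credentials
                                         ) AS credentials \gset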
Having the extensions ready we need a file we can import, so let’s create one (exactly the same file as in the previous post, but with fewer rows):
dwe@dwe:~/Downloads$ cat gen_data.sh
#!/bin/bash

FILE="/home/dwe/Downloads/sample.csv"
rm -rf ${FILE}

for i in {1..1000000}; do
    echo "${i},firstname${i},lastname${i},xxx${i}@xxx.com,street${i},country${i},description${i}" >> ${FILE}
done

dwe@dwe:~/Downloads$ chmod +x gen_data.sh
dwe@dwe:~/Downloads$ ./gen_data.sh
dwe@dwe:~/Downloads$ head -5 sample.csv
1,firstname1,lastname1,xxx1@xxx.com,street1,country1,description1
2,firstname2,lastname2,xxx2@xxx.com,street2,country2,description2
3,firstname3,lastname3,xxx3@xxx.com,street3,country3,description3
4,firstname4,lastname4,xxx4@xxx.com,street4,country4,description4
5,firstname5,lastname5,xxx5@xxx.com,street5,country5,description5
dwe@dwe:~/Downloads$ ls -lha sample.csv
-rw-rw-r-- 1 dwe dwe 96M Nov 10 11:11 sample.csv
We’ll be using a new bucket for this demo, so let’s create one and then upload the file we just generated:
dwe@dwe:~/Downloads$ aws s3 mb s3://s3-rds-demo --region eu-central-1
make_bucket: s3-rds-demo
dwe@dwe:~/Downloads$ aws s3 cp sample.csv s3://s3-rds-demo/
upload: ./sample.csv to s3://s3-rds-demo/sample.csv
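If you want to double check that the file really made it into the bucket, a simple listing does the job:

aws s3 ls s3://s3-rds-demo/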
Before we can do anything against S3 from RDS for PostgreSQL we need to set up the required permissions. You can use security credentials for this, but it is recommended to use IAM roles and policies. The first step is to create a policy that allows listing the bucket as well as reading and writing objects (write access is required for writing data to S3 later on):
dwe@dwe:~$ aws iam create-policy \
>    --policy-name rds-s3-policy \
>    --policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Sid": "s3import",
>          "Action": [
>            "s3:GetObject",
>            "s3:ListBucket",
>            "s3:PutObject"
>          ],
>          "Effect": "Allow",
>          "Resource": [
>            "arn:aws:s3:::s3-rds-demo",
>            "arn:aws:s3:::s3-rds-demo/*"
>          ]
>        }
>      ]
>    }'
{
    "Policy": {
        "PolicyName": "rds-s3-policy",
        "PolicyId": "ANPA2U57KX3NFH4HU4COG",
        "Arn": "arn:aws:iam::xxxxxxxx:policy/rds-s3-policy",
        "Path": "/",
        "DefaultVersionId": "v1",
        "AttachmentCount": 0,
        "PermissionsBoundaryUsageCount": 0,
        "IsAttachable": true,
        "CreateDate": "2020-11-10T12:04:34+00:00",
        "UpdateDate": "2020-11-10T12:04:34+00:00"
    }
}
Once the policy is in place we create an IAM role that RDS is allowed to assume; the policy created above will be attached to this role in the next step:
dwe@dwe:~$ aws iam create-role \
>    --role-name rds-s3-role \
>    --assume-role-policy-document '{
>      "Version": "2012-10-17",
>      "Statement": [
>        {
>          "Effect": "Allow",
>          "Principal": {
>            "Service": "rds.amazonaws.com"
>          },
>          "Action": "sts:AssumeRole"
>        }
>      ]
>    }'
{
    "Role": {
        "Path": "/",
        "RoleName": "rds-s3-role",
        "RoleId": "AROA2U57KX3NP2XWVCELI",
        "Arn": "arn:aws:iam::xxxxxxxxxx:role/rds-s3-role",
        "CreateDate": "2020-11-10T12:07:20+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "rds.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        }
    }
}
Attaching the policy to the role (you will need the ARN of the policy from above):
dwe@dwe:~$ aws iam attach-role-policy \
>    --policy-arn arn:aws:iam::xxxxxxxxxx:policy/rds-s3-policy \
>    --role-name rds-s3-role
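If you want to confirm that the policy is now attached to the role, you can list the attached policies:

aws iam list-attached-role-policies --role-name rds-s3-role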
Finally you need to attach the IAM role to the RDS instance by providing the ARN of the role and the identifier of your RDS instance:
aws rds add-role-to-db-instance \
   --db-instance-identifier dwe-postgres-helvetia \
   --feature-name s3Import \
   --role-arn arn:aws:iam::xxxxxxxx:role/rds-s3-role \
   --region eu-central-1
Your RDS instance needs to be running to do that, otherwise you’ll get this:
An error occurred (InvalidDBInstanceState) when calling the AddRoleToDBInstance operation: The status for the dwe-postgres DB instance is stopped. The DB instance is not available for s3Import feature.
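Once the instance is running and the role has been added, you can verify the association on the instance (a quick check; the query path follows the structure of the describe-db-instances output):

aws rds describe-db-instances \
   --db-instance-identifier dwe-postgres-helvetia \
   --region eu-central-1 \
   --query 'DBInstances[0].AssociatedRoles'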
Having the IAM role attached to the RDS instance we can load the CSV file, but first the S3 URI needs to be defined (we do not want to use access keys and credentials):
postgres=> SELECT aws_commons.create_s3_uri('s3-rds-demo'
postgres(>                                 ,'sample.csv'
postgres(>                                 ,'eu-central-1'
postgres(>                                 ) AS s3_uri \gset
postgres=> select :'s3_uri';
               ?column?
---------------------------------------
 (s3-rds-demo,sample.csv,eu-central-1)
(1 row)
Now we are ready to load the file:
postgres=> create table sample ( id int primary key
postgres(>                     , firstname varchar(20)
postgres(>                     , lastname varchar(20)
postgres(>                     , email varchar(20)
postgres(>                     , street varchar(20)
postgres(>                     , country varchar(20)
postgres(>                     , description varchar(20)
postgres(>                     );
CREATE TABLE
postgres=> SELECT aws_s3.table_import_from_s3 ( 'sample', '', '(format csv)', :'s3_uri' );
                                 table_import_from_s3
---------------------------------------------------------------------------------------
 1000000 rows imported into relation "sample" from file sample.csv of 100222272 bytes
(1 row)

postgres=> select * from sample limit 5;
 id |  firstname  |  lastname  |     email     |  street  |  country  |  description
----+-------------+------------+---------------+----------+-----------+---------------
 77 | firstname77 | lastname77 | xxx77@xxx.com | street77 | country77 | description77
 78 | firstname78 | lastname78 | xxx78@xxx.com | street78 | country78 | description78
 79 | firstname79 | lastname79 | xxx79@xxx.com | street79 | country79 | description79
  1 | firstname1  | lastname1  | xxx1@xxx.com  | street1  | country1  | description1
  2 | firstname2  | lastname2  | xxx2@xxx.com  | street2  | country2  | description2
(5 rows)
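As a side note: the second signature of aws_s3.table_import_from_s3 listed in the extension objects above does not use the helper types but takes bucket, file path, region and credentials as plain text. A minimal sketch of that variant (the key values are placeholders, the last parameter is the session token):

SELECT aws_s3.table_import_from_s3 ( 'sample'          -- target table
                                   , ''                -- column list, empty imports all columns
                                   , '(format csv)'    -- options passed to COPY
                                   , 's3-rds-demo'     -- bucket
                                   , 'sample.csv'      -- file path in the bucket
                                   , 'eu-central-1'    -- region
                                   , 'AKIA...'         -- access key id (placeholder)
                                   , '...'             -- secret access key (placeholder)
                                   , ''                -- session token
                                   );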
And we’re done. The follow-up post will show the opposite: writing back to S3 from RDS for PostgreSQL.