Commit 1a69b20

Merge pull request HarshCasper#413 from sameersrivastava13/master
solved HarshCasper#392 added : Loading data from Amazon S3 to Redshift
2 parents 6883eeb + f41b740 commit 1a69b20

File tree

2 files changed: +48 −0 lines changed


Python/Aws/s3ToRedshift.md

# Load data from S3 to Redshift

The code example executes the following steps:

1. Import modules that are bundled by AWS Glue by default.
2. Define some configuration parameters (e.g., the Redshift hostname redshiftHost).
3. Read the S3 bucket and object from the arguments (see getResolvedOptions) handed over when starting the job.
4. Establish a connection to Redshift: connect(...).
5. Increase the statement timeout (see statement_timeout) to one hour.
6. Execute the COPY query to tell Redshift to copy the object from S3.
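The COPY statement assembled in the last step can be sketched in isolation. This is a minimal, standalone illustration; the schema, table, bucket, object, and IAM role ARN below are placeholder assumptions, not values from a real deployment:

```python
def build_copy_query(schema, table, columns, bucket, obj, iam_role,
                     delimiter="\\t", dateformat="YYYY-MM-DD"):
    # Assemble a Redshift COPY statement; string options such as the
    # delimiter and date format must be single-quoted in the SQL text.
    return (
        f"COPY {schema}.{table}({columns}) "
        f"FROM 's3://{bucket}/{obj}' "
        f"iam_role '{iam_role}' "
        f"DELIMITER '{delimiter}' DATEFORMAT AS '{dateformat}' "
        "ROUNDEC TRUNCATECOLUMNS ESCAPE MAXERROR AS 500;"
    )

# Placeholder values for illustration only
query = build_copy_query(
    "myschema", "mytable", "timestamp,value_a,value_b,value_c",
    "my-bucket", "raw/data.tsv",
    "arn:aws:iam::111111111111:role/LoadFromS3ToRedshiftJob",
)
print(query)
```

The resulting string is what the script later hands to cursor.execute.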

Python/Aws/s3ToRedshift.py

from pgdb import connect
import sys
from awsglue.utils import getResolvedOptions

# CONFIGURATION
redshiftHost = "xyz.redshift.amazonaws.com"
redshiftPort = "5439"
redshiftDatabase = "mydatabase"
redshiftUser = "myadmin"
redshiftPassword = "XYZ"
redshiftSchema = "myschema"
redshiftTable = "mytable"
redshiftColumns = "timestamp,value_a,value_b,value_c"
DELIMITER = "\\t"  # Redshift COPY understands '\t' as a tab character
DATEFORMAT = "YYYY-MM-DD"

# ARGUMENTS
# The names passed to getResolvedOptions must match the keys read below,
# so the job has to be started with --s3_bucket and --s3_object parameters.
args = getResolvedOptions(sys.argv, ["s3_bucket", "s3_object"])
s3Bucket = args["s3_bucket"]
s3Object = args["s3_object"]

con = connect(
    host=redshiftHost + ":" + redshiftPort,
    database=redshiftDatabase,
    user=redshiftUser,
    password=redshiftPassword,
)
cursor = con.cursor()

# Increase the statement timeout to one hour (value is in milliseconds)
cursor.execute("set statement_timeout = 3600000")

# String options such as the delimiter and date format must be quoted
copyQuery = (
    f"COPY {redshiftSchema}.{redshiftTable}({redshiftColumns}) "
    f"FROM 's3://{s3Bucket}/{s3Object}' "
    "iam_role 'arn:aws:iam::111111111111:role/LoadFromS3ToRedshiftJob' "
    f"DELIMITER '{DELIMITER}' DATEFORMAT AS '{DATEFORMAT}' "
    "ROUNDEC TRUNCATECOLUMNS ESCAPE MAXERROR AS 500;"
)

cursor.execute(copyQuery)
con.commit()
cursor.close()
con.close()
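A job written this way has to be started with matching --s3_bucket and --s3_object parameters. A hypothetical driver sketch, assuming a Glue job name of "LoadFromS3ToRedshiftJob" and placeholder bucket/object values:

```python
def build_glue_arguments(bucket, obj):
    # Glue job arguments are passed with a "--" prefix; getResolvedOptions
    # in the job script resolves them under the bare names s3_bucket / s3_object.
    return {"--s3_bucket": bucket, "--s3_object": obj}

def start_load_job(job_name, bucket, obj):
    # boto3 is imported lazily so the pure helper above works without AWS access.
    import boto3
    client = boto3.client("glue")
    return client.start_job_run(
        JobName=job_name,
        Arguments=build_glue_arguments(bucket, obj),
    )

# Placeholder values for illustration only
print(build_glue_arguments("my-bucket", "raw/data.tsv"))
```

The start_job_run call is asynchronous; Glue queues the run and returns a JobRunId.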
