README.md: 79 additions & 4 deletions
@@ -15,7 +15,7 @@ _export:
     repositories:
       - https://jitpack.io
     dependencies:
-      - pro.civitaspo:digdag-operator-athena:0.1.5
+      - pro.civitaspo:digdag-operator-athena:0.2.0
   athena:
     auth_method: profile
 
@@ -80,21 +80,83 @@ Define the below options on properties (which is indicated by `-c`, `--config`).
 - **region**: The AWS region to use for Athena service. (string, optional)
 - **endpoint**: The Amazon Athena endpoint address to use. (string, optional)
 
+## Configuration for `athena.add_partition>` operator
+
+### Options
+
+- **database**: The name of the database. (string, required)
+- **table**: The name of the partitioned table. (string, required)
+- **location**: The location of the partition. If not specified, the operator generates a Hive-style location automatically. (string, default: auto generated like the below)
+- **partition_kv**: Key-value pairs for partitioning. (string to string map, required)
+- **save_mode**: The mode to save the partition. Available values are `"skip_if_exists"`, `"error_if_exists"`, `"overwrite"`. (string, default: `"overwrite"`)
+- **follow_location**: If the location does not exist, skip adding the partition and drop the partition. (boolean, default: `true`)
+- **catalog_id**: The Glue Data Catalog ID, if you use a catalog other than the account/region default catalog. (string, optional)
+
+### Output Parameters
+
+Nothing
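
As a rough illustration of the `athena.add_partition>` options listed above, a task might look like the sketch below. The task name, database, table, and partition values are hypothetical, and leaving the operator key without a value is an assumption about how this operator is invoked.

```yaml
# Hypothetical task: register the partition dt=2019-01-01 on an existing table.
+add_partition_example:
  athena.add_partition>:       # assumption: the operator key itself takes no value
  database: my_database        # placeholder name
  table: my_partitioned_table  # placeholder name
  partition_kv:                # string to string map, so values are quoted
    dt: "2019-01-01"
  save_mode: skip_if_exists    # leave an existing partition untouched
```

With `location` omitted, the operator is expected to derive the partition location on its own, as described for that option.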
+
+## Configuration for `athena.drop_partition>` operator
+
+### Options
+
+- **database**: The name of the database. (string, required)
+- **table**: The name of the partitioned table. (string, required)
+- **partition_kv**: Key-value pairs for partitioning. (string to string map, required)
+- **with_location**: Drop the partition and remove its objects on S3. (boolean, default: `false`)
+- **ignore_if_not_exist**: Ignore if the partition does not exist. (boolean, default: `true`)
+- **catalog_id**: The Glue Data Catalog ID, if you use a catalog other than the account/region default catalog. (string, optional)
+
+### Output Parameters
+
+Nothing
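
A minimal sketch of an `athena.drop_partition>` task using the options above. All names and values are placeholders, and the empty operator value is an assumption rather than something this section states.

```yaml
# Hypothetical task: drop one partition and also delete its objects on S3.
+drop_partition_example:
  athena.drop_partition>:      # assumption: the operator key itself takes no value
  database: my_database        # placeholder name
  table: my_partitioned_table  # placeholder name
  partition_kv:
    dt: "2019-01-01"
  with_location: true          # default is false, which keeps the S3 objects
  ignore_if_not_exist: true    # do not fail when the partition is already gone
```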
+
+## Configuration for `athena.apas>` operator
+
+`apas` means *Add a partition as select*: it creates a partition in which the query result is stored.
+
+### Options
+
+- **athena.apas>**: The select SQL statements or file location (in local or Amazon S3) to be executed for a new table by [`Create Table As Select`](https://aws.amazon.com/jp/about-aws/whats-new/2018/10/athena_ctas_support/). You can use digdag's template engine like `${...}` in the SQL query. (string, required)
+- **database**: The name of the database that has the partitioned table. (string, required)
+- **table**: The name of the partitioned table. (string, required)
+- **workgroup**: The name of the workgroup in which the query is being started. (string, optional)
+- **partition_kv**: Key-value pairs for partitioning. (string to string map, required)
+- **location**: The location of the partition. If not specified, the operator generates a Hive-style location automatically. (string, default: auto generated like the below)
+- **save_mode**: Specify the expected behavior. Available values are `"skip_if_exists"`, `"error_if_exists"`, `"ignore"`, `"overwrite"`. See the below explanation of the behaviour. (string, default: `"overwrite"`)
+    - `"skip_if_exists"`: Skip processing if the partition or the location exists.
+    - `"error_if_exists"`: Raise an error if the partition or the location exists.
+    - `"overwrite"`: Always recreate the partition and the location if they exist. This operation is not atomic.
+- **bucketed_by**: An array list of buckets to bucket data. If omitted, Athena does not bucket your data in this query. (array of string, optional)
+- **bucket_count**: The number of buckets for bucketing your data. If omitted, Athena does not bucket your data. (integer, optional)
+- **additional_properties**: Additional properties for the CTAS that `athena.apas>` uses internally. These are used in the CTAS WITH clause without escaping. (string to string map, optional)
+- **ignore_schema_diff**: Ignore if the schema of the query result is different from the table. (boolean, default: `false`)
+- **token_prefix**: Prefix for `ClientRequestToken`, a unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). In this plugin, the token is composed like `${token_prefix}-${session_uuid}-${hash value of query}-${random string}`. (string, default: `"digdag-athena-apas"`)
+- **catalog_id**: The Glue Data Catalog ID, if you use a catalog other than the account/region default catalog. (string, optional)
+
+### Output Parameters
+
+Nothing
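
To make the `athena.apas>` options more concrete, here is a hedged sketch of a task that stores a query result as a new partition. The task name, table names, columns, and SQL are invented for illustration. Whether the select list should include the partition column is not stated here, so the example leaves it out.

```yaml
# Hypothetical task: run a SELECT and store its result as partition dt=2019-01-01.
+apas_example:
  athena.apas>: |
    SELECT id, name
    FROM my_database.source_table
    WHERE dt = '2019-01-01'
  database: my_database        # database that has the partitioned table (placeholder)
  table: my_partitioned_table  # placeholder name
  partition_kv:
    dt: "2019-01-01"
  save_mode: error_if_exists   # fail instead of using the non-atomic overwrite
```

Choosing `error_if_exists` over the default `overwrite` is one way to avoid the non-atomic recreate step noted under `save_mode`.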
+
 ## Configuration for `athena.query>` operator
 
 ### Options
 
 - **athena.query>**: The SQL query statements or file location (in local or Amazon S3) to be executed. You can use digdag's template engine like `${...}` in the SQL query. (string, required)
 - **token_prefix**: Prefix for `ClientRequestToken`, a unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). In this plugin, the token is composed like `${token_prefix}-${session_uuid}-${hash value of query}-${random string}`. (string, default: `"digdag-athena"`)
 - **database**: The name of the database. (string, optional)
-- **output**: The location in Amazon S3 where your query results are stored, such as `"s3://path/to/query/"`. For more information, see [Queries and Query Result Files](https://docs.aws.amazon.com/athena/latest/ug/querying.html). (string, default: `"s3://aws-athena-query-results-${AWS_ACCOUNT_ID}-<AWS_REGION>"`)
+- **workgroup**: The name of the workgroup in which the query is being started. (string, optional)
 - **preview**: Call the `athena.preview>` operator after running `athena.query>`. (boolean, default: `true`)
 
 ### Output Parameters
 
 - **athena.last_query.id**: The unique identifier for each query execution. (string)
 - **athena.last_query.database**: The name of the database. (string)
+- **athena.last_query.workgroup**: The name of the workgroup in which the query is being started. (string)
 - **athena.last_query.query**: The SQL query statements which the query execution ran. (string)
 - **athena.last_query.output**: The location in Amazon S3 where your query results are stored. (string)
 - **athena.last_query.scan_bytes**: The number of bytes in the data that was queried. (long)
@@ -131,9 +193,10 @@ Define the below options on properties (which is indicated by `-c`, `--config`).
 
 ### Options
 
-- **select_query**: The select SQL statements or file location (in local or Amazon S3) to be executed for a new table by [`Create Table As Select`]((https://aws.amazon.com/jp/about-aws/whats-new/2018/10/athena_ctas_support/)). You can use digdag's template engine like `${...}` in the SQL query. (string, required)
+- **athena.ctas>**: The select SQL statements or file location (in local or Amazon S3) to be executed for a new table by [`Create Table As Select`](https://aws.amazon.com/jp/about-aws/whats-new/2018/10/athena_ctas_support/). You can use digdag's template engine like `${...}` in the SQL query. (string, required)
 - **database**: The database name for query execution context. (string, optional)
 - **table**: The table name for the new table. (string, default: `digdag_athena_ctas_${session_uuid.replaceAll("-", "")}_${random}`)
+- **workgroup**: The name of the workgroup in which the query is being started. (string, optional)
 - **output**: Output location for data created by CTAS. (string, default: `"s3://aws-athena-query-results-${AWS_ACCOUNT_ID}-<AWS_REGION>/Unsaved/${YEAR}/${MONTH}/${DAY}/${athena_query_id}/"`)
 - **format**: The data format for the CTAS query results, such as `"orc"`, `"parquet"`, `"avro"`, `"json"`, or `"textfile"`. (string, default: `"parquet"`)
 - **compression**: The compression type to use for `"orc"` or `"parquet"`. (string, default: `"snappy"`)
@@ -144,7 +207,7 @@ Define the below options on properties (which is indicated by `-c`, `--config`).
 - **additional_properties**: Additional properties for CTAS. These are used in the CTAS WITH clause without escaping. (string to string map, optional)
 - **table_mode**: Specify the expected behavior of CTAS results. Available values are `"default"`, `"empty"`, `"data_only"`. See the below explanation of the behaviour. (string, default: `"default"`)
     - `"default"`: Take no special care. This option requires the least IAM privileges for digdag, but the behaviour depends on Athena.
-    - `"empty_table"`: Create a new empty table with the same schema as the select query results.
+    - `"empty"`: Create a new empty table with the same schema as the select query results.
     - `"data_only"`: Create a new table with data by CTAS, but drop this after CTAS execution. The table created by CTAS is an external table, so the data is left even if the table is dropped.
 - **save_mode**: Specify the expected behavior of CTAS. Available values are `"none"`, `"error_if_exists"`, `"ignore"`, `"overwrite"`. See the below explanation of the behaviour. (string, default: `"overwrite"`)
     - `"none"`: Take no special care. This option requires the least IAM privileges for digdag, but the behaviour depends on Athena.
@@ -158,6 +221,18 @@ Define the below options on properties (which is indicated by `-c`, `--config`).
 
 Nothing
 
+## Configuration for `athena.drop_table>` operator
+
+- **database**: The name of the database. (string, required)
+- **table**: The name of the table. (string, required)
+- **with_location**: Drop the table and remove its objects on S3. (boolean, default: `false`)
+- **ignore_if_not_exist**: Ignore if the table does not exist. (boolean, default: `true`)
+- **catalog_id**: The Glue Data Catalog ID, if you use a catalog other than the account/region default catalog. (string, optional)
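
A short sketch of how an `athena.drop_table>` task could be written from the options above. The names are placeholders, and the empty operator value is an assumption.

```yaml
# Hypothetical task: drop a table but keep its data on S3.
+drop_table_example:
  athena.drop_table>:          # assumption: the operator key itself takes no value
  database: my_database        # placeholder name
  table: my_table              # placeholder name
  with_location: false         # default; set to true to also remove S3 objects
  ignore_if_not_exist: true    # default; do not fail when nothing exists to drop
```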