@@ -7,7 +7,7 @@ Batch Read Configuration Options
.. contents:: On this page
   :local:
   :backlinks: none
-    :depth: 1
+    :depth: 2
   :class: singlecol

.. facet::
@@ -178,26 +178,82 @@ dividing the data into partitions, you can run transformations in parallel.
This section contains configuration information for the following
partitioners:

+ - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
- :ref:`SamplePartitioner <conf-samplepartitioner>`
- :ref:`ShardedPartitioner <conf-shardedpartitioner>`
- :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
- :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
- :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
- - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`

.. note:: Batch Reads Only

   Because the data-stream-processing engine produces a single data stream,
   partitioners do not affect streaming reads.

+ .. _conf-autobucketpartitioner:
+
+ AutoBucketPartitioner Configuration (default)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The ``AutoBucketPartitioner`` is the default partitioner configuration. It
+ samples the data to generate partitions and uses the
+ :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
+ aggregation stage to paginate the data. By using this configuration, you can
+ partition the data across one or more fields, including nested fields.
+
+ .. note:: Compound Keys
+
+    The ``AutoBucketPartitioner`` configuration requires {+mdb-server+} version
+    7.0 or higher to support compound keys.
+
+ To use this configuration, set the ``partitioner`` configuration option to
+ ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
+
+ .. list-table::
+    :header-rows: 1
+    :widths: 35 65
+
+    * - Property name
+      - Description
+
+    * - ``partitioner.options.partition.fieldList``
+      - The list of fields to use for partitioning. The value can be either a single field
+        name or a list of comma-separated fields.
+
+        **Default:** ``_id``
+
+    * - ``partitioner.options.partition.chunkSize``
+      - The average size (MB) for each partition. Smaller partition sizes
+        create more partitions containing fewer documents.
+        Because this configuration uses the average document size to determine the number of
+        documents per partition, partitions might not be the same size.
+
+        **Default:** ``64``
+
+    * - ``partitioner.options.partition.samplesPerPartition``
+      - The number of samples to take per partition.
+
+        **Default:** ``100``
+
+    * - ``partitioner.options.partition.partitionKeyProjectionField``
+      - The field name to use for a projected field that contains all the
+        fields used to partition the collection.
+        We recommend changing the value of this property only if each document already
+        contains the ``__idx`` field.
+
+        **Default:** ``__idx``
+
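The following PySpark sketch shows one way these options fit together in a
batch read. It is illustrative only: the connection URI, database, collection,
and the chosen option values are placeholder assumptions, while the
``partitioner`` class name and the ``partitioner.options.partition.*`` keys
come from the table above.

.. code-block:: python

   from pyspark.sql import SparkSession

   # Assumes the MongoDB Spark Connector is on the classpath and that a
   # deployment is reachable at the placeholder URI below.
   spark = (
       SparkSession.builder.appName("autobucket-read-sketch")
       .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
       .getOrCreate()
   )

   df = (
       spark.read.format("mongodb")
       .option("database", "test")        # placeholder database
       .option("collection", "movies")    # placeholder collection
       # AutoBucketPartitioner is the default; it is set explicitly here for clarity.
       .option(
           "partitioner",
           "com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner",
       )
       # Partition on two fields and aim for roughly 32 MB partitions (example values).
       .option("partitioner.options.partition.fieldList", "year,title")
       .option("partitioner.options.partition.chunkSize", "32")
       .option("partitioner.options.partition.samplesPerPartition", "100")
       .load()
   )

   # Inspect how many partitions the read produced.
   print(df.rdd.getNumPartitions())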

.. _conf-mongosamplepartitioner:
.. _conf-samplepartitioner:

- `` SamplePartitioner`` Configuration
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ SamplePartitioner Configuration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``SamplePartitioner`` is the default partitioner configuration. This configuration
- lets you specify a partition field, partition size, and number of samples per partition.
+ The ``SamplePartitioner`` configuration is similar to the
+ :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>` configuration, but
+ does not use the ``$bucketAuto`` aggregation stage. This
+ configuration lets you specify a partition field, partition size, and number of
+ samples per partition.

To use this configuration, set the ``partitioner`` configuration option to
``com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner``.
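Reusing the session from the earlier sketch, selecting this partitioner is a
matter of changing the ``partitioner`` option; its tuning properties, which
this excerpt does not show, follow the same ``partitioner.options.*`` pattern.
The database and collection names below are placeholders.

.. code-block:: python

   df = (
       spark.read.format("mongodb")
       .option("database", "test")
       .option("collection", "movies")
       .option(
           "partitioner",
           "com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner",
       )
       .load()
   )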
@@ -243,8 +299,8 @@ To use this configuration, set the ``partitioner`` configuration option to
.. _conf-mongoshardedpartitioner:
.. _conf-shardedpartitioner:

- `` ShardedPartitioner`` Configuration
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ ShardedPartitioner Configuration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``ShardedPartitioner`` configuration automatically partitions the data
based on your shard configuration.
@@ -262,8 +318,8 @@ To use this configuration, set the ``partitioner`` configuration option to
.. _conf-mongopaginatebysizepartitioner:
.. _conf-paginatebysizepartitioner:

- `` PaginateBySizePartitioner`` Configuration
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ PaginateBySizePartitioner Configuration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``PaginateBySizePartitioner`` configuration paginates the data by using the
average document size to split the collection into average-sized chunks.
@@ -292,8 +348,8 @@ To use this configuration, set the ``partitioner`` configuration option to

.. _conf-paginateintopartitionspartitioner:

- `` PaginateIntoPartitionsPartitioner`` Configuration
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ PaginateIntoPartitionsPartitioner Configuration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``PaginateIntoPartitionsPartitioner`` configuration paginates the data by dividing
the count of documents in the collection by the maximum number of allowable partitions.
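For example, a collection of 1,000,000 documents read with a maximum of 10
allowable partitions yields roughly 100,000 documents per partition; these
figures are illustrative only and are not taken from this page.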
@@ -320,63 +376,15 @@ To use this configuration, set the ``partitioner`` configuration option to

.. _conf-singlepartitionpartitioner:

- `` SinglePartitionPartitioner`` Configuration
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ SinglePartitionPartitioner Configuration
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``SinglePartitionPartitioner`` configuration creates a single partition.

To use this configuration, set the ``partitioner`` configuration option to
``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.

- .. _conf-autobucketpartitioner:
-
- ``AutoBucketPartitioner`` Configuration
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- The ``AutoBucketPartitioner`` configuration is similar to the
- :ref:`SamplePartitioner <conf-samplepartitioner>`
- configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
- aggregation stage to paginate the data. By using this configuration,
- you can partition the data across single or multiple fields, including nested fields.
-
- To use this configuration, set the ``partitioner`` configuration option to
- ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
-
- .. list-table::
-    :header-rows: 1
-    :widths: 35 65
-
-    * - Property name
-      - Description
-
-    * - ``partitioner.options.partition.fieldList``
-      - The list of fields to use for partitioning. The value can be either a single field
-        name or a list of comma-separated fields.
-
-        **Default:** ``_id``
-
-    * - ``partitioner.options.partition.chunkSize``
-      - The average size (MB) for each partition. Smaller partition sizes
-        create more partitions containing fewer documents.
-        Because this configuration uses the average document size to determine the number of
-        documents per partition, partitions might not be the same size.
-
-        **Default:** ``64``
-
-    * - ``partitioner.options.partition.samplesPerPartition``
-      - The number of samples to take per partition.
-
-        **Default:** ``100``
-
-    * - ``partitioner.options.partition.partitionKeyProjectionField``
-      - The field name to use for a projected field that contains all the
-        fields used to partition the collection.
-        We recommend changing the value of this property only if each document already
-        contains the ``__idx`` field.
-
-        **Default:** ``__idx``
-
- Specifying Properties in ``connection.uri``
- -------------------------------------------
+ Specifying Properties in connection.uri
+ ---------------------------------------

.. include:: /includes/connection-read-config.rst