Commit e27c12e

DOCSP-48557 Update Spark streaming write configuration to include all batch options (#261) (#264)
* DOCSP-48557 spark options update
* DOCSP-48557 links and reorder
* DOCSP-48667 link fix + mdb-server consistency
* DOCSP-48557 wording
* DOCSP-48557 remove ital
* DOCSP-49557 update default w concern

(cherry picked from commit 792cdc0)
1 parent d8ac434 commit e27c12e

2 files changed

+141
-26
lines changed

source/batch-mode/batch-write-config.txt

Lines changed: 25 additions & 23 deletions
@@ -55,7 +55,7 @@ You can configure the following properties when writing data to MongoDB in batch
 | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``

 * - ``convertJson``
-- | Specifies whether the connector parses the string and converts extended JSON
+- | Specifies if the connector parses string values and converts extended JSON
 into BSON.
 |
 | This setting accepts the following values:
@@ -82,7 +82,7 @@ You can configure the following properties when writing data to MongoDB in batch
 | **Default:** ``false``

 * - ``idFieldList``
-- | Field or list of fields by which to split the collection data. To
+- | Specifies a field or list of fields by which to split the collection data. To
 specify more than one field, separate them using a comma as shown
 in the following example:

@@ -128,41 +128,43 @@ You can configure the following properties when writing data to MongoDB in batch
 | **Default:** ``true``

 * - ``upsertDocument``
-- | When ``true``, replace and update operations will insert the data
+- | When ``true``, replace and update operations insert the data
 if no match exists.
 |
 | For time series collections, you must set ``upsertDocument`` to
 ``false``.
 |
 | **Default:** ``true``

-* - ``writeConcern.journal``
-- | Specifies ``j``, a write-concern option to enable request for
-acknowledgment that the data is confirmed on on-disk journal for
-the criteria specified in the ``w`` option. You can specify
-either ``true`` or ``false``.
-|
-| For more information on ``j`` values, see the MongoDB server
-guide on the
-:manual:`WriteConcern j option </reference/write-concern/#j-option>`.
-
 * - ``writeConcern.w``
-- | Specifies ``w``, a write-concern option to request acknowledgment
-that the write operation has propagated to a specified number of
-MongoDB nodes. For a list
-of allowed values for this option, see :manual:`WriteConcern
-</reference/write-concern/#w-option>` in the MongoDB manual.
+- | Specifies ``w``, a write-concern option requesting acknowledgment that
+the write operation has propagated to a specified number of MongoDB
+nodes.
+|
+| For a list of allowed values for this option, see :manual:`WriteConcern
+w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
+manual.
+|
+| **Default:** ``Acknowledged``
+
+* - ``writeConcern.journal``
+- | Specifies ``j``, a write-concern option requesting acknowledgment that
+the data has been written to the on-disk journal for the criteria
+specified in the ``w`` option. You can specify either ``true`` or
+``false``.
 |
-| **Default:** ``1``
+| For more information on ``j`` values, see :manual:`WriteConcern j
+Option </reference/write-concern/#j-option>` in the {+mdb-server+}
+manual.

 * - ``writeConcern.wTimeoutMS``
 - | Specifies ``wTimeoutMS``, a write-concern option to return an error
-when a write operation exceeds the number of milliseconds. If you
+when a write operation exceeds the specified number of milliseconds. If you
 use this optional setting, you must specify a nonnegative integer.
 |
-| For more information on ``wTimeoutMS`` values, see the MongoDB server
-guide on the
-:manual:`WriteConcern wtimeout option </reference/write-concern/#wtimeout>`.
+| For more information on ``wTimeoutMS`` values, see
+:manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
+the {+mdb-server+} manual.

 Specifying Properties in ``connection.uri``
 -------------------------------------------
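To illustrate how the batch write properties in this file fit together, here is a minimal Python sketch that collects them into a dictionary of option strings. The helper `build_write_options`, its parameter names, and the `time_series` flag are hypothetical and not part of the connector. Only the property names and the rules it checks (the three `operationType` values, comma-separated `idFieldList`, `upsertDocument` set to `false` for time series collections, a nonnegative integer `wTimeoutMS`) come from the documentation text above.

```python
# Hypothetical helper, not part of the connector: gathers the batch write
# properties described in the diff into a {name: string} mapping.
ALLOWED_OPERATION_TYPES = {"insert", "replace", "update"}


def build_write_options(operation_type="replace", id_fields=("_id",),
                        upsert=True, w=None, journal=None,
                        w_timeout_ms=None, time_series=False):
    """Return write options as strings, applying the documented rules."""
    if operation_type not in ALLOWED_OPERATION_TYPES:
        raise ValueError(f"unsupported operationType: {operation_type!r}")
    # The docs state upsertDocument must be false for time series collections.
    if time_series and upsert:
        raise ValueError("set upsertDocument to false for time series collections")

    options = {
        "operationType": operation_type,
        # Multiple fields are joined with commas, e.g. "fieldName1,fieldName2".
        "idFieldList": ",".join(id_fields),
        "upsertDocument": str(upsert).lower(),
    }
    # Write-concern settings are only added when the caller supplies them.
    if w is not None:
        options["writeConcern.w"] = str(w)
    if journal is not None:
        options["writeConcern.journal"] = str(journal).lower()
    if w_timeout_ms is not None:
        is_int = isinstance(w_timeout_ms, int) and not isinstance(w_timeout_ms, bool)
        if not is_int or w_timeout_ms < 0:
            raise ValueError("writeConcern.wTimeoutMS must be a nonnegative integer")
        options["writeConcern.wTimeoutMS"] = str(w_timeout_ms)
    return options
```

In a PySpark job, a mapping like this would typically be passed to the writer with something like `df.write.format("mongodb").mode("append").options(**opts).save()`. That call is an assumption about the surrounding application and is not exercised by the sketch.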

source/streaming-mode/streaming-write-config.txt

Lines changed: 116 additions & 3 deletions
@@ -53,13 +53,126 @@ You can configure the following properties when writing data to MongoDB in strea
 interface.
 |
 | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``
+
+* - ``convertJson``
+- | Specifies if the connector parses string values and converts extended JSON
+into BSON.
+|
+| This setting accepts the following values:
+
+- ``any``: The connector converts all JSON values to BSON.
+
+- ``"{a: 1}"`` becomes ``{a: 1}``.
+- ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
+- ``"true"`` becomes ``true``.
+- ``"01234"`` becomes ``1234``.
+- ``"{a:b:c}"`` doesn't change.
+
+- ``objectOrArrayOnly``: The connector converts only JSON objects and arrays to
+BSON.
+
+- ``"{a: 1}"`` becomes ``{a: 1}``.
+- ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
+- ``"true"`` doesn't change.
+- ``"01234"`` doesn't change.
+- ``"{a:b:c}"`` doesn't change.
+
+- ``false``: The connector leaves all values as strings.
+
+| **Default:** ``false``
+
+* - ``idFieldList``
+- | Specifies a field or list of fields by which to split the collection data. To
+specify more than one field, separate them using a comma as shown
+in the following example:
+
+.. code-block:: none
+:copyable: false
+
+"fieldName1,fieldName2"
+
+| **Default:** ``_id``
+
+* - ``ignoreNullValues``
+- | When ``true``, the connector ignores any ``null`` values when writing,
+including ``null`` values in arrays and nested documents.
+|
+| **Default:** ``false``
+
+* - ``maxBatchSize``
+- | Specifies the maximum number of operations to batch in bulk
+operations.
+|
+| **Default:** ``512``
+
+* - ``operationType``
+- | Specifies the type of write operation to perform. You can set
+this to one of the following values:
+
+- ``insert``: Insert the data.
+- ``replace``: Replace an existing document that matches the
+``idFieldList`` value with the new data. If no match exists, the
+value of ``upsertDocument`` indicates whether the connector
+inserts a new document.
+- ``update``: Update an existing document that matches the
+``idFieldList`` value with the new data. If no match exists, the
+value of ``upsertDocument`` indicates whether the connector
+inserts a new document.
+
+|
+| **Default:** ``replace``
+
+* - ``ordered``
+- | Specifies whether to perform ordered bulk operations.
+|
+| **Default:** ``true``
+
+* - ``upsertDocument``
+- | When ``true``, replace and update operations insert the data
+if no match exists.
+|
+| For time series collections, you must set ``upsertDocument`` to
+``false``.
+|
+| **Default:** ``true``
+
+* - ``writeConcern.w``
+- | Specifies ``w``, a write-concern option requesting acknowledgment that
+the write operation has propagated to a specified number of MongoDB
+nodes.
+|
+| For a list of allowed values for this option, see :manual:`WriteConcern
+w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
+manual.
+|
+| **Default:** ``Acknowledged``
+
+* - ``writeConcern.journal``
+- | Specifies ``j``, a write-concern option requesting acknowledgment that
+the data has been written to the on-disk journal for the criteria
+specified in the ``w`` option. You can specify either ``true`` or
+``false``.
+|
+| For more information on ``j`` values, see :manual:`WriteConcern j
+Option </reference/write-concern/#j-option>` in the {+mdb-server+}
+manual.
+
+* - ``writeConcern.wTimeoutMS``
+- | Specifies ``wTimeoutMS``, a write-concern option to return an error
+when a write operation exceeds the specified number of milliseconds. If you
+use this optional setting, you must specify a nonnegative integer.
+|
+| For more information on ``wTimeoutMS`` values, see
+:manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
+the {+mdb-server+} manual.

 * - ``checkpointLocation``
-- | The absolute file path of the directory to which the connector writes checkpoint
+- | The absolute file path of the directory where the connector writes checkpoint
 information.
 |
-| For more information about checkpoints, see the
-`Spark Structured Streaming Programming Guide <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing>`__
+| For more information about checkpoints, see the `Spark Structured
+Streaming Programming Guide
+<https://spark.apache.org/docs/latest/streaming/index.html>`__
 |
 | **Default:** None
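The streaming file now lists the batch options alongside `checkpointLocation`. As a rough illustration, the Python sketch below assembles a few of them with their documented defaults. The function `build_streaming_write_options` and its parameter names are hypothetical. The absolute-path check uses POSIX rules and the "at least 1" bound on `maxBatchSize` is an assumption of this sketch, since the documentation states neither check.

```python
import posixpath

# The three convertJson values listed in the documentation above.
CONVERT_JSON_VALUES = {"any", "objectOrArrayOnly", "false"}


def build_streaming_write_options(checkpoint_location, convert_json="false",
                                  ignore_null_values=False,
                                  max_batch_size=512, ordered=True):
    """Hypothetical helper: return streaming write options as strings."""
    # The docs describe checkpointLocation as an absolute directory path.
    # Assumption: POSIX-style paths; URI-style locations are not handled here.
    if not posixpath.isabs(checkpoint_location):
        raise ValueError("checkpointLocation must be an absolute path")
    if convert_json not in CONVERT_JSON_VALUES:
        raise ValueError(f"unsupported convertJson value: {convert_json!r}")
    # Assumption of this sketch: a batch must hold at least one operation.
    is_int = isinstance(max_batch_size, int) and not isinstance(max_batch_size, bool)
    if not is_int or max_batch_size < 1:
        raise ValueError("maxBatchSize must be a positive integer")

    return {
        "checkpointLocation": checkpoint_location,
        "convertJson": convert_json,
        "ignoreNullValues": str(ignore_null_values).lower(),
        "maxBatchSize": str(max_batch_size),
        "ordered": str(ordered).lower(),
    }
```

A Structured Streaming job would typically hand such a mapping to `writeStream` through its `options(...)` method before starting the query; that wiring depends on the application and is left out here.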
