
Commit f430782

DOCSP-48557 Update Spark streaming write configuration to include all batch options (#261) (#263)
* DOCSP-48557 spark options update
* DOCSP-48557 links and reorder
* DOCSP-48667 link fix + mdb-server consistency
* DOCSP-48557 wording
* DOCSP-48557 remove ital
* DOCSP-49557 update default w concern

(cherry picked from commit 792cdc0)
1 parent 199d850 commit f430782


2 files changed

+141
-26
lines changed


source/batch-mode/batch-write-config.txt

Lines changed: 25 additions & 23 deletions
Original file line number | Diff line number | Diff line change
@@ -58,7 +58,7 @@ You can configure the following properties when writing data to MongoDB in batch
5858
| **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``
5959

6060
* - ``convertJson``
61-
- | Specifies whether the connector parses the string and converts extended JSON
61+
- | Specifies if the connector parses string values and converts extended JSON
6262
into BSON.
6363
|
6464
| This setting accepts the following values:
@@ -85,7 +85,7 @@ You can configure the following properties when writing data to MongoDB in batch
8585
| **Default:** ``false``
8686

8787
* - ``idFieldList``
88-
- | Field or list of fields by which to split the collection data. To
88+
- | Specifies a field or list of fields by which to split the collection data. To
8989
specify more than one field, separate them using a comma as shown
9090
in the following example:
9191

@@ -131,41 +131,43 @@ You can configure the following properties when writing data to MongoDB in batch
131131
| **Default:** ``true``
132132

133133
* - ``upsertDocument``
134-
- | When ``true``, replace and update operations will insert the data
134+
- | When ``true``, replace and update operations insert the data
135135
if no match exists.
136136
|
137137
| For time series collections, you must set ``upsertDocument`` to
138138
``false``.
139139
|
140140
| **Default:** ``true``
141141

142-
* - ``writeConcern.journal``
143-
- | Specifies ``j``, a write-concern option to enable request for
144-
acknowledgment that the data is confirmed on on-disk journal for
145-
the criteria specified in the ``w`` option. You can specify
146-
either ``true`` or ``false``.
147-
|
148-
| For more information on ``j`` values, see the MongoDB server
149-
guide on the
150-
:manual:`WriteConcern j option </reference/write-concern/#j-option>`.
151-
152142
* - ``writeConcern.w``
153-
- | Specifies ``w``, a write-concern option to request acknowledgment
154-
that the write operation has propagated to a specified number of
155-
MongoDB nodes. For a list
156-
of allowed values for this option, see :manual:`WriteConcern
157-
</reference/write-concern/#w-option>` in the MongoDB manual.
143+
- | Specifies ``w``, a write-concern option requesting acknowledgment that
144+
the write operation has propagated to a specified number of MongoDB
145+
nodes.
146+
|
147+
| For a list of allowed values for this option, see :manual:`WriteConcern
148+
w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
149+
manual.
150+
|
151+
| **Default:** ``Acknowledged``
152+
153+
* - ``writeConcern.journal``
154+
- | Specifies ``j``, a write-concern option requesting acknowledgment that
155+
the data has been written to the on-disk journal for the criteria
156+
specified in the ``w`` option. You can specify either ``true`` or
157+
``false``.
158158
|
159-
| **Default:** ``1``
159+
| For more information on ``j`` values, see :manual:`WriteConcern j
160+
Option </reference/write-concern/#j-option>` in the {+mdb-server+}
161+
manual.
160162

161163
* - ``writeConcern.wTimeoutMS``
162164
- | Specifies ``wTimeoutMS``, a write-concern option to return an error
163-
when a write operation exceeds the number of milliseconds. If you
165+
when a write operation exceeds the specified number of milliseconds. If you
164166
use this optional setting, you must specify a nonnegative integer.
165167
|
166-
| For more information on ``wTimeoutMS`` values, see the MongoDB server
167-
guide on the
168-
:manual:`WriteConcern wtimeout option </reference/write-concern/#wtimeout>`.
168+
| For more information on ``wTimeoutMS`` values, see
169+
:manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
170+
the {+mdb-server+} manual.
169171

170172
Specifying Properties in ``connection.uri``
171173
-------------------------------------------

source/streaming-mode/streaming-write-config.txt

Lines changed: 116 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -56,13 +56,126 @@ You can configure the following properties when writing data to MongoDB in strea
5656
interface.
5757
|
5858
| **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``
59+
60+
* - ``convertJson``
61+
- | Specifies if the connector parses string values and converts extended JSON
62+
into BSON.
63+
|
64+
| This setting accepts the following values:
65+
66+
- ``any``: The connector converts all JSON values to BSON.
67+
68+
- ``"{a: 1}"`` becomes ``{a: 1}``.
69+
- ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
70+
- ``"true"`` becomes ``true``.
71+
- ``"01234"`` becomes ``1234``.
72+
- ``"{a:b:c}"`` doesn't change.
73+
74+
- ``objectOrArrayOnly``: The connector converts only JSON objects and arrays to
75+
BSON.
76+
77+
- ``"{a: 1}"`` becomes ``{a: 1}``.
78+
- ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
79+
- ``"true"`` doesn't change.
80+
- ``"01234"`` doesn't change.
81+
- ``"{a:b:c}"`` doesn't change.
82+
83+
- ``false``: The connector leaves all values as strings.
84+
85+
| **Default:** ``false``
86+
87+
* - ``idFieldList``
88+
- | Specifies a field or list of fields by which to split the collection data. To
89+
specify more than one field, separate them using a comma as shown
90+
in the following example:
91+
92+
.. code-block:: none
93+
:copyable: false
94+
95+
"fieldName1,fieldName2"
96+
97+
| **Default:** ``_id``
98+
99+
* - ``ignoreNullValues``
100+
- | When ``true``, the connector ignores any ``null`` values when writing,
101+
including ``null`` values in arrays and nested documents.
102+
|
103+
| **Default:** ``false``
104+
105+
* - ``maxBatchSize``
106+
- | Specifies the maximum number of operations to batch in bulk
107+
operations.
108+
|
109+
| **Default:** ``512``
110+
111+
* - ``operationType``
112+
- | Specifies the type of write operation to perform. You can set
113+
this to one of the following values:
114+
115+
- ``insert``: Insert the data.
116+
- ``replace``: Replace an existing document that matches the
117+
``idFieldList`` value with the new data. If no match exists, the
118+
value of ``upsertDocument`` indicates whether the connector
119+
inserts a new document.
120+
- ``update``: Update an existing document that matches the
121+
``idFieldList`` value with the new data. If no match exists, the
122+
value of ``upsertDocument`` indicates whether the connector
123+
inserts a new document.
124+
125+
|
126+
| **Default:** ``replace``
127+
128+
* - ``ordered``
129+
- | Specifies whether to perform ordered bulk operations.
130+
|
131+
| **Default:** ``true``
132+
133+
* - ``upsertDocument``
134+
- | When ``true``, replace and update operations insert the data
135+
if no match exists.
136+
|
137+
| For time series collections, you must set ``upsertDocument`` to
138+
``false``.
139+
|
140+
| **Default:** ``true``
141+
142+
* - ``writeConcern.w``
143+
- | Specifies ``w``, a write-concern option requesting acknowledgment that
144+
the write operation has propagated to a specified number of MongoDB
145+
nodes.
146+
|
147+
| For a list of allowed values for this option, see :manual:`WriteConcern
148+
w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
149+
manual.
150+
|
151+
| **Default:** ``Acknowledged``
152+
153+
* - ``writeConcern.journal``
154+
- | Specifies ``j``, a write-concern option requesting acknowledgment that
155+
the data has been written to the on-disk journal for the criteria
156+
specified in the ``w`` option. You can specify either ``true`` or
157+
``false``.
158+
|
159+
| For more information on ``j`` values, see :manual:`WriteConcern j
160+
Option </reference/write-concern/#j-option>` in the {+mdb-server+}
161+
manual.
162+
163+
* - ``writeConcern.wTimeoutMS``
164+
- | Specifies ``wTimeoutMS``, a write-concern option to return an error
165+
when a write operation exceeds the specified number of milliseconds. If you
166+
use this optional setting, you must specify a nonnegative integer.
167+
|
168+
| For more information on ``wTimeoutMS`` values, see
169+
:manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
170+
the {+mdb-server+} manual.
59171

60172
* - ``checkpointLocation``
61-
- | The absolute file path of the directory to which the connector writes checkpoint
173+
- | The absolute file path of the directory where the connector writes checkpoint
62174
information.
63175
|
64-
| For more information about checkpoints, see the
65-
`Spark Structured Streaming Programming Guide <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing>`__
176+
| For more information about checkpoints, see the `Spark Structured
177+
Streaming Programming Guide
178+
<https://spark.apache.org/docs/latest/streaming/index.html>`__
66179
|
67180
| **Default:** None
68181
