
DOCSP-48557 Update Spark streaming write configuration to include all batch options #261

32 changes: 17 additions & 15 deletions source/batch-mode/batch-write-config.txt
@@ -139,33 +139,35 @@ You can configure the following properties when writing data to MongoDB in batch
|
| **Default:** ``true``

* - ``writeConcern.w``
- | Specifies ``w``, a write-concern option to request acknowledgment that
the write operation has propagated to a specified number of MongoDB
nodes.
|
| For a list of allowed values for this option, see :manual:`WriteConcern
w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
manual.
|
| **Default:** ``1``
Contributor

Q: How did you determine that this default is 1? The server manual states only that "If the write concern is missing the w field, MongoDB sets the w option to the default write concern," and later, in the table, it says "{ w: "majority" } is the default write concern for most MongoDB deployments". The implicit default write concern for most MongoDB deployments seems to be majority: https://www.mongodb.com/docs/manual/reference/write-concern/#std-label-wc-default-behavior

This question applies to the other writeConcern.w option as well.

Contributor Author

I just moved what was on the batch page to the stream page, so I didn't write any of this information. I assumed most of it was correct, but I'll double-check everything now.

Contributor Author

I see that it can be 1 or majority, so I'm going to include both, with majority taking precedence because it applies in most cases, as you said.

Contributor

I think it's a bit odd for this option to have two potential defaults, so it would be good to double-check with the technical reviewer in this case.

Member

It defaults to Acknowledged, which is the default set by the server.


* - ``writeConcern.journal``
- | Specifies ``j``, a write-concern option to enable request for
acknowledgment that the data is confirmed on on-disk journal for
the criteria specified in the ``w`` option. You can specify
either ``true`` or ``false``.
|
| For more information on ``j`` values, see the MongoDB server
guide on the
:manual:`WriteConcern j option </reference/write-concern/#j-option>`.

* - ``writeConcern.w``
- | Specifies ``w``, a write-concern option to request acknowledgment
that the write operation has propagated to a specified number of
MongoDB nodes. For a list
of allowed values for this option, see :manual:`WriteConcern
</reference/write-concern/#w-option>` in the MongoDB manual.
|
| **Default:** ``1``
| For more information on ``j`` values, see :manual:`WriteConcern j
Option </reference/write-concern/#j-option>` in the {+mdb-server+}
manual.

* - ``writeConcern.wTimeoutMS``
- | Specifies ``wTimeoutMS``, a write-concern option to return an error
when a write operation exceeds the number of milliseconds. If you
use this optional setting, you must specify a nonnegative integer.
|
| For more information on ``wTimeoutMS`` values, see the MongoDB server
guide on the
:manual:`WriteConcern wtimeout option </reference/write-concern/#wtimeout>`.
| For more information on ``wTimeoutMS`` values, see
:manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
the {+mdb-server+} manual.
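
The write-concern properties above could be assembled for a batch write roughly as follows. This is a minimal sketch, not the connector's own API: the helper names ``write_concern_options`` and ``write_batch`` are hypothetical, and the sketch assumes the connector is registered under the ``mongodb`` format and that the connection URI, database, and collection are configured elsewhere. Unset values are left out so that the connector or server default applies, which sidesteps the open question about the ``writeConcern.w`` default.

```python
from typing import Dict, Optional, Union


def write_concern_options(
    w: Optional[Union[int, str]] = None,
    journal: Optional[bool] = None,
    w_timeout_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Build write-concern options (hypothetical helper).

    Any value left as None is omitted from the result, so the
    connector or server default is used for that property.
    """
    options: Dict[str, str] = {}
    if w is not None:
        # For example 1 or "majority"; allowed values are listed in the server manual.
        options["writeConcern.w"] = str(w)
    if journal is not None:
        options["writeConcern.journal"] = "true" if journal else "false"
    if w_timeout_ms is not None:
        # The property must be a nonnegative integer; bool is rejected explicitly
        # because Python treats it as a subclass of int.
        is_int = isinstance(w_timeout_ms, int) and not isinstance(w_timeout_ms, bool)
        if not is_int or w_timeout_ms < 0:
            raise ValueError("writeConcern.wTimeoutMS must be a nonnegative integer")
        options["writeConcern.wTimeoutMS"] = str(w_timeout_ms)
    return options


def write_batch(df, options: Dict[str, str]) -> None:
    """Apply the options to a Spark DataFrame batch write.

    Assumes `df` is a pyspark.sql.DataFrame and that connection settings
    are supplied through the Spark configuration.
    """
    df.write.format("mongodb").mode("append").options(**options).save()
```

Calling ``write_concern_options(w="majority", w_timeout_ms=5000)`` returns only the two keys that were set, so ``writeConcern.journal`` keeps whatever default applies.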

Specifying Properties in ``connection.uri``
-------------------------------------------
117 changes: 115 additions & 2 deletions source/streaming-mode/streaming-write-config.txt
@@ -56,13 +56,126 @@ You can configure the following properties when writing data to MongoDB in strea
interface.
|
| **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``

* - ``convertJson``
- | Specifies whether the connector parses the string and converts extended JSON
into BSON.
|
| This setting accepts the following values:

- ``any``: The connector converts all JSON values to BSON.

- ``"{a: 1}"`` becomes ``{a: 1}``.
- ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
- ``"true"`` becomes ``true``.
- ``"01234"`` becomes ``1234``.
- ``"{a:b:c}"`` doesn't change.

- ``objectOrArrayOnly``: The connector converts only JSON objects and arrays to
BSON.

- ``"{a: 1}"`` becomes ``{a: 1}``.
- ``"[1, 2, 3]"`` becomes ``[1, 2, 3]``.
- ``"true"`` doesn't change.
- ``"01234"`` doesn't change.
- ``"{a:b:c}"`` doesn't change.

- ``false``: The connector leaves all values as strings.

| **Default:** ``false``

* - ``idFieldList``
- | Field or list of fields by which to split the collection data. To
specify more than one field, separate them using a comma as shown
in the following example:

.. code-block:: none
:copyable: false

"fieldName1,fieldName2"

| **Default:** ``_id``

* - ``ignoreNullValues``
- | When ``true``, the connector ignores any ``null`` values when writing,
including ``null`` values in arrays and nested documents.
|
| **Default:** ``false``

* - ``maxBatchSize``
- | Specifies the maximum number of operations to batch in bulk
operations.
|
| **Default:** ``512``

* - ``operationType``
- | Specifies the type of write operation to perform. You can set
this to one of the following values:

- ``insert``: Insert the data.
- ``replace``: Replace an existing document that matches the
``idFieldList`` value with the new data. If no match exists, the
value of ``upsertDocument`` indicates whether the connector
inserts a new document.
- ``update``: Update an existing document that matches the
``idFieldList`` value with the new data. If no match exists, the
value of ``upsertDocument`` indicates whether the connector
inserts a new document.

|
| **Default:** ``replace``

* - ``ordered``
- | Specifies whether to perform ordered bulk operations.
|
| **Default:** ``true``

* - ``upsertDocument``
- | When ``true``, replace and update operations will insert the data
if no match exists.
|
| For time series collections, you must set ``upsertDocument`` to
``false``.
|
| **Default:** ``true``

* - ``writeConcern.w``
- | Specifies ``w``, a write-concern option to request acknowledgment that
the write operation has propagated to a specified number of MongoDB
nodes.
|
| For a list of allowed values for this option, see :manual:`WriteConcern
w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
manual.
|
| **Default:** ``1``

* - ``writeConcern.journal``
- | Specifies ``j``, a write-concern option to enable request for
acknowledgment that the data is confirmed on on-disk journal for
the criteria specified in the ``w`` option. You can specify
either ``true`` or ``false``.
|
| For more information on ``j`` values, see :manual:`WriteConcern j
Option </reference/write-concern/#j-option>` in the {+mdb-server+}
manual.

* - ``writeConcern.wTimeoutMS``
- | Specifies ``wTimeoutMS``, a write-concern option to return an error
when a write operation exceeds the number of milliseconds. If you
use this optional setting, you must specify a nonnegative integer.
|
| For more information on ``wTimeoutMS`` values, see
:manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
the {+mdb-server+} manual.

* - ``checkpointLocation``
- | The absolute file path of the directory to which the connector writes checkpoint
information.
|
| For more information about checkpoints, see the
`Spark Structured Streaming Programming Guide <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing>`__
| For more information about checkpoints, see the `Spark Structured
Streaming Programming Guide
<https://spark.apache.org/docs/latest/streaming/index.html>`__
|
| **Default:** None
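
The streaming write properties listed above might be combined as in the following sketch. The helper names ``streaming_write_options`` and ``start_stream`` are hypothetical, the default arguments simply mirror the defaults documented in the table, and the sketch assumes the connector is registered under the ``mongodb`` format with connection settings supplied elsewhere. Treat it as an illustration under those assumptions rather than the connector's definitive usage.

```python
from typing import Dict, Sequence

# Allowed values as listed in the property table above.
ALLOWED_OPERATION_TYPES = ("insert", "replace", "update")
ALLOWED_CONVERT_JSON = ("any", "objectOrArrayOnly", "false")


def streaming_write_options(
    checkpoint_location: str,
    id_fields: Sequence[str] = ("_id",),
    operation_type: str = "replace",
    upsert_document: bool = True,
    convert_json: str = "false",
    max_batch_size: int = 512,
) -> Dict[str, str]:
    """Build streaming write options (hypothetical helper)."""
    if operation_type not in ALLOWED_OPERATION_TYPES:
        raise ValueError(f"operationType must be one of {ALLOWED_OPERATION_TYPES}")
    if convert_json not in ALLOWED_CONVERT_JSON:
        raise ValueError(f"convertJson must be one of {ALLOWED_CONVERT_JSON}")
    if max_batch_size < 1:
        raise ValueError("maxBatchSize must be a positive integer")
    return {
        "checkpointLocation": checkpoint_location,
        # Multiple fields are joined with commas, as in "fieldName1,fieldName2".
        "idFieldList": ",".join(id_fields),
        "operationType": operation_type,
        # Set upsert_document=False when writing to a time series collection.
        "upsertDocument": "true" if upsert_document else "false",
        "convertJson": convert_json,
        "maxBatchSize": str(max_batch_size),
    }


def start_stream(streaming_df, options: Dict[str, str]):
    """Start a streaming write with the given options.

    Assumes `streaming_df` is a streaming pyspark.sql.DataFrame.
    """
    return (
        streaming_df.writeStream.format("mongodb")
        .options(**options)
        .outputMode("append")
        .start()
    )
```

Validating ``operationType`` and ``convertJson`` before the query starts surfaces a misspelled value immediately instead of at write time.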
