DOCSP-48557 Update Spark streaming write configuration to include all batch options #261


Open
wants to merge 5 commits into
base: master

@mballard-mdb mballard-mdb commented Jul 2, 2025

Pull Request Info

PR Reviewing Guidelines

JIRA - https://jira.mongodb.org/browse/DOCSP-48557

Staging Links

  • batch-mode/batch-write-config
  • streaming-mode/streaming-write-config
Self-Review Checklist

    • Is this free of any warnings or errors in the RST?
    • Did you run a spell-check?
    • Did you run a grammar-check?
    • Are all the links working?
    • Are the facets and meta keywords accurate?
    • Are the page titles greater than 20 characters long and SEO relevant?

    netlify bot commented Jul 2, 2025

    Deploy Preview for docs-spark-connector ready!

    Name Link
    🔨 Latest commit e49a0a3
    🔍 Latest deploy log https://app.netlify.com/projects/docs-spark-connector/deploys/686576858efa8c0008e1d2b9
    😎 Deploy Preview https://deploy-preview-261--docs-spark-connector.netlify.app

    docs-builder-bot commented Jul 2, 2025

    🔄 Deploy Preview for docs-spark-connector processing

    Item Details
    🔨 Latest Commit c3b319db35f54b44697f16f2fc26dc3f7321319d
    😎 Deploy Preview https://deploy-preview-261--docs-spark-connector.netlify.app
    🔍 Build Logs View Logs

    Contributor

    @shuangela shuangela left a comment

    great job on this! added some comments related to wording as well as a question about a default setting in one of the parameters.

    w Option </reference/write-concern/#w-option>` in the {+mdb-server+}
    manual.
    |
    | **Default:** ``1``
    Contributor

    Q: How did you determine that this default is 1? The server manual states only that "If the write concern is missing the w field, MongoDB sets the w option to the default write concern," and later in the table it says "{ w: "majority" } is the default write concern for most MongoDB deployments". The implicit default write concern for most MongoDB deployments seems to be majority: https://www.mongodb.com/docs/manual/reference/write-concern/#std-label-wc-default-behavior

    This question applies to the other writeConcern.w option as well

    Author

    I just moved what was on the batch page to the stream page, so I didn't write any of the info. I assumed most of it was correct but I'll double check everything now.

    Author

    I see that it can be 1 or majority, so I'm going to include both, with majority taking precedence since it applies in most cases, like you said.

    Contributor

    I think it's a bit odd if this option has two potential defaults, so it would be good to double check with the technical reviewer in this case

    Member

    Defaults to Acknowledged, which is the default set by the server.
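
Because the effective default for `writeConcern.w` comes from the server, one way to avoid relying on it is to set the write-concern options explicitly. The following is a minimal sketch, not connector code: `write_concern_options` and `WRITE_PREFIX` are hypothetical names, the three option names are taken from the diff under review, and the `spark.mongodb.write.` prefix is an assumption about how the options are keyed in a given setup.

```python
from typing import Dict, Optional, Union

# Assumed configuration prefix; adjust or drop it to match your setup.
WRITE_PREFIX = "spark.mongodb.write."


def write_concern_options(
    w: Optional[Union[int, str]] = None,
    journal: Optional[bool] = None,
    w_timeout_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Build writeConcern.* options as strings, skipping unset values."""
    raw = {
        "writeConcern.w": w,
        "writeConcern.journal": journal,
        "writeConcern.wTimeoutMS": w_timeout_ms,
    }
    options: Dict[str, str] = {}
    for name, value in raw.items():
        if value is None:
            continue  # unset: the server-side default applies
        if isinstance(value, bool):
            value = str(value).lower()  # True -> "true", False -> "false"
        options[WRITE_PREFIX + name] = str(value)
    return options


# Request majority acknowledgment with a 5-second timeout.
opts = write_concern_options(w="majority", w_timeout_ms=5000)
```

The resulting dictionary holds plain string pairs, so it can be passed to a Spark writer in one call, for example with `.options(**opts)`, or applied one key at a time with `.option(key, value)`.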

    @@ -56,13 +56,126 @@ You can configure the following properties when writing data to MongoDB in strea
    interface.
    |
    | **Default:** ``com.mongodb.spark.sql.connector.connection.DefaultMongoClientFactory``

    * - ``convertJson``
    - | Specifies whether the connector parses the string and converts extended JSON
    Contributor

    S: "whether" seems a bit imprecise, as there are more than two options. Perhaps change wording to "specifies if and how", or something similar. applies to all instances of convertJson

    Contributor

    Suggested change
    - - | Specifies whether the connector parses the string and converts extended JSON
    + - | Specifies whether the connector parses string values and converts extended JSON

    "the string" is a bit unclear to me as i'm unsure what the specific string we're referring to is. from my understanding this option applies to the whole value of the write operation so "string values" or "strings" would be more clear

    Contributor

    applies to all

    | **Default:** ``false``

    * - ``idFieldList``
    - | Field or list of fields by which to split the collection data. To
    Contributor

    S: to keep parallelism, "Specifies a field or list of fields.."

    Contributor

    applies to all

    | **Default:** ``1``
    | For more information on ``j`` values, see :manual:`WriteConcern j
    Option </reference/write-concern/#j-option>` in the {+mdb-server+}
    manual.

    * - ``writeConcern.wTimeoutMS``
    - | Specifies ``wTimeoutMS``, a write-concern option to return an error
    when a write operation exceeds the number of milliseconds. If you
    Contributor

    Suggested change
    - when a write operation exceeds the number of milliseconds. If you
    + when a write operation exceeds the specified number of milliseconds. If you

    applies to all

    manual.
    |
    | **Default:** ``1``

    * - ``writeConcern.journal``
    - | Specifies ``j``, a write-concern option to enable request for
    Contributor

    Suggested change
    - - | Specifies ``j``, a write-concern option to enable request for
    + - | Specifies ``j``, a write-concern option requesting acknowledgment from MongoDB that the data has been written to the on-disk journal...

    Contributor

    applies to all

    | **Default:** ``true``

    * - ``upsertDocument``
    - | When ``true``, replace and update operations will insert the data
    Contributor

    Suggested change
    - - | When ``true``, replace and update operations will insert the data
    + - | When ``true``, replace and update operations insert the data

    applies to all

    |
    | For more information on ``wTimeoutMS`` values, see
    :manual:`WriteConcern wtimeout </reference/write-concern/#wtimeout>` in
    the {+mdb-server+} manual.

    * - ``checkpointLocation``
    - | The absolute file path of the directory to which the connector writes checkpoint
    Contributor

    Suggested change
    - - | The absolute file path of the directory to which the connector writes checkpoint
    + - | The absolute file path of the directory where the connector writes checkpoint

    @@ -139,33 +139,35 @@ You can configure the following properties when writing data to MongoDB in batch
    |
    | **Default:** ``true``

    * - ``writeConcern.w``
    - | Specifies ``w``, a write-concern option to request acknowledgment that
    Contributor

    Suggested change
    - - | Specifies ``w``, a write-concern option to request acknowledgment that
    + - | Specifies ``w``, a write-concern option requesting acknowledgment that

    @mballard-mdb mballard-mdb requested a review from shuangela July 2, 2025 18:12
    Contributor

    @shuangela shuangela left a comment

    lgtm with a note to ask tech reviewer for confirmation on something!

    4 participants