NATS Service Failure During the Product Upgrade Process #6893

@Inkathu

Observed behavior

While installing the new version of the product, the NATS service failed, which caused the upgrade attempts to fail. The issue occurred while stopping a large number of running product jobs.

Symptoms:

  • The NATS service failed to respond to multiple StopJobRequest messages.
  • Performance warnings reported slow internal subscriptions on various streams (e.g., $JS.API.STREAM.PURGE.SessionEventsStream).
  • The product service failed to start because it received no API response from NATS.
  • The NATS service was found to be down during the second upgrade attempt (with an "incorrect function" error), and manual restart attempts were initially unsuccessful.

Example performance warnings from the log:

2025/05/07 14:13:59.418304 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.0468125s
2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3086614s
2025/05/07 14:13:59.739927 [WRN] Internal subscription on "$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.3689963s
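
To make it easier to see which subjects were slow and by how much, here is a minimal Python sketch that extracts the subject and duration from warnings of the form shown above. It assumes the log lines match this exact wording and that durations use Go-style formatting (for example `3m13.0468125s`). The names `slow_subscriptions` and `parse_go_duration` are hypothetical helpers written for this illustration and are not part of NATS.

```python
import re

# Pattern for the nats-server warning format shown in the excerpt above.
# Assumption: the wording of the warning is exactly as in those lines.
SLOW_SUB = re.compile(
    r'^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+ \[WRN\] '
    r'Internal subscription on "(?P<subject>[^"]+)" '
    r'took too long: (?P<dur>\S+)$'
)

# One component of a Go-style duration, e.g. "3m" or "13.0468125s".
# "ms" must be tried before "m" so that milliseconds are not read as minutes.
GO_DURATION_PART = re.compile(r'(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)')

UNIT_SECONDS = {
    "h": 3600.0, "m": 60.0, "s": 1.0,
    "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9,
}


def parse_go_duration(text):
    """Convert a Go-style duration string such as '3m13.0468125s' to seconds."""
    parts = GO_DURATION_PART.findall(text)
    # Reject input that is not fully covered by recognised components.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(float(num) * UNIT_SECONDS[unit] for num, unit in parts)


def slow_subscriptions(lines):
    """Return (subject, seconds) for every 'took too long' warning in lines."""
    found = []
    for line in lines:
        match = SLOW_SUB.match(line.strip())
        if match:
            found.append((match["subject"], parse_go_duration(match["dur"])))
    return found


if __name__ == "__main__":
    sample = [
        '2025/05/07 14:13:59.418304 [WRN] Internal subscription on '
        '"$JS.API.STREAM.PURGE.SessionEventsStream" took too long: 3m13.0468125s',
    ]
    for subject, seconds in slow_subscriptions(sample):
        print(f"{subject}: {seconds:.1f} s")
```

Running it over the full server log gives a quick view of whether the delays are limited to the purge subject or affect other JetStream API subjects as well.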

Resolution Attempts:
Cleaning up the NATS folder and restarting the service resolved the issue and allowed a successful upgrade to the new version of the product.

Screenshot of the service failure between unsuccessful restart attempts:

Image

Expected behavior

The NATS server keeps working without problems during the upgrade, and there is no need to clean up streams.

Server and client version

Server Version: 2.10.22
Client: NATS .NET client (nats.net) v2.2.3

Host environment

No response

Steps to reproduce

No response

Labels: defect (suspected defect such as a bug or regression)