Skip to content

Fix: keep WebSocket subscriptions on resubscribe failure#6317

Open
AYUSHMIT wants to merge 2 commits intohome-assistant:mainfrom
AYUSHMIT:fix/websocket-resubscribe-on-reconnect-clean
Open

Fix: keep WebSocket subscriptions on resubscribe failure#6317
AYUSHMIT wants to merge 2 commits intohome-assistant:mainfrom
AYUSHMIT:fix/websocket-resubscribe-on-reconnect-clean

Conversation

@AYUSHMIT
Copy link
Copy Markdown

Summary

This change fixes an edge case during WebSocket reconnection where active subscriptions could be permanently dropped.

Previously, when a WebSocket reconnect succeeded but a subscription re-register request failed, the old subscription entry was removed immediately. This meant transient failures could cause the app to stop receiving events until a full restart.

This update keeps existing subscription state until a re-subscribe succeeds, allowing subsequent reconnect attempts to retry safely. A deterministic unit test was added to cover this failure path.

Checklist

  • New or updated tests have been added to cover the changes following the testing guidelines.
  • The code follows the project's code style and best_practices.
  • The changes have been thoroughly tested, and edge cases have been considered.
  • Changes are backward compatible whenever feasible. Any breaking changes are documented in the changelog for users and/or in the code for developers depending on the relevance.

Screenshots

N/A – no user-facing UI changes.

Link to pull request in documentation repositories

User Documentation: N/A
Developer Documentation: N/A

Any other notes

This change is intentionally minimal and scoped to reconnection behavior only. It does not alter subscription semantics or message handling outside of the reconnect flow.

Copilot AI review requested due to automatic review settings January 23, 2026 22:24
Copy link
Copy Markdown

@home-assistant home-assistant bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @AYUSHMIT

It seems you haven't yet signed a CLA. Please do so here.

Once you do that we will be able to review and accept this pull request.

Thanks!

@home-assistant
Copy link
Copy Markdown

Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍

Learn more about our pull request process.

@home-assistant home-assistant bot marked this pull request as draft January 23, 2026 22:25
@AYUSHMIT AYUSHMIT marked this pull request as ready for review January 23, 2026 22:30
@AYUSHMIT
Copy link
Copy Markdown
Author

CLA signed. Marked ready for review.
This PR is scoped to reconnection behavior only + includes a deterministic unit test for the failure path.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an edge case in WebSocket reconnection where active subscriptions could be permanently lost after a failed resubscribe attempt. The implementation changes the reconnection logic to keep old subscription entries until resubscription succeeds, allowing subsequent reconnect attempts to retry. A unit test was added to verify this behavior.

Changes:

  • Modified reconnectSubscriptions() to snapshot subscriptions before iterating and only remove old entries after successful resubscription
  • Added a unit test to verify subscriptions are retained when resubscription fails during reconnection

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
common/src/main/kotlin/io/homeassistant/companion/android/common/data/websocket/impl/WebSocketCoreImpl.kt Changed reconnection logic to keep old subscription IDs until successful resubscribe
common/src/test/kotlin/io/homeassistant/companion/android/common/data/websocket/impl/WebSocketCoreImplTest.kt Added test case for failed resubscription during reconnection

Comment on lines +880 to +894
every { mockConnection.send(match<String> { it.contains(""""type":"subscribe_events"""") }) } answers {
val msg = firstArg<String>()
val id = Regex(""""id":(\d+)""").find(msg)?.groupValues?.get(1)?.toLong()
?: fail { "Missing id in subscribe_events message: $msg" }

// First subscribe succeeds (already handled by mockResultSuccessForId(2)); fail the *next* subscribe.
if (id != 2L) {
resubscribeId = id
webSocketListener.onMessage(
mockConnection,
"""{"id":$id,"type":"result","success":false,"result":{}}""",
)
}
true
}
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be a conflict between the mock defined at line 876 and the mock at line 880. Both mocks match the initial subscribe message (which contains both the id and type), but MockK will use the most recently defined mock. The mock at line 880 doesn't send a response for id=2 (it only handles id != 2), which would cause the initial subscribe at line 896-901 to hang waiting for a response. Consider restructuring the mocks to avoid this conflict, perhaps by setting up the failure mock only after the initial subscription completes.

Copilot uses AI. Check for mistakes.
Comment on lines +1043 to 1047
if (response == null || response.success != true) {
Timber.e("Issue re-registering subscription with ${oldActiveMessage.request}")
// Keep oldId so we can retry on a later reconnect attempt
return@forEach
}
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a resubscription fails, the new subscription entry created by sendMessage is not cleaned up from activeMessages. The sendMessage method only removes entries when the response is null (timeout) or an exception occurs, but not when success is false. This means after a failed resubscribe attempt, both the old subscription ID and the new failed subscription ID will remain in activeMessages, causing a memory leak. The new failed subscription ID should be explicitly removed from activeMessages after detecting the resubscription failure.

Copilot uses AI. Check for mistakes.
}
}
@Test
fun `Given an Active subscription When disconnection occurs and resubscribe fails Then it keeps the old subscription in activeMessages`() = runTest {
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name uses "Active" with a capital letter which is inconsistent with the test naming pattern in this file. Other tests use lowercase like "Given an Active subscription" at line 831. The word "Active" here should use lowercase "an active subscription" to maintain consistency with other test names in the file.

Suggested change
fun `Given an Active subscription When disconnection occurs and resubscribe fails Then it keeps the old subscription in activeMessages`() = runTest {
fun `Given an active subscription When disconnection occurs and resubscribe fails Then it keeps the old subscription in activeMessages`() = runTest {

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@TimoPtr TimoPtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpelgrom I'm not super confident about this. I get that it might help in some cases but that would require heavy testing and today we already have OOM that we don't know yet where it's actually coming from.

I guess this could help for long lasting subscription, not widget since after 30s it's killed.

@jpelgrom
Copy link
Copy Markdown
Member

@jpelgrom I'm not super confident about this. I get that it might help in some cases but that would require heavy testing and today we already have OOM that we don't know yet where it's actually coming from.

I guess this could help for long lasting subscription, not widget since after 30s it's killed.

Agreed. It could be helpful but also backfire if unsubscribing from the app-end doesn't happen correctly.

For transient failures the 10 second delay was added, and UI will (should) close/resubscribe when hidden/reopened. Maybe retrying once or twice after that is a good middle ground? @AYUSHMIT is there a specific situation in which you observed this as an issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants