Fix: keep WebSocket subscriptions on resubscribe failure#6317
Fix: keep WebSocket subscriptions on resubscribe failure#6317AYUSHMIT wants to merge 2 commits intohome-assistant:mainfrom
Conversation
|
Please take a look at the requested changes, and use the Ready for review button when you are done, thanks 👍 |
|
CLA signed. Marked ready for review. |
There was a problem hiding this comment.
Pull request overview
This PR fixes an edge case in WebSocket reconnection where active subscriptions could be permanently lost after a failed resubscribe attempt. The implementation changes the reconnection logic to keep old subscription entries until resubscription succeeds, allowing subsequent reconnect attempts to retry. A unit test was added to verify this behavior.
Changes:
- Modified
reconnectSubscriptions()to snapshot subscriptions before iterating and only remove old entries after successful resubscription - Added a unit test to verify subscriptions are retained when resubscription fails during reconnection
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
common/src/main/kotlin/io/homeassistant/companion/android/common/data/websocket/impl/WebSocketCoreImpl.kt |
Changed reconnection logic to keep old subscription IDs until successful resubscribe |
common/src/test/kotlin/io/homeassistant/companion/android/common/data/websocket/impl/WebSocketCoreImplTest.kt |
Added test case for failed resubscription during reconnection |
| every { mockConnection.send(match<String> { it.contains(""""type":"subscribe_events"""") }) } answers { | ||
| val msg = firstArg<String>() | ||
| val id = Regex(""""id":(\d+)""").find(msg)?.groupValues?.get(1)?.toLong() | ||
| ?: fail { "Missing id in subscribe_events message: $msg" } | ||
|
|
||
| // First subscribe succeeds (already handled by mockResultSuccessForId(2)); fail the *next* subscribe. | ||
| if (id != 2L) { | ||
| resubscribeId = id | ||
| webSocketListener.onMessage( | ||
| mockConnection, | ||
| """{"id":$id,"type":"result","success":false,"result":{}}""", | ||
| ) | ||
| } | ||
| true | ||
| } |
There was a problem hiding this comment.
There appears to be a conflict between the mock defined at line 876 and the mock at line 880. Both mocks match the initial subscribe message (which contains both the id and type), but MockK will use the most recently defined mock. The mock at line 880 doesn't send a response for id=2 (it only handles id != 2), which would cause the initial subscribe at line 896-901 to hang waiting for a response. Consider restructuring the mocks to avoid this conflict, perhaps by setting up the failure mock only after the initial subscription completes.
| if (response == null || response.success != true) { | ||
| Timber.e("Issue re-registering subscription with ${oldActiveMessage.request}") | ||
| // Keep oldId so we can retry on a later reconnect attempt | ||
| return@forEach | ||
| } |
There was a problem hiding this comment.
When a resubscription fails, the new subscription entry created by sendMessage is not cleaned up from activeMessages. The sendMessage method only removes entries when the response is null (timeout) or an exception occurs, but not when success is false. This means after a failed resubscribe attempt, both the old subscription ID and the new failed subscription ID will remain in activeMessages, causing a memory leak. The new failed subscription ID should be explicitly removed from activeMessages after detecting the resubscription failure.
| } | ||
| } | ||
| @Test | ||
| fun `Given an Active subscription When disconnection occurs and resubscribe fails Then it keeps the old subscription in activeMessages`() = runTest { |
There was a problem hiding this comment.
The test name uses "Active" with a capital letter which is inconsistent with the test naming pattern in this file. Other tests use lowercase like "Given an Active subscription" at line 831. The word "Active" here should use lowercase "an active subscription" to maintain consistency with other test names in the file.
| fun `Given an Active subscription When disconnection occurs and resubscribe fails Then it keeps the old subscription in activeMessages`() = runTest { | |
| fun `Given an active subscription When disconnection occurs and resubscribe fails Then it keeps the old subscription in activeMessages`() = runTest { |
TimoPtr
left a comment
There was a problem hiding this comment.
@jpelgrom I'm not super confident about this. I get that it might help in some cases but that would require heavy testing and today we already have OOM that we don't know yet where it's actually coming from.
I guess this could help for long lasting subscription, not widget since after 30s it's killed.
Agreed. It could be helpful but also backfire if unsubscribing from the app-end doesn't happen correctly. For transient failures the 10 second delay was added, and UI will (should) close/resubscribe when hidden/reopened. Maybe retrying once or twice after that is a good middle ground? @AYUSHMIT is there a specific situation in which you observed this as an issue? |
Summary
This change fixes an edge case during WebSocket reconnection where active subscriptions could be permanently dropped.
Previously, when a WebSocket reconnect succeeded but a subscription re-register request failed, the old subscription entry was removed immediately. This meant transient failures could cause the app to stop receiving events until a full restart.
This update keeps existing subscription state until a re-subscribe succeeds, allowing subsequent reconnect attempts to retry safely. A deterministic unit test was added to cover this failure path.
Checklist
Screenshots
N/A – no user-facing UI changes.
Link to pull request in documentation repositories
User Documentation: N/A
Developer Documentation: N/A
Any other notes
This change is intentionally minimal and scoped to reconnection behavior only. It does not alter subscription semantics or message handling outside of the reconnect flow.