
WiFi auto reconnect should occur for all disconnect reasons #7210

Closed
@RefactorFactory

Description


Board

n/a

Device Description

n/a

Hardware Configuration

n/a

Version

v2.0.4

IDE Name

n/a

Operating System

n/a

Flash frequency

n/a

PSRAM enabled

yes

Upload speed

n/a

Description

WiFiSTAClass has an _autoReconnect member, which defaults to true. When WiFiGenericClass::_eventCallback() handles an ARDUINO_EVENT_WIFI_STA_DISCONNECTED event, it checks WiFiSTAClass::getAutoReconnect() and reconnects only if the following condition is true:

https://github.com/espressif/arduino-esp32/blob/2.0.4/libraries/WiFi/src/WiFiGeneric.cpp#L971

        else if(WiFi.getAutoReconnect()){
            if((reason == WIFI_REASON_AUTH_EXPIRE) ||
            (reason >= WIFI_REASON_BEACON_TIMEOUT && reason != WIFI_REASON_AUTH_FAIL))
            {
                log_d("WiFi AutoReconnect Running");
                WiFi.disconnect();
                WiFi.begin();
            }
        }

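In effect, the code above skips every reason code below WIFI_REASON_BEACON_TIMEOUT (200) other than WIFI_REASON_AUTH_EXPIRE, plus WIFI_REASON_AUTH_FAIL. That excludes reasons such as WIFI_REASON_ASSOC_EXPIRE, WIFI_REASON_NOT_ASSOCED, WIFI_REASON_4WAY_HANDSHAKE_TIMEOUT and WIFI_REASON_GROUP_KEY_UPDATE_TIMEOUT, and possibly others that can occur sporadically.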
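To make the gap concrete, here is a host-side C++ sketch that reproduces the condition from the v2.0.4 excerpt as a pure function, assuming WiFi.getAutoReconnect() returns true. The constant names and the helper `v204WouldReconnect()` are hypothetical (chosen so they don't clash with the real wifi_err_reason_t enumerators); the numeric values are copied from esp_wifi_types.h.

```cpp
#include <cstdint>

// Subset of wifi_err_reason_t values (esp_wifi_types.h), mirrored under
// different names so this sketch compiles off-device.
constexpr uint8_t REASON_AUTH_EXPIRE            = 2;
constexpr uint8_t REASON_ASSOC_EXPIRE           = 4;
constexpr uint8_t REASON_NOT_ASSOCED            = 7;
constexpr uint8_t REASON_4WAY_HANDSHAKE_TIMEOUT = 15;
constexpr uint8_t REASON_GROUP_KEY_TIMEOUT      = 16;
constexpr uint8_t REASON_BEACON_TIMEOUT         = 200;
constexpr uint8_t REASON_AUTH_FAIL              = 202;

// Hypothetical helper: same predicate as the auto-reconnect branch quoted
// from WiFiGeneric.cpp (v2.0.4). Returns true if that branch would reconnect.
bool v204WouldReconnect(uint8_t reason) {
    return reason == REASON_AUTH_EXPIRE ||
           (reason >= REASON_BEACON_TIMEOUT && reason != REASON_AUTH_FAIL);
}
```

With this model, ASSOC_EXPIRE (4), NOT_ASSOCED (7), 4WAY_HANDSHAKE_TIMEOUT (15) and GROUP_KEY_UPDATE_TIMEOUT (16) all return false, while every Espressif-specific code from 200 upward except AUTH_FAIL returns true.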

I propose that auto reconnect should cover more disconnect reasons than the code above does, for the following reasons:

  1. The ESP-IDF Programming Guide recommends reconnecting for more reasons:

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/wifi.html#wifi-event-sta-disconnected

The most common event handle code for this event in application is to call esp_wifi_connect() to reconnect the Wi-Fi. However, if the event is raised because esp_wifi_disconnect() is called, the application should not call esp_wifi_connect() to reconnect. It is the application's responsibility to distinguish whether the event is caused by esp_wifi_disconnect() or other reasons. Sometimes a better reconnection strategy is required. Refer to Wi-Fi Reconnect and Scan When Wi-Fi Is Connecting.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/wifi.html#wi-fi-disconnect-phase

s6.2: In the scenario described above, the application event callback function relays WIFI_EVENT_STA_DISCONNECTED to the application task. The recommended actions are: 1) call esp_wifi_connect() to reconnect the Wi-Fi, 2) close all sockets, and 3) re-create them if necessary. For details, please refer to WIFI_EVENT_STA_DISCONNECTED.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/wifi.html#wi-fi-reconnect

The station may disconnect due to many reasons, e.g., the connected AP is restarted. It is the application's responsibility to reconnect. The recommended reconnection strategy is to call esp_wifi_connect() on receiving event WIFI_EVENT_STA_DISCONNECTED.

Sometimes the application needs more complex reconnection strategy:

  • If the disconnect event is raised because the esp_wifi_disconnect() is called, the application may not want to do the reconnection.
  • If the esp_wifi_scan_start() may be called at anytime, a better reconnection strategy is necessary. Refer to Scan When Wi-Fi Is Connecting.

Another thing that need to be considered is that the reconnection may not connect the same AP if there are more than one APs with the same SSID. The reconnection always select current best APs to connect.

  2. The ESP-IDF examples reconnect for all reasons:

https://github.com/espressif/esp-idf/blob/v4.4.2/examples/wifi/getting_started/station/main/station_example_main.c#L68
https://github.com/espressif/esp-idf/blob/v4.4.2/examples/provisioning/wifi_prov_mgr/main/app_main.c#L100

  3. ESP-Jumpstart, a "ready reference, a known set of best steps, gathered from previous experience of others", reconnects for all reasons:

https://github.com/espressif/esp-jumpstart/blob/bf3e26f2295730c8f6e9e7c08c897d2155064c5c/7_mfg/main/app_wifi.c#L112

  4. Auto reconnect is already on by default to give new developers a sensible default. Why not make that default also reconnect sensibly for all reasons?
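For comparison, the ESP-IDF station example, as I read it, ignores the reason code entirely: on every WIFI_EVENT_STA_DISCONNECTED it calls esp_wifi_connect() until a retry limit is reached, and it resets the counter on IP_EVENT_STA_GOT_IP. The host-side C++ model below adds the one exception the programming guide calls out, a disconnect the application requested itself. The class `ReconnectPolicy`, its methods and the retry cap are hypothetical; on a device, `onDisconnected()` returning true would correspond to calling esp_wifi_connect() (or WiFi.begin() in Arduino).

```cpp
#include <cstdint>

// Hypothetical host-side model of the ESP-IDF recommendation: reconnect on
// every STA_DISCONNECTED event regardless of reason, unless the application
// itself asked for the disconnect.
class ReconnectPolicy {
public:
    // maxRetries == 0 means "retry forever".
    explicit ReconnectPolicy(unsigned maxRetries = 0) : maxRetries_(maxRetries) {}

    // Call just before the application calls esp_wifi_disconnect().
    void userDisconnectRequested() { userDisconnect_ = true; }

    // Call when the application starts a new connection attempt.
    void onConnectRequested() { userDisconnect_ = false; retries_ = 0; }

    // Call on IP_EVENT_STA_GOT_IP: a successful connection resets the budget.
    void onGotIp() { retries_ = 0; }

    // Returns true if the caller should reconnect. The reason code is
    // deliberately ignored, as in the ESP-IDF examples.
    bool onDisconnected(uint8_t /*reason*/) {
        if (userDisconnect_) {
            return false;  // the application asked for this disconnect
        }
        if (maxRetries_ != 0 && retries_ >= maxRetries_) {
            return false;  // give up; the caller may report a failure
        }
        ++retries_;
        return true;
    }

private:
    unsigned maxRetries_;
    unsigned retries_ = 0;
    bool userDisconnect_ = false;
};
```

Ignoring the reason keeps the policy identical to what Espressif exercises in its own samples; the optional retry cap only bounds how long a device keeps trying while the AP is unreachable.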

Sketch

n/a

Debug Message

n/a

Other Steps to Reproduce

No response

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.

Activity

SuGlider commented on Sep 3, 2022

Collaborator

@RefactorFactory - thanks for reporting such a detailed and well-explained issue!

Let's check it out and find a good fix.

self-assigned this on Sep 3, 2022
mrengineer7777 commented on Sep 29, 2022

Collaborator

I have noticed this on 2.0.3 (platform = https://github.com/tasmota/platform-espressif32/releases/download/v2.0.3/platform-espressif32-2.0.3.zip).

[ 16898][W][WiFiGeneric.cpp:873] _eventCallback(): Reason: 4 - ASSOC_EXPIRE WiFi disconnected 'WIFI_REASON_ASSOC_EXPIRE'

This disconnect occurs about half the time after flashing, and the device must be rebooted before it will connect.

mrengineer7777 commented on Oct 11, 2022

Collaborator

The WiFi failure on "ASSOC_EXPIRE" is a serious bug for us, so I plan to submit a PR to fix this issue. Extensive analysis follows. Feedback wanted!

When ARDUINO_EVENT_WIFI_STA_DISCONNECTED occurs, esp_err_t WiFiGenericClass::_eventCallback(arduino_event_t *event) attempts to reconnect in two cases:

  1. If the connection fails at startup, it retries ONCE (ever) for these reasons:
    WIFI_REASON_AUTH_EXPIRE              = 2,
    WIFI_REASON_BEACON_TIMEOUT           = 200,
    WIFI_REASON_NO_AP_FOUND              = 201,
    WIFI_REASON_AUTH_FAIL                = 202,
    WIFI_REASON_ASSOC_FAIL               = 203,
    WIFI_REASON_HANDSHAKE_TIMEOUT        = 204,
    WIFI_REASON_CONNECTION_FAIL          = 205,
    WIFI_REASON_AP_TSF_RESET             = 206,
    WIFI_REASON_ROAMING                  = 207
  2. If WiFi.getAutoReconnect() is enabled, it reconnects for these reasons (the gap marks WIFI_REASON_AUTH_FAIL, which is excluded):
    WIFI_REASON_AUTH_EXPIRE              = 2,
    WIFI_REASON_BEACON_TIMEOUT           = 200,
    WIFI_REASON_NO_AP_FOUND              = 201,

    WIFI_REASON_ASSOC_FAIL               = 203,
    WIFI_REASON_HANDSHAKE_TIMEOUT        = 204,
    WIFI_REASON_CONNECTION_FAIL          = 205,
    WIFI_REASON_AP_TSF_RESET             = 206,
    WIFI_REASON_ROAMING                  = 207
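A host-side model of the two paths as described above may show why a device can get stuck: the startup retry fires at most once per boot, and any reason outside both lists (ASSOC_EXPIRE, for instance) triggers no reconnect at all. This models the behavior described in this comment, not the library source; `DisconnectModel`, `Action` and the constant names are hypothetical.

```cpp
#include <cstdint>

// Reason values mirror wifi_err_reason_t (esp_wifi_types.h).
constexpr uint8_t kAuthExpire    = 2;
constexpr uint8_t kBeaconTimeout = 200;
constexpr uint8_t kAuthFail      = 202;

enum class Action { None, StartupRetry, AutoReconnect };

// Hypothetical model of the two reconnect paths listed above.
class DisconnectModel {
public:
    explicit DisconnectModel(bool autoReconnect) : autoReconnect_(autoReconnect) {}

    Action onDisconnected(uint8_t reason) {
        const bool inStartupSet = reason == kAuthExpire || reason >= kBeaconTimeout;
        // Path 1: a single retry per boot, which also covers AUTH_FAIL.
        if (!startupRetryUsed_ && inStartupSet) {
            startupRetryUsed_ = true;
            return Action::StartupRetry;
        }
        // Path 2: auto reconnect, same set minus AUTH_FAIL.
        if (autoReconnect_ && inStartupSet && reason != kAuthFail) {
            return Action::AutoReconnect;
        }
        return Action::None;  // e.g. ASSOC_EXPIRE (4): nothing reconnects
    }

private:
    bool autoReconnect_;
    bool startupRetryUsed_ = false;
};
```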

These are the current disconnect reasons from esp_wifi_types.h, wifi_err_reason_t:

    WIFI_REASON_UNSPECIFIED              = 1,
    WIFI_REASON_AUTH_EXPIRE              = 2,
    WIFI_REASON_AUTH_LEAVE               = 3,
    WIFI_REASON_ASSOC_EXPIRE             = 4,
    WIFI_REASON_ASSOC_TOOMANY            = 5,
    WIFI_REASON_NOT_AUTHED               = 6,
    WIFI_REASON_NOT_ASSOCED              = 7,
    WIFI_REASON_ASSOC_LEAVE              = 8,
    WIFI_REASON_ASSOC_NOT_AUTHED         = 9,
    WIFI_REASON_DISASSOC_PWRCAP_BAD      = 10,
    WIFI_REASON_DISASSOC_SUPCHAN_BAD     = 11,
    WIFI_REASON_BSS_TRANSITION_DISASSOC  = 12,
    WIFI_REASON_IE_INVALID               = 13,
    WIFI_REASON_MIC_FAILURE              = 14,
    WIFI_REASON_4WAY_HANDSHAKE_TIMEOUT   = 15,
    WIFI_REASON_GROUP_KEY_UPDATE_TIMEOUT = 16,
    WIFI_REASON_IE_IN_4WAY_DIFFERS       = 17,
    WIFI_REASON_GROUP_CIPHER_INVALID     = 18,
    WIFI_REASON_PAIRWISE_CIPHER_INVALID  = 19,
    WIFI_REASON_AKMP_INVALID             = 20,
    WIFI_REASON_UNSUPP_RSN_IE_VERSION    = 21,
    WIFI_REASON_INVALID_RSN_IE_CAP       = 22,
    WIFI_REASON_802_1X_AUTH_FAILED       = 23,
    WIFI_REASON_CIPHER_SUITE_REJECTED    = 24,
    WIFI_REASON_INVALID_PMKID            = 53,
    WIFI_REASON_BEACON_TIMEOUT           = 200,
    WIFI_REASON_NO_AP_FOUND              = 201,
    WIFI_REASON_AUTH_FAIL                = 202,
    WIFI_REASON_ASSOC_FAIL               = 203,
    WIFI_REASON_HANDSHAKE_TIMEOUT        = 204,
    WIFI_REASON_CONNECTION_FAIL          = 205,
    WIFI_REASON_AP_TSF_RESET             = 206,
    WIFI_REASON_ROAMING                  = 207,

Based on my understanding of the "Scan When Wi-Fi Is Connecting" section of the ESP-IDF Programming Guide, I would group the disconnect reasons as follows:

Disconnected
    WIFI_REASON_ASSOC_LEAVE              = 8,       //Client voluntarily disconnected from AP. Do not reconnect!

Fatal
    WIFI_REASON_UNSPECIFIED              = 1,       //Internal failure (e.g. out of memory) or msg from AP
    WIFI_REASON_DISASSOC_PWRCAP_BAD      = 10,      //Bad power setting
    WIFI_REASON_DISASSOC_SUPCHAN_BAD     = 11,      //Bad channel setting
    WIFI_REASON_IE_INVALID               = 13,      //Invalid element
    WIFI_REASON_UNSUPP_RSN_IE_VERSION    = 21,      //Unsupported RSNE version
    WIFI_REASON_CIPHER_SUITE_REJECTED    = 24,      //Cipher suite rejected due to security policies
    WIFI_REASON_AUTH_FAIL                = 202,     //Auth failed :(

Timeouts (retry)
    WIFI_REASON_AUTH_EXPIRE              = 2,       //Timed out during auth or AP sent reason
    WIFI_REASON_4WAY_HANDSHAKE_TIMEOUT   = 15,      //Timed out during 4-way handshake (ESP uses WIFI_REASON_HANDSHAKE_TIMEOUT instead)
    WIFI_REASON_GROUP_KEY_UPDATE_TIMEOUT = 16,      //Group key handshake times out
    WIFI_REASON_802_1X_AUTH_FAILED       = 23,      //802.1X auth failed. Best guess: enterprise radius certificate error or client timeout waiting for server response.
    WIFI_REASON_HANDSHAKE_TIMEOUT        = 204,     //Same as WIFI_REASON_4WAY_HANDSHAKE_TIMEOUT

Transient error (reconnect)
    WIFI_REASON_AUTH_LEAVE               = 3,       //AP is leaving (rebooting?)
    WIFI_REASON_ASSOC_EXPIRE             = 4,       //AP disconnected client due to inactivity
    WIFI_REASON_ASSOC_TOOMANY            = 5,       //AP cannot handle any more clients at this time
    WIFI_REASON_NOT_AUTHED               = 6,       //Client not authenticated
    WIFI_REASON_NOT_ASSOCED              = 7,       //Client not associated
    WIFI_REASON_ASSOC_NOT_AUTHED         = 9,       //Client sent data while associated but not authenticated
    WIFI_REASON_MIC_FAILURE              = 14,      //Message integrity code failure
    WIFI_REASON_IE_IN_4WAY_DIFFERS       = 17,      //The element in the four-way handshake is different from the (Re-)Association Request/Probe and Response/Beacon frame
    WIFI_REASON_INVALID_PMKID            = 53,      //?? Undocumented. PMK is reused to create session keys between the client and the roamed to AP.
    WIFI_REASON_BEACON_TIMEOUT           = 200,     //Client is no longer hearing beacons from AP and has failed 5 probe requests.  AP is likely offline.
    WIFI_REASON_NO_AP_FOUND              = 201,     //Unable to scan the specified AP (SSID or BSSID).  I believe the ESP32 must find the AP in scan list before it will connect. Incorrect SSID or AP offline.
    WIFI_REASON_ASSOC_FAIL               = 203,     //Association failed
    WIFI_REASON_CONNECTION_FAIL          = 205,     //Espressif-specific Wi-Fi reason code: the connection to the AP has failed.
    WIFI_REASON_AP_TSF_RESET             = 206,     //?? Undocumented.  TSF is a timestamp kept by the AP and possibly by clients.  I see a fix in ESP-IDF for disconnects here: https://github.com/espressif/esp32-wifi-lib/commit/435347a24cec805f81319d3ac6d2a2f17da57bd5
    WIFI_REASON_ROAMING                  = 207,     //?? Undocumented.  Per Google, roaming is triggered by the client when it gets too far from an AP and wants to connect to a stronger one.

Unknown
    WIFI_REASON_BSS_TRANSITION_DISASSOC  = 12,      //?? Undocumented
    WIFI_REASON_GROUP_CIPHER_INVALID     = 18,      //Group cipher invalid
    WIFI_REASON_PAIRWISE_CIPHER_INVALID  = 19,      //Pairwise cipher invalid
    WIFI_REASON_AKMP_INVALID             = 20,      //AKMP invalid
    WIFI_REASON_INVALID_RSN_IE_CAP       = 22,      //RSNE invalid
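The breakdown above translates directly into a lookup table. Below is a host-side C++ sketch; `ReasonCategory` and `classifyReason()` are hypothetical names, and the numeric values come from the wifi_err_reason_t list earlier in this comment. Mapping unlisted values to Unknown is an assumption, chosen so that reason codes added in later ESP-IDF releases land in the undecided bucket instead of being treated as fatal.

```cpp
#include <cstdint>

// Categories proposed above (hypothetical names).
enum class ReasonCategory { Disconnected, Fatal, Timeout, Transient, Unknown };

// Maps a wifi_err_reason_t value to the proposed category.
ReasonCategory classifyReason(uint8_t reason) {
    switch (reason) {
        case 8:                                   // ASSOC_LEAVE
            return ReasonCategory::Disconnected;
        case 1: case 10: case 11: case 13:        // UNSPECIFIED, PWRCAP_BAD, SUPCHAN_BAD, IE_INVALID
        case 21: case 24: case 202:               // UNSUPP_RSN_IE_VERSION, CIPHER_SUITE_REJECTED, AUTH_FAIL
            return ReasonCategory::Fatal;
        case 2: case 15: case 16:                 // AUTH_EXPIRE, 4WAY_HANDSHAKE_TIMEOUT, GROUP_KEY_UPDATE_TIMEOUT
        case 23: case 204:                        // 802_1X_AUTH_FAILED, HANDSHAKE_TIMEOUT
            return ReasonCategory::Timeout;
        case 3: case 4: case 5: case 6: case 7:   // AUTH_LEAVE .. NOT_ASSOCED
        case 9: case 14: case 17: case 53:        // ASSOC_NOT_AUTHED, MIC_FAILURE, IE_IN_4WAY_DIFFERS, INVALID_PMKID
        case 200: case 201: case 203:             // BEACON_TIMEOUT, NO_AP_FOUND, ASSOC_FAIL
        case 205: case 206: case 207:             // CONNECTION_FAIL, AP_TSF_RESET, ROAMING
            return ReasonCategory::Transient;
        default:                                  // 12, 18, 19, 20, 22 and anything unlisted
            return ReasonCategory::Unknown;
    }
}
```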

Since the initial retry fires on AUTH_FAIL, I would argue it should retry for ALL disconnect reasons except WIFI_REASON_ASSOC_LEAVE.

I believe the auto-reconnect should trigger for all Timeout and Transient errors. I don't know what to do with the Unknown reasons.
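The two policies floated here differ only in how they treat the Fatal and Unknown groups. A minimal sketch, assuming the reason has already been mapped to one of the five categories above; the enum, both function names and the `retryUnknown` switch are hypothetical, the switch standing in for the open question about the Unknown reasons.

```cpp
#include <cstdint>

enum class ReasonCategory { Disconnected, Fatal, Timeout, Transient, Unknown };

constexpr uint8_t kAssocLeave = 8;  // WIFI_REASON_ASSOC_LEAVE

// Proposal 1: retry for every reason except a voluntary ASSOC_LEAVE.
bool reconnectAllButLeave(uint8_t reason) {
    return reason != kAssocLeave;
}

// Proposal 2: retry for Timeout and Transient; Unknown is left to the caller.
bool reconnectByCategory(ReasonCategory c, bool retryUnknown) {
    switch (c) {
        case ReasonCategory::Timeout:
        case ReasonCategory::Transient:
            return true;
        case ReasonCategory::Unknown:
            return retryUnknown;
        case ReasonCategory::Disconnected:
        case ReasonCategory::Fatal:
            return false;
    }
    return false;  // not reached for valid enum values
}
```

Under proposal 1, AUTH_FAIL (202) is retried, which matches the argument that the startup retry already fires on it; under proposal 2 it is not.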

Note to self: will be submitting PR against WiFiGeneric.cpp

RefactorFactory commented on Oct 12, 2022

Contributor, Author

While #7344 is a great improvement over the existing code and handles all the WiFi disconnect reasons that I've personally seen, why not just do what the official ESP-IDF samples do?

Suppose a problem with #7344 turns up in the future. If we then ask Espressif, they might say "hmm, we never encountered such a problem because we never do that in our ESP-IDF samples (and perhaps other tests)." This hypothetical problem could have been avoided by matching what Espressif does in their code.

Another way to look at this: is it a "bug" in the ESP-IDF samples that they don't do something as complicated as what we're suggesting for arduino-esp32? Or are the ESP-IDF samples "simple" only because they're samples, so the approach is fine for them but not for real projects?

BTW, I think ESPHome retries on all disconnect reasons, but I'm not 100% sure because its code is more complicated.

9 remaining items
