Document Wire timeout API on website #895

Open
@matthijskooijman

Recently, some new timeout API methods were added to the AVR Wire library (see arduino/ArduinoCore-avr#42), which should be documented. Given there is no repository for the library reference, I'm going to report this here. While looking at the Wire docs at https://www.arduino.cc/en/Reference/Wire I noticed that the end() method is also not documented yet.

Please find a proposal for documentation below, comments welcome. I've tried to match the formatting (heading levels etc.) to the existing doc pages, but it's likely that this still needs some handwork to integrate. Also, there's a fair chance that I've written this in too much detail or technically too complex for the novice audience, so feedback on that aspect is also welcome.

Wire

Just above the "Note" section, add:

Recent versions of the Wire library can use timeouts to prevent a lockup in the face of certain problems on the bus, but this is not enabled by default (yet) in current versions. It is recommended to always enable these timeouts when using the Wire library. See the Wire.setWireTimeout function for more details.

Wire.end()

Description

Disable the Wire library, reversing the effect of Wire.begin(). To use the Wire library again after this, call Wire.begin() again.

Syntax

Wire.end()

Parameters

None.

Returns

None.

Portability Notes

This function was not available in the original version of the Wire library and might still not be available on all platforms. Code that needs to be portable across platforms and versions can use the WIRE_HAS_END macro, which is only defined when Wire.end() is available.
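As a rough illustration of the WIRE_HAS_END check described above, the sketch below wraps the call in a small helper. The name releaseBus and the template form are assumptions made for this example, not part of the Wire library; the template only exists so the same code also compiles where Wire.h is not present.

```cpp
#if defined(ARDUINO)
#include <Wire.h>  // defines WIRE_HAS_END on cores that provide Wire.end()
#endif

// Hypothetical helper: releases the I2C bus if the core supports it.
// Returns true when end() was called, false when it is unavailable.
template <typename Bus>
bool releaseBus(Bus& bus) {
#if defined(WIRE_HAS_END)
  bus.end();
  return true;
#else
  (void)bus;  // nothing to do on cores without Wire.end()
  return false;
#endif
}
```

In a sketch this would be called as releaseBus(Wire), followed by Wire.begin() when the bus is needed again.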

Wire.endTransmission()

Under "Returns", add:

  • 5:timeout

Wire.setWireTimeout()

Description

Sets the timeout for Wire transmissions in master mode.

On platforms that support it, these timeouts can help handle unexpected situations on the Wire bus, such as another device or a short-circuit that keeps the bus blocked indefinitely, or noise that looks like a start condition, making it look like there is another master active that keeps the bus claimed.

Note that these timeouts are almost always an indication of an underlying problem, such as misbehaving devices, noise, insufficient shielding, or other electrical problems. These timeouts will prevent your sketch from locking up, but will not solve these problems. In such situations there will often also be data corruption that does not result in a timeout or other error and remains undetected. So when a timeout happens, it is likely that some data previously read or written is also corrupted. Additional measures might be needed to more reliably detect such issues (e.g. checksums or reading back written values) and recover from them (e.g. full system reset). This timeout and such additional measures should be seen as a last line of defence; when possible, the underlying cause should be fixed instead.
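As one possible form of the checksum measure mentioned above, the sketch below appends a CRC-8 to each message and verifies it on the receiving side. The choice of polynomial 0x07 (the one SMBus uses for its packet error code) and the names crc8 and frameIsIntact are assumptions for this example; the Wire library does not add or check a checksum itself, so both ends of the bus would have to implement the same scheme.

```cpp
#include <stddef.h>
#include <stdint.h>

// CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07), initial value 0, MSB first.
uint8_t crc8(const uint8_t* data, size_t len) {
  uint8_t crc = 0x00;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      if (crc & 0x80) {
        crc = static_cast<uint8_t>((crc << 1) ^ 0x07);
      } else {
        crc = static_cast<uint8_t>(crc << 1);
      }
    }
  }
  return crc;
}

// Returns true when the last byte of `frame` is the CRC-8 of the bytes before it.
bool frameIsIntact(const uint8_t* frame, size_t len) {
  if (len < 2) return false;  // need at least one payload byte plus the CRC
  return crc8(frame, len - 1) == frame[len - 1];
}
```

The sender would write the payload followed by crc8(payload, len); the receiver would collect the bytes and discard the message when frameIsIntact() returns false.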

Syntax

Wire.setWireTimeout(timeout, reset_on_timeout)
Wire.setWireTimeout()

Parameters

timeout: timeout in microseconds; if zero, timeout checking is disabled
reset_on_timeout: if true then Wire hardware will be automatically reset on timeout

When this function is called without parameters, a default timeout is configured that should be sufficient to prevent lockups in a typical single-master configuration.

Returns

None.

Example Code

#include <Wire.h>

void setup() {
  Serial.begin(9600); // needed for the error messages printed below
  Wire.begin(); // join i2c bus (address optional for master)
  #if defined(WIRE_HAS_TIMEOUT)
    Wire.setWireTimeout(3000 /* us */, true /* reset_on_timeout */);
  #endif
}

void loop() {
  /* First, send a command to the other device */
  Wire.beginTransmission(8); // transmit to device #8
  Wire.write(123);           // send command
  byte error = Wire.endTransmission(); // run transaction
  if (error) {
    Serial.println("Error occurred when writing");
    if (error == 5)
      Serial.println("It was a timeout");
  }

  delay(100);

  /* Then, read the result */
  #if defined(WIRE_HAS_TIMEOUT)
  Wire.clearWireTimeoutFlag(); // start with a clean flag so it reflects this read only
  #endif
  byte len = Wire.requestFrom(8, 1); // request 1 byte from device #8
  if (len == 0) {
    Serial.println("Error occurred when reading");
    #if defined(WIRE_HAS_TIMEOUT)
    if (Wire.getWireTimeoutFlag())
      Serial.println("It was a timeout");
    #endif
  }

  delay(100);
}

Notes and Warnings

How this timeout is implemented might vary between different platforms, but typically a timeout condition is triggered when waiting for (some part of) the transaction to complete (e.g. waiting for the bus to become available again, waiting for an ACK bit, or maybe waiting for the entire transaction to be completed).

When such a timeout condition occurs, the transaction is aborted and endTransmission() or requestFrom() will return an error code or zero bytes respectively. While this will not resolve the bus problem by itself (i.e. it does not remove a short-circuit), it will at least prevent blocking potentially indefinitely and allow your software to detect and maybe solve this condition.

If reset_on_timeout was set to true and the platform supports this, the Wire hardware is also reset, which can help to clear any incorrect state inside the Wire hardware module. For example, on the AVR platform, this can be required to restart communications after a noise-induced timeout.

When a timeout is triggered, a flag is set that can be queried with getWireTimeoutFlag() and must be cleared manually using clearWireTimeoutFlag() (and is also cleared when setWireTimeout() is called).

Note that this timeout can also trigger while waiting for clock stretching or waiting for a second master to complete its transaction. So make sure to adapt the timeout to accommodate those cases if needed. A typical timeout would be 25 ms (which is the maximum clock stretching allowed by the SMBus protocol), but (much) shorter values will usually also work.
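Since the timeout parameter is given in microseconds, the 25 ms mentioned above corresponds to a value of 25000. The sketch below spells out that conversion and adds a rough lower bound for how long a single byte occupies the bus, assuming 9 clock cycles per byte (8 data bits plus the ACK bit) and no clock stretching. The names kSmbusTimeoutUs and byteTimeUs are hypothetical.

```cpp
#include <stdint.h>

// SMBus limits clock stretching to 25 ms; setWireTimeout() expects microseconds.
constexpr uint32_t kSmbusTimeoutUs = 25UL * 1000UL;

// Rough time one byte occupies the bus, in microseconds, rounded up:
// 8 data bits plus 1 ACK bit at the given bus clock.
constexpr uint32_t byteTimeUs(uint32_t clockHz) {
  return (9UL * 1000000UL + clockHz - 1) / clockHz;
}
```

With these, Wire.setWireTimeout(kSmbusTimeoutUs, true) would configure the SMBus-style limit, and byteTimeUs(100000) gives an idea of how far below that a shorter timeout could go on a 100 kHz bus before normal transfers start to trip it.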

Portability Notes

This function was not available in the original version of the Wire library and might still not be available on all platforms. Code that needs to be portable across platforms and versions can use the WIRE_HAS_TIMEOUT macro, which is only defined when Wire.setWireTimeout(), Wire.getWireTimeoutFlag() and Wire.clearWireTimeoutFlag() are all available.

When this timeout feature was introduced on the AVR platform, it was initially kept disabled by default for compatibility, expecting it to become enabled at a later point. This means the default value of the timeout can vary between (versions of) platforms. The default timeout settings are available from the WIRE_DEFAULT_TIMEOUT and WIRE_DEFAULT_RESET_WITH_TIMEOUT macros.

If you require the timeout to be disabled, it is recommended you disable it explicitly using setWireTimeout(0), even though that is currently the default.

See Also

  • Wire.getWireTimeoutFlag()
  • Wire.clearWireTimeoutFlag()
  • Wire.endTransmission()
  • Wire.requestFrom()

Wire.getWireTimeoutFlag()

Description

Checks whether a timeout has occurred since the last time the flag was cleared.

This flag is set whenever a timeout occurs and cleared when Wire.clearWireTimeoutFlag() is called, or when the timeout is changed using Wire.setWireTimeout().

Timeouts might not be enabled by default. See the documentation for Wire.setWireTimeout() for more information on how to configure timeouts and how they work.

Syntax

Wire.getWireTimeoutFlag()

Parameters

None.

Returns

bool: The current value of the flag

Portability Notes

This function was not available in the original version of the Wire library and might still not be available on all platforms. Code that needs to be portable across platforms and versions can use the WIRE_HAS_TIMEOUT macro, which is only defined when Wire.setWireTimeout(), Wire.getWireTimeoutFlag() and Wire.clearWireTimeoutFlag() are all available.

See Also

  • Wire.clearWireTimeoutFlag()
  • Wire.setWireTimeout()

Wire.clearWireTimeoutFlag()

Description

Clear the timeout flag.

Timeouts might not be enabled by default. See the documentation for Wire.setWireTimeout() for more information on how to configure timeouts and how they work.

Syntax

Wire.clearWireTimeoutFlag()

Parameters

None.

Returns

None.

Portability Notes

This function was not available in the original version of the Wire library and might still not be available on all platforms. Code that needs to be portable across platforms and versions can use the WIRE_HAS_TIMEOUT macro, which is only defined when Wire.setWireTimeout(), Wire.getWireTimeoutFlag() and Wire.clearWireTimeoutFlag() are all available.

See Also

  • Wire.getWireTimeoutFlag()
  • Wire.setWireTimeout()

Activity

bperrybap commented on Sep 26, 2020

What about WIRE_HAS_END macro? It is used to indicate the existence of end()

Also, it seems like the WIRE_HAS_END macro should show up somewhere on the end() web page.
Likewise, it seems like WIRE_HAS_TIMEOUT should show up somewhere on the pages for setWireTimeout(), Wire.getWireTimeoutFlag() and Wire.clearWireTimeoutFlag(), i.e. all the pages for functions that exist when the macro exists.

Perhaps add a section called "NOTES" or "OTHER INFORMATION" to the documentation web pages for the information about the macro.

matthijskooijman (Collaborator, Author) commented on Sep 28, 2020

What about WIRE_HAS_END macro? It is used to indicate the existence of end()

Thanks, forgot about that one. I added it now.

Likewise it seems like WIRE_HAS_TIMEOUT should show up somewhere on the pages for

Good point. It was already on the setWireTimeout() page above, but I now added it to all three (and changed the wording on getWireTimeoutFlag() to be more complete).

Perhaps add a section called "NOTES" or "OTHER INFORMATION" to the documentation web pages for the information about the macro.

I added a "Portability Notes" section now, which seems more specific.

bperrybap commented on Sep 28, 2020

Sounds great.

freddyrios commented on Oct 2, 2020

Got here after reading arduino/ArduinoCore-avr#42 and related issues/pr.

It would be great to also add a note about it directly to the Wire page too https://www.arduino.cc/en/Reference/Wire. Having it visible like that can potentially save a lot of pain.

Also should there be some extra warning or links about the concerns mentioned here (if confirmed to be valid)? arduino/ArduinoCore-avr#42 (comment). The claim seems to be that it is a good idea to take timeouts as a warning sign of hardware issues in some cases, which, if left alone, can lead to other issues.

bperrybap commented on Oct 2, 2020

@freddyrios,
The concerns are valid. If there are timeout errors and there are not multiple masters on the bus, then there is some sort of h/w issue causing bit errors on the bus. The effects of those bit errors are unpredictable.
Not only that, but most bit errors on the bus cannot be detected. The only bit errors that can be detected are those that happen to occur during the address or the status portion of the transfer, because they cause some sort of issue like trying to address a non-existent slave, or confuse the master into thinking that there is another master on the bus.

But I do agree with you that it would be a good idea to have some sort of note/information about the potential causes of timeouts and seriousness of i2c bus signal corruption.

matthijskooijman (Collaborator, Author) commented on Oct 3, 2020

It would be great to also add a note about it directly to the Wire page too https://www.arduino.cc/en/Reference/Wire. Having it visible like that can potentially save a lot of pain.

Good suggestion, I added a small paragraph to the first post.

Also should there be some extra warning or links about the concerns mentioned here (if confirmed to be valid)?

There was already this bit:

When such a timeout condition occurs, the transaction is aborted and endTransmission() or requestFrom() will return an error code or zero bytes respectively. While this will not resolve the bus problem by itself (i.e. it does not remove a short-circuit), it will at least prevent blocking potentially indefinitely and allow your software to detect and maybe solve this condition.

But, you make a good point, so I added a more specific warning to the main description.

How do these look?

bperrybap commented on Oct 3, 2020

IMO, there needs to be a bit more about timeouts.
The language around timeouts does not seem strong enough. It does not mention some of the other possible things that could and often do happen and can go undetected when there are issues like misbehaving slaves and/or bus noise.
Things like data corruption or writing/reading to the wrong slave.
And that this data corruption can cause the need to re-initialize or restart the slave.
So getting a timeout likely means that a slave will have to be fully re-initialized, since there is no way to know what data/commands the slave has received.

In the real-world cases I've seen, there is often quite a bit of data corruption before a lockup (now timeout) would happen.

The reason I think additional text is needed is that in the issue thread about adding the timeout, several of the posters seem to be under the incorrect assumption that just having a timeout in the Wire library, and retries on top of that either in a higher level library that uses Wire or in the sketch, can fix things. But that is definitely not the case.
For example, on i2c LCDs, you can end up with lots of garbage on the display from data corruption before the lockup/timeout.
So even if you did do a retry on an operation when there was a timeout, it would likely not keep the display from getting corrupted.
In some cases like a hd44780 LCD with a PCF8574 based backpack, the i2c data corruption can cause the host and the LCD to lose nibble sync from garbage commands. When out of nibble sync, the display will continue to be corrupted as the host sends more data/commands since it is being misinterpreted.
The only way to get back into nibble sync is to start the full initialization over.

freddyrios commented on Oct 5, 2020

The way it reads now for me is that there is a note in the Wire page that leads you to the method doc, which has this fairly high up in the text:

Note that such a timeout is almost always an indication of an underlying problem, such as misbehaving devices, bus noise or other electrical problems. Relying on this timeout for proper operation is not recommended, it is better to fix the underlying problem instead.

Sounds good to me, as very early it is talking about the actual problems underneath. Of course, the more info to help us outsiders avoid the pitfalls, the better. Links to any relevant topics that point people in the right direction(s) to help solve the real root causes would be incredibly helpful (and almost certainly get people to give it a shot).

bperrybap commented on Oct 5, 2020

My issue is with

Relying on this timeout for proper operation is not recommended

It hints that even though there may be timeouts, the system is still capable of properly functioning.
There will never be proper operation when timeouts are occurring due to signal corruption. When there is enough signal corruption to cause a timeout, there is plenty more that is also causing silent/undetected data corruption.

So, IMO, the message could/should have a bit stronger message than just saying it is "not recommended".
It should indicate that the presence of timeouts very likely indicates the presence of other issues which cannot be detected and are silently occurring like data corruption.

matthijskooijman (Collaborator, Author) commented on Oct 8, 2020

Relying on this timeout for proper operation is not recommended

The way I meant this is that if there are bus lockups or other problems, you should not just enable timeouts and expect it to make things run properly.

But we can make it stronger, how about this?

Note that such a timeout is almost always an indication of an underlying problem, such as misbehaving devices, bus noise or other electrical problems. These timeouts will prevent your sketch from locking up, but not solve these problems. In addition to locking the bus, there might also be data corruption. To ensure reliable operation, whenever timeouts occur, make sure to find and fix the underlying problem, rather than assuming that these timeouts will fix those problems.

Of course, the more info to help us outsiders avoid the pitfalls, the better. Links to any relevant topics that point people in the right direction(s) to help solve the real root causes would be incredibly helpful (and almost certainly get people to give it a shot).

I don't want to go into too much detail here, also since this page is about the Wire library in general, not necessarily specific to the AVR hardware. But external links could probably be added, if anyone knows of appropriate ones.

bperrybap commented on Oct 8, 2020

I like the added detail. Here is an additional tweak:

Note that such a timeout is almost always an indication of an underlying problem, such as misbehaving devices, bus noise or other electrical problems. These timeouts will prevent your sketch from locking up, but not solve these problems. In addition to locking the bus, there may also be undetectable data corruption or accesses to slave addresses other than the ones specified. To ensure reliable operation, whenever timeouts occur, make sure to find and fix the underlying problem, rather than assuming that these timeouts will fix those problems.

Not sure how much, if any, information to provide on how to identify/solve potential issues.
It is kind of a can of worms, and getting into analog electrical issues is a pretty complex subject.

Although It might be worth mentioning some of the more common issues, such as poor wiring/connections, or attempting to use "long" wires (I know, we would need to try to somehow describe what "long" means)
Since posts related to these issues do come up from time to time on the forum.

I think the most important thing is the information about timeouts that hopefully gets the message out that Wire library lockups or timeouts are an indication of other h/w issues that cannot be fully resolved in s/w in the sketch or the library.

ermtl commented on Nov 21, 2020

Since I2C uses both a strong low state and a weak high state with a pullup resistor, it's very susceptible to all kinds of electrical noise that are not the sign of h/w issues. Position your device close to a dimmer controlled brushed motor and see the error count increase, or just be in a thunderstorm. Shielding can be effective but has its limits.
A way to approach the problem is to consider I2C as a transmission protocol that might have errors and see what could be done about it. Designers of the OneWire protocol that's electrically similar with a pullup and a long line added CRC checks to all communications, but nearly all I2C circuits don't have that.
Fortunately, the protocol itself includes a 'start' condition that resets the state machine of any I2C compliant device, making each transaction independent (errors don't propagate) and a lot can be done in software, the exact details depend on the chip used, but here are a few hints:

  • When a timeout occurs, read the value again (if in an interrupt, using the previous value and increasing an error counter can be an alternative)
  • Have a min value, max value, default value, max change value (according to the expected range for the value being measured) and previous measurement value.
    If the sensor value fails the plausibility test, replace it with either the previous value, the default value or an average of the previous and default value. Have a counter for such occurrences. This can handle both I2C errors and sensor malfunctions.
  • Oversample and discard. Take 4 samples or more, discard the highest and lowest value(s) and take the average of the (2 or more) remaining values. This is resistant to high error rates (and to error bursts if more than one high/one low value is discarded) and mitigates the uncaught errors by averaging them with good ones, while decreasing measurement noise (if data is from a sensor).
  • count errors and go to a failsafe mode when too many happen (you need to decrease error count based on time, maybe 1 error every 10 seconds)
  • if the I2C is used for communication between Arduinos , add CRC at the end of each transmission
  • if the I2C is used to store values in an external EEPROM, add CRC to the stored values to check data integrity. This can also detect EEPROM wear.
  • In high reliability applications, use redundant sensors / write data twice (or 3 times with majority vote)
  • To prevent long delays due to timeout, set the timeout value to the shortest possible before communicating with each chip. Some chips always respond fast, so any delay should trigger the timeout ASAP
  • Use repeated timeouts to warn the user about a disconnected / poorly connected sensor
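Two of the hints above, the plausibility test and "oversample and discard", could be sketched roughly as follows. All names (Limits, plausibleOrPrevious, trimmedMean) are made up for this example, the sample type int16_t is an assumption, and only a single highest and lowest value is discarded.

```cpp
#include <stddef.h>
#include <stdint.h>

// Expected range and largest believable step between two readings.
struct Limits {
  int16_t minValue;
  int16_t maxValue;
  int16_t maxChange;
};

// Returns `reading` if it passes the plausibility test, otherwise falls back
// to `previous` and increments the `rejected` counter.
int16_t plausibleOrPrevious(int16_t reading, int16_t previous,
                            const Limits& lim, uint16_t& rejected) {
  int32_t delta = static_cast<int32_t>(reading) - previous;
  if (delta < 0) delta = -delta;
  if (reading < lim.minValue || reading > lim.maxValue || delta > lim.maxChange) {
    ++rejected;
    return previous;
  }
  return reading;
}

// Averages `count` samples after dropping the single highest and the single
// lowest value. With fewer than 3 samples there is nothing to discard, so the
// first sample (or 0 for an empty set) is returned unchanged.
int32_t trimmedMean(const int16_t* samples, size_t count) {
  if (count < 3) return count > 0 ? samples[0] : 0;
  int32_t sum = 0;
  int16_t lo = samples[0];
  int16_t hi = samples[0];
  for (size_t i = 0; i < count; ++i) {
    sum += samples[i];
    if (samples[i] < lo) lo = samples[i];
    if (samples[i] > hi) hi = samples[i];
  }
  return (sum - lo - hi) / static_cast<int32_t>(count - 2);
}
```

The rejected counter can feed the error counting and failsafe hints from the same list; how quickly it should decay over time depends on the application.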

As a side note, the problem with hd44780 LCD with a PCF8574 backpack is tricky and very specific. It occurs because this display is 'write only' and there is no way to read from it and know if the nibbles are in the correct order or inverted, causing display corruption. This is a very rare case, as all other devices I can think of have some ability to be read that would allow such a problem to be detected. From a reliability point of view, the hd44780 LCD + PCF8574 combination is not a good solution. If it needs to be used, a periodic display reset sequence (the only way to reorder the nibbles) should be implemented, the delay being a compromise between how long a garbled display can be tolerated and the short glitch that's visible each time the display is reset.

matthijskooijman (Collaborator, Author) commented on Nov 21, 2020

@ermtl, most of what you write seems like good advice, though I'm not entirely sure what you intend with this? I'm not sure if this much detail is appropriate for this reference documentation page, or did you have something else in mind?

Concerning timeouts, two additional remarks:

  • In normal communication, timeouts are not really needed, since a chip will respond directly and if it does not (e.g. because it is disconnected), you'll know directly because no ACK is received. The only exception is when the slave does clock stretching, in which case it could actively (and potentially indefinitely) keep the master waiting, so the implemented timeout is useful in that case.
  • The other main case that these new timeouts guard against, is somewhat AVR-specific, where the AVR Wire hardware locks up because it thinks there is a secondary master on the bus (typically due to noise that looks like a start condition). It's very likely that other I²C hardware implements this more elegantly, with more feedback.


    Document Wire timeout API on website · Issue #895 · arduino/reference-en