EFTON - STM32 gotchas

95. How to achieve consistent RTC readout?

This article is based on findings by EngyCZ, made while attempting to achieve consistent subsecond-precision time readout, as reported in this thread and at the end of this thread.

This is hopefully the last installment of the RTC-data-readout saga. Previous installments discussed the effects of BYPSHAD=0: the need to read DR after SSR/TR, and the exact mechanism by which TR and DR are "locked" when SSR/TR are read.

Those articles also describe which STM32 families this issue pertains to, i.e. not the 'F1, and not the newest 'G0/G4 and newer families. This corresponds to version 2 of the RTC, as described in AN4759.

Please note that the issues described here are equally relevant to use cases where subseconds are not needed, i.e. the time (TR) and date (DR) readouts may be mutually inconsistent in the same way as SSR/TR readouts. However, as there are nearly a hundred thousand seconds in a day (86400), with the same readout frequency and with readouts not intentionally synchronized to the RTC clock, the probability of observing a TR/DR inconsistency is much lower than the probability of observing an SSR/TR inconsistency, and in many practical cases negligible¹.

The main aim here is to achieve a consistent set of SSR/TR/DR readouts. As DR changes rarely, we concentrate on SSR/TR readouts. A sequence of normal readouts may look like the following (with the usual synchronous prescaler of 256, i.e. PREDIV_S = 255, and remembering that SSR is a downcounter):

```
sec:22 subsec: 2
sec:22 subsec: 1
sec:22 subsec: 0
sec:23 subsec: 255
sec:23 subsec: 254
sec:23 subsec: 253
```
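To make the "sec"/"subsec" pairs in these listings concrete, here is a minimal sketch (not from the original article) of how raw TR and SSR values map to elapsed time. It assumes the RTC v2 TR bit layout (BCD seconds in ST/SU, minutes in MNT/MNU, hours in HT/HU), 24-hour format and PREDIV_S = 255; the function names are hypothetical. The subsecond part follows the RM's second-fraction definition, (PREDIV_S - SSR) / (PREDIV_S + 1).

```c
#include <stdint.h>

/* Assumed: the usual synchronous prescaler of 256. */
#define PREDIV_S 255u

/* Hypothetical helper: seconds since midnight from a raw RTC v2 TR value.
 * Assumed BCD layout: SU[3:0], ST[6:4], MNU[11:8], MNT[14:12],
 * HU[19:16], HT[21:20]; 24-hour format, so the PM bit (22) is ignored. */
uint32_t tr_to_seconds(uint32_t tr)
{
    uint32_t sec  = ((tr >> 4)  & 0x7u) * 10u + (tr & 0xFu);
    uint32_t min  = ((tr >> 12) & 0x7u) * 10u + ((tr >> 8)  & 0xFu);
    uint32_t hour = ((tr >> 20) & 0x3u) * 10u + ((tr >> 16) & 0xFu);
    return hour * 3600u + min * 60u + sec;
}

/* Hypothetical helper: subsecond ticks elapsed in the current second.
 * SSR counts DOWN from PREDIV_S to 0, so elapsed = PREDIV_S - SSR;
 * the fraction of a second is that value divided by (PREDIV_S + 1). */
uint32_t ssr_to_ticks(uint32_t ssr)
{
    return PREDIV_S - ssr;
}
```

For example, TR = 0x123422 (12:34:22) with SSR = 2 gives tr_to_seconds() = 45262 and ssr_to_ticks() = 253, i.e. 253/256 of second 22 has already elapsed, which is why "sec:22 subsec: 0" is immediately followed by "sec:23 subsec: 255".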

Issues influencing consistent SSR/TR/DR readout from an RTC with BYPSHAD=0 (i.e. with the lock/unlock mechanism in place) include:

erratum-described late lock: this manifests itself, for example, as the following sequence of readouts:
```
sec:22 subsec: 2
sec:22 subsec: 1
sec:22 subsec: 0
sec:23 subsec: 0
sec:23 subsec: 255
sec:23 subsec: 254
sec:23 subsec: 253
```
because SSR may be read just before the increment, and that read may fail to lock the old, pre-increment value of seconds in TR, exactly as the erratum describes it.
early reading after an unlock: the datasheet requires waiting at least 2 RTC clock periods after reading DR (which "unlocks" TR/DR) before the next readout happens. Not fulfilling this requirement results in the following sequence (note that here we show only readouts which changed; in fact, there were multiple readouts with the same value in between those shown):
```
sec:22 subsec: 2
sec:22 subsec: 1
sec:22 subsec: 0
sec:22 subsec: 255
sec:23 subsec: 255
sec:23 subsec: 254
sec:23 subsec: 253
```
where subseconds show the value after the increment (again, SSR is a downcounter) while seconds (TR) still hold the old "locked" value (more precisely, TR has not yet been refreshed). As we've already learned, SSR is never "locked", but reading it "relocks" TR/DR.

A bullet-proof method to ensure consistent readout here is to use the RSF mechanism: the user clears the RSF bit and waits until hardware sets it, which indicates that the shadow registers have been copied from the RTC domain². Of course, users can come up with other methods based purely on measuring time outside of the RTC, e.g. using a common timer.
RSF is a write-protected bit. This means that when implementing the RSF-based readout delay described in the previous item while using the RTC write protection (as we should in most applications), we must first unprotect RTC writes by writing the key sequence into RTC_WPR before writing 0 into RSF.
Otherwise, zero does not get written into RSF, and the subsequent check for RSF being one succeeds immediately, without actually waiting for the shadow refresh to happen, ultimately leading to inconsistent readouts again.
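Both anomalies listed above show up as time apparently running backwards once successive readouts are converted to a single tick count. A hypothetical host-side checker for logged readouts might look like this; the names, the PREDIV_S = 255 value and the absence of midnight wrap-around are assumptions of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define PREDIV_S 255u  /* assumed synchronous prescaler of 256 */

/* One logged readout: seconds (from TR) and the raw SSR value. */
typedef struct { uint32_t sec; uint32_t ssr; } rtc_sample;

/* Time in subsecond ticks; SSR is a downcounter, so the ticks elapsed
 * within the current second are PREDIV_S - SSR. */
static uint64_t sample_ticks(rtc_sample s)
{
    return (uint64_t)s.sec * (PREDIV_S + 1u) + (PREDIV_S - s.ssr);
}

/* Returns the index of the first readout that is earlier than its
 * predecessor, or n if the whole sequence is non-decreasing.
 * Midnight wrap-around is not handled. */
size_t first_backward_step(const rtc_sample *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (sample_ticks(v[i]) < sample_ticks(v[i - 1])) {
            return i;
        }
    }
    return n;
}
```

Fed with the listings above, it returns 4 for the late-lock sequence (the bogus "sec:23 subsec: 0" readout makes the following, correct one look like a step back), 3 for the early-reading sequence, and n for the normal sequence. Note that it reports where time goes backwards, which is not necessarily the faulty readout itself.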

An example of bullet-proof code to read out the RTC with BYPSHAD = 0 (in code utilizing the RTC write protection, which is the usual case), provided by EngyCZ, is³:

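    // Requires the CMSIS device header of the STM32 family in use
    // (provides RTC and RTC_ISR_RSF).
    volatile uint32_t drreg1, trreg1, ssrreg1;
    volatile uint32_t drreg2, trreg2, ssrreg2;

    do
    {
      // Disable RTC write protection (key sequence)
      RTC->WPR = 0xCA;
      RTC->WPR = 0x53;
      // Clear RSF, so that its next setting marks a fresh shadow copy
      RTC->ISR &= (uint32_t)~RTC_ISR_RSF;
      // Re-enable RTC write protection
      RTC->WPR = 0xFF;
      // Wait until hardware sets RSF (shadow registers refreshed)
      while ((RTC->ISR & RTC_ISR_RSF) == 0);

      ssrreg1 = RTC->SSR;
      trreg1  = RTC->TR;
      drreg1  = RTC->DR;

      // Disable RTC write protection (key sequence)
      RTC->WPR = 0xCA;
      RTC->WPR = 0x53;
      // Clear RSF, so that its next setting marks a fresh shadow copy
      RTC->ISR &= (uint32_t)~RTC_ISR_RSF;
      // Re-enable RTC write protection
      RTC->WPR = 0xFF;
      // Wait until hardware sets RSF (shadow registers refreshed)
      while ((RTC->ISR & RTC_ISR_RSF) == 0);

      ssrreg2 = RTC->SSR;
      trreg2  = RTC->TR;
      drreg2  = RTC->DR;
    } while (ssrreg1 != ssrreg2);  // repeat if an increment occurred in between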

The basic drawback of this code is that it inevitably waits for the shadow registers reload (through the RSF mechanism), so its execution lasts approximately 60 µs in the usual case. Some optimization may be possible by clearing RSF at the end of the function and then checking it at its beginning (assuming no other code works with the RTC in the meantime); nonetheless, there is always at least one shadow reload which has to be waited for.

Now, let's look at the case of BYPSHAD=1, i.e. when the shadow registers are not used and the processor reads the RTC registers directly.

It can be anticipated that an RTC increment may happen in the course of reading the individual registers, so a single readout won't result in mutually consistent values. However, the surprising fact EngyCZ found (and I've reproduced) was that not only were the registers mutually inconsistent, but sometimes the bits within a register were inconsistent, too! Clearly, in that case, the read from the APB clock domain came in the middle of bits flipping in the RTC-clock-domain registers as a consequence of an increment.

This, while undoubtedly surprising for many users, is documented in the RM:

Additionally, the value of one of the registers may be incorrect if an RTCCLK edge occurs during the read operation.

The following code by EngyCZ is a bullet-proof implementation of the workaround described in the RM:

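    void ReadDateTime(void)
    {
      // uint32_t from <stdint.h>; the original used the legacy u32 typedef
      volatile uint32_t subsec,  timereg,  datereg;
      volatile uint32_t subsec2, timereg2, datereg2;

      // Read all three registers twice and repeat until both sets match,
      // i.e. no RTCCLK edge changed or corrupted any of them in between.
      do
      {
        datereg  = RTC->DR;
        timereg  = RTC->TR;
        subsec   = RTC->SSR;

        datereg2 = RTC->DR;
        timereg2 = RTC->TR;
        subsec2  = RTC->SSR;
      } while (datereg != datereg2 ||
               timereg != timereg2 ||
               subsec  != subsec2
              );

      // ... process subsec, timereg & datereg ...
    }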

A slight optimization may be possible, as it should be enough to read SSR twice (before and after reading DR/TR) and compare only the two SSR readouts to determine whether an increment happened during reading (and loop if it did, i.e. in case of a mismatch).
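The "read SSR twice" variant might be sketched as follows. This is an untested illustration, not EngyCZ's code: the function name is hypothetical, and the register addresses are passed as pointers (on the target e.g. &RTC->SSR, &RTC->TR and &RTC->DR from the CMSIS device header) so that the logic can also be exercised off-target. It relies on the assumption stated above, namely that any increment or corrupting RTCCLK edge during the TR/DR reads also shows up as an SSR mismatch, and that one pass through the loop takes far less than one SSR tick.

```c
#include <stdint.h>

/* Hypothetical helper for BYPSHAD = 1: SSR is sampled before and after
 * reading DR/TR; equal samples are taken to mean no increment happened
 * while DR/TR were being read. */
void rtc_read_bypass(volatile const uint32_t *ssr_reg,
                     volatile const uint32_t *tr_reg,
                     volatile const uint32_t *dr_reg,
                     uint32_t *ssr, uint32_t *tr, uint32_t *dr)
{
    uint32_t ssr_before, ssr_after;

    do {
        ssr_before = *ssr_reg;   /* first SSR sample */
        *dr = *dr_reg;
        *tr = *tr_reg;
        ssr_after = *ssr_reg;    /* second SSR sample */
    } while (ssr_before != ssr_after);  /* SSR moved: read everything again */

    *ssr = ssr_after;
}
```

On the target this would be called as `rtc_read_bypass(&RTC->SSR, &RTC->TR, &RTC->DR, &ssr, &tr, &dr);` with local uint32_t variables ssr, tr and dr.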

While the RM warns that in this case an RTC register read lasts 2 APB cycles rather than 1, this method is still far quicker than the code for BYPSHAD=0, which, because of the Erratum, has to wait for a shadow copy as described above.

User Piranha in this thread proposed an intriguing method of RTC readout: read TR/DR into one set of variables, read SSR, then read TR/DR into a second set of variables. If SSR is above half of PREDIV_S, accept the second set of variables, otherwise accept the first set (note that SSR is a downcounter!). The basic requirement for this algorithm to work is that its total duration (including potential interrupts between readouts) must be below half a second, but this requirement can usually be fulfilled relatively easily in typical (micro)controller applications. It was proposed that this algorithm should work correctly for both BYPSHAD settings, and even in the presence of the Erratum.
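The selection rule itself can be expressed as a small pure function. This is only a sketch of the rule as described above, with hypothetical names; it leaves out the register reads themselves, which for BYPSHAD=0 would additionally need the shadow-reload waits discussed earlier.

```c
#include <stdint.h>

typedef struct { uint32_t tr, dr, ssr; } rtc_stamp;

/* tr1/dr1 were read before SSR, tr2/dr2 after it. Assumes all three
 * reads completed within less than half a second. */
rtc_stamp rtc_pick_consistent(uint32_t tr1, uint32_t dr1,
                              uint32_t ssr,
                              uint32_t tr2, uint32_t dr2,
                              uint32_t prediv_s)
{
    rtc_stamp r = { .ssr = ssr };

    if (ssr > prediv_s / 2u) {
        /* SSR high: the second has just started, so a seconds increment
         * may have happened before SSR was read; the later TR/DR read
         * belongs to the same second as SSR. */
        r.tr = tr2;
        r.dr = dr2;
    } else {
        /* SSR low: the second is about to end, so an increment may
         * happen after SSR was read; the earlier TR/DR read belongs to
         * the same second as SSR. */
        r.tr = tr1;
        r.dr = dr1;
    }
    return r;
}
```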

The value of this algorithm is mainly in the fact that it avoids the loop, with all its consequences.

Unfortunately, for BYPSHAD=1, the fact that SSR itself may be read out corrupted means that this method can't be used. And for BYPSHAD=0, the requirement for a lengthy wait for the shadow reload between the two readouts makes this algorithm unattractive.

1. With entirely random/unsynchronized and relatively sparse readouts, the "basic" probability of hitting the Erratum-described inconsistency is given by the fact that the "not locking" leading to inconsistency happens, according to the Erratum itself, only if the readout occurs one APB cycle before the RTC increment. As in the usual setting the RTC increment which rolls over seconds (i.e. the increment where the inconsistency can be detected) happens once per second, and APB frequencies are in the tens of MHz, the baseline probability of such an inconsistency is one in several tens of millions of readouts.
Assuming an average readout frequency in the range of 0.1 s, one inconsistency would occur once in a few hundred days. Also, without an explicit rigorous method to find such inconsistencies (e.g. if readouts are used only to log events, which are then examined only to find some particular event, which is the usual usage of such readouts), the probability of spotting one is near zero. Together, these explain why it took roughly 7 years from the introduction of STM32 models with this RTC until the Erratum appeared.
Introducing intended or unintended synchronicities and/or increasing the relative frequency of readouts (which includes decreasing the RTC prescalers to achieve higher RTC resolution at the cost of losing real-time-ness, and using unusually low APB frequencies to achieve ultra-low power consumption), together with a usage of readouts which highlights the inconsistency (e.g. taking differences between successive readouts and processing them), may result in dramatically different observations.
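Written out, the baseline estimate is the ratio of one APB cycle to the one-second spacing of seconds rollovers, assuming readouts uniformly distributed in time; the 32 MHz APB clock is an assumed example value:

```latex
p \approx \frac{T_\mathrm{APB}}{T_\mathrm{roll}}
  = \frac{1/f_\mathrm{APB}}{1\,\mathrm{s}},
\qquad
f_\mathrm{APB} = 32\,\mathrm{MHz} \;\Rightarrow\; p \approx \frac{1}{3.2\times 10^{7}}
```

i.e. on average one inconsistent readout per roughly 32 million readouts.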

2. Note that if BYPSHAD=1 (e.g. in code which handles both BYPSHAD cases), the shadow-copy mechanism is not working and RSF will never be set by the hardware.

3. We don't discuss code resilient to hardware failures here. Users concerned about possible hardware failures may want to add timeouts or iteration limits to the loops in the presented code. In particular, the loop waiting for RSF may keep looping forever if the LSE fails. Conversely, the outer loop checking for identical SSR/TR/DR readouts may keep looping if the LSE starts to run unexpectedly fast, or if the RTC module gets internally damaged and starts to return unexpectedly varying values; both of these events appear to be quite unlikely, though.
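As an illustration of such a limit, the RSF wait might be bounded by a poll count. This is a hypothetical sketch: the function name, the RSF bit macro (CMSIS headers provide RTC_ISR_RSF, which could be used instead) and the choice of a poll count rather than a timer-based timeout are all assumptions. The limit would have to be chosen generously above the roughly 2 RTCCLK periods between shadow copies, taking the CPU clock and bus wait states into account.

```c
#include <stdbool.h>
#include <stdint.h>

/* RSF is bit 5 of RTC_ISR on RTC v2 (CMSIS name: RTC_ISR_RSF). */
#define RTC_ISR_RSF_BIT (1u << 5)

/* Polls RSF at most max_polls times. Returns true once the shadow
 * registers have been refreshed, false if the limit was reached
 * (e.g. because the LSE has stopped). The ISR address is passed in so
 * the logic can be tested off-target; on the MCU pass &RTC->ISR. */
bool rtc_wait_rsf(volatile const uint32_t *isr, uint32_t max_polls)
{
    while (max_polls-- != 0u) {
        if ((*isr & RTC_ISR_RSF_BIT) != 0u) {
            return true;
        }
    }
    return false;
}
```

In the BYPSHAD = 0 code above, each `while ((RTC->ISR & RTC_ISR_RSF) == 0);` could then become a call like `if (!rtc_wait_rsf(&RTC->ISR, LIMIT)) { /* handle LSE failure */ }`, with LIMIT a hypothetical application constant; a similar iteration limit could guard the outer do/while loop.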