STM32 gotchas
162. LPUART Rx clocked from LSE corrupts bytes at 9600 Baud - unless LPUART_CR3.UCESM is set

The newer type of UART, present in STM32 from the 'F0/'F3 onwards, is a peripheral with quite a few options that are fixed at the chip design stage: it may include the synchronous option, the IrDA/smartcard options, RS485 support, etc. The UARTs in the various STM32 families are configured with a mix of these options, which is usually described in the USART chapter of the DS and/or at the beginning of the UART chapter of the RM.

For the purposes of this discussion, the most notable option is to have two separate clock domains: the APB-clock domain, through which the registers are accessed from the rest of the mcu, and the kernel-clock (in some materials called independent-clock) domain, which contains the actual transmitter/receiver and its surrounding logic. The kernel-clock source is selected in RCC, usually from among APB/HSI/LSE/LSI; again, the particular details depend on the family.

In any case, the dual-clock option allows the UART to keep running during low-power modes. It is even able to detect the startbit's falling (=leading) edge asynchronously, i.e. it needs no kernel clock while waiting for the startbit, which further reduces power consumption. Depending on how its kernel clock is set in RCC, it then either raises a request to RCC to start HSI, or un-gates the continuously running LSE, to enable reception of the incoming UART frame.

The Low-Power UART (LPUART) present in newer STM32 families is almost identical to the "general-purpose" UART, and is probably only a particular configuration variant of it. This is supported by the fact that it has neither a separate register struct nor separate register-bit/bitfield symbols in the CMSIS-mandated device headers; the USART-related struct and symbols are used for LPUART, too.

While LPUART is usually able to wake up the chip from a deeper sleep setting than UART (e.g. Stop 2 in 'L476), this is probably only due to power supply distribution choices in the given STM32 model.

The one notable difference between LPUART and UART is that LPUART does not employ 16x/8x oversampling; it takes only a single sample at what its baudrate generator marks as the middle of the bit. Also, the LPUART baudrate generator has 8 fractional bits rather than the 4 of the UART. This allows LPUART to run at 9600 Baud with the usual 32.768kHz LSE. However, this means on average approx. 3.4 clocks per bit, and while the fractional baudrate generator ensures quite precise timing in the long run, individual bits are 3 or 4 clocks long. This can also be seen in the LPUART's Tx jitter under this combination (blue waveform is Tx of 0x55, green is LSE) [1]:

LPUART at 9600 Baud clocked from 32.768kHz LSE has 3 or 4 clocks per bit.
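The 3-or-4-clocks behaviour can be reproduced with a bit of arithmetic. The following is a minimal sketch, not a description of the actual silicon: it assumes the RM formula LPUARTDIV = 256 x f_ck / baud for LPUART_BRR (20-bit field, minimum 0x300), and it models the baudrate generator as a simple phase accumulator with 8 fractional bits. The function names lpuart_brr() and clocks_per_bit() are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* LPUART_BRR value, assuming LPUARTDIV = 256 * f_ck / baud, rounded to nearest.
 * The allowed range (0x300..0xFFFFF) is my reading of the RM; verify for your part. */
static uint32_t lpuart_brr(uint32_t fck_hz, uint32_t baud)
{
    uint64_t div = (256ULL * fck_hz + baud / 2u) / baud;
    assert(div >= 0x300u && div <= 0xFFFFFu);
    return (uint32_t)div;
}

/* Idealized model (an assumption): a phase accumulator in units of 1/256 of a
 * kernel clock. Each bit advances it by BRR; the number of whole kernel clocks
 * elapsed during that bit is stored into out[i]. */
static void clocks_per_bit(uint32_t brr, unsigned nbits, unsigned *out)
{
    uint32_t acc = 0;
    for (unsigned i = 0; i < nbits; i++) {
        uint32_t whole_before = acc >> 8;   /* whole clocks elapsed so far */
        acc += brr;
        out[i] = (unsigned)((acc >> 8) - whole_before);
    }
}
```

For f_ck = 32768 Hz and 9600 Baud this gives BRR = 874 (approx. 3.41 clocks per bit), and in this model the ten bits of one 8N1 frame come out as a mix of 3-clock and 4-clock bits, 34 clocks in total.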

The need to generate the Rx sampling point from 3 or 4 clocks results in significant jitter in the exact placement of the sampling point, too. In other words, a relatively small error in the timing of incoming bits (caused e.g. by asymmetry in the skew of high-to-low and low-to-high transitions in some transceivers, or simply by imprecise transmitter clocking) may result in incorrectly received frames.

However, even with a perfect input signal, LPUART Rx at 9600 Baud clocked from LSE may be prone to reception errors in the low-power modes, especially in the first byte(s) of a packet (as reported here and here). The reason is probably that ungating LSE after the asynchronous detection of the startbit's falling edge requires some time (and maybe some synchronization), which possibly delays the sampling points by one LSE period and so worsens their placement. For this case, the RM requires the LSE clock into the LPUART kernel to be kept ungated all the time, by setting the LPUART_CR3.UCESM bit.
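A register-level sketch of the fix might look as follows. To keep it compilable off-target, it uses a hypothetical stand-in struct (lpuart_regs_t) and locally defined bit masks instead of the CMSIS device header; on a real 'L476 one would use LPUART1 (a USART_TypeDef) and the USART_CR1_UE, USART_CR1_UESM, USART_CR1_RE and USART_CR3_UCESM symbols instead. The bit positions below are taken from my reading of the 'L476 RM and should be verified against the RM of the part in use. Selecting LSE as the LPUART kernel clock in RCC, pin setup and the wakeup interrupt configuration are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for the first registers of the CMSIS USART_TypeDef,
 * only so that this sketch compiles on a host; not a real device header. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t CR3;
    volatile uint32_t BRR;
} lpuart_regs_t;

/* Bit positions as assumed from the 'L476 RM; check them for your device. */
#define LPUART_CR1_UE     (1u << 0)   /* peripheral enable                       */
#define LPUART_CR1_UESM   (1u << 1)   /* keep working / wake up in Stop mode     */
#define LPUART_CR1_RE     (1u << 2)   /* receiver enable                         */
#define LPUART_CR3_UCESM  (1u << 23)  /* keep kernel clock enabled in Stop mode  */

/* Configure Rx so that the kernel clock (LSE) stays ungated in Stop mode. */
static void lpuart_rx_from_lse_init(lpuart_regs_t *u, uint32_t brr)
{
    u->CR1 &= ~LPUART_CR1_UE;                   /* configure while disabled */
    u->BRR  = brr;                              /* e.g. 874 for 9600 Baud from 32.768kHz */
    u->CR3 |= LPUART_CR3_UCESM;                 /* the bit this article is about */
    u->CR1 |= LPUART_CR1_UESM | LPUART_CR1_RE;
    u->CR1 |= LPUART_CR1_UE;                    /* enable last */
}
```

Setting UCESM before enabling the peripheral is a cautious choice made for this sketch, not something the text above states as a requirement.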

This somewhat increases consumption (the 'L476 DS gives 2.0μA/MHz, i.e. at 32.768kHz the increase is around 65nA), and that's the price for reliable Rx under these conditions [2].

Even though this requirement is explicitly documented in the LPUART chapter of the RM, it is easily overlooked and may result in an unpleasant surprise.

1. Tested on 32L476GDISCOVERY (which was inexplicably discontinued by ST), source code here.

2. Similarly to this combination, when the UART/LPUART kernel clock is HSI, the HSI startup time in low-power mode may cause the startbit to be missed. In 'L476, the worst-case HSI16 startup/stabilization time is 1.2μs+5μs; comparing that with the bit duration (approx. 104μs at 9600 Baud, approx. 8.7μs at 115200 Baud) indicates that higher baudrates would be received unreliably under these conditions, unless HSI is kept running all the time.
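The comparison in note 2 can be put into numbers with a trivial calculation. This is only an illustrative sketch; the helper names bit_time_us() and wakeup_fraction_of_bit() are made up here, and the 6.2μs figure is simply the sum of the two worst-case HSI16 values quoted above.

```c
#include <stdint.h>

/* Duration of one bit in microseconds. */
static double bit_time_us(uint32_t baud)
{
    return 1.0e6 / (double)baud;
}

/* Fraction of one bit period consumed by the clock startup time. */
static double wakeup_fraction_of_bit(double wakeup_us, uint32_t baud)
{
    return wakeup_us / bit_time_us(baud);
}
```

With wakeup_us = 6.2, the startup takes roughly 6% of a bit at 9600 Baud, but roughly 70% of a bit at 115200 Baud, which is well past the nominal mid-bit sampling point of the startbit.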