STM32 gotchas
141. On UART IDLE frame insertion upon UART_CR1.TE being set

The STM32 UART has a separate control bit, UART_CR1.TE, which enables the transmitter portion of the UART. The most visible effect of the TE bit is that it enables or disables the output on the UART_Tx pin assigned to this UART through the GPIO matrix. This can be seen in the following waveform, where 4 bytes (0x55, 0x56, 0x57, 0x58) were transmitted immediately after TE was set, and TE was then cleared again:

UART waveform around which TE was enabled/disabled.

The gradual "decay" is a consequence of leakage on the UART_Tx pin together with the parasitic capacitance on it. But there is one more notable feature in that waveform: after TE is enabled, the Tx pin goes high, which is the expected idle level, but even though the first byte 0x55 was written to the UART's data register USART_TDR immediately after TE was set, there is a one-byte-long delay before Tx starts to transmit this byte.

This is a deliberate and documented feature of the STM32 UART, and it can be used to insert an IDLE frame, for example on multi-drop buses such as RS485. IDLE frames allow end-of-frame detection and/or resynchronization of UART receivers when a listener is brought up in the middle of an ongoing transaction.
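To put a number on that delay: an IDLE frame is one full frame time of the line held high, so its duration follows from the baud rate and the frame length. A minimal sketch, assuming an 8N1 format (10 bit times per frame); the helper name and the baud rate used in the note below are illustrative, since the source does not state the rate used for the waveforms:

```c
#include <stdint.h>

/* Duration of one IDLE frame in microseconds.
 * bits_per_frame counts start + data (+ parity) + stop bits,
 * e.g. 10 for 8N1 (an assumption, not taken from the source). */
static double idle_frame_us(uint32_t baud, uint32_t bits_per_frame)
{
    return (1e6 * (double)bits_per_frame) / (double)baud;
}
```

For example, `idle_frame_us(115200, 10)` gives a little under 87 microseconds of idle line before the first start bit.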

The following waveform displays the same transaction as above, taken by a logic analyzer. The top waveform is from a GPIO output which was toggled each time UART_ISR.TXE was found set and USART_TDR was subsequently written; the second from the top is taken from the UART_DE pin, i.e. it is a hardware indication of transmission in progress:

Waveform taken by LA with TE enabled/disabled around a packet.

And this is what the same transaction looks like when TE is never cleared after having been set during UART initialization, long before this transaction. Note that at the start of transmission the top waveform shows two writes to USART_TDR in quick succession: since Tx starts immediately, the first byte written to the UART Tx buffer is transferred straight to the UART Tx shift register, so the TXE flag indicates that the Tx buffer is empty again and the second byte can be written right away:

Waveform taken by LA with TE enabled permanently.

It may appear that toggling TE just before the start of packet transmission is enough to insert an IDLE frame. However, for the newer version of the UART, used since the 'F3/'F0, the following warning appeared in the description of the TE bit in the RM:

In order to generate an idle character, the TE must not be immediately written to 1. In order to ensure the required duration, the software can poll the TEACK bit in the USART_ISR register.

This is quite easy; the whole sequence may look like this:

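    USART2->CR1 &= ~USART_CR1_TE;                   // switch the transmitter off
    while((USART2->ISR & USART_ISR_TEACK) != 0);    // wait until the UART has acknowledged TE = 0
    USART2->CR1 |= USART_CR1_TE;                    // 0 -> 1 transition on TE requests the IDLE frame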
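The bare poll above spins forever if TEACK never clears, for instance when the selected kernel clock is not running. A bounded variant might look like the sketch below. The register struct, macro names and function name are hypothetical stand-ins so that the code compiles off-target; the bit positions (TE = bit 3 of CR1, TEACK = bit 21 of ISR) are as I understand them for the 'F0/'F3/'L4 USART and should be checked against the device header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two USART registers used here;
 * on hardware, use USART_TypeDef from the CMSIS device header. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t ISR;
} usart_regs_t;

#define CR1_TE     (1u << 3)   /* transmitter enable (assumed position) */
#define ISR_TEACK  (1u << 21)  /* transmit enable acknowledge (assumed position) */

/* Clear TE, wait a bounded number of polls for TEACK to drop, set TE again.
 * Returns true if TEACK was seen cleared before TE was set again. */
static bool usart_request_idle(usart_regs_t *u, uint32_t max_polls)
{
    bool acked = false;

    u->CR1 &= ~CR1_TE;
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((u->ISR & ISR_TEACK) == 0u) {
            acked = true;
            break;
        }
    }
    u->CR1 |= CR1_TE;   /* re-enable the transmitter even on timeout */
    return acked;
}
```

A `false` return means the transmitter was re-enabled without confirmation, so the IDLE frame may be missing; how to react to that is left to the caller.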

But why is this poll important, why was this requirement not present in older STM32, and what delay does it actually involve?

One notable difference between the older and newer versions of the UART is that the older UART was clocked entirely from the related APB clock, whereas in newer UARTs only the register interface is clocked by the APB clock; the UART itself (the "kernel") is clocked from one of several possible clock sources (including APB) selected in RCC, usually by a field in the RCC_CCIPR register.

Writing the TE bit happens on the register side, i.e. in the APB-clock domain, while the actual insertion of the IDLE frame is part of the UART logic, i.e. it is clocked by the kernel clock, which may be asynchronous to APB and may also be slower than the APB clock. Thus, after the TE bit is written, it takes some time until the synchronizer between the two clock domains passes this bit into the UART logic on the kernel-clock side, and until the TEACK bit comes back through another synchronizer to the APB-clock domain as a confirmation that the TE bit has indeed been registered by the kernel-side logic.

Here is a simple program testing this behaviour, written for the STM32L4 Disco board. There are several options for clocking the UART, enabled by uncommenting one of the USART2_USE_DIVIDED_APB, USART2_USE_HSI16 and USART2_USE_LSE defines, or by leaving all of them commented out for the default, plain APB clock. The system clock is the default 4 MHz MSI, so the HSI16 option tests an asynchronous USART kernel clock which is faster than the APB clock, and the LSE option tests a USART kernel clock which is significantly slower than the APB clock (namely 32.768 kHz).
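For orientation, selecting the kernel clock comes down to writing a 2-bit field in RCC_CCIPR. The sketch below is not the test program itself; it only shows the field manipulation on a plain value. The field position (USART2SEL in bits 3:2) and the encoding are what I recall from the 'L4 reference manual and are assumptions to verify there; the enum and function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed USART2SEL encoding in RCC_CCIPR on STM32L4; verify against the RM. */
enum usart2_clk {
    USART2_CLK_PCLK   = 0u,  /* default: APB clock */
    USART2_CLK_SYSCLK = 1u,
    USART2_CLK_HSI16  = 2u,
    USART2_CLK_LSE    = 3u
};

#define USART2SEL_POS  2u                      /* assumed field position */
#define USART2SEL_MSK  (3u << USART2SEL_POS)

/* Return the CCIPR value with the USART2 kernel clock field replaced. */
static uint32_t ccipr_with_usart2_clk(uint32_t ccipr, enum usart2_clk sel)
{
    return (ccipr & ~USART2SEL_MSK) | ((uint32_t)sel << USART2SEL_POS);
}
```

On the target this would be used along the lines of `RCC->CCIPR = ccipr_with_usart2_clk(RCC->CCIPR, USART2_CLK_LSE);`, after the chosen source (here LSE) has been started.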

Only with USART2_USE_LSE did the plain TE toggle fail, leading to a waveform identical to the one without TE being touched. That is where the TEACK test outlined above restored the IDLE frame. Another simple experiment, in which a GPIO pin was toggled before and after the TEACK test, indicated a delay of between approximately 34 µs and 64 µs, which is one to two periods of the slow LSE (32.768 kHz) clock plus a couple of APB clocks spent in the synchronizers, and a few more system clocks to actually toggle the GPIO pin.
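The measured range can be checked against simple arithmetic. The sketch below models the TEACK latency as a number of kernel-clock periods plus a number of APB periods; the model, the function names and the choice of two APB periods for "a couple" are assumptions made for illustration, not a description of the actual synchronizer.

```c
#include <stdint.h>

/* Period of a clock in microseconds. */
static double clk_period_us(uint32_t hz)
{
    return 1e6 / (double)hz;
}

/* Rough TEACK latency estimate: kernel_periods of the kernel clock
 * plus apb_periods of the APB clock (a simplified model). */
static double teack_delay_us(uint32_t kernel_hz, uint32_t apb_hz,
                             uint32_t kernel_periods, uint32_t apb_periods)
{
    return (double)kernel_periods * clk_period_us(kernel_hz)
         + (double)apb_periods * clk_period_us(apb_hz);
}
```

With a 32768 Hz kernel clock and a 4 MHz APB clock, `teack_delay_us(32768, 4000000, 1, 2)` is about 31.0 µs and `teack_delay_us(32768, 4000000, 2, 2)` about 61.5 µs. The few microseconds separating these figures from the measured values are consistent with the extra system clocks spent toggling the GPIO pin at 4 MHz.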

However, TE toggling is used here only to illustrate the potential problem, its nature and its source. In real-world programs, e.g. with RS485, the most appropriate approach is to switch TE off after the transmission ends (see the commented-out line in USART2_IRQHandler() in the example), and to switch it back on just before the next transmission is to commence. This leaves ample time while TE is off, during reception. The fact that the UART_Tx pin goes tri-state after TE is switched off is not a problem in this case, as the RS485 transceiver is not set to drive the bus during that period either.
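The recommended split might be sketched as two small hooks, one called when transmission has completed and one called just before the next packet. This is a hedged outline, not the code from the example program: the register struct, macro names and function names are hypothetical, and the bit positions (TE = bit 3 of CR1, TC = bit 6 of ISR) are as I understand them and should be checked against the device header.

```c
#include <stdint.h>

/* Hypothetical stand-in for the USART registers used here;
 * on hardware, use USART_TypeDef from the CMSIS device header. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t ISR;
    volatile uint32_t TDR;
} rs485_usart_t;

#define RS485_CR1_TE  (1u << 3)  /* transmitter enable (assumed position) */
#define RS485_ISR_TC  (1u << 6)  /* transmission complete (assumed position) */

/* Call from the TC handler: once the last frame has left the shift
 * register, switch the transmitter off for the reception period. */
static void rs485_tx_finished(rs485_usart_t *u)
{
    if ((u->ISR & RS485_ISR_TC) != 0u) {
        u->CR1 &= ~RS485_CR1_TE;
    }
}

/* Call just before the next packet: TE has been off for a long time,
 * so setting it now requests the IDLE frame ahead of the first byte. */
static void rs485_tx_begin(rs485_usart_t *u, uint8_t first_byte)
{
    u->CR1 |= RS485_CR1_TE;
    u->TDR = first_byte;
}
```

Because TE stays cleared for the whole reception period, no TEACK poll sits on the transmit path in this arrangement, provided the gap between packets is longer than the synchronizer delay.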