EFTON - STM32 gotchas

185. What's all this kernel clock stuff, anyhow?

It is by now well known that individual peripherals in 32-bit mcus such as the STM32 have individually controlled clocks. Most STM32 peripherals, especially in older STM32 families, have only a single clock, which is the clock of the APB bus¹ to which the given peripheral is connected.

However, some peripherals have two clock domains. One domain contains mostly the registers through which the peripheral is controlled from the rest of the mcu (mostly the processor, but also DMAs), and this domain is clocked from the APB clock. The second domain is the "execution" portion of the peripheral, which performs its core functionality (e.g. in SPI, the shift register and its associated clock and control circuitry). In the documentation of newer STM32 peripherals this portion is called the kernel, and thus its clock is called the kernel clock.

There are two options for the kernel clock:

it can be synchronous with the APB clock, either being derived from it, or being derived from a common source clock. An example of the former is the ADC clock in some of its settings in most STM32. The most notable example of the latter is the timer clock, which is derived from the same AHB clock as the APB clock, but in some cases is divided by a smaller factor and thus runs faster than the APB clock.
or it can be asynchronous to the APB clock, coming either from an internal source (e.g. LSI in case of IWDG/RTC/LPTIM, LSE in case of RTC/LPUART, a PLL other than the "main" one e.g. for I2S/SAI/USB), or from an external source through a dedicated pin (e.g. I2S_EXTCLK/SAI_EXTCLK for I2S/SAI, externally clocked LPTIM, TX/RX clocks for ETH, ULPI clock for OTG_HS used with an external PHY).
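The timer case above can be illustrated with a small calculation. This is a minimal sketch, assuming the default rule found in many STM32 families (APB prescaler of 1 gives a timer clock equal to the APB clock, any larger prescaler gives twice the APB clock). Some families add further options, so the RCC chapter of the particular reference manual remains the authority. The function names are hypothetical helpers, not part of any ST library.

```c
#include <stdint.h>

/* Hypothetical helper: APB clock from the AHB clock (HCLK) and the
 * APB prescaler (1, 2, 4, 8 or 16). */
uint32_t apb_clock_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    return hclk_hz / apb_div;
}

/* Hypothetical helper: timer kernel clock under the ASSUMED default rule:
 *   APB prescaler == 1  ->  TIMCLK = PCLK
 *   APB prescaler  > 1  ->  TIMCLK = 2 * PCLK
 * Both clocks come from HCLK, so they stay synchronous. */
uint32_t timer_kernel_clock_hz(uint32_t hclk_hz, uint32_t apb_div)
{
    uint32_t pclk_hz = apb_clock_hz(hclk_hz, apb_div);
    return (apb_div == 1u) ? pclk_hz : 2u * pclk_hz;
}
```

With HCLK at 168 MHz and an APB prescaler of 4, this gives an APB clock of 42 MHz but a timer kernel clock of 84 MHz.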

While separate kernel clocks allow greater flexibility in choosing optimal operating frequencies for various portions of the application (e.g. the mcu's system clock need not be tied to the I2S/digital audio clock), they also impose requirements on the peripheral's designers. Transferring signals and data between the two domains may require synchronization in order to present a coherent set of signals to the other side. Configuration registers are relatively "easy": usually there is one "enable" bit, and all configuration registers are required to be set before the "enable" bit is set, which gives these signals time to propagate to the kernel side before the "enable" bit starts the kernel side's activity² ³. Reading out single bits (status flags) is also relatively simple: as they are mostly mutually independent, they can simply be transferred from the kernel to the APB side. However, reading out numerical values may get tricky and requires precise synchronization with the APB clock, otherwise the value read out may be incoherent. Even trickier is the readout of multi-word values, where either a lock/release mechanism has to be employed (such as in RTC_SSR/TR/DR with BYPSHAD=0), or the user may be required to perform a read-twice-until-identical-values procedure (e.g. when reading out the LPTIM_CNT register).
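The read-twice-until-identical-values procedure mentioned above might look like the following. This is only a sketch: the function name is made up for illustration, and it takes a plain pointer to the register so that it does not depend on any particular device header.

```c
#include <stdint.h>

/* Read a register that is updated from an asynchronous clock domain.
 * The read is repeated until two consecutive reads return the same
 * value; that value is then taken as coherent. */
uint32_t read_until_stable(const volatile uint32_t *reg)
{
    uint32_t previous = *reg;

    for (;;) {
        uint32_t current = *reg;
        if (current == previous) {
            return current;
        }
        previous = current;   /* value changed between reads, try again */
    }
}
```

Assuming a CMSIS-style device header, a call could be `uint32_t cnt = read_until_stable(&LPTIM1->CNT);`. The loop has no upper bound on iterations, so where the kernel clock is not much slower than the rate of successive reads, a retry limit may be worth adding.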

From the programmer's point of view, synchronous kernel clocks are usually relatively "harmless", i.e. they usually impose few or no requirements on the programmer and have small to negligible impact on the resulting application. For example, triggering an ADC clocked from a synchronous APB/n clock from a timer results in precise synchronization between timer and ADC, whereas with an asynchronously clocked ADC there would be an uncertainty (jitter) in the trigger, caused by resynchronization.

Asynchronous kernel clocks are much trickier:

they may impose requirements on the minimum APB clock (as is the case with ETH and USB; the RTC also requires the APB clock to be at least 7x the RTC clock)
delays introduced by resynchronization have an impact on the propagation time of some internal signals, with potentially surprising effects, e.g. in the interrupt-flag-clear-to-NVIC case
there may be timing requirements on individual setup steps (e.g. in the newer I2C, switching the PE enable bit off and on requires a delay of 3 APB clocks)
reading or consecutively writing registers in the kernel clock domain may impose automatic delays (e.g. in LTDC, accessing LCD-domain registers will stall the APB bus for 6-7 APB clocks plus 5 LCD clocks)
readout may require the read-twice-until-identical-values procedure
as data transfer between clock domains is more complex, errors can easily be made in the hardware's implementation, such as the RTC incomplete-lock erratum or the ETH erratum "Successive write operations to the same register might not be fully taken into account"
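Two of the numerical constraints in the list above can be turned into simple checks. The sketch below uses only the figures quoted in the list (the 7x rule for the RTC, and 7 APB clocks plus 5 LCD clocks as the upper end of the LTDC stall); the function names are hypothetical, and the exact figures for a given device should be taken from its reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rule quoted in the text: the APB clock must be at least
 * 7 times the RTC clock. */
bool apb_fast_enough_for_rtc(uint32_t pclk_hz, uint32_t rtcclk_hz)
{
    return (uint64_t)pclk_hz >= 7u * (uint64_t)rtcclk_hz;
}

/* Upper estimate of the APB stall when accessing an LCD-domain register
 * of the LTDC, using the figures quoted in the text: 7 APB clocks plus
 * 5 LCD clocks. Result in nanoseconds, each term rounded up. */
uint32_t ltdc_stall_ns_max(uint32_t pclk_hz, uint32_t lcdclk_hz)
{
    const uint64_t ns_per_s = 1000000000u;
    uint64_t apb_ns = (7u * ns_per_s + pclk_hz - 1u) / pclk_hz;
    uint64_t lcd_ns = (5u * ns_per_s + lcdclk_hz - 1u) / lcdclk_hz;

    return (uint32_t)(apb_ns + lcd_ns);
}
```

For example, with a 50 MHz APB clock and a 10 MHz LCD clock the estimate is 140 ns + 500 ns = 640 ns per access, which adds up quickly if many such registers are written in a row.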

As the number of transistors in STM32 increases, so does the complexity of the peripherals. In newer STM32, more and more peripherals have separate APB and kernel clocks, so more attention has to be paid to the effects of this separation, especially if an asynchronous clock is selected for the kernel.

1. ... or AHB bus (e.g. in case of ETH or OTG_HS). For brevity, we will use "APB" throughout this chapter, even if in some cases it's AHB.

2. Usually, writing to the configuration registers of such a peripheral while the "enable" bit is set is not prevented by hardware, and results in undefined behaviour of the peripheral. It is the user's responsibility to read the documentation carefully and act accordingly.

3. Nomenclature across peripherals is not unified, and sometimes the "enable" bit does not start the kernel's functionality but gates the kernel clock altogether, for example in LPTIM. In this case, writing to registers in the kernel clock domain is not possible until this "enable" bit is set.