STM32 gotchas
10. Strange behaviour after increasing system clock frequency

Programs on an STM32 usually run from the built-in FLASH memory. Like all FLASH memories, this one is relatively slow, allowing data to be read out at around 25MHz. After reset, an STM32 runs from an internal RC oscillator (HSI or MSI) at a frequency lower than that, so it can safely execute directly from the FLASH. But users usually want to use the highest possible system clock, ranging from 48MHz in 'F0 through 150-200MHz in 'F4/'F7 all the way to 480MHz in 'H7. How can the slow FLASH possibly feed instructions at this rate?

There are several techniques employed in the various STM32 families to allow this - prefetch, jumpcache a.k.a. ART accelerator, conventional cache - all of which aim to mitigate the "slowness" of the FLASH. We are not going to discuss their respective merits and drawbacks here, because they don't change the fact that the FLASH is physically slower than the system clock.

The FLASH controller thus has to be "told" how much faster the system clock is than the FLASH. This ratio is commonly called FLASH latency.

There is basically a divider from the system clock to the FLASH clock, and the user has to set it before increasing the system clock frequency; otherwise the FLASH hardware won't be able to deliver the subsequent instructions after the frequency switch. As a consequence of this divider, the FLASH subsystem imposes waitstates on the processor whenever it attempts to fetch an instruction or read data that is not already present in the prefetch registers or caches.
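
To make the ratio concrete, here is a minimal sketch in C of how the number of waitstates could be estimated from the system clock and the maximum FLASH read frequency. The function name flash_wait_states and both of its parameters are hypothetical, chosen for illustration only; the real limits depend on the STM32 family and the supply voltage, and are tabulated in the Reference Manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of any ST library.
 * Returns the number of waitstates needed so that the FLASH is never
 * asked to deliver data faster than flash_max_hz.
 * flash_max_hz is an assumption supplied by the caller; take the real
 * value from the table in your Reference Manual. */
uint32_t flash_wait_states(uint32_t sysclk_hz, uint32_t flash_max_hz)
{
    assert(sysclk_hz > 0u);
    assert(flash_max_hz > 0u);

    /* ceil(sysclk_hz / flash_max_hz) system clock cycles are needed per
     * FLASH access; one of them is always there, the rest are waitstates.
     * (sysclk_hz - 1) / flash_max_hz equals that ceiling minus one. */
    return (sysclk_hz - 1u) / flash_max_hz;
}
```

With the rough 25MHz figure mentioned above, flash_wait_states(16000000u, 25000000u) gives 0 and flash_wait_states(48000000u, 25000000u) gives 1. Treat the result as an estimate only and use the value the Reference Manual prescribes for your clock and voltage range.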

The latency divider ratio is usually set in the LATENCY field of the FLASH_ACR register. There may be a prescribed procedure to follow for both increasing and decreasing the system clock; read the FLASH chapter of your STM32's Reference Manual.
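
The ordering that such procedures typically prescribe can be sketched as follows. This is a host-side simulation, not device code: fake_flash_acr stands in for the memory-mapped FLASH_ACR register, switch_sysclk is a hypothetical placeholder for the actual clock reconfiguration, and the LATENCY field is assumed to occupy bits [2:0], which is not true for every family. On real hardware you would access the register through the device header (e.g. FLASH->ACR in CMSIS) and follow the exact steps in the Reference Manual.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout: LATENCY in bits [2:0]. Check your Reference Manual. */
#define ACR_LATENCY_MASK 0x7u

/* Stand-in for the FLASH_ACR register so the sketch runs on a PC. */
volatile uint32_t fake_flash_acr = 0u;

/* Simulated clock state; 16MHz is an assumed reset value. */
uint32_t current_sysclk_hz = 16000000u;

/* Latency that was in effect at the moment of the last clock switch. */
uint32_t latency_at_switch = 0u;

/* Hypothetical hook: on real hardware this would reprogram PLL and muxes. */
void switch_sysclk(uint32_t hz)
{
    latency_at_switch = fake_flash_acr & ACR_LATENCY_MASK;
    current_sysclk_hz = hz;
}

void set_flash_latency(uint32_t ws)
{
    assert(ws <= ACR_LATENCY_MASK);
    fake_flash_acr = (fake_flash_acr & ~ACR_LATENCY_MASK) | ws;
    /* Read back until the new value is visible before going on. */
    while ((fake_flash_acr & ACR_LATENCY_MASK) != ws) {
    }
}

void change_sysclk(uint32_t new_hz, uint32_t new_ws)
{
    if (new_hz > current_sysclk_hz) {
        set_flash_latency(new_ws); /* going faster: raise latency first */
        switch_sysclk(new_hz);
    } else {
        switch_sysclk(new_hz);     /* going slower: lower the clock first */
        set_flash_latency(new_ws);
    }
}
```

The point of the sketch is only the order of operations: at no moment does the FLASH run with fewer waitstates than the currently active system clock requires. For example, change_sysclk(48000000u, 1u) writes the latency before the switch, while a later change_sysclk(16000000u, 0u) keeps the higher latency until the clock has already been lowered.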