STM32 gotchas
7. Interrupt called without reason

Sometimes an interrupt service routine (ISR) is called without any apparent reason.

This happens immediately after a "legitimate" interrupt if the flag in the peripheral that caused the interrupt is cleared late in the ISR.

The root cause is that it takes time for a write from the processor to actually reach the peripheral (from the processor through its write buffer, the busmatrix, and the AHB/APB bridge with its own write buffer, where the write is synchronized to the APB clock), and then more time for the change in the peripheral to propagate all the way to the NVIC (again through synchronizers, from the APB clock domain to the system/processor clock domain). This is exacerbated by the fact that peripherals may sit on slow APB buses, with clocks divided down significantly from the system clock.

Consider, for example, a timer with its Update interrupt enabled. Upon timer rollover, TIMx_SR.UIF is set by hardware. This propagates through the resynchronizers into the NVIC, where it triggers the interrupt process. The processor performs the ISR entry stacking and then executes the instructions within the ISR. Suppose that just before the end of the ISR there is an instruction which clears TIMx_SR.UIF by writing a word which has 0 at this flag's position (see footnote 1). This word, on its way from the processor, has to acquire the bus from the busmatrix's arbiter; then it hits the AHB/APB bridge, where it has to wait for synchronization to the APB clock before it gets to the timer. Then the timer's now-deasserted interrupt output signal has to pass through synchronizers into the NVIC. This may take a couple of system clocks, but the processor does not wait: it starts to exit the ISR, and because the tail-chaining feature makes it check for new interrupts quite early, it still "sees" the timer interrupt as active, so the ISR is entered again.
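The timing argument above can be put into a toy model that runs on a PC. This is only a sketch: the two latencies are made-up parameters, not figures from any datasheet, and the names PathModel and isr_reenters are hypothetical. It merely shows that the ISR is re-entered whenever the core samples the NVIC before the clearing write has travelled to the peripheral and the deasserted request has travelled back.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the path between core and peripheral.
   Both latencies are hypothetical values in system clock cycles. */
typedef struct {
    unsigned write_latency; /* write buffer + busmatrix + AHB/APB bridge */
    unsigned sync_latency;  /* peripheral request line -> NVIC synchronizers */
} PathModel;

/* The flag-clearing write is issued at cycle 0; the core samples the NVIC
   for pending interrupts at cycle `cycles_from_clear_to_exit`.
   Returns true if the request is still seen as pending at that moment,
   i.e. the ISR would be entered again without a new event. */
bool isr_reenters(const PathModel *p, unsigned cycles_from_clear_to_exit)
{
    bool flag_in_peripheral = true; /* TIMx_SR.UIF as seen inside the timer */
    bool request_at_nvic = true;    /* what the NVIC currently sees */
    unsigned flag_cleared_at = 0;

    for (unsigned cycle = 0; cycle <= cycles_from_clear_to_exit; cycle++) {
        /* The write arrives at the peripheral after write_latency cycles. */
        if (flag_in_peripheral && cycle >= p->write_latency) {
            flag_in_peripheral = false;
            flag_cleared_at = cycle;
        }
        /* The deasserted request needs sync_latency more cycles to reach the NVIC. */
        if (!flag_in_peripheral && cycle >= flag_cleared_at + p->sync_latency) {
            request_at_nvic = false;
        }
    }
    return request_at_nvic;
}
```

With an assumed write latency of 4 cycles and synchronizer latency of 2 cycles, the model reports a re-entry whenever fewer than 6 cycles separate the clearing write from the point where the core checks for pending interrupts. On real hardware these numbers are unknown and vary with bus load.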

The reason why this process is so complicated is that these chips are not microcontrollers built tightly around a processing core and sharing its clock. Rather, they are systems-on-chip (SoC), equivalent to a whole computer from the past, stitched together more or less loosely from a processor core, peripherals, and an interconnection mesh (data and signalling).

Unfortunately, this problem is difficult to remove entirely.

Treating the symptom is relatively easy: each interrupt source has to be qualified, i.e. the ISR always has to check whether the flag causing the interrupt is indeed set in the peripheral, and return if it is not. This is what the vast majority of users should do.
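A minimal sketch of such a qualified ISR follows. To keep it compilable on a PC, it uses a hypothetical stand-in (MockTimer, mock_sr_write and the MOCK_ names are not part of any ST header) instead of the real register block; on a target, the equivalent statements would use TIMx->SR and TIM_SR_UIF from the device header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the timer's status register. */
typedef struct {
    volatile uint32_t SR;
} MockTimer;

#define MOCK_SR_UIF (1u << 0) /* update interrupt flag, bit 0 of TIMx_SR */

MockTimer mock_tim;
unsigned update_events_handled; /* counts legitimate update events */
unsigned spurious_entries;      /* counts entries with no flag set */

/* Models a write to an rc_w0 register: writing 0 clears a flag, writing 1
   leaves it unchanged. On target this is simply TIMx->SR = value; */
static void mock_sr_write(uint32_t value)
{
    mock_tim.SR &= value;
}

void mock_tim_irq_handler(void)
{
    /* Qualify the source: leave at once if the flag is not actually set. */
    if ((mock_tim.SR & MOCK_SR_UIF) == 0u) {
        spurious_entries++;
        return;
    }

    /* Clear only UIF by a direct write: 0 at its position, 1 everywhere else. */
    mock_sr_write(~MOCK_SR_UIF);

    /* The real work of the ISR goes here. */
    update_events_handled++;
}
```

If the handler is entered a second time without a new update event, the check at the top makes it return without doing the work twice. The spurious_entries counter is there only for illustration and would normally be omitted.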

However, if one would like to avoid the unnecessary calls to the ISR entirely by clearing the interrupt source early (and possibly to avoid the test, if the interrupt has only a single source), there is not much help: exact documentation is non-existent, and the best one can do is test/benchmark and hope not to have made a mistake. Note that adding elements to the program which potentially cause bus conflicts (e.g. DMA) can increase the latencies between processor and peripheral, thus reintroducing a problem which had apparently been removed in a simpler version of the program. In most cases this approach is best avoided, even at the cost of a less efficient ISR, in favour of the "always verify the interrupt source" method above. A lengthy ISR may almost safely guarantee that the verification is not needed; but then a bit of inefficiency usually won't hurt a lengthy ISR...

Using barrier instructions (DSB or ISB) in the processor won't yield the required safety: the root cause lies outside the processor and depends on the particular implementation of the busmatrix and peripherals.

This article in ARM/Keil's support describes the same problem. Note that it does not acknowledge the existence of any other source of delay, and the "second, system level buffer" mentioned there is assumed to sit immediately at the processor's port, at the entry to the busmatrix. This is rarely the case, which is why the solution described there (performing any write) is not likely to help in most cases.

1. You don't want to do this by RMW, i.e. using &= or |=, and you don't want to use bit-banding to do this on Cortex-M3/M4-based MCUs either. Here's why.
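The usual argument against RMW on such flags can be reproduced with a small PC-side model. This is a sketch under stated assumptions: sr_model, sr_write and hw_set are hypothetical helpers that imitate a status register whose flags are cleared by writing 0 and left alone by writing 1 (as the TIMx_SR flags are), and the "hardware event" is injected by hand between the read and the write.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_UIF   (1u << 0) /* update flag */
#define FLAG_CC1IF (1u << 1) /* capture/compare 1 flag */

static uint32_t sr_model; /* hypothetical model of the status register */

/* Software write: 0 clears a flag, 1 leaves it unchanged. */
static void sr_write(uint32_t value) { sr_model &= value; }

/* Hardware event: sets one or more flags. */
static void hw_set(uint32_t flags) { sr_model |= flags; }

/* Clears UIF the RMW way (what SR &= ~UIF compiles to), with another
   flag being set by hardware between the read and the write. */
uint32_t clear_uif_rmw_with_race(void)
{
    sr_model = FLAG_UIF;
    uint32_t tmp = sr_model; /* read: only UIF is set */
    hw_set(FLAG_CC1IF);      /* hardware sets CC1IF in the meantime */
    tmp &= ~FLAG_UIF;        /* modify: tmp has 0 at CC1IF's position too */
    sr_write(tmp);           /* write: clears UIF and, unintentionally, CC1IF */
    return sr_model;
}

/* Clears UIF by a single direct write under the same conditions. */
uint32_t clear_uif_direct_with_race(void)
{
    sr_model = FLAG_UIF;
    hw_set(FLAG_CC1IF); /* the same hardware event */
    sr_write(~FLAG_UIF); /* 0 only at UIF, so CC1IF survives */
    return sr_model;
}
```

In this model the RMW variant returns 0, meaning the CC1IF event has been lost, while the direct write returns with CC1IF still set.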