STM32 gotchas
49. F3 CCM RAM can't be used for DMA, but can be used for code

The 'F3 came out later than the 'F2/'F4 (at roughly the same time as the 'F0, with which it shares many peripheral details). It features the same Cortex-M4F core as the 'F4, but with a lower maximum clock of 72 MHz, smaller memory sizes, and without the "power" peripherals such as OTG, ETH and FSMC. With its relatively rich analog-oriented peripheral set, it is mostly intended for motor control and similar semi-analog real-time control applications.

Probably as a measure to cut die size (and thus unit price), and maybe also to give the 'F4 a distinguishing feature justifying its price, the 'F3 has no FLASH access accelerator (called ART by ST, basically a jump cache). This means that the processor would be hampered by the relatively high 2 wait states required when running at the highest speed (see note 1).
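To put a number on the wait-state penalty, here is a minimal sketch of how the required FLASH latency depends on the core clock. The function name is made up for illustration, and the 24/48/72 MHz thresholds are assumed from the LATENCY field description in the 'F3 reference manual; check the manual for your particular part.

```c
#include <stdint.h>

/* Hypothetical helper: number of FLASH wait states an 'F3 needs for a
 * given HCLK in Hz. Thresholds are an assumption taken from the
 * FLASH_ACR.LATENCY description; verify them for your exact device. */
int f3_flash_wait_states(uint32_t hclk_hz)
{
    if (hclk_hz <= 24000000u) {
        return 0;   /* up to 24 MHz: FLASH keeps up with the core */
    }
    if (hclk_hz <= 48000000u) {
        return 1;   /* above 24 MHz, up to 48 MHz */
    }
    if (hclk_hz <= 72000000u) {
        return 2;   /* above 48 MHz, up to the 72 MHz maximum */
    }
    return -1;      /* beyond the maximum clock of the 'F3 */
}
```

At the full 72 MHz this yields 2, which is the case discussed above.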

Comparison of 'F4 and 'F3 busmatrix, highlighting the path to CCMRAM

As users might want to speed up execution of at least some critical parts of the code, there's an option to run code from RAM, with 0 wait states. Besides the SRAM available to all masters on the bus matrix, the 'F3, similarly to the 'F4, also provides a piece of RAM to which the processor has exclusive access, called CCM RAM (Core Coupled Memory). As in the 'F4, this memory cannot be used for DMA, nor can it be the target of bit-banding. But while in the 'F4 CCM RAM cannot be used to run code, in the 'F3 the designers wisely added access to CCM RAM from both the I and D ports of the processor, which means that CCM RAM in the 'F3 can also be used for running code.
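Both consequences can be sketched in C. This is only an illustration under several assumptions: the CCM RAM base address of 0x10000000 and the 8 KB size (as on an 'F303xC) should be checked against the datasheet of your part; the section name `.ccmram` is hypothetical and must match an output section in your linker script; and the startup code has to copy that section from FLASH into CCM RAM before the function is first called. The names `CCM_CODE`, `scale_q15`, `overlaps_ccmram` and `dma_can_reach` are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: CCM RAM base on the 'F3, and the 8 KB of an 'F303xC. */
#define CCMRAM_BASE 0x10000000u
#define CCMRAM_SIZE (8u * 1024u)

/* Hypothetical section name; it must exist in the linker script and be
 * copied from FLASH to CCM RAM by the startup code. On a non-ARM host
 * the macro expands to nothing so the sketch still compiles. */
#if defined(__GNUC__) && defined(__arm__)
#define CCM_CODE __attribute__((section(".ccmram"), noinline))
#else
#define CCM_CODE
#endif

/* A time-critical routine we would like to run with 0 wait states:
 * multiply a sample by a Q15 gain. */
CCM_CODE int32_t scale_q15(int32_t sample, int32_t gain_q15)
{
    return (int32_t)(((int64_t)sample * gain_q15) / 32768);
}

/* True if any byte of [addr, addr + len) lies inside CCM RAM. */
bool overlaps_ccmram(uintptr_t addr, size_t len)
{
    const uint64_t start = addr;
    const uint64_t end = start + len;
    const uint64_t ccm_start = CCMRAM_BASE;
    const uint64_t ccm_end = ccm_start + CCMRAM_SIZE;

    return len != 0 && start < ccm_end && end > ccm_start;
}

/* DMA is not a master on the CCM RAM port, so a buffer touching
 * CCM RAM must not be handed to a DMA channel. */
bool dma_can_reach(const void *buf, size_t len)
{
    return !overlaps_ccmram((uintptr_t)buf, len);
}
```

A check like `dma_can_reach(buf, len)` can go into an assertion before a DMA transfer is set up, which catches the case where the linker has silently placed a buffer into CCM RAM.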

1. The FLASH line is 64 bits wide and there is a 2-line prefetch; the I-port of the processor is 32 bits wide, while the Cortex-M4 runs exclusively in Thumb mode, which means a 16-bit instruction word (with some instructions being 2 words long). The processor itself has a prefetch queue several words deep. Also, normal programs contain enough loads and stores to relatively slow peripherals to let all the prefetch queues fill up, so linear code mostly runs just as fast as if there were no FLASH wait states. The problem arises when the code branches (jumps), whether due to normal program flow or due to interrupts, and also when the program reads literals (constants) from FLASH. There are techniques to somewhat reduce the occurrence of both, but discussing them is outside the scope of this article.