STM32 gotchas
22. Writing one byte to SPI transmits two bytes

While SPI is a relatively simple communication interface, it also offers the highest transfer speed among the "traditional" communication interfaces (SPI, I2C, UART). It is not uncommon to see users pushing SPI clock rates to tens of MHz, at which point keeping the SPI continuously fed with data takes a significant portion of the overall processor power.

This is more pronounced if the SPI is, for some reason, set to 8-bit frames (see note 1). The rest of this article assumes the SPI is set this way.

ST's original plan was to use the DMA to help with this, but as the price per transistor dropped with improving silicon technology, the SPI gained a built-in 4-byte FIFO for both Tx and Rx, starting with the 'F0. This opened up another opportunity for optimization: as it is not efficient to transfer individual bytes through a 32-bit bus, an SPI set to 8-bit frames can now accept/supply 2 bytes at once when SPI_DR is accessed as a 16-bit register. (Even though the processor is 32-bit, most of the STM32 peripherals are in fact 16-bit; this may have historical reasons, but it may also help reduce complexity and silicon area.)

ST calls this feature data packing, and it is well documented in the SPI chapter of the respective Reference Manuals.

For users accustomed to the older SPI versions, however, this feature resulted in surprising behaviour: one write transmitted two bytes (the second byte being zero). The reason is that the device headers define the data register as 16-bit, so a simple

SPIx->DR = data;

line is compiled into an STRH instruction, i.e. a 16-bit write. Writing a single byte in C has to be accomplished by type punning, i.e.

*(volatile uint8_t *)&SPIx->DR = data;
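
The two access widths can be wrapped in small helpers so that the intent is explicit at the call site. The following is only a minimal sketch: spi_regs_t is a hypothetical stand-in for the device header's SPI_TypeDef, reduced to the data register so that it also compiles off-target, and the function names are made up for illustration. It assumes the SPI is set to 8-bit frames, as in the rest of this article.

```c
#include <stdint.h>

/* Hypothetical stand-in for the vendor header's SPI_TypeDef, reduced to the
   data register. On real hardware, use the vendor definition and the SPIx
   instances instead. */
typedef struct {
    volatile uint16_t DR;
} spi_regs_t;

/* Single byte access (STRB on Cortex-M): queues exactly one 8-bit frame. */
static inline void spi_write_byte(spi_regs_t *spi, uint8_t data)
{
    *(volatile uint8_t *)&spi->DR = data;
}

/* Single halfword access (STRH): with data packing this queues two 8-bit
   frames, the low byte being sent first. */
static inline void spi_write_two_bytes(spi_regs_t *spi,
                                       uint8_t first, uint8_t second)
{
    spi->DR = (uint16_t)(first | ((uint16_t)second << 8));
}
```

With helpers like these, a plain halfword write is only used where two frames are really meant to be sent, and the accidental zero byte cannot sneak in through an implicit conversion of an 8-bit value.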

The consequences of data packing on data reading (i.e. Rx) are less conspicuous, but can still lead to confusing errors. Using SPI with DMA is less of a problem, as users unaware of the data packing feature naturally implement the "correct" data access by setting DMA_CR/CCR.PSIZE to byte.
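
On the Rx side, the same byte-wide access applies when polling. In addition, the Reference Manuals for the FIFO-equipped SPI describe an Rx threshold bit, FRXTH (bit 12 of SPI_CR2), which selects whether RXNE is raised after 8 or after 16 received bits. The sketch below is an illustration only: spi_rx_regs_t is a hypothetical reduced register block that does not reproduce the real register layout, and the macro and function names are made up; real code would use the vendor header, where the bit is named SPI_CR2_FRXTH.

```c
#include <stdint.h>

/* Hypothetical reduced register block for off-target compilation; it does
   NOT match the real register offsets. Use the vendor header on hardware. */
typedef struct {
    volatile uint16_t CR2;
    volatile uint16_t DR;
} spi_rx_regs_t;

/* FRXTH, bit 12 of SPI_CR2: when set, RXNE is raised once 8 bits are in
   the Rx FIFO; when clear, once 16 bits are. Illustrative macro name. */
#define SPI_RX_THRESHOLD_8BIT (1u << 12)

/* Make RXNE signal every received byte rather than every second one. */
static inline void spi_rx_threshold_byte(spi_rx_regs_t *spi)
{
    spi->CR2 = (uint16_t)(spi->CR2 | SPI_RX_THRESHOLD_8BIT);
}

/* Byte access (LDRB): pops exactly one 8-bit frame from the Rx FIFO,
   whereas a halfword read would pop two packed frames at once. */
static inline uint8_t spi_read_byte(spi_rx_regs_t *spi)
{
    return *(volatile uint8_t *)&spi->DR;
}
```

If the threshold is left at 16 bits while the code reads byte by byte, RXNE may not be set after a single received byte, which is one way the Rx side can produce the confusing errors mentioned above.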

The lesson here is that, even though there is a great deal of compatibility among the STM32 peripherals across the families, and newer versions tend to be reasonably backwards compatible, with new features switched off by default, this is no hard rule. It pays off to read and re-read the manual and follow the instructions set there.

1. Actually, the frame data width in the newer SPI with FIFO can be set to any number of bits between 4 and 16. Data packing applies to any width between 4 and 8 bits.