This post on the STM32 forum contained a link to "STM32F429 DMA2D bilinear bitmap resize" by Alessandro Rocchegiani, which combines several nice tricks using the DMA2D unit of the STM32:
- stretching/contracting the image vertically, thanks to the fact that the source and destination "pointers" of DMA2D can skip (in ST's parlance, offset) a different number of pixels at the end of each line (DMA2D_FGOR.LO/DMA2D_BGOR.LO for the source(s); DMA2D_OOR.LO for the destination)
- stretching/contracting image horizontally, working "manually" column-by-column, in destination columns (i.e. DMA2D moves 1xN rectangles, repeated M times for MxN destination size). As this process is less efficient than the vertical stretch/contraction, both because it is performed column-by-column, and it is to be
- linear interpolation of pixel colors for each stretch/contraction step, utilizing the blending feature of DMA2D (with appropriately set alphas for background/foreground, each sourcing from the appropriate row/column), which results in the bilinear resize algorithm once both stretch/contraction steps have been performed
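The interpolation step can be modelled on the host to see which two source lines and which alpha feed each destination line. The following is only a sketch under assumptions: `lerp_step`, `lerp_step_for` and `blend_channel` are hypothetical names, the mapping pins the first and last destination line to the first and last source line, and the blend is approximated as a plain 8-bit alpha mix. It is not taken from the original code, and the exact rounding of the hardware blender may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: which two source lines (rows or columns) feed
   one destination line, and the foreground alpha used to blend them. */
typedef struct {
    uint32_t bg_index;  /* nearer source line, used as background        */
    uint32_t fg_index;  /* following source line, used as foreground     */
    uint8_t  fg_alpha;  /* 0 = only background, 255 = only foreground    */
} lerp_step;

/* Compute the blend step for destination line d when src_len source
   lines are resized to dst_len destination lines. */
lerp_step lerp_step_for(uint32_t d, uint32_t src_len, uint32_t dst_len)
{
    assert(src_len >= 1 && dst_len >= 1 && d < dst_len);
    lerp_step s = { 0, 0, 0 };
    if (src_len == 1 || dst_len == 1)
        return s;                       /* nothing to interpolate */

    /* Source position in 24.8 fixed point: line 0 maps to line 0,
       line dst_len-1 maps to line src_len-1. */
    uint32_t pos = (uint32_t)(((uint64_t)d * (src_len - 1) * 256u)
                              / (dst_len - 1));
    s.bg_index = pos >> 8;
    s.fg_alpha = (uint8_t)(pos & 0xFFu);   /* fraction, used as alpha */
    s.fg_index = (s.bg_index + 1 < src_len) ? s.bg_index + 1 : s.bg_index;
    return s;
}

/* Software model of blending one color channel (assumed formula). */
uint8_t blend_channel(uint8_t fg, uint8_t bg, uint8_t fg_alpha)
{
    return (uint8_t)((fg * fg_alpha + bg * (255 - fg_alpha) + 127) / 255);
}
```

On the real hardware, each step would presumably become one blending transfer, with the foreground and background addresses pointing at the two source lines and the foreground alpha set to `fg_alpha`.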
Based on the "manually Mx repeated 1xN transfer" trick for MxN images, together with the different skip/offset of source and destination, three additional transformations can be performed: flip (mirroring) around the vertical axis [2], flip around one of the diagonals [3], and 90° rotation [4]. Also, a flip (mirroring) around the horizontal axis [1] can be performed as a "manually Nx repeated Mx1 transfer".
Some of these transformations may be combined into a single process (e.g. 90° rotation and resize). [1] and [2] can also be used to scroll images.
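The column-by-column idea is easier to follow in a host-side model. The sketch below is not register-level code: `rect_copy` is a hypothetical software stand-in for one memory-to-memory DMA2D transfer, where `src_skip` and `dst_skip` play the role of the line offsets (DMA2D_FGOR.LO and DMA2D_OOR.LO). Pixels are assumed to be 16-bit, and the rotation direction (counterclockwise) is chosen arbitrarily for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one rectangle transfer: copy `lines` lines of `width` pixels;
   after each line the source skips src_skip pixels and the destination
   skips dst_skip pixels. */
void rect_copy(const uint16_t *src, uint32_t src_skip,
               uint16_t *dst, uint32_t dst_skip,
               uint32_t width, uint32_t lines)
{
    assert(src != NULL && dst != NULL);
    size_t si = 0, di = 0;
    for (uint32_t y = 0; y < lines; y++) {
        for (uint32_t x = 0; x < width; x++)
            dst[di++] = src[si++];
        si += src_skip;
        di += dst_skip;
    }
}

/* [1] flip around the horizontal axis: h transfers of w x 1 pixels. */
void flip_horizontal_axis(const uint16_t *src, uint16_t *dst,
                          uint32_t w, uint32_t h)
{
    for (uint32_t y = 0; y < h; y++)
        rect_copy(src + (size_t)y * w, 0,
                  dst + (size_t)(h - 1 - y) * w, 0, w, 1);
}

/* [2] flip around the vertical axis: w transfers of 1 x h pixels,
   source column x lands in destination column w-1-x. */
void flip_vertical_axis(const uint16_t *src, uint16_t *dst,
                        uint32_t w, uint32_t h)
{
    for (uint32_t x = 0; x < w; x++)
        rect_copy(src + x, w - 1, dst + (w - 1 - x), w - 1, 1, h);
}

/* [3] flip around the main diagonal (transpose): source column x becomes
   destination row x; the destination is h wide and w high. */
void flip_diagonal(const uint16_t *src, uint16_t *dst,
                   uint32_t w, uint32_t h)
{
    for (uint32_t x = 0; x < w; x++)
        rect_copy(src + x, w - 1, dst + (size_t)x * h, 0, 1, h);
}

/* [4] 90 degree counterclockwise rotation: source column x becomes
   destination row w-1-x; the destination is h wide and w high. */
void rotate_ccw(const uint16_t *src, uint16_t *dst,
                uint32_t w, uint32_t h)
{
    for (uint32_t x = 0; x < w; x++)
        rect_copy(src + x, w - 1, dst + (size_t)(w - 1 - x) * h, 0, 1, h);
}
```

In each 1xN transfer the source offset of `w - 1` makes the source pointer walk down a column, while the destination offset decides whether the pixels land in a column again (offset `w - 1`) or in a contiguous row (offset 0). Source and destination buffers must not overlap in this model.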
The "manual" overhead may make these transformations more expensive with DMA2D than in pure software. For non-integer expansion/stretch, however, color interpolation using DMA2D's blending is almost certainly more efficient than performing it in software, and especially for larger images even the plain transformations may turn out to be faster.