Higher-end ARM Cortex-M cores, such as the Cortex-M4 and Cortex-M7, contain a floating-point unit (FPU), implemented as a coprocessor [1], which allows fast floating-point calculations. The FPU is switched off by default to save power, as it is a relatively large circuit.
The FPU implements operations with IEEE-754 single-precision floating-point numbers (SP, 32-bit, in C represented as the float type), and in selected high-end models, also double-precision numbers (DP, 64-bit, double).
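Whether a given build actually targets an SP-only or an SP+DP FPU can be checked at compile time. The following is a minimal sketch, assuming a compiler that implements the ARM C Language Extensions (ACLE) macro `__ARM_FP`; the type and function names (`fpu_precision_t`, `classify_arm_fp`, `TARGET_ARM_FP`) are made up for this illustration.

```c
#include <stdint.h>

/* Precision levels a hardware FPU may offer (illustrative names). */
typedef enum { FPU_NONE = 0, FPU_SP_ONLY = 1, FPU_SP_AND_DP = 2 } fpu_precision_t;

/* ACLE defines __ARM_FP as a bit mask when hardware floating point is targeted:
   bit 2 (0x04) = single precision, bit 3 (0x08) = double precision.
   If the macro is absent (soft-float build or non-ARM host), treat it as 0. */
#if defined(__ARM_FP)
#define TARGET_ARM_FP ((uint32_t)__ARM_FP)
#else
#define TARGET_ARM_FP 0u
#endif

/* Classify an __ARM_FP-style mask. */
static fpu_precision_t classify_arm_fp(uint32_t arm_fp_mask)
{
    if (arm_fp_mask & 0x08u) {
        return FPU_SP_AND_DP;   /* double arithmetic can run in hardware */
    }
    if (arm_fp_mask & 0x04u) {
        return FPU_SP_ONLY;     /* double arithmetic falls back to library code */
    }
    return FPU_NONE;
}
```

Calling `classify_arm_fp(TARGET_ARM_FP)` then tells the program what the compiler was asked to generate code for. On an SP-only FPU, any expression that is silently promoted to double is computed in software, so this check can be worth an assertion in performance-sensitive code.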
The FPU contains a dedicated set of 32 SP registers (S0 to S31); the same set can also be accessed as register pairs, forming 16 DP registers (D0 to D15).
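The overlap between the two views of the register bank can be pictured with a small host-side model. This is only an illustration of the aliasing, not code that touches real FPU registers; the type `fpu_bank_model_t` and the helper names are hypothetical.

```c
#include <stdint.h>

/* Model of the FPU register bank: 32 x 32-bit S registers sharing storage
   with 16 x 64-bit D registers, so that D[n] covers S[2n] and S[2n+1]. */
typedef union {
    uint32_t s[32];
    uint64_t d[16];
} fpu_bank_model_t;

/* Index of the S register holding the least significant word of D[n]. */
static inline unsigned d_low_s_index(unsigned n)  { return 2u * n; }

/* Index of the S register holding the most significant word of D[n]. */
static inline unsigned d_high_s_index(unsigned n) { return 2u * n + 1u; }
```

In this model, writing `d[1]` changes `s[2]` and `s[3]` and nothing else, which mirrors why a function using D1 must treat S2 and S3 as clobbered.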
Compilers (when instructed by the user, through dedicated command-line switches, to use the FPU) may decide to push some of these registers onto the stack upon function entry, in the function prologue, for example if local non-static float/double variables are defined in the function. As main() is usually treated as a normal C function [2], it may have such a prologue, too. If the FPU is switched off when the FPU registers are pushed, a fault is raised (a UsageFault with the NOCP, "no coprocessor", flag set), which is usually escalated to HardFault because the UsageFault handler is not enabled by default.
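To confirm after the fact that a HardFault came from touching the disabled FPU, the fault status registers can be inspected. The sketch below assumes an ARMv7-M core, where this condition is flagged by the NOCP bit (bit 19) of the Configurable Fault Status Register (CFSR, at 0xE000ED28) and an escalated fault sets the FORCED bit (bit 30) of the HardFault Status Register (HFSR, at 0xE000ED2C). The helper names are hypothetical, and the register values are passed in as plain integers so the logic can be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFSR_NOCP    (1u << 19)  /* UFSR.NOCP: coprocessor access while disabled */
#define HFSR_FORCED  (1u << 30)  /* HardFault was escalated from another fault   */

/* True if the CFSR value reports an access to a disabled coprocessor (such as the FPU). */
static inline bool cfsr_reports_nocp(uint32_t cfsr)
{
    return (cfsr & CFSR_NOCP) != 0u;
}

/* True if the HFSR value says the HardFault is an escalation of a configurable fault. */
static inline bool hardfault_was_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}
```

In a HardFault handler on the target, the two values would be read from the addresses given above (or from `SCB->CFSR` and `SCB->HFSR` when CMSIS headers are used). If both helpers return true and the stacked program counter points at an FPU instruction such as VPUSH, a disabled FPU is the likely cause.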
So, if the FPU is to be used in a given program, it has to be switched on in the startup code, before main() is called [3].
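Switching the FPU on amounts to granting full access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register (CPACR, at 0xE000ED88), followed by barriers so that the change takes effect before the next instruction. The following is a minimal sketch, not the code of any particular vendor startup file; `fpu_enable` is a hypothetical name, and the register is passed as a pointer so the function can also be tried on a host.

```c
#include <stdint.h>

#define CPACR_ADDR            0xE000ED88u   /* Coprocessor Access Control Register */
#define CPACR_CP10_CP11_FULL  (0xFu << 20)  /* bits 23:20 = 0b1111: full access    */

/* Grant full access to CP10/CP11 (the FPU) without disturbing other CPACR bits. */
static inline void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
#if defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_8M_MAIN__)
    /* On the target, make sure the write completes and the pipeline is
       refetched before any FPU instruction executes. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
}
```

On the target this would be called as `fpu_enable((volatile uint32_t *)CPACR_ADDR);` from the reset handler, before any code compiled with FPU support runs. CMSIS-based projects typically perform the equivalent write to `SCB->CPACR` in `SystemInit()`, so it is worth checking that `SystemInit()` is really called before main() and that it was built with the FPU-related macros set.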