Efficient C for ARM
Loop Unrolling
Examples
Before unrolling:
int countbits1(unsigned int N)
{
    int nbits = 0;
    while (N) {
        if (N & 1) nbits++;
        N >>= 1;
    }
    return nbits;
}
9 instructions, 6 cycles/bit.
After unrolling:
int countbits2(unsigned int N)
{
    int nbits = 0;
    while (N) {
        if (N & 1) nbits++;
        if (N & 2) nbits++;
        if (N & 4) nbits++;
        if (N & 8) nbits++;
        N >>= 4;
    }
    return nbits;
}
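15 instructions, 3 cycles/bit. Unrolling halves the cost per bit (3 cycles instead of 6) at the price of larger code (15 instructions instead of 9).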
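Checking equivalence:
An unrolled loop is only worth having if it still returns the same result as the original. The following is a minimal sketch of such a check, assuming a hosted C environment with assert.h available. The helper name check_countbits and the choice of test values are illustrative assumptions, not part of the original material.

```c
#include <assert.h>
#include <limits.h>

/* Original loop: tests one bit per iteration. */
int countbits1(unsigned int N)
{
    int nbits = 0;
    while (N) {
        if (N & 1) nbits++;
        N >>= 1;
    }
    return nbits;
}

/* Unrolled loop: tests four bits per iteration. */
int countbits2(unsigned int N)
{
    int nbits = 0;
    while (N) {
        if (N & 1) nbits++;
        if (N & 2) nbits++;
        if (N & 4) nbits++;
        if (N & 8) nbits++;
        N >>= 4;
    }
    return nbits;
}

/* Hypothetical helper: aborts via assert if the two versions ever disagree. */
void check_countbits(void)
{
    unsigned long i;

    /* Every 16-bit value; unsigned long avoids wrap-around if int is 16 bits. */
    for (i = 0; i <= 0xFFFFul; i++)
        assert(countbits1((unsigned int)i) == countbits2((unsigned int)i));

    /* Edge cases: all bits set, and only the top bit set. */
    assert(countbits1(UINT_MAX) == countbits2(UINT_MAX));
    assert(countbits1(~(UINT_MAX >> 1)) == countbits2(~(UINT_MAX >> 1)));
}
```

Calling check_countbits() from a test program's main is enough to run the comparison. Note that the unrolled version needs no special handling for leftover bits: once the remaining high bits of N are zero, the extra tests in the final iteration simply add nothing.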


See Loop unwinding on Wikipedia.
See Duff’s Device on Wikipedia.