Efficient C for ARM

Loop Unrolling

Small loops can be unrolled.

  • The loop body is repeated several times per iteration.
    • Higher performance.
    • At the expense of increased code size.

When unrolled:

  • Loop counter needs to be updated less often.
  • Fewer branches are executed.
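As a rough sketch of these two points, the function below sums an array four elements per iteration. The name `sum_unrolled4` and the trailing clean-up loop are illustrative assumptions rather than part of the original material; the clean-up loop handles lengths that are not a multiple of four.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints, unrolled by a factor of four. */
int sum_unrolled4(const int *a, size_t n)
{
    int sum = 0;
    size_t i = 0;

    /* Main loop: one counter update and one branch per four elements. */
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* Clean-up loop: handles the (at most three) leftover elements. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

The clean-up loop is the usual price of unrolling when the trip count is not known to be a multiple of the unroll factor.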

If the iteration count is small and fixed, the loop can be completely unrolled.

  • All test/branch overhead vanishes.
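A minimal sketch of complete unrolling, assuming a fixed trip count of three. The functions `dot3_loop` and `dot3_unrolled` are hypothetical names chosen for illustration; they compute the same result, but the second has no loop counter, test, or branch at all.

```c
/* Rolled form: three iterations, each with a counter update and a test. */
int dot3_loop(const int a[3], const int b[3])
{
    int sum = 0;
    for (int i = 0; i < 3; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Completely unrolled form: straight-line code, no counter or branch. */
int dot3_unrolled(const int a[3], const int b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

An optimising compiler may well perform this transformation itself when the trip count is a small constant, so it is worth checking the generated code before unrolling by hand.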

Examples

Before unrolling:

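/* Count the set bits in N, testing one bit per iteration. */
int countbits1(unsigned int N)
{
  int nbits = 0;
  while (N) {              /* loop test and branch once per bit */
    if (N & 1) nbits++;
    N >>= 1;               /* move the next bit into position 0 */
  }
  return nbits;
}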

9 instructions, 6 cycles/bit.

After unrolling:

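/* Count the set bits in N, testing four bits per iteration. */
int countbits2(unsigned int N)
{
  int nbits = 0;
  while (N) {              /* loop test and branch once per four bits */
    if (N & 1) nbits++;
    if (N & 2) nbits++;
    if (N & 4) nbits++;
    if (N & 8) nbits++;
    N >>= 4;               /* one shift replaces four single-bit shifts */
  }
  return nbits;
}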

15 instructions, 3 cycles/bit.

Remarks

See Loop unwinding on Wikipedia.

See Duff’s Device on Wikipedia.
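For readers who do not follow the link, here is a sketch of the idea behind Duff's Device, adapted to a plain byte copy. The function name `copy_duff` and the early return for a zero count are assumptions made for this illustration. The `switch` jumps into the middle of an eight-way unrolled loop so that the leftover `count % 8` bytes are handled on the first pass, without a separate clean-up loop.

```c
#include <stddef.h>

/* Hypothetical example: copy count bytes using a Duff's Device style loop. */
void copy_duff(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;                      /* the do/while below copies at least once */

    size_t n = (count + 7) / 8;      /* number of passes through the loop */

    switch (count % 8) {             /* jump into the unrolled body */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

This is legal C but hard to read, and a library `memcpy` is normally the better choice for copying; the sketch is only meant to show how unrolling and remainder handling can be combined.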
