ARM instructions typically operate on three registers. With 32 registers (including the special zero register), you need 5 bits to address each register, so three register operands take 15 bits. That would leave only 1 bit for the opcode in a 16-bit encoding, so 64-bit ARM programs will use a 32-bit instruction encoding.
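The bit budget above can be checked with a few lines of Python. This is only a rough sketch: the function name `opcode_bits` is made up for illustration, and the model ignores everything except register fields (immediates, condition codes and so on), so the numbers are an upper bound on opcode space rather than a description of any real encoding.

```python
def opcode_bits(encoding_bits: int, num_registers: int, operands: int = 3) -> int:
    """Bits left over for the opcode after the register fields are encoded.

    Hypothetical helper: it assumes every operand is a register and that
    nothing else competes for space in the instruction word.
    """
    # Bits needed to address one register, i.e. ceil(log2(num_registers)).
    bits_per_register = (num_registers - 1).bit_length()
    return encoding_bits - operands * bits_per_register


# Three operands drawn from 32 registers in a 16-bit instruction.
print(opcode_bits(16, 32))  # 1

# The same operands in a 32-bit instruction.
print(opcode_bits(32, 32))  # 17
```

Under these assumptions, the 16-bit word leaves a single bit to distinguish instructions, while the 32-bit word leaves 17.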
32-bit ARM programs typically use one of two encodings: ARM or Thumb-2. ARM is also a 32-bit encoding. Thumb-2 is a variable-length encoding, with the most common instructions being encoded in 16 bits.
This means that 64-bit code is likely to be larger, unless there are some significant improvements in the instruction set. Unfortunately, I expect the opposite. The 64-bit architecture is removing two of the most useful features of the instruction set for compressing code:
- Predicated instructions. Most ARM instructions include a condition field, and they'll execute only when the condition is met. This means that ARM code needs fewer branch instructions, and for a long time ARM chips could achieve good performance without a branch predictor. Modern ARM chips include branch prediction, so this feature is somewhat less useful, and its cost in terms of complexity (which turns into power consumption) was deemed too high to justify.
- Load and store multiple instructions. These instructions allow loading and storing an arbitrary subset of the register set, which is great for compilers and assembly programmers. A function prologue and epilogue each need just one instruction: one to store any callee-save registers that the function modifies, and one to reload them later. Similarly, a call instruction just needs to be bracketed by a single instruction on each side to preserve any of the caller-save registers it cares about.
Although these instructions are useful (and great for producing dense code), they're fantastically complex to implement. With the enlarged register set, there should be less of a requirement to save and load registers, so hopefully these instructions aren't needed as much.
Predication is not completely removed; some instructions still retain predicated modes. The load and store multiple instructions are replaced by instructions for loading and storing pairs of registers. This should somewhat reduce the number of instructions required for saving and restoring registers, while simplifying the decoder.
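To get a feel for the code-size trade-off, here is a rough sketch that counts the instructions needed to save a given number of registers under three schemes. The function and constant names are invented for illustration. The model is deliberately simplistic: it assumes a store-multiple instruction can cover up to 16 registers (the size of the 32-bit ARM register set), that registers can always be grouped two per pair instruction, and it ignores stack pointer adjustments and scheduling.

```python
import math


def instructions_needed(num_registers: int, registers_per_instruction: int) -> int:
    """Instructions required to save num_registers registers.

    Hypothetical model: each instruction saves at most
    registers_per_instruction registers and nothing else is counted.
    """
    return math.ceil(num_registers / registers_per_instruction)


# Assumed capacities for each scheme (illustrative, not an ISA reference).
STORE_MULTIPLE = 16  # one instruction covers any subset of 16 registers
STORE_PAIR = 2       # one instruction covers two registers
STORE_SINGLE = 1     # one instruction per register

for n in (2, 5, 10):
    print(
        n,
        instructions_needed(n, STORE_MULTIPLE),
        instructions_needed(n, STORE_PAIR),
        instructions_needed(n, STORE_SINGLE),
    )
```

In this model, saving five registers takes one store-multiple instruction, three pair stores, or five single stores, so pairs recover roughly half of the saving that store-multiple offered over single stores.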