A Look at the 64-Bit ARMv8 Architecture

ARM is the most popular 32-bit architecture in the world, but has only just begun the transition to 64-bit. David Chisnall discusses what we can expect from the upcoming 64-bit version of the architecture.

ARM has been the best-selling 32-bit architecture for quite a while. That was an impressive achievement back when 64-bit CPUs were confined to high-end workstations and servers, but now that even cheap laptops come with them, it sounds a bit more hollow.

In most of the places where you find ARM chips, 64 bits isn't very useful. A mobile phone or a tablet, for example, doesn't usually run anything that would benefit from a 64-bit address space. The situation with ARM is quite different from x86, where the 64-bit transition brought the opportunity to clean up a lot of warts in the architecture. It's fairly common to see an x86 program run 10–20% faster when compiled for x86-64, because the 64-bit mode also brings with it advantages like these:

  • Program-counter relative addressing—important for position-independent code in libraries
  • More registers—reduce the need for register-to-stack copies
  • Guarantee of SSE—so the compiler doesn't have to emit x87 code for floating point

On architectures like MIPS, PowerPC, or SPARC, compiling in 64-bit mode can often make things slower, because the only significant difference is that you're using twice as much cache space for storing pointers. Unless you're using a lot of 64-bit integer arithmetic, there's little advantage.
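
To make the cache-pressure point concrete, here is a rough sketch of how pointer width changes the number of linked data-structure nodes that fit in a cache. The node layout (two pointers plus a 4-byte payload) and the 32 KiB cache size are illustrative assumptions, not figures from any particular chip.

```python
def node_size(pointer_bytes, pointers_per_node=2, payload_bytes=4):
    """Size of one node in bytes, rounded up to pointer alignment.

    The layout (two pointers plus a 4-byte payload) is a hypothetical
    binary-tree node chosen for illustration.
    """
    raw = pointers_per_node * pointer_bytes + payload_bytes
    # Round up to a multiple of the pointer size, as a compiler would
    # pad the structure so that arrays of nodes stay aligned.
    return -(-raw // pointer_bytes) * pointer_bytes


CACHE_BYTES = 32 * 1024  # assumed cache size, for illustration only


def nodes_per_cache(pointer_bits):
    """How many nodes fit in the assumed cache for a given pointer width."""
    return CACHE_BYTES // node_size(pointer_bits // 8)


for bits in (32, 64):
    print(f"{bits}-bit pointers: {node_size(bits // 8)} bytes per node, "
          f"{nodes_per_cache(bits)} nodes per cache")
```

Under these assumptions the same cache holds half as many nodes in 64-bit mode, which is why pointer-heavy code can slow down when nothing else about the architecture improves.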

However, some markets that interest ARM would benefit from a 64-bit architecture. One of the biggest growth areas at the moment is in very low-power servers. ARM's current offering here is the Cortex-A15, which supports the Large Physical Address Extension (LPAE). This allows the operating system to use a 40-bit physical address space (up to 1TB), but only permits applications to use a 32-bit virtual address space. For things like databases and even web servers that want to cache as much as possible in memory, this situation is less than ideal.
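
The gap between the two address spaces is easy to work out. The short sketch below just does the arithmetic; the helper name is made up for illustration.

```python
GIB = 2**30
TIB = 2**40


def address_space_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2**address_bits


physical = address_space_bytes(40)  # what the OS can address with LPAE
virtual = address_space_bytes(32)   # what a single application can address

print(f"physical: {physical // TIB} TiB")
print(f"virtual:  {virtual // GIB} GiB")
print(f"ratio:    {physical // virtual}x")
```

In other words, a machine could hold up to 256 times more memory than any one process can map at once, so a large in-memory cache would have to be split across many processes.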

Recently, ARM published the initial details of the ARMv8 architecture, which supports a 64-bit instruction set.

One of the most interesting things about the ARMv8 announcement is that ARM is not yet creating any chip designs based on it. Typically, an ARM instruction set release comes with a reference implementation that system-on-chip (SoC) manufacturers can license and build into their own chips. With ARMv8, it's expected that companies like NVIDIA will release implementations of their own designs first, and ARM will produce its designs later. This approach makes the ARM ecosystem even more attractive to device makers, because it means that there will be several competing implementations of the core architecture (as with Intel, AMD, and VIA producing x86 chips), rather than simply several companies adding things to ARM designs.

Register and Memory Increases

The definition of a 64-bit architecture is a bit fuzzy. Typically either supporting 64-bit pointers or having 64-bit registers is considered a requirement. In AArch64 mode, ARMv8 provides both.

ARM was never particularly register-starved, providing 16 general-purpose registers, including some (such as the program counter, stack pointer, and link register) that had special uses. In comparison, x86 had 8 registers, which x86-64 extended to 16.

ARMv8 increases the size of the register set to 31 64-bit registers. If 31 seems like a strange number, note that one "register" is a hard-wired zero value—a fairly common approach in RISC processors. It turns out that zero is a very commonly used value, and having it always in a register shortens a lot of common instruction sequences.

Slightly more interestingly, the program counter and stack pointer are no longer general-purpose registers, which means that the number of real usable registers goes from 14 to 31, a fairly significant improvement. This is even more noticeable when you're doing 64-bit arithmetic. With ARMv7, 64-bit operations needed the operands to be split between pairs of registers, limiting you to 7 intermediate values (fewer, if you actually used the link register and frame pointer).
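
As a rough illustration of what splitting operands between register pairs means, the sketch below models a 64-bit addition on 32-bit registers: add the low halves, then add the high halves plus the carry. This is the same pattern as an add followed by an add-with-carry on a 32-bit core. The function names are hypothetical, and Python integers stand in for registers.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit register can only hold this much


def split64(value):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join64(lo, hi):
    """Recombine (low, high) 32-bit halves into one 64-bit value."""
    return (hi << 32) | lo


def add64_with_pairs(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values held as pairs of 32-bit 'registers'."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                     # 1 if the low halves overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32      # wraps like real 64-bit addition
    return lo, hi


a = 0x00000001FFFFFFFF
b = 0x0000000000000001
result = join64(*add64_with_pairs(*split64(a), *split64(b)))
print(hex(result))
```

Each 64-bit value ties up two registers here, which is where the limit of 7 intermediate values out of 14 usable registers comes from. With 64-bit registers, the whole operation is a single add on one register per operand.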
