ARM processors use little-endian byte ordering by default. This means that in multi-byte data types like integers, the least significant byte is stored at the lowest memory address. For example, in a 32-bit integer, the byte containing bits 0-7 is stored first, followed by bits 8-15, bits 16-23, and finally bits 24-31 at the highest address. The term “little-endian” comes from Jonathan Swift’s Gulliver’s Travels, in which two nations go to war over which end of a boiled egg to open first. In little-endian ordering, a value laid out in memory starts at the “little end”, the least significant byte.
Byte Ordering in Memory
To illustrate byte ordering in more detail, let’s look at an example 32-bit integer with the value 0x12345678 stored in memory. In little-endian ordering on an ARM processor, this would be laid out as:

Memory Address    Contents
0x100             0x78
0x101             0x56
0x102             0x34
0x103             0x12

The least significant byte, 0x78, comes first at the lowest address, 0x100, followed by 0x56 at 0x101, and so on. Read one byte at a time in address order, the sequence is 0x78, 0x56, 0x34, 0x12. A 32-bit load from 0x100 reassembles those bytes with the lowest address as the least significant byte, so the register receives 0x12345678.
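The layout above can be modeled on any host with Python's `struct` module, which packs values in an explicitly chosen byte order. This is only a sketch of the memory image, not code running on an ARM core; the base address 0x100 is taken from the example and has no meaning to Python.

```python
import struct

value = 0x12345678

# "<I" selects little-endian, unsigned 32-bit. The resulting bytes are in
# ascending address order, as they would appear in memory.
memory = struct.pack("<I", value)

base = 0x100  # illustrative base address from the example
for offset, byte in enumerate(memory):
    print(f"0x{base + offset:03x}: 0x{byte:02x}")

# Reading the four bytes back as one little-endian word recovers the value.
(loaded,) = struct.unpack("<I", memory)
print(hex(loaded))
```

The loop prints one line per address, and the final line shows that the word-sized read returns the original value rather than the reversed digits.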
Byte Ordering for Data Types
The little-endian byte ordering applies to all the primitive data types used in ARM processors:
- 8-bit: Single byte, so byte ordering does not apply.
- 16-bit: The least significant byte comes first.
- 32-bit: The least significant byte comes first.
- 64-bit: The least significant byte comes first.
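As a quick check of the widths listed above, the following sketch uses Python's built-in `int.to_bytes` to show the little-endian layout for each size. The helper name `le_bytes` and the sample values are chosen here purely for illustration.

```python
def le_bytes(value, width_bits, signed=False):
    """Return the little-endian byte layout of an integer as a hex string."""
    raw = value.to_bytes(width_bits // 8, byteorder="little", signed=signed)
    return raw.hex(" ")  # bytes.hex with a separator needs Python 3.8+

layout_8 = le_bytes(0x12, 8)                   # single byte, nothing to order
layout_16 = le_bytes(0x1234, 16)
layout_32 = le_bytes(0x12345678, 32)
layout_64 = le_bytes(0x0123456789ABCDEF, 64)
layout_neg = le_bytes(-2, 16, signed=True)     # two's complement 0xFFFE

for name, layout in [("8-bit", layout_8), ("16-bit", layout_16),
                     ("32-bit", layout_32), ("64-bit", layout_64),
                     ("16-bit signed -2", layout_neg)]:
    print(f"{name}: {layout}")
```

In every multi-byte case the first byte printed is the low-order one, and the signed example follows the same rule as the unsigned ones.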
This ordering is used regardless of whether the data type is signed or unsigned. Floating point types follow the same little-endian convention, applied to the IEEE 754 bit pattern. From the most significant bit to the least significant bit, the fields are:
- Single precision (32-bit): Sign bit, 8-bit exponent, 23-bit mantissa
- Double precision (64-bit): Sign bit, 11-bit exponent, 52-bit mantissa
In both cases the value is stored like an integer of the same width: the low-order mantissa bytes sit at the lowest addresses, and the byte holding the sign bit and the top exponent bits sits at the highest address. Byte order affects only the layout in memory; the bit pattern itself is defined by the IEEE 754 floating point standard.
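To make the floating point layout concrete, here is a minimal sketch that packs a single precision value as little-endian bytes and then splits the same 32 bits into sign, exponent, and mantissa. The function name `float32_fields` and the sample value -1.5 are assumptions made for this example.

```python
import struct

def float32_fields(x):
    """Return (little-endian bytes, sign, biased exponent, mantissa) of x."""
    raw = struct.pack("<f", x)            # IEEE 754 single, little-endian
    (bits,) = struct.unpack("<I", raw)    # same 4 bytes viewed as an integer
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF        # 8-bit biased exponent
    mantissa = bits & 0x7FFFFF            # 23-bit fraction field
    return raw, sign, exponent, mantissa

raw, sign, exponent, mantissa = float32_fields(-1.5)
print(raw.hex(" "), sign, exponent, hex(mantissa))

# Double precision follows the same rule over 8 bytes.
raw_double = struct.pack("<d", 1.0)
print(raw_double.hex(" "))
```

The byte carrying the sign bit appears last in both outputs, which matches the integer rule: the most significant byte is at the highest address.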
Byte Ordering for Instructions
Instructions in ARM machine code also follow the little-endian convention. For example, a simple ADD instruction encoded as 0xe2800001 would be stored as:

Memory Address    Contents
0x200             0x01
0x201             0x00
0x202             0x80
0x203             0xe2

The least significant byte, 0x01, is at the lowest address. Instruction byte order and data byte order are separate matters: on ARMv7 and later, instructions are always fetched as little-endian, even when the core is configured for big-endian data accesses.
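The same packing approach works for the instruction word. The sketch below stores 0xe2800001 as little-endian bytes, reads it back, and extracts a few fields using the classic 32-bit ARM data-processing (immediate) layout. The variable names are illustrative, and the field positions apply to that encoding class only.

```python
import struct

instruction = 0xE2800001  # the ADD encoding used in the text

# Bytes in ascending address order, as in the table above.
stored = struct.pack("<I", instruction)
print(stored.hex(" "))

# Fetching the word back and pulling out fields with shifts and masks.
(word,) = struct.unpack("<I", stored)
cond = (word >> 28) & 0xF          # condition code, 0xE means "always"
is_immediate = (word >> 25) & 0x1  # 1 when operand 2 is an immediate
opcode = (word >> 21) & 0xF        # 0x4 is ADD
rn = (word >> 16) & 0xF            # first operand register
rd = (word >> 12) & 0xF            # destination register
imm8 = word & 0xFF                 # 8-bit immediate value

print(f"cond={cond:#x} opcode={opcode:#x} rn=r{rn} rd=r{rd} imm={imm8}")
```

Decoded this way, the word corresponds to an ADD of the immediate 1 to r0 with the result written back to r0.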
Effects on Code
The little-endian byte ordering affects how code accesses data in ARM processors. Some examples:
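- A pointer to a multi-byte object holds the object's lowest address, which is where the least significant byte lives. Reading a single byte through that pointer therefore yields the least significant byte.
- Multi-byte loads and stores transfer the whole value at once; byte order determines only which address each byte occupies.
- Data values may need to be byte swapped when transferring between big-endian and little-endian systems.
- Bit shift and rotate operations act on register values, not on memory, so they behave identically regardless of byte order.
- Multi-byte constants are written in source code in normal notation (0x12345678). Only when a value is spelled out as individual bytes, for example in a byte array, must the least significant byte come first (0x78, 0x56, 0x34, 0x12).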
Programmers have to keep the byte ordering in mind when accessing memory, defining data structures, and manipulating bits and bytes in code. Languages like C/C++ abstract away some of these details but an awareness of the underlying byte order is still important.
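A byte swap can be written with nothing but shifts and masks, which also shows that shifts operate on the numeric value rather than on its memory layout. The following is a minimal sketch; `bswap32` is a name chosen here, and on real ARM hardware a compiler would typically use a dedicated byte-reverse instruction instead.

```python
def bswap32(x):
    """Reverse the byte order of a 32-bit value using shifts and masks."""
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)
    )

value = 0x12345678
swapped = bswap32(value)

# A shift sees the value, not the layout: this is the top byte on any system.
top_byte = value >> 24

# The byte at the lowest address, by contrast, depends on the byte order.
first_little = value.to_bytes(4, "little")[0]
first_big = value.to_bytes(4, "big")[0]

print(hex(swapped), hex(top_byte), hex(first_little), hex(first_big))
```

Applying `bswap32` twice returns the original value, which makes it usable for conversion in either direction.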
Comparison with Big-Endian Systems
The other common byte ordering convention is big-endian, used on some non-ARM processors. In big-endian systems, the most significant byte comes first at the lowest memory address. For example, the 32-bit integer 0x12345678 would be stored in memory as:

Memory Address    Contents
0x100             0x12
0x101             0x34
0x102             0x56
0x103             0x78

The byte order is reversed compared to little-endian. Most code behaves identically on both kinds of system; the differences show up when code looks at the individual bytes of a multi-byte value. Some key differences:
- A pointer to a multi-byte object points at its most significant byte rather than its least significant byte.
- Viewing a value through a narrower type at the same address yields the high-order part rather than the low-order part.
- Bit shifts and rotates are unaffected, since they act on register values.
- Multi-byte constants are written the same way in source code; only their layout in memory differs (0x12345678 is stored as 0x12, 0x34, 0x56, 0x78).
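Software sometimes needs to explicitly handle byte swapping between the two conventions, for example when exchanging binary files or network data. The ARM architecture itself is bi-endian: many cores can be configured for big-endian data accesses, although little-endian is the usual configuration.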
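One way to handle data from a big-endian source is to state the byte order explicitly when decoding, instead of relying on the host's native order. The sketch below assumes a hypothetical 6-byte record made of a 16-bit id followed by a 32-bit length; the record format and field names are invented for illustration.

```python
import struct

# Hypothetical record as produced by a big-endian sender:
# 16-bit id = 42, 32-bit length = 256.
record = bytes.fromhex("002a 00000100")

# ">" decodes as big-endian with no padding between fields.
record_id, length = struct.unpack(">HI", record)

# Decoding the same bytes as little-endian silently gives wrong numbers.
wrong_id, wrong_length = struct.unpack("<HI", record)

print(record_id, length)
print(wrong_id, wrong_length)
```

Nothing fails in the second decode; the values are simply wrong, which is why byte-order mistakes tend to surface as subtle data bugs rather than crashes.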
Conclusion
The key takeaway is that ARM processors use little-endian byte ordering by default. This means:
- Multi-byte data types are stored with least significant bytes at lower addresses.
- Reading the lowest-addressed byte of a multi-byte value yields its least significant byte.
- Shifts, rotates, and other register operations are unaffected by byte order.
- Data interchange with big-endian systems requires byte swapping.
Keeping the little-endian behavior in mind helps avoid subtle bugs when handling byte arrays, pointers, and other low-level code. The ARM ABI and compilers handle some of this automatically, but an awareness of the underlying byte order is still important in system software development.