Endianness is the byte-ordering convention a processor architecture uses to store multi-byte data in memory. It determines how a sequence of bytes is interpreted as a larger data type such as a halfword, word or doubleword. ARM processors support both big-endian and little-endian memory layouts, and most ARM systems run little-endian by default.
Big Endian vs Little Endian
In big-endian format, the most significant byte of a multi-byte value is stored at the lowest memory address, and successively higher addresses hold less significant bytes. For example, the 4-byte integer 0x12345678 would be stored in memory as:
Address  Value
0x100    0x12
0x101    0x34
0x102    0x56
0x103    0x78
In little-endian format, the least significant byte is stored at the lowest address, and successively higher addresses hold more significant bytes. The same 4-byte value 0x12345678 would be stored as:
Address  Value
0x100    0x78
0x101    0x56
0x102    0x34
0x103    0x12
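The two layouts above can be told apart at runtime by looking at the byte stored at the lowest address of a known value. The following is a minimal sketch; the function name host_is_little_endian is an illustrative choice, not part of any ARM or compiler API.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x12345678u;
    unsigned char bytes[sizeof probe];

    /* memcpy copies the in-memory representation byte for byte. */
    memcpy(bytes, &probe, sizeof probe);

    /* The lowest address holds 0x78 on little-endian, 0x12 on big-endian. */
    return bytes[0] == 0x78;
}
```

A runtime check like this is mostly useful for diagnostics and tests; production code usually decides at compile time.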
The ARM architecture supports both formats, selected through configuration bits or pins depending on the core. ARM began as a little-endian architecture and gained big-endian support later, and most ARM toolchains, operating systems and ABIs default to little-endian.
Endianness Configuration in ARM
Many ARM cores provide a configuration input that selects endianness at reset. The signal name depends on the core; Cortex-M3, for example, has a BIGEND input that is sampled at reset. Driving the input HIGH during reset selects big-endian mode, and driving it LOW selects little-endian mode. The input can be tied to a bootstrap option so that the board design determines the setting.
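On cores where endianness is fixed at reset, such as Cortex-M, the processor runs in the configured mode until the next reset. ARMv6 and later A-profile and R-profile cores can also switch data endianness at runtime with the SETEND instruction, which changes the E bit in the CPSR. Any data exchanged with external memories or I/O devices must match the endianness in effect, or be converted explicitly.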
ARM Instruction Endianness
On ARMv6 and later cores using the BE-8 big-endian model, and on Cortex-M cores, instructions are always stored in little-endian order regardless of the configured data endianness. Only the older BE-32 model stored both instructions and data in big-endian order.
Data Endianness
The configured endianness mode decides memory layout of data – variables, arrays, structures etc. Code has to use appropriate access methods to load data from memory in correct byte order.
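For example, consider a 32-bit integer variable at address 0x1000. A word load uses the same address in both modes and returns the same value, provided the data was stored in the same mode:
    LDR  R1, =0x1000   ; R1 = address of the variable
    LDR  R0, [R1]      ; R0 = the 32-bit value, in either mode
A byte load from the same address shows the difference:
    LDRB R0, [R1]      ; big-endian: MSB; little-endian: LSB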
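When the byte order of the data is fixed by a file format or protocol rather than by the CPU, one common approach is to assemble the value from individual bytes with shifts, which gives the same result in either endianness mode. A minimal sketch follows; the helper names load_be32 and load_le32 are illustrative, not standard functions.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant first. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Assemble a 32-bit value from four bytes stored least significant first. */
uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Because only byte reads and arithmetic are involved, these helpers also avoid unaligned word accesses.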
Impact on Data Access
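Developers have to handle multi-byte data carefully whenever its individual bytes are visible. Indexing an int array by element returns the same values on little-endian and big-endian machines, but the bytes that make up each element are stored in opposite order, so raw copies of the array are not interchangeable between the two.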
Structures may require explicit handling of member offsets and sizes. Byte ordering needs to be handled properly when interacting with external devices or network data.
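For data sent to an external device or over a network, one way to handle both byte order and structure layout is to write each field into a byte buffer explicitly. The sketch below assumes a made-up 6-byte header (struct msg_header and msg_header_pack are hypothetical names for illustration) sent in big-endian order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header, used only for illustration. */
struct msg_header {
    uint16_t type;
    uint32_t length;
};

enum { MSG_HEADER_WIRE_SIZE = 6 };

/* Write the header to out[] in big-endian byte order, field by field,
   so that compiler-inserted padding never reaches the wire.
   Returns the number of bytes written. */
size_t msg_header_pack(const struct msg_header *h, unsigned char *out)
{
    out[0] = (unsigned char)(h->type >> 8);
    out[1] = (unsigned char)(h->type & 0xFF);
    out[2] = (unsigned char)(h->length >> 24);
    out[3] = (unsigned char)((h->length >> 16) & 0xFF);
    out[4] = (unsigned char)((h->length >> 8) & 0xFF);
    out[5] = (unsigned char)(h->length & 0xFF);
    return MSG_HEADER_WIRE_SIZE;
}
```

The in-memory struct would typically occupy 8 bytes because of alignment padding, which is one reason to pack fields individually rather than copying the struct as a block.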
Handling Endianness in Code
Certain techniques allow developers to write portable code that works correctly on both endian systems:
- Use compiler intrinsics such as __builtin_bswap32() (GCC and Clang) or library helpers such as glibc's bswap_32() to swap byte order
- Use unions to alias same memory as different data types
- Access memory using byte pointers instead of data types
- Use #defines, macros and inline functions to isolate endian-specific operations
- Keep endian handling functions in architecture-specific modules
- Use compiler command line options (for example GCC's -mbig-endian and -mlittle-endian on ARM) to select the target endianness
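For example, a byte swap function built on a compiler intrinsic:
    #include <stdint.h>

    /* Reverse the byte order of a 32-bit value (GCC/Clang builtin). */
    uint32_t swap_endian(uint32_t val)
    {
        return __builtin_bswap32(val);
    }
The compiler replaces __builtin_bswap32() with the best instruction sequence for the target CPU, such as REV on ARMv6 and later. Note that swap_endian() always reverses the bytes, so portable code should call it only when the byte order of the data differs from that of the host.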
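A swap helper is usually wrapped so that it runs only when the host order differs from the order of the data. The following is a minimal sketch, assuming GCC or Clang: the __BYTE_ORDER__ macros and __builtin_bswap32() are compiler-specific rather than standard C, and the name be32_to_host is illustrative.

```c
#include <stdint.h>

/* Convert a 32-bit value from big-endian to host byte order.
   Assumes GCC or Clang predefined macros and builtins. */
uint32_t be32_to_host(uint32_t be_value)
{
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return be_value;                      /* already in host order */
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    return __builtin_bswap32(be_value);   /* reverse the four bytes */
#else
#error "Unknown byte order; add a case for this compiler"
#endif
}
```

Since the same operation converts in both directions, applying it twice returns the original value.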
ARM Endianness Examples
Let’s look at some examples to better understand endianness in ARM:
1. Byte Ordering in Memory
An integer variable with hex value 0x12345678 is stored as:
Address  Big Endian  Little Endian
0x1000   0x12        0x78
0x1001   0x34        0x56
0x1002   0x56        0x34
0x1003   0x78        0x12
2. Array Indexing
Indexing a byte array gives the same result on both systems:
    char arr[4] = {0x12, 0x34, 0x56, 0x78};
Big Endian:    arr[0] = 0x12, arr[1] = 0x34
Little Endian: arr[0] = 0x12, arr[1] = 0x34
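Indexing an integer array also returns the same value on both systems, but the bytes behind each element are laid out differently:
    int arr[1] = {0x12345678};
Big Endian:    arr[0] = 0x12345678, bytes in memory: 12 34 56 78
Little Endian: arr[0] = 0x12345678, bytes in memory: 78 56 34 12
The value 0x78563412 appears only when the raw bytes written by one system are read back as an int by a system of the opposite endianness.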
3. Structures
A structure with 3 char members will be laid out as:
    struct abc { char a; char b; char c; };
Big Endian:    a: 0x1000  b: 0x1001  c: 0x1002
Little Endian: a: 0x1000  b: 0x1001  c: 0x1002
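A structure with an int and a char also has the same member offsets on both systems, because endianness never reorders members. Only the byte order inside x differs:
    struct xyz { int x; char y; };
Big Endian:    x: 0x1000 (MSbyte) .. 0x1003 (LSbyte)  y: 0x1004
Little Endian: x: 0x1000 (LSbyte) .. 0x1003 (MSbyte)  y: 0x1004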
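One way to check where members actually land is the standard offsetof macro. The sketch below uses int32_t as a fixed-width stand-in for the 32-bit int, and the helper first_byte_of_x is an illustrative name; it returns the byte at the lowest address of x, which is the only thing that depends on endianness here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xyz {
    int32_t x;   /* fixed-width stand-in for the 32-bit int */
    char    y;
};

/* Return the byte stored at the lowest address of member x. */
unsigned char first_byte_of_x(const struct xyz *s)
{
    unsigned char b;
    memcpy(&b, (const unsigned char *)s + offsetof(struct xyz, x), 1);
    return b;
}
```

With x set to 0x12345678, the helper yields 0x78 on a little-endian target and 0x12 on a big-endian one, while offsetof(struct xyz, y) is 4 on both.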
Conclusion
Endianness can affect code behavior in subtle ways. Understanding how ARM handles endianness allows developers to write correct, efficient firmware and drivers. Intrinsic functions, macros and compiler options help in dealing with endianness portably. Struct padding and field layout need attention whenever binary data is shared between systems. Testing on actual target hardware helps catch endianness issues early.