Differences between Thumb-16 and Thumb-2 instruction sets

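The main difference between Thumb-16 and Thumb-2 is the instruction encoding. Thumb-16 (the original Thumb instruction set) uses 16-bit encodings almost exclusively, while Thumb-2 extends it with 32-bit encodings that can be freely mixed with the 16-bit ones. Both run on 32-bit ARM processors and operate on 32-bit registers. Thumb-2 delivers performance close to that of the ARM instruction set while keeping code density close to that of Thumb-16.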

Contents

Overview of Thumb-16
Overview of Thumb-2
Key Differences
Performance and Code Density
Interworking Functionality
Instruction Set Highlights
Real-World Impact

Overview of Thumb-16

Thumb-16 is a 16-bit compressed instruction set introduced with the ARMv4T architecture (first implemented in the ARM7TDMI) in the mid-1990s. Its goal was to improve code density compared to the 32-bit ARM instruction set. Thumb-16 code is typically about 65% of the size of equivalent ARM code, at the cost of executing more instructions for the same work.

In Thumb-16, nearly all instructions are 16 bits long. The processor still has 16 registers (R0-R15), but most Thumb-16 instructions can access only the eight low registers (R0-R7), because their register fields are 3 bits wide. The high registers (R8-R15) are reachable only through a few instructions such as MOV, ADD, CMP, and BX. Immediate fields are short as well, typically 3, 5, or 8 bits.

Thumb-16 offers only a subset of the operations available in the ARM instruction set. Apart from branches it has no conditional execution, it cannot apply a shift to an operand within a data-processing instruction, and it has no coprocessor instructions. Most of its data-processing instructions also overwrite one of their source registers. As a result, Thumb-16 programs usually need more instructions than ARM code to implement the same function.

Thumb-16 instructions are all 16 bits wide, with one exception: the long-range BL (and, from ARMv5T, BLX) is encoded as a pair of 16-bit halfwords that together behave like one 32-bit instruction. Large constants are loaded from PC-relative literal pools, and operations that Thumb-16 cannot express require a switch to ARM state.
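The long-range BL in the original Thumb instruction set is encoded as two consecutive 16-bit halfwords: the first carries the upper 11 bits of the branch offset and the second the lower 11 bits. The following is a minimal sketch of how a disassembler might recover the branch target from such a pair. The function names are illustrative, not taken from any existing tool.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def thumb_bl_target(address: int, first: int, second: int) -> int:
    """Return the target of a classic Thumb BL halfword pair located at `address`."""
    # Prefix 0b11110 marks the first halfword, 0b11111 the second.
    if first >> 11 != 0b11110 or second >> 11 != 0b11111:
        raise ValueError("not a classic Thumb BL halfword pair")
    high = sign_extend(first & 0x7FF, 11) << 12  # upper offset bits, signed
    low = (second & 0x7FF) << 1                  # lower offset bits, halfword aligned
    return address + 4 + high + low              # in Thumb state the PC reads as address + 4
```

For example, the pair 0xF7FF, 0xFFFE placed at address 0x8000 decodes to a branch back to 0x8000 itself.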

Overview of Thumb-2

Thumb-2 was introduced with the ARMv6T2 architecture in 2003, first implemented in the ARM1156T2-S processor. It builds upon the Thumb-16 instruction set by adding many 32-bit instructions while retaining the existing 16-bit Thumb instructions.

The Thumb-2 instruction set is based on a variable-length encoding scheme in which 16-bit and 32-bit instructions are freely mixed. The assembler or compiler picks the encoding for each instruction: operations that fit a 16-bit form (low registers, small immediates) use it, while anything else uses a 32-bit encoding. No mode switch is needed between the two sizes.
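To illustrate how the two sizes coexist in one instruction stream, here is a small sketch of the length rule a Thumb-2 decoder applies: a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 is the first half of a 32-bit instruction, and every other halfword is a complete 16-bit instruction. The function names are made up for this example.

```python
def thumb_instruction_size(halfword: int) -> int:
    """Return the size in bytes of the Thumb instruction starting with `halfword`."""
    top5 = (halfword >> 11) & 0x1F
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def instruction_sizes(halfwords):
    """Walk a list of halfwords and return the size of each instruction found."""
    sizes = []
    index = 0
    while index < len(halfwords):
        size = thumb_instruction_size(halfwords[index])
        if size == 4 and index + 1 >= len(halfwords):
            raise ValueError("stream ends in the middle of a 32-bit instruction")
        sizes.append(size)
        index += size // 2  # advance by one or two halfwords
    return sizes
```

Given the stream 0xBF00 (NOP), 0xF3AF 0x8000 (NOP.W), 0x4770 (BX LR), `instruction_sizes` returns `[2, 4, 2]`.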

In Thumb-2, the existing Thumb-16 instructions are kept unchanged, so Thumb-16 binaries run without modification. The added 32-bit instructions cover more complex operations and memory addressing modes.

Thumb-2 does not add registers. The register file is the same as before: 13 general purpose registers (R0-R12), the stack pointer (SP), link register (LR), and program counter (PC). What changes is access: the 32-bit encodings have 4-bit register fields, so arithmetic and load/store instructions can use the high registers directly instead of being limited to R0-R7.

ARM/Thumb interworking is not new in Thumb-2. It has existed since ARMv4T through the BX instruction, with BLX added in ARMv5T. Thumb-2 code interworks with ARM code in the same way on cores that implement both instruction sets.

Key Differences

Here are some of the major differences between Thumb-16 and Thumb-2:

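Thumb-2 adds 32-bit encodings that mix freely with 16-bit ones, while Thumb-16 is limited to 16-bit encodings (plus the two-halfword BL/BLX).
Both see the same 16 registers, but most Thumb-16 instructions reach only R0-R7, while 32-bit Thumb-2 instructions can use all of R0-R12 and, with restrictions, SP, LR, and PC.
Both support ARM/Thumb interworking on cores that implement the ARM instruction set. Thumb-2 is complete enough that Thumb-only cores such as the Cortex-M3 need no ARM state at all.
Thumb-2 has a much larger instruction set including additional arithmetic, branch, load/store, and coprocessor instructions.
Thumb-2 data-processing instructions can take a separate destination and two sources, with a shifted register or a wide immediate as the second operand. Most Thumb-16 instructions overwrite one of their sources.
Thumb-2 extends memory addressing with shifted register offsets, pre- and post-indexing, and larger immediate offsets. PC-relative loads exist in both but have a longer range in Thumb-2.
Thumb-2 includes hardware divide instructions (SDIV, UDIV) on architecture profiles that implement them, such as ARMv7-M and ARMv7-R. Thumb-16 relies on software routines.
Thumb-2 adds the IT instruction for conditional execution of up to four following instructions. In Thumb-16 only branches are conditional.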
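Conditional execution in Thumb-2 is expressed with the 16-bit IT (If-Then) instruction, whose low byte holds a base condition and a 4-bit mask describing the "then" and "else" slots that follow. The sketch below decodes that byte into the condition applied to each instruction in the block. It is an illustration of the encoding only, and the function name is hypothetical.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL"]


def decode_it(halfword: int) -> list:
    """Return the condition for each instruction covered by an IT instruction."""
    # IT is 0xBFxx with a non-zero mask; a zero mask encodes hints such as NOP.
    if (halfword & 0xFF00) != 0xBF00 or (halfword & 0x000F) == 0:
        raise ValueError("not an IT instruction")
    firstcond = (halfword >> 4) & 0xF
    mask = halfword & 0xF
    conditions = [firstcond]
    for bit in (3, 2, 1):
        # When no lower bit is set, this bit is the terminating 1: stop here.
        if mask & ((1 << bit) - 1) == 0:
            break
        # Otherwise the mask bit supplies bit 0 of the condition for this slot.
        conditions.append((firstcond & 0xE) | ((mask >> bit) & 1))
    if 0xF in conditions:
        raise ValueError("unpredictable IT encoding")
    return [CONDITIONS[c] for c in conditions]
```

For instance, 0xBF0C is ITE EQ and decodes to `["EQ", "NE"]`: the first instruction executes if equal, the second if not equal.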

Performance and Code Density

The Thumb-2 instruction set brings significant performance improvements over Thumb-16. ARM's published figures put Thumb-2 code at roughly 25% faster than Thumb-16 code on the same core, although the actual gain depends heavily on the workload and the compiler.

This performance gain is attributed to:

More registers usable per instruction, reducing register pressure and spills.
Wider range of instructions, requiring fewer instructions to implement functions.
Separate destination and source operands, plus shifted operands, saving register-to-register moves.
Flexible addressing modes that fold address arithmetic into the load or store.
Hardware divide instructions, where implemented, instead of a software routine.
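To give a sense of what a software divide costs, here is a sketch of a classic shift-and-subtract unsigned division, the kind of loop a runtime library routine performs when no divide instruction exists. This is an illustrative version, not the implementation used by any particular toolchain. Returning 0 for a zero divisor mirrors what UDIV does when divide-by-zero trapping is disabled.

```python
def udiv32_software(dividend: int, divisor: int) -> int:
    """Unsigned 32-bit division using one shift, compare, and subtract per bit."""
    dividend &= 0xFFFFFFFF
    divisor &= 0xFFFFFFFF
    if divisor == 0:
        return 0  # same result as UDIV with divide-by-zero trapping disabled
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient
```

The loop body runs 32 times for every call, each iteration costing several instructions, which is why a single hardware divide instruction is a noticeable win in divide-heavy code.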

Thumb-2 code density is comparable to that of Thumb-16. Because every 16-bit Thumb instruction remains available, a compiler uses a 32-bit encoding only where no 16-bit form fits, and a single 32-bit instruction often replaces two or more 16-bit ones. In practice Thumb-2 code is usually about the same size as Thumb-16 code, and according to ARM's figures roughly 26% smaller than equivalent ARM code.

ARM designed Thumb-2 to combine the performance of the ARM instruction set with the code density of Thumb-16, so that one instruction set can serve both goals without the size-versus-speed tradeoff of mixing ARM and Thumb-16 code.

Interworking Functionality

On cores that implement both instruction sets, Thumb-2 code supports ARM/Thumb interworking in the same way as Thumb-16. Thumb-2 code can call ARM subroutines and vice versa, with the state change performed by the branch instruction itself.

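Interworking uses the BX (branch and exchange) instruction, available since ARMv4T, and the BLX (branch with link and exchange) instruction, added in ARMv5T. The register forms select the state from bit 0 of the target address: 1 means Thumb, 0 means ARM. BLX with an immediate target always switches state.

Thumb function addresses therefore carry bit 0 set, and the processor clears that bit before fetching. ARM instructions must be word aligned, while Thumb instructions need only halfword alignment. From ARMv5T onward, loads into the PC such as POP {PC} also interwork.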
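The way a register-based interworking branch (BX, or BLX with a register operand) picks the instruction set can be modeled in a few lines: bit 0 of the register value selects the state and is not part of the fetch address. The sketch below is a simplified model with a hypothetical function name. Raising an error for a misaligned ARM target is a choice made for this example, since the architecture treats that case as unpredictable.

```python
def bx_target(register_value: int) -> tuple:
    """Model the state and fetch address selected by BX/BLX with a register operand."""
    register_value &= 0xFFFFFFFF
    if register_value & 1:
        # Bit 0 set: continue in Thumb state at the address with bit 0 cleared.
        return "Thumb", register_value & 0xFFFFFFFE
    if register_value & 2:
        # Assumption for this sketch: reject what the architecture leaves unpredictable.
        raise ValueError("ARM-state target must be word aligned")
    return "ARM", register_value
```

Calling `bx_target(0x08000101)` yields `("Thumb", 0x08000100)`, which is why function pointers to Thumb code have an odd value.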

Interworking creates flexibility in mixing Thumb-2 and ARM code within an application. Performance-critical parts can use ARM code while the rest uses Thumb-2 for better code density, and Thumb-2 code can call existing ARM libraries. Thumb-only cores such as the Cortex-M family have no ARM state, so all code there must be Thumb.

Instruction Set Highlights

Here are some examples of key instructions and features that Thumb-2 provides over the older Thumb-16 architecture:

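Arithmetic: ADD, SUB, MUL, etc. with a separate destination and two source registers, plus MLA, MLS, and 64-bit products via SMULL and UMULL.
Logical: AND, ORR, EOR, BIC, ORN, etc. with a shifted register or a modified immediate as the second operand.
Shift and rotate: LSL, LSR, ASR, ROR, and RRX on any register, with a separate destination.
Branch: 32-bit B with a 24-bit offset field (about ±16 MB) when unconditional or a 20-bit offset field (about ±1 MB) when conditional, plus the 16-bit CBZ and CBNZ compare-and-branch instructions.
Load/store: LDR, STR with 12-bit immediate offsets, register offsets shifted left by 0-3 bits, and pre-indexed and post-indexed addressing.
Load/store multiple: LDM, STM on high as well as low registers, including the decrement-before forms LDMDB and STMDB.
Hardware divide: SDIV and UDIV signed and unsigned integer divide (present in ARMv7-M and ARMv7-R, optional in ARMv7-A).
Conditional execution: IT instruction to conditionally execute 1-4 following instructions.
Hi register operations: most data-processing instructions accept R8-R12, where Thumb-16 limits high registers to ADD, CMP, MOV, BX, and BLX.
PC-relative addressing: LDR (literal) and ADR with a range of ±4095 bytes, compared to forward-only, word-aligned offsets of up to 1020 bytes in Thumb-16.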
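The branch ranges follow directly from the width of the offset field. A Thumb branch stores a signed offset in halfwords, so an n-bit field reaches from -2^n to 2^n - 2 bytes relative to the PC. The short sketch below works this out for the 16-bit and 32-bit forms of B. The dictionary and function names are only for illustration.

```python
# Offset field widths in bits for the Thumb forms of the B instruction.
THUMB_BRANCH_IMM_BITS = {
    "B<cond> (16-bit)": 8,
    "B (16-bit)": 11,
    "B<cond>.W (32-bit)": 20,
    "B.W (32-bit)": 24,
}


def branch_range(imm_bits: int) -> tuple:
    """Return (lowest, highest) byte offset for a signed halfword offset field."""
    # The field is shifted left by one and sign-extended, giving imm_bits + 1 signed bits.
    return -(1 << imm_bits), (1 << imm_bits) - 2


def branch_ranges() -> dict:
    """Map each B form to its reachable byte offsets."""
    return {name: branch_range(bits) for name, bits in THUMB_BRANCH_IMM_BITS.items()}
```

The 16-bit conditional branch reaches only -256 to +254 bytes, while the 32-bit unconditional form reaches -16777216 to +16777214 bytes, which removes most of the branch islands that long Thumb-16 functions needed.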

Real-World Impact

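The Thumb-2 instruction set has been hugely successful in delivering high performance 32-bit execution on ARM embedded processors while maintaining good code density. It is supported by the ARMv7 Cortex-A, Cortex-R, and Cortex-M processor families. The smallest cores, such as the Cortex-M0 and Cortex-M0+, implement Thumb-16 plus a small subset of the 32-bit Thumb-2 instructions, and AArch64-only cores do not support Thumb at all.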

Thumb-2 is ideal for deeply embedded applications with memory and power constraints. Its variable-length encoding provides a good balance of high code density and high performance.

Thumb-2 is very popular in 32-bit microcontrollers used for IoT edge nodes, sensors, wearables, and other low-power devices. The combination of Thumb-2 execution and power optimization techniques enables months or years of battery life.

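For mobile applications, Thumb-2 was a good fit. The ARMv7-A architecture, which powered most 32-bit smartphones and tablets, supports both the ARM and Thumb-2 instruction sets, and application code there was commonly compiled to Thumb-2 because it fits well in tight mobile memory footprints. Newer devices mostly run AArch64 code, which has no Thumb equivalent.

Thumb-2 also appears in ARM server processors, though not on the main cores. Server applications run in AArch64 state, while the small Cortex-M class management controllers often embedded in large server chips run their control firmware as Thumb-2 code.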

Overall, Thumb-2 has become the instruction set of choice for almost all 32-bit ARM processors. It delivers an optimal blend of high performance, good code density, and power efficiency in a wide range of embedded, mobile, and server deployments.
