Differences between Thumb and Thumb2 instruction sets

The Thumb and Thumb2 instruction sets are both used in ARM processors, but there are some key differences between them. Thumb is a 16-bit instruction set that was introduced in ARMv4T processors as a space-efficient alternative to the 32-bit ARM instruction set. Thumb2, introduced in ARMv6T2 processors, expands on Thumb by adding some 32-bit instructions while retaining 16-bit instructions for improved code density. Here we’ll explore the differences between these two instruction sets in detail.

Code Density

One of the main goals of both Thumb and Thumb2 is to improve code density compared to the 32-bit ARM instruction set. Thumb restricts instructions to 16 bits (BL is the one exception, built from a pair of halfwords), which reduces code size compared to 32-bit ARM instructions. Thumb2 builds on this by allowing a free mix of 16-bit and 32-bit instructions. This provides an advantage over the original Thumb in some situations:

32-bit instructions allow operations the original Thumb can’t express in 16 bits, like wide immediates, three-operand logical operations, and most operations on the high registers (R8 and up).
Important 32-bit instructions can be used without expanding everything to 32-bit.
16-bit encodings remain available for the common cases (low registers, small immediates, short branches), so a 32-bit encoding is only paid for where one is actually needed.

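Overall, Thumb2 code is typically noticeably smaller than equivalent 32-bit ARM code while avoiding limitations of the original Thumb. The 16-bit and 32-bit encodings belong to a single instruction set: the processor works out each instruction’s length from its first halfword, so mixing them involves no state switch and no switching overhead.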
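How a mixed 16-bit/32-bit stream is told apart can be sketched in a few lines. The Python below is an illustrative model rather than real decoder code, and the function names are made up for this example. It relies on the Thumb2 encoding rule that a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction; anything else is a complete 16-bit instruction.

```python
def thumb_insn_length(first_halfword: int) -> int:
    """Return the instruction size in bytes (2 or 4) in a Thumb2 stream."""
    top5 = (first_halfword >> 11) & 0x1F
    # 0b11101, 0b11110 and 0b11111 mark the first half of a 32-bit encoding.
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


def split_instructions(halfwords):
    """Group a list of halfwords into one tuple per instruction."""
    out, i = [], 0
    while i < len(halfwords):
        n = thumb_insn_length(halfwords[i]) // 2
        out.append(tuple(halfwords[i:i + n]))
        i += n
    return out
```

For example, split_instructions([0x4770, 0xF000, 0xF800, 0xBF00]) groups BX LR, a two-halfword BL, and NOP into three instructions without any mode change in between.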

Available Instructions

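The original Thumb instruction set is highly restricted compared to ARM. Most of the basic arithmetic and logical operations are present, but largely in two-operand form, limited to the low registers R0-R7, generally setting the flags, and without ARM’s inline shifted operands. Conditional execution is limited to branches, and instructions such as long multiplies, MRS/MSR, and coprocessor access are missing altogether. Thumb2 massively expands the instruction set available compared to original Thumb:

Adds conditional execution for up to four instructions at a time through the IT instruction.
Supports nearly all ARM data-processing instructions through 32-bit encodings, including shifted-register operands and optional flag setting.
Adds system-level instructions such as MRS/MSR, coprocessor access, and exception return.
Extends load/store multiple to a much wider choice of registers, and provides LDREX/STREX for atomic updates in place of SWP, which has no Thumb encoding.
Adds control-flow instructions such as CBZ/CBNZ and the table branches TBB/TBH.

Thumb2 is designed to remedy most of the deficiencies of original Thumb. Only a handful of ARM instructions have no Thumb2 equivalent, RSC and SWP being examples, and ARM’s per-instruction condition field is replaced by IT blocks. The broad instruction set helps make Thumb2 suitable for many complex applications the original Thumb couldn’t easily handle.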
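To make the IT mechanism concrete, here is a small sketch of how an IT instruction is encoded. It assumes the standard 16-bit layout 0xBF00 | firstcond << 4 | mask; the helper name encode_it and the condition table are just for illustration. Each following instruction contributes one mask bit (the low bit of the condition for "then", its inverse for "else"), followed by a terminating 1.

```python
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", "al"]


def encode_it(pattern: str, cond: str) -> int:
    """Encode IT{x{y{z}}} <cond>.

    pattern holds the optional T/E suffix letters: "" for IT, "E" for ITE,
    "TT" for ITTT, and so on (at most three letters).
    """
    pattern = pattern.upper()
    if len(pattern) > 3 or any(c not in "TE" for c in pattern):
        raise ValueError("pattern must be up to three of T/E")
    firstcond = CONDITIONS.index(cond.lower())   # ValueError if unknown
    low = firstcond & 1
    # One bit per extra instruction: same low bit for T, inverted for E.
    bits = [low if c == "T" else low ^ 1 for c in pattern]
    bits.append(1)                               # terminator bit
    bits += [0] * (4 - len(bits))                # pad mask to 4 bits
    mask = int("".join(str(b) for b in bits), 2)
    return 0xBF00 | (firstcond << 4) | mask
```

With this model, ITE EQ comes out as 0xBF0C and ITT NE as 0xBF1C. The sketch does not reject combinations the architecture treats as unpredictable, such as an "else" slot paired with the AL condition.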

Addressing Modes

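One major disadvantage of original Thumb is its limited addressing mode support due to the 16-bit restriction. Loads and stores can use a base register plus a register offset or a 5-bit scaled immediate, and word accesses can also be SP-relative or (for loads) PC-relative with an 8-bit scaled offset. There are no negative offsets and no writeback outside PUSH/POP and LDMIA/STMIA. Thumb2 adds some key addressing abilities:

Longer-range PC-relative addressing (up to ±4095 bytes for literal loads): useful for position-independent code.
SP-relative addressing for all access sizes, not just words.
Larger immediate offsets: 12-bit positive or 8-bit negative.
Index addressing modes like pre-indexed and post-indexed, with base register writeback.
Register offsets shifted left by 0 to 3 bits, for scaled array indexing.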

These additions allow Thumb2 code to efficiently perform tasks like table lookup, complex stack manipulation, and accessing data objects. The broader addressing expands Thumb’s usefulness for applications like compilers, interpreters, and OS kernels.
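The effect of the wider offsets can be illustrated with a rough model of how an assembler might choose an encoding for a word load with an immediate offset. This is a sketch under simplifying assumptions: the function name ldr_word_encoding is hypothetical, only the no-writeback form is modelled, and per-register restrictions on the destination in the 32-bit encodings are ignored.

```python
def ldr_word_encoding(rt: int, rn: int, offset: int):
    """Pick an encoding width for LDR Rt, [Rn, #offset] (no writeback).

    Returns 16 or 32 (bits), or None if no single instruction fits.
    Registers 0-7 are the low registers and 13 is SP. PC-relative
    (literal) loads use a separate encoding and are not modelled.
    """
    if rn == 15:
        raise ValueError("literal loads are not modelled here")

    def low(reg):
        return 0 <= reg <= 7

    if low(rt) and offset % 4 == 0:
        if low(rn) and 0 <= offset <= 124:       # 5-bit offset scaled by 4
            return 16
        if rn == 13 and 0 <= offset <= 1020:     # 8-bit offset, SP base
            return 16
    if 0 <= offset <= 4095:                      # 32-bit, 12-bit offset
        return 32
    if -255 <= offset < 0:                       # 32-bit, negative 8-bit
        return 32
    return None
```

Every case that returns 32 here would need an extra instruction or two in original Thumb, typically to compute the address into a scratch register first.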

Branches

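Original Thumb also suffers from short-range branch instructions. Conditional branches reach only about ±256 bytes and unconditional branches about ±2 KB; BL reaches about ±4 MB but is built from two 16-bit halves, and BX branches to an address held in a register. Longer conditional jumps need a branch-around sequence. Thumb2 adds:

CBZ/CBNZ for compare and branch (forward only) on zero/nonzero.
IT blocks for If-Then style conditional execution.
32-bit B.cond encodings with roughly ±1 MB range, and a 32-bit unconditional B with roughly ±16 MB range.
BL and BLX as single 32-bit instructions, with range extended to roughly ±16 MB.
TBB/TBH table branches for compact switch-style dispatch.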

Together these new conditional and unconditional branches allow Thumb2 to implement efficient decision-making and complex call graphs. The branches help make Thumb2 suitable for larger programs the original Thumb couldn’t handle.
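The branch ranges translate directly into an encoding choice. The sketch below shows how a tool might pick the smallest B encoding for a given displacement; the function name branch_encoding is hypothetical, and the ranges are the signed, halfword-aligned offsets of the 16-bit and 32-bit B encodings.

```python
def branch_encoding(offset: int, conditional: bool):
    """Return the smallest B encoding width in bits, or None if out of range.

    offset is the byte distance from the branch's PC value (its own
    address + 4) to the target.
    """
    if offset % 2:
        raise ValueError("Thumb branch targets are halfword aligned")
    if conditional:
        # 16-bit B<cond>: 8-bit field; 32-bit B<cond>.W: 20-bit field.
        ranges = [(16, -256, 254), (32, -1048576, 1048574)]
    else:
        # 16-bit B: 11-bit field; 32-bit B.W: 24-bit field.
        ranges = [(16, -2048, 2046), (32, -16777216, 16777214)]
    for width, lo, hi in ranges:
        if lo <= offset <= hi:
            return width
    return None
```

In original Thumb only the 16-bit rows exist, so a conditional branch of 300 bytes has to be rewritten as an inverted short branch over an unconditional one.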

Load/Store Support

Thumb also has fairly basic load/store support. It can access bytes, halfwords, and words, but mostly through the low registers and with a limited set of addressing modes. Thumb2 improves this by:

Adding doubleword accesses (LDRD/STRD) and more flexible load/store multiple.
Supporting larger immediate offsets.
Adding pre-indexed, post-indexed, and shifted register-offset addressing.
Adding exclusive accesses (LDREX/STREX); acquire/release variants (LDA/STL) only arrived later, with ARMv8.

These additions allow Thumb2 to efficiently access objects and arrays in memory. The exclusive instructions are useful for synchronization primitives when writing OS kernels and device drivers.

Interworking

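Switching between ARM and Thumb state is done with BX, and from ARMv5T onward also with BLX, with bit 0 of the target address selecting the state. On ARMv4T, which lacks BLX, calls between the two states may go through small veneers inserted by the linker, which cost extra instructions. Thumb2 does not change this mechanism; what it changes is how often it is needed. Because Thumb2 can express almost everything ARM can, whole programs can stay in Thumb state, and cores such as the Cortex-M series implement no ARM state at all. Where both states are present, BLX still allows near-seamless transitions using simple procedure calls.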
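The state selection performed by an interworking branch can be modelled in a couple of lines. This is only a sketch of the rule that bit 0 of the branch target picks the instruction set; bx_target is an invented name, and the error case stands in for what the architecture treats as unpredictable.

```python
def bx_target(value: int):
    """Model BX/BLX <Rm>: return (state, address) for a register value."""
    if value & 1:
        # Bit 0 set: continue in Thumb state at the halfword-aligned address.
        return ("thumb", value & ~1 & 0xFFFFFFFF)
    if value & 2:
        raise ValueError("ARM-state target must be word aligned")
    # Bit 0 clear: continue in ARM state.
    return ("arm", value)
```

This is why function pointers to Thumb code carry an odd address: bx_target(0x08000101) yields Thumb state at 0x08000100.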

Position Independence

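Position-independent code avoids hard-coding addresses, allowing it to execute properly regardless of where it is loaded in memory. This is useful for things like shared libraries, JITs, and OS kernels. Original Thumb can already form PC-relative addresses through literal loads and ADD Rd, PC, #imm, but only forward and within about 1 KB. The longer-range, bidirectional PC-relative addressing in Thumb2 makes position-independent code considerably cheaper to generate than in original Thumb.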
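Why PC-relative addressing gives position independence can be shown with a small calculation. The sketch below models the address read by a Thumb literal load, assuming the usual rule that PC reads as the instruction address plus 4 and is word-aligned before the offset is added. The function name and the example load addresses are illustrative only.

```python
def literal_address(insn_addr: int, imm: int) -> int:
    """Address read by a Thumb LDR Rt, [PC, #imm] located at insn_addr."""
    pc = insn_addr + 4        # PC reads as the instruction address + 4
    base = pc & ~3            # Align(PC, 4)
    return base + imm


def offset_within_image(load_base: int) -> int:
    """Offset of the literal from the image start, for one fixed instruction."""
    insn = load_base + 0x102  # same instruction, wherever the image is loaded
    return literal_address(insn, 8) - load_base
```

Loading the same image at 0x00008000 or at 0x20000000 gives the same offset_within_image result, so the instruction needs no fix-up when the code moves.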

Performance

Thumb’s 16-bit encoding does yield some performance advantages. Denser code improves I-cache utilization and halves instruction-fetch bandwidth, which matters most on systems with 16-bit or slow memory; small loops in particular benefit. However, the limited instructions and branches mean more instructions are needed for the same work, which hampers outright performance in many cases.

Thumb2 mitigates these issues by adding efficient 32-bit instructions and improved branching. Thumb2 is generally reported to perform similarly to ARM code in many situations while retaining most of the density benefits of Thumb. For compute-intensive code, Thumb2 delivers performance much closer to ARM than original Thumb.

Code Generation Complexity

Thumb’s 16-bit format imposes complexity challenges for compilers generating Thumb code:

Limited instructions mean some operations must be synthesized from several instructions.
Workarounds are needed for larger immediates and long or complex branches.
Calling ARM code incurs interworking overhead.
Mode limitations require dividing code into ARM and Thumb sections.

Thumb2’s more ARM-like instructions and branching relax these burdens. Compilers can produce efficient Thumb2 in more situations with less target-specific optimization. Easier interworking also simplifies compilers.
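The immediate-handling point can be made concrete. Thumb2’s 32-bit data-processing instructions take a 12-bit "modified immediate" that expands to a 32-bit constant, and a compiler has to decide whether a given constant fits. The sketch below models that expansion under the usual rules (a byte, a byte repeated in fixed patterns, or an 8-bit value with its top bit set rotated right by 8 to 31 places). The function names are my own, and the brute-force search is chosen for clarity rather than speed.

```python
def thumb_expand_imm(imm12: int) -> int:
    """Expand a 12-bit Thumb2 modified immediate to its 32-bit value."""
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("imm12 must fit in 12 bits")
    imm8 = imm12 & 0xFF
    if imm12 >> 10 == 0:
        mode = (imm12 >> 8) & 0b11
        if mode == 0:
            return imm8                          # 0x000000XY
        if imm8 == 0:
            raise ValueError("unpredictable encoding")
        if mode == 1:
            return (imm8 << 16) | imm8           # 0x00XY00XY
        if mode == 2:
            return (imm8 << 24) | (imm8 << 8)    # 0xXY00XY00
        return imm8 * 0x01010101                 # 0xXYXYXYXY
    rotation = imm12 >> 7                        # 8..31
    unrotated = 0x80 | (imm12 & 0x7F)            # top bit is implied
    rotated = (unrotated >> rotation) | (unrotated << (32 - rotation))
    return rotated & 0xFFFFFFFF


def is_modified_immediate(value: int) -> bool:
    """True if value can be written as a single modified immediate."""
    for imm12 in range(0x1000):
        try:
            if thumb_expand_imm(imm12) == value:
                return True
        except ValueError:
            continue                             # skip unpredictable patterns
    return False
```

Constants that fail this check, such as 0x12345678, are typically built with a MOVW/MOVT pair or loaded from a literal pool instead.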

Use Cases

Here are some typical use cases where Thumb and Thumb2 excel:

Thumb’s simplicity works well for some embedded systems with simple processing and memory requirements.
Thumb2 suits more advanced embedded applications like networking equipment and set-top boxes.
Thumb2 is very attractive for JIT and dynamic code generation scenarios.
Thumb2 is suitable for OS kernel and driver development with efficient instructions.
Code density benefits help conserve I-cache and memory in tightly constrained embedded systems.

In general, Thumb2 delivers compelling performance and size while avoiding many shortcomings of original Thumb. Thumb2 is suitable for a much wider range of applications than Thumb.

Summary of Differences

In summary, here are the key differences between the Thumb and Thumb2 instruction sets:

Thumb2 mixes 16-bit and 32-bit instructions vs Thumb’s 16-bit only.
Thumb2 supports far more ARM instructions and features.
Thumb2 adds useful addressing modes and control flow.
The need for ARM/Thumb interworking is reduced, since code rarely has to leave Thumb state.
Thumb2 makes position-independent code cheaper through longer-range PC-relative addressing.
Performance is much closer to ARM than with original Thumb.
Thumb2 reduces compiler code generation complexity.

Overall Thumb2 delivers ARM-like performance in many cases while retaining Thumb’s density benefits. Thumb2 is suitable for a much broader range of use cases than the more limited original Thumb instruction set.
