Disassembly is the process of converting machine code back into assembly code. It lets you see the low-level instructions that make up a program, which is useful for reverse engineering, malware analysis, and debugging. The ARM architecture is widely used in smartphones, IoT devices, and other embedded systems. To disassemble ARM code, you need a disassembler that understands ARM instruction encodings, such as the objdump utility shipped with an ARM toolchain.
Obtain an ARM Toolchain
There are a few options for getting an ARM toolchain on your computer:
- Install the GNU ARM Embedded Toolchain – This is a ready-made toolchain distributed by Arm (now published as the Arm GNU Toolchain) and includes the GCC compiler, the binary utilities (binutils), and other tools needed for ARM development.
- Build the GNU ARM Toolchain from source – You can download and compile the latest versions of GCC, binutils, and newlib to build your own toolchain.
- Use a commercially licensed ARM Compiler – Toolchains like ARM Compiler and Keil MDK-ARM include advanced optimizations and analysis tools.
For disassembly purposes, the free GNU ARM Embedded Toolchain is likely sufficient. Download the installer or archive for your operating system from the Arm Developer website and run through the installation steps. Be sure to install the toolchain in a location where you have write permissions.
Disassemble with objdump
The objdump utility included with the ARM toolchain can disassemble both object files and executable binaries. The most straightforward approach is to pass objdump the -d flag, which prints the assembly for each instruction alongside its address and raw encoding:

    arm-none-eabi-objdump -d program.elf > program.disasm

The output uses standard ARM assembly syntax (the AT&T versus Intel syntax choice applies only to x86 targets), and each line includes the bytes that make up the instruction. You can also restrict objdump to a specific section of an executable:

    arm-none-eabi-objdump -j .text -d program.elf

This disassembles just the .text section, which contains the executable code. The listing shows addresses in the left column so you can correlate each instruction with its location in the binary.
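Once a listing is saved to a file, it is often handy to load it into a script. The following is a minimal Python sketch that splits an instruction line into address, encoding, mnemonic, and operands. It assumes 32-bit ARM-state encodings (eight hex digits) and the usual tab-separated objdump layout; Thumb code and other layouts would need a different pattern. The names Insn and parse_line are made up for this example.

```python
import re
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    address: int
    encoding: int
    mnemonic: str
    operands: str


# Assumed layout: "<addr>:<ws><8 hex digits><ws><mnemonic><ws><operands>"
_LINE = re.compile(r"^\s*([0-9a-fA-F]+):\s+([0-9a-fA-F]{8})\s+(\S+)\s*(.*)$")


def parse_line(line: str) -> Optional[Insn]:
    """Parse one instruction line; return None for labels, blanks, headers."""
    m = _LINE.match(line)
    if m is None:
        return None
    addr, enc, mnemonic, operands = m.groups()
    return Insn(int(addr, 16), int(enc, 16), mnemonic, operands.strip())


if __name__ == "__main__":
    print(parse_line("  4002dc:\te3a03000 \tmov\tr3, #0"))
```

Function labels such as "004002dc <main>:" do not match the pattern and are skipped, so the same function can be applied to every line of a listing.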
Explore Compiler Optimizations
One reason to disassemble code is to understand the effects of compiler optimizations. For example, loop unrolling reduces loop overhead (the counter update, compare, and branch) by duplicating the loop body so that each trip through the loop does the work of several iterations. We can see this in action by compiling a simple loop in C:

    int sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += i;
    }
With no optimizations, the loop compiles to something like the following. Treat this listing and the next as simplified illustrations rather than verbatim tool output; real addresses, encodings, and register choices depend on the compiler version and flags:

    4002dc: e3a03000 mov r3, #0
    4002e0: e5843000 str r3, [r4]
    4002e4: e3a03000 mov r3, #0
    4002e8: 0a000003 beq 400300
    4002ec: e5933000 ldr r3, [r3]
    4002f0: e2833001 add r3, r3, #1
    4002f4: e5843000 str r3, [r4]
    4002f8: eafffff8 b 4002e4

With the loop unrolled by a factor of 2, the loop body is duplicated:

    4002dc: e3a03000 mov r3, #0
    4002e0: e5843000 str r3, [r4]
    4002e4: e3a03000 mov r3, #0
    4002e8: e5933000 ldr r3, [r3]
    4002ec: e2833001 add r3, r3, #1
    4002f0: e5843000 str r3, [r4]
    4002f4: e3a03000 mov r3, #0
    4002f8: e5933000 ldr r3, [r3]
    4002fc: e2833001 add r3, r3, #1
Disassembly provides visibility into precisely how optimizations modify the generated instructions.
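It can help to see the unrolling transformation at source level before hunting for it in a listing. The sketch below models it in Python rather than C, purely as an illustration: sum_rolled is the original loop, and sum_unrolled_by_2 performs two iterations per trip plus a remainder step for odd counts. Both function names are invented for this example, and a real compiler is free to unroll differently or to fold a constant loop like this away entirely.

```python
def sum_rolled(n: int) -> int:
    """Original loop: one add and one loop check per iteration."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_unrolled_by_2(n: int) -> int:
    """Same computation with the body duplicated (unroll factor 2)."""
    total = 0
    i = 0
    # Main loop: each trip does the work of two original iterations,
    # so the loop check runs roughly half as often.
    while i + 1 < n:
        total += i
        total += i + 1
        i += 2
    # Remainder: handles the last iteration when n is odd.
    if i < n:
        total += i
    return total


if __name__ == "__main__":
    print(sum_rolled(8), sum_unrolled_by_2(8))
```

In a disassembly, the same pattern shows up as a repeated group of instructions followed by a single branch back to the top of the loop.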
Analyze Memory Usage
objdump can also be used to analyze an executable’s memory layout. The -h option prints the section headers, including the name, size, and load addresses (VMA and LMA) of each section:

    arm-none-eabi-objdump -h program.elf

This shows how data and code are organized into sections like .text, .data, and .bss. You can also dump the contents of a section to inspect the raw bytes stored in it:

    # Dump text section
    arm-none-eabi-objdump -s -j .text program.elf
    # Dump data section
    arm-none-eabi-objdump -s -j .data program.elf
Analyzing the memory layout and contents provides insight into how the compiled program stores and accesses data.
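The section header table can be turned into a rough flash and RAM budget. The Python sketch below parses text in the layout that objdump -h prints and sums section sizes. The sample text is invented for illustration, and the split into flash (.text, .rodata, .data) and RAM (.data, .bss) assumes a typical microcontroller link layout; adjust the section names for your own linker script.

```python
import re

# Matches rows such as: "  0 .text  00000120  08000000  08000000  00010000  2**2"
_SECTION = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)"
    r"\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+2\*\*(\d+)\s*$"
)

# Hypothetical sample in the layout printed by objdump -h.
SAMPLE = """\
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000120  08000000  08000000  00010000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00000010  20000000  08000120  00020000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000040  20000010  08000130  00020010  2**2
                  ALLOC
"""


def parse_sections(text):
    """Return a list of dicts with name, size, vma, and lma per section."""
    sections = []
    for line in text.splitlines():
        m = _SECTION.match(line)
        if m:
            _idx, name, size, vma, lma, _off, _align = m.groups()
            sections.append(
                {"name": name, "size": int(size, 16),
                 "vma": int(vma, 16), "lma": int(lma, 16)}
            )
    return sections


def memory_summary(sections):
    """Assumed layout: .data is stored in flash and copied to RAM at startup."""
    size = {s["name"]: s["size"] for s in sections}
    flash = sum(size.get(n, 0) for n in (".text", ".rodata", ".data"))
    ram = sum(size.get(n, 0) for n in (".data", ".bss"))
    return {"flash": flash, "ram": ram}


if __name__ == "__main__":
    print(memory_summary(parse_sections(SAMPLE)))
```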
Generate Assembly Listings
Disassembly works on compiled binaries, but you can also generate assembly listings directly from source code using the compiler. Passing the -S flag to arm-none-eabi-gcc makes it stop after compilation and emit assembly rather than object code:

    arm-none-eabi-gcc -S main.c -o main.asm

The assembly listing shows which instructions the compiler generates for each function. Plain -S output contains no source text, but adding -fverbose-asm annotates the listing with comments, which in recent GCC versions include the original C source lines for each block of assembly. Assembly listings are useful for learning assembly and for writing optimized code.
Use Features of GNU Binutils
The binary utilities collection (binutils) includes tools like objdump and readelf that are useful for analyzing and manipulating code. Some other useful features include:
- Symbols – objdump and nm can dump the symbol table showing function names and addresses.
- Headers – readelf displays details of the ELF header, program headers, and section headers.
- Strings – strings extracts human readable strings from a binary.
- Size – size lists section sizes and total program size.
Make use of binutils when you need to explore and reason about code at the binary level. The tools give visibility into compiled programs and are indispensable when working close to the metal.
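To make the Strings item above concrete, here is a small Python sketch of the idea behind the strings tool: scan the raw bytes for runs of printable ASCII of at least a minimum length. It is a simplified approximation, not a reimplementation; the real tool has more options and slightly different rules about which characters count as printable. The function name extract_strings is made up for this example.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII (0x20..0x7e) at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    blob = b"\x00\x01boot ok\x00ab\x00v1.2.3\xff"
    print(extract_strings(blob))
```

To try it on a real file, read the binary with open(path, "rb").read() and pass the bytes in.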
Set Up a Disassembly Environment
Doing routine disassembly is easier if you create a reusable environment and scripts. Here are some tips:
- Consider Radare2 – This open source tool provides a featureful disassembler and binary analysis framework.
- Use batch scripts – Write scripts to automate running tools like objdump with consistent settings.
- Control output – Redirect disassembly listings into files organized by project.
- Compare tools – Try different disassemblers and compare their output and features.
- Install on multiple PCs – Having disassembly tools handy on development machines will encourage frequent use.
Setting up a clean, automated disassembly environment saves time and encourages getting into the habit of peering into compiled code.
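As one way to combine the scripting and output-control tips above, the Python sketch below wraps objdump with fixed flags and writes each listing into a per-project folder. The directory layout, the .disasm suffix, and the function names plan and run are all assumptions chosen for this example; only the objdump flags come from the commands shown earlier.

```python
import shutil
import subprocess
from pathlib import Path

OBJDUMP = "arm-none-eabi-objdump"
DEFAULT_FLAGS = ["-d", "-j", ".text"]  # same flags every run, for comparable output


def plan(elf_path, project, out_root="disasm"):
    """Return (command, output_path) without touching the filesystem."""
    elf = Path(elf_path)
    out = Path(out_root) / project / (elf.stem + ".disasm")
    cmd = [OBJDUMP, *DEFAULT_FLAGS, str(elf)]
    return cmd, out


def run(elf_path, project, out_root="disasm"):
    """Disassemble elf_path into out_root/project/<name>.disasm."""
    cmd, out = plan(elf_path, project, out_root)
    if shutil.which(OBJDUMP) is None:
        raise FileNotFoundError(f"{OBJDUMP} not found on PATH")
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w") as fh:
        subprocess.run(cmd, stdout=fh, check=True)
    return out


if __name__ == "__main__":
    command, target = plan("build/program.elf", "blinky")
    print(" ".join(command), ">", target)
```

Keeping plan separate from run makes it easy to print or log the exact command before anything is executed.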
Correlate Source Code
A key challenge with analyzing disassembly is correlating instructions back to the original source code. Here are some strategies that help:
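- Add comments – Comments are discarded by the compiler, so keep your notes in the source and annotate the disassembly listing itself; building with debug info (-g) lets tools show the matching source lines.
- Print markers – Add print statements with easily identifiable strings; the string literals end up in the binary and point to the surrounding code.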
- Structure code – Organize source code files, functions, and objects in a recognizable way.
- Use tools – Some IDEs can map from disassembly back to source lines, and objdump -S interleaves source with disassembly when the binary was built with -g.
- Know the patterns – Become familiar with code patterns that compilers generate.
Start by disassembling small programs you have the source for. Seeing how structured C code gets compiled down into assembly will help you start recognizing code patterns in large programs.
Practice on Open Source Code
One of the best ways to become proficient at disassembly is by practicing on real world open source programs. Some things to try:
- Bootloaders – Simple bootloaders for ARM processors have straightforward code to dissect.
- OS kernels – Try disassembling the core initialization of RTOSes like FreeRTOS.
- Libraries – Pick a small library like newlib and explore the disassembly.
- Utilities – Disassemble command line tools to see their inner workings.
Seeing how different programmers structure real world code examples will quickly improve your ability to analyze disassembly listings. Don’t be afraid to also dive into the huge world of PC reverse engineering projects and x86_64 disassembly examples to gain broader experience.
Learn Assembly
The best way to understand disassembly is to have a solid grasp of assembly programming concepts. Some areas to focus on include:
- Registers – Know the core registers and calling conventions used by the architecture.
- Instruction set – Learn what the primary instructions do and their variations.
- Data access – Understand memory addressing modes and data access patterns.
- Control flow – Recognize branching instructions, loops, and function calls.
- Subroutines – See how code is structured into routines and called.
You don’t need to become an expert assembly programmer, but knowing the basics will help you recognize patterns quickly. The more exposure you have writing raw ARM assembly code, the faster you will become at reverse engineering it.
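One way to get comfortable with the instruction set is to decode a few encodings by hand. The Python sketch below picks apart the fields of a 32-bit ARM-state data-processing instruction with an immediate operand: condition, opcode, destination and source registers, and the 8-bit immediate rotated right by twice the 4-bit rotate field. It is a teaching sketch that covers only this one encoding class and does not separate out special cases that share the same bit pattern; the function names are made up for this example.

```python
OPCODES = [
    "and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
    "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn",
]


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_dp_immediate(word: int) -> dict:
    """Decode an A32 data-processing (immediate) instruction word."""
    if (word >> 25) & 0b111 != 0b001:
        raise ValueError("not a data-processing immediate encoding")
    rotate = (word >> 8) & 0xF
    imm8 = word & 0xFF
    return {
        "cond": (word >> 28) & 0xF,          # 0xE means "always"
        "mnemonic": OPCODES[(word >> 21) & 0xF],
        "sets_flags": bool((word >> 20) & 1),
        "rn": (word >> 16) & 0xF,            # first operand register
        "rd": (word >> 12) & 0xF,            # destination register
        "imm": ror32(imm8, 2 * rotate),      # expanded immediate value
    }


if __name__ == "__main__":
    for w in (0xE3A03000, 0xE2833001):
        print(hex(w), decode_dp_immediate(w))
```

The two words in the usage example are the mov r3, #0 and add r3, r3, #1 encodings that appear in the listings earlier in this article.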
Use a Debugger
A debugger like GDB allows stepping through disassembled code at runtime. This helps confirm assumptions made during static analysis. Some ways a debugger augments disassembly work:
- Control flow – Trace path of execution to validate branch targets.
- Data values – Inspect data in registers and memory when code references it.
- Reversing – Debugging provides clues that help when reversing complex algorithms.
- Validation – Check that code matches your annotations by walking it.
Stepping through disassembled code executed on real hardware uncovers runtime code behaviors difficult to see statically. Use a debugger to confirm theories and gain additional context.
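When validating branch targets, it helps to know how the target is derived so you can check it against where the debugger actually lands. The Python sketch below computes the target of a 32-bit ARM-state B or BL instruction from its address and encoding: the 24-bit offset is sign-extended, multiplied by 4, and added to the instruction address plus 8. It handles only this one encoding and assumes the condition field is not 0b1111; branch_target is a name made up for this example.

```python
def branch_target(address: int, word: int) -> int:
    """Return the target of an A32 B/BL instruction located at address."""
    if (word >> 25) & 0b111 != 0b101:
        raise ValueError("not a B/BL encoding")
    imm24 = word & 0x00FFFFFF
    # Sign-extend the 24-bit field.
    if imm24 & 0x800000:
        imm24 -= 1 << 24
    # Offsets count 4-byte words; in ARM state the PC reads as address + 8.
    return (address + 8 + (imm24 << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(branch_target(0x8000138, 0xEB000002)))  # forward bl
    print(hex(branch_target(0x8000150, 0xEAFFFFFB)))  # backward b
```

If the computed address disagrees with the target printed in a listing, or with where a single step takes you, that is a sign the code is not being interpreted in the instruction set state you assumed.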
Perform Dynamic Analysis
Dynamic program analysis examines code as it executes. This provides additional context beyond static disassembly. Useful techniques include:
- Profiling – Record frequency of instructions executed to identify hot spots.
- Tracing – Logging program execution can expose high level code paths.
- Instrumentation – Inserting debug/log statements at interesting points in the code.
- Fuzzing – Feeding randomized or mutated inputs, ideally from a reproducible seed, to trigger corner cases.
Dynamic techniques help validate assumptions made during static analysis. They also explore how the program reacts to real world input data. Combine disassembly with dynamic techniques to better understand program behavior.
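As a sketch of the profiling idea above, suppose you have a list of sampled program counter values (from an emulator log or a trace probe, for instance) and a symbol table of function start addresses and sizes. The Python code below attributes each sample to a function and ranks functions by hit count. The trace source, the symbol tuples, and the name hot_functions are all hypothetical; how you capture the samples depends entirely on your setup.

```python
import bisect
from collections import Counter


def hot_functions(pcs, symbols):
    """Rank functions by number of PC samples that fall inside them.

    pcs:     iterable of sampled program counter values
    symbols: iterable of (start_address, size, name) tuples
    """
    table = sorted(symbols)
    starts = [start for start, _size, _name in table]
    counts = Counter()
    for pc in pcs:
        # Find the last symbol whose start address is <= pc.
        i = bisect.bisect_right(starts, pc) - 1
        if i >= 0:
            start, size, name = table[i]
            if pc < start + size:
                counts[name] += 1
                continue
        counts["<unknown>"] += 1
    return counts.most_common()


if __name__ == "__main__":
    syms = [(0x8000134, 0x14, "main"), (0x8000158, 0x4, "delay")]
    trace = [0x8000158] * 5 + [0x8000134, 0x8000138] + [0x9000000]
    print(hot_functions(trace, syms))
```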
Document Findings
It helps to record discoveries made during disassembly for future reference. Useful documentation includes:
- Comments – Annotate disassembly listings to capture insights.
- Control flow – Diagram program control flow and call graphs.
- Data maps – Document memory regions and discovered data structures.
- Algorithms – Pseudocode interesting functions and algorithms.
Disassembly is an iterative process of forming and validating theories about code. Thorough documentation accelerates future reverse engineering efforts and turns individual discoveries into knowledge a whole team can reuse.
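Call graphs in particular can be drafted automatically and then refined by hand. The Python sketch below walks an objdump-style listing, tracks the current function from label lines such as "08000134 <main>:", and records each direct bl or blx call whose target is shown as a symbol in angle brackets. The sample listing is invented for illustration, and the sketch ignores indirect calls, tail calls made with plain branches, and conditional variants of bl.

```python
import re

_FUNC = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")
_CALL = re.compile(r"\sblx?\s+[0-9a-fA-F]+\s+<([^>+]+)")

# Hypothetical listing in objdump -d layout.
SAMPLE_LISTING = (
    "08000134 <main>:\n"
    " 8000134:\te92d4800 \tpush\t{fp, lr}\n"
    " 8000138:\teb000002 \tbl\t8000148 <init_uart>\n"
    " 800013c:\teb000005 \tbl\t8000158 <delay>\n"
    " 8000140:\te8bd8800 \tpop\t{fp, pc}\n"
    "\n"
    "08000148 <init_uart>:\n"
    " 8000148:\te92d4800 \tpush\t{fp, lr}\n"
    " 800014c:\teb000001 \tbl\t8000158 <delay>\n"
    " 8000150:\te8bd8800 \tpop\t{fp, pc}\n"
    "\n"
    "08000158 <delay>:\n"
    " 8000158:\te12fff1e \tbx\tlr\n"
)


def build_call_graph(listing: str) -> dict:
    """Map each function name to the list of functions it calls directly."""
    graph = {}
    current = None
    for line in listing.splitlines():
        func = _FUNC.match(line)
        if func:
            current = func.group(1)
            graph.setdefault(current, [])
            continue
        call = _CALL.search(line)
        if call and current is not None:
            callee = call.group(1)
            if callee not in graph[current]:
                graph[current].append(callee)
    return graph


if __name__ == "__main__":
    for caller, callees in build_call_graph(SAMPLE_LISTING).items():
        print(caller, "->", ", ".join(callees) or "(leaf)")
```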
Learn from Others’ Work
There is a wealth of information already documented from skilled reverse engineers disassembling ARM programs:
- Books – Grab a copy of Practical Reverse Engineering or The Art of ARM Disassembly.
- Blogs – Numerous blogs document ARM reversing like www.mobileforensics.com.
- Forums – Forums like RE Stack Exchange have many ARM disassembly questions.
- Talks – Watch conference talks on YouTube for insights.
Studying material from experienced reversers provides new insights and best practices. Don’t reinvent the wheel when absorbing others’ work can shorten the learning curve.
Use Helper Tools
There are many tools that assist with disassembly and analysis:
- Decompilers – Generate pseudo-C code from instructions.
- Cross-references – Tools like IDA Pro automatically generate code x-refs.
- Diffing – Compare binary diffs to quickly spot code changes.
- Search – Grepping through disassembly listings helps find patterns.
- Beautify – Formatting tools clean up raw disassembly dumps.
Assemble a toolkit of helper apps that accelerate common tasks like function discovery and control flow mapping. Leverage tools so you can focus time on actual reverse engineering.
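As an example of the diffing idea, the Python sketch below compares two disassembly listings after stripping the address and encoding columns, so code that merely moved does not show up as changed. It uses the standard difflib module and assumes 32-bit ARM-state lines in objdump layout; the function names are made up for this example. Absolute branch targets in the operand text are left untouched, so shifted code containing branches will still produce some noise.

```python
import difflib
import re

# Leading "<addr>:<ws><8 hex digits><ws>" columns of an instruction line.
_PREFIX = re.compile(r"^\s*[0-9a-fA-F]+:\s+[0-9a-fA-F]{8}\s+")


def normalize(listing: str) -> list:
    """Keep only instruction lines, reduced to 'mnemonic operands'."""
    result = []
    for line in listing.splitlines():
        stripped = _PREFIX.sub("", line)
        if stripped != line:  # the prefix matched, so this is an instruction
            result.append(" ".join(stripped.split()))
    return result


def diff_listings(old: str, new: str) -> list:
    """Return unified-diff lines between two normalized listings."""
    return list(
        difflib.unified_diff(
            normalize(old), normalize(new), "old", "new", lineterm=""
        )
    )


if __name__ == "__main__":
    old = " 4002dc:\te3a03000 \tmov\tr3, #0\n 4002e0:\te2833001 \tadd\tr3, r3, #1\n"
    new = " 500000:\te3a03000 \tmov\tr3, #0\n 500004:\te2833002 \tadd\tr3, r3, #2\n"
    print("\n".join(diff_listings(old, new)))
```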
Identify Key Functions
When disassembling a large program, focus energy on identifying and understanding the key functions involved in the core logic. Strategies to find important functions:
- Entry points – The main() function or the boot entry point are good starting points.
- Strings – Names and references reveal logical functions.
- Size – Large functions tend to be significant.
- Frequency – Code executed often points to critical paths.
- Experiment – Prodding at runtime guides code discovery.
Reverse engineering big programs can feel daunting. Start by carefully discovering and documenting the functionality concentrated in key functions.
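The size heuristic above is easy to automate when the binary still has symbols. The Python sketch below takes text in the "address size type name" layout that nm prints when sizes are requested and ranks code symbols (types T and t) by size. The sample text is invented, and rank_functions is a name made up for this example; stripped binaries will need a disassembler's function detection instead.

```python
# Hypothetical nm output with a size column: address, size, type, name.
SAMPLE_NM = """\
08000134 00000014 T main
08000148 00000010 T init_uart
08000158 00000004 t delay
20000000 00000010 D config
0800015c 000000c0 T parse_packet
"""


def rank_functions(nm_output: str, top: int = 5) -> list:
    """Return up to top (name, size) pairs for code symbols, largest first."""
    funcs = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Lines without a size column have only three fields and are skipped.
        if len(parts) == 4 and parts[2] in ("T", "t"):
            _addr, size, _kind, name = parts
            funcs.append((int(size, 16), name))
    funcs.sort(reverse=True)
    return [(name, size) for size, name in funcs[:top]]


if __name__ == "__main__":
    for name, size in rank_functions(SAMPLE_NM, top=3):
        print(f"{size:6d}  {name}")
```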
Simplify Complex Code
Complex code obscures understanding. Techniques to reduce complexity:
- Refactor – Restructure into cleaner functions, objects, modules.
- Remove dead code – Deleting unused code clarifies the main logic flow.
- Abstract – Encapsulate low level tasks to expose higher level concepts.
- Decompile to C – Higher level languages allow reasoning more easily.
- Document – Describing code purpose helps cement understanding.
Simplifying forces structured thinking about code purpose and design. Eliminate complexity so core logic becomes clearer.
Leverage ARM Hardware
ARM’s extensive ecosystem provides tools to increase visibility into code:
- Debuggers – JTAG and SWD debug probes can halt, single-step, and resume code running on the target.
- Emulators – QEMU can model ARM devices for broader introspection.
- Dev Boards – Inexpensive boards from ST (STM32), BeagleBoard, Raspberry Pi, and others allow live debugging.
- FPGA – Prototyping on FPGA emulates ARM SoCs while monitoring inside the logic.
Use the surrounding ARM hardware to instrument, probe, and observe executing code. Watching code run on real or emulated hardware confirms what static disassembly can only suggest.
Summary
Disassembling ARM binaries requires gaining familiarity with the ARM toolchain and developing techniques for analyzing machine code. Proficiency takes time and practice across diverse programs. Patience and meticulous experimentation ultimately produce satisfying results. Use the strategies outlined here to build up your own ARM reversing skills.