The hard fault exception in ARM Cortex-M processors is designed to handle catastrophic software errors and hardware faults. It acts as a catch-all exception handler when no other fault handler is able to process the error. The hard fault handler allows the system to recover gracefully or at least safely halt execution when unrecoverable errors occur.
Background on Exceptions in ARM Cortex-M
Exceptions are events that temporarily interrupt normal program execution in a processor. They allow the processor to respond to issues and handle errors. There are two main types of exceptions:
- Interrupts – Asynchronous events triggered by external peripherals or hardware.
- System exceptions – Events generated by the core itself, including faults raised by the executing code (usually synchronous) as well as SVCall, PendSV, and SysTick.
When an exception occurs in ARM Cortex-M, the processor suspends the current program, saves its context, and jumps to the exception handler function whose address it reads from the vector table. This handler runs code to deal with the exception before returning to normal program execution.
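Each type of exception has its own exception handler. For example, the SysTick exception is triggered by the SysTick timer, and the memory management fault exception handles access violations such as those detected by the memory protection unit. The architecture reserves exception numbers 1 to 15 for system exceptions, although not every core implements all of them, and device interrupts are numbered from 16 upward.
If multiple exceptions are pending at the same time, the processor services the one with the highest priority first, which on Cortex-M means the lowest priority number. Reset, NMI, and HardFault have fixed negative priorities, so they always take precedence over exceptions with configurable priorities. This prevents catastrophic failures from going unhandled.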
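The priority rule can be sketched on a host machine as follows. The fixed values for Reset, NMI, and HardFault are architectural; the helper name `preempts` is an assumption made for illustration, and the sketch ignores priority grouping and the mask registers.

```c
#include <stdbool.h>

/* Fixed architectural priorities on Cortex-M (lower value = more urgent). */
enum {
    PRIO_RESET     = -3,
    PRIO_NMI       = -2,
    PRIO_HARDFAULT = -1
};

/* Hypothetical helper: true when a pending exception would pre-empt
 * code that is currently running at current_prio. */
static bool preempts(int pending_prio, int current_prio)
{
    return pending_prio < current_prio;
}
```

For example, `preempts(PRIO_HARDFAULT, 5)` is true for any configurable priority such as 5, while `preempts(PRIO_HARDFAULT, PRIO_NMI)` is false.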
The Role of the Hard Fault Exception
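The hard fault exception has a fixed priority of -1, which places it above every exception with a configurable priority; only Reset (-3) and the non-maskable interrupt (-2) rank higher. It acts as a “catch-all” handler for faults that don’t have a dedicated exception handler, or whose dedicated handler is disabled or cannot run at the current priority.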
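The catch-all behaviour can be illustrated with a small host-side model. On Armv7-M cores the MemManage, BusFault, and UsageFault handlers are enabled through bits 16 to 18 of the System Handler Control and State Register (SHCSR); a fault whose handler is not enabled escalates to HardFault. The type and function names below are assumptions for illustration, and the model ignores escalation caused by priority.

```c
#include <stdint.h>

/* SHCSR enable bits for the configurable fault handlers (Armv7-M). */
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

typedef enum { FAULT_MEMMANAGE, FAULT_BUS, FAULT_USAGE } fault_kind_t;
typedef enum {
    HANDLER_MEMMANAGE,
    HANDLER_BUSFAULT,
    HANDLER_USAGEFAULT,
    HANDLER_HARDFAULT
} handler_t;

/* Hypothetical model: which handler receives a fault for a given SHCSR value. */
static handler_t route_fault(fault_kind_t kind, uint32_t shcsr)
{
    switch (kind) {
    case FAULT_MEMMANAGE:
        if (shcsr & SHCSR_MEMFAULTENA) return HANDLER_MEMMANAGE;
        break;
    case FAULT_BUS:
        if (shcsr & SHCSR_BUSFAULTENA) return HANDLER_BUSFAULT;
        break;
    case FAULT_USAGE:
        if (shcsr & SHCSR_USGFAULTENA) return HANDLER_USAGEFAULT;
        break;
    }
    return HANDLER_HARDFAULT; /* disabled handler: the fault escalates */
}
```

With the reset value of SHCSR (all three enable bits clear), every fault kind in this model ends up in the hard fault handler.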
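The hard fault handler gets triggered, directly or through escalation, for problems like:
- Attempts to execute an undefined instruction
- Invalid execution state, for example a branch target with the Thumb bit clear
- Faulty stack usage, such as overflow or corruption that leads to invalid memory accesses
- Divide by zero, on cores with hardware divide and only when the DIV_0_TRP trap is enabled
- Failed integrity checks on exception return
- Faults during exception entry or return, such as a failed vector fetch or a stacking error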
Many of these errors reflect catastrophic software failures or hardware malfunctions. Without the hard fault handler, the system would likely crash or enter an unstable undefined state when they occur.
So the purpose of the hard fault handler is to catch these errors, allow the system to gracefully recover if possible, or at least halt execution in a controlled way. This prevents uncontrolled system crashes.
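Several of the triggers listed above map onto individual UsageFault status bits in the Configurable Fault Status Register (CFSR) on Armv7-M cores. The following host-side sketch looks up the first flag that is set in a CFSR value; the table and function names are assumptions for illustration, and only the UsageFault bits are covered.

```c
#include <stddef.h>
#include <stdint.h>

struct ufsr_bit {
    uint32_t    mask;  /* bit position within the 32-bit CFSR */
    const char *name;  /* architectural flag name */
};

/* UsageFault flags as they appear in CFSR[31:16] on Armv7-M. */
static const struct ufsr_bit UFSR_BITS[] = {
    { 1u << 16, "UNDEFINSTR" }, /* undefined instruction */
    { 1u << 17, "INVSTATE"   }, /* invalid execution state */
    { 1u << 18, "INVPC"      }, /* integrity check failed on exception return */
    { 1u << 19, "NOCP"       }, /* coprocessor access not permitted */
    { 1u << 24, "UNALIGNED"  }, /* unaligned access trapped */
    { 1u << 25, "DIVBYZERO"  }, /* divide by zero trapped */
};

/* Returns the name of the lowest-numbered UsageFault flag set, or NULL. */
static const char *first_usage_fault(uint32_t cfsr)
{
    for (size_t i = 0; i < sizeof UFSR_BITS / sizeof UFSR_BITS[0]; i++) {
        if (cfsr & UFSR_BITS[i].mask) {
            return UFSR_BITS[i].name;
        }
    }
    return NULL;
}
```

A real handler would read the value from the CFSR at address 0xE000ED28 rather than take it as a parameter.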
Built-In Hard Fault Handling
When a hard fault occurs, the Cortex-M processor automatically performs several steps:
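- Pushes the context of the interrupted program (R0-R3, R12, LR, PC, and xPSR) onto the active stack
- Switches to Handler mode, which always uses the main stack pointer
- Loads the HardFault handler address from the vector table into the program counter
- Begins executing handler code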
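The hard fault handler code is written by the application developer. Vendor startup files usually supply a weak default handler that simply spins in an infinite loop, so the developer needs to override it with code that records the fault, recovers gracefully, or halts the system safely. A further fault raised inside the hard fault handler itself puts the core into the lockup state.
To help diagnose the cause of the hard fault, the processor preserves the stacked program counter and, on cores such as the Cortex-M3, M4, and M7, sets bits in its fault status registers. The handler code can examine these to identify the specific error.
When the handler returns, the processor unstacks the saved context and resumes at the stacked program counter. For most synchronous faults that is the faulting instruction itself, so unless the cause has been removed or the stacked program counter adjusted, the same fault occurs again. Alternatively, the handler can decide recovery is not possible and halt or reset the system to prevent further damage.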
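The stacked program counter sits at a fixed position in the frame that the hardware pushes on exception entry. The sketch below models the basic eight-word frame (no floating-point state) and copies it out of a stack image; the type and function names are assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Basic exception stack frame, from the lowest address upward. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* return address of the interrupted function */
    uint32_t pc;   /* address the processor will resume at */
    uint32_t xpsr; /* program status at the time of the fault */
} stack_frame_t;

/* Copies the eight stacked words that start at sp into a frame structure. */
static stack_frame_t read_frame(const uint32_t *sp)
{
    stack_frame_t frame;
    memcpy(&frame, sp, sizeof frame);
    return frame;
}
```

On the target, `sp` would be the value of MSP or PSP at handler entry; the `pc` field then points at or near the instruction that faulted.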
Configuring and Writing the Hard Fault Handler
Using the hard fault exception effectively involves:
- Installing the handler function in the vector table (its priority is fixed and cannot be changed)
- Writing code to identify the error cause
- Attempting corrective actions if possible
- Halting execution if recovery is not possible
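The handler function is written like any interrupt service routine (ISR), but unlike a peripheral ISR it is not enabled or prioritized through the Nested Vectored Interrupt Controller (NVIC). Its address sits in the vector table at exception number 3, it is always enabled, and its priority is fixed at -1. The FAULTMASK register does not trigger the handler; setting it raises the current execution priority to -1, which masks every exception except NMI.
To identify the error, the handler examines the stacked program counter and xPSR together with the fault status registers. The HardFault Status Register (HFSR) shows whether the fault was escalated or caused by a failed vector fetch, the Configurable Fault Status Register (CFSR) holds the detailed MemManage, BusFault, and UsageFault flags, and the MMFAR and BFAR registers hold the faulting address when their valid bits are set.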
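As a rough illustration of where the handler lives, the hard fault vector is entry 3 of the vector table, four bytes per entry, with entry 0 holding the initial main stack pointer. The helper names below are assumptions for illustration; the table is a plain array so the sketch runs on a host.

```c
#include <stdint.h>

/* Architectural exception numbers, which double as vector table indices. */
enum {
    EXC_RESET      = 1,
    EXC_NMI        = 2,
    EXC_HARDFAULT  = 3,
    EXC_MEMMANAGE  = 4,
    EXC_BUSFAULT   = 5,
    EXC_USAGEFAULT = 6,
    EXC_SYSTICK    = 15
};

/* Byte offset of an exception's vector from the start of the table. */
static uint32_t vector_offset(uint32_t exception_number)
{
    return 4u * exception_number;
}

/* Handler address stored in a vector. Bit 0 is the Thumb bit and is
 * masked off because it is not part of the address. */
static uint32_t handler_address(const uint32_t *table, uint32_t exception_number)
{
    return table[exception_number] & ~1u;
}
```

So the HardFault vector is found at offset 0x0C, and a stored value of 0x08000145 refers to handler code at 0x08000144.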
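A minimal sketch of reading those status registers, assuming the Armv7-M layout: the CFSR packs the MemManage, BusFault, and UsageFault status fields into one 32-bit word, and the HFSR FORCED bit marks a fault that was escalated to HardFault. The struct and function names are assumptions for illustration, and the register values are passed in so the code runs on a host.

```c
#include <stdbool.h>
#include <stdint.h>

#define HFSR_VECTTBL  (1u << 1)   /* bus fault on a vector table read */
#define HFSR_FORCED   (1u << 30)  /* configurable fault escalated to HardFault */
#define HFSR_DEBUGEVT (1u << 31)  /* debug event while debug was disabled */

typedef struct {
    uint8_t  mmfsr; /* CFSR[7:0]   MemManage fault status */
    uint8_t  bfsr;  /* CFSR[15:8]  BusFault status */
    uint16_t ufsr;  /* CFSR[31:16] UsageFault status */
} cfsr_fields_t;

/* Splits a raw CFSR value into its three status fields. */
static cfsr_fields_t split_cfsr(uint32_t cfsr)
{
    cfsr_fields_t f;
    f.mmfsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr  = (uint16_t)(cfsr >> 16);
    return f;
}

/* True when the HardFault was escalated from a configurable fault,
 * in which case the CFSR holds the underlying cause. */
static bool is_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}
```

When `is_escalated` returns true, the non-zero field returned by `split_cfsr` indicates which fault class to investigate.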
Once the cause is determined, the code can attempt to clear the fault and recover. For example, it may reset peripherals or reconfigure hardware to return to a working state.
If recovery is not possible, the handler forces the system into an infinite loop or otherwise halts execution in a controlled way. Rebooting the processor is one option if completely restarting the application is acceptable.
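Rebooting is usually requested by writing the Application Interrupt and Reset Control Register (AIRCR) with the 0x05FA key in the upper half-word and the SYSRESETREQ bit set. The sketch below only computes the value to write, so it can run on a host; the function name is an assumption for illustration. Projects that use CMSIS would normally call `NVIC_SystemReset()` instead of writing the register by hand.

```c
#include <stdint.h>

#define AIRCR_VECTKEY       (0x05FAu << 16) /* required key for any write */
#define AIRCR_PRIGROUP_MASK (0x7u << 8)     /* priority grouping field */
#define AIRCR_SYSRESETREQ   (1u << 2)       /* request a system reset */

/* Builds the AIRCR value that requests a reset while preserving the
 * priority grouping that is currently configured. */
static uint32_t aircr_reset_value(uint32_t current_aircr)
{
    return AIRCR_VECTKEY
         | (current_aircr & AIRCR_PRIGROUP_MASK)
         | AIRCR_SYSRESETREQ;
}
```

On the target the result would be written to address 0xE000ED0C, followed by a data synchronization barrier and an infinite loop while the reset takes effect.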
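Example Code for a Hard Fault Handler
Here is a simple hard fault handler, written as a C sketch for Armv7-M cores (Cortex-M3, M4, M7) and the GCC or Clang toolchains:

```c
#include <stdint.h>

// System Control Block registers on Armv7-M cores
#define SCB_AIRCR (*(volatile uint32_t *)0xE000ED0Cu)
#define SCB_CFSR  (*(volatile uint32_t *)0xE000ED28u)
#define SCB_HFSR  (*(volatile uint32_t *)0xE000ED2Cu)

static void Halt_System(void)  { for (;;) { } }                            // Park the core
static void Reset_System(void) { SCB_AIRCR = 0x05FA0004u; Halt_System(); } // VECTKEY | SYSRESETREQ

// C part of the handler; frame points at the stacked R0-R3, R12, LR, PC, xPSR
__attribute__((used)) void HardFault_Handler_C(uint32_t *frame)
{
    volatile uint32_t stacked_pc = frame[6]; // Faulting address, kept for the debugger
    uint32_t hfsr = SCB_HFSR;
    uint32_t cfsr = SCB_CFSR;
    (void)stacked_pc;

    if (hfsr & (1u << 1)) {              // VECTTBL: vector fetch failed, unrecoverable
        Reset_System();                  // Reset and reboot
    }
    SCB_HFSR = hfsr;                     // Status bits are write-one-to-clear
    if (cfsr & 0x0000FF00u) {            // Bus fault escalated to HardFault
        SCB_CFSR = cfsr & 0x0000FF00u;   // Clear flags, then try recovery like resetting a peripheral
    } else if (cfsr & 0xFFFF0000u) {     // Usage fault escalated to HardFault
        SCB_CFSR = cfsr & 0xFFFF0000u;   // Clear flags, then try recovering the invalid state
    } else {                             // Unknown unrecoverable fault
        Halt_System();                   // Halt execution
    }
    // For precise faults, returning re-executes the faulting instruction unless frame[6] is adjusted
}

// HardFault exception handler: pass the active stack pointer to the C code
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4             \n" // Check EXC_RETURN in LR: bit 2 set means PSP was used
        "ite   eq                 \n"
        "mrseq r0, msp            \n"
        "mrsne r0, psp            \n"
        "b     HardFault_Handler_C\n");
}
```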
This demonstrates checking for the fault cause, attempting recovery where possible, and halting operation when the error cannot be cleared.
More robust implementations would log debug information to help diagnose software defects. The handler may also attempt to restart in a known safe state when possible.
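The `TST LR, #4` check in the handler relies on bit 2 of the EXC_RETURN value that the processor places in LR on exception entry: when the bit is clear the context was stacked on the main stack (MSP), and when it is set the process stack (PSP) was used. A host-side sketch of the same decision follows; the macro and function names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_SPSEL (1u << 2) /* 0: frame on MSP, 1: frame on PSP */

/* True when the exception frame was pushed onto the process stack. */
static bool frame_on_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_SPSEL) != 0u;
}

/* Picks the stack pointer that addresses the stacked frame. */
static const uint32_t *select_frame(uint32_t exc_return,
                                    const uint32_t *msp,
                                    const uint32_t *psp)
{
    return frame_on_psp(exc_return) ? psp : msp;
}
```

The common EXC_RETURN values 0xFFFFFFF1 and 0xFFFFFFF9 select MSP, while 0xFFFFFFFD, typical for a thread running under an RTOS, selects PSP.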
Using a Debugger to Understand Hard Faults
Because hard faults often indicate catastrophic code failures, using a debugger to step through the handler can help identify software bugs.
Debugging features like breakpoints and register inspection make it possible to closely examine the state when the fault occurred. This often reveals the precise error cause.
For Arm chips specifically, CoreSight debug components provide further insight where the device implements them: the Embedded Trace Macrocell (ETM) records the instruction stream leading up to the fault, and the Data Watchpoint and Trace (DWT) unit can trap the access that corrupts a variable or the stack.
Debugging hard faults this way helps pinpoint whether issues stem from stack corruption, race conditions, invalid memory access, assertion failures, or any number of other firmware problems.
Advanced Hard Fault Handling Techniques
With complex, critical applications, developers can employ more advanced strategies to handle hard faults:
- Fault logging: Save detailed fault info to persistent storage for post-mortem analysis.
- Priority levels: Enable the MemManage, BusFault, and UsageFault handlers and assign them separate priority levels so the most severe conditions are handled first.
- Debug signaling: Have faults trigger external debug events to observe system state.
- Checkpoint and rollback: Recover by rolling back to a known good state saved during normal operation.
- Secondary cores: Use a second core to safely restart and reconfigure the main application core.
For safety-critical systems especially, robust fault handling mechanisms help ensure reliable operation even in the face of random hardware issues or systematic code failures.
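Fault logging, the first technique in the list above, might look like the following sketch. It stores a small record guarded by a magic number and a simple XOR checksum so that startup code can tell a genuine record from uninitialized memory. The record layout, the magic value, and all names are assumptions for illustration; on real hardware the record would live in a RAM section that survives reset or in non-volatile storage.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_LOG_MAGIC 0xFA17C0DEu /* arbitrary marker chosen for this sketch */

typedef struct {
    uint32_t magic;
    uint32_t pc;       /* stacked program counter */
    uint32_t lr;       /* stacked link register */
    uint32_t cfsr;     /* Configurable Fault Status Register snapshot */
    uint32_t hfsr;     /* HardFault Status Register snapshot */
    uint32_t checksum; /* XOR of all preceding fields */
} fault_log_t;

static uint32_t fault_log_checksum(const fault_log_t *log)
{
    return log->magic ^ log->pc ^ log->lr ^ log->cfsr ^ log->hfsr;
}

/* Fills the record; intended to be called from the fault handler. */
static void fault_log_store(fault_log_t *log, uint32_t pc, uint32_t lr,
                            uint32_t cfsr, uint32_t hfsr)
{
    log->magic = FAULT_LOG_MAGIC;
    log->pc = pc;
    log->lr = lr;
    log->cfsr = cfsr;
    log->hfsr = hfsr;
    log->checksum = fault_log_checksum(log);
}

/* True when the record carries the marker and an intact checksum. */
static bool fault_log_valid(const fault_log_t *log)
{
    return log->magic == FAULT_LOG_MAGIC
        && log->checksum == fault_log_checksum(log);
}
```

After the next boot, the application can call `fault_log_valid` and, if it returns true, report the record and then clear the magic field. A CRC would detect corruption more reliably than XOR; XOR is used here only to keep the sketch short.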
Conclusion
The hard fault exception and handler serve a critical role in Cortex-M processors. They catch erroneous conditions that would otherwise lead to unstable undefined behavior or system crashes. Well-written fault handling code allows systems to respond to hardware issues and software bugs in a controlled way. This improves the robustness and resilience of embedded applications built around Arm cores.
Understanding the triggers for hard faults, enabling and prioritizing the configurable fault handlers properly, and using debugging tools to analyze handler execution all aid development of complex real-time systems. Knowledge of the hard fault mechanism helps developers write firmware that stands up to the rigors of embedded deployment.