Hard Fault behavior differences across Cortex-M variants

The Cortex-M series of ARM processors is extremely popular in embedded systems due to its low cost, low power consumption, and strong performance. Every Cortex-M variant has built-in fault handling, with the HardFault exception acting as the catch-all for faults that no other enabled fault exception handles. While the basic HardFault mechanism is consistent across Cortex-M variants, there are behavioral differences to be aware of when migrating code between variants or debugging HardFaults. Most of them follow the architecture split: the Cortex-M0, Cortex-M0+, and Cortex-M1 implement ARMv6-M, the Cortex-M3, Cortex-M4, and Cortex-M7 implement ARMv7-M, and the Cortex-M23 and Cortex-M33 implement ARMv8-M Baseline and Mainline respectively.

Contents

Root Causes of HardFaults
Handler Mode and Stack Usage
Automatic Stack Limit Checking
Fault Status Registers
Fault Address Registers
Return Address Registers
Exception Link Registers
Debug Fault Handling
HardFault Priority
Lockup Fault Detection
Memory Protection Units
Exception Tracing
Fault Handling Extensions
Fault Isolation Extensions
Conclusion

Root Causes of HardFaults

Some common root causes of HardFault exceptions include:

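Attempting to execute an undefined instruction
Accessing invalid or misaligned memory addresses (ARMv6-M cores fault on every unaligned access; ARMv7-M cores tolerate most of them unless CCR.UNALIGN_TRP is set)
Dividing by zero (only when CCR.DIV_0_TRP is set; ARMv6-M cores have no hardware divide instruction)
Corrupted stack pointers and stack overflows
An invalid EXC_RETURN value or a corrupted stack frame on exception return
A fault inside a fault handler, or a configurable fault whose handler is disabled, either of which escalates to HardFault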

The specific HardFault behavior exhibited on these faults can vary between Cortex-M variants.

Handler Mode and Stack Usage

On all Cortex-M variants, the HardFault handler executes in Handler mode using the main stack pointer (MSP). The exception frame, however, is pushed onto whichever stack was active when the fault occurred: the process stack (PSP) if the fault hit Thread mode code running on the PSP, and the main stack otherwise.

For example, a fault in an RTOS task normally leaves its frame on the process stack, while a fault inside an interrupt handler leaves it on the main stack. Bit 2 of the EXC_RETURN value that the processor places in LR on handler entry records which stack holds the frame, so a handler must check it before reading the stacked registers. Reading the wrong stack complicates root cause analysis; reading the right one shows where the fault originated.
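The stack selection can be sketched in portable C. On entry to any handler, LR holds an EXC_RETURN value whose bit 2 says whether the exception frame was pushed on the MSP or the PSP. The helper name and the idea of passing the register values in as plain integers are assumptions made for illustration; on a real target the values would come from a short assembly stub at the top of the handler.

```c
#include <stdint.h>

/* Bit 2 of EXC_RETURN: 0 = frame is on the main stack (MSP),
 *                      1 = frame is on the process stack (PSP). */
#define EXC_RETURN_SPSEL (1u << 2)

/* Hypothetical helper: given the LR value seen on handler entry and the
 * two stack pointer values, return the address of the stacked frame. */
uint32_t stacked_frame_address(uint32_t exc_return, uint32_t msp, uint32_t psp)
{
    return (exc_return & EXC_RETURN_SPSEL) ? psp : msp;
}
```

With the common EXC_RETURN value 0xFFFFFFFD (return to Thread mode using the PSP) the helper picks the PSP; with 0xFFFFFFF9 or 0xFFFFFFF1 it picks the MSP.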

Automatic Stack Limit Checking

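The Cortex-M3, Cortex-M4, and Cortex-M7 have no hardware stack limit checking. On these cores a stack overflow is caught only if the stack runs into an MPU-protected guard region or off the end of mapped memory; otherwise it silently corrupts whatever sits below the stack, and a HardFault may follow much later. The Cortex-M33 adds stack limit registers (MSPLIM and PSPLIM): a stack pointer that would drop below its limit raises a UsageFault with the STKOF bit set, which escalates to HardFault if UsageFault is not enabled. The Cortex-M23 provides stack limit registers only for the Secure state.

Other variants like the Cortex-M0 and Cortex-M0+ also do not perform automatic stack checking, so an overflow corrupts adjacent memory before any fault is raised. On cores without limit registers, overflow detection is left to software techniques such as stack watermarking, or to an MPU guard region where an MPU is fitted.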
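On cores without hardware stack limit checking, one common software substitute is stack watermarking: paint the unused stack with a known pattern at startup and later count how much of the pattern survives. The sketch below assumes a full-descending stack (as on Cortex-M) and uses hypothetical function names and an arbitrary marker value; it detects that an overflow has happened, not the moment it happens.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu  /* arbitrary marker value (assumption) */

/* Fill a not-yet-used stack region with the marker. 'lowest' points at the
 * lowest address of the region, i.e. the far end of a descending stack. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL_PATTERN;
    }
}

/* Count the words at the low end that still hold the marker. A result of 0
 * means the stack has reached, and possibly passed, its lower bound. */
size_t stack_words_untouched(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}
```

A periodic task or an idle hook can call stack_words_untouched and raise an alarm when the margin falls below a chosen threshold, well before a HardFault occurs.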

Fault Status Registers

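On the Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33, the HardFault handler can read the HardFault Status Register (HFSR) and the Configurable Fault Status Register (CFSR) to determine the origin of a fault. The HFSR records why HardFault itself was taken: a failed vector table read (VECTTBL), a debug event (DEBUGEVT), or the escalation of a configurable fault (FORCED). When FORCED is set, the CFSR, which combines the MemManage, BusFault, and UsageFault status registers, identifies the underlying cause, such as divide-by-zero (DIVBYZERO), an invalid exception return (INVPC), or, on the Cortex-M33, a stack overflow (STKOF).

However, not all Cortex-M variants have the same complement of status registers. The Cortex-M0, Cortex-M0+, Cortex-M1, and Cortex-M23 have neither the HFSR nor the CFSR, so fault diagnosis relies on the stacked registers alone. The Debug Fault Status Register (DFSR) is present wherever the debug logic is implemented, and the Cortex-M33 adds the Secure Fault Status Register (SFSR) when the Security Extension is included.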
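On cores that implement the HardFault Status Register (HFSR) and the Configurable Fault Status Register (CFSR), a handler can turn the raw bits into a cause. The following is a minimal sketch: the bit positions follow the ARMv7-M register layout, while the enum, the function name, and the choice of which bits to single out are illustrative assumptions. The register values are passed in as arguments so the logic stays portable.

```c
#include <stdint.h>

/* HFSR bits. */
#define HFSR_VECTTBL    (1u << 1)   /* bus fault on a vector table read */
#define HFSR_FORCED     (1u << 30)  /* a configurable fault was escalated */
#define HFSR_DEBUGEVT   (1u << 31)  /* debug event while debug was disabled */

/* CFSR layout: bits 0-7 MemManage, bits 8-15 BusFault, bits 16-31 UsageFault. */
#define CFSR_MEMMANAGE_MASK 0x000000FFu
#define CFSR_BUSFAULT_MASK  0x0000FF00u
#define CFSR_USAGE_MASK     0xFFFF0000u
#define CFSR_UNDEFINSTR (1u << 16)
#define CFSR_INVPC      (1u << 18)
#define CFSR_STKOF      (1u << 20)  /* ARMv8-M Mainline only */
#define CFSR_DIVBYZERO  (1u << 25)

enum fault_cause {
    FAULT_UNKNOWN,
    FAULT_VECTOR_READ,
    FAULT_DEBUG_EVENT,
    FAULT_MEMMANAGE,
    FAULT_BUS,
    FAULT_UNDEFINED_INSTRUCTION,
    FAULT_INVALID_RETURN,
    FAULT_STACK_OVERFLOW,
    FAULT_DIVIDE_BY_ZERO,
    FAULT_OTHER_USAGE
};

/* Hypothetical classifier: HFSR says why HardFault was taken; if it was an
 * escalation (FORCED), CFSR names the underlying configurable fault. */
enum fault_cause classify_hardfault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL)        return FAULT_VECTOR_READ;
    if (hfsr & HFSR_DEBUGEVT)       return FAULT_DEBUG_EVENT;
    if (!(hfsr & HFSR_FORCED))      return FAULT_UNKNOWN;
    if (cfsr & CFSR_DIVBYZERO)      return FAULT_DIVIDE_BY_ZERO;
    if (cfsr & CFSR_STKOF)          return FAULT_STACK_OVERFLOW;
    if (cfsr & CFSR_INVPC)          return FAULT_INVALID_RETURN;
    if (cfsr & CFSR_UNDEFINSTR)     return FAULT_UNDEFINED_INSTRUCTION;
    if (cfsr & CFSR_USAGE_MASK)     return FAULT_OTHER_USAGE;
    if (cfsr & CFSR_BUSFAULT_MASK)  return FAULT_BUS;
    if (cfsr & CFSR_MEMMANAGE_MASK) return FAULT_MEMMANAGE;
    return FAULT_UNKNOWN;
}
```

On a target built with a CMSIS device header, the arguments would typically be SCB->HFSR and SCB->CFSR. This approach does not apply to ARMv6-M cores, which do not implement these registers.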

Fault Address Registers

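The Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33 provide two fault address registers: the MemManage Fault Address Register (MMFAR) and the BusFault Address Register (BFAR). Each holds a meaningful address only while the matching valid bit in the CFSR (MMARVALID or BFARVALID) is set. The Cortex-M0, Cortex-M0+, Cortex-M1, and Cortex-M23 lack these registers, making it difficult to identify an invalid memory access without inspecting the faulting instruction or using external debug hardware.

Address capture also has gaps on the cores that do have these registers. An imprecise bus fault (IMPRECISERR), typically caused by a buffered write, does not update the BFAR, and a fault raised while stacking registers on exception entry (MSTKERR or STKERR) does not record an address either.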
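Because a fault address register is only meaningful while its valid bit is set, a handler should gate every read on the CFSR. A minimal sketch, assuming the ARMv7-M bit positions for MMARVALID and BFARVALID and a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

#define CFSR_MMARVALID (1u << 7)   /* MMFAR holds a valid address */
#define CFSR_BFARVALID (1u << 15)  /* BFAR holds a valid address */

/* Hypothetical helper: store the faulting data address in *out and return
 * true, or return false when the hardware captured no address (for example
 * on an imprecise bus fault or a stacking fault). */
bool fault_address(uint32_t cfsr, uint32_t mmfar, uint32_t bfar, uint32_t *out)
{
    if (cfsr & CFSR_MMARVALID) {
        *out = mmfar;
        return true;
    }
    if (cfsr & CFSR_BFARVALID) {
        *out = bfar;
        return true;
    }
    return false;
}
```

When the helper returns false, the stacked PC is the remaining clue to where the access came from.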

Return Address Registers

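Cortex-M processors have no dedicated return address register. Instead, every variant, including the Cortex-M0, saves the return address in the exception frame it pushes on exception entry: R0-R3, R12, LR, the return address (PC), and xPSR, in that order from the lowest address. The stacked PC is the main clue to where a fault originated.

However, how closely the stacked PC matches the faulting instruction depends on the fault type. For a precise fault, such as an undefined instruction or a precise bus error, it points at the instruction that faulted. For an imprecise bus fault, the processor may have executed several more instructions before the error was reported, so the stacked PC only bounds the search. The Cortex-M0, Cortex-M0+, Cortex-M1, and Cortex-M23 give no register-level indication of which case applies.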
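On Cortex-M the return address of the faulting context is found in the exception frame that the hardware pushes on entry. The sketch below models the basic 8-word frame as a C struct and pairs the stacked PC with a flag saying whether it can be trusted to point at the faulting instruction. The struct and function names are hypothetical, and the exactness check assumes a core that implements the CFSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */

/* Basic exception frame, lowest address first, as pushed by hardware. */
struct exception_frame {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address */
    uint32_t xpsr;
};

_Static_assert(sizeof(struct exception_frame) == 32,
               "basic frame is eight 32-bit words");

struct fault_location {
    uint32_t pc;   /* stacked return address */
    bool exact;    /* false if the fault was reported imprecisely */
};

/* Hypothetical helper: report where the fault was taken and whether the
 * stacked PC identifies the faulting instruction itself. */
struct fault_location locate_fault(const struct exception_frame *frame,
                                   uint32_t cfsr)
{
    struct fault_location loc;
    loc.pc = frame->pc;
    loc.exact = (cfsr & CFSR_IMPRECISERR) == 0u;
    return loc;
}
```

On a target, the frame pointer would be the MSP or PSP value captured at handler entry, cast to a pointer to this struct.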

Exception Link Registers

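Cortex-M processors have no separate exception link register. On exception entry the processor loads LR with an EXC_RETURN value, which encodes how to return: to Handler or Thread mode, using the MSP or the PSP. On cores with a floating-point unit, such as the Cortex-M4F, the Cortex-M7 with FPU, and the Cortex-M33 with FPU, bit 4 of EXC_RETURN also indicates whether the stacked frame is the basic 8-word frame or the extended 26-word frame that includes floating-point state.

A HardFault handler that walks the stack must account for the frame type, or it will misjudge where the interrupted code's stack resumes. An invalid EXC_RETURN value or an inconsistent frame on exception return raises a UsageFault with INVPC set on the Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33; the Cortex-M0, Cortex-M0+, and Cortex-M23 have no UsageFault and take a HardFault directly.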
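The value that LR holds on handler entry, EXC_RETURN, can be decoded with a few bit tests. This sketch covers the return mode and the frame type on cores with an FPU. The helper names are hypothetical, and the sizes ignore the optional 4-byte alignment padding and the additional state that ARMv8-M can stack for Secure code.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_MODE  (1u << 3)  /* 1 = return to Thread mode, 0 = Handler mode */
#define EXC_RETURN_FTYPE (1u << 4)  /* 1 = basic frame, 0 = extended (FP) frame */

/* True if the interrupted code was running in Thread mode. */
bool returns_to_thread(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_MODE) != 0u;
}

/* Size of the stacked frame: 8 words for the basic frame, 26 words when
 * floating-point state (S0-S15, FPSCR, one reserved word) was stacked too. */
unsigned frame_size_bytes(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE) ? 32u : 104u;
}
```

Adding the frame size to the frame address gives the stack pointer of the interrupted code, which is where a stack unwinder should continue.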

Debug Fault Handling

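When a debugger is attached and halting debug is enabled, a breakpoint halts the core on every Cortex-M variant and no exception is taken. Without a debugger, executing a BKPT instruction is a fault: on the Cortex-M3, Cortex-M4, and Cortex-M7 it escalates to HardFault with the DEBUGEVT bit set in the HFSR unless the DebugMonitor exception is enabled, while the Cortex-M0 and Cortex-M0+ take a HardFault with no status bit to identify it.

So on the larger cores a HardFault caused by a leftover breakpoint, such as a semihosting call in a build running without a debugger, is distinguishable from a true fault by reading the HFSR. On the Cortex-M0 and Cortex-M0+ the handler has to inspect the instruction at the stacked PC instead.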
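When no status register is available to flag a breakpoint-induced HardFault, the handler can inspect the 16-bit instruction at the stacked PC. The Thumb BKPT instruction is encoded as 0xBE00 plus an 8-bit immediate, so the test is a single mask and compare. The helper names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb "BKPT #imm8" is encoded as 0xBE00 | imm8. */
bool is_bkpt(uint16_t insn)
{
    return (insn & 0xFF00u) == 0xBE00u;
}

/* Immediate operand of a BKPT instruction; semihosting uses 0xAB. */
uint8_t bkpt_immediate(uint16_t insn)
{
    return (uint8_t)(insn & 0x00FFu);
}
```

A handler would read the halfword at the stacked PC and, if is_bkpt returns true, treat the event as a stray breakpoint rather than a genuine fault.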

HardFault Priority

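On every Cortex-M variant, HardFault has a fixed priority of -1, below the Non-Maskable Interrupt (NMI) at -2 and Reset at -3. HardFault preempts every exception with a configurable priority, which ensures it is serviced immediately, but it cannot preempt NMI.

NMI therefore retains top priority, meaning an NMI will preempt a running HardFault handler, and a fault that occurs inside the NMI handler cannot be serviced by HardFault and leads to lockup instead. The one variation is on the Cortex-M23 and Cortex-M33 with the Security Extension: when AIRCR.BFHFNMINS is set, the Secure HardFault runs at priority -3. These priority rules can alter fault visibility in corner-case debug scenarios.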
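The interaction between NMI and HardFault comes down to the Cortex-M preemption rule: a lower numeric priority value is more urgent, and an exception preempts only code running at a strictly less urgent priority. The sketch below encodes the architecturally fixed priorities of Reset (-3), NMI (-2), and HardFault (-1); the function name is hypothetical, and priority grouping and masking registers are deliberately left out.

```c
#include <stdbool.h>

/* Fixed exception priorities; configurable exceptions use 0 and above. */
enum {
    PRIO_RESET     = -3,
    PRIO_NMI       = -2,
    PRIO_HARDFAULT = -1
};

/* True if an exception at priority 'pending' may preempt code currently
 * executing at priority 'active'. Equal priorities never preempt. */
bool can_preempt(int pending, int active)
{
    return pending < active;
}
```

Under this rule an NMI can interrupt a HardFault handler, while a fault raised inside an NMI handler cannot be taken as a HardFault.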

Lockup Fault Detection

Every Cortex-M variant has a lockup state, which the processor enters when it hits an unrecoverable fault: typically a fault raised while the HardFault or NMI handler is running, or a failure to fetch the HardFault vector. In lockup the core stops executing useful code and asserts a LOCKUP output signal.

Lockup does not itself trigger a HardFault. What happens next depends on the chip rather than the core: many microcontrollers route the LOCKUP signal to the reset controller, or rely on a watchdog to recover. Otherwise the processor stays locked up until a reset, a debugger halt, or, in some cases, an NMI.

Memory Protection Units

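Some Cortex-M variants integrate a memory protection unit (MPU) to control access permissions for memory regions. The MPU is optional on the Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, and Cortex-M33; the Cortex-M0 and Cortex-M1 have none. On the Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33, an access violation invokes the MemManage fault handler, provided it has been enabled in the System Handler Control and State Register (SHCSR), and the CFSR and MMFAR describe the violation, improving debuggability.

If the MemManage handler is not enabled, the violation escalates to HardFault with the FORCED bit set. On the Cortex-M0+ and Cortex-M23, which have no MemManage exception, MPU violations always enter HardFault directly, lumped together with other causes like undefined instructions.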
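On ARMv7-M and ARMv8-M Mainline cores, protection faults reach a dedicated handler only if that handler is enabled in the System Handler Control and State Register (SHCSR); otherwise they escalate to HardFault. A minimal sketch of computing the enable mask, with a hypothetical function name and the register value passed in so the logic stays portable:

```c
#include <stdint.h>

#define SHCSR_MEMFAULTENA (1u << 16)  /* enable the MemManage handler */
#define SHCSR_BUSFAULTENA (1u << 17)  /* enable the BusFault handler */
#define SHCSR_USGFAULTENA (1u << 18)  /* enable the UsageFault handler */

/* Return the SHCSR value with all three configurable fault handlers
 * enabled, leaving every other bit unchanged. */
uint32_t shcsr_with_faults_enabled(uint32_t shcsr)
{
    return shcsr | SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

With a CMSIS device header this would typically be applied early in startup as SCB->SHCSR = shcsr_with_faults_enabled(SCB->SHCSR). ARMv6-M cores do not implement these enable bits, so the step does not apply to them.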

Exception Tracing

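Larger Cortex-M variants include trace features that can record exception activity. On the Cortex-M3, Cortex-M4, and Cortex-M7, the Data Watchpoint and Trace unit (DWT) can emit exception entry and exit events through the Instrumentation Trace Macrocell (ITM), and an optional Embedded Trace Macrocell (ETM) provides full instruction trace. The Cortex-M0 has neither; the Cortex-M0+ offers an optional Micro Trace Buffer (MTB) that records recent branches in on-chip SRAM.

Exception tracing improves root cause analysis compared to variants without tracing capabilities. A record of the last exceptions and branches before the fault is particularly useful for diagnosing HardFaults originating from exception return errors or corrupted stacks.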

Fault Handling Extensions

The ARMv8-M variants extend the fault model. The Cortex-M33 reports a stack limit violation through the STKOF bit of the UsageFault status, so a stack overflow is identified explicitly rather than inferred from corrupted data.

With the Security Extension (TrustZone), the Security Attribution Unit (SAU) classifies memory as Secure or Non-secure. A security violation raises the SecureFault exception on the Cortex-M33, with details in the Secure Fault Status and Address Registers (SFSR and SFAR). The Cortex-M23 has no SecureFault exception and reports such violations as a Secure HardFault.

Fault Isolation Extensions

The Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33 provide fault isolation features that give more control and configurability over fault handling. Key capabilities include:

Separate MemManage, BusFault, and UsageFault handlers, each enabled individually
A programmable priority for each of these configurable faults
Fault status and fault address registers that pinpoint the cause
A SecureFault exception and banked Secure and Non-secure stack pointers on the Cortex-M33 with the Security Extension

These give developers more control over tailoring fault handling compared to the Cortex-M0, Cortex-M0+, and Cortex-M23, where every fault is reported through HardFault.

Conclusion

While the HardFault mechanism provides a baseline fault handling capability across all Cortex-M variants, there are a number of behavioral differences in HardFault triggering, register usage, debuggability, extensions, and configurability across variants.

Software architects should carefully assess these differences when selecting Cortex-M processors or migrating code between variants. HardFault debugging may require tuning expectations and debug flows to match the specific strengths and weaknesses of each variant’s HardFault implementation.