Hard faults on Cortex-M0/M0+ microcontrollers are often caused by software bugs, improperly configured hardware, or faulty external devices. While hard faults can seem cryptic at first, there are some common root causes to look for when debugging these issues.
Software Bugs
Many hard faults originate from bugs in the software code running on the MCU. Here are some of the most common coding mistakes that lead to hard faults:
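- Accessing restricted or invalid memory – Reading, writing, or executing from an address that has no memory or peripheral behind it (through a wild or dangling pointer, for example) typically returns a bus error, which the core takes as a hard fault.
- Stack overflows – Allocating too little stack space or allowing unbounded recursion lets the stack grow into adjacent RAM. Cortex-M0/M0+ has no hardware stack limit check, so the corruption is silent until a damaged pointer or return address finally triggers a hard fault.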
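- Faults inside fault handlers – If a fault occurs while the hard fault handler (or the NMI handler) is running, there is no further exception to escalate to. The core enters the lockup state, which typically persists until a reset or debugger intervention.
- Division by zero – Cortex-M0/M0+ has no hardware divide instruction and no divide-by-zero trap. Division is performed by a compiler runtime routine, and what happens on a zero divisor depends on the toolchain: the result may simply be wrong, or the runtime's divide-by-zero hook may be called. Any hard fault usually comes later, when the bad result is used as an index or pointer.
- Unaligned memory access – The Cortex-M0/M0+ cores do not support unaligned accesses, so a word or halfword load or store to a misaligned address always causes a hard fault.
- Interrupt issues – Returning from an ISR with a corrupted EXC_RETURN value or a damaged stack frame causes a hard fault, as does executing an SVC instruction when the SVCall exception cannot be taken (for example with interrupts masked by PRIMASK, or from a handler of equal or higher priority).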
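The unaligned-access case often comes from casting a byte pointer into a packet or file buffer to `uint32_t *`. A minimal sketch of two alignment-safe alternatives follows; the helper names `read_u32_unaligned` and `read_u32_le` are made up for illustration and are not part of any vendor library.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in native byte order from an address that may not be
 * 4-byte aligned. memcpy leaves it to the compiler to pick loads that are
 * legal for the target, instead of forcing a single word load through a
 * cast pointer. */
static inline uint32_t read_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Read a little-endian 32-bit field byte by byte. This never issues a
 * multi-byte access and does not depend on the CPU's byte order. */
static inline uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

By contrast, `*(const uint32_t *)(buf + 1)` would be expected to hard fault on these cores whenever `buf + 1` is not a multiple of four.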
Rigorously testing code on development boards, adding asserts and sanity checks, and enabling the MCU’s debug features can help identify and resolve these software bugs before deployment.
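As one example of such a sanity check: integer division by zero is undefined behavior in C, and on these cores its outcome depends on the toolchain's runtime, so it is safer to reject a bad divisor before dividing. The sketch below assumes a hypothetical helper named `checked_div_i32`; the name and the bool-plus-out-parameter interface are illustrative choices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Divide num by den without ever performing an undefined operation.
 * Returns true and stores the quotient in *out on success.
 * Returns false, leaving *out untouched, when the quotient is undefined
 * (den == 0) or not representable (INT32_MIN / -1). */
static inline bool checked_div_i32(int32_t num, int32_t den, int32_t *out)
{
    if (den == 0) {
        return false;
    }
    if (num == INT32_MIN && den == -1) {
        return false;
    }
    *out = num / den;
    return true;
}
```

The caller decides what a failed division means, for instance logging and falling back to a default, rather than letting a garbage quotient propagate into an array index.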
Hardware Configuration Issues
Besides software defects, improperly configured hardware can also lead to hard faults at runtime:
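- Clock configuration – An unstable clock source, incorrect dividers, or flash wait states set too low for the core frequency can corrupt instruction fetches, which the core may then try to execute as invalid instructions.
- Power supply problems – An inadequate, noisy, or intermittent supply can corrupt memory reads and trigger faults and resets that look random.
- Invalid peripherals – Accessing a peripheral that is not present on the part, or on some devices one whose clock or power domain is disabled, can return a bus error and cause a hard fault.
- Faulty external devices – Misbehaving sensors, memory chips, displays, and similar devices usually show up as bad data or stalled drivers. The hard fault follows when code acts on that data, for example by using a corrupted length or index.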
- Pin muxing mistakes – Incorrect assignment of GPIOs and peripherals to pins can make peripherals fail.
Carefully reviewing the MCU datasheet, double checking connections, and testing sub-systems incrementally can surface these kinds of hardware issues early.
Unhandled Exceptions
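Unlike Cortex-M3/M4, the Cortex-M0/M0+ cores (ARMv6-M) have no separate memory management, bus, or usage fault exceptions, and no fault handlers that can be enabled individually. Every condition that would raise one of those exceptions on a larger core goes straight to the hard fault handler:
- Memory management conditions – Cortex-M0 has no MPU. On a Cortex-M0+ with the optional MPU, a region violation is taken as a hard fault, as is an attempt to execute from an execute-never region of the memory map.
- Bus errors – A failed transaction on the AHB bus or to a peripheral is taken as a hard fault.
- Usage conditions – Undefined instructions, an attempt to switch to ARM state (for example branching with BX to an address whose bit 0 is clear), and invalid EXC_RETURN values are taken as hard faults. There are no coprocessor instructions on these cores.
- Debug events – There is no debug monitor exception. A BKPT instruction executed with no debugger attached is taken as a hard fault.
- Unaligned accesses – As mentioned earlier, unaligned accesses always fault on Cortex-M0/M0+.
Because no lower-level handler can intercept these conditions first, the hard fault handler is the single place to capture diagnostic state.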
Analyzing Hard Faults
When a hard fault does occur, we need an effective strategy to analyze root causes:
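- Check the stacked registers – Cortex-M0/M0+ does not implement the fault status registers found on Cortex-M3/M4 (HFSR, CFSR, MMFSR), so the main evidence is the exception stack frame. The stacked PC points at or near the faulting instruction, and the stacked LR usually identifies the caller.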
- Examine call stack – This gives insight into the code flow leading to the fault.
- Inspect variable values – Watch variables in the debugger to spot corrupted or out-of-range values.
- Add assertions – Strategically add runtime sanity checks to isolate the fault location.
- Review code changes – Understand latest changes in relation to the fault symptoms.
Methodically stress testing sub-systems and recovery logic also helps, and hardware tools such as oscilloscopes and logic analyzers are useful when debugging more complex issues.
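One way to recover the fault location is to read the eight registers the core pushes onto the active stack on exception entry. The sketch below rests on several assumptions: the names `fault_frame_t`, `g_last_fault`, `g_last_fault_valid`, `hard_fault_capture`, and `hard_fault_c` are invented for illustration, the handler symbol `HardFault_Handler` follows the common CMSIS startup-file convention, and the assembly shim is written for GCC targeting ARMv6-M. The capture routine itself is plain C and can be exercised on a host.

```c
#include <stdint.h>

/* Registers stacked automatically on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code: usually the caller */
    uint32_t pc;    /* return address: at or near the faulting instruction */
    uint32_t xpsr;
} fault_frame_t;

/* Hypothetical post-mortem record, readable from a debugger. */
volatile fault_frame_t g_last_fault;
volatile uint32_t g_last_fault_valid;

/* Copy the stacked frame that sp points to into the post-mortem record. */
void hard_fault_capture(const uint32_t *sp)
{
    g_last_fault.r0   = sp[0];
    g_last_fault.r1   = sp[1];
    g_last_fault.r2   = sp[2];
    g_last_fault.r3   = sp[3];
    g_last_fault.r12  = sp[4];
    g_last_fault.lr   = sp[5];
    g_last_fault.pc   = sp[6];
    g_last_fault.xpsr = sp[7];
    g_last_fault_valid = 1u;
}

#if defined(__GNUC__) && defined(__ARM_ARCH_6M__)
/* Target-only part: record the frame, then park the core for the debugger. */
__attribute__((used)) void hard_fault_c(const uint32_t *sp)
{
    hard_fault_capture(sp);
    for (;;) { }
}

/* Bit 2 of EXC_RETURN (held in LR on handler entry) tells which stack
 * received the frame: 0 selects MSP, 1 selects PSP. Pass that stack
 * pointer to hard_fault_c in r0. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        ".syntax unified        \n"
        "movs r0, #4            \n"
        "mov  r1, lr            \n"
        "tst  r0, r1            \n"
        "beq  1f                \n"
        "mrs  r0, psp           \n"
        "b    2f                \n"
        "1:                     \n"
        "mrs  r0, msp           \n"
        "2:                     \n"
        "ldr  r1, 3f            \n"
        "bx   r1                \n"
        ".align 2               \n"
        "3: .word hard_fault_c  \n"
    );
}
#endif
```

With the core halted in the loop, looking up `g_last_fault.pc` in the map file or disassembly gives the faulting instruction. If the stack pointer itself is corrupted, reading the frame can fault again and lock the core up, so this is a starting point rather than a complete handler.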
Best Practices to Avoid Hard Faults
Some key best practices can help avoid or recover from hard faults:
- Carefully validating memory and bus access within code to avoid access violations.
- Having asserts in place to catch errors early before they escalate.
- Implementing a hard fault handler that records the stacked registers and puts the system into a safe state.
- Adding stack overflow detection, such as a software watermark or, on Cortex-M0+ parts with the optional MPU, a guard region below the stack.
- Testing edge cases thoroughly via unit tests, fault injection, etc.
- Following defensive coding practices to handle unexpected conditions gracefully.
- Designing all critical data structures and hardware access to be fault resilient.
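Cortex-M0/M0+ has no stack limit registers, so stack overflow checking usually has to be done in software. A minimal watermark sketch follows: paint the stack region with a known pattern at startup, then periodically count how much of the pattern survives. The names `STACK_FILL_PATTERN`, `stack_paint`, and `stack_unused_words` are hypothetical, and the code assumes the usual descending stack, so the lowest addresses are consumed last.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xA5A5A5A5u  /* arbitrary marker value */

/* Fill a not-yet-used stack region with the marker.
 * 'lowest' is the lowest address of the region. */
static inline void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL_PATTERN;
    }
}

/* Count the words at the low end of the region that still hold the marker.
 * A result of 0 means the stack has reached the bottom of the region,
 * or gone past it. */
static inline size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_FILL_PATTERN) {
        n++;
    }
    return n;
}
```

A periodic task can assert that `stack_unused_words` stays above a chosen margin. The check only detects overflow after the fact and can be fooled by a stack frame that skips over the marker words, which is why a guard region is preferable where an MPU is available.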
While hard faults can certainly be frustrating to debug, methodically tracing their root causes and applying fault avoidance best practices helps build more robust and resilient embedded systems.