Hard Fault behavior – timing, randomness, root causes

Neil Salmon
Last updated: September 17, 2023 8:04 am

A Hard Fault on an ARM Cortex-M processor is the exception taken when the core detects an error that no other enabled fault handler can deal with. It has a fixed priority of -1, above every configurable exception, so it preempts normal program flow and ordinary interrupt handlers. In practice firmware treats it as fatal: the usual handler records diagnostic state and then resets the system, and a further fault inside the Hard Fault handler puts the core into lockup. Understanding the timing, apparent randomness, and root causes of Hard Faults is critical for debugging and resolving issues in Cortex-M based systems.

Contents

  • When Do Hard Faults Occur?
  • Hardware vs Software Triggered Hard Faults
  • When Do They NOT Occur?
  • Root Causes of Hard Faults
  • Invalid Memory Accesses
  • Unaligned Memory Accesses
  • Divide By Zero
  • FPU Errors
  • Unhandled Exceptions
  • Stack Overflow
  • Critical System Errors
  • Identifying the Root Cause
  • Timing and Randomness Factors
  • Asynchronous Nature
  • Intermittent Fault Conditions
  • Complex Hardware Interactions
  • Heisenbugs
  • Strategies for Dealing with Timing and Randomness
  • Summary

When Do Hard Faults Occur?

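Hard Faults can occur at almost any point during program execution on a Cortex-M CPU. Most of them are synchronous: they are raised by a specific instruction, such as a load from an unmapped address, and the stacked PC points at or near that instruction. What makes them look random is that the triggering condition (a corrupted pointer, an overflowing stack, a race with an interrupt) depends on runtime state. Imprecise bus faults, for example from buffered writes, are the main asynchronous case: they are reported some instructions after the access that caused them. In either case the processor pushes context state onto the stack and jumps to the Hard Fault handler, which abruptly interrupts the software flow. From the application's perspective, a Hard Fault can look much like an unexpected reset.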

Hardware vs Software Triggered Hard Faults

Hard Faults originate from both hardware and software causes. Hardware issues like clock errors, power glitches, and electrical noise can corrupt processor state or memory and trigger a Hard Fault. These occurrences are comparatively rare and hard to predict. Software bugs like dereferencing invalid pointers, calling through corrupted function pointers, and stack overflows are much more common sources of Hard Faults. While the root causes differ, the processor handles hardware and software triggered faults the same way: it stacks the current context and takes the Hard Fault exception.

When Do They NOT Occur?

A Hard Fault is not held back by ordinary interrupt handlers. Because its priority is fixed at -1, a fault raised inside an interrupt handler preempts that handler immediately; only Reset and NMI have higher priority. A fault that would escalate to a Hard Fault while the core is already running the NMI or Hard Fault handler cannot be taken, and the core enters lockup instead. On Cortex-M3 and above, a MemManage, BusFault, or UsageFault that is enabled and has sufficient priority is handled by its own handler and does not become a Hard Fault. Finally, a core in a sleep mode is not executing instructions, so it cannot raise instruction-triggered faults until it wakes.

Root Causes of Hard Faults

There are several common root causes of Hard Faults in Cortex-M systems:

Invalid Memory Accesses

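Accessing addresses outside the implemented Flash, RAM, and peripheral regions raises a bus fault, which is reported as a Hard Fault on Cortex-M0/M0+ and escalates to one on Cortex-M3/M4/M7 unless the BusFault handler is enabled. Typical sources are wild or uninitialized pointers, stack overflows, and out of bounds array accesses. A NULL pointer read does not necessarily fault, because address 0 is usually mapped, often to the vector table. Configuring the MPU to mark unused regions, including a small region at address 0, as inaccessible can help detect invalid accesses.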

Unaligned Memory Accesses

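Accesses that do not meet the core's alignment requirements will fault. Cortex-M0/M0+/M1 parts (ARMv6-M) do not support unaligned accesses at all, so an unaligned 16-bit or 32-bit load or store always produces a Hard Fault. Cortex-M3/M4/M7 handle unaligned single word and halfword accesses in hardware, but still fault on unaligned multiple or doubleword accesses (LDM, STM, LDRD, STRD), and on any unaligned word or halfword access when the UNALIGN_TRP bit in the Configuration and Control Register is set.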
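One common way to avoid alignment faults when parsing packed buffers is to copy or assemble bytes rather than cast a byte pointer to a word pointer. The following is a minimal sketch of that idea; the helper names read_u32_unaligned and load_le32 are hypothetical and not part of any vendor library.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value in native byte order from an address that may not be
 * 4-byte aligned. memcpy leaves the choice of load instructions to the
 * compiler, so no unaligned word load is forced on cores that cannot do one. */
static inline uint32_t read_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Assemble a little-endian 32-bit value one byte at a time. Byte accesses
 * are always aligned, and the result does not depend on CPU endianness. */
static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Either helper can replace a cast such as *(uint32_t *)(buf + 1), which is the pattern that typically faults on Cortex-M0/M0+.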

Divide By Zero

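On Cortex-M3/M4/M7, an integer divide by zero raises a UsageFault only when the DIV_0_TRP bit in the Configuration and Control Register is set; that fault escalates to a Hard Fault if the UsageFault handler is not enabled. With the trap disabled, which is the reset default, the divide instruction silently returns 0. Cortex-M0/M0+ have no hardware divide instruction, so the behavior there depends on the compiler's runtime library. In all cases, software should check divisors for 0 before performing divide operations.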
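A divisor check and the hardware trap can be combined. The sketch below assumes an ARMv7-M core, where bit 4 of the Configuration and Control Register is DIV_0_TRP; the function names are hypothetical, and the register is passed in as a pointer so the logic can also be exercised off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#define CCR_DIV_0_TRP (1u << 4)   /* ARMv7-M CCR bit 4: trap on divide by zero */

/* Turn on divide-by-zero trapping. `ccr` is assumed to point at the
 * Configuration and Control Register (0xE000ED14 on ARMv7-M). */
static inline void enable_div0_trap(volatile uint32_t *ccr)
{
    *ccr |= CCR_DIV_0_TRP;
}

/* Checked signed divide. Returns false and leaves *out untouched when the
 * quotient is undefined (zero divisor) or not representable. */
static inline bool checked_div_i32(int32_t num, int32_t den, int32_t *out)
{
    if (den == 0) {
        return false;
    }
    if (num == INT32_MIN && den == -1) {
        return false;             /* quotient would overflow int32_t */
    }
    *out = num / den;
    return true;
}
```

With a CMSIS device header, the call would likely be enable_div0_trap(&SCB->CCR). The explicit check keeps normal code paths away from the trap, while the trap catches divisions that were missed.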

FPU Errors

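Executing a floating point instruction while the FPU is disabled raises a UsageFault (NOCP, "no coprocessor") on Cortex-M4/M7 chips, which escalates to a Hard Fault unless the UsageFault handler is enabled. The FPU is disabled at reset, so make sure to enable it, by granting access to coprocessors CP10 and CP11 in the CPACR register, before any floating point code runs.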
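Enabling the FPU comes down to setting four bits early in startup code. This is a minimal sketch, assuming a Cortex-M4F or M7 where CPACR bits 20 to 23 hold the CP10 and CP11 access fields; the function names are hypothetical and the register is passed as a pointer so the bit manipulation can be checked on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPACR bits 20-23: access fields for coprocessors CP10 and CP11 (the FPU).
 * 0b11 in each field grants full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* `cpacr` is assumed to point at the Coprocessor Access Control Register
 * (0xE000ED88). On target, follow this write with DSB and ISB barriers
 * before the first floating point instruction executes. */
static inline void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
}

/* True only when both CP10 and CP11 are set to full access. */
static inline bool fpu_is_enabled(const volatile uint32_t *cpacr)
{
    return (*cpacr & CPACR_CP10_CP11_FULL) == CPACR_CP10_CP11_FULL;
}
```

Many vendor startup files already do this in SystemInit when the project is built with hardware floating point, so it is worth checking there first.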

Unhandled Exceptions

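On Cortex-M3 and above, the MemManage, BusFault, and UsageFault exceptions are disabled at reset. While one of them is disabled, or when its priority is too low to preempt the code that is running, the corresponding fault escalates to a Hard Fault. Enabling these exceptions in software lets each fault be handled, or at least reported, by its own handler. Cortex-M0/M0+ do not implement them, so every fault there is a Hard Fault.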
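Turning on the configurable fault exceptions is a single register update. The sketch below assumes an ARMv7-M core, where bits 16 to 18 of the System Handler Control and State Register enable MemManage, BusFault, and UsageFault; the function name is hypothetical and the register is passed as a pointer for off-target testing.

```c
#include <stdint.h>

/* SHCSR enable bits on ARMv7-M. */
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

/* Enable the three configurable fault exceptions so they stop escalating
 * to Hard Fault. `shcsr` is assumed to point at SHCSR (0xE000ED24). */
static inline void enable_configurable_faults(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

With a CMSIS device header, the call would likely be enable_configurable_faults(&SCB->SHCSR). The matching handlers must exist in the vector table, otherwise the faults simply land in the default handler instead.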

Stack Overflow

Stack overflows from runaway recursion or large stack frames can corrupt memory and trigger a Hard Fault. Monitoring stack usage and limiting stack depth can help avoid overflows.
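One simple way to monitor stack usage is stack painting: fill the stack region with a known pattern at startup, then count how much of the pattern survives. This is a minimal sketch of that technique; the pattern value and function names are arbitrary choices, and on a real target the region bounds would come from linker symbols that are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu   /* arbitrary, unlikely-looking fill value */

/* Fill a stack region with the paint pattern. `base` is the lowest address
 * of the region. Call this before the region is in use as a stack. */
static inline void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Count words still holding the pattern, scanning up from the lowest
 * address. Cortex-M stacks grow downward, so the low end is used last and
 * the count approximates the remaining headroom. */
static inline size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

A result of 0, or one that shrinks steadily during testing, suggests the stack is too small. The measure is a heuristic: a large local array that is allocated but never written leaves the paint intact.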

Critical System Errors

Critical system errors like RAM parity or ECC errors, clock failures, and memory protection violations can also end in a Hard Fault, depending on how the chip routes them. These are usually hardware related faults and tend to be harder to diagnose.

Identifying the Root Cause

Identifying the specific root cause of a Hard Fault often requires a debugger such as GDB or an IDE's debug view, analysis of crash dumps, and/or added logging and assertions in code. Some techniques include:

  • Inspecting the stacked PC value to identify the fault location
  • Checking the CFSR and HFSR registers for fault status flags (Cortex-M3 and above)
  • Enabling the MemManage, BusFault, and UsageFault handlers for finer-grained diagnostics
  • Tracing instruction execution to replay crashes

Locating the first point of failure helps narrow down the root cause. For example, a stack overflow may first manifest as a MemManage fault before escalating to a Hard Fault. Having good debugging tools, crash logs, and diagnostic handlers set up is crucial for effective root cause analysis.
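The stacked PC and the fault status flags can be decoded with a few lines of C. The sketch below assumes the basic eight-word exception frame (R0-R3, R12, LR, PC, xPSR) and the ARMv7-M bit layout of CFSR and HFSR; the function names are hypothetical, and obtaining the frame pointer inside a real handler needs a short assembly stub that is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Word offsets within the basic exception frame pushed by the core. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
       FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR };

#define HFSR_VECTTBL (1u << 1)    /* fault while reading the vector table */
#define HFSR_FORCED  (1u << 30)   /* escalated from a configurable fault  */

typedef struct { uint32_t mask; const char *name; } fault_flag_t;

/* CFSR status flags on ARMv7-M (MemManage, BusFault, UsageFault parts). */
static const fault_flag_t cfsr_flags[] = {
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"  },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"   },
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"  },
    { 1u << 12, "STKERR"      }, { 1u << 16, "UNDEFINSTR"},
    { 1u << 17, "INVSTATE"    }, { 1u << 18, "INVPC"     },
    { 1u << 19, "NOCP"        }, { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO"   },
};

/* Address of the instruction that was executing when the fault was taken. */
static inline uint32_t fault_pc(const uint32_t *frame)
{
    return frame[FRAME_PC];
}

/* True when the Hard Fault is an escalated configurable fault, in which
 * case CFSR holds the underlying cause. */
static inline int hardfault_was_escalated(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0;
}

/* Name of the lowest-numbered CFSR flag that is set, or NULL if none. */
static inline const char *cfsr_first_flag(uint32_t cfsr)
{
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (cfsr & cfsr_flags[i].mask) {
            return cfsr_flags[i].name;
        }
    }
    return NULL;
}
```

Looking up the value returned by fault_pc in the linker map or disassembly usually identifies the faulting function. For an IMPRECISERR the stacked PC only bounds the location, because the offending access happened earlier.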

Timing and Randomness Factors

From a troubleshooting perspective, the two most challenging attributes of Hard Faults are their timing and apparent randomness. Several factors contribute to both:

Asynchronous Nature

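Most faults are synchronous to the instruction that causes them, but the conditions that lead up to them usually are not. Interrupts arrive asynchronously, so a race or a corrupted shared variable only becomes fatal when an interrupt lands in a narrow window. Imprecise bus faults and hardware errors are also reported asynchronously, some cycles after the responsible access. As a result, the fault can surface at a different point in the program on every run.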

Intermittent Fault Conditions

Issues like power supply noise, marginal RAM, and temperature fluctuations can cause intermittent faults. The same code may run fine billions of times before a unique combination of conditions triggers a fault.

Complex Hardware Interactions

In complex SoCs, hardware blocks like the memory controller, bus interconnects, clocking, and power domains all interact, often non-deterministically. This can create chaos theory-like “butterfly effects” that add to apparent randomness.

Heisenbugs

“Heisenbugs” are problems that seem to disappear when debugging tools are applied. The added trace logic, lower speeds, and ideal lab conditions mask the underlying issue.

Strategies for Dealing with Timing and Randomness

Despite the challenges posed by the timing and randomness aspects, a systematic approach can help uncover the root causes of Hard Faults:

  • Log fault stack frames, error codes, and runtime trace data
  • Stress test components like RAM and Flash to force latent faults
  • Increase code assertions, traces, and defensive checks
  • Reproduce on real production hardware, not just simulators
  • Evaluate firmware, libraries, stacks, and compilers for defects
  • Consider statistical and probability analysis of failure conditions

Thorough testing under varied operating conditions along with sufficient instrumentation code and debug visibility is key to overcoming the apparent randomness of these faults during development.
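Logging fault data is most useful when the record survives the reset that follows. The sketch below shows one possible layout: a small record guarded by a magic number and an XOR check word, so that boot code can tell a genuine record from uninitialized RAM. The structure, names, and magic value are all hypothetical, and placing the record in a RAM section that startup code leaves untouched is linker specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_MAGIC 0xFA17C0DEu   /* arbitrary marker for a written record */

typedef struct {
    uint32_t magic;
    uint32_t pc;      /* stacked PC at the time of the fault */
    uint32_t lr;      /* stacked LR                          */
    uint32_t cfsr;    /* fault status registers              */
    uint32_t hfsr;
    uint32_t check;   /* XOR of all fields above             */
} fault_record_t;

static inline uint32_t fault_record_check(const fault_record_t *r)
{
    return r->magic ^ r->pc ^ r->lr ^ r->cfsr ^ r->hfsr;
}

/* Called from the fault handler just before requesting a reset. */
static inline void fault_record_save(fault_record_t *r, uint32_t pc,
                                     uint32_t lr, uint32_t cfsr, uint32_t hfsr)
{
    r->magic = FAULT_MAGIC;
    r->pc = pc;
    r->lr = lr;
    r->cfsr = cfsr;
    r->hfsr = hfsr;
    r->check = fault_record_check(r);
}

/* Called early after boot: true only for an intact, previously saved record. */
static inline bool fault_record_valid(const fault_record_t *r)
{
    return r->magic == FAULT_MAGIC && r->check == fault_record_check(r);
}

/* Invalidate the record once it has been reported. */
static inline void fault_record_clear(fault_record_t *r)
{
    r->magic = 0;
}
```

After a reset, firmware would check fault_record_valid, report the contents over whatever channel is available, and then call fault_record_clear. Collecting these records from many units is what makes statistical analysis of rare faults possible.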

Summary

To summarize, a Hard Fault on an ARM Cortex-M processor is the last-resort exception taken when an error in hardware or software is not dealt with by any other fault handler, and firmware normally treats it as fatal. The timing appears random because the triggering conditions depend on runtime state, interrupt timing, and occasionally asynchronous hardware events. Root causes range from invalid memory accesses, divide-by-zero, and disabled fault handlers to stack corruption and complex hardware failures. Identifying the specific cause requires rigorous debugging techniques given this apparent randomness. A combination of test instrumentation, failure analysis, stress testing, and hardware debugging helps overcome these challenges during the development process.
