ECC stands for Error Correcting Code. It is a mechanism used to detect and correct data corruption errors in computer memory systems. ECC is important for system reliability and data integrity, especially in mission-critical applications.
In Arm Cortex-M processors that provide them, ECC can be enabled for on-chip SRAM such as Tightly Coupled Memory (TCM) and caches. This allows single-bit errors that occur in these memories during operation to be detected and corrected, and double-bit errors to be detected. Enabling ECC provides higher reliability for code and data stored in on-chip SRAM.
Why ECC is needed for TCM and Cache
SRAM arrays such as TCM and caches are susceptible to data corruption from cosmic rays, alpha particles, power supply noise and similar effects. Without ECC, such errors can cause unpredictable system behavior or crashes. ECC helps prevent this by detecting and fixing single-bit errors.
For mission-critical applications such as automotive, industrial and medical systems, ECC is a must-have for reliability. Even a single-bit error in a critical variable can be catastrophic in such systems. ECC provides the data integrity needed to meet safety requirements.
Additionally, at advanced process nodes like 40nm and below, SRAM cells become more vulnerable to data corruption due to reduced operating voltages. ECC becomes even more important for reliability as process nodes shrink.
How ECC works in Arm Cortex-M TCM and Cache
Arm uses Single Error Correction Double Error Detection (SECDED) ECC for protecting on-chip SRAM. In the 32-bit case described here, every 32-bit word in the TCM or cache has an additional 7-bit ECC code associated with it. Wider arrays need proportionally fewer check bits: a 64-bit word needs 8.
When a 32-bit data word is written to the TCM/cache, the ECC logic generates and stores a 7-bit ECC code for that word. Each check bit is the parity of a different subset of the data bits; SECDED codes are typically extended Hamming or Hsiao codes.
When the data word is read back, the ECC logic recalculates the ECC code based on the 32 data bits. It compares this to the 7-bit ECC code that was stored earlier. If the two ECC codes match, there is no error. If they don’t match, it means the data got corrupted at some point.
If there is a mismatch, the pattern of differing check bits (the syndrome) identifies the exact bit that got flipped. The ECC logic then flips this bit back to correct the error. This is how SECDED ECC detects and corrects single-bit errors.
If two bits in the same word are flipped, the ECC logic detects the error but cannot correct it. It raises an exception to notify the processor, and system software can then take appropriate corrective action. Errors of three or more bits in one word are beyond what SECDED guarantees: they may be miscorrected or go undetected.
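The write, read and correct steps above can be sketched in software. This is a minimal illustration of a textbook extended Hamming (39,32) code, not the encoding any particular Arm core uses; the bit layout and the names `secded_encode`, `secded_decode` and `ecc_status` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: a 39-bit codeword held in a uint64_t.
 * Bit 0 is the overall parity bit. Bits 1..38 are Hamming positions,
 * with check bits at the power-of-two positions 1, 2, 4, 8, 16, 32
 * and the 32 data bits in the remaining positions. */
typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

static int is_pow2(unsigned p) { return (p & (p - 1u)) == 0; }

uint64_t secded_encode(uint32_t data)
{
    uint64_t cw = 0;
    unsigned d = 0, syn = 0, par = 0;

    /* Scatter the data bits into the non-power-of-two positions. */
    for (unsigned pos = 1; pos <= 38; pos++) {
        if (is_pow2(pos)) continue;
        if ((data >> d) & 1u) cw |= 1ULL << pos;
        d++;
    }
    /* XOR of the positions of all set bits; the check bits cancel it to 0. */
    for (unsigned pos = 1; pos <= 38; pos++)
        if ((cw >> pos) & 1u) syn ^= pos;
    for (unsigned i = 0; i < 6; i++)
        if ((syn >> i) & 1u) cw |= 1ULL << (1u << i);
    /* Overall parity bit makes the total number of set bits even. */
    for (unsigned pos = 1; pos <= 38; pos++)
        par ^= (unsigned)((cw >> pos) & 1u);
    return cw | par;
}

ecc_status secded_decode(uint64_t cw, uint32_t *data)
{
    unsigned syn = 0, par = (unsigned)(cw & 1u), d = 0;
    ecc_status st = ECC_OK;
    uint32_t out = 0;

    assert(data != NULL);
    for (unsigned pos = 1; pos <= 38; pos++)
        if ((cw >> pos) & 1u) { syn ^= pos; par ^= 1u; }

    if (par == 1u) {                 /* odd number of flips: assume one */
        if (syn > 38u) return ECC_UNCORRECTABLE; /* points outside the word */
        cw ^= 1ULL << syn;           /* syn == 0 means bit 0 itself flipped */
        st = ECC_CORRECTED;
    } else if (syn != 0u) {          /* even number of flips, nonzero syndrome */
        return ECC_UNCORRECTABLE;
    }
    /* Gather the data bits back out of the non-power-of-two positions. */
    for (unsigned pos = 1; pos <= 38; pos++) {
        if (is_pow2(pos)) continue;
        if ((cw >> pos) & 1u) out |= (uint32_t)1u << d;
        d++;
    }
    *data = out;
    return st;
}
```

In this sketch, encoding a word, flipping any one of the 39 bits and decoding returns `ECC_CORRECTED` with the original data, while flipping any two bits returns `ECC_UNCORRECTABLE`. Hardware implementations compute the same parities with XOR trees in parallel rather than with loops.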
Enabling ECC on Cortex-M TCM and Cache
ECC support for TCM and cache needs to be selected when the processor is configured at the chip design stage. The Arm processor IP includes additional ECC logic when this feature is turned on.
ECC checkbits are generated and stored in additional SRAM added for this purpose. For 32-bit TCM/cache with 7-bit ECC, about 22% extra on-chip SRAM is required to store checkbits.
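The 22% figure follows from the check-bit count. As a rough sketch, the number of SECDED check bits for a given data width can be derived from the Hamming bound plus one overall parity bit; the function names below are hypothetical and chosen only for this example.

```c
#include <assert.h>

/* Check bits needed for SECDED over data_bits of data.
 * Single-error correction needs the smallest r with 2^r >= data_bits + r + 1;
 * double-error detection adds one overall parity bit. */
unsigned secded_check_bits(unsigned data_bits)
{
    unsigned r = 0;

    assert(data_bits > 0 && data_bits <= 1024);
    while ((1u << r) < data_bits + r + 1u)
        r++;
    return r + 1u;
}

/* Extra storage for check bits, as a percentage of the data storage. */
double ecc_overhead_percent(unsigned data_bits)
{
    return 100.0 * secded_check_bits(data_bits) / data_bits;
}
```

For a 32-bit word this gives 7 check bits and 21.875% overhead, matching the figure above. For a 64-bit word it gives 8 check bits and 12.5%, which is why wider ECC granules cost less storage per data bit.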
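During system boot, ECC detection and correction may need to be explicitly enabled by setting implementation-specific control registers; the core's Technical Reference Manual and the chip's reference manual document which ones. Startup code and the operating system need to be ECC-aware to make use of the feature.
Support varies by core. On the Cortex-M7, ECC is an optional configuration for the instruction and data caches (the core has separate caches, not a unified one), and the TCM interfaces are designed so that the chip integrator can add ECC to the TCM RAMs. Newer cores such as Cortex-M55 and Cortex-M85 also offer ECC for caches and TCMs. Cortex-M3 and Cortex-M4 have no TCM or cache inside the processor itself, so any ECC on their SRAM is added by the chip vendor at system level.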
Software Usage
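The ECC check bits are generated, stored and checked entirely by hardware, so they are invisible to the compiler and linker. Linker scripts place code and data in TCM as usual and do not need to reserve space for check bits. What software typically must do is initialize ECC-protected RAM at startup, before the first read. After power-up the data and check bits hold random values, so reading a location that has never been written can report a spurious error. Startup code therefore usually writes every word of the protected region and enables ECC checking and error reporting through the device's control registers, in the order the device documentation specifies.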
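One startup step commonly needed on ECC-protected RAM is to write every word before anything reads it, so that each word has check bits matching its data. The following is a minimal sketch of such a fill loop; the function name `ecc_ram_init` is hypothetical, and the region bounds would come from device-specific linker symbols that are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write every 32-bit word in [base, base + words) so that the hardware
 * stores valid check bits for it. Assumes a 32-bit ECC granule; a memory
 * with a 64-bit granule would need 64-bit writes instead. */
void ecc_ram_init(volatile uint32_t *base, size_t words)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++)
        base[i] = 0u;   /* full-width write: no prior read of the location */
}
```

On a target this would run very early, before the C runtime copies initialized data or uses the stack in that region, and it is often written in assembly for that reason. Full-width writes matter here because a narrower write may force the hardware to read the uninitialized word first.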
ECC exceptions need to trigger handler code that attempts recovery or restart. Periodic ECC status checks and scrubbing routines should run to detect errors proactively: scrubbing reads through memory so that single-bit errors are found and repaired before a second flip in the same word turns them into uncorrectable double-bit errors.
Most ECC usage details are encapsulated by the startup code, OS and drivers. Application software typically does not need ECC-specific handling. However, some mission-critical software may want to check ECC status explicitly.
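A scrubbing routine can be sketched as a loop that reads each word and writes it back, a few words at a time so that it fits into idle time. This is only an illustration under the assumption that the hardware corrects data on read but does not itself rewrite the stored word; the `ecc_scrubber` type and `ecc_scrub_step` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State for an incremental scrubber over one ECC-protected region. */
typedef struct {
    volatile uint32_t *base;  /* start of the region */
    size_t words;             /* region size in 32-bit words */
    size_t next;              /* index of the next word to scrub */
} ecc_scrubber;

/* Scrub up to `budget` words, wrapping around at the end of the region. */
void ecc_scrub_step(ecc_scrubber *s, size_t budget)
{
    assert(s != NULL && s->base != NULL && s->words > 0);
    for (size_t n = 0; n < budget; n++) {
        uint32_t v = s->base[s->next];   /* read goes through ECC checking */
        s->base[s->next] = v;            /* write-back stores fresh check bits */
        s->next = (s->next + 1u) % s->words;
    }
}
```

On real hardware each read and write-back pair would have to be protected against interrupts or other bus masters modifying the same word in between, for example by briefly masking interrupts. That protection is omitted here to keep the sketch portable.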
Performance and Power Implications
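Enabling ECC causes some performance overhead during memory writes, as ECC bits have to be calculated and written. Writes narrower than the protected word, such as a byte store into a 32-bit ECC word, are the costlier case: the hardware has to read the whole word, merge in the new bytes and recompute the check bits. Even so, the cycle timing overhead is generally small on modern cores like Cortex-M7.
For reads, ECC checking is typically designed to stay off the critical path, so an error-free read usually costs no extra cycles, although the exact behavior depends on the implementation. Power consumption increases slightly because the ECC logic is active on every read.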
Overall, the power and performance overhead from ECC is modest. The reliability benefits typically outweigh the small cost for mission-critical applications requiring high data integrity.
Comparison to Parity Protection
Parity is a simpler form of error detection using a single parity bit per data word. Unlike ECC, parity cannot correct errors, only detect them. Parity also cannot identify which bit got corrupted.
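The limits of parity are easy to see in a short sketch. The helper names `parity32` and `parity_error` below are hypothetical and only illustrate even parity over a 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Parity of a 32-bit word: 1 if the number of set bits is odd, else 0. */
unsigned parity32(uint32_t w)
{
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return w & 1u;
}

/* Nonzero if the word no longer matches the parity bit stored with it. */
int parity_error(uint32_t word, unsigned stored_parity)
{
    return parity32(word) != (stored_parity & 1u);
}
```

A single flipped bit changes the parity and is reported, but the check gives no hint about which bit flipped. Two flipped bits cancel each other out and pass unnoticed, which is the case SECDED is designed to catch.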
Parity offers lower cost, lower latency and lower power than ECC. However, ECC provides higher data integrity than parity alone. In many cases, the reliability of ECC is preferable despite marginally higher overheads.
Conclusion
ECC support on internal TCM and cache memories enables Arm Cortex-M cores to meet very high reliability requirements. ECC detects and fixes single-bit errors that may occur in on-chip SRAM arrays due to environmental disturbances or device issues. For applications demanding high data integrity, the protection of ECC is an important feature in the Cortex-M family.