A watchdog timer (WDT) is an essential component in real-time embedded systems, especially in multitasking environments. It provides a mechanism to detect system malfunctions and recover from them. This article explains the role, operation, and implementation of a general-purpose WDT module suitable for multitasking systems using ARM Cortex-M processors.
Role of a Watchdog Timer
A WDT is a hardware timer that the system software must refresh periodically. If the software fails to refresh it within a configured time interval, the WDT resets the system or triggers an interrupt. This acts as a safety mechanism that returns the system from a hung or unknown state to a known good state.
In a complex multitasking system, any task can get stuck in an infinite loop or a deadlock. A WDT detects such malfunctions indirectly, because a stuck system stops refreshing it, and it resets the system before the fault can do further harm. The periodic refresh from software therefore serves as an indication that the critical system tasks are functioning normally.
Working of a Watchdog Timer
A WDT peripheral has a counter register that increments on every cycle of the WDT clock. The counter's maximum value corresponds to the timeout period. When the counter reaches this value, the WDT resets the system or interrupts the processor. The software periodically writes to a refresh register to reset the counter and prevent the timeout. Many real devices instead count down from a reload value to zero, but the principle is the same.
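The counter behaviour described above can be illustrated with a small software model. This is only a sketch that runs on a host machine, not a driver for any real peripheral; the type `wdt_model_t` and the functions `wdt_model_init`, `wdt_model_tick`, and `wdt_model_refresh` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of an up-counting WDT; not a real peripheral. */
typedef struct {
    uint32_t counter;    /* current count, incremented once per clock tick */
    uint32_t max_count;  /* count at which the timeout fires */
    bool     timed_out;  /* latched once the counter reaches max_count */
} wdt_model_t;

void wdt_model_init(wdt_model_t *w, uint32_t max_count)
{
    w->counter = 0u;
    w->max_count = max_count;
    w->timed_out = false;
}

/* Advance the model by one clock cycle. */
void wdt_model_tick(wdt_model_t *w)
{
    if (w->timed_out) {
        return; /* timeout already latched; real hardware would have acted by now */
    }
    w->counter++;
    if (w->counter >= w->max_count) {
        w->timed_out = true; /* real hardware would reset or interrupt here */
    }
}

/* Equivalent of writing the refresh register: restart the count. */
void wdt_model_refresh(wdt_model_t *w)
{
    if (!w->timed_out) {
        w->counter = 0u;
    }
}
```

In this model, a refresh issued before the counter reaches `max_count` restarts the count, while a missed refresh latches `timed_out`, which stands in for the reset or interrupt that real hardware would generate.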
For example, if the timeout period is 1 second and the WDT counter is clocked at 1 MHz, the maximum counter value is 1,000,000. The software must refresh the WDT within 1 second to avoid a timeout reset. The interval at which the software is expected to issue this refresh is called the watchdog window.
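The arithmetic in this example can be written as a small helper. The sketch below assumes the counter advances once per WDT clock cycle with no prescaler; the function names `wdt_timeout_ticks` and `wdt_ticks_fit` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counter ticks needed for a timeout, given the WDT clock in Hz and the
   timeout in milliseconds. The 64-bit intermediate avoids overflow of
   clock_hz * timeout_ms. */
uint64_t wdt_timeout_ticks(uint32_t clock_hz, uint32_t timeout_ms)
{
    return ((uint64_t)clock_hz * timeout_ms) / 1000u;
}

/* Check whether a tick count fits in a counter that is counter_bits wide. */
bool wdt_ticks_fit(uint64_t ticks, unsigned counter_bits)
{
    if (counter_bits >= 64u) {
        return true;
    }
    return ticks <= ((UINT64_C(1) << counter_bits) - 1u);
}
```

For the values in the text, `wdt_timeout_ticks(1000000u, 1000u)` gives 1,000,000 ticks. That count needs a counter at least 20 bits wide, so a 16-bit counter would require a slower clock or a prescaler.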
Timeout Period
The timeout period determines how soon the WDT can detect a system malfunction. A shorter timeout period detects errors faster but requires more frequent refreshes. Typical timeout periods range from tens of milliseconds to a few seconds.
Watchdog Window
The watchdog window is the interval within which the software must refresh the WDT. It is always shorter than the full timeout period to provide an error margin. A common setting is a window that is 50-80% of the total timeout. On some devices, usually called windowed watchdogs, a refresh that arrives too early is also treated as a fault.
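As a rough illustration, the refresh interval can be derived from the timeout and a chosen percentage. The function name `wdt_refresh_period_ms` and the clamping of the percentage to 100 are assumptions made for this sketch.

```c
#include <stdint.h>

/* Refresh interval in milliseconds, expressed as a percentage of the
   timeout. Percentages above 100 are clamped to 100. */
uint32_t wdt_refresh_period_ms(uint32_t timeout_ms, uint32_t window_percent)
{
    if (window_percent > 100u) {
        window_percent = 100u;
    }
    return (uint32_t)(((uint64_t)timeout_ms * window_percent) / 100u);
}
```

With a 1000 ms timeout, an 80% window gives an 800 ms refresh interval and leaves a 200 ms margin before the timeout fires.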
Reset Action
When a timeout occurs, the WDT can either reset the system or generate an interrupt. Resetting the system completely reinitializes all hardware and software state. Generating an interrupt gives the software a chance to recover from the error.
Features of a General Purpose WDT
A general-purpose WDT module suitable for multitasking embedded systems should provide the following key features:
- Configurable timeout period and watchdog window
- Option to generate a system reset or an interrupt on timeout
- Ability to start/stop the WDT after system initialization
- Status flags to indicate refresh status and timeout events
- Option to lock down the WDT from unintended disables
- Interrupt generation on fault conditions such as an attempt to disable the WDT or a refresh error
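One way to picture these features is as a configuration structure that a driver might accept. The sketch below is illustrative only; the names `wdt_config_t`, `wdt_action_t`, and `wdt_config_is_valid`, as well as the choice of fields, are assumptions rather than part of any existing driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the WDT does when the timeout expires. */
typedef enum {
    WDT_ACTION_RESET,
    WDT_ACTION_INTERRUPT
} wdt_action_t;

/* Hypothetical configuration covering the features listed above. */
typedef struct {
    uint32_t     timeout_ms;        /* full timeout period */
    uint32_t     window_ms;         /* refresh interval, shorter than the timeout */
    wdt_action_t action;            /* reset or interrupt on timeout */
    bool         lock;              /* if true, the WDT cannot be disabled later */
    bool         fault_irq_enable;  /* interrupt on disable attempts, refresh errors */
} wdt_config_t;

/* Basic sanity check: both periods non-zero and the window inside the timeout. */
bool wdt_config_is_valid(const wdt_config_t *cfg)
{
    return cfg != NULL
        && cfg->timeout_ms > 0u
        && cfg->window_ms > 0u
        && cfg->window_ms < cfg->timeout_ms;
}
```

Validating the configuration before it is written to hardware catches mistakes such as a window that is equal to or longer than the timeout.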
WDT Design Considerations
Some key considerations when designing a general-purpose WDT are:
Timeout Value
The WDT timeout should be long enough for the software to refresh it under normal operation, but short enough to detect errors quickly. Typical values range from tens of milliseconds to a few seconds.
Watchdog Window
The window should account for worst-case refresh jitter from the software. It is typically set to 50-80% of the full timeout.
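A simple check can confirm that a chosen refresh interval still beats the timeout when the refresh is delayed. The sketch below assumes that jitter means the maximum delay of a refresh after its scheduled time and that refreshes never arrive early; the function name `wdt_refresh_schedule_is_safe` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the longest possible gap between two refreshes
   (nominal period plus worst-case delay) is still shorter than the timeout. */
bool wdt_refresh_schedule_is_safe(uint32_t timeout_ms,
                                  uint32_t refresh_period_ms,
                                  uint32_t worst_case_jitter_ms)
{
    uint64_t worst_case_gap = (uint64_t)refresh_period_ms + worst_case_jitter_ms;
    return worst_case_gap < timeout_ms;
}
```

For a 1000 ms timeout and an 800 ms refresh interval, the schedule tolerates up to 199 ms of delay under these assumptions; a 200 ms delay would coincide with the timeout.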
Criticality of Tasks
For less critical tasks, a timeout can trigger an interrupt to attempt recovery. For critical tasks, a reset is safer. Both options should be available.
Refresh Design
The WDT should be refreshed only through a well-defined interface with controlled access. Restricting refreshes to the OS kernel is good design practice.
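One common way to centralize refreshes in a multitasking system is a check-in scheme: each monitored task reports that it is alive, and a single supervisor refreshes the WDT only when every task has reported. The sketch below shows the bookkeeping only. The names `wdt_supervisor_t`, `wdt_supervisor_init`, `wdt_task_checkin`, and `wdt_supervisor_poll` are hypothetical, the `refresh_count` field stands in for the write to the hardware refresh register, and a real system would need atomic access or a critical section around the shared mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical supervisor state for up to 32 monitored tasks. */
typedef struct {
    uint32_t expected_mask;  /* one bit per task that must check in */
    uint32_t checkin_mask;   /* bits set by tasks since the last refresh */
    uint32_t refresh_count;  /* stand-in for writes to the refresh register */
} wdt_supervisor_t;

void wdt_supervisor_init(wdt_supervisor_t *s, uint32_t task_count)
{
    s->expected_mask = (task_count >= 32u)
                     ? UINT32_MAX
                     : ((UINT32_C(1) << task_count) - 1u);
    s->checkin_mask = 0u;
    s->refresh_count = 0u;
}

/* Called by each monitored task once per cycle to report that it is alive. */
void wdt_task_checkin(wdt_supervisor_t *s, uint32_t task_id)
{
    if (task_id < 32u) {
        s->checkin_mask |= (UINT32_C(1) << task_id);
    }
}

/* Called periodically by the supervisor. Refreshes only if every expected
   task has checked in since the last refresh. */
bool wdt_supervisor_poll(wdt_supervisor_t *s)
{
    if ((s->checkin_mask & s->expected_mask) != s->expected_mask) {
        return false; /* at least one task is missing: let the WDT run on */
    }
    s->checkin_mask = 0u;
    s->refresh_count++; /* on a target, write the refresh register here */
    return true;
}
```

With this arrangement a single stuck task withholds its check-in, the supervisor stops refreshing, and the WDT eventually times out even though the other tasks are still running.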
Recovery Design
The reset handlers or interrupt service routines should be designed to put the system into a known good state on WDT events.
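A recovery path often starts by finding out whether the last reset was caused by the WDT. The sketch below assumes a hypothetical reset-cause register with a WDT flag in bit 0 that software clears by writing the bit back as zero. Real devices differ in register layout and clearing method, so the bit position, `RESET_CAUSE_WDT`, and `boot_mode_from_reset_cause` are all illustrative.

```c
#include <stdint.h>

/* Hypothetical flag: set by hardware when the last reset came from the WDT. */
#define RESET_CAUSE_WDT (1u << 0)

typedef enum {
    BOOT_NORMAL,
    BOOT_WDT_RECOVERY
} boot_mode_t;

/* Decide the boot mode from the reset-cause register and clear the WDT
   flag so that the next boot is not misread. */
boot_mode_t boot_mode_from_reset_cause(volatile uint32_t *reset_cause_reg)
{
    boot_mode_t mode = ((*reset_cause_reg & RESET_CAUSE_WDT) != 0u)
                     ? BOOT_WDT_RECOVERY
                     : BOOT_NORMAL;
    *reset_cause_reg &= ~RESET_CAUSE_WDT;
    return mode;
}
```

Startup code could use the returned mode to log the event or bring up a reduced, known good configuration before resuming normal operation.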
Fail-safety Features
Mechanisms to detect WDT refresh failures and WDT disable attempts should be present. The WDT module should be resilient against unintended disables.
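The lock-down behaviour can be sketched as a model of how the control register might respond to writes. This is an assumption about one possible design, not a description of a specific device; the bit names, `wdt_lock_model_t`, and `wdt_model_write_ctrl` are hypothetical.

```c
#include <stdint.h>

#define WDT_CTRL_ENABLE          (1u << 0)  /* hypothetical enable bit */
#define WDT_CTRL_LOCK            (1u << 1)  /* hypothetical lock bit */
#define WDT_STAT_DISABLE_ATTEMPT (1u << 0)  /* hypothetical fault flag */

/* Software model of the control and status registers. */
typedef struct {
    uint32_t ctrl;
    uint32_t status;
} wdt_lock_model_t;

/* Model of a write to the control register. Once the lock bit is set,
   writes are ignored and an attempt to clear the enable bit is recorded. */
void wdt_model_write_ctrl(wdt_lock_model_t *r, uint32_t value)
{
    if ((r->ctrl & WDT_CTRL_LOCK) != 0u) {
        if ((value & WDT_CTRL_ENABLE) == 0u) {
            r->status |= WDT_STAT_DISABLE_ATTEMPT;
        }
        return; /* locked: the register keeps its value */
    }
    r->ctrl = value;
}
```

In this model, the lock can only be released by a reset, and the recorded flag gives software a way to detect stray writes that tried to turn the WDT off.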
Example Implementation on ARM Cortex-M
The following outlines an example implementation of a general-purpose WDT peripheral for an ARM Cortex-M based microcontroller:
Hardware
- Free running counter clocked by the core clock with reload value for timeout period
- Refresh register to reset the counter and refresh the WDT
- Status flags for timeout event, refresh status etc.
- Control bits to start/stop timer, generate interrupt/reset, lock enable etc.
- Counter timeout output connected to the system reset circuitry and the processor's NVIC
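The hardware items above could be exposed to software as a memory-mapped register block. The layout below is purely hypothetical: the register order, offsets, bit positions, and refresh key value are assumptions made for illustration and do not correspond to a specific device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for the WDT peripheral described above. */
typedef struct {
    volatile uint32_t CTRL;     /* 0x00: control bits */
    volatile uint32_t RELOAD;   /* 0x04: reload value that sets the timeout */
    volatile uint32_t COUNT;    /* 0x08: current counter value */
    volatile uint32_t REFRESH;  /* 0x0C: write the key here to refresh */
    volatile uint32_t STATUS;   /* 0x10: status flags */
} wdt_regs_t;

/* CTRL bits (assumed positions) */
#define WDT_CTRL_ENABLE      (1u << 0)  /* start/stop the timer */
#define WDT_CTRL_INT_EN      (1u << 1)  /* raise an interrupt on timeout */
#define WDT_CTRL_RESET_EN    (1u << 2)  /* reset the system on timeout */
#define WDT_CTRL_LOCK        (1u << 3)  /* block further changes to CTRL */

/* STATUS bits (assumed positions) */
#define WDT_STATUS_TIMEOUT   (1u << 0)  /* a timeout event occurred */
#define WDT_STATUS_REFRESHED (1u << 1)  /* last refresh was accepted */

/* Assumed key value that must be written to REFRESH. */
#define WDT_REFRESH_KEY      0xA5A5A5A5u

/* Compile-time check that the struct matches the documented offsets. */
_Static_assert(offsetof(wdt_regs_t, STATUS) == 0x10,
               "wdt_regs_t layout does not match the assumed offsets");
```

On a target, a pointer to this structure would be set to the peripheral's base address from the device's memory map. Requiring a specific key in the refresh register is a common guard against stray writes being taken as refreshes.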
Software
- Initialize WDT peripheral with timeout value and clock source
- Configure and enable WDT interrupt in the NVIC
- Start WDT after system initialization
- Refresh the WDT periodically within watchdog window by writing to the refresh register
- Implement a WDT interrupt service routine to handle recovery
- Provide a backup reset handler to put the system into a known good state
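The software steps above can be sketched as a minimal driver. The register layout, bit positions, and key value are hypothetical and are repeated here so that the example is self-contained. The functions take a pointer to the register block so they can be exercised on a host; on a target the pointer would be the peripheral's base address. The sketch also assumes that the timeout flag is cleared by writing the bit back as zero, which varies between devices.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register block and bit definitions (assumed layout). */
typedef struct {
    volatile uint32_t CTRL;
    volatile uint32_t RELOAD;
    volatile uint32_t COUNT;
    volatile uint32_t REFRESH;
    volatile uint32_t STATUS;
} wdt_regs_t;

#define WDT_CTRL_ENABLE     (1u << 0)
#define WDT_CTRL_INT_EN     (1u << 1)
#define WDT_STATUS_TIMEOUT  (1u << 0)
#define WDT_REFRESH_KEY     0xA5A5A5A5u

/* Step 1: program the timeout and enable the timeout interrupt,
   leaving the timer stopped. */
void wdt_init(wdt_regs_t *wdt, uint32_t reload_ticks)
{
    wdt->CTRL = 0u;
    wdt->RELOAD = reload_ticks;
    wdt->STATUS = 0u;
    wdt->CTRL = WDT_CTRL_INT_EN;
    /* Step 2 on a CMSIS-based target: also enable the interrupt in the
       NVIC, e.g. NVIC_EnableIRQ() with the device-specific IRQ number. */
}

/* Step 3: start the WDT once system initialization is complete. */
void wdt_start(wdt_regs_t *wdt)
{
    wdt->CTRL |= WDT_CTRL_ENABLE;
}

/* Step 4: refresh within the watchdog window. */
void wdt_refresh(wdt_regs_t *wdt)
{
    wdt->REFRESH = WDT_REFRESH_KEY;
}

/* Step 5: body of the interrupt service routine. Returns true if a
   timeout was pending, so the caller can start its recovery action. */
bool wdt_irq_service(wdt_regs_t *wdt)
{
    if ((wdt->STATUS & WDT_STATUS_TIMEOUT) == 0u) {
        return false;
    }
    wdt->STATUS &= ~WDT_STATUS_TIMEOUT; /* acknowledge the event */
    return true;
}
```

Keeping initialization and start as separate calls matches the step of starting the WDT only after system initialization, so that a long startup sequence cannot trip the timeout.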
Usage
- Configure required timeout and window values
- Enable WDT during system startup
- Refresh the WDT at least once every window period
- Handle timeout events in the WDT interrupt service routine, with the reset handler as a backup
- Optionally lock down WDT to prevent unintended disable
This gives a flexible yet robust WDT implementation that can serve a wide range of use cases on an ARM Cortex-M based processor.
Summary
A general-purpose watchdog timer module is an essential reliability feature in multitasking systems. Its configurability, failure-handling mechanisms, and ease of use determine how effective it is in practice. A well-designed WDT can detect and recover from a wide range of system malfunctions.