Embedded systems pretty much run everything today, especially critical medical, mobility and automation equipment. Safe operation of the equipment is paramount in these applications.
A recent project gave me the chance to implement safety features of this kind in microcontroller firmware. Functional safety is a vast topic, so here I'll briefly cover the implementation used for the product's IEC 60730 Class B compliance.
Electronic components undergo wear and tear at the most basic level. Radiation can corrupt the state of bits in memory, and internal peripherals may fail. Functional safety requires that the system be able to identify a fault and enter a safe state, preventing damage to itself and its surroundings.
This is done by running checks at startup and at periodic intervals during regular operation of the equipment. In this post I'll discuss how the microcontroller core can perform functional safety checks on itself.
IEC 60730 Class B compliance requires the following checks to be performed: CPU register check, system clock source check, ADC check, RAM check, ROM check and IO check.
CPU Register Check
The CPU is the most important part of an embedded system, and IEC 60730 compliance requires that the working registers be checked for stuck-at faults. This is a condition where one or more bits of a register are permanently stuck at either 1 or 0. The controller would perform all sorts of wrong calculations if this were to happen.
On an 8-bit system, for example, detecting such a failure involves writing 0xAA to one of the working registers, copying it to the next register and comparing the two, then repeating the sequence with 0x55. The two patterns are bitwise complements, so between them every bit is exercised as both 1 and 0. A comparison failure indicates a stuck bit.
It is also important to cover the program counter (PC), stack pointer (SP) and link register (LR) in this check.
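The pattern logic above can be sketched in C. CPU registers cannot be addressed from portable C, so a real test is normally written in assembly for the specific core. The sketch below only models the idea: `sim_reg_t` and its stuck-bit masks are hypothetical stand-ins for a register, and each pattern is compared against the expected value rather than against a second register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a register: bits set in stuck_high always read 1,
 * bits set in stuck_low always read 0. A healthy register has both at 0. */
typedef struct {
    uint8_t value;
    uint8_t stuck_high;
    uint8_t stuck_low;
} sim_reg_t;

static void reg_write(sim_reg_t *r, uint8_t v)
{
    r->value = v;
}

static uint8_t reg_read(const sim_reg_t *r)
{
    return (uint8_t)((r->value | r->stuck_high) & (uint8_t)~r->stuck_low);
}

/* Write a pattern, read it back and compare with what was written. */
static bool pattern_ok(sim_reg_t *r, uint8_t pattern)
{
    reg_write(r, pattern);
    return reg_read(r) == pattern;
}

/* Returns true when no stuck-at fault is detected. 0xAA and 0x55 are
 * complements, so every bit is tested in both states. */
bool register_stuck_at_test(sim_reg_t *r)
{
    return pattern_ok(r, 0xAAu) && pattern_ok(r, 0x55u);
}
```

On real hardware the same two-pattern sequence would be repeated for each working register, and a failure would send the system to its safe state.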
Clock Check
The system clock is the main driver for the CPU and almost every other peripheral of the microcontroller. That's why it's important that any deviation in clock frequency be detected. An incorrect clock will affect task timing, PWM outputs and system stability.
A secondary, independent clock source is required to perform this check. This is usually the low-speed internal real-time clock (RTC) source.
In the ISR for the secondary clock, count the system clock cycles that elapsed since the previous interrupt and check that the count stays within an expected window.
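A minimal sketch of that ISR logic follows, assuming a free-running counter driven by the system clock is available to read. The names `clock_monitor_t` and `clock_monitor_on_reference_tick`, and the tick and tolerance values, are illustrative assumptions and not taken from any vendor library.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXPECTED_TICKS  8000u /* assumed system-clock ticks per reference period */
#define TOLERANCE_TICKS  400u /* assumed acceptance window of +/- 5 % */

typedef struct {
    uint32_t last_count; /* counter value at the previous reference tick */
    bool     primed;     /* false until one reference tick has been seen */
    bool     fault;      /* latched when a period falls outside the window */
} clock_monitor_t;

/* Call from the secondary-clock ISR, passing the current value of a
 * free-running counter clocked by the system clock. */
void clock_monitor_on_reference_tick(clock_monitor_t *m, uint32_t sys_count)
{
    if (m->primed) {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        uint32_t delta = sys_count - m->last_count;
        if (delta < EXPECTED_TICKS - TOLERANCE_TICKS ||
            delta > EXPECTED_TICKS + TOLERANCE_TICKS) {
            m->fault = true;
        }
    }
    m->last_count = sys_count;
    m->primed = true;
}
```

The fault flag is latched rather than cleared on the next good period, so the main loop can move to the safe state even if the clock recovers momentarily.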
RAM Check
The RAM can be checked in two ways: by using the March C algorithm and by maintaining a shadow inverse copy of the working RAM.
The March C algorithm is a destructive test: the entire RAM is first cleared by writing $00 and verified, then written with $FF and verified, followed by the checkerboard patterns $AA and $55, each verified in turn. Because it overwrites everything, it can only be run at the start of your application, before any variables are initialized.
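For reference, here is a hedged sketch of the march sequence as it is usually given in the literature for the March C- variant. Unlike a plain pattern fill, it walks the memory in ascending and then descending address order, reading the expected value and writing its complement at each cell. This byte-wide version with $00/$FF backgrounds is a simplification of the bit-oriented definition, and the function name `march_c_minus` is an assumption for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Destructive RAM test over mem[0..len-1]. Returns true if no fault is
 * found; the region is left zeroed on success. */
bool march_c_minus(volatile uint8_t *mem, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {          /* M0: write 0, any order */
        mem[i] = 0x00u;
    }
    for (i = 0; i < len; i++) {          /* M1: ascending, read 0 then write 1 */
        if (mem[i] != 0x00u) return false;
        mem[i] = 0xFFu;
    }
    for (i = 0; i < len; i++) {          /* M2: ascending, read 1 then write 0 */
        if (mem[i] != 0xFFu) return false;
        mem[i] = 0x00u;
    }
    for (i = len; i-- > 0; ) {           /* M3: descending, read 0 then write 1 */
        if (mem[i] != 0x00u) return false;
        mem[i] = 0xFFu;
    }
    for (i = len; i-- > 0; ) {           /* M4: descending, read 1 then write 0 */
        if (mem[i] != 0xFFu) return false;
        mem[i] = 0x00u;
    }
    for (i = 0; i < len; i++) {          /* M5: read 0, any order */
        if (mem[i] != 0x00u) return false;
    }
    return true;
}
```

On a target, the stack and the test's own variables live in the RAM being tested, which is why startup implementations are typically written in assembly and keep their state in CPU registers.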
Periodic testing of RAM requires maintaining an inverse copy of the working RAM. For a byte $F0 in the working space, there should be a byte $0F in the shadow space. The shadow copy must be updated whenever the main copy is modified, and the two copies are then compared to find stuck-bit faults in RAM. This effectively halves the amount of usable RAM, which is already a scarce resource in embedded systems.
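A small sketch of the shadow-copy idea for a single byte is shown below. The type `safe_u8_t` and its accessors are hypothetical names. For brevity the inverse sits next to the value here, whereas the scheme described above keeps it in a separate shadow region.

```c
#include <stdbool.h>
#include <stdint.h>

/* A safety-critical byte stored together with its bitwise inverse. */
typedef struct {
    volatile uint8_t value;
    volatile uint8_t inverse;
} safe_u8_t;

/* Every write updates both copies. */
void safe_u8_write(safe_u8_t *v, uint8_t x)
{
    v->value = x;
    v->inverse = (uint8_t)~x;
}

/* Returns false if the two copies no longer complement each other,
 * which points to a corrupted or stuck bit in one of them. */
bool safe_u8_read(const safe_u8_t *v, uint8_t *out)
{
    uint8_t val = v->value;
    uint8_t inv = v->inverse;

    if ((uint8_t)(val ^ inv) != 0xFFu) {
        return false;
    }
    *out = val;
    return true;
}
```

A periodic task can walk all such variables and call `safe_u8_read` on each, entering the safe state on the first mismatch.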
ROM Check
ROM holds program opcodes and const data. The check is done to ensure integrity of data for proper execution.
At the end of binary generation, a checksum is calculated of the entire binary output. This checksum is stored, usually at the last word address of ROM.
During ROM check, the application calculates a checksum for the entire ROM and compares it with that stored at the last word address.
Calculating the checksum (typically a CRC) over the whole ROM costs CPU time, so the calculation should be split into blocks of memory and spread over the idle time between the main application's tasks.
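One way to spread the work is to keep a running CRC and advance it by one block per idle slot. The sketch below assumes the common reflected CRC-32 (polynomial 0xEDB88320) computed bit by bit; the actual algorithm must match whatever the build tooling stores in ROM. The names `rom_check_t`, `rom_check_start`, `rom_check_step` and the block size are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ROM_BLOCK_SIZE 64u /* assumed number of bytes processed per idle slot */

typedef struct {
    const uint8_t *base;   /* start of the region being checked */
    size_t         length; /* number of bytes covered by the CRC */
    size_t         offset; /* bytes processed so far */
    uint32_t       crc;    /* running CRC, not yet finalized */
} rom_check_t;

/* Bitwise reflected CRC-32 update over len bytes. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

void rom_check_start(rom_check_t *c, const uint8_t *base, size_t length)
{
    c->base = base;
    c->length = length;
    c->offset = 0;
    c->crc = 0xFFFFFFFFu;
}

/* Processes one block. Returns true once the whole region is done;
 * only then is *result_ok written with the comparison result. */
bool rom_check_step(rom_check_t *c, uint32_t expected, bool *result_ok)
{
    size_t remaining = c->length - c->offset;
    size_t n = (remaining < ROM_BLOCK_SIZE) ? remaining : ROM_BLOCK_SIZE;

    c->crc = crc32_update(c->crc, c->base + c->offset, n);
    c->offset += n;
    if (c->offset < c->length) {
        return false;
    }
    *result_ok = ((c->crc ^ 0xFFFFFFFFu) == expected);
    return true;
}
```

The idle loop would call `rom_check_step` once per pass and restart with `rom_check_start` after each completed sweep. The region passed in should not include the word that holds the stored value itself.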
IO Check
This check can be expensive because of the 1oo2 (one-out-of-two) implementation it requires. Critical inputs should be physically routed to a second input pin on a different port altogether. Similarly, output pins should be routed back to input pins on another port.
Implementing the check involves verifying the state of each input or output against the state of its copy input pin.
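The comparison itself is simple, as this sketch suggests. The pin reads are passed in as plain booleans so the logic stays independent of any particular GPIO API; the function and type names are assumptions made for illustration.

```c
#include <stdbool.h>

typedef enum { IO_OK, IO_FAULT } io_status_t;

/* Redundant input: the same signal is wired to two pins on different
 * ports. The value is only accepted when both readings agree. */
io_status_t check_redundant_input(bool primary, bool secondary, bool *value)
{
    if (primary != secondary) {
        return IO_FAULT;
    }
    *value = primary;
    return IO_OK;
}

/* Output readback: the driven pin is wired to an input on another port,
 * and the level read back must match the commanded level. */
io_status_t check_output_readback(bool commanded, bool readback)
{
    return (commanded == readback) ? IO_OK : IO_FAULT;
}
```

In practice the two pins are sampled a few instructions apart, so a real implementation would probably tolerate a brief disagreement (for example, by requiring several consecutive mismatches) before declaring a fault.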
ADC Check
Checking the ADC is divided into a number of steps:
A. Vref validation. The instantaneous Vref ADC count is compared with a preset value, allowing some tolerance for deviation.
B. Stagnancy check. A live analog input always shows some variation, so a count that stays perfectly constant over an interval of time indicates a failed input.
C. Value within bounds. Here the counts are checked for being within set upper and lower boundaries.
D. Multiplexer operation. To verify that the internal multiplexer of the analog peripheral is working, the delta between the maximum and minimum values of a channel is compared against that of the Vref input. Delta values within tolerance verify correct operation of the hardware.
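Steps A to C reduce to a few comparisons on raw counts, sketched below. All thresholds are placeholder assumptions for a 12-bit converter and would come from the actual hardware design; the function names are illustrative. The multiplexer check in step D depends on the channel layout and is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VREF_EXPECTED   1500u /* assumed nominal Vref count */
#define VREF_TOLERANCE    30u /* assumed allowed deviation in counts */
#define ADC_LOWER_BOUND  100u /* assumed lowest plausible reading */
#define ADC_UPPER_BOUND 3900u /* assumed highest plausible reading */

/* A. Vref validation: count must be within tolerance of the preset value. */
bool adc_vref_ok(uint16_t vref_count)
{
    uint16_t diff = (vref_count > VREF_EXPECTED)
                        ? (uint16_t)(vref_count - VREF_EXPECTED)
                        : (uint16_t)(VREF_EXPECTED - vref_count);
    return diff <= VREF_TOLERANCE;
}

/* B. Stagnancy check: true when every sample in the window is identical,
 * which a live analog input should never produce. */
bool adc_is_stagnant(const uint16_t *samples, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (samples[i] != samples[0]) {
            return false;
        }
    }
    return n > 1;
}

/* C. Bounds check: count must lie between the configured limits. */
bool adc_in_bounds(uint16_t count)
{
    return count >= ADC_LOWER_BOUND && count <= ADC_UPPER_BOUND;
}
```

The window length for the stagnancy check is a trade-off: a short window risks false alarms on a quiet signal, while a long one delays detection of a dead input.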
Chip manufacturers such as ST and Microchip provide library packages for their microcontrollers that simplify the implementation of these checks.
Build safe!

