One-line summary: Use the compiler to insert additional instructions that duplicate the work of the actual codepath, using otherwise-unused shadow registers. Before each memory store or register-dependent conditional branch, compare the shadow register against the real one and jump to an error handler on mismatch. The idea is that if alpha particles cause bit flips in registers, this technique catches the error before it is allowed to cause side effects. The authors show (by fault injection) that with this technique only 1.5% (rather than 20%) of injected bit-flip errors propagate to bad side effects. And since superscalar pipelines are underutilized anyway, the duplicated instructions add no significant performance cost compared to unguarded code.
Authors successfully used this technique to do software-only FT on a non-space-qualified, COTS motherboard on a small satellite; they haven't had to reboot it yet. I'm not sure how many soft errors were actually detected and caught. (Its space-qualified counterpart failed early in the mission, ironically.)
Intel is adding shadow hardware and more ECC to its internal structures; I imagine they know about this?
I assume the reason checks are done before a memory write is that memory has its own protection?
What is the relationship between this kind of checking vs. end-to-end application-level checking?