Ghosts in the Silicon: Fixing Memory Safety and Surviving Hardware Decay

The Ghosts in Our Machines

At NDSS 2024, Herbert Bos delivered a sobering keynote on the history of memory corruption, aptly subtitled: "Those who don’t know history are doomed to repeat it." For decades, the security community has been stuck in a disheartening cycle—fighting the same persistent memory vulnerabilities and settling for temporary fixes instead of lasting solutions. At NDSS 2026, however, Dan Wallach presented a radically optimistic perspective: we may finally be on the verge of solving the memory safety problem once and for all.

Watching Dan's keynote, I was struck by a sense of coming full circle. Our collaboration began at the NSF ACCURATE (A Center for Correct, Usable, Reliable, Auditable, and Transparent Elections) center, where he led as Principal Investigator and I was an ACCURATE-funded graduate student. Together, we grappled with the complexities of securing critical socio-technical infrastructure. At the core of our collective work in the election verification community was Ron Rivest and John Wack’s concept of "software independence" (PDF): the principle that a system must be structurally verifiable, because the software layer itself cannot be inherently trusted.

What’s most striking about Dan’s work now, as a Program Manager at DARPA's Information Innovation Office (I2O), is how he’s scaling this philosophy to an unprecedented level. The principles of software independence—once applied to voting machines—are being extended to the very foundation of the global Internet. Just as we must eliminate deceptive patterns in user interfaces to protect human autonomy, we must eradicate memory unsafety to secure the autonomy and resilience of our digital infrastructure.

In this post, I want to trace this evolution from software verification down to the raw silicon. We will break down exactly what memory safety means today, look at how DARPA’s TRACTOR program is using AI to automatically translate vulnerable legacy C code into safe Rust, and finally descend to the atomic level to examine how the COOP program plans to defend against the rising threat of silent hardware failures.

Deconstructing the Threat: The Flavors of Memory Safety

Before diving into some of the solutions Dan is working on at DARPA, it’s crucial to clarify what "memory safety" actually means. The term hides two primary categories of failure:

  • Spatial Safety (The Neighborhood): Imagine a computer program as a homeowner. Spatial safety ensures it never builds or trespasses outside its property line. When this fails—a classic buffer overflow—the program can read or write into adjacent memory, possibly allowing attackers to hijack the system or steal data.
  • Temporal Safety (The Timeline): Now imagine the homeowner demolishes their house and moves away. Temporal safety means the program doesn’t try to use memory it no longer owns. Violations lead to use-after-free bugs, where programs access memory that’s been reassigned, leading to instability and exploitation.
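
The two analogies above map directly onto mechanisms in a memory-safe language. Here is a minimal Rust sketch (my own, not from the keynote) showing how each property is enforced—the spatial check at run time, the temporal check at compile time via ownership:

```rust
fn main() {
    // Spatial safety: indexing past the end of a buffer is caught instead
    // of silently reading a neighbor's memory. `.get()` returns None for
    // any out-of-bounds index.
    let buff = [0u8; 8];
    assert_eq!(buff.get(12), None);

    // Temporal safety: once ownership of the "house" moves to a new owner,
    // the old owner can no longer touch it.
    let house = String::from("deed");
    let new_owner = house; // ownership moves here...
    // println!("{}", house); // ...so this line would fail to compile
    println!("{}", new_owner);
}
```

The commented-out line is the interesting part: a use-after-move is rejected before the program ever runs, which is how Rust rules out whole classes of use-after-free bugs by construction.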

Even if we could magically eradicate all spatial and temporal bugs, other threats persist: logic errors, race conditions, cryptographic flaws, and subtle side-channel attacks. Memory safety is necessary, but it’s not the only ingredient for secure computing.

Scaling Independence: TRACTOR and the Legacy Bottleneck

Today’s crisis is one of scale. The Internet runs on millions of lines of legacy C and C++ code—languages riddled with memory unsafety. While the industry is shifting to safer languages like Rust and Go, manual rewrites are too slow and costly for the global codebase. Google’s incremental Android rewrite is impressive, but we can’t replicate that effort everywhere.

Automated translation is no easy fix. Any tool faces a trilemma:

  1. Correctness: The translated code must behave identically to the original for legitimate inputs, but structurally reject malicious inputs.
  2. Performance: The resulting safe code cannot introduce crushing runtime overhead.
  3. Idiomaticity: The output cannot be an unreadable machine transliteration; it must look like code authored by a skilled human Rust programmer, which is notoriously subjective.

Enter DARPA’s TRACTOR (TRanslating All C TO Rust). TRACTOR blends formal code analysis with cutting-edge AI (LLMs) to achieve what once seemed impossible: converting unsafe legacy code into robust, idiomatic Rust.

To illustrate just how intelligent this translation is, consider this example from Dan's keynote demonstrating a classic spatial vulnerability:

Original C Code:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    char buff[8];
    int success = 0;

    // VULNERABILITY: Long inputs will overwrite the success flag.
    strcpy(buff, argv[1]);

    if(!strcmp(buff, "s3cr8tpw")) {
        success = 1;
    }

    if(success) {
        printf("Welcome!\n");
    }
}

Idiomatic Rust Translation:

use std::env::args;

fn main() {
    let password = if let Some(arg) = args().nth(1) {
        arg
    } else {
        return; // Exit program if no argument provided
    };

    if password == "s3cr8tpw" {
        println!("Welcome!");
    }
}

Notice what happened here. A naive transpiler might have simply wrapped the unsafe strcpy function in an unsafe {} block and kept the dangerous array manipulation. Instead, the AI inferred the intent of the programmer. It recognized that buff was being used as a password container, automatically renamed the variable to password for semantic clarity, completely eliminated the buffer overflow, and introduced safe error handling for missing arguments—a safeguard the original C code lacked entirely.
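
For contrast, here is a hypothetical sketch of what that naive transliteration might look like (the function name `check` and the simplified string handling are mine; the keynote did not show this version). The point is that a mechanical translation preserves the bug:

```rust
// HYPOTHETICAL: a naive, line-by-line transliteration of the C code.
// The overflowable stack buffer and the unchecked copy survive the
// translation intact -- the language changed, but the bug did not.
fn check(input: &str) -> bool {
    let mut buff = [0u8; 8];
    let mut success = 0i32;
    let bytes = input.as_bytes();
    unsafe {
        // Mirrors strcpy: no bounds check. An input longer than 8 bytes
        // still writes past `buff` -- undefined behavior, now in Rust.
        std::ptr::copy_nonoverlapping(bytes.as_ptr(), buff.as_mut_ptr(), bytes.len());
    }
    if bytes.len() >= 8 && &buff[..8] == b"s3cr8tpw" {
        success = 1;
    }
    success == 1
}

fn main() {
    println!("{}", check("s3cr8tpw")); // prints "true"
}
```

Wrapping unsafety in an `unsafe {}` block satisfies the compiler but delivers none of the structural guarantees the idiomatic translation above provides.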

When Computers Become Analog: The Rabbit Hole of Marginality

Yet, perfect code isn’t the whole answer. At the very end, Dan’s keynote opened a much deeper rabbit hole for me: What happens when flawless, memory-safe software runs on imperfect hardware?

We trust computers because we assume digital determinism—a one is always a one, a zero is always a zero. But at the atomic scale, computers are analog devices. As chips shrink and power thresholds drop, hardware operates closer to the edge—what engineers call marginality. Gizopoulos et al. (2019) (PDF) provide an excellent survey.

Historically, hardware reliability concerns were confined to aerospace and deep-space missions, where cosmic rays could flip a bit in memory—a phenomenon known as a Single Event Upset (SEU). The solution was usually to add radiation shielding or error-correcting code (ECC) memory.
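
The idea behind ECC memory is worth a brief detour. Here is a toy Rust sketch of a Hamming(7,4) code—the classic construction underlying single-error-correcting memory—which stores 4 data bits alongside 3 parity bits so that any single flipped bit can be located and flipped back:

```rust
// Toy Hamming(7,4) encoder/corrector: 4 data bits, 3 parity bits.
// Bit layout (1-based positions): p1 p2 d1 p3 d2 d3 d4

fn encode(d: [u8; 4]) -> [u8; 7] {
    let p1 = d[0] ^ d[1] ^ d[3]; // covers positions 3, 5, 7
    let p2 = d[0] ^ d[2] ^ d[3]; // covers positions 3, 6, 7
    let p3 = d[1] ^ d[2] ^ d[3]; // covers positions 5, 6, 7
    [p1, p2, d[0], p3, d[1], d[2], d[3]]
}

fn correct(mut c: [u8; 7]) -> [u8; 7] {
    // Each syndrome bit re-checks one parity group; together they spell
    // out the 1-based position of a single flipped bit (0 = no error).
    let s1 = c[0] ^ c[2] ^ c[4] ^ c[6];
    let s2 = c[1] ^ c[2] ^ c[5] ^ c[6];
    let s3 = c[3] ^ c[4] ^ c[5] ^ c[6];
    let pos = (s3 << 2 | s2 << 1 | s1) as usize;
    if pos != 0 {
        c[pos - 1] ^= 1; // flip the corrupted bit back
    }
    c
}

fn main() {
    let word = encode([1, 0, 1, 1]);
    let mut hit = word;
    hit[4] ^= 1; // simulate a single-event upset
    assert_eq!(correct(hit), word);
    println!("corrected");
}
```

Real ECC DIMMs use wider codes (e.g. SECDED over 64-bit words) implemented in the memory controller, but the principle is the same: redundancy turns an invisible bit flip into a detectable, correctable event.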

Today, however, the threat model has shifted from extraterrestrial SEUs to terrestrial Silent Data Corruptions (SDCs) within massive hyperscale data centers. As transistors age, degrade, or fluctuate due to minuscule manufacturing variances, the silicon can silently make a math error. The system does not crash; it just confidently returns the wrong answer. In a hyperscaler environment training a massive neural network or routing secure global traffic, a single SDC can cascade into catastrophic logical failures.

Because the hardware is becoming inherently less reliable, the industry has been forced to shift mitigations back up the stack. We are now seeing the rise of compiler-driven mitigations—such as near Zero silent Data Corruption (nZDC) and Compiler-Assisted Software fault Tolerance (COAST)—where the compiler intentionally injects redundant instructions into the software to double-check the hardware's math in real time. It is a desperate attempt to patch physical decay with software overhead.
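
To make the redundancy idea concrete, here is a deliberately simplified, hand-written Rust illustration of the principle (real nZDC-style hardening happens at the instruction level inside the compiler, duplicating values into shadow registers; this sketch only shows the shape of the technique):

```rust
// Toy illustration of compiler-injected redundancy: each value gets a
// duplicate "shadow" copy, the operation runs twice, and any divergence
// between the two results converts a silent corruption into a loud fault.

fn checked_add(a: u64, b: u64) -> u64 {
    let (a_shadow, b_shadow) = (a, b); // shadow copies of the operands
    let primary = a.wrapping_add(b);
    let shadow = a_shadow.wrapping_add(b_shadow); // redundant computation
    if primary != shadow {
        // On healthy hardware this branch is unreachable; a transient
        // fault in one of the two additions would land here.
        panic!("silent data corruption detected");
    }
    primary
}

fn main() {
    println!("{}", checked_add(40, 2)); // prints 42 on healthy hardware
}
```

The cost is visible even in the toy: every protected operation roughly doubles in instruction count, which is exactly the software overhead the paragraph above laments.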

The Physical Oracle: DARPA's COOP Program

If compiler-driven mitigations are a reactionary patch, DARPA’s COOP (Continuous-correctness On Opaque Processors) program, another initiative under Wallach's purview, is the structural cure.

COOP seeks to guarantee continuous computing correctness by effectively treating the processor as a black box and monitoring its physical exhaust. Every time a processor executes an instruction, it emits a physical signature—a microscopic fluctuation in power draw, an electromagnetic emission, or a thermal variation. COOP aims to establish a baseline analog signature for digital correctness.

Instead of relying on the software to double-check the hardware, COOP utilizes these physical side-channels as an undeniable oracle. If an SDC occurs, or if a highly sophisticated cyber attack attempts to alter the execution flow of a supposedly secure program, the physical electromagnetic footprint of the chip will instantly shift. By mathematically analyzing these physical emissions in real time, COOP can detect and theoretically correct silent errors and malicious deviations before they corrupt the system state.
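
As a rough intuition for what such an oracle might do, consider this hypothetical Rust sketch: compare a live trace against a known-good baseline for the same instruction sequence and flag the execution when the deviation exceeds a calibrated threshold. The trace values, units, and threshold here are all invented for illustration; COOP's actual analysis is far more sophisticated.

```rust
// Hypothetical side-channel check: mean squared deviation between a
// baseline signature and an observed trace of the same computation.

fn deviation(baseline: &[f64], observed: &[f64]) -> f64 {
    baseline
        .iter()
        .zip(observed)
        .map(|(b, o)| (b - o).powi(2))
        .sum::<f64>()
        / baseline.len() as f64
}

fn execution_looks_correct(baseline: &[f64], observed: &[f64], threshold: f64) -> bool {
    deviation(baseline, observed) < threshold
}

fn main() {
    let baseline = [1.0, 1.2, 0.9, 1.1]; // calibrated "golden" signature
    let healthy = [1.01, 1.19, 0.92, 1.1];
    let tampered = [1.0, 2.4, 0.9, 1.1]; // an altered instruction shifts the footprint
    assert!(execution_looks_correct(&baseline, &healthy, 0.05));
    assert!(!execution_looks_correct(&baseline, &tampered, 0.05));
    println!("ok");
}
```

The appeal of the approach is that the oracle sits outside the processor's own trust boundary: the software and even the microarchitecture can lie, but the physics cannot.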

Building a Resilient Architecture

The arc from Herbert Bos’s warnings to Dan Wallach's DARPA initiatives illustrates a profound evolution in how we must secure the Internet. The days of simply patching the latest buffer overflow are coming to an end.

By utilizing TRACTOR to enforce structural verification at the software layer, and leveraging COOP to enforce physical verification at the hardware layer, we are moving toward a future where computing infrastructure is resilient by design. It is the ultimate realization of the software independence we chased two decades ago—a world where trust is not assumed, but mathematically and physically guaranteed.