What the Model Has to Forget: Selective Forgetting as a Public Health Tool


Case of the Week  ·  Min Wu, PhD  ·  ai-public-health.com

I took the Gaokao — China's national college entrance exam — many years ago. By any reasonable accounting, I should be done with it. I passed. I went to university. I built a career. The exam ended for me decades ago.

But every now and then, I still wake up from a dream about it. The dream is usually some version of the same scene: I am about to walk into the exam hall and I have forgotten my ID. Or I cannot find the room. Or the test has already started and I am running. The dream does not appear because anything in my current life resembles the Gaokao. It appears because the memory was trained into me at a moment when the stakes were enormous, and it does not seem to know that those stakes are gone.

I cannot decide to stop having that dream. I cannot reason my way out of it. The memory is not in a notebook I can close. It is somewhere deeper, in patterns my mind absorbed so completely that it still surfaces them on its own.

I tell that story because it is what came to mind when I was thinking about the question I left open at the end of last week's post.

The previous post was about diagnostic bias in AI tools used in medicine and public health. The argument was that the bias did not start with the model. It was inherited from the data the model learned from — data built decades ago, on a default "universal patient" that was never universal. The detective work was tracing the noise from the kitchen back to the basement, where the actual leak was.

That post diagnosed. This one is about what comes after the diagnosis.

Once you have found the leak, what do you do? You cannot un-train the models that already exist. You cannot un-publish the trial datasets that were built without women, or without dark-skinned patients, or without patients from low-income regions. The data is already there. The models trained on that data are already deployed. Forward-only solutions — better future trials, better future practices — leave the contaminated training corpus in place, and the self-reinforcing loop continues without interruption.

The fix has to be retrospective. The model has to forget. And as the Gaokao dream reminds me, forgetting something that was deeply trained is not a simple act of will. It is a technical problem.

From databases that never forget to models that must

For most of my career in health informatics, the goal of a good data system was simple: do not forget. The database era — roughly the late 1990s through the 2010s — was built on the premise that the value of digital memory was in its perfection. PubMed, CDC WONDER, ClinicalTrials.gov, VAERS, SEER. These projects defined what a good public health data infrastructure looked like, and they all shared the same memory ideal: store everything, lose nothing, retrieve exhaustively.

That ideal made sense in the database era because the database was a library. You walked in, looked something up, walked out. The data sat still while you used it.

AI memory does not sit still. AI memory is closer to a training gym than a library. The same data that gets stored also shapes the model that learns from it — and once it has shaped the model, it is not easy to remove. The model has absorbed the patterns. The weights of the network now encode what was in the data, including what was wrong with it. If the data underrepresented women, the model has learned women as the exception. If the data overrepresented light-skinned patients, the model has learned light skin as the default.

The bias is no longer in a row of a database that you can edit. It is distributed through the weights of a system that has already been trained. It is more like a Gaokao dream than a notebook entry. It surfaces on its own, in patterns we did not consciously decide to keep.
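To make the library-versus-gym distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy rows, the idea that an audit flagged the last two of them, and the helper names train and predict. It shows only that deleting rows from a database leaves an already-trained model untouched. Retraining on the cleaned rows appears here as the simplest baseline, not as a recommended forgetting method.

```python
import math

def train(records, epochs=200, lr=0.1):
    """Fit a one-feature logistic regression by plain gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in records:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / len(records)
        b -= lr * grad_b / len(records)
    return w, b

def predict(model, x):
    """Return the model's predicted probability for feature value x."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Hypothetical toy "database" of (feature, label) rows.
# Assumption: an audit later flags the last two rows as flawed.
database = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 0), (1.0, 0)]

model = train(database)            # the weights absorb every row, flawed or not
before = predict(model, 0.9)

# Library-style fix: delete the flawed rows from the database.
del database[4:]

after = predict(model, 0.9)        # the stored weights never saw the deletion
retrained = train(database)        # only retraining reflects the cleaned rows

print("prediction unchanged by database edit:", before == after)
print("weights differ after retraining:", retrained != model)
```

The edit to the database is instant, but the model keeps answering from what it learned earlier. For a model this small, retraining is trivial; for a large deployed system it generally is not, which is why forgetting becomes a design problem rather than a cleanup task.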

Which means the rules for managing public health data are changing. Forgetting, which the database era treated as a weakness, is becoming a design requirement.

In the database era, the goal of memory was that nothing be forgotten. In the AI era, one of the most important things a public health AI system can do is forget the right thing.
