A patient's name in a clinical note. Their voice in an ASR transcript. A waveform header with their DOB. Three separate records, each one low-risk by itself. Together, they're enough to re-identify someone.
Per-record de-identification doesn't see this. It can't. It has no memory of what came before.
We built a system that does. AMPHI tracks PHI exposure across modalities and time, maintains a risk score per patient, and escalates masking automatically as exposure accumulates. When a text record and an audio record co-reference the same patient via embedding similarity, the system catches it and responds before the next record arrives.
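To make the mechanism concrete, here is a minimal sketch of cumulative, cross-modal exposure tracking. It is not AMPHI's actual code. The `ExposureTracker` class, the 0.9 linking threshold, the three-step policy ladder, and the noisy-OR accumulation rule are all assumptions chosen for illustration. Each record arrives as an embedding plus a per-record risk estimate. Records whose embeddings are similar enough are linked to the same patient, and that patient's running score decides the masking level.

```python
import math
from collections import defaultdict

# All names and numbers here are illustrative assumptions, not AMPHI's values.
LINK_THRESHOLD = 0.9  # cosine similarity at which two records are linked to one patient
POLICY_LADDER = [(0.0, "weak_mask"), (0.5, "pseudonymize"), (0.8, "redact")]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ExposureTracker:
    """Keeps a running risk score per patient across records and modalities."""

    def __init__(self):
        self.embeddings = defaultdict(list)  # patient key -> embeddings seen so far
        self.risk = defaultdict(float)       # patient key -> cumulative risk in [0, 1]

    def _link(self, embedding):
        """Return the key of a previously seen patient this record co-refers to, or None."""
        for key, seen in self.embeddings.items():
            if any(cosine(embedding, e) >= LINK_THRESHOLD for e in seen):
                return key
        return None

    def observe(self, embedding, record_risk):
        """Fold one record into the patient's score; return (key, risk, policy)."""
        key = self._link(embedding)
        if key is None:
            key = f"patient-{len(self.embeddings)}"
        self.embeddings[key].append(embedding)
        # Noisy-OR accumulation (an assumed rule): risk only grows as records pile up.
        self.risk[key] = 1.0 - (1.0 - self.risk[key]) * (1.0 - record_risk)
        # Pick the strictest policy whose threshold the cumulative risk has reached.
        policy = max((t, name) for t, name in POLICY_LADDER if t <= self.risk[key])[1]
        return key, self.risk[key], policy


# Three individually low-risk records about the same (hypothetical) patient.
tracker = ExposureTracker()
stream = [
    ("text", [1.0, 0.0, 0.1]),
    ("audio", [0.98, 0.05, 0.12]),
    ("waveform_header", [0.99, 0.02, 0.09]),
]
decisions = [tracker.observe(emb, record_risk=0.45) for _, emb in stream]
for (modality, _), (key, risk, policy) in zip(stream, decisions):
    print(modality, key, round(risk, 3), policy)
```

No single record in this toy stream crosses a threshold by itself. The escalation comes entirely from linking the three records to one patient and accumulating their risk, which is the step a per-record de-identifier cannot take.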
The core results: the adaptive policy holds privacy at 0.991 on high-risk bursty workloads while keeping utility at 0.847. Static redaction matches the privacy number but destroys utility. Static weak masking keeps utility but leaks on high-risk bursts. The adaptive system doesn't trade one for the other.
Full system is open-source. Five models, three datasets, two demo spaces, 141 passing tests.
Spaces: [AMPHI Demo]( vkatg/amphi-rl-dpgraph) | [DCPG Scorer]( vkatg/dcpg-scorer-demo)
Models: [DCPG Encoder]( vkatg/exposureguard-dcpg-encoder) | [Cross-Modal Risk Scorer]( vkatg/dcpg-cross-modal-phi-risk-scorer) | [PolicyNet]( vkatg/exposureguard-policynet) | [FedCRDT]( vkatg/exposureguard-fedcrdt-distill) | [DAGPlanner]( vkatg/expsoureguard-dagplanner)
Datasets: [Multimodal PHI Masking]( vkatg/multimodal-phi-masking-benchmark) | [Streaming De-ID Benchmark]( vkatg/streaming-phi-deidentification-benchmark) | [DAG Remediation Traces]( vkatg/dag_remediation_traces)
Code: [phi-exposure-guard on GitHub](https://github.com/azithteja91/phi-exposure-guard)