privacy-judge · DPA-defensible PII-prevalence audit at a scale where exhaustive scanning is intractable

Product

A calibrated prevalence bound you can defend.

Not "we find your PII": every scanner does that, and cheaper. The artifact is a misclassification-corrected estimate of corpus-wide PII prevalence with a confidence interval, stratified and drift-corrected. It is a calibrated, documented bound, built to align with EU regulator guidance on identifiability (Art. 29 WP216) and on AI-audit accuracy (ICO), and one a DPO can stand behind once it has been validated on your corpus. None of thirteen incumbent scanners emits this.

Runs local / BYOK · GDPR 3-tier taxonomy · Singling-out verdict · $364/cycle, fixed in corpus size

Packaging

Three tiers. The first two are the flagship engagement.

Flagship · entry

Prevalence Diagnostic

$45-65k
one-time

Sampled, stratified, misclassification-corrected corpus-prevalence audit with a CI, the GDPR 3-tier taxonomy (direct / quasi-indirect / Art. 9), a document-level singling-out verdict, and local/BYOK execution. A DPA-defensible report.

Engage

Flagship · recurring

Drift-Monitoring Retainer

$2-12k/mo
recurring

The drift sentinel as a service. We re-run the sample on cadence, detect when the live judge-score distribution diverges from calibration, re-calibrate, and re-issue the bound. Your audit number stays valid as the corpus shifts.
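One way to picture the drift check is a two-sample comparison between the judge scores seen at calibration time and the scores seen on the live sample. The sketch below is illustrative only: the page does not say which divergence test the sentinel uses, so the Kolmogorov-Smirnov statistic, the function names, and the 1.36 coefficient (the usual large-sample critical value for a 5% level) are all assumptions made for the example.

```python
import math
from bisect import bisect_right


def ks_statistic(calibration_scores, live_scores):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    cal = sorted(calibration_scores)
    live = sorted(live_scores)
    if not cal or not live:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    for x in set(cal) | set(live):
        cdf_cal = bisect_right(cal, x) / len(cal)
        cdf_live = bisect_right(live, x) / len(live)
        gap = max(gap, abs(cdf_cal - cdf_live))
    return gap


def has_drifted(calibration_scores, live_scores, coeff=1.36):
    """Flag drift when the KS gap exceeds the large-sample critical value.

    coeff=1.36 is an assumed setting (roughly a 5% significance level).
    """
    n, m = len(calibration_scores), len(live_scores)
    critical = coeff * math.sqrt((n + m) / (n * m))
    return ks_statistic(calibration_scores, live_scores) > critical
```

A flag from a check like this would be the trigger to re-calibrate and re-issue the bound; it says the score distribution moved, not why it moved.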

Engage

Adjunct · self-serve

audit-that-ships

Low
per-commit / self-serve

Per-commit fairness artifact: a tamper-evident, commit-SHA-keyed, framework-mapped pass/fail record fired on push. Sells the artifact and the cadence, not the math.
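As a rough illustration of what "tamper-evident" and "commit-SHA-keyed" can mean in practice, the sketch below chains each pass/fail record to the digest of the previous one, so editing any earlier record breaks verification. The record fields (`commit_sha`, `framework`, `passed`, `prev_digest`) and both function names are hypothetical, not the product's actual schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first record


def _digest(body):
    # Canonical JSON so the same record always hashes to the same value.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain, commit_sha, framework, passed):
    """Append a pass/fail record linked to the previous record's digest."""
    body = {
        "commit_sha": commit_sha,
        "framework": framework,  # hypothetical control or framework label
        "passed": bool(passed),
        "prev_digest": chain[-1]["digest"] if chain else GENESIS,
    }
    chain.append({**body, "digest": _digest(body)})
    return chain


def verify_chain(chain):
    """Return True only if every record is intact and correctly linked."""
    prev = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "digest"}
        if record.get("prev_digest") != prev or record.get("digest") != _digest(body):
            return False
        prev = record["digest"]
    return True
```

A chain like this only detects tampering if the latest digest is also stored somewhere the editor of the log cannot rewrite, for example alongside the commit itself.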

Buy (Stripe — placeholder)
TODO (before launch): the "audit-that-ships" button points at a PLACEHOLDER Stripe Payment Link. Replace buy.stripe.com/test_PLACEHOLDER_REPLACE_ME with the real link. Do NOT wire a live Stripe backend here.

Why this, not Google SDP

A raw scan count is uncorrected. The corrected bound is the statement you can defend.

Google Cloud Sensitive Data Protection profiles at $0.03/GB and returns a per-finding likelihood. At petabyte scale that means billions of individual findings that no one can adjudicate, and a DPO cannot stand behind count(findings). We sample, correct the flagged rate for the detector's own false-positive and false-negative rates, and bound corpus prevalence with a confidence interval. The human-adjudication cost is roughly constant in corpus size, because it scales with the sample and not with the corpus. That is the edge, and it only exists at scale.
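The correction step can be sketched with the standard Rogan-Gladen estimator, which maps the fraction of sampled documents the detector flags to an estimate of true prevalence given the detector's sensitivity and specificity. This is a minimal sketch under stated assumptions, not the engagement's actual method: the function name is hypothetical, sensitivity and specificity are treated as known constants (a real audit would also propagate their uncertainty from the gold-label sample), and the interval is a simple Wald interval on the flagged rate pushed through the same correction.

```python
import math


def corrected_prevalence(flagged, sampled, sensitivity, specificity, z=1.96):
    """Rogan-Gladen corrected prevalence with a Wald-style interval.

    Assumption: sensitivity and specificity are fixed, known values.
    Returns (point_estimate, lower_bound, upper_bound), each clipped to [0, 1].
    """
    if sampled <= 0 or not 0 <= flagged <= sampled:
        raise ValueError("need 0 <= flagged <= sampled and sampled > 0")
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("detector is no better than chance; correction undefined")

    apparent = flagged / sampled  # raw flagged rate in the sample
    std_err = math.sqrt(apparent * (1.0 - apparent) / sampled)

    def correct(rate):
        # Remove the false-positive share, then rescale by detector quality.
        return min(1.0, max(0.0, (rate + specificity - 1.0) / youden))

    return (
        correct(apparent),
        correct(apparent - z * std_err),
        correct(apparent + z * std_err),
    )
```

With illustrative inputs of 120 flagged documents out of 1,000 sampled, sensitivity 0.90 and specificity 0.95, the raw rate of 12% corrects to roughly 8.2%, with an interval of about 5.9% to 10.6%. The sample size, not the corpus size, drives the width of that interval.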

Scope a diagnostic

Tell us the corpus and the regulatory exposure. We come back with a sampling plan, a gold-label budget, and a fixed price.

All demo data is synthetic. We never ship real PII to detect PII.

Validated so far on dev splits of synthetic and Presidio-style corpora, not yet on a live customer corpus. The first engagement is that validation, and we sell it as such. The interval bounds sampling error under the stated method. It does not bound disagreement over what counts as personal data under the GDPR tiers; that question is settled with your annotators during the diagnostic.