EU-Cookies Compliance Audit Czech News Center · Snapshot 2026-04-27
Snapshot 2026-04-27 · Findings 28 · Critical 8

Brief · 01

What this is.

The EU-Cookies Compliance Monitor visits every CNC website on a schedule, attempts to accept the cookie banner, and records the tracking activity it observes — both before and after consent. Its purpose is to give us, on demand, the evidence to show that our properties only track users with valid consent, in line with GDPR and the ePrivacy Directive. This brief explains what the tool does well today, where it falls short, and what needs to change. It is written for leadership, product, and legal — not for engineering.

  • 28: issues identified
  • 8: need immediate fixing
  • 100%: CNC properties scanned
  • ~1 quarter: to remediate

Today, the tool tells us where we might be exposed. It does not yet give us the evidence to defend ourselves if we are. Closing that gap changes our position from "we believe we are compliant" to "here is the proof".

Brief · 02

What works today.

The tool is reliable enough for internal monitoring and engineering triage. The list below describes its current capabilities — what we can and do depend on right now.

  • Automatic scanning. Every CNC web property is visited on a recurring schedule. No human has to launch a scan.
  • Cookie and storage detection. Each scan records the cookies and local browser-storage activity it observes during the visit.
  • Before / after consent split. Each observation is labelled as before or after the consent click, so reports can compare the two states.
  • Weekly summary. A digest of the week's findings is delivered automatically to leadership channels.
  • Engineering triage signal. Obvious problems — for example, an unfamiliar third-party tracker — surface in time for engineering to investigate.

For internal use, the tool earns its keep. The remainder of this brief is about the gap between useful internally and defensible in a regulatory audit.

Brief · 03

Three critical problems.

These are the items that, in our judgement, would be raised first by a Data Protection Authority reviewing the tool's output. Each is stated in business terms; the technical detail lives in the Technical Audit.

A

We cannot reliably prove when tracking starts.

Every observation is labelled as before consent or after consent. The way that label is decided depends on millisecond-level timing inside the browser, and it can be wrong in either direction. Some observations marked "after" may actually have happened "before", and the reverse.

The label is the foundation of every compliance claim we make. The foundation is not as solid as the report makes it look.

What this means for us: any number we report — "X trackers fire before consent" — sits on shaky ground.

B

We do not keep the full original evidence.

When a tracking value is recorded, only the first 50 characters are stored in the central database, and the cookie name is rewritten into a simplified form. The full version exists in side files, but the database — which is what the reports query — does not match what the website actually did.

A regulator who asks "show me what the site set" sees a summary. The original is no longer in the system of record.

What this means for us: we can detect issues but we cannot evidence them at the level a regulator expects.

C

We do not see all forms of tracking.

The tool monitors the two oldest types of browser storage. Modern advertising platforms, analytics tools, and most consent management vendors now use newer storage technologies that the tool does not look at. They could be active on our pages and we would not know.

The tool also captures only the click that accepts consent — not the standardised consent record that browsers and advertising partners rely on. We know consent happened; we do not record what consent was.

What this means for us: the figures we report today are a floor, not a ceiling.

Brief · 04

The risk landscape.

The three problems above translate into three kinds of business risk. Each is independent of the others, and each is material on its own.

Regulatory

The question we cannot answer.

A Data Protection Authority asking "show me every non-essential tracker set on a CNC site before the user consented" cannot be answered today with confidence. The classification is unstable. The evidence is partial. The records are mutable.

Legal

A weak position under challenge.

If our compliance is challenged, our position rests on records that have been silently shortened, partly rewritten, and may be misclassified. Cross-examined under those conditions, the defence is weak. Any remediation order would start with re-collection — months of work.

Reputational

A narrative we cannot afford.

"They said they were compliant, but could not prove it." That sentence damages trust simultaneously with users, advertising partners, and regulators. CNC's editorial credibility is a competitive asset; any compliance episode that calls it into question carries cost beyond the legal one.

Brief · 05

What we need to do.

Six high-level actions, each addressing a specific gap above. Together they take the tool from internal monitoring to defensible evidence. The technical audit lays out the order and the implementation; what follows is a leadership-level view.

  1. Make the consent boundary precise.

    Record the exact moment the user consents. Tag every observation with the time it happened. Decide which side of the boundary it falls on at report time, not at collection time.

  2. Keep the original evidence intact.

    Stop shortening cookie values in the database. Store the cookie name as observed, alongside any normalised form. The database becomes the system of record.

  3. Record what consent was actually given.

    Capture the standardised consent record that browsers and ad partners rely on — not just "a button was clicked".

  4. Extend coverage to modern tracking storage.

    Or, at minimum, disclose in our methodology what the tool does not look at, and what that excludes. Silent gaps are worse than disclosed ones.

  5. Make every scan tamper-evident.

    Each scan produces a sealed, signed evidence pack with a verifiable fingerprint. We can prove, on demand, that the data has not been edited after the fact.

  6. Record what happened on each scan.

    Did the page load? Was the consent banner found? Which one? Was consent accepted? The current report has rows of cookies but no run-level context. Add it.

Brief · 06

What good reporting looks like.

Engineering reads these reports to find something broken. A Data Protection Authority reads them for proof we can hand over without a footnote. The reports we produce today are fine for the first reader. They aren't built for the second.

What we produce today, and what's missing.

Each scan writes per-site CSV files to disk and inserts rows into a central database. Both are queryable when an issue surfaces. What's missing is a self-contained pack per scan that someone outside the team can open, with the data signed so anyone can verify it hasn't been changed. We also don't produce a PDF, which is the form most regulators expect a report to arrive in.

A complete report answers three questions.

  1. What happened?

    Every cookie and tracking item recorded during the visit, with the original value intact and the source it came from.

  2. When did it happen?

    The exact moment of each event, and which side of the consent boundary it falls on — before consent, after consent, or with no valid consent.

  3. Can this be trusted?

    The data has not been edited since the scan ran. The run can be reproduced. The evidence is signed.

The evidence package.

Each scheduled scan produces one sealed ZIP. The same pack is the source of every downstream report — leadership digest, regulator submission, legal review.

  • runs.csv — how the scan was configured and executed (browser version, geography, software fingerprint, timing).
  • visits.csv — what happened on each site: consent state, banner detected, errors, timestamps.
  • observations.csv — every tracking event observed, with raw values and source.
  • screenshots/ — visual proof of each page state, including consent banner and post-consent view.
  • har/ — full network log per visit, in the standard HTTP-archive format any inspector can read.
  • manifest.json — list of every file in the pack with a cryptographic hash.
  • manifest signature — proves the manifest itself has not been altered.

The pack is uploaded to append-only storage where it cannot be edited later. Anyone — including a regulator — can verify on their own machine that the package they received is byte-identical to the package we produced.

What this looks like in practice.

Example 1 — Two-minute leadership view

Leadership digest · reflex.cz · scan 2026-04-27 13:00 UTC
  • Tracking events recorded: 42
  • Set before consent: 7
  • Set after consent: 35
  • Consent state: accepted

Headline. Seven tracking events were recorded before the consent banner was accepted; those classified as non-essential are rated high risk. Each of the seven is listed in the regulator-facing table and observations.csv with full evidence.

Example 2 — Regulator-facing table

Pre-consent observations · reflex.cz · 7 of 42 events
Cookie | When | Source | Type | Risk
_ga_HJ4PMK62Q1 | 13:24:55.812 | Google Analytics | Non-essential | High
_fbp | 13:24:56.108 | Meta Pixel | Non-essential | High
_gid | 13:24:55.901 | Google Analytics | Non-essential | High
session_id | 13:24:54.502 | First-party | Essential | OK
didomi-session | 13:24:54.610 | Didomi (CMP) | Essential | OK
+ 2 further pre-consent rows in observations.csv

Example 3 — Single-cookie evidence card

The most important artifact. The answer to a regulator's question about a specific cookie. Every field is sourced directly from one column of the evidence package — no interpretation, no derivation beyond the consent-phase label, which is itself reproducible from the timestamps below.

Cookie: _ga_HJ4PMK62Q1 (normalised: _ga_*)
Site: reflex.cz (https://www.reflex.cz/)
Source: Google Analytics (gtag.js)
Observed at: 2026-04-27 13:24:55.812
Consent accepted at: 2026-04-27 13:25:01.044
Phase: Set before consent (5.2 s before the user accepted the banner)
Type: Non-essential · analytics
Domain: .reflex.cz
Expires: 2027-04-27 (1 year)
Value (head): GS1.1.1714224295.1.0.1714224295…
Value SHA-256: 7b9a3c3f…d28e
From run: std-7c4f9a32-1b25-4ea2-9b6c-77f2a9c0b1a4

Example 4 — Evidence integrity

Run ID: std-7c4f9a32-1b25-4ea2-9b6c-77f2a9c0b1a4
Date: 2026-04-27 13:00 UTC
Files in pack: 118 (CSVs · screenshots · HAR · manifest)
Manifest SHA-256: a91f4d66bc8e5d1ae772f3c4a90d11e2…c82d
Signature: Verified · published to Sigstore Rekor (transparency log)
Storage: Append-only object storage · 6-year retention lock

Anyone holding the package and the public key can verify that what we delivered is byte-identical to what we produced on 2026-04-27. If a single character of any file changes after the fact, verification fails. This is the integrity guarantee a regulator expects, and the one we cannot offer today.

Today versus target.

Question | Today | Target
What did the site set? | Per-site CSVs plus shortened database rows that disagree with each other. | One observations.csv per run with raw, full values.
Was consent given, and how? | "The button was clicked", nothing more. | Recorded consent state, vendor, and standardised consent record.
Can this be reproduced? | No software-version or geographic data on the run. | Reproducibility metadata in runs.csv.
Has the report been edited? | No mechanism to tell. | Signed manifest in append-only storage; verifiable by anyone.

What this enables.

  • We can answer regulators directly. "Here is the exact tracking before consent on this run, in machine-readable form, and here is the integrity proof."
  • We can defend our position. "Here is the proof of what consent was given and exactly when, and here is the gap between the two."
  • We can reproduce any past result. "Here is the configuration that produced this evidence; anyone with the same tool version can re-run it."

What happens if we do not build this.

  • We can describe compliance. We cannot evidence it to a regulator's standard.
  • Findings can be challenged — and we have no signed record to push back with.
  • If a remediation order arrives, the first step is "re-collect everything" — months of additional work before we even start fixing.

A report is not enough. We need defensible evidence. That is the single line that separates today's monitoring from tomorrow's audit posture.

Brief · 07

Where this leaves us.

The tool tells us where we might be exposed. It does not yet give us the evidence to defend ourselves if we are. Closing that gap is roughly a quarter of focused engineering work, and it changes the conversation from "we think we are compliant" to "here is the proof".

For the engineering and legal review, continue to the Technical Audit. It contains the full evidence trail, the verified findings with severity ratings, the proposed data model, and the deployment plan.

Section 01

What this is, in one paragraph.

The EU-Cookies Compliance Monitor visits every CNC web property, attempts to accept the consent banner, and records every cookie and LocalStorage write it observes — before and after consent — into MariaDB and CSV. This site is the result of a code-level audit of that tool. It documents how the tool works today, where the logic falls short under regulatory scrutiny, and the concrete plan to make the evidence defensible to a Data Protection Authority.

  • 28: verified findings
  • 32: codebase facts catalogued
  • 8: findings rated P0
  • 10: findings rated P1

The three risks a regulator will raise first

  1. The pre/post-consent classification is not reconstructable.

    A mutable closure variable is read inside async event handlers after one or more await points. A pre-consent Set-Cookie can be filed as post-consent, and vice versa. The legal split that drives every ePrivacy assessment sits on top of that flag.

    F-001, F-005 · src/cookies-checker.ts:166–182, src/lib/EdpsCookieStore.ts:286–296

  2. Primary evidence is destroyed at write time.

    The cookie value is truncated to 50 characters before insert; the cookie name is rewritten to its regex-normalised form; the original observed name is never stored. The DB is not the cookie the site set — it is a summary.

    F-009, F-010 · src/lib/EdpsCookieStore.ts:21–29, 486, 508–518

  3. Storage scope is narrower than the headline implies.

    The interceptor hooks only document.cookie and window.localStorage. Modern trackers and most CMPs use IndexedDB; the IAB TCF v2.2 TC-string is never captured. There is no run-level metadata table to anchor reproducibility.

    F-014, F-016, F-025 · preload/preload-trace.js, src/tools/dbcreate.ts

P0 · 8 | P1 · 10 | P2 · 5 | P3 · 5

28 findings, all keyed to verified file:line evidence in docs/audit/state.json.

Section 02

How the scanner works today.

Each script invocation produces one pipeline_id and walks every site listed in config/sites-config-full.json sequentially. Per site, the flow below runs end-to-end. The numbered steps are mirrored exactly in src/cookies-checker.ts.

  1. Browser launch

    One Chromium instance for the whole script run. --test-third-party-cookie-phaseout is passed in every mode (standard / stealth / headed).

    src/cookies-checker.ts:80–95 · F-015

  2. Per-site context, JS interceptors injected

    A new browser context (UA pinned to Chrome 126, viewport 1920×1080) is created. preload/preload-trace.js overrides document.cookie's setter and proxies window.localStorage. Two listeners are attached: context.on('console') and context.on('response').

    src/cookies-checker.ts:115–164 · S-006, S-001

  3. Page load + 5 s wait

    Page is opened with waitUntil: 'domcontentloaded', then the script sleeps 5 000 ms hoping the consent banner has rendered.

    src/cookies-checker.ts:170–171 · S-004

  4. Pre-consent JAR snapshot, tagged afterConsent=false

    Native context.storageState() is dumped. Every cookie and every LocalStorage entry sitting in the jar at this moment is recorded.

    src/cookies-checker.ts:173–177 · S-009

    P0 The flag is a closure variable, not a timestamp boundary. Cookies arriving in steps 2–3 may be classified inconsistently with this snapshot. F-001
  5. Consent click

    Inside the page, document.querySelector('button[id*="didomi-notice-agree-button"], button[id*="cpexSubs_consentButton"]')?.click(). On success, the closure variable afterConsent flips to true.

    src/lib/uniweb-site.ts consent() · S-002

    P1 Selector matches Didomi and CPEx only. OneTrust, Cookiebot, TrustArc, Sourcepoint, iframe-hosted banners all fall through silently. F-003
    P0 If the click throws, afterConsent never flips and the run continues — the post-consent dump is then mis-labelled as pre-consent. F-002
  6. 10 s wait

    Hard-coded adLoad = 10_000. Late-loading SDKs that finish after this window are missed.

    src/cookies-checker.ts:23, 199 · F-004

  7. Post-consent JAR snapshot, tagged afterConsent=true

    Second context.storageState() dump. The fix from Finding 16 prevents double-counting cookies that persisted from snapshot 1, but the type field of merged rows still reflects the first source seen.

    src/cookies-checker.ts:201–205 · S-009

    P2 JAR provenance is lost when collapsed onto an HDR/JS row. F-006
  8. CSV write + DB insert

    CSV at csvoutput/cookiesoutput-<site>.csv writes the full cookie value. The MariaDB cookies table inserts truncate(c.value, 50) — first 50 chars + .... The cookie name written to both is the regex-normalised form, not the as-observed name.

    src/lib/EdpsCookieStore.ts:435–506 · S-012, S-013

    P0 CSV and DB diverge. Observed cookie name is never persisted. F-009 / F-010

How records are de-duplicated

The dedup logic decides what counts as "the same cookie observed twice". The four code paths use four different keys.

Path | Dedup key | Source | Issue
addCookie (JS, HDR) | name + domain + afterConsent | EdpsCookieStore.ts:286–296 | P0 domain="" collision with JAR
addPlaywrightStorageState cookies (JAR) | name + domain | EdpsCookieStore.ts:393–405 | P2 Loses type provenance on merge
addPlaywrightStorageState LS (JAR) | name + host | EdpsCookieStore.ts:417–431 | Asymmetric with event-driven LS path
addLocalStorage (event) | name + domain + afterConsent (domain always "") | EdpsCookieStore.ts:337–350 | P1 Same key counted twice across paths

What ends up in the database today

One MariaDB table — cookies — carries every observation. There is no runs table, no site_visits table, and no run-level metadata is recorded against the pipeline_id.

CREATE TABLE `cookies` (
  `id`            int AUTO_INCREMENT,
  `timestamp`     datetime,
  `pipeline_id`   varchar(128),
  `site_id`       varchar(128),
  `type`          varchar(10),         -- JS / HDR / LS / JAR
  `host`          varchar(128),        -- divergent semantics across sources
  `cookie_source` text,                -- the cookie's Domain attribute
  `cookie_name`   varchar(100),        -- regex-NORMALISED, raw is lost
  `cookie_value`  text,                -- TRUNCATED to 50 chars on insert
  `source`        text,                -- the URL at observation time
  `path`          varchar(45),         -- silently truncates long paths
  `expires`       datetime,            -- includes deletion sentinels (1970)
  `http_only`     tinyint(1),
  `secure`        tinyint(1),
  `same_site`     varchar(20),
  `callstack`     text,
  `known`         boolean,
  `known_from`    varchar(10),
  `after_consent` tinyint(1)           -- can be wrong: race + failed-consent fallback
);

Source: src/tools/dbcreate.ts:20–43, src/lib/EdpsCookieStore.ts:475

Section 03

Verified flaws.

Each flaw below has been re-verified against the codebase and anchored to a specific file:line. Severity is rated by the question "would this fail under DPA scrutiny?", not by code-quality impact.

F-001
P0

Race between afterConsent flip and async event handlers

The closure variable afterConsent is read inside async context.on('console') and context.on('response') handlers after one or more await points. A pre-consent Set-Cookie response can be filed as post-consent if the handler resolves after the flag flip — and vice versa.

Exploit: response arrives at T-50 ms; handler awaits data.headerValue('set-cookie'); resumes at T+10 ms when the flag has already flipped. Cookie recorded with afterConsent=true.

Fix: snapshot the flag at event entry — const ac = afterConsent; await ...; addCookie(ac, ...). Persist consent_accepted_at once and observed_at per cookie; derive after_consent at read time.

src/cookies-checker.ts:125–148, 152–163, 166, 182
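
A minimal sketch of the race and of the entry-snapshot fix, under simplifying assumptions: createConsentScanner, onResponseRacy, onResponseFixed and readHeader are hypothetical stand-ins for the real handlers in src/cookies-checker.ts, and the awaited header read is modelled as an arbitrary promise.

```typescript
// Hypothetical, simplified stand-in for the response handler described in F-001.
type RecordedCookie = { name: string; afterConsent: boolean };

function createConsentScanner() {
  let afterConsent = false; // mutable closure flag, flipped by the consent click
  const recorded: RecordedCookie[] = [];

  // Racy shape: the flag is read after the await, so a late resume sees the flipped value.
  async function onResponseRacy(name: string, readHeader: () => Promise<void>): Promise<void> {
    await readHeader();
    recorded.push({ name, afterConsent });
  }

  // Fixed shape: copy the flag at handler entry, before any await.
  async function onResponseFixed(name: string, readHeader: () => Promise<void>): Promise<void> {
    const ac = afterConsent;
    await readHeader();
    recorded.push({ name, afterConsent: ac });
  }

  function acceptConsent(): void {
    afterConsent = true;
  }

  return { onResponseRacy, onResponseFixed, acceptConsent, recorded };
}
```

If both handlers start before acceptConsent() and resume after it, the racy handler records afterConsent=true while the fixed one records false. The snapshot only narrows the window; persisting observed_at and deriving the phase at read time, as the fix describes, removes the dependency on the flag altogether.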
F-002
P0

Failed consent silently mapped to afterConsent=false

If the consent click throws (button missing, banner in iframe, vendor unsupported), the flag never flips. The post-consent jar dump is then tagged as pre-consent. A regulator cannot distinguish "no banner", "banner timeout", "vendor not supported", or "user rejected".

Fix: persist a consent_state enum: {accepted, rejected, failed_selector, no_banner, timeout, error}, matching the site_visits schema in Section 05. Re-derive after_consent only when consent_state == 'accepted'.

src/cookies-checker.ts:179–193
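
The read-time derivation could look roughly like the sketch below. The consent_state values follow the site_visits schema in Section 05 and the phase labels follow Section 04; derivePhase and its parameter names are hypothetical.

```typescript
// Values taken from the proposed site_visits.consent_state ENUM.
type ConsentState =
  | "accepted" | "rejected" | "failed_selector"
  | "no_banner" | "timeout" | "error";

type Phase = "before" | "after" | "no_consent";

// Derive the consent phase from stored timestamps instead of a collection-time flag.
function derivePhase(
  observedAt: Date,
  consentState: ConsentState,
  consentAcceptedAt: Date | null,
): Phase {
  // Without a recorded acceptance there is no boundary to compare against.
  if (consentState !== "accepted" || consentAcceptedAt === null) {
    return "no_consent";
  }
  return observedAt.getTime() < consentAcceptedAt.getTime() ? "before" : "after";
}
```

Because the function is pure, the same stored rows can be re-classified later if the rules change, without touching the raw evidence.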
F-003
P1

CMP detection limited to Didomi and CPEx

Selector hard-wired to button[id*="didomi-notice-agree-button"], button[id*="cpexSubs_consentButton"]. Other CMPs and iframe-hosted banners are not searched.

Fix: probe window.__tcfapi, window.OneTrust, window.Cookiebot, window.didomiOnReady, window.Sourcepoint. Record the detected vendor; fail loudly when probe and click-selector disagree.

src/lib/uniweb-site.ts consent()
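
One possible shape for the probe, as a sketch only. The global names are the ones listed in the fix above; the vendor labels, detectCmpVendors and probeDisagreesWithClick are hypothetical, and the function takes a plain object so it can be exercised outside a browser.

```typescript
// Globals from the fix above, in probe order, paired with an illustrative vendor label.
const CMP_PROBES: ReadonlyArray<[string, string]> = [
  ["__tcfapi", "tcf"],
  ["OneTrust", "onetrust"],
  ["Cookiebot", "cookiebot"],
  ["didomiOnReady", "didomi"],
  ["Sourcepoint", "sourcepoint"],
];

// Return every vendor label whose global is present on the given window-like object.
function detectCmpVendors(win: { [key: string]: unknown }): string[] {
  const found: string[] = [];
  for (const [globalName, vendor] of CMP_PROBES) {
    const value = win[globalName];
    if (value !== undefined && value !== null) found.push(vendor);
  }
  return found;
}

// True when the probe result and the click selector tell different stories.
function probeDisagreesWithClick(detected: string[], clickedVendor: string | null): boolean {
  if (clickedVendor === null) return detected.length > 0; // a CMP is present but nothing was clicked
  const specific = detected.filter((v) => v !== "tcf"); // "tcf" alone does not identify a vendor
  return specific.length > 0 && specific.indexOf(clickedVendor) === -1;
}
```

In the scanner the probe would run inside the page context and only the resulting labels would be returned; a true result from probeDisagreesWithClick is the point at which the visit should be marked failed rather than continuing silently.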
F-004
P1

Hard-coded post-consent capture window of 10 s

Trackers loaded after 10 s are missed. A regulator can ask "why 10?" and the answer today is "we picked it".

Fix: wait for networkidle after consent, then take two snapshots 10 s apart and record the per-site delta against the visit (site_visits.late_observation_count in the proposed schema).

src/cookies-checker.ts:23, 199
F-005
P0

domain="" collision: same cookie stored twice

JS/HDR cookies without an explicit Domain attribute keep domain="". The JAR cookie for the same name has the resolved domain. Dedup keys do not match across sources, so one cookie is stored twice.

Fix: in addCookie, set domain = explicitDomain ?? extractHost(src) before dedup. Keep raw_domain_attribute as a separate column.

src/lib/EdpsCookieStore.ts:127, 286–296, 393–399
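
A sketch of the domain resolution, under stated assumptions: extractHost here is a stand-in for the existing helper, resolveDomain and cookieDedupKey are hypothetical names, and the JAR is assumed to report the host for host-only cookies. Because the finding describes the missing attribute as an empty string, the sketch falls back on empty as well as undefined, which a bare ?? would not do.

```typescript
// Stand-in for the existing helper: returns "" when the URL cannot be parsed.
function extractHost(src: string): string {
  try {
    return new URL(src).hostname;
  } catch {
    return "";
  }
}

type ResolvedDomain = { raw_domain_attribute: string; domain: string };

// Keep the attribute as observed, and resolve a usable domain for dedup.
function resolveDomain(explicitDomain: string | undefined, src: string): ResolvedDomain {
  const raw = explicitDomain ?? "";
  const resolved = raw !== "" ? raw : extractHost(src);
  return { raw_domain_attribute: raw, domain: resolved.toLowerCase() };
}

// Composite key that cannot be confused by delimiter characters in either part.
function cookieDedupKey(name: string, domain: string): string {
  return JSON.stringify([name, domain]);
}
```

With this in place a host-only cookie seen via Set-Cookie on https://www.reflex.cz/ and the same cookie read from the JAR produce the same key, while the raw attribute stays available as evidence.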
F-006
P2

Type/source field is whoever-wrote-first

When the JAR snapshot collapses onto an existing HDR/JS row, only ocurrence, value, and known are updated. JAR provenance is lost.

Fix: replace the single type column with sources SET('JS','HDR','LS','JAR') or store an array of (source, observed_at).

src/lib/EdpsCookieStore.ts:393–414
F-007
P2

Known-cookies CSV loader fragile to delimiter / header changes

Dataset uses semicolons and a UTF-8 BOM. Loader passes header: false with no explicit delimiter and starts at index 1. If auto-detection ever misfires, every cookie becomes "unknown" silently.

Fix: parse(csvData, { header: true, delimiter: ';', skipEmptyLines: true, transformHeader: h => h.trim() }); assert N rows loaded; throw if zero.

src/lib/EdpsCookieStore.ts:54–66
F-008
P1

Known-cookie classification keyed by name only

_ga from a benign first-party deployment and _ga from a tracker iframe collapse to one classification.

Fix: key the lookup on (cookie_name, domain_suffix); mark ambiguous matches as known='ambiguous'.

src/lib/EdpsCookieStore.ts:47–77
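
The composite lookup might be sketched as follows. KnownEntry, its field names and the three-valued known result are assumptions made for illustration; the real dataset columns are not shown in this audit.

```typescript
// Hypothetical shape of one row in the known-cookies dataset.
type KnownEntry = { cookieName: string; domainSuffix: string; category: string };
type Classification = { known: "yes" | "no" | "ambiguous"; categories: string[] };

// True when domain equals the suffix or is a subdomain of it (leading dots ignored).
function matchesSuffix(domain: string, suffix: string): boolean {
  const d = domain.replace(/^\./, "").toLowerCase();
  const s = suffix.replace(/^\./, "").toLowerCase();
  return d === s || d.endsWith("." + s);
}

// Classify on (cookie name, domain suffix); disagreeing matches are flagged, not collapsed.
function classifyCookie(cookieName: string, domain: string, dataset: KnownEntry[]): Classification {
  const hits = dataset.filter(
    (e) => e.cookieName === cookieName && matchesSuffix(domain, e.domainSuffix),
  );
  const categories = Array.from(new Set(hits.map((e) => e.category)));
  if (categories.length === 0) return { known: "no", categories };
  return { known: categories.length === 1 ? "yes" : "ambiguous", categories };
}
```

The point of the design is that a name-only match can no longer decide the classification on its own: the same _ga name resolves differently depending on the domain it was observed under.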
F-009
P0

DB cookie_value truncated to 50 chars; CSV keeps full value

Most ad-tracking IDs are longer than 50 chars. Cannot evidence what the cookie carried; cannot recognise TCF TC-strings. The cookie_value column is TEXT in the schema — the truncation is a code policy.

Fix: drop truncate() from exportToDb; promote to MEDIUMTEXT if needed. Add cookie_value_sha256 alongside.

src/lib/EdpsCookieStore.ts:21–29, 486
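
To illustrate why truncation destroys evidence, the sketch below compares a stand-in for the current 50-character policy with the proposed full value plus hash. truncateForDb approximates the behaviour described in this audit and toEvidenceColumns is a hypothetical helper; neither is the tool's actual code.

```typescript
import { createHash } from "node:crypto";

// Approximation of the current policy: first 50 characters followed by "...".
function truncateForDb(value: string): string {
  return value.length > 50 ? value.slice(0, 50) + "..." : value;
}

// Proposed columns: the value as observed, plus a SHA-256 fingerprint of it.
function toEvidenceColumns(value: string): { cookie_value: string; cookie_value_sha256: string } {
  return {
    cookie_value: value,
    cookie_value_sha256: createHash("sha256").update(value, "utf8").digest("hex"),
  };
}
```

Two identifiers that share their first 50 characters become indistinguishable after truncateForDb, whereas their hashes differ, so the fingerprint also gives reports a compact way to reference a value without reprinting it.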
F-010
P0

Regex normalisation overwrites the observed cookie name

validateRegex sets validatedCookie.name = v.cookie_name. The DB carries _ga_* instead of the actual _ga_HJ4PMK62Q1 that was set on the page. Destructive transformation of primary evidence.

Fix: add cookie_name_raw NOT NULL column; keep cookie_name as the normalised form (or rename to cookie_name_normalized).

src/lib/EdpsCookieStore.ts:508–518
F-011
P1

Tombstone cookies stored as live

Set-Cookie deletion sentinels (Expires=Thu, 01 Jan 1970... or Max-Age=0) are recorded with their parsed expiry. Over-reports tracker persistence.

Fix: detect expires < now or Max-Age <= 0; set is_deletion=1; exclude from persistent-cookie counts.

src/lib/EdpsCookieStore.ts:150–157
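
Tombstone detection on a raw Set-Cookie line could be sketched as below. isDeletionCookie is a hypothetical name, and the parsing is deliberately minimal; it assumes one cookie per header string and lets Max-Age take precedence over Expires, as RFC 6265 specifies.

```typescript
// Decide whether a Set-Cookie line deletes the cookie rather than setting it.
function isDeletionCookie(setCookieHeader: string, now: Date): boolean {
  let maxAge: number | null = null;
  let expires: number | null = null;

  // Skip the first segment (name=value); the rest are attributes.
  const attributes = setCookieHeader.split(";").slice(1);
  for (const attribute of attributes) {
    const eq = attribute.indexOf("=");
    if (eq === -1) continue; // flag attributes such as Secure or HttpOnly
    const key = attribute.slice(0, eq).trim().toLowerCase();
    const value = attribute.slice(eq + 1).trim();
    if (key === "max-age" && /^-?\d+$/.test(value)) {
      maxAge = Number(value);
    }
    if (key === "expires") {
      const parsed = Date.parse(value);
      if (!Number.isNaN(parsed)) expires = parsed;
    }
  }

  if (maxAge !== null) return maxAge <= 0; // Max-Age wins when both are present
  if (expires !== null) return expires < now.getTime();
  return false; // session cookie: no expiry information at all
}
```

The result would feed the proposed is_deletion column, so that persistent-cookie counts can exclude rows whose only purpose was to clear a cookie.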
F-012
P2

extractHost returns empty string on parse failure (silent)

Bad URL → host="". Two unrelated bad-URL cookies collide on dedup.

Fix: log a structured warning, keep the raw src, set src_parse_error flag.

src/lib/EdpsCookieStore.ts:39–45
F-013
P1

host field has divergent semantics across sources

JS/HDR records host = extractHost(response_url). JAR records host = extractHost(siteConfig.url). The surviving host on dedup is order-dependent. The field is unreliable for first/third-party attribution.

Fix: separate into observed_via_url and observed_origin; never reuse one field for two semantics.

src/lib/EdpsCookieStore.ts:299–306, 375–385
F-014
P0

Storage coverage limited to HTTP cookies and localStorage

Not collected: window.cookieStore API, IndexedDB, sessionStorage, Cache Storage, ServiceWorker registrations. Modern trackers and most CMPs use IndexedDB.

Fix (immediate): document the scope as "cookies + localStorage". Fix (medium): per snapshot, page.evaluate(() => indexedDB.databases()) per origin; hook window.cookieStore.set.

preload/preload-trace.js
F-015
P1

Browser launched with --test-third-party-cookie-phaseout

Simulates Chrome's tracking-protection state. EU users in 2026 still receive third-party cookies because Chrome's planned phase-out was abandoned, so a scan run with this flag undercounts third-party cookies.

Fix: run two passes (with/without flag), or remove the flag and document the change.

src/cookies-checker.ts:84–95
F-016
P0

No runs / site-visits metadata table

pipeline_id is the only run grouping key but carries no metadata. Cannot answer: did the page load? was consent accepted? what UA? what TC-string?

Fix: add runs and site_visits tables — schema in Section 5.

src/tools/dbcreate.ts, src/cookies-checker.ts:16
F-017
P3

Set-Cookie multi-header split on \n only

CRLF leaves trailing \r on each token.

Fix: rspHeaders.split(/\r?\n/).map(s => s.trim()).filter(Boolean)

src/cookies-checker.ts:154–159
F-018
P3

Cookie value not trimmed

Only the name is trimmed. Per RFC 6265 the value should be too.

src/lib/EdpsCookieStore.ts:143–146
F-019
P3

Stack overwritten on dedup (last wins)

The first observation's stack is lost.

Fix: keep first-seen, or store stacks as an append-only array.

src/lib/EdpsCookieStore.ts:297–303
F-020
P3

First-regex-wins normalisation

break on first match. Two patterns matching the same name silently shadow each other.

Fix: on startup, run all regexes against a sample set; warn on overlap.

src/lib/EdpsCookieStore.ts:510–515
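
The startup check could be as small as the sketch below. NamePattern and findPatternOverlaps are hypothetical; the sample names would come from previously observed cookies, and the patterns are assumed not to carry the g flag, since RegExp.test is stateful with it.

```typescript
// One normalisation rule: the normalised name and the regex that selects it.
type NamePattern = { cookie_name: string; regex: RegExp };
type Overlap = { sample: string; matched: string[] };

// Report every sample cookie name that more than one pattern would claim.
function findPatternOverlaps(patterns: NamePattern[], samples: string[]): Overlap[] {
  const overlaps: Overlap[] = [];
  for (const sample of samples) {
    const matched = patterns.filter((p) => p.regex.test(sample)).map((p) => p.cookie_name);
    if (matched.length > 1) overlaps.push({ sample, matched });
  }
  return overlaps;
}
```

An empty result means first-match-wins and any-match-wins agree on the sample set; a non-empty one lists exactly which rules shadow each other.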
F-021
P3

Array.find dedup is O(n²)

Performance issue, not correctness. High-tracker sites can time out.

Fix: replace with Map keyed by composite dedup key.

src/lib/EdpsCookieStore.ts
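
A Map-backed index along these lines would make each lookup constant-time. ObservationIndex and the Observation fields are hypothetical and far smaller than the real record; the key is built with JSON.stringify so that no delimiter inside a name or domain can make two different pairs collide.

```typescript
// Reduced, illustrative observation record.
type Observation = { name: string; domain: string; value: string; occurrence: number };

class ObservationIndex {
  private readonly byKey = new Map<string, Observation>();

  private static keyOf(name: string, domain: string): string {
    return JSON.stringify([name, domain]);
  }

  // Insert a new observation or update the existing one for the same key.
  add(name: string, domain: string, value: string): Observation {
    const key = ObservationIndex.keyOf(name, domain);
    const existing = this.byKey.get(key);
    if (existing !== undefined) {
      existing.occurrence += 1;
      existing.value = value;
      return existing;
    }
    const created: Observation = { name, domain, value, occurrence: 1 };
    this.byKey.set(key, created);
    return created;
  }

  count(): number {
    return this.byKey.size;
  }
}
```

The same key function can then be shared by all four dedup paths, which also addresses the key disagreements listed in F-005 and F-024.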
F-022
P2

DB column widths smaller than realistic data

cookie_name VARCHAR(100), path VARCHAR(45), known_from VARCHAR(10), same_site VARCHAR(20). With strict SQL mode enabled (for example STRICT_TRANS_TABLES), the INSERT fails; without it, MariaDB silently truncates.

Fix: promote to VARCHAR(255), VARCHAR(2048), VARCHAR(64); audit existing rows for truncation.

src/tools/dbcreate.ts:27–39
F-023
P2

LEG warnings stderr-only

When a cookie matches the LEG list, only console.warn fires. Not persisted in the DB.

Fix: add is_leg TINYINT(1) column on observations; add a leg_warnings table per run.

src/lib/EdpsCookieStore.ts:263–269
F-024
P1

Event vs JAR LocalStorage dedup keys disagree on consent state

Event-driven LS dedup includes afterConsent; JAR LS dedup excludes it. Same key can be stored once or twice depending on order.

Fix: apply F-001's fix universally — dedup by (name, host); carry observation timestamp; derive consent phase at read time.

src/lib/EdpsCookieStore.ts:337–350, 417–431
F-025
P0

No TCF TC-string captured

The system records "the consent button was clicked", not "what consent was granted". The IAB TCF v2.2 TC-string carries the per-purpose consents (purposes 1–11 in v2.2) and the per-vendor consents. That is the legal evidence of the shape of consent.

Fix: after the click, call window.__tcfapi('getTCData', 2, callback) inside page.evaluate, wrapping the callback in a Promise so the result is returned to the scanner. Persist tc_string, cmp_id, cmp_version, purpose_consents, vendor_consents in site_visits.

repo-wide: no __tcfapi reference
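
Because __tcfapi reports through a callback, the capture needs a Promise wrapper. The sketch below assumes the standard (tcData, success) callback shape; getTcData, TcfApi and the timeout value are hypothetical, and the API function is passed in so the wrapper can be exercised without a browser.

```typescript
// Subset of the TCData object that the proposed site_visits columns need.
type TcData = { tcString?: string; cmpId?: number; cmpVersion?: number; [key: string]: unknown };
type TcfApi = (
  command: string,
  version: number,
  callback: (data: TcData | null, success: boolean) => void,
) => void;

// Wrap the callback-based API in a Promise, failing loudly on error or silence.
function getTcData(tcfapi: TcfApi, timeoutMs = 5000): Promise<TcData> {
  return new Promise<TcData>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error("__tcfapi did not answer in time")),
      timeoutMs,
    );
    tcfapi("getTCData", 2, (data, success) => {
      clearTimeout(timer);
      if (success && data !== null) {
        resolve(data);
      } else {
        reject(new Error("getTCData reported failure"));
      }
    });
  });
}
```

In the scanner the Promise has to be constructed inside the page.evaluate callback, because functions cannot be passed across the browser boundary; a rejection would map to a non-accepted consent_state rather than being swallowed.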
F-026
P1

No tamper-evidence on outputs

CSVs are mutable; MariaDB rows are mutable. No SHA-256, no signing, no append-only manifest.

Fix: per run — SHA-256 each file; write a signed manifest; ship to append-only object storage (S3 Object Lock or B2 with retention).

repo-wide: no createHash / signing
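
A minimal sketch of the hash-and-sign step, using only Node's built-in crypto module. The file contents are held in memory for brevity, the function names are hypothetical, and a locally generated Ed25519 key stands in for whatever signing service is eventually chosen.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

type ManifestEntry = { path: string; sha256: string; bytes: number };

// One entry per file, sorted by path so the manifest is deterministic.
function buildManifest(files: { [path: string]: Buffer }): ManifestEntry[] {
  return Object.keys(files)
    .sort()
    .map((path) => ({
      path,
      sha256: createHash("sha256").update(files[path]).digest("hex"),
      bytes: files[path].length,
    }));
}

// Demo key only; a real deployment would load a managed key instead.
function createDemoKeyPair(): { publicKey: KeyObject; privateKey: KeyObject } {
  return generateKeyPairSync("ed25519");
}

// Ed25519 signs the message directly, so the algorithm argument is null.
function signManifest(manifestJson: string, privateKey: KeyObject): Buffer {
  return sign(null, Buffer.from(manifestJson, "utf8"), privateKey);
}

function verifyManifest(manifestJson: string, signature: Buffer, publicKey: KeyObject): boolean {
  return verify(null, Buffer.from(manifestJson, "utf8"), publicKey, signature);
}
```

Verification needs only the manifest, the signature and the public key, which is what lets a third party check the pack on their own machine; changing a single byte of the manifest makes verifyManifest return false.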
F-027
P1

No geographic context recorded

Many EU sites alter cookie behaviour by visitor geo. Egress IP / country are not stored.

Fix: at run start, record egress_ip and inferred country_code in runs.

src/cookies-checker.ts
F-028
P1

Report tool produces aggregates only

Five COUNT(*) rollups and a vendor GROUP BY. No row-level "every non-essential cookie set before consent" export — exactly the question regulators ask.

Fix: add an evidence-export tool that produces a per-run ZIP: cookies.csv, runs.json, localstorage.csv, manifest.sha256.txt, README.md.

src/tools/generateReport.ts

Section 04

How it should work.

The principle: separate observation from interpretation. Raw evidence rows are immutable. Classifications — known, before-consent, after-consent, deletion — are derived at read time and can be re-derived if rules change. This is the shape that survives any DPA challenge along the lines of "show me what the site actually did, not what your tool decided."

The corrected event flow

  1. Browser launch: record reproducibility data

    At launch, write a row into runs: tool git SHA, browser version, UA, viewport, mode, egress IP, dataset SHA-256.

  2. Per-site visit: open a site_visits row

    visit_id = uuid(). Record started_at immediately. Inject preload-trace.js with extended hooks: document.cookie, window.localStorage, window.sessionStorage, window.cookieStore.set.

  3. Page load → wait for networkidle

    Replace the 5 s sleep with page.waitForLoadState('networkidle', { timeout: 30_000 }). Record page_loaded_at.

  4. CMP probe and consent click

    Probe in order: window.__tcfapi, window.OneTrust, window.Cookiebot, window.didomiOnReady, window.Sourcepoint. Record consent_vendor. Click. On success, record consent_accepted_at (millisecond precision) and call __tcfapi('getTCData', 2, cb); persist the TC-string.

  5. Snapshot the consent boundary

    From this point on, every observation row carries observed_at. The phase label is derived: if consent_state ≠ 'accepted' → no_consent; otherwise observed_at < consent_accepted_at → before, and anything later → after.

  6. Late-load capture

    Wait for networkidle; take two further snapshots 10 s apart; record the count delta in site_visits.late_observation_count.

  7. Persist with raw evidence intact

    Write rows to observations: full cookie_value, original cookie_name_raw, both domain_raw and domain_resolved, both observed_origin and observed_via_url, is_deletion for tombstones.

  8. Build the per-run evidence pack

    Export runs.csv, visits.csv, observations.csv, the existing CSVs, screenshots, and a Playwright HAR per visit. Compute SHA-256 of every file. Sign the manifest. Ship to append-only storage.

What changes for a regulator's three core questions

Question | Today | After
Show every non-essential cookie set before consent on this run. | Counts only; after_consent may be wrong; cookie value truncated; cookie name rewritten. | Row-level CSV from observations_classified view; raw values; raw names; consent phase derived per row.
Prove the user actually consented and what they consented to. | "The button was clicked." | TCF TC-string + per-purpose consents persisted in site_visits.
How do we know this CSV was not edited after the run? | No mechanism. | Per-run signed manifest with SHA-256 of every file; published to append-only storage.

Section 05

The proposed evidence database.

Three additive tables and one derived view. The old cookies table is kept for one release for parallel comparison, then dropped. The SQL below is written in MySQL dialect; the same shape can be expressed in Postgres or SQLite.

runs — one row per script invocation

CREATE TABLE runs (
  run_id              VARCHAR(36)   PRIMARY KEY,
  started_at          DATETIME(3)   NOT NULL,
  ended_at            DATETIME(3)   NULL,
  tool_version        VARCHAR(32)   NOT NULL,
  tool_git_sha        CHAR(40)      NOT NULL,
  browser_engine      VARCHAR(16)   NOT NULL,
  browser_version     VARCHAR(32)   NOT NULL,
  user_agent          TEXT          NOT NULL,
  viewport_w          SMALLINT      NOT NULL,
  viewport_h          SMALLINT      NOT NULL,
  mode                VARCHAR(16)   NOT NULL,        -- std / stealth / headed
  third_party_phaseout BOOL         NOT NULL,
  egress_ip           VARCHAR(45)   NULL,
  egress_country      CHAR(2)       NULL,
  dataset_version     VARCHAR(32)   NULL,
  errors_count        INT           NOT NULL DEFAULT 0,
  manifest_sha256     CHAR(64)      NULL
);

site_visits — one row per (run, site)

CREATE TABLE site_visits (
  visit_id              VARCHAR(36)  PRIMARY KEY,
  run_id                VARCHAR(36)  NOT NULL,
  site_id               VARCHAR(128) NOT NULL,
  site_url              TEXT         NOT NULL,
  final_url             TEXT         NULL,
  started_at            DATETIME(3)  NOT NULL,
  page_loaded_at        DATETIME(3)  NULL,
  consent_attempted_at  DATETIME(3)  NULL,
  consent_accepted_at   DATETIME(3)  NULL,
  consent_state         ENUM('accepted','rejected','failed_selector',
                             'no_banner','timeout','error') NOT NULL,
  consent_vendor        VARCHAR(32)  NULL,
  cmp_id                SMALLINT     NULL,
  cmp_version           SMALLINT     NULL,
  tc_string             TEXT         NULL,
  purpose_consents      JSON         NULL,
  vendor_consents       JSON         NULL,
  page_load_status      VARCHAR(16)  NOT NULL,
  error_summary         TEXT         NULL,
  cookies_observed_count INT NOT NULL DEFAULT 0,
  ls_observed_count      INT NOT NULL DEFAULT 0,
  idb_observed_count     INT NOT NULL DEFAULT 0,
  late_observation_count INT NOT NULL DEFAULT 0,   -- count delta from late-load capture (step 6)
  KEY ix_site_run (run_id, site_id),
  CONSTRAINT fk_visit_run FOREIGN KEY (run_id) REFERENCES runs(run_id)
);

observations — one row per storage write

CREATE TABLE observations (
  obs_id              BIGINT       AUTO_INCREMENT PRIMARY KEY,
  visit_id            VARCHAR(36)  NOT NULL,
  observed_at         DATETIME(3)  NOT NULL,
  source              ENUM('JS','HDR','LS','JAR','IDB','CS') NOT NULL,
  cookie_name_raw     VARCHAR(255) NOT NULL,        -- as observed, never rewritten
  cookie_name_norm    VARCHAR(255) NOT NULL,        -- regex-normalised
  cookie_value        MEDIUMTEXT   NULL,            -- full value, no truncation
  cookie_value_sha256 CHAR(64)     NULL,
  domain_raw          VARCHAR(255) NULL,            -- raw Domain attr (or NULL)
  domain_resolved     VARCHAR(255) NOT NULL,        -- normalised host
  path                VARCHAR(2048) NULL,
  expires             DATETIME     NULL,
  max_age             INT          NULL,
  is_session          BOOL         NOT NULL DEFAULT 0,
  is_deletion         BOOL         NOT NULL DEFAULT 0,
  http_only           BOOL         NULL,
  secure              BOOL         NULL,
  same_site           VARCHAR(20)  NULL,
  partitioned         BOOL         NULL,
  observed_origin     VARCHAR(255) NOT NULL,
  observed_via_url    TEXT         NULL,
  callstack           MEDIUMTEXT   NULL,
  is_known            BOOL         NOT NULL DEFAULT 0,
  known_from          VARCHAR(64)  NULL,
  is_leg              BOOL         NOT NULL DEFAULT 0,
  occurrence_count    INT          NOT NULL DEFAULT 1,
  KEY ix_visit (visit_id),
  KEY ix_name  (cookie_name_norm),
  CONSTRAINT fk_obs_visit FOREIGN KEY (visit_id) REFERENCES site_visits(visit_id)
);

observations_classified — derived view

CREATE OR REPLACE VIEW observations_classified AS
SELECT
  o.*,
  v.consent_state,
  v.consent_accepted_at,
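  CASE
    WHEN v.consent_state <> 'accepted' THEN 'no_consent'
    -- guard: a NULL timestamp would otherwise fall through to 'after_consent'
    WHEN v.consent_accepted_at IS NULL THEN 'no_consent'
    WHEN o.observed_at < v.consent_accepted_at THEN 'before_consent'
    ELSE 'after_consent'
  END AS consent_phase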
FROM observations o
JOIN site_visits v ON v.visit_id = o.visit_id;
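
For consumers that read the exported CSVs rather than the database, the same derivation can be mirrored in code. The following is a minimal sketch, assuming timestamps have already been parsed into Date objects. The function name deriveConsentPhase is hypothetical, and treating an accepted visit with a missing timestamp as no_consent is a conservative choice of this sketch rather than something the source specifies.

```typescript
type ConsentState =
  | "accepted"
  | "rejected"
  | "failed_selector"
  | "no_banner"
  | "timeout"
  | "error";

type ConsentPhase = "no_consent" | "before_consent" | "after_consent";

// Derive the phase at read time from the visit's consent fields.
function deriveConsentPhase(
  observedAt: Date,
  consentState: ConsentState,
  consentAcceptedAt: Date | null,
): ConsentPhase {
  // Without a recorded acceptance there is no boundary to compare against.
  if (consentState !== "accepted" || consentAcceptedAt === null) {
    return "no_consent";
  }
  // Strictly earlier than the acceptance timestamp counts as before consent.
  return observedAt.getTime() < consentAcceptedAt.getTime()
    ? "before_consent"
    : "after_consent";
}
```

Because the label is computed from stored timestamps rather than written at capture time, a corrected consent_accepted_at relabels every row of the visit without touching the observations themselves.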

Per-run evidence pack — what gets shipped to storage

run-<run_id>.zip
├── manifest.json              run metadata + SHA-256 of every file
├── manifest.json.sig          detached signature (cosign / minisign / GPG)
├── runs.csv                   one row from runs
├── visits.csv                 rows from site_visits for this run
├── observations.csv           full observations rows for this run
├── csvoutput/
│   └── cookiesoutput-<site>.csv
├── screenshots/
│   └── <site>.jpeg
├── har/
│   └── <site>.har             Playwright HAR per visit (network trail)
└── README.md                  generated; scope, methodology, software version
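
The manifest at the root of the pack could be produced along these lines. This is a sketch under assumptions: the Manifest field names and the helpers buildManifest and verifyManifest are hypothetical, files are passed in as in-memory byte arrays for brevity, and signing (cosign, minisign, or GPG) is left out.

```typescript
import { createHash } from "node:crypto";

interface ManifestEntry {
  path: string;
  sha256: string;
  bytes: number;
}

interface Manifest {
  run_id: string;
  generated_at: string;
  files: ManifestEntry[];
}

function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hash every file and list the entries in a stable (path-sorted) order,
// so the same inputs always serialise to the same manifest.
function buildManifest(
  runId: string,
  files: Map<string, Uint8Array>,
  generatedAt: Date = new Date(),
): Manifest {
  const entries = Array.from(files.entries())
    .map(([path, content]) => ({
      path,
      sha256: sha256Hex(content),
      bytes: content.byteLength,
    }))
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
  return { run_id: runId, generated_at: generatedAt.toISOString(), files: entries };
}

// Re-hash the files and report anything missing, modified, or unlisted.
function verifyManifest(manifest: Manifest, files: Map<string, Uint8Array>): string[] {
  const problems: string[] = [];
  const listed = new Set(manifest.files.map((entry) => entry.path));
  for (const entry of manifest.files) {
    const content = files.get(entry.path);
    if (content === undefined) problems.push(`missing: ${entry.path}`);
    else if (sha256Hex(content) !== entry.sha256) problems.push(`modified: ${entry.path}`);
  }
  for (const path of Array.from(files.keys())) {
    if (!listed.has(path)) problems.push(`unlisted: ${path}`);
  }
  return problems;
}
```

The manifest itself is excluded from the hashed file set; its integrity comes from the detached signature, which is why the signature is stored next to it rather than inside it.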

Section 06

Presenting evidence to controllers.

Controllers want three things, in this order: a clean summary they can read in two minutes, the ability to drill down to a single cookie row, and confidence that the evidence has not been edited after the fact.

Run PDF

One PDF per run. Cover, methodology (one page including scope statements: cookies + LocalStorage; Didomi/CPEx; --test-third-party-cookie-phaseout), headline numbers per site, top-20 vendors, the smoking-gun risk table, and a manifest signature fingerprint.

Interactive viewer

Static-site dashboard generated per run, hosted internally. Filter by site, consent phase, source, known/unknown, vendor, LEG. Each row links to its screenshot, HAR network entry, and call stack. Built on a SQLite copy of observations + Datasette, or a Next.js page reading the CSVs.

The "cookie row" detail card

One artifact per obs_id. Cookie name (raw + normalised), site, source, exact timestamps, expiry, value (head + SHA-256), known label, full call stack. Every field is a column — no interpretation.

Weekly digest

Four bullets per week, posted to a leadership channel: runs OK, pre-consent unknown count (with hyperlink to the filtered viewer URL), new vendors detected this week, evidence pack hash + Sigstore link.

Tamper evidence — three options ranked by DPA defensibility

| Option | Pros | Cons |
| --- | --- | --- |
| S3 Object Lock (compliance mode) | Industry-standard write-once-read-many; immutable for the retention window; DPAs accept without explanation. | Vendor lock-in; cost. |
| Backblaze B2 with Object Lock | Same WORM model; cheaper; EU-friendly. | Slightly less recognised by EU regulators than AWS. |
| Git repo with signed tags + LFS | Cheap; auditable history; fits existing tooling. | Tampering possible by anyone with push access; needs branch-protection + signed tags + an external mirror. |

Recommendation: B2 with Object Lock, governance-mode retention of 6 years, chosen to support the accountability obligation under GDPR Art. 5(2). Tag each run with the name YYYY-MM-DD-<run_id>. Record the manifest signature in a Sigstore Rekor entry so the signature itself is also append-only.

Section 07

Roadmap.

The steps are ordered to bring the system to a defensible state by the cheapest sequence. After step 6 the headline DPA challenges are answered; steps 7–9 close the residual scope gaps.

  1. Add runs and site_visits tables

    Behind a feature flag. Stop using a single pipeline_id as the only run key. (F-016)

  2. Drop value truncation in exportToDb

    Add cookie_value_sha256. (F-009)

  3. Snapshot afterConsent at event entry

    Write consent_accepted_at in site_visits; derive consent phase at read time. (F-001)

  4. Persist raw cookie name

    Stop overwriting it during regex normalisation. (F-010)

  5. Capture TCF TC-string

    After consent click, call window.__tcfapi('getTCData', 2, cb). (F-025)

  6. Build the per-run ZIP + manifest + SHA-256 chain

    Ship to append-only storage. (F-026)

  7. Add CMP probe

    Broaden detection beyond Didomi/CPEx. (F-003)

  8. IndexedDB enumeration per visit

    Or document the gap explicitly. (F-014)

  9. Tighten the dataset loader

    Explicit delimiter + header row + assertion on minimum rows loaded. (F-007)
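
The tightened dataset loader in the last step might look something like the sketch below. It assumes a simple delimited file without quoted fields. The function name loadDataset, the option names, and the example column names are hypothetical and are not taken from the codebase.

```typescript
interface LoaderOptions {
  delimiter: string;        // explicit, never sniffed from the file
  expectedHeader: string[]; // exact column names, in order
  minRows: number;          // fail loudly if fewer data rows load
}

type DatasetRow = Record<string, string>;

// Parse a delimited dataset and refuse to continue on any structural surprise.
// Assumption of this sketch: fields contain no quoted delimiters or newlines.
function loadDataset(text: string, opts: LoaderOptions): DatasetRow[] {
  const [headerLine, ...dataLines] = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "");
  if (headerLine === undefined) throw new Error("dataset is empty");

  const header = headerLine.split(opts.delimiter).map((name) => name.trim());
  const headerOk =
    header.length === opts.expectedHeader.length &&
    opts.expectedHeader.every((name, i) => header[i] === name);
  if (!headerOk) {
    throw new Error(`unexpected header: ${header.join(",")}`);
  }

  const rows = dataLines.map((line, index) => {
    const cells = line.split(opts.delimiter);
    if (cells.length !== header.length) {
      throw new Error(
        `data row ${index + 1}: expected ${header.length} cells, got ${cells.length}`,
      );
    }
    const row: DatasetRow = {};
    header.forEach((name, i) => {
      row[name] = (cells[i] ?? "").trim();
    });
    return row;
  });

  if (rows.length < opts.minRows) {
    throw new Error(`only ${rows.length} rows loaded, expected at least ${opts.minRows}`);
  }
  return rows;
}
```

With an explicit header check, a file saved with the wrong delimiter fails at load time instead of silently producing a dataset in which every cookie looks unknown.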

Section 08

Methodology.

Every claim on this site is anchored to a verified codebase fact. The verification record lives in docs/audit/state.json in the repository: 32 facts, each with file path, line range, and excerpt. Each finding in docs/audit/findings.json references one or more facts. The full machine-readable evidence set is shipped alongside this report.

How a future verifier can re-walk the evidence

jq '.facts[] | {id, file: .evidence[0].file, claim}' docs/audit/state.json
jq '.findings[] | {id, severity: .regulatory_severity, claim}' docs/audit/findings.json
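
A verifier may also want to confirm mechanically that every finding is anchored to a fact that exists. The sketch below shows one way to do that. The field name fact_ids on a finding and the example identifiers are assumptions, since the linking field is not shown here, and the function name findUnanchoredFindings is hypothetical.

```typescript
interface Fact {
  id: string;
}

interface Finding {
  id: string;
  fact_ids: string[]; // assumed name of the field that links a finding to facts
}

// Report findings that reference no facts, or facts that do not exist.
function findUnanchoredFindings(facts: Fact[], findings: Finding[]): string[] {
  const known = new Set(facts.map((fact) => fact.id));
  const problems: string[] = [];
  for (const finding of findings) {
    if (finding.fact_ids.length === 0) {
      problems.push(`${finding.id}: references no facts`);
    }
    for (const factId of finding.fact_ids) {
      if (!known.has(factId)) {
        problems.push(`${finding.id}: unknown fact ${factId}`);
      }
    }
  }
  return problems;
}
```

An empty result means every finding resolves to at least one recorded fact; it says nothing about whether the fact's excerpt still matches the code at the stated Git HEAD.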

What this audit explicitly does not cover

  • Service Worker / Cache Storage coverage (subset of F-014).
  • Browser fingerprinting evidence — out of cookie/storage scope; arguable for ePrivacy Art. 5(3) but treated separately.
  • Independence: this is a self-audit by the controller. A DPA may discount it relative to an independent third-party audit. The artifacts here lean heavily on reproducibility and signed evidence to compensate.

Document control

Snapshot date: 2026-04-27
Repository: CNC/eu-cookies
Branch: main
Git HEAD: b461a14
Codebase facts: 32 (in docs/audit/state.json)
Findings: 28 (in docs/audit/findings.json)