Better I18N

Data Model

How content views are stored in Cloudflare Analytics Engine — blob and double mappings, query patterns, retention

Content analytics events are written to Cloudflare Analytics Engine (contentAnalytics dataset). This page explains the schema so you understand what's stored, what's queryable, and what's not.

Why CF Analytics Engine?

| Feature | Benefit |
| --- | --- |
| Fire-and-forget writes | writeDataPoint() is non-blocking, so ingestion never slows down responses |
| Built-in sampling | High-volume events get statistically sampled (still accurate via SUM(_sample_interval)) |
| 90-day retention | Free tier; archiving older data to R2 is on the roadmap |
| SQL via REST API | The same queries we run power your dashboard charts |
| No schema migrations | Adding a new dimension means adding a new blob slot |

The trade-off: AE is append-only. Events can't be edited or deleted. This is fine for analytics but means you can't "fix" a past event.

Schema

The contentAnalytics dataset uses 1 index, 14 blobs (strings), 2 doubles (numbers).

Indexes

| Slot | Field | Notes |
| --- | --- | --- |
| index1 | orgId | Partition key (max 32 bytes). Queries by orgId are fast partitioned scans. |

Blobs (strings)

| Slot | Field | Source |
| --- | --- | --- |
| blob1 | projectId | From API key validation |
| blob2 | eventName | e.g. content.view, content.click |
| blob3 | entryId | From properties.entryId |
| blob4 | contentModelSlug | From properties.contentModelSlug |
| blob5 | entrySlug | From properties.entrySlug (or properties.slug) |
| blob6 | language | From properties.language |
| blob7 | framework | From properties.framework (set by adapter) |
| blob8 | countryCode | From request.cf.country (edge metadata, not IP) |
| blob9 | sdkVersion | From properties.sdkVersion |
| blob10 | hostname | From properties.hostname or window.location.hostname |
| blob11 | path | From properties.path or window.location.pathname |
| blob12 | referrer | From properties.referrer or document.referrer |
| blob13 | userId | From identity.userId or identity.anonymousId |
| blob14 | environment | production / preview / development |

Doubles (numbers)

| Slot | Field | Notes |
| --- | --- | --- |
| double1 | count | Always 1. Sum across rows for total event count (sampling-aware via SUM(_sample_interval)). |
| double2 | loadTimeMs | Content render time in milliseconds, if provided |
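
As a rough sketch of how the slots above line up in a writeDataPoint() call, the helper below maps one event onto 1 index, 14 blobs, and 2 doubles. Only the slot order comes from the tables. The ContentEvent shape, the toDataPoint and recordEvent names, the AnalyticsDataset stand-in interface, and the empty-string and 0 defaults for missing values are assumptions made for illustration.

```typescript
// Minimal stand-in for the Analytics Engine binding (assumption: only
// writeDataPoint() is needed here).
interface DataPoint {
  indexes: string[];
  blobs: string[];
  doubles: number[];
}
interface AnalyticsDataset {
  writeDataPoint(point: DataPoint): void;
}

// Hypothetical event shape; field names follow the schema tables.
interface ContentEvent {
  orgId: string;
  projectId: string;
  eventName: string;
  entryId?: string;
  contentModelSlug?: string;
  entrySlug?: string;
  language?: string;
  framework?: string;
  countryCode?: string;
  sdkVersion?: string;
  hostname?: string;
  path?: string;
  referrer?: string;
  userId?: string;
  environment?: string;
  loadTimeMs?: number;
}

// Map an event to the documented slot order. Missing strings become ""
// (an assumption) so that every blob keeps its position.
function toDataPoint(e: ContentEvent): DataPoint {
  return {
    indexes: [e.orgId], // index1
    blobs: [
      e.projectId, // blob1
      e.eventName, // blob2
      e.entryId ?? "", // blob3
      e.contentModelSlug ?? "", // blob4
      e.entrySlug ?? "", // blob5
      e.language ?? "", // blob6
      e.framework ?? "", // blob7
      e.countryCode ?? "", // blob8
      e.sdkVersion ?? "", // blob9
      e.hostname ?? "", // blob10
      e.path ?? "", // blob11
      e.referrer ?? "", // blob12
      e.userId ?? "", // blob13
      e.environment ?? "", // blob14
    ],
    doubles: [1, e.loadTimeMs ?? 0], // double1 = count, double2 = loadTimeMs
  };
}

// Fire-and-forget: writeDataPoint() returns void, so there is nothing to await.
function recordEvent(dataset: AnalyticsDataset, e: ContentEvent): void {
  dataset.writeDataPoint(toDataPoint(e));
}
```

Keeping every blob in a fixed position, even when empty, is what lets the queries refer to blob3 or blob5 by slot number.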

What's NOT stored

  • IP addresses: only the country code from CF edge metadata is kept
  • User agent: not collected (Phase 1)
  • Email addresses: identity.email is used for routing but never persisted
  • Arbitrary properties: only the reserved property names map to AE columns. Phase 2 will support a free-form JSON blob.

Query patterns

All reads go through POST /api/trpc/contentAnalytics.getContentStats. Internally, six AE SQL queries run in parallel; three of them are shown here:

-- Overview: total views + unique entries
SELECT
  SUM(_sample_interval) AS total_views,
  COUNT(DISTINCT blob3) AS unique_entries
FROM contentAnalytics
WHERE index1 = '{orgId}'
  AND blob1 = '{projectId}'
  AND blob2 = 'content.view'
  AND timestamp > NOW() - INTERVAL '7' DAY

-- Top entries
SELECT blob3 AS entry_id, blob5 AS entry_slug, blob4 AS model_slug,
       SUM(_sample_interval) AS views
FROM contentAnalytics
WHERE index1 = '{orgId}' AND blob1 = '{projectId}'
  AND blob2 = 'content.view'
  AND timestamp > NOW() - INTERVAL '7' DAY
GROUP BY entry_id, entry_slug, model_slug
ORDER BY views DESC
LIMIT 20

-- Time series (hourly for 24h, daily for 7d/30d)
SELECT toStartOfInterval(timestamp, INTERVAL '1' DAY) AS ts,
       SUM(_sample_interval) AS views
FROM contentAnalytics
WHERE index1 = '{orgId}' AND blob1 = '{projectId}'
  AND blob2 = 'content.view'
  AND timestamp > NOW() - INTERVAL '7' DAY
GROUP BY ts
ORDER BY ts ASC
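
To illustrate the "run in parallel" part, here is a minimal sketch that sends several SQL strings to the Analytics Engine SQL API and waits for all of them with Promise.all. It is not the actual getContentStats implementation. The runAeQuery and runAll names, the injected FetchLike type, and the assumption that rows arrive under a data key in the JSON response are illustrative choices.

```typescript
type Row = Record<string, string | number | null>;

// Narrow fetch-like signature so the example can run with a fake in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Send one SQL statement to the Analytics Engine SQL endpoint.
async function runAeQuery(
  fetchFn: FetchLike,
  accountId: string,
  apiToken: string,
  sql: string,
): Promise<Row[]> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`;
  const res = await fetchFn(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiToken}` },
    body: sql, // the request body is the raw SQL text
  });
  if (!res.ok) {
    throw new Error(`AE query failed with status ${res.status}`);
  }
  const payload = (await res.json()) as { data?: Row[] };
  return payload.data ?? [];
}

// Run all named queries concurrently and collect the rows per query name.
async function runAll(
  fetchFn: FetchLike,
  accountId: string,
  apiToken: string,
  queries: { name: string; sql: string }[],
): Promise<Map<string, Row[]>> {
  const results = await Promise.all(
    queries.map((q) => runAeQuery(fetchFn, accountId, apiToken, q.sql)),
  );
  const byName = new Map<string, Row[]>();
  queries.forEach((q, i) => byName.set(q.name, results[i] ?? []));
  return byName;
}
```

Because Promise.all rejects as soon as one query fails, a production version would probably want per-query error handling so that one failing chart does not blank the whole dashboard.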

Results are cached in KV: 5 min (24h period), 15 min (7d), 1 hour (30d).
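
The period-to-TTL mapping above could be sketched as a small read-through cache. The TTL values come from the sentence above. The KvLike interface (a subset of the Workers KV get/put shape), the cache key format, and the getCachedStats name are assumptions for illustration.

```typescript
type Period = "24h" | "7d" | "30d";

// TTLs in seconds: 5 min, 15 min, 1 hour.
const CACHE_TTL_SECONDS: Record<Period, number> = {
  "24h": 5 * 60,
  "7d": 15 * 60,
  "30d": 60 * 60,
};

// Subset of the KV namespace API used here.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical key layout; any stable combination of the inputs would do.
function cacheKey(orgId: string, projectId: string, period: Period): string {
  return `content-stats:${orgId}:${projectId}:${period}`;
}

// Return the cached value if present, otherwise compute, store, and return it.
async function getCachedStats<T>(
  kv: KvLike,
  orgId: string,
  projectId: string,
  period: Period,
  compute: () => Promise<T>,
): Promise<T> {
  const key = cacheKey(orgId, projectId, period);
  const hit = await kv.get(key);
  if (hit !== null) {
    return JSON.parse(hit) as T;
  }
  const fresh = await compute();
  await kv.put(key, JSON.stringify(fresh), {
    expirationTtl: CACHE_TTL_SECONDS[period],
  });
  return fresh;
}
```

Shorter periods get shorter TTLs because a 24h chart changes visibly within minutes, while a 30d chart barely moves in an hour.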

SQL gotchas

If you query AE directly via the REST API, watch for:

  • All numeric values come back as strings — always Number() cast on the client
  • No COALESCE — empty time buckets must be filled in JS
  • No OFFSET — pagination is LIMIT n then .slice() in JS
  • No parameterized queries — escape interpolated values yourself (we whitelist [a-zA-Z0-9-] for IDs)
  • SUM(_sample_interval) for counts, not COUNT(*) — respects sampling
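
A few client-side helpers covering the points above might look like the following sketch. The function names are hypothetical, and fillDailyBuckets assumes that each ts value starts with a YYYY-MM-DD date and that buckets are UTC days.

```typescript
// IDs are interpolated into SQL, so only whitelisted characters may pass.
function assertSafeId(id: string): string {
  if (!/^[a-zA-Z0-9-]+$/.test(id)) {
    throw new Error(`Unsafe ID for SQL interpolation: ${JSON.stringify(id)}`);
  }
  return id;
}

interface RawBucket {
  ts: string; // assumed to start with "YYYY-MM-DD"
  views: string; // numeric values arrive as strings
}
interface Bucket {
  day: string;
  views: number;
}

// Fill days that have no rows with 0, since the query cannot do it.
function fillDailyBuckets(rows: RawBucket[], startDay: string, days: number): Bucket[] {
  const viewsByDay = new Map<string, number>();
  for (const row of rows) {
    viewsByDay.set(row.ts.slice(0, 10), Number(row.views)); // Number() cast
  }
  const startMs = Date.parse(`${startDay}T00:00:00Z`);
  const filled: Bucket[] = [];
  for (let i = 0; i < days; i++) {
    const day = new Date(startMs + i * 86400000).toISOString().slice(0, 10);
    filled.push({ day, views: viewsByDay.get(day) ?? 0 });
  }
  return filled;
}

// Without OFFSET, fetch with LIMIT page * pageSize and slice out the page.
function pageSlice<T>(rows: T[], page: number, pageSize: number): T[] {
  return rows.slice((page - 1) * pageSize, page * pageSize);
}
```

For example, page 2 with a page size of 20 would be queried with LIMIT 40 and then reduced to the last 20 rows with pageSlice(rows, 2, 20).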

Retention

| Layer | Duration |
| --- | --- |
| Hot (queryable AE) | 90 days |
| Cold (R2 archive) | Roadmap: indefinite, daily Apache Arrow exports |

After 90 days, AE drops events. We'll mirror Counterscale's pattern: a daily cron exports the previous day's data to R2 as Apache Arrow IPC files for long-term analysis.

Capacity & limits

| Limit | Value |
| --- | --- |
| Max writeDataPoint calls per Worker invocation | 250 |
| Max blob size per data point | 16 KB total |
| Max blobs per data point | 20 (we use 14) |
| Max doubles per data point | 20 (we use 2) |
| Per-dataset write rate | No documented hard limit; CF auto-throttles at the infrastructure level |

Phase 2 will add mitigations on top: a per-IP rate limit (~50 req/s burst), a per-project quota tied to plan, and dropping traffic from datacenter IPs and bot user agents.
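
One way to stay under the per-invocation write limit from the table is to cap a batch before writing it. This is only a sketch: the writeCapped name, the Writer interface, and the choice to drop (rather than queue) the overflow are assumptions, not a description of the current ingestion code.

```typescript
// Limit taken from the table above.
const MAX_WRITES_PER_INVOCATION = 250;

// Anything with a writeDataPoint() method, e.g. an Analytics Engine binding.
interface Writer<T> {
  writeDataPoint(point: T): void;
}

// Write as many points as the remaining budget allows and report the rest.
function writeCapped<T>(
  dataset: Writer<T>,
  points: T[],
  alreadyWritten: number = 0,
): { written: number; dropped: number } {
  const budget = Math.max(0, MAX_WRITES_PER_INVOCATION - alreadyWritten);
  const toWrite = points.slice(0, budget);
  for (const point of toWrite) {
    dataset.writeDataPoint(point);
  }
  return { written: toWrite.length, dropped: points.length - toWrite.length };
}
```

Returning the dropped count lets the caller log or surface the overflow instead of losing events silently.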
