Hook — Your realtime app is fast, but analytics are killing your budget
You built Firestore-powered realtime features — chat, presence, live dashboards — and they’re snappy. But when your analytics queries start scanning millions of documents, Firestore costs spike and latency kills exploration. The solution in 2026 is a hybrid: keep Firestore for realtime, and stream an append-only events feed into an OLAP engine like ClickHouse for fast, cheap analytics.
Executive summary — Reference architecture in two sentences
Stream Firestore document change events into a durable streaming layer (Pub/Sub/Kafka) then batch or stream-load into ClickHouse using an ingestion service (Cloud Run/Dataflow/Kafka Connect). In ClickHouse, design an append-only events schema and use materialized views for rollups and user-facing aggregates to keep queries sub-second and storage cost-effective.
Why this matters in 2026
In late 2025 and early 2026 we saw OLAP adoption accelerate — ClickHouse raised large growth capital and extended integrations across cloud ecosystems. Teams increasingly choose hybrid architectures: transactional (Firestore) + analytical (ClickHouse). This pattern reduces operational cost and preserves realtime UX while enabling exploratory analytics, ML features, and compliance-friendly audit trails.
Top benefits
- Cost efficiency: OLAP storage + compressed columnar format is far cheaper for large scan queries than transactional document reads.
- Realtime UX: Firestore remains the primary datastore for live views and presence, avoiding query-time joins or heavy indexing.
- Analytics at scale: ClickHouse gives sub-second analytical queries over billions of events.
- Flexible materialized views: Precompute metrics and maintain them in ClickHouse for dashboards and feature stores.
Reference architecture overview
+----------------+ Firestore events +-----------+ Stream/Batch +-------------+
| Mobile/Web SDK | -----------------------> | CloudFn / | ---------------> | Pub/Sub / |
| (Firestore) | (onCreate/onUpdate) | Eventarc | | Kafka |
+----------------+ +-----------+ +-------------+
|
| Debatching &
| transformation
v
+-----------------+
| Cloud Run / |
| Dataflow worker |
| (ClickHouse DB) |
+-----------------+
|
v
+--------------------+
| ClickHouse cluster |
| - events (MergeTree)|
| - materialized views|
+--------------------+
Key components
- Firestore: Keep document writes and realtime listeners native to client SDKs.
- Event emitter: Cloud Functions / Eventarc that converts Firestore change events into canonical event messages.
- Streaming layer: Pub/Sub or Kafka for guaranteed at-least-once delivery, buffering and replays.
- Ingest workers: Cloud Run/Dataflow/Kafka Connect — responsible for batching, idempotency, and writing to ClickHouse.
- ClickHouse: Columnar OLAP store with MergeTree tables and materialized views for aggregation, TTL, and cold storage (S3/GCS).
Designing the ClickHouse schema for Firestore events
Think append-only and immutable. Store a canonical event per change: document snapshot, event type, timestamp, and metadata. Avoid storing denormalized application objects for operational use — keep Firestore for that.
Event table best practices
- Append-only: Each event is immutable; use a unique event_id (UUID) and source_version (optional).
- Partitioning: Use a date-based partition (partition by toYYYYMM(event_time)) for efficient TTL and pruning.
- Primary key & order: Use (document_id, event_time, event_id) for deterministic ordering.
- Column types: Use low-cardinality types for fields like event_type; JSON or Nested for flexible payloads.
- Compression & TTL: Use ClickHouse TTL to move raw events to cheaper S3 (S3 table engine) after N days, retaining aggregates locally.
Sample ClickHouse DDL
CREATE TABLE analytics.events
(
event_id UUID,
document_path String,
document_id String,
event_type LowCardinality(String), -- create, update, delete
event_time DateTime64(3),
firestore_ts DateTime64(3),
user_id String,
payload JSON,
metadata JSON
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_time)
ORDER BY (document_id, event_time, event_id)
TTL event_time + INTERVAL 90 DAY
SETTINGS index_granularity = 8192;
Materialized views for derived tables and aggregates
ClickHouse shines with materialized views that keep aggregates up-to-date in real time. Use them to precompute daily counts, session metrics, or user summaries and serve dashboards directly without expensive scans.
Example: daily active users (DAU)
CREATE MATERIALIZED VIEW analytics.dau_mv
TO analytics.dau
AS
SELECT
toDate(event_time) AS day,
user_id,
count() AS events_count
FROM analytics.events
WHERE event_type = 'activity'
GROUP BY day, user_id;
CREATE TABLE analytics.dau
(
day Date,
user_id String,
events_count UInt64
)
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day, user_id);
Streaming ETL options and deployment patterns
Choose the right ingestion pattern based on latency, complexity, and operational constraints:
1) Fast path — Cloud Functions -> Pub/Sub -> Cloud Run writer
- Latency: sub-second to a few seconds.
- Simplicity: small Cloud Function onWrite sends canonical event to Pub/Sub.
- Cloud Run workers pull batches from Pub/Sub and write using ClickHouse HTTP INSERT JSONEachRow.
2) Managed dataflow — Eventarc / Dataflow / Apache Beam
- Latency: seconds; good for transformations and deduplication.
- Use when you need windowed aggregations or heavy transformations before load.
3) Kafka Connect + ClickHouse Sink
- Latency: near-real-time; excellent for large-scale clusters.
- Use enterprise Kafka or Confluent; ClickHouse has Kafka/MaterializedView ingestion patterns.
Idempotency, ordering, and exactly-once considerations
Firestore triggers and streaming systems are usually at-least-once. Implement idempotency and ordering at ingestion:
- Include event_id and document_version (if available) in each message.
- Use deduplication in ingest worker: track last-seen event_id per partition (in-memory + periodic checkpoint to Redis or local checkpointing via Pub/Sub ack).
- For ordering within a document, write with an ORDER BY that ensures event_time+event_id lexicographic sequence.
- Use ClickHouse's Insert with replace? Not recommended for large-scale; prefer append-only and let materialized views compute state.
Security and networking (2026 best practices)
- VPC egress: Run ingestion workers in a VPC (Serverless VPC Access or VPC-native) to talk securely to a ClickHouse cluster in private subnets.
- Credentials: Store ClickHouse credentials and TLS certs in Secret Manager / HashiCorp Vault and mount into Cloud Run with minimal surface area.
- Least privilege: Grant the Cloud Function only Pub/Sub publish rights and the ingestion service only ClickHouse write rights.
- Transport security: Use HTTPS/TLS for ClickHouse HTTP endpoint; enable mTLS if available.
Observability and debugging
Instrument every stage with distributed tracing and metrics:
- Trace events from Firestore -> Cloud Function -> Pub/Sub -> Ingest worker -> ClickHouse. Use OpenTelemetry or Cloud Trace.
- Expose metrics: ingestion throughput, batch sizes, write latency, ClickHouse insert failures and retries.
- Set up alerts on lag (Pub/Sub backlog), increase in duplicate events, and ClickHouse insert errors.
Edge cases and advanced strategies
Schema evolution
Event payloads change. Use a flexible payload column (JSON) and write transformation layers that normalize fields into typed columns in ClickHouse materialized views. Maintain a schema registry (even simple) with change logs.
Backfills and historical import
When you add ClickHouse, you’ll often backfill months of data. Export Firestore to GCS as newline-delimited JSON or Parquet (via managed export), then do bulk INSERTs into ClickHouse or use the S3 table engine for cold-store queries.
Cold storage and cost control
Use ClickHouse TTLs to move raw events to S3/GCS after N days. Keep materialized aggregates in ClickHouse longer and rebuild aggregates cheaply from cold store when needed. ClickHouse’s S3 table engine (available in recent releases) makes this seamless.
Example: ready-to-deploy code snippets
Firestore trigger (Node.js / TypeScript)
import * as functions from 'firebase-functions';
import {PubSub} from '@google-cloud/pubsub';
const pubsub = new PubSub();
const TOPIC = process.env.EVENT_TOPIC!;
export const onDocChange = functions.firestore
.document('{col}/{docId}')
.onWrite(async (change, ctx) => {
const before = change.before.exists ? change.before.data() : null;
const after = change.after.exists ? change.after.data() : null;
const event = {
event_id: ctx.eventId, // functions event id
document_path: ctx.resource.name,
document_id: ctx.params.docId,
event_type: !before ? 'create' : !after ? 'delete' : 'update',
event_time: new Date().toISOString(),
payload: after || before,
metadata: {before, after}
};
await pubsub.topic(TOPIC).publishMessage({json: event});
});
Cloud Run ingest worker (Node.js)
/* Pulls batches from Pub/Sub pull-subscription and writes to ClickHouse */
import fetch from 'node-fetch';
async function writeBatchToClickHouse(events) {
const body = events.map(e => JSON.stringify(e)).join('\n');
const resp = await fetch(process.env.CLICKHOUSE_URL!, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body
});
if (!resp.ok) throw new Error(`CH write failed: ${await resp.text()}`);
}
Operational checklist — deploy in 8 steps
- Provision ClickHouse: managed ClickHouse cloud or self-hosted k8s. Configure TLS and user auth.
- Create ClickHouse analytics schema and materialized views (start simple: events table).
- Provision Pub/Sub or Kafka as streaming backbone.
- Deploy Firestore trigger (Cloud Functions / Eventarc) to emit canonical events.
- Deploy ingest workers (Cloud Run / Dataflow / Kafka Connect) with batching, retries, and idempotency code.
- Set up VPC, Secrets, and logging; secure network paths.
- Backfill historical data via Firestore export -> GCS -> bulk import to ClickHouse.
- Monitor, tune partitions, and create materialized views for top queries.
Performance and cost tuning tips
- Batch inserts to ClickHouse — many small writes are inefficient; target ~1–5MB batches or 10k rows depending on object size.
- Use compression for payload JSON (ClickHouse handles this), and drop unnecessary fields before ingest.
- Limit retention for raw events and retain aggregates longer.
- Leverage ClickHouse cluster features — distributed tables, shards, replicas — as query volume grows.
Real-world example & lessons learned (case study)
One mid-stage SaaS team in 2025 migrated analytics off Firestore after costs tripled with growth. They implemented the pattern above: Firestore triggers -> Pub/Sub -> Cloud Run -> ClickHouse. Results:
- Analytical query cost dropped 70% vs equivalent Firestore reads.
- Dashboard latency reduced from tens of seconds to sub-second for common queries.
- Operational overhead: initial effort ~3 sprint-weeks; incremental maintenance small with robust monitoring.
"Splitting responsibilities let us iterate on features in Firestore while running heavy analytics in ClickHouse — we regained control of cost and performance." — Lead Eng, example SaaS
When not to use ClickHouse
Keep ClickHouse for analytics and large scans. Don’t use it for low-latency transactional updates or as a source of truth for realtime clients. If your dataset is tiny or you prefer a managed BI-friendly store with deep SQL ecosystem, BigQuery or Snowflake might be a better fit depending on team skills and integrations.
Future-proofing & 2026 trends to watch
- ClickHouse is expanding cloud integrations and enterprise tooling (2025–26 funding and product momentum), making hybrid deployments simpler.
- Serverless OLAP endpoints and inline materialized computation will reduce complexity for small teams.
- Standardized event schemas and schema registries will become essential as more teams run multi-OLAP pipelines.
- Expect improved managed connectors between Firebase ecosystems and OLAP engines, reducing custom code.
Checklist: production readiness
- Ensure at-least-once delivery is handled and idempotency built into ingest.
- Test backfills and schema evolution in staging.
- Set up SLOs for ingestion lag and ClickHouse query p95.
- Document recovery playbooks for replays from Pub/Sub or GCS.
Starter kit & templates
To get you started quickly, the minimal starter kit should include:
- Cloud Function template that emits canonical events to Pub/Sub.
- Cloud Run ingest worker with batching and ClickHouse HTTP client.
- ClickHouse DDL for events and a couple of materialized views (DAU, aggregates).
- Terraform templates for Pub/Sub topics, subscriptions, and ClickHouse basic infra.
Actionable takeaways
- Separate concerns: Use Firestore for realtime operations and ClickHouse for analytics.
- Stream events: Emit canonical events from Firestore and centralize in Pub/Sub/Kafka.
- Design for append-only: ClickHouse tables should be append-first; use materialized views for derived state.
- Plan for cost: Use TTL and cold storage to keep hot storage small and cheap.
- Automate observability: Trace events end-to-end and alert on ingestion lag and errors.
Next steps — a one-week rollout plan
- Day 1–2: Provision ClickHouse and create events table; scaffold Cloud Function and Pub/Sub.
- Day 3–4: Implement Cloud Run ingest worker; test end-to-end with staging traffic.
- Day 5: Deploy materialized views for 1–2 high-value metrics and validate results against Firestore queries.
- Day 6: Run backfill for last 30 days; validate and compare costs.
- Day 7: Cutover dashboards and deprecate heavy Firestore analytics reads.
Call to action
Ready to deploy this hybrid realtime + OLAP reference architecture in your stack? Clone the starter kit, run the 7-day rollout plan, and tune materialized views for your top queries. If you want, share your schema and we’ll suggest a ClickHouse schema and MV design optimized for your query patterns.
Related Reading
- WGA Honors Terry George — What Career Achievement Awards Mean for Screenwriters’ Market Value
- A Practical Guide to Building a Heart-Centered Habit System for 2026
- Meme Formats That Pay: Monetizing Timely Cultural Trends Without Selling Out
- Celebrity-Driven Tours: Should London Offer ‘Star-Spotting’ Walks?
- Safe Heat Therapy for Beauty: Using Microwavable Warmers and Hot Packs with Face Masks