Embed an LLM-powered Assistant into Desktop Apps Using Firebase Realtime State Sync
Build a private, realtime desktop assistant: run LLMs locally and sync conversation state and prefs with Firebase Realtime Database.
You want a desktop assistant that feels instant, private, and always in sync across devices — without routing sensitive prompts to cloud LLMs or rebuilding realtime sync from scratch. This tutorial shows how to embed an LLM-powered assistant into a desktop app that runs inference locally while keeping conversation state, presence, and user preferences synced with Firebase Realtime Database.
We cover an opinionated, production-ready approach: architecture, security, offline-first patterns, code for desktop integration (Electron/Tauri), realtime rules, presence via onDisconnect, and how to wire a local inference process into Firebase. This is written for 2026 — when local LLM runtimes and edge AI hardware are mainstream, and developers must balance privacy, cost, and scale.
Why this architecture matters in 2026
Two trends shaped this guide:
- Local inference at scale — affordable edge hardware (e.g., AI HAT+ for Raspberry Pi 5, dedicated NPU dongles) and efficient quantized runtimes (ggml/llama.cpp, Ollama, native ONNX/NNAPI runtimes) make on-device LLM inference realistic for desktop apps.
- Realtime UX expectations — users expect synchronized conversation state, cross-device continuity, and low-latency presence signals like typing indicators.
Combining local inference with Realtime Database gives the best of both worlds: privacy (models run locally), and developer productivity + realtime sync (Firebase handles presence, durable state, and conflict-safe updates).
High-level architecture
+----------------+        +--------------------+        +---------------------------+
| Desktop App    | <----> | Firebase Realtime  | <----> | Other clients (mobile,    |
| (Electron)     |        | Database + Auth    |        |  web)                     |
| - UI + sync    |        |                    |        | - read conversation state |
| - local LLM    |        | - conversations/   |        | - presence, preferences   |
|   inference    |        |   preferences      |        |                           |
+----------------+        +--------------------+        +---------------------------+
        |  ^                        ^
        |  | local inference        | cloud triggers (optional)
        v  | output                 v
+----------------+         +------------------+
| LLM runtime    |         | Cloud Functions  |
| (llama.cpp,    |         | - push notifs    |
|  Ollama REST,  |         | - token minting  |
|  local server) |         +------------------+
+----------------+
Design goals
- Private inference: user data stays on-device unless they opt in to cloud syncing.
- Realtime sync: conversation metadata, message pointers, and preferences live in Realtime Database.
- Offline-first: local cache + atomic sync when online.
- Secure by design: per-user rules and optional encryption for sensitive transcripts.
Step 1 — Firebase project & Realtime Database schema
Create a Firebase project (console.firebase.google.com) and enable Realtime Database and Firebase Authentication. For desktop apps, a common pattern is to use OAuth or a small backend that mints custom tokens for Firebase Auth.
Suggested RTDB structure (shallow keys for scale)
{
  "users": {
    "$uid": {
      "profile": { "displayName": "...", "prefs": { /* theme, assistant voice */ } },
      "conversations": {
        "$convId": {
          "meta": { "title": "...", "updatedAt": 1670000000 },
          "head": "msg123" /* key of the latest message under /messages */
        }
      },
      "presence": { "state": "online", "lastSeen": 1670000000 }
    }
  },
  "messages": {
    "$msgId": { "convId": "...", "author": "user|assistant|system", "text": "...", "createdAt": 1670000000 }
  }
}
Why separate messages? Realtime Database scales better when large arrays are flattened into shallow nodes. Writing a new message is a single child write to /messages/$msgId and then a pointer update under the conversation.
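One way to keep the message write and the pointer update consistent is a single multi-path update, which Realtime Database commits atomically. The sketch below builds such an update object; the helper name and `now` parameter are our own, not part of the Firebase API:

```javascript
// Hypothetical helper: build one multi-path update object so the new message
// node and the conversation's head pointer commit together via rtdb.ref().update().
function buildMessageWrite(uid, convId, msgId, text, now) {
  return {
    [`messages/${msgId}`]: { convId, author: uid, text, createdAt: now },
    [`users/${uid}/conversations/${convId}/meta/head`]: msgId,
    [`users/${uid}/conversations/${convId}/meta/updatedAt`]: now,
  };
}

// Usage (against a live RTDB handle):
// const key = rtdb.ref('messages').push().key;
// await rtdb.ref().update(buildMessageWrite(uid, convId, key, 'Hi', Date.now()));
```

Because the whole object is applied in one `update()` call, other clients never observe a head pointer that references a message that has not been written yet.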
Step 2 — Realtime Database Rules & Security
Protect data with per-user rules. Only allow users to read/write their own data. If you support team-shared assistants, add ACLs at the conversation level.
{
  "rules": {
    "users": {
      "$uid": {
        ".read": "auth != null && auth.uid === $uid",
        ".write": "auth != null && auth.uid === $uid",
        "presence": {
          ".write": "auth != null && auth.uid === $uid"
        }
      }
    },
    "messages": {
      "$msgId": {
        ".read": "auth != null && root.child('users').child(auth.uid).child('conversations').child(data.child('convId').val()).exists()",
        ".write": "auth != null && (newData.child('author').val() === auth.uid || newData.child('author').val() === 'assistant')"
      }
    }
  }
}
Notes: Adjust .read/.write rules for guest sessions, team sharing, or admin roles. Realtime Database rules evaluate quickly, but always test them with the Firebase Emulator Suite.
Step 3 — Auth patterns for desktop apps
Desktop apps can’t rely on browser popup flows the same way web apps do. Two recommended approaches:
- Device / system OAuth + backend minting: open the system browser for OAuth (Google/GitHub) and exchange the provider token on your backend for a Firebase custom token (via the Firebase Admin SDK). The desktop app then signs in with signInWithCustomToken(). This avoids embedding long-lived API keys in the client.
- Embedded browser + OAuth: use an embedded system WebView with a secure redirect URI that your desktop app can capture (custom URL scheme). This can be more friction-prone and must be carefully sandboxed.
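For either flow, the desktop app ends up capturing a redirect on a custom URL scheme and extracting the provider's authorization code. A minimal, hedged sketch (the `myapp://` scheme and parameter names are illustrative assumptions; always validate the OAuth `state` value to block CSRF):

```javascript
// Parse a captured OAuth redirect (e.g. myapp://auth/callback?code=...&state=...)
// and return the authorization code, rejecting mismatched state values.
function parseAuthRedirect(redirectUrl, expectedState) {
  const url = new URL(redirectUrl); // WHATWG URL handles custom schemes in Node
  const code = url.searchParams.get('code');
  const state = url.searchParams.get('state');
  if (!code) throw new Error('redirect missing authorization code');
  if (expectedState && state !== expectedState) {
    throw new Error('state mismatch (possible CSRF)');
  }
  return code;
}
```

The returned code is what your backend exchanges with the provider before minting the Firebase custom token.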
Example Cloud Function (Node) to mint a custom token after verifying an OAuth provider token:
// functions/index.js (simplified)
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.mintCustomToken = functions.https.onCall(async (data, context) => {
  const { provider, providerToken } = data;
  // Validate providerToken against the provider's API (Google/GitHub). Omitted for brevity.
  const providerUid = await verifyProviderToken(provider, providerToken);
  const uid = `oauth:${provider}:${providerUid}`;
  // Ensure the user exists in Firebase Auth (updateUser throws for unknown uids).
  await admin.auth().getUser(uid).catch(() => admin.auth().createUser({ uid }));
  const customToken = await admin.auth().createCustomToken(uid);
  return { token: customToken };
});
Step 4 — Presence & offline-first sync
Realtime Database includes client-side offline support and onDisconnect handlers. Use these for online/offline presence and to ensure partial writes don’t corrupt conversation state.
// client (JavaScript/Electron)
const presenceRef = rtdb.ref(`users/${uid}/presence`);
// Register the onDisconnect write first, so a crash after this point still marks us offline.
// Use the server timestamp: Date.now() here would freeze the registration time, not the disconnect time.
presenceRef.onDisconnect().set({ state: 'offline', lastSeen: firebase.database.ServerValue.TIMESTAMP });
presenceRef.set({ state: 'online', lastSeen: Date.now() });

// writing a message with atomic pointer update
const msgRef = rtdb.ref('messages').push();
await msgRef.set({ convId, author: uid, text: 'Hi', createdAt: Date.now() });
await rtdb.ref(`users/${uid}/conversations/${convId}/meta`).update({ updatedAt: Date.now(), head: msgRef.key });
Conflict handling: If multiple devices write to the same conversation concurrently, make pointer updates idempotent by storing message timestamps and using transactions for counters or lastEdited fields.
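A last-write-wins merge for the conversation meta pointer can be kept idempotent by comparing timestamps, so replayed or out-of-order syncs never move the head backwards. A minimal sketch (function name and shape are our own; in practice you would run this inside an RTDB transaction):

```javascript
// Idempotent, last-write-wins merge for { head, updatedAt } conversation meta:
// accept the incoming value only if it is strictly newer than the current one.
function mergeConversationMeta(current, incoming) {
  if (!current || incoming.updatedAt > current.updatedAt) return incoming;
  return current;
}

// Usage inside a transaction (live RTDB handle):
// metaRef.transaction(cur => mergeConversationMeta(cur, { head: msgKey, updatedAt: Date.now() }));
```

Applying the same incoming value twice yields the same result, which is exactly the property you want when two devices retry writes after reconnecting.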
Step 5 — Local inference patterns
Run the model locally and connect it to your app via one of three integration patterns:
- In-process library — link native inference library directly from the app (e.g., Rust bindings or WASM). Works well with Tauri or native apps.
- Local inference server — spawn a local process that exposes a small HTTP or WebSocket API (common with llama.cpp, GGML-based servers, or Ollama). The desktop app talks via localhost:PORT.
- External runtime/daemon — dedicated local runtime (e.g., AI HAT driver or device-specific service) accessible through IPC or REST.
We’ll illustrate the local inference server approach because it maps cleanly to Electron + Node and separates concerns.
Minimal local inference server (Node wrapper)
// spawn-llm.js (simplified)
const { spawn } = require('child_process');

function startLocalLLM(binaryPath, args = []) {
  const proc = spawn(binaryPath, args);
  proc.stdout.on('data', d => console.log('[llm]', d.toString()));
  proc.stderr.on('data', d => console.error('[llm]', d.toString()));
  proc.on('exit', code => console.log('LLM exited', code));
  return proc;
}

module.exports = { startLocalLLM };

// Example: spawn a local HTTP wrapper that serves /v1/generate
const proc = startLocalLLM('path/to/llm-server', ['--port', '8080']);
// Then your app can POST to http://localhost:8080/v1/generate
Tip: prefer a JSONL streaming API or server-sent events for token-by-token UI responsiveness.
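If the local server speaks JSONL, the client needs to buffer partial lines across network chunks before parsing. A small framing sketch (the `{ token: ... }` field name is an assumption about the wire format, not a standard):

```javascript
// JSONL frame splitter for a token stream: buffers a trailing partial line
// across chunks and returns one parsed object per complete line received.
function makeJsonlParser() {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the incomplete trailing line for the next chunk
    return lines.filter(l => l.trim()).map(l => JSON.parse(l));
  };
}

// Usage: const feed = makeJsonlParser();
// for each network chunk: feed(chunkText).forEach(msg => appendToken(msg.token));
```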
Step 6 — Wiring the inference output into Firebase
Keep message writes consistent: user message is written first, then local model result is appended once generated. For long generations stream partial assistant tokens into ephemeral nodes and consolidate when complete.
// flow in client
1) user types -> write message node with author=uid and status='sent'
2) push a generation job to a local queue: { jobId, msgId }
3) local LLM streams tokens to /messages/$msgId/streams/$clientId
4) on completion, atomically set assistant message: update messages/$assistantMsgId with final text and delete the ephemeral stream nodes
Example client snippet that requests local LLM and writes assistant message:
async function askAssistant(convId, userText) {
  // 1. write user message
  const userMsgRef = rtdb.ref('messages').push();
  await userMsgRef.set({ convId, author: uid, text: userText, createdAt: Date.now(), role: 'user' });

  // 2. call local LLM server
  const resp = await fetch('http://localhost:8080/v1/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: buildPrompt(convId), stream: true })
  });

  // 3. stream tokens (simplified)
  const reader = resp.body.getReader();
  let assistantText = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    assistantText += new TextDecoder().decode(value);
    // write ephemeral partials for realtime clients (optional)
    await rtdb.ref(`messages/${userMsgRef.key}/partial`).set(assistantText);
  }

  // 4. finalize assistant message and clean up the ephemeral partial
  const assistantMsgRef = rtdb.ref('messages').push();
  await assistantMsgRef.set({ convId, author: 'assistant', text: assistantText, createdAt: Date.now(), role: 'assistant' });
  await rtdb.ref(`messages/${userMsgRef.key}/partial`).remove();

  // update conversation head
  await rtdb.ref(`users/${uid}/conversations/${convId}/meta`).update({ head: assistantMsgRef.key, updatedAt: Date.now() });
}
Step 7 — Cloud Functions and optional server-side logic
Use Cloud Functions sparingly for features that must run server-side: notifications, transcription, backups, or generating short-lived tokens for devices that need tiered access. Prefer keeping sensitive inference local to the device.
// Example: notify other devices when a new conversation head updates
exports.onConversationUpdate = functions.database.ref('/users/{uid}/conversations/{convId}/meta')
  .onWrite(async (change, context) => {
    const after = change.after.val();
    // send push notifications or update a search index
  });
Cost & scaling strategies
- Shallow writes: write new messages as separate nodes instead of updating huge arrays.
- Fan-out sparingly: when you need to update multiple mirrors, use Cloud Functions triggered by a single source-of-truth write.
- Presence sampling: avoid constant presence pings — use onDisconnect and heartbeat intervals (e.g., every 30s) that increase when activity is low.
- Store minimal transcript server-side: keep pointers and metadata in RTDB; store full transcripts only if user opts in or for backups (consider encryption at rest using Cloud KMS).
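The presence-sampling idea above can be made concrete with an adaptive heartbeat schedule: the interval grows as the user goes idle, capped so lastSeen never becomes too stale. Thresholds below are illustrative assumptions, not Firebase requirements:

```javascript
// Adaptive presence heartbeat: return the next heartbeat interval (seconds)
// given how long the user has been idle. Active users ping every 30s; idle
// sessions back off to reduce Realtime Database write volume.
function nextHeartbeatSeconds(idleSeconds) {
  if (idleSeconds < 60) return 30;   // active: frequent heartbeats
  if (idleSeconds < 600) return 120; // idle under 10 min: back off to 2 min
  return 300;                        // deeply idle: 5 min cap
}

// Usage sketch: setTimeout(beat, nextHeartbeatSeconds(idleSeconds) * 1000)
// inside a beat() function that writes lastSeen and reschedules itself.
```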
Observability & debugging
Run the Firebase Emulator Suite during development to test rules, auth, and functions locally. For production, enable Realtime Database logging, Cloud Functions tracing, and instrument the local app with local telemetry (opt-in) to capture inference performance and memory usage.
Advanced patterns and future-proofing
CRDTs and fine-grained merge
For complex multi-device collaborative assistant states (e.g., shared notes), consider using CRDT libraries and storing deltas in RTDB. Firebase doesn’t provide CRDT primitives natively, but you can store operation logs and converge on-device.
Selective sync & encryption
Keep sensitive data local by default. If you must sync transcripts, encrypt them client-side before writing to Realtime Database (e.g., using a per-user AES key derived from a passphrase or stored in secure enclave). This balances convenience with compliance.
Edge inference & hardware acceleration
2026 hardware like AI HAT+ for Raspberry Pi and consumer NPUs make running 7B–13B models on-device feasible. Detect hardware capabilities at runtime and select appropriate quantized models or fall back to smaller models to maintain UX.
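Runtime capability detection can be sketched as a simple selection function; the model names and thresholds below are made-up placeholders for whatever quantized models you actually ship, and real probing would use something like os.totalmem() plus a vendor NPU query:

```javascript
// Illustrative capability-based model selection: pick the largest quantized
// model that fits the detected hardware, falling back to a small model.
function pickModel({ ramGb, hasNpu }) {
  if (ramGb >= 16 && hasNpu) return 'llama-13b-q4'; // big model needs RAM + accelerator
  if (ramGb >= 8) return 'llama-7b-q4';             // mid-range CPU-only machines
  return 'phi-3-mini-q4';                           // safe fallback for low-RAM devices
}
```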
“Local-first inference reduces data egress cost and eliminates round-trips for private prompts — but it requires careful sync design and encryption when server-side backups are used.”
Sample end-to-end checklist
- Create Firebase project and enable Realtime Database + Auth
- Define shallow RTDB schema (users, messages, conversations)
- Write and test security rules in the Emulator
- Implement auth flow (system OAuth + custom token minting)
- Implement presence with onDisconnect and heartbeat
- Run local LLM server and expose a secure localhost API
- Stream partial assistant tokens to ephemeral RTDB nodes, finalize into messages node
- Instrument telemetry and test scale patterns
- Optional: add Cloud Functions for cross-device notifications and backups
Real-world considerations & 2026 trends
By late 2025 and into 2026 the industry has shifted toward hybrid models: local inference for privacy-critical operations and cloud LLMs for heavy multi-stage tasks. Products like Anthropic’s Cowork (desktop agents with file system access) show demand for intelligent, local-aware assistants. ZDNET’s coverage of new hardware (Raspberry Pi AI HAT+) highlights how inexpensive edge hardware widens the set of feasible device targets.
For developers, the implication is clear: build for dual-mode operation — local inference first, cloud fallback second. Keep your realtime state model flexible so you can add server-side features (search index, analytics) without changing client contracts.
Security checklist
- Enforce strict Realtime Database rules; never use public read/write in production.
- Use custom token minting for desktop OAuth flows rather than embedding service credentials.
- Encrypt transcripts if they contain PII before syncing to the cloud.
- Use onDisconnect to clear ephemeral tokens and presence.
Troubleshooting common problems
Auth failing in Electron
Use system browser + backend minting. Embedded WebViews can break modern OAuth flows and cause CORS or cookie issues.
Large reads from Realtime Database
Split big arrays into child nodes. Use queries to paginate messages and only subscribe to the latest window (e.g., last 100 messages).
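On the client side, the subscription would use a query like orderByChild('createdAt').limitToLast(100); the pure helper below mirrors what that window looks like against a local cache (helper name is our own):

```javascript
// Keep only the newest N messages by createdAt — the same window an
// orderByChild('createdAt').limitToLast(N) RTDB query would return.
function latestWindow(messages, n) {
  return [...messages].sort((a, b) => a.createdAt - b.createdAt).slice(-n);
}
```

Subscribing only to this window keeps initial sync payloads small even for conversations with thousands of messages.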
Local model uses too much RAM or stalls
Switch to a smaller quantized model, use streaming generation, and offload heavy ops to a native worker thread or a local GPU/NPU runtime.
Example: Minimal Electron + Firebase flow
// Main process: spawn local LLM server
const { app, BrowserWindow } = require('electron');
const { startLocalLLM } = require('./spawn-llm');

app.whenReady().then(() => {
  startLocalLLM('path/to/llm-server');
  const win = new BrowserWindow({ webPreferences: { nodeIntegration: false } });
  win.loadURL('app://index.html');
});

// Renderer: sign in via Cloud Function, then sync
import { getDatabase, ref, onChildAdded, push, set } from 'firebase/database';
import { getAuth, signInWithCustomToken } from 'firebase/auth';
import { httpsCallable, getFunctions } from 'firebase/functions';

const functions = getFunctions();
const mint = httpsCallable(functions, 'mintCustomToken');

async function signInWithProvider(providerToken) {
  const { data } = await mint({ provider: 'google', providerToken });
  await signInWithCustomToken(getAuth(), data.token);
}
Takeaways — practical, production-ready guidance
- Run LLMs locally when privacy or latency matters; sync only necessary state to Firebase.
- Design a shallow RTDB schema for messages and conversation pointers to avoid performance pitfalls.
- Use Realtime Database presence and onDisconnect for accurate online/offline signals across devices.
- Opt for system OAuth + custom token minting in desktop apps to avoid embedding credentials.
- Stream partial outputs to ephemeral nodes and finalize writes atomically to keep clients in sync.
Next steps & call-to-action
Start building today: spin up the Firebase Emulator Suite, scaffold an Electron app, and test local LLMs with a small quantized model. If you want a starter kit, clone the companion repo (includes example Electron app, Cloud Function for token minting, RTDB rules, and a local llama.cpp wrapper) — then iterate: add encryption, team sharing, and analytics as needed.
Build a private, realtime desktop assistant that scales. Want the starter kit or a tailored architecture review? Reach out via the firebase.live community or check our implementation repo to get a working demo in under a day.