Embed an LLM-powered Assistant into Desktop Apps Using Firebase Realtime State Sync

Unknown
2026-02-21


Ship a private, realtime desktop assistant: keep state in Firebase while running LLMs locally

You want a desktop assistant that feels instant, private, and always in sync across devices, without routing sensitive prompts to cloud LLMs or rebuilding realtime sync from scratch. This tutorial shows how to embed an LLM-powered assistant into a desktop app that runs inference locally while keeping conversation state, presence, and user preferences synced with Firebase Realtime Database.

We cover an opinionated, production-ready approach: architecture, security, offline-first patterns, code for desktop integration (Electron/Tauri), realtime rules, presence via onDisconnect, and how to wire a local inference process into Firebase. This is written for 2026 — when local LLM runtimes and edge AI hardware are mainstream, and developers must balance privacy, cost, and scale.

Why this architecture matters in 2026

Two trends shaped this guide:

  • Local inference at scale — affordable edge hardware (e.g., AI HAT+ for Raspberry Pi 5, dedicated NPU dongles) and efficient quantized runtimes (ggml/llama.cpp, Ollama, native ONNX/NNAPI runtimes) make on-device LLM inference realistic for desktop apps.
  • Realtime UX expectations — users expect synchronized conversation state, cross-device continuity, and low-latency presence signals like typing indicators.

Combining local inference with Realtime Database gives the best of both worlds: privacy (models run locally), and developer productivity + realtime sync (Firebase handles presence, durable state, and conflict-safe updates).

High-level architecture

+----------------+       +--------------------+       +---------------------------+
|  Desktop App   | <---> | Firebase Realtime  | <---> | Other clients (mobile,    |
|  (Electron)    |       | Database + Auth    |       | web)                      |
|  - UI + sync   |       |                    |       | - read conversation state |
|  - local LLM   |       | - conversations/   |       | - presence, preferences   |
|  - inference   |       |   preferences      |       |                           |
+----------------+       +--------------------+       +---------------------------+
        |                          ^
        | local inference logs     | cloud triggers (optional)
        v                          v
+------------------+       +------------------+
| LLM runtime      |       | Cloud Functions  |
| (llama.cpp,      |       | - push notifs    |
| Ollama REST,     |       | - token minting  |
| local server)    |       +------------------+
+------------------+

Design goals

  • Private inference: user data stays on-device unless they opt in to cloud syncing.
  • Realtime sync: conversation metadata, message pointers, and preferences live in Realtime Database.
  • Offline-first: local cache + atomic sync when online.
  • Secure by design: per-user rules and optional encryption for sensitive transcripts.

Step 1 — Firebase project & Realtime Database schema

Create a Firebase project (console.firebase.google.com). Enable Realtime Database and Firebase Authentication. For desktop apps, a common pattern is to use OAuth or a small backend that mints custom tokens for Firebase Auth.

Suggested RTDB structure (shallow keys for scale)

{
  "users": {
    "$uid": {
      "profile": { "displayName": "...", "prefs": { /* theme, assistant voice */ } },
      "conversations": {
        "$convId": {
          "meta": { "title": "...", "updatedAt": 1670000000 },
          "head": "messageIds/123"
        }
      },
      "presence": { "state": "online", "lastSeen": 1670000000 }
    }
  },
  "messages": {
    "$msgId": { "convId": "...", "author": "<uid> or 'assistant'", "role": "user|assistant|system", "text": "...", "createdAt": 1670000000 }
  }
}

Why separate messages? Realtime Database scales better when large arrays are flattened into shallow nodes. Writing a new message is a single child write to /messages/$msgId and then a pointer update under the conversation.
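The two writes just described can also be combined into a single multi-location update, so the message node and the conversation pointer commit together or not at all. A minimal sketch, assuming the namespaced SDK where update() accepts a map of paths; the helper name is illustrative:

```javascript
// Build one atomic fan-out write: the message node plus the conversation
// pointer. Either everything commits or nothing does.
function buildMessageFanout(uid, convId, msgId, message) {
  const now = Date.now();
  return {
    [`messages/${msgId}`]: { ...message, convId, createdAt: now },
    [`users/${uid}/conversations/${convId}/meta/head`]: msgId,
    [`users/${uid}/conversations/${convId}/meta/updatedAt`]: now,
  };
}

// Usage (namespaced SDK assumed):
// const msgRef = rtdb.ref('messages').push();
// await rtdb.ref().update(buildMessageFanout(uid, convId, msgRef.key, { author: uid, text: 'Hi' }));
```

This avoids the window between the message write and the pointer update where another client could observe a conversation whose head does not exist yet.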

Step 2 — Realtime Database Rules & Security

Protect data with per-user rules. Only allow users to read/write their own data. If you support team-shared assistants, add ACLs at the conversation level.

{
  "rules": {
    "users": {
      "$uid": {
        ".read": "auth != null && auth.uid === $uid",
        ".write": "auth != null && auth.uid === $uid",
        "presence": {
          ".write": "auth != null && auth.uid === $uid"
        }
      }
    },
    "messages": {
      "$msgId": {
        ".read": "auth != null && root.child('users').child(auth.uid).child('conversations').child(data.child('convId').val()).exists()",
        ".write": "auth != null && (newData.child('author').val() === auth.uid || newData.child('author').val() === 'assistant')"
      }
    }
  }
}

Notes: Adjust .read/.write rules for guest sessions, team sharing, or admin roles. Realtime Database rules evaluate fast but always test with the Firebase Emulator Suite.

Step 3 — Auth patterns for desktop apps

Desktop apps can’t rely on browser popup flows the same way web apps do. Two recommended approaches:

  1. Device / system OAuth + backend minting: open the system browser for OAuth (Google/GitHub) and exchange the provider token on your backend for a Firebase customToken (via Firebase Admin SDK). Desktop app signs in with signInWithCustomToken(). This avoids embedding long-lived API keys in the client.
  2. Embedded WebView + OAuth: use an embedded WebView with a secure redirect URI (custom URL scheme) that your desktop app can capture. This approach is more friction-prone, is blocked by some OAuth providers, and must be carefully sandboxed.

Example Cloud Function (Node) to mint a custom token after verifying an OAuth provider token:

// functions/index.js (simplified)
const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

exports.mintCustomToken = functions.https.onCall(async (data, context) => {
  const { provider, providerToken } = data;
  // validate providerToken with provider's API (Google/GitHub). Omitted for brevity.
  const providerUid = await verifyProviderToken(provider, providerToken);
  // create the user on first sign-in (updateUser throws for unknown uids)
  const uid = `oauth:${provider}:${providerUid}`;
  await admin.auth().getUser(uid).catch(() => admin.auth().createUser({ uid }));
  const customToken = await admin.auth().createCustomToken(uid);
  return { token: customToken };
});

Step 4 — Presence & offline-first sync

Realtime Database includes client-side offline support and onDisconnect handlers. Use these for online/offline presence and to ensure partial writes don’t corrupt conversation state.

// client (JavaScript/Electron, namespaced/compat SDK)
const presenceRef = rtdb.ref(`/users/${uid}/presence`);
rtdb.ref('.info/connected').on('value', snap => {
  if (!snap.val()) return;
  // register onDisconnect before announcing presence, and use the server
  // timestamp so lastSeen reflects when the disconnect actually happened
  presenceRef.onDisconnect().set({ state: 'offline', lastSeen: firebase.database.ServerValue.TIMESTAMP });
  presenceRef.set({ state: 'online', lastSeen: firebase.database.ServerValue.TIMESTAMP });
});

// writing a message with an atomic pointer update
const msgRef = rtdb.ref('messages').push();
await msgRef.set({ convId, author: uid, text: 'Hi', createdAt: Date.now() });
await rtdb.ref(`users/${uid}/conversations/${convId}/meta`).update({ updatedAt: Date.now(), head: msgRef.key });

Conflict handling: If multiple devices write to the same conversation concurrently, make pointer updates idempotent by storing message timestamps and using transactions for counters or lastEdited fields.
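One concrete way to make the pointer update idempotent is a transaction whose update function only moves a timestamp forward, so concurrent devices converge on the newest value regardless of write order. A sketch, assuming the namespaced SDK's transaction():

```javascript
// Transaction update function: advance updatedAt only if the candidate is
// newer than what is already stored (null means the node does not exist yet).
function monotonicMax(candidate) {
  return current => (current == null || candidate > current) ? candidate : current;
}

// Usage:
// rtdb.ref(`users/${uid}/conversations/${convId}/meta/updatedAt`)
//     .transaction(monotonicMax(Date.now()));
```

Because the function is pure and order-insensitive, Firebase can safely retry it when the local value was stale.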

Step 5 — Local inference patterns

Run the model locally and connect it to your app via one of three integration patterns:

  1. In-process library — link native inference library directly from the app (e.g., Rust bindings or WASM). Works well with Tauri or native apps.
  2. Local inference server — spawn a local process that exposes a small HTTP or WebSocket API (common with llama.cpp, GGML-based servers, or Ollama). The desktop app talks via localhost:PORT.
  3. External runtime/daemon — dedicated local runtime (e.g., AI HAT driver or device-specific service) accessible through IPC or REST.

We’ll illustrate the local inference server approach because it maps cleanly to Electron + Node and separates concerns.

Minimal local inference server (Node wrapper)

// spawn-llm.js (simplified)
const { spawn } = require('child_process');

function startLocalLLM(binaryPath, args = []) {
  const proc = spawn(binaryPath, args);
  proc.stdout.on('data', d => console.log('[llm]', d.toString()));
  proc.stderr.on('data', d => console.error('[llm]', d.toString()));
  proc.on('exit', code => console.log('LLM exited', code));
  return proc;
}

// Example: spawn a local HTTP wrapper that serves /v1/generate
const proc = startLocalLLM('path/to/llm-server', ['--port', '8080']);

// Then your app can POST to http://localhost:8080/v1/generate

Tip: prefer a JSONL streaming API or server-sent events for token-by-token UI responsiveness.
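Parsing a JSONL stream takes a little care because network chunks can split a JSON line anywhere. A minimal parser sketch that buffers the trailing partial line; the { token: "..." } payload shape is an assumption about the local server's API:

```javascript
// Returns a feed(chunk) function: complete lines are parsed and passed to
// onToken, while the incomplete trailing line is buffered for the next chunk.
function makeJsonlParser(onToken) {
  let buf = '';
  return chunk => {
    buf += chunk;
    const lines = buf.split('\n');
    buf = lines.pop(); // keep the partial trailing line
    for (const line of lines) {
      if (line.trim()) onToken(JSON.parse(line));
    }
  };
}

// Usage: call feed() with each decoded chunk from the response stream and
// append token.token to the visible assistant message as it arrives.
```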

Step 6 — Wiring the inference output into Firebase

Keep message writes consistent: user message is written first, then local model result is appended once generated. For long generations stream partial assistant tokens into ephemeral nodes and consolidate when complete.

The flow in the client:

  1. User types -> write a message node with author=uid and status='sent'.
  2. Push a generation job to a local queue: { jobId, msgId }.
  3. The local LLM streams tokens to /messages/$msgId/streams/$clientId.
  4. On completion, atomically set the assistant message (update messages/$assistantMsgId with the final text) and delete the ephemeral stream nodes.

Example client snippet that requests local LLM and writes assistant message:

async function askAssistant(convId, userText) {
  // 1. write user message
  const userMsgRef = rtdb.ref('messages').push();
  await userMsgRef.set({ convId, author: uid, text: userText, createdAt: Date.now(), role: 'user' });

  // 2. call local LLM server
  const resp = await fetch('http://localhost:8080/v1/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: buildPrompt(convId), stream: true })
  });

  // 3. stream tokens (simplified)
  const reader = resp.body.getReader();
  let assistantText = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const chunk = new TextDecoder().decode(value);
    assistantText += chunk;
    // write ephemeral partials for realtime clients (optional)
    await rtdb.ref(`messages/${userMsgRef.key}/partial`).set(assistantText);
  }

  // 4. finalize assistant message and clean up the ephemeral partial node
  const assistantMsgRef = rtdb.ref('messages').push();
  await assistantMsgRef.set({ convId, author: 'assistant', text: assistantText, createdAt: Date.now(), role: 'assistant' });
  await rtdb.ref(`messages/${userMsgRef.key}/partial`).remove();
  // update conversation head
  await rtdb.ref(`users/${uid}/conversations/${convId}/meta`).update({ head: assistantMsgRef.key, updatedAt: Date.now() });
}

Step 7 — Cloud Functions and optional server-side logic

Use Cloud Functions sparingly for features that must run server-side: notifications, transcription, backups, or generating short-lived tokens for devices that need tiered access. Prefer keeping sensitive inference local to the device.

// Example: notify other devices when a new convo head updates
exports.onConversationUpdate = functions.database.ref('/users/{uid}/conversations/{convId}/meta')
  .onWrite(async (change, context) => {
    const after = change.after.val();
    // send push notifications or update search index
  });

Cost & scaling strategies

  • Shallow writes: write new messages as separate nodes instead of updating huge arrays.
  • Fan-out sparingly: when you need to update multiple mirrors, use Cloud Functions triggered by a single source-of-truth write.
  • Presence sampling: avoid constant presence pings — use onDisconnect and heartbeat intervals (e.g., every 30s) that increase when activity is low.
  • Store minimal transcript server-side: keep pointers and metadata in RTDB; store full transcripts only if user opts in or for backups (consider encryption at rest using Cloud KMS).
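The presence-sampling point can be made concrete with an adaptive heartbeat: a short base interval that backs off as the user goes idle. A sketch; the thresholds and intervals are illustrative, not tuned values:

```javascript
// Choose the next heartbeat delay from how long the user has been idle.
function nextHeartbeatMs(idleMs) {
  if (idleMs < 60_000) return 30_000;        // active: every 30s
  if (idleMs < 10 * 60_000) return 60_000;   // idle under 10 min: every 60s
  return 300_000;                            // deep idle: every 5 min
}

// Usage: after each presence write, schedule the next one with
// setTimeout(beat, nextHeartbeatMs(Date.now() - lastActivityAt))
// instead of a fixed setInterval.
```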

Observability & debugging

Run the Firebase Emulator Suite during development to test rules, auth, and functions locally. For production, enable Realtime Database logging, Cloud Functions tracing, and instrument the local app with local telemetry (opt-in) to capture inference performance and memory usage.

Advanced patterns and future-proofing

CRDTs and fine-grained merge

For complex multi-device collaborative assistant states (e.g., shared notes), consider using CRDT libraries and storing deltas in RTDB. Firebase doesn’t provide CRDT primitives natively, but you can store operation logs and converge on-device.
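A minimal sketch of the operation-log idea: store ops as RTDB children shaped { ts, clientId, key, value } (a shape assumed here, not a Firebase primitive) and converge deterministically on-device by replaying them in (ts, clientId) order, so every replica reaches the same state regardless of arrival order:

```javascript
// Replay an op log deterministically: sort by timestamp, break ties by
// clientId, then apply last-writer-wins per key.
function converge(ops) {
  const ordered = [...ops].sort(
    (a, b) => a.ts - b.ts || a.clientId.localeCompare(b.clientId)
  );
  const state = {};
  for (const op of ordered) state[op.key] = op.value;
  return state;
}
```

This is a last-writer-wins register, the simplest CRDT-style merge; richer structures (sequences, counters) need a proper CRDT library.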

Selective sync & encryption

Keep sensitive data local by default. If you must sync transcripts, encrypt them client-side before writing to Realtime Database (e.g., using a per-user AES key derived from a passphrase or stored in secure enclave). This balances convenience with compliance.

Edge inference & hardware acceleration

2026 hardware like AI HAT+ for Raspberry Pi and consumer NPUs make running 7B–13B models on-device feasible. Detect hardware capabilities at runtime and select appropriate quantized models or fall back to smaller models to maintain UX.

“Local-first inference reduces data egress cost and eliminates round-trips for private prompts — but it requires careful sync design and encryption when server-side backups are used.”

Sample end-to-end checklist

  1. Create Firebase project and enable Realtime Database + Auth
  2. Define shallow RTDB schema (users, messages, conversations)
  3. Write and test security rules in the Emulator
  4. Implement auth flow (system OAuth + custom token minting)
  5. Implement presence with onDisconnect and heartbeat
  6. Run local LLM server and expose a secure localhost API
  7. Stream partial assistant tokens to ephemeral RTDB nodes, finalize into messages node
  8. Instrument telemetry and test scale patterns
  9. Optional: add Cloud Functions for cross-device notifications and backups

By late 2025 and into 2026 the industry has shifted toward hybrid models: local inference for privacy-critical operations and cloud LLMs for heavy multi-stage tasks. Products like Anthropic’s Cowork (desktop agents with file system access) show demand for intelligent, local-aware assistants. ZDNET’s coverage of new hardware (Raspberry Pi AI HAT+) highlights how inexpensive edge hardware widens the set of feasible device targets.

For developers, the implication is clear: build for dual-mode operation — local inference first, cloud fallback second. Keep your realtime state model flexible so you can add server-side features (search index, analytics) without changing client contracts.

Security checklist

  • Enforce strict Realtime Database rules; never use public read/write in production.
  • Use custom token minting for desktop OAuth flows rather than embedding service credentials.
  • Encrypt transcripts if they contain PII before syncing to the cloud.
  • Use onDisconnect to clear ephemeral tokens and presence.

Troubleshooting common problems

Auth failing in Electron

Use system browser + backend minting. Embedded WebViews can break modern OAuth flows and cause CORS or cookie issues.

Large Realtime Database reads

Split big arrays into child nodes. Use queries to paginate messages and only subscribe to the latest window (e.g., last 100 messages).

Local model uses too much RAM or stalls

Switch to a smaller quantized model, use streaming generation, and offload heavy ops to a native worker thread or a local GPU/NPU runtime.

Example: Minimal Electron + Firebase flow

// Main process: spawn local LLM server
const { app, BrowserWindow } = require('electron');
const { startLocalLLM } = require('./spawn-llm');

app.whenReady().then(() => {
  startLocalLLM('path/to/llm-server');
  const win = new BrowserWindow({ webPreferences: { nodeIntegration: false, contextIsolation: true } });
  win.loadURL('app://index.html');
});

// Renderer: sign-in via cloud function, then sync
import { getDatabase, ref, onChildAdded, push, set } from 'firebase/database';
import { getAuth, signInWithCustomToken } from 'firebase/auth';
import { httpsCallable, getFunctions } from 'firebase/functions';

const functions = getFunctions();
const mint = httpsCallable(functions, 'mintCustomToken');

async function signInWithProvider(providerToken) {
  const { data } = await mint({ provider: 'google', providerToken });
  await signInWithCustomToken(getAuth(), data.token);
}

Takeaways — practical, production-ready guidance

  • Run LLMs locally when privacy or latency matters; sync only necessary state to Firebase.
  • Design a shallow RTDB schema for messages and conversation pointers to avoid performance pitfalls.
  • Use Realtime Database presence and onDisconnect for accurate online/offline signals across devices.
  • Opt for system OAuth + custom token minting in desktop apps to avoid embedding credentials.
  • Stream partial outputs to ephemeral nodes and finalize writes atomically to keep clients in sync.

Next steps & call-to-action

Start building today: spin up the Firebase Emulator Suite, scaffold an Electron app, and test local LLMs with a small quantized model. If you want a starter kit, clone the companion repo (includes example Electron app, Cloud Function for token minting, RTDB rules, and a local llama.cpp wrapper) — then iterate: add encryption, team sharing, and analytics as needed.

Build a private, realtime desktop assistant that scales. Want the starter kit or a tailored architecture review? Reach out via the firebase.live community or check our implementation repo to get a working demo in under a day.
