NodeLLM Monitor: Self-Hosted LLM Observability

Production AI applications need observability. You need to know which requests are slow, what's driving costs, and where errors happen.

@node-llm/monitor is an observability layer that embeds directly into your Node.js app. It captures every LLM request—latency, tokens, cost, tool calls—and serves a real-time dashboard from your own server.

One npm package. No external services. Your data stays in your database.


What It Actually Does

Every LLM request gets tracked:

  • Latency — Total request time, time-to-first-token for streaming
  • Tokens — Prompt tokens, completion tokens, totals
  • Cost — Calculated from provider pricing tables
  • Errors — Full stack traces when things go wrong
  • Tool calls — Which tools were invoked and their results
  • Schema corrections — How many retries for structured output validation

All of this flows into a built-in React dashboard served from your own server.
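The cost figure is the most mechanical of these: it's derived from token counts and a per-model price sheet. As a rough sketch (the prices and function below are illustrative, not the package's actual internals):

```typescript
// Hypothetical per-million-token pricing table (illustrative values only).
const PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Compute the USD cost of one request from its token usage.
function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const price = PRICING[model];
  if (!price) return 0; // unknown model: no price data available
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}
```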


Quick Start

npm install @node-llm/monitor

With NodeLLM

If you're using @node-llm/core, the monitor is a middleware:

import { createLLM } from "@node-llm/core";
import { Monitor } from "@node-llm/monitor";

const monitor = Monitor.memory();

const llm = createLLM({
  provider: "openai",
  model: "gpt-4o",
  middlewares: [monitor],
});

// Start the dashboard
import express from "express";
const app = express();
app.use(monitor.api({ basePath: "/monitor" }));
app.listen(3000);

That's it. Visit http://localhost:3000/monitor to see your dashboard.


With Vercel AI SDK (OpenTelemetry)

The @node-llm/monitor-otel package hooks into OpenTelemetry spans:

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { Monitor } from "@node-llm/monitor";
import { NodeLLMSpanProcessor } from "@node-llm/monitor-otel";

const monitor = Monitor.memory();

// Hook into OpenTelemetry
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new NodeLLMSpanProcessor(monitor.getStore()));
provider.register();

// Use Vercel AI SDK with telemetry enabled
const result = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Write a haiku",
  experimental_telemetry: { isEnabled: true }
});

The NodeLLMSpanProcessor intercepts AI spans and extracts model, tokens, and cost.
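To make that extraction step concrete, here is a minimal sketch of pulling model and usage data out of an AI SDK span. The attribute keys (`ai.model.id`, `ai.usage.promptTokens`, `ai.usage.completionTokens`) are assumptions about what the SDK emits, and the local `SpanLike` type stands in for OpenTelemetry's `ReadableSpan`; check your actual spans before relying on these names:

```typescript
// Minimal stand-in for the span data a processor sees. Real spans come
// from @opentelemetry/sdk-trace-node; this local type is for illustration.
interface SpanLike {
  name: string;
  attributes: Record<string, string | number | undefined>;
}

// Sketch of the extraction step: keep only AI SDK spans and pull out the
// fields the monitor cares about. Attribute keys are assumptions.
function extractLLMSpan(span: SpanLike) {
  if (!span.name.startsWith("ai.")) return null; // not an AI SDK span
  return {
    model: span.attributes["ai.model.id"] as string,
    promptTokens: Number(span.attributes["ai.usage.promptTokens"] ?? 0),
    completionTokens: Number(span.attributes["ai.usage.completionTokens"] ?? 0),
  };
}
```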


Manual Instrumentation

For raw OpenAI SDK or any custom setup:

import { Monitor } from "@node-llm/monitor";
import OpenAI from "openai";

const monitor = Monitor.memory();
const openai = new OpenAI();

async function trackedCompletion(prompt: string) {
  const ctx = {
    requestId: crypto.randomUUID(),
    provider: "openai",
    model: "gpt-4o",
    state: {},
    messages: [{ role: "user", content: prompt }],
  };

  await monitor.onRequest(ctx);

  try {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });

    await monitor.onResponse(ctx, {
      toString: () => completion.choices[0].message.content ?? "",
      usage: completion.usage,
    });

    return completion;
  } catch (error) {
    await monitor.onError(ctx, error as Error);
    throw error;
  }
}

Storage Adapters

Choose where telemetry lives:

import { Monitor } from "@node-llm/monitor";
import { createPrismaMonitor, createFileMonitor } from "@node-llm/monitor";

// Development: in-memory (fast, lost on restart)
const memoryMonitor = Monitor.memory();

// File-based (JSON logs, survive restarts)
const fileMonitor = createFileMonitor("./logs/llm.log");

// Production: Prisma (Postgres, MySQL, SQLite)
const prismaMonitor = createPrismaMonitor(prismaClient);

// Custom: implement the MonitoringStore interface
const customMonitor = new Monitor({ store: new MyRedisAdapter() });
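A custom adapter essentially reduces to a save/query pair. The interface and field names below are assumptions for illustration; the real `MonitoringStore` contract in `@node-llm/monitor` may differ:

```typescript
// Hypothetical shape of a stored trace and the store contract. The real
// MonitoringStore interface lives in @node-llm/monitor and may differ.
interface Trace {
  requestId: string;
  model: string;
  durationMs: number;
  cost: number;
}

interface MonitoringStoreLike {
  save(trace: Trace): Promise<void>;
  list(): Promise<Trace[]>;
}

// In-memory Map as the simplest possible custom store; swap the Map for
// Redis, ClickHouse, etc. in a real adapter.
class MapStore implements MonitoringStoreLike {
  private traces = new Map<string, Trace>();

  async save(trace: Trace): Promise<void> {
    this.traces.set(trace.requestId, trace);
  }

  async list(): Promise<Trace[]> {
    return [...this.traces.values()];
  }
}
```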

The Dashboard

The dashboard is a React SPA served as static files from your own server:

app.use(monitor.api({ 
  basePath: "/monitor",
  i18n: {
    title: "My App Monitor",
    supportedLngs: ["en", "es", "ar"],
  }
}));

Features:

  • Real-time metrics (requests, cost, latency over time)
  • Request traces with full payload inspection
  • Filter by provider, model, status
  • Token analytics per model
  • RTL support for Arabic/Hebrew teams

No external JavaScript. No tracking. Everything runs on your server.


What You Get

  • totalRequests — Count of completed requests
  • totalCost — Accumulated USD cost
  • avgDuration — Average request latency (ms)
  • errorRate — Percentage of failed requests
  • totalPromptTokens — Input tokens across all requests
  • totalCompletionTokens — Output tokens across all requests
  • totalSelfCorrections — Requests that needed a schema retry
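A sketch of how these summary metrics could be aggregated from stored traces (field names mirror the list above; the aggregation logic is illustrative, not the package's code):

```typescript
// Minimal trace shape for aggregation; real traces carry more fields.
interface TraceSummary {
  durationMs: number;
  cost: number;
  promptTokens: number;
  completionTokens: number;
  ok: boolean;
}

// Derive the dashboard's summary metrics from raw traces.
function summarize(traces: TraceSummary[]) {
  const n = traces.length;
  return {
    totalRequests: n,
    totalCost: traces.reduce((s, t) => s + t.cost, 0),
    avgDuration: n ? traces.reduce((s, t) => s + t.durationMs, 0) / n : 0,
    errorRate: n ? (traces.filter((t) => !t.ok).length / n) * 100 : 0,
    totalPromptTokens: traces.reduce((s, t) => s + t.promptTokens, 0),
    totalCompletionTokens: traces.reduce((s, t) => s + t.completionTokens, 0),
  };
}
```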

Per-request traces include:

  • Provider and model
  • Duration and cost
  • CPU time and memory allocations
  • Full event timeline (request.start → tool.start → tool.end → request.end)

Privacy: Content Scrubbing

By default, request/response content is not captured. Enable it with scrubbing:

const monitor = new Monitor({
  store: new MemoryAdapter(),
  captureContent: true,
  scrubbing: { pii: true, secrets: true }
});

The ContentScrubber masks emails, API keys, and other sensitive patterns before storage.
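A scrubbing pass of this kind reduces to regex masking before storage. The patterns and replacement tokens below are illustrative, not the actual `ContentScrubber` rules:

```typescript
// Illustrative scrubbing pass: mask emails and API-key-shaped strings
// before a trace is written. The real ContentScrubber's patterns and
// replacement format may differ.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const OPENAI_KEY = /sk-[A-Za-z0-9]{20,}/g;

function scrub(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(OPENAI_KEY, "[SECRET]");
}
```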


Why Self-Hosted?

|               | NodeLLM Monitor | Langfuse               | LangSmith              |
|---------------|-----------------|------------------------|------------------------|
| Hosting       | Your server     | Their cloud            | Their cloud            |
| Data location | Your database   | Their database         | Their database         |
| Pricing       | Free            | Free tier → $59+/mo    | $39+/mo                |
| Setup         | One npm package | Account + SDK + config | Account + SDK + config |

If you're building internal tools or have compliance requirements, self-hosted is the only option that makes sense.



Questions? Open an issue or reach out @eshaiju.
