# NodeLLM Monitor: Self-Hosted LLM Observability
By Shaiju Edakulangara (@eshaiju)
Production AI applications need observability. You need to know which requests are slow, what's driving costs, and where errors happen.
@node-llm/monitor is an observability layer that embeds directly into your Node.js app. It captures every LLM request—latency, tokens, cost, tool calls—and serves a real-time dashboard from your own server.
One npm package. No external services. Your data stays in your database.
## What It Actually Does
Every LLM request gets tracked:
- Latency — Total request time, time-to-first-token for streaming
- Tokens — Prompt tokens, completion tokens, totals
- Cost — Calculated from provider pricing tables
- Errors — Full stack traces when things go wrong
- Tool calls — Which tools were invoked and their results
- Schema corrections — How many retries for structured output validation
All of this flows into a built-in React dashboard served from your own server.
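To make the captured fields concrete, a tracked request can be pictured as a record like the one below. The field names are illustrative assumptions, not the package's actual schema:

```ts
// Illustrative shape of one tracked LLM request. Field names here are
// hypothetical; check @node-llm/monitor's type declarations for the real schema.
interface LLMRequestRecord {
  requestId: string;
  provider: string;            // e.g. "openai"
  model: string;               // e.g. "gpt-4o"
  durationMs: number;          // total request latency
  timeToFirstTokenMs?: number; // streaming only
  promptTokens: number;
  completionTokens: number;
  costUsd: number;             // derived from provider pricing tables
  toolCalls: { name: string; durationMs: number }[];
  selfCorrections: number;     // schema-validation retries
  error?: { message: string; stack: string };
}

const example: LLMRequestRecord = {
  requestId: "req-1",
  provider: "openai",
  model: "gpt-4o",
  durationMs: 820,
  promptTokens: 120,
  completionTokens: 64,
  costUsd: 0.0012,
  toolCalls: [],
  selfCorrections: 0,
};
```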
## Quick Start
```bash
npm install @node-llm/monitor
```
### With NodeLLM
If you're using @node-llm/core, the monitor is a middleware:
```ts
import express from "express";
import { createLLM } from "@node-llm/core";
import { Monitor } from "@node-llm/monitor";

const monitor = Monitor.memory();

const llm = createLLM({
  provider: "openai",
  model: "gpt-4o",
  middlewares: [monitor],
});

// Start the dashboard
const app = express();
app.use(monitor.api({ basePath: "/monitor" }));
app.listen(3000);
```
That's it. Visit http://localhost:3000/monitor to see your dashboard.
### With Vercel AI SDK (OpenTelemetry)
The @node-llm/monitor-otel package hooks into OpenTelemetry spans:
```ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { Monitor } from "@node-llm/monitor";
import { NodeLLMSpanProcessor } from "@node-llm/monitor-otel";

const monitor = Monitor.memory();

// Hook into OpenTelemetry
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new NodeLLMSpanProcessor(monitor.getStore()));
provider.register();

// Use the Vercel AI SDK with telemetry enabled
const result = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Write a haiku",
  experimental_telemetry: { isEnabled: true },
});
```
The `NodeLLMSpanProcessor` intercepts the AI SDK's spans and extracts the model, token counts, and cost.
## Manual Instrumentation
For raw OpenAI SDK or any custom setup:
```ts
import { Monitor } from "@node-llm/monitor";
import OpenAI from "openai";

const monitor = Monitor.memory();
const openai = new OpenAI();

async function trackedCompletion(prompt: string) {
  const ctx = {
    requestId: crypto.randomUUID(),
    provider: "openai",
    model: "gpt-4o",
    state: {},
    messages: [{ role: "user", content: prompt }],
  };

  await monitor.onRequest(ctx);

  try {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });

    await monitor.onResponse(ctx, {
      toString: () => completion.choices[0].message.content ?? "",
      usage: completion.usage,
    });

    return completion;
  } catch (error) {
    await monitor.onError(ctx, error as Error);
    throw error;
  }
}
```
## Storage Adapters
Choose where telemetry lives:
```ts
import { Monitor } from "@node-llm/monitor";
import { createPrismaMonitor, createFileMonitor } from "@node-llm/monitor";

// Development: in-memory (fast, lost on restart)
const memoryMonitor = Monitor.memory();

// File-based (JSON logs, survives restarts)
const fileMonitor = createFileMonitor("./logs/llm.log");

// Production: Prisma (Postgres, MySQL, SQLite)
const prismaMonitor = createPrismaMonitor(prismaClient);

// Custom: implement the MonitoringStore interface
const customMonitor = new Monitor({ store: new MyRedisAdapter() });
```
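The MonitoringStore contract itself isn't spelled out above, so as a rough sketch of what a custom adapter involves (persist records, hand them back for the dashboard), here is a toy in-memory version. The method names `save` and `list` are assumptions, not the package's actual interface:

```ts
// Toy custom store. The save/list method names are assumptions about what a
// MonitoringStore-like contract could require; check the package's type
// declarations for the real one.
interface StoredRequest {
  requestId: string;
  model: string;
  costUsd: number;
}

class MapStore {
  private records = new Map<string, StoredRequest>();

  async save(record: StoredRequest): Promise<void> {
    this.records.set(record.requestId, record);
  }

  async list(): Promise<StoredRequest[]> {
    return [...this.records.values()];
  }
}

const store = new MapStore();
void store.save({ requestId: "r1", model: "gpt-4o", costUsd: 0.002 });
```

A Redis or DynamoDB adapter would follow the same pattern: implement the store interface, then pass an instance via `new Monitor({ store })`.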
## The Dashboard
The dashboard is a React SPA that gets served as static files from your server:
```ts
app.use(monitor.api({
  basePath: "/monitor",
  i18n: {
    title: "My App Monitor",
    supportedLngs: ["en", "es", "ar"],
  },
}));
```
Features:
- Real-time metrics (requests, cost, latency over time)
- Request traces with full payload inspection
- Filter by provider, model, status
- Token analytics per model
- RTL support for Arabic/Hebrew teams
No external JavaScript. No tracking. Everything runs on your server.
## What You Get
| Metric | Description |
|---|---|
| `totalRequests` | Count of completed requests |
| `totalCost` | Accumulated USD cost |
| `avgDuration` | Average request latency (ms) |
| `errorRate` | Percentage of failed requests |
| `totalPromptTokens` | Input tokens across all requests |
| `totalCompletionTokens` | Output tokens across all requests |
| `totalSelfCorrections` | Requests that needed a schema retry |
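These aggregates are simple arithmetic over per-request records. A sketch of the calculation, using assumed field names rather than the package's actual schema:

```ts
// Deriving dashboard aggregates from per-request records.
// The record fields below are illustrative, not the package's actual schema.
interface Rec { durationMs: number; costUsd: number; failed: boolean }

function aggregate(recs: Rec[]) {
  const totalRequests = recs.length;
  const totalCost = recs.reduce((sum, r) => sum + r.costUsd, 0);
  const avgDuration = totalRequests
    ? recs.reduce((sum, r) => sum + r.durationMs, 0) / totalRequests
    : 0;
  const errorRate = totalRequests
    ? (recs.filter((r) => r.failed).length / totalRequests) * 100
    : 0;
  return { totalRequests, totalCost, avgDuration, errorRate };
}

const stats = aggregate([
  { durationMs: 400, costUsd: 0.001, failed: false },
  { durationMs: 600, costUsd: 0.003, failed: true },
]);
// totalRequests: 2, avgDuration: 500, errorRate: 50
```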
Per-request traces include:
- Provider and model
- Duration and cost
- CPU time and memory allocations
- Full event timeline (request.start → tool.start → tool.end → request.end)
## Privacy: Content Scrubbing
By default, request/response content is not captured. Enable it with scrubbing:
```ts
import { Monitor, MemoryAdapter } from "@node-llm/monitor";

const monitor = new Monitor({
  store: new MemoryAdapter(),
  captureContent: true,
  scrubbing: { pii: true, secrets: true },
});
```
The `ContentScrubber` masks emails, API keys, and other sensitive patterns before storage.
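The actual `ContentScrubber` rules aren't listed here, but the general idea of masking sensitive patterns before storage can be sketched like this (the regexes are illustrative examples, not the package's patterns):

```ts
// Simplified illustration of content scrubbing: mask sensitive substrings
// before a message is persisted. These regexes are examples only.
const PATTERNS: [RegExp, string][] = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]"],   // email addresses
  [/\bsk-[A-Za-z0-9]{20,}\b/g, "[API_KEY]"], // OpenAI-style secret keys
];

function scrub(content: string): string {
  return PATTERNS.reduce((text, [re, mask]) => text.replace(re, mask), content);
}

const out = scrub("Contact me at dev@example.com, key sk-abcdefghijklmnopqrstuv");
// → "Contact me at [EMAIL], key [API_KEY]"
```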
## Why Self-Hosted?
| | NodeLLM Monitor | Langfuse | LangSmith |
|---|---|---|---|
| Hosting | Your server | Their cloud | Their cloud |
| Data location | Your database | Their database | Their database |
| Pricing | Free | Free tier → $59+/mo | $39+/mo |
| Setup | One npm package | Account + SDK + config | Account + SDK + config |
If you're building internal tools or have compliance requirements, self-hosted is the only option that makes sense.
## Links
Questions? Open an issue or reach out @eshaiju.