NodeLLM Monitor: Self-Hosted LLM Observability

Production AI applications need observability. You need to know which requests are slow, what's driving costs, and where errors happen.

@node-llm/monitor is an observability layer that embeds directly into your Node.js app. It captures every LLM request—latency, tokens, cost, tool calls—and serves a real-time dashboard from your own server.

One npm package. No external services. Your data stays in your database.


What It Actually Does

Every LLM request gets tracked:

  • Latency — Total request time, time-to-first-token for streaming
  • Tokens — Prompt tokens, completion tokens, totals
  • Cost — Calculated from provider pricing tables
  • Errors — Full stack traces when things go wrong
  • Tool calls — Which tools were invoked and their results
  • Schema corrections — How many retries for structured output validation

All of this flows into a built-in React dashboard served from your own server.
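The cost figure is the most mechanical of these: it's derived from token counts and a per-model price sheet. As a rough sketch (the prices and function below are illustrative, not the package's actual internals):

```typescript
// Hypothetical per-million-token pricing table (illustrative values only).
const PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Compute the USD cost of one request from its token usage.
function estimateCost(
  model: string,
  promptTokens: number,
  completionTokens: number
): number {
  const price = PRICING[model];
  if (!price) return 0; // unknown model: no price data available
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}
```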


Quick Start

npm install @node-llm/monitor

With NodeLLM

If you're using @node-llm/core, the monitor is a middleware:

import { createLLM } from "@node-llm/core";
import { Monitor } from "@node-llm/monitor";

const monitor = Monitor.memory();

const llm = createLLM({
  provider: "openai",
  model: "gpt-4o",
  middlewares: [monitor],
});

// Start the dashboard
import express from "express";
const app = express();
app.use(monitor.api({ basePath: "/monitor" }));
app.listen(3000);

That's it. Visit http://localhost:3000/monitor to see your dashboard.


With Vercel AI SDK (OpenTelemetry)

The @node-llm/monitor-otel package hooks into OpenTelemetry spans:

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { Monitor } from "@node-llm/monitor";
import { NodeLLMSpanProcessor } from "@node-llm/monitor-otel";

const monitor = Monitor.memory();

// Hook into OpenTelemetry
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new NodeLLMSpanProcessor(monitor.getStore()));
provider.register();

// Use Vercel AI SDK with telemetry enabled
const result = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Write a haiku",
  experimental_telemetry: { isEnabled: true }
});

The NodeLLMSpanProcessor intercepts AI spans and extracts model, tokens, and cost.
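To make that extraction step concrete, here is a minimal sketch of pulling model and usage data out of an AI SDK span. The attribute keys (`ai.model.id`, `ai.usage.promptTokens`, `ai.usage.completionTokens`) are assumptions about what the SDK emits, and the local `SpanLike` type stands in for OpenTelemetry's `ReadableSpan`; check your actual spans before relying on these names:

```typescript
// Minimal stand-in for the span data a processor sees. Real spans come
// from @opentelemetry/sdk-trace-node; this local type is for illustration.
interface SpanLike {
  name: string;
  attributes: Record<string, string | number | undefined>;
}

// Sketch of the extraction step: keep only AI SDK spans and pull out the
// fields the monitor cares about. Attribute keys are assumptions.
function extractLLMSpan(span: SpanLike) {
  if (!span.name.startsWith("ai.")) return null; // not an AI SDK span
  return {
    model: span.attributes["ai.model.id"] as string,
    promptTokens: Number(span.attributes["ai.usage.promptTokens"] ?? 0),
    completionTokens: Number(span.attributes["ai.usage.completionTokens"] ?? 0),
  };
}
```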


Manual Instrumentation

For raw OpenAI SDK or any custom setup:

import { Monitor } from "@node-llm/monitor";
import OpenAI from "openai";

const monitor = Monitor.memory();
const openai = new OpenAI();

async function trackedCompletion(prompt: string) {
  const ctx = {
    requestId: crypto.randomUUID(),
    provider: "openai",
    model: "gpt-4o",
    state: {},
    messages: [{ role: "user", content: prompt }],
  };

  await monitor.onRequest(ctx);

  try {
    const completion = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });

    await monitor.onResponse(ctx, {
      toString: () => completion.choices[0].message.content ?? "",
      usage: completion.usage,
    });

    return completion;
  } catch (error) {
    await monitor.onError(ctx, error as Error);
    throw error;
  }
}

Storage Adapters

Choose where telemetry lives:

import { Monitor } from "@node-llm/monitor";
import { createPrismaMonitor, createFileMonitor } from "@node-llm/monitor";

// Development: in-memory (fast, lost on restart)
const memoryMonitor = Monitor.memory();

// File-based (JSON logs, survive restarts)
const fileMonitor = createFileMonitor("./logs/llm.log");

// Production: Prisma (Postgres, MySQL, SQLite)
const prismaMonitor = createPrismaMonitor(prismaClient);

// Custom: implement the MonitoringStore interface
const customMonitor = new Monitor({ store: new MyRedisAdapter() });
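A custom adapter essentially reduces to a save/query pair. The interface and field names below are assumptions for illustration; the real `MonitoringStore` contract in `@node-llm/monitor` may differ:

```typescript
// Hypothetical shape of a stored trace and the store contract. The real
// MonitoringStore interface lives in @node-llm/monitor and may differ.
interface Trace {
  requestId: string;
  model: string;
  durationMs: number;
  cost: number;
}

interface MonitoringStoreLike {
  save(trace: Trace): Promise<void>;
  list(): Promise<Trace[]>;
}

// In-memory Map as the simplest possible custom store; swap the Map for
// Redis, ClickHouse, etc. in a real adapter.
class MapStore implements MonitoringStoreLike {
  private traces = new Map<string, Trace>();

  async save(trace: Trace): Promise<void> {
    this.traces.set(trace.requestId, trace);
  }

  async list(): Promise<Trace[]> {
    return [...this.traces.values()];
  }
}
```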

The Dashboard

The dashboard is a React SPA served as static files from your own server:

app.use(monitor.api({ 
  basePath: "/monitor",
  i18n: {
    title: "My App Monitor",
    supportedLngs: ["en", "es", "ar"],
  }
}));

Features:

  • Real-time metrics (requests, cost, latency over time)
  • Request traces with full payload inspection
  • Filter by provider, model, status
  • Token analytics per model
  • RTL support for Arabic/Hebrew teams

No external JavaScript. No tracking. Everything runs on your server.


What You Get

  • totalRequests — Count of completed requests
  • totalCost — Accumulated USD cost
  • avgDuration — Average request latency (ms)
  • errorRate — Percentage of failed requests
  • totalPromptTokens — Input tokens across all requests
  • totalCompletionTokens — Output tokens across all requests
  • totalSelfCorrections — Requests that needed a schema retry
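A sketch of how these summary metrics could be aggregated from stored traces (field names mirror the list above; the aggregation logic is illustrative, not the package's code):

```typescript
// Minimal trace shape for aggregation; real traces carry more fields.
interface TraceSummary {
  durationMs: number;
  cost: number;
  promptTokens: number;
  completionTokens: number;
  ok: boolean;
}

// Derive the dashboard's summary metrics from raw traces.
function summarize(traces: TraceSummary[]) {
  const n = traces.length;
  return {
    totalRequests: n,
    totalCost: traces.reduce((s, t) => s + t.cost, 0),
    avgDuration: n ? traces.reduce((s, t) => s + t.durationMs, 0) / n : 0,
    errorRate: n ? (traces.filter((t) => !t.ok).length / n) * 100 : 0,
    totalPromptTokens: traces.reduce((s, t) => s + t.promptTokens, 0),
    totalCompletionTokens: traces.reduce((s, t) => s + t.completionTokens, 0),
  };
}
```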

Per-request traces include:

  • Provider and model
  • Duration and cost
  • CPU time and memory allocations
  • Full event timeline (request.start → tool.start → tool.end → request.end)

Privacy: Content Scrubbing

By default, request/response content is not captured. Enable it with scrubbing:

const monitor = new Monitor({
  store: new MemoryAdapter(),
  captureContent: true,
  scrubbing: { pii: true, secrets: true }
});

The ContentScrubber masks emails, API keys, and other sensitive patterns before storage.
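A scrubbing pass of this kind reduces to regex masking before storage. The patterns and replacement tokens below are illustrative, not the actual `ContentScrubber` rules:

```typescript
// Illustrative scrubbing pass: mask emails and API-key-shaped strings
// before a trace is written. The real ContentScrubber's patterns and
// replacement format may differ.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;
const OPENAI_KEY = /sk-[A-Za-z0-9]{20,}/g;

function scrub(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(OPENAI_KEY, "[SECRET]");
}
```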


Why Self-Hosted?

|               | NodeLLM Monitor | Langfuse               | LangSmith              |
|---------------|-----------------|------------------------|------------------------|
| Hosting       | Your server     | Their cloud            | Their cloud            |
| Data location | Your database   | Their database         | Their database         |
| Pricing       | Free            | Free tier → $59+/mo    | $39+/mo                |
| Setup         | One npm package | Account + SDK + config | Account + SDK + config |

If you're building internal tools or have compliance requirements, self-hosted is the only option that makes sense.



Questions? Open an issue or reach out @eshaiju.
