
NodeLLM Monitor: Production Observability for LLM Applications

Building AI applications without observability is like flying blind. You ship a chatbot, usage spikes, and suddenly you're hit with a $500 OpenAI bill with no idea where it came from.

Why not just use Datadog, New Relic, or Langfuse?

Traditional APM tools weren't built for LLM workloads. They'll show you request latency, but not token costs. They'll log errors, but not prompt/response pairs. You end up paying $50+/month for generic metrics while building custom instrumentation for the AI-specific data you actually need.

@node-llm/monitor is different:

  • AI-Native Metrics — Token usage, cost per request, prompt/response content, tool call traces—out of the box
  • Zero External Dependencies — Embedded dashboard ships with the package. No SaaS vendor, no data leaving your infrastructure
  • Open Source & Free — No per-seat pricing, no usage tiers, no surprise bills
  • Self-Hosted — Your data stays in your database (Postgres, SQLite, or even in-memory)
  • Framework Agnostic — Works with NodeLLM, Vercel AI SDK, LangChain, or raw OpenAI calls

@node-llm/monitor solves the observability problem at the infrastructure level: the monitor is a middleware that sits directly in your LLM request path.

NodeLLM Monitor Dashboard

The Problem: LLM Black Boxes

Every production LLM system eventually needs answers to:

  • "Why is this request so slow?" — Was it the model? The prompt? A tool call?
  • "Which feature is burning our budget?" — Is it the chatbot? The document search? The agent?
  • "What happened to user X's session?" — Can we replay the conversation?
  • "Are we hitting rate limits?" — How close are we to provider quotas?

Without proper monitoring, you're guessing. With @node-llm/monitor, you know.


Quick Start

Install and integrate in under 5 minutes:

npm install @node-llm/monitor

import { createLLM } from "@node-llm/core";
import { Monitor } from "@node-llm/monitor";

// Create monitor (defaults to in-memory)
const monitor = Monitor.memory();

// Add to your LLM instance - monitor IS the middleware
const llm = createLLM({
  provider: "openai",
  model: "gpt-4o-mini",
  openaiApiKey: process.env.OPENAI_API_KEY,
  middlewares: [monitor],
});

Every LLM call is now tracked. To visualize your data, jump to the Dashboard section below.
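
For a quick sanity check after wiring this up, you can read aggregate metrics straight back from the monitor's store. A minimal sketch; getStore() and getMetrics() are the same methods used in the OpenTelemetry and Time Series sections later in this post:

// Make one monitored call...
const chat = llm.chat("gpt-4o-mini");
await chat.ask("Say hello in five words.");

// ...then read aggregate metrics back from the store.
const metrics = await monitor.getStore().getMetrics({
  from: new Date(Date.now() - 60 * 60 * 1000), // last hour
});
console.log(metrics); // exact shape depends on the storage adapter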


What Gets Captured

Every request automatically captures:

Metric     | Description
---------- | ------------------------------------------
Provider   | OpenAI, Anthropic, Google, etc.
Model      | gpt-4o, claude-3-opus, gemini-pro
Latency    | Total request time in milliseconds
Tokens     | Input, output, and total token counts
Cost       | Calculated cost based on provider pricing
Status     | Success, error, timeout
Tool Calls | Function executions with timing

Plus optional full request/response content for debugging.
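
Put together, each stored event is roughly the record below. The field names are illustrative, pieced together from the table above and the Prisma queries later in this post; the real schema is defined by the storage adapter:

// Hypothetical shape of a captured event (illustrative, not the package's actual types).
interface MonitoringEvent {
  requestId: string;
  eventType: string;            // e.g. "request.end"
  provider: string;             // "openai", "anthropic", "google", ...
  model: string;                // "gpt-4o", "claude-3-opus", ...
  durationMs: number;           // total request latency
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  cost: number;                 // derived from provider pricing
  status: "success" | "error" | "timeout";
  sessionId?: string;           // optional attribution (see Cost Attribution)
  transactionId?: string;
  toolCalls?: Array<{ name: string; durationMs: number }>;
  content?: { request: string; response: string }; // only when captureContent is enabled
}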


The Dashboard

The built-in dashboard provides real-time visibility:

Metrics View

Metrics Dashboard

At a glance, see:

  • Total Requests — Request volume over time
  • Total Cost — Running cost accumulation
  • Avg Response Time — P50/P90 latency
  • Error Rate — Failure percentage

Plus interactive charts for requests, costs, response times, and errors.

Traces View

Traces Dashboard

Drill into individual requests:

  • Full execution timeline
  • Tool call breakdown
  • Request/response content
  • Error details and stack traces

Token Analytics View

Token Analytics Dashboard

Deep dive into token usage:

  • Total Tokens — Input vs output breakdown with efficiency ratio
  • Token Breakdown — Visual input/output distribution bar
  • Time Series Charts — Input and output tokens over time
  • Cost per 1K Tokens — Per-provider/model cost analysis
  • Token Summary — Avg cost, tokens per request, estimated spend

Perfect for optimizing prompts and identifying cost drivers.

Running the Dashboard

To launch the dashboard, you need to share the storage adapter between the Monitor and the Dashboard:

import { Monitor, MemoryAdapter } from "@node-llm/monitor";
import { MonitorDashboard } from "@node-llm/monitor/ui";
import { createServer } from "node:http";

// 1. Create a shared store
const store = new MemoryAdapter(); 

// 2. Connect the Monitor
const monitor = new Monitor({ store });

// 3. Connect the Dashboard
const dashboard = new MonitorDashboard(store, { basePath: "/monitor" });

// 4. Mount to a server
createServer((req, res) => dashboard.handleRequest(req, res)).listen(3001);
console.log("Dashboard at http://localhost:3001/monitor");

Storage Adapters

Choose the right storage for your use case:

Memory (Development)

import { Monitor } from "@node-llm/monitor";

const monitor = Monitor.memory();

Fast, ephemeral—perfect for development.

File (Logging)

import { createFileMonitor } from "@node-llm/monitor";

const monitor = createFileMonitor("./logs/llm-events.log");

Append-only JSON lines—great for debugging.
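
Since each line is a standalone JSON object, ad-hoc analysis takes a few lines of Node. A sketch; the cost field follows the metrics table above, but the exact event shape is defined by the adapter:

import { readFileSync } from "node:fs";

// Sum recorded cost across every event in the log file.
const events = readFileSync("./logs/llm-events.log", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

const totalCost = events.reduce((sum, event) => sum + (event.cost ?? 0), 0);
console.log(`Total logged cost: $${totalCost.toFixed(4)}`);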

Prisma (Production)

import { PrismaClient } from "@prisma/client";
import { createPrismaMonitor } from "@node-llm/monitor";

const prisma = new PrismaClient();
const monitor = createPrismaMonitor(prisma);

Full SQL storage with querying, aggregation, and retention policies.
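
Retention can then be handled with an ordinary Prisma query against the same table the monitor writes to. A sketch, reusing the monitoring_events model and time field that appear in the Cost Attribution query below; the canonical schema lives in the Prisma setup guide:

// Prune monitoring events older than 30 days (e.g. from a scheduled job).
const cutoff = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);

const { count } = await prisma.monitoring_events.deleteMany({
  where: { time: { lt: cutoff } },
});

console.log(`Deleted ${count} events older than 30 days`);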


Privacy & Security

By default, content capture is disabled:

import { Monitor } from "@node-llm/monitor";

const monitor = Monitor.memory({
  captureContent: false, // Default: don't store prompts/responses
});

When you do need to capture content (for debugging), automatic scrubbing kicks in:

const monitor = Monitor.memory({
  captureContent: true,
  scrubbing: {
    pii: true,     // Scrub emails, phone numbers, SSNs
    secrets: true, // Scrub API keys, passwords, tokens
  },
});

PII is redacted before it ever hits storage.
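
A common pattern is to gate content capture on the environment, so production traffic never stores raw prompts while local debugging still gets full content (same options as above):

const monitor = Monitor.memory({
  // Only store prompts/responses outside production.
  captureContent: process.env.NODE_ENV !== "production",
  scrubbing: { pii: true, secrets: true },
});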


Cost Attribution

Track costs by session or transaction:

const chat = llm.chat("gpt-4o");

// Use sessionId and transactionId to group related requests
const response = await chat.ask("Summarize this document", {
  sessionId: session.id,
  transactionId: `doc-search-${docId}`,
});

Then query costs from the store:

// Query with Prisma
const costBySession = await prisma.monitoring_events.groupBy({
  by: ["sessionId"],
  where: { 
    eventType: "request.end",
    time: { gte: new Date("2026-01-01"), lte: new Date("2026-01-31") }
  },
  _sum: { cost: true },
});

Time Series Aggregation

Built-in aggregation for dashboards and alerts:

import { TimeSeriesBuilder } from "@node-llm/monitor";

// Create builder with 1-hour buckets
const builder = new TimeSeriesBuilder(60 * 60 * 1000);

// Build time series from events
const timeSeries = builder.build(events);
// Returns: { requests: [...], cost: [...], duration: [...], errors: [...] }

// Or use the store's built-in getMetrics
const metrics = await store.getMetrics({
  from: new Date(Date.now() - 24 * 60 * 60 * 1000),
});

Perfect for Grafana, Datadog, or custom dashboards.
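
Until built-in alerting ships (see What's Next), a basic budget check can run on the same data. A sketch with Prisma, reusing the model and fields from the Cost Attribution query above:

// Warn when today's spend crosses a fixed budget.
const DAILY_BUDGET_USD = 25;

const startOfDay = new Date();
startOfDay.setHours(0, 0, 0, 0);

const { _sum } = await prisma.monitoring_events.aggregate({
  where: { eventType: "request.end", time: { gte: startOfDay } },
  _sum: { cost: true },
});

const spentToday = Number(_sum.cost ?? 0);
if (spentToday > DAILY_BUDGET_USD) {
  console.warn(`LLM spend today is $${spentToday.toFixed(2)}, over the $${DAILY_BUDGET_USD} budget`);
}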


Works Without NodeLLM

While optimized for NodeLLM, the monitor works with any LLM library:

import OpenAI from "openai";
import { Monitor } from "@node-llm/monitor";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const monitor = Monitor.memory();

const ctx = { 
  requestId: crypto.randomUUID(), 
  provider: "openai", 
  model: "gpt-4",
  state: {},
};

await monitor.onRequest(ctx);

try {
  const response = await openai.chat.completions.create({ ... });
  await monitor.onResponse(ctx, {
    toString: () => response.choices[0].message.content,
    usage: response.usage,
  });
} catch (error) {
  await monitor.onError(ctx, error);
}

Use it with Vercel AI SDK, LangChain, or raw OpenAI.
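
If you call several providers this way, the three hooks fold neatly into a small helper. withMonitoring below is not part of the package, just a convenience sketch:

// Hypothetical helper that wraps any text-generating call with the monitor's hooks.
async function withMonitoring(
  monitor: Monitor,
  meta: { provider: string; model: string },
  fn: () => Promise<{ text: string; usage?: unknown }>
): Promise<string> {
  const ctx = { requestId: crypto.randomUUID(), ...meta, state: {} };
  await monitor.onRequest(ctx);
  try {
    const { text, usage } = await fn();
    await monitor.onResponse(ctx, { toString: () => text, usage });
    return text;
  } catch (error) {
    await monitor.onError(ctx, error);
    throw error; // re-throw so callers still see the failure
  }
}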


Installation

npm install @node-llm/monitor

With OpenTelemetry (Vercel AI SDK, etc.):

npm install @node-llm/monitor @node-llm/monitor-otel

With Prisma (production):

npm install @node-llm/monitor @prisma/client prisma

Add the schema and run migrations—see the Prisma setup guide.


OpenTelemetry Integration

NEW in v0.3.0: Zero-code instrumentation for Vercel AI SDK and other OTel-instrumented libraries via @node-llm/monitor-otel.

npm install @node-llm/monitor-otel @opentelemetry/sdk-trace-node

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { NodeLLMSpanProcessor } from "@node-llm/monitor-otel";
import { Monitor } from "@node-llm/monitor";

const monitor = Monitor.memory();
const provider = new NodeTracerProvider();

// Hook the AI-aware SpanProcessor into your OTel pipeline
provider.addSpanProcessor(new NodeLLMSpanProcessor(monitor.getStore()));
provider.register();

// Now Vercel AI SDK's experimental_telemetry flows to node-llm-monitor
await generateText({
  model: openai("gpt-4o"),
  prompt: "Hello",
  experimental_telemetry: { isEnabled: true }
});

The processor automatically extracts:

  • Provider and model names (normalized from OTel attributes)
  • Token counts and cost estimation
  • Streaming metrics (ms to first chunk, tokens/sec)
  • Tool call details
  • Full request/response content (when enabled)

What's Next

This is just the beginning. Coming soon:

  • Alerting — Get notified when costs exceed thresholds
  • Anomaly Detection — Automatic detection of unusual patterns
  • Multi-Tenant — Isolated monitoring per customer

Get Started

npm install @node-llm/monitor

Documentation: nodellm.dev/monitor

GitHub: github.com/node-llm/node-llm-monitor


Stop guessing. Start monitoring.
