Stop Building Toy Chatbots: A Guide to Streaming, Persistence, and RAG in Node.js

By Shaiju Edakulangara (@eshaiju) • 4 min read
Building a CLI script that asks GPT-4o a question is a great afternoon project. But building a production-grade HR Chatbot that manages thousand-message threads, implements RAG with vector search, and streams tokens in real-time to a React frontend? That’s where things get messy.
Most AI frameworks focus on the "magic"—the prompt engineering and agentic loops. But for Node.js developers, the real work is in the plumbing: persistence, streaming, and state management.
Today we are releasing @node-llm/orm, a package designed to turn that plumbing into a first-class citizen.
The "Toy" Problem
In a toy example, your chat logic usually looks like this:
```ts
const response = await llm.ask("How many vacation days do I have?");
// ... now what? Manual save to the DB? Update the context? Handle mid-stream crashes?
```
When you move to production, you suddenly need to answer:
- Thread History: How do I retrieve the last 20 messages for this specific user?
- Persistence: If the tool call fails or the server restarts, do I lose the whole conversation?
- Streaming: How do I deliver data to the frontend token-by-token while still saving the final result to the database?
Introducing @node-llm/orm
We built @node-llm/orm to solve this by integrating directly with Prisma. It treats your Chat sessions as database-backed objects.
Instead of managing arrays of messages manually, you work with an AssistantChat instance:
```ts
import { PrismaClient } from '@prisma/client';
import { NodeLLM } from '@node-llm/core';
import { createChat, loadChat } from '@node-llm/orm';

const prisma = new PrismaClient();
const llm = NodeLLM.withProvider('openai');

const chat = await createChat(prisma, llm, {
  userId: "user_123",
  metadata: { department: "Engineering" }
});

// Everything below is automatically persisted to your database
const response = await chat.ask("What's the company policy on remote work?");
```
Why this matters for Node engineers:
- Automatic Threading: No more re-sending the same 50 messages on every request. The ORM fetches the history for you.
- Transparent Tool Persistence: When your agent calls a tool (like `check_vacation_balance`), the tool call and its result are saved as separate records. This is huge for auditing and debugging.
- Strict Typing: It's built for TypeScript. Your `metadata` and `customFields` follow your Prisma schema exactly.
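To make "saved as separate records" concrete, here is a rough mental model of what the backing tables might look like. This is a hypothetical sketch only; the model and field names are illustrative, not the package's actual schema, which is defined in the `@node-llm/orm` documentation.

```prisma
// Hypothetical sketch — see the @node-llm/orm docs for the real schema.
model Chat {
  id        String    @id @default(cuid())
  userId    String
  metadata  Json?
  messages  Message[]
  createdAt DateTime  @default(now())
}

model Message {
  id        String   @id @default(cuid())
  chatId    String
  chat      Chat     @relation(fields: [chatId], references: [id])
  role      String   // "user" | "assistant" | "tool"
  content   String
  toolCall  Json?    // tool invocations and their results, stored per record
  createdAt DateTime @default(now())
}
```

The important idea is that each user message, assistant reply, and tool round-trip becomes its own row, so history retrieval and auditing are plain database queries.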
The Streaming Challenge
Streaming is non-negotiable for modern AI UX. Users hate spinners; they want tokens immediately.
But streaming often breaks persistence. If you stream tokens, you typically don't get the "final" message object until the end. If the user closes the tab mid-stream, the message might never get saved.
@node-llm/orm handles this with askStream. It yields tokens for your frontend while internalizing the persistence logic:
```ts
// Server Action (Next.js)
export async function sendMessage(chatId: string, content: string) {
  const chat = await loadChat(prisma, llm, chatId);
  const stream = chat.askStream(content);

  return new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(chunk.content);
      }
      controller.close();
    }
  });
}
```
Even if the stream is interrupted, the library ensures that partial results and metadata are handled according to your configuration.
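On the consuming side, what the server action returns is a standard Web `ReadableStream`, so no framework-specific client code is required. The sketch below stands in for `chat.askStream` with a plain async generator (any async iterable of `{ content }` chunks has the same shape) to show the full produce/consume round trip:

```typescript
// Stand-in for chat.askStream: an async iterable of { content } chunks.
async function* fakeAskStream() {
  for (const content of ["Remote ", "work is ", "allowed."]) {
    yield { content };
  }
}

// Same wrapping pattern as the server action above.
function toReadableStream(chunks: AsyncIterable<{ content: string }>) {
  return new ReadableStream({
    async start(controller) {
      for await (const chunk of chunks) {
        controller.enqueue(chunk.content);
      }
      controller.close();
    }
  });
}

// Client-side consumption: read until done, appending tokens as they arrive.
async function readAll(stream: ReadableStream): Promise<string> {
  const reader = stream.getReader();
  let text = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += value; // in a React component you would set state here, per token
  }
  return text;
}

const text = await readAll(toReadableStream(fakeAskStream()));
console.log(text); // "Remote work is allowed."
```

In a real UI, the per-token `setState` call inside the read loop is what makes the response appear to "type itself" instead of arriving all at once.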
Flagship Example: The HR Chatbot RAG
To show exactly how this works, we’ve included a new flagship example: HR Chatbot RAG.
It’s a full-stack Next.js application that demonstrates:
- RAG (Retrieval Augmented Generation): Searching employee handbooks via vector search.
- Persistence: Using PostgreSQL and `@node-llm/orm` to store every interaction.
- UI Architecture: A sleek React interface using Tailwind and streaming hooks.
It’s not a snippet; it’s a blueprint for building AI applications that don't fall apart under load.
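The example's actual retrieval runs vector search over PostgreSQL. To illustrate just the RAG step itself (retrieve the most relevant chunks, then prepend them to the question), here is a toy retriever that uses word-overlap scoring in place of real embeddings. Every name and document here is illustrative, not code from the example:

```typescript
// Toy in-memory "handbook" chunks; a real app would embed these with a model
// and store the vectors in PostgreSQL (e.g. via pgvector).
const handbook = [
  "Employees accrue 20 vacation days per year.",
  "Remote work is allowed up to three days per week.",
  "Expense reports are due by the fifth of each month."
];

// Stand-in similarity: count shared lowercase words. Real RAG would compute
// cosine similarity between embedding vectors instead.
function score(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/));
  return doc.toLowerCase().split(/\W+/).filter(w => w && q.has(w)).length;
}

// Retrieve the top-k chunks, then build the augmented prompt.
function buildRagPrompt(question: string, k = 1): string {
  const context = [...handbook]
    .sort((a, b) => score(question, b) - score(question, a))
    .slice(0, k)
    .join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildRagPrompt("How many vacation days do I get?");
// The augmented prompt is what you would then pass to chat.ask(prompt).
console.log(prompt.includes("20 vacation days")); // true
```

Swap the word-overlap `score` for an embedding lookup and the shape of the pipeline is the same: retrieve, augment, then ask.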
Infrastructure Over "Magic"
At NodeLLM, our philosophy remains the same: LLMs are just another piece of infrastructure. They need timeouts, security guards, and—most importantly—a reliable place to store their data.
Stop building toy scripts. Start building production systems.
Check out the new ORM documentation or jump straight into the HR Chatbot example.
Building something cool with @node-llm/orm? Drop a star on GitHub and let us know!