Next.js + WebGPU: Build an Offline-First AI Chatbot That Runs 100% in the Browser (No Backend Needed)
Making offline AI work in your next web application

Overview
The future of software engineering is AI. But the future of AI is not on the server: it's in the browser, running locally and offline on the user's device.
Is that really possible in 2025? Short answer: yes.
With WebGPU, modern browsers can run AI models locally on the user’s device.
Combine that with a frontend framework such as Next.js, and you can build a fully functional offline AI chatbot that works:
without a backend,
without API keys,
without sending data to the cloud, and
even in airplane mode.
This is Offline-First AI, and it’s one of the hottest topics in web development right now.
In this tutorial, you’ll learn how to build an LLM-powered offline chatbot in Next.js using WebGPU + Transformers.js.
Why Offline-First AI Is Exploding in 2025
Before diving into the tutorial, let's look at why offline-first AI is taking off:
No backend, no GPU servers - LLMs are expensive to host. Browser LLMs = zero infra cost.
Privacy-friendly - Messages never leave the device. Ideal for enterprise, health, finance.
Fast & responsive - Running on the user's GPU via WebGPU = instant inference.
Works even with poor internet - Download once → use forever.
Architecture of an Offline WebGPU Chatbot in Next.js
Next.js (Client Components Only)
│
├── Transformers.js (WebGPU backend)
│ └── TinyLlama / Phi / DistilGPT2 model
│
├── Model Hosted Locally
│ └── /public/models/ (ONNX weights)
│
└── No API calls / No backend
Everything runs in-browser.
Let the code begin -
Step 1 - Create a New Next.js Project
npx create-next-app offline-chatbot
cd offline-chatbot
Step 2 - Install Transformers.js (WebGPU Enabled)
npm install @huggingface/transformers
Transformers.js v3 (published on npm as @huggingface/transformers) supports WebGPU; you opt in by passing device: "webgpu" when creating a pipeline. WebGPU is available today in Chrome and Edge, with Firefox support rolling out. Where it's missing, inference falls back to WASM (slower, but it still works).
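Because WebGPU availability varies by browser, you may want to decide the backend at runtime before creating the pipeline. A minimal sketch, assuming Transformers.js v3's device option; pickDevice is a hypothetical helper, not part of the library:

```typescript
// Choose a Transformers.js backend at runtime: "webgpu" when the browser
// exposes navigator.gpu, otherwise the universal WASM fallback.
export function pickDevice(): "webgpu" | "wasm" {
  const nav = (globalThis as { navigator?: { gpu?: unknown } }).navigator;
  return nav?.gpu ? "webgpu" : "wasm";
}
```

You can then pass the result as the device option when creating a pipeline, instead of hard-coding "webgpu".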
Step 3 - Download a Small LLM (TinyLlama or Phi)
You need a small model for fast browser inference.
Recommended:
Xenova/TinyLlama-1.1B-Chat-v1.0
Xenova/phi-1_5
Xenova/distilgpt2 (lightest)
(Check the exact model ids on the Hugging Face Hub before using them.)
For this tutorial, we’ll use TinyLlama.
Transformers.js downloads the model automatically on first use and caches it in the browser, so no backend is required.
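If you want to show users whether the weights are already on disk, you can inspect the browser's Cache Storage, where Transformers.js keeps downloaded files. A browser-only sketch; the cache name "transformers-cache" is an assumption about the library's default, so verify it against your installed version:

```typescript
// Check whether any cached request for this model id exists in Cache
// Storage. Returns false outside the browser (e.g. during SSR).
export async function isModelCached(modelId: string): Promise<boolean> {
  const cs = (globalThis as any).caches;
  if (!cs) return false; // no Cache Storage in this environment
  const cache = await cs.open("transformers-cache"); // assumed cache name
  const keys = await cache.keys();
  return keys.some((req: { url: string }) => req.url.includes(modelId));
}
```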
Step 4 - Build the Offline Chatbot Component
Create the chatbot page:
app/chat/page.tsx
"use client";

import { useState } from "react";
import { pipeline } from "@huggingface/transformers";

export default function Chatbot() {
  const [messages, setMessages] = useState([
    { from: "bot", text: "Hello! I'm your offline AI assistant 🤖" },
  ]);
  const [input, setInput] = useState("");
  const [loading, setLoading] = useState(false);

  async function handleSend() {
    if (!input.trim()) return;

    const userMessage = input;
    setMessages((m) => [...m, { from: "user", text: userMessage }]);
    setInput("");
    setLoading(true);

    // Load the model on demand. The weights are downloaded once and then
    // served from the browser cache on later visits.
    const generator = await pipeline(
      "text-generation",
      "Xenova/TinyLlama-1.1B-Chat-v1.0",
      { device: "webgpu" }
    );

    const output = await generator(userMessage, {
      max_new_tokens: 60,
      temperature: 0.7,
    });

    // Strip the echoed prompt from the generated text before displaying it.
    setMessages((m) => [
      ...m,
      {
        from: "bot",
        text: output[0].generated_text.replace(userMessage, "").trim(),
      },
    ]);
    setLoading(false);
  }

  return (
    <div className="p-8 max-w-xl mx-auto">
      <h1 className="text-3xl font-bold mb-4">
        Offline AI Chatbot (Next.js + WebGPU)
      </h1>
      <div className="border p-4 rounded-md mb-4 h-96 overflow-y-auto">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`my-2 p-2 rounded-md ${
              msg.from === "user" ? "bg-blue-200 ml-auto w-fit" : "bg-gray-200"
            }`}
          >
            {msg.text}
          </div>
        ))}
        {loading && <div className="italic">Thinking...</div>}
      </div>
      <div className="flex gap-2">
        <input
          className="border p-2 flex-1 rounded-md"
          placeholder="Type a message..."
          value={input}
          onChange={(e) => setInput(e.target.value)}
        />
        <button
          onClick={handleSend}
          className="bg-black text-white px-4 py-2 rounded-md"
        >
          Send
        </button>
      </div>
    </div>
  );
}
This component:
loads TinyLlama locally
performs inference using WebGPU
stores chat history in component state
produces responses offline
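One practical note: calling pipeline() inside handleSend rebuilds the tokenizer and inference session on every message, even though the weights themselves are cached. A small promise-memoizing helper (a sketch, not a Transformers.js API) lets you construct the pipeline once and reuse it across sends:

```typescript
// Memoize an async loader: the first call runs `load`, and every later
// call reuses the same in-flight or resolved promise.
export function once<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= load());
}

// Usage sketch (model id and options as in the component above):
// const getGenerator = once(() =>
//   pipeline("text-generation", "Xenova/TinyLlama-1.1B-Chat-v1.0", { device: "webgpu" })
// );
// ...inside handleSend:
// const generator = await getGenerator();
```

Because the cached value is a promise, concurrent sends share one load instead of racing to create two pipelines.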
Step 5 - Confirm WebGPU Acceleration Is Active
Create a utility:
app/components/WebGPUStatus.tsx
"use client";
export default function WebGPUStatus() {
  const supported = typeof navigator !== "undefined" && !!navigator.gpu;
  return (
    <p className="text-sm text-gray-700 my-2">
      WebGPU: {supported ? "Supported ✓" : "Not Supported ✗"}
    </p>
  );
}
Add to the page:
<WebGPUStatus />
If WebGPU isn't available, the model still works but falls back to WASM (slower).
Step 6 - Add Streaming Responses (LLM Typing Effect) – Optional but Awesome
Replace the one-shot generation with a TextStreamer (imported from @huggingface/transformers) so tokens render as they arrive:
const generator = await pipeline(
  "text-generation",
  "Xenova/TinyLlama-1.1B-Chat-v1.0",
  { device: "webgpu" }
);

// Add an empty bot message to stream into.
setMessages((m) => [...m, { from: "bot", text: "" }]);

let full = "";
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true, // don't echo the user's prompt back
  callback_function: (token) => {
    full += token;
    setMessages((m) => [...m.slice(0, -1), { from: "bot", text: full }]);
  },
});

await generator(userMessage, {
  max_new_tokens: 80,
  temperature: 0.7,
  streamer,
});
Now your chatbot types like ChatGPT, fully offline.
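The slice(0, -1) update pattern assumes the last entry in state is the bot message being streamed into. A small pure helper (hypothetical, for illustration) makes that update safe even if no placeholder was appended first:

```typescript
type Msg = { from: "user" | "bot"; text: string };

// Replace the trailing bot message with the streamed text so far, or
// append a new bot message if the last entry came from the user.
export function upsertBotText(messages: Msg[], text: string): Msg[] {
  const last = messages[messages.length - 1];
  if (last && last.from === "bot") {
    return [...messages.slice(0, -1), { ...last, text }];
  }
  return [...messages, { from: "bot", text }];
}
```

Inside the streaming callback you would then write setMessages((m) => upsertBotText(m, full)) instead of slicing manually.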
Step 7 - Memory System (Storing Chat History Locally)
Add offline persistence via LocalStorage:
useEffect(() => {
  const saved = localStorage.getItem("chat-history");
  if (saved) setMessages(JSON.parse(saved));
}, []);

useEffect(() => {
  localStorage.setItem("chat-history", JSON.stringify(messages));
}, [messages]);
Now your chatbot remembers conversations across sessions, even if the device never goes back online.
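localStorage can hold stale or corrupted JSON (for example, after you change the message shape), and JSON.parse would then throw on mount. A defensive loader avoids that; this is a sketch, with the Msg type mirroring the component's message objects:

```typescript
type Msg = { from: "user" | "bot"; text: string };

// Parse saved chat history defensively: missing, malformed, or
// non-array JSON yields the provided fallback instead of throwing.
export function loadHistory(raw: string | null, fallback: Msg[]): Msg[] {
  if (!raw) return fallback;
  try {
    const parsed = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed : fallback;
  } catch {
    return fallback;
  }
}
```

In the first useEffect you would call setMessages(loadHistory(localStorage.getItem("chat-history"), messages)).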
Optional: Step 8 - Ship the Model With Your App (True Fully Offline Mode)
Place the model files under /public/models/tiny-llama/ and tell Transformers.js to resolve models locally via its env settings:
import { env, pipeline } from "@huggingface/transformers";

env.allowRemoteModels = false; // never contact the Hugging Face Hub
env.localModelPath = "/models/"; // resolve model ids under /public/models/

const generator = await pipeline("text-generation", "tiny-llama", { device: "webgpu" });
This avoids downloading from the Hugging Face Hub entirely, which makes the app a good fit for:
offline enterprise apps
kiosk apps
PWA apps
private deployments
Final Results
You now have a Next.js + WebGPU Offline-First Chatbot that runs:
with zero backend
fully private
blazingly fast thanks to WebGPU
offline forever after first load
Conclusion - Offline-First AI Is the Future of Web Apps
We’re entering a new era where AI no longer requires servers, GPUs, or API keys.
With Next.js + WebGPU, you can build AI products that run:
locally on the user's device
fast, private, and secure
without depending on the cloud
without paying for inference GPUs
and without worrying about rate limits or outages
The offline-first chatbot you built here is more than a cool demo - it’s the foundation of a new category of web apps:
local-only AI assistants
enterprise-secure internal chatbots
private productivity tools (notes, summarizers, generators)
AI-powered PWAs that work in airplane mode
fully edge-based WebGPU ML applications
As browsers continue to improve WebGPU support and more compact LLMs appear (e.g., TinyLlama, Phi-2, Gemma-2B, Qwen2-1.5B), we’re heading toward a world where your browser is the new AI runtime.
This is the perfect time to start building offline-first AI experiences, and with Next.js you already have a framework developers love.



