LangSmith | Part 3/3 of Generative AI for JS Developers
Dive deep into LangSmith and its integration with JS/TS

Introduction
In the first two parts of this series, we explored LangChain (framework for building LLM-powered apps) and LangGraph (stateful reasoning and agent orchestration). Now, let’s dive into the final piece of the puzzle: LangSmith.
LangSmith is your observability and evaluation platform for LLM applications. It helps you debug, monitor, test, and improve chains, agents, and workflows built using LangChain (and beyond).
If LangChain is your code, and LangGraph is your control flow, then LangSmith is your debugging dashboard and analytics brain.
Why LangSmith?
When building with LLMs, things often break in non-obvious ways:
The model “hallucinates” incorrect answers.
Your prompt isn’t structured properly.
A single missing memory update derails an agent.
You don’t know why your pipeline is slow.
LangSmith solves this by giving you:
Tracing – Full visibility into every step of your chain/agent.
Dataset Management – Run systematic evaluations on test cases.
Feedback & Scoring – Collect user or automated feedback.
Monitoring – Track performance in production.
Setting Up LangSmith in a JavaScript Project
Package Installation
First, install the necessary packages:
npm install langchain @langchain/openai langsmith
Environment Variables
You’ll need to set up your LangSmith API key:
export LANGSMITH_API_KEY="your_api_key_here"
Or, in a .env file for Node.js projects:
LANGSMITH_API_KEY=your_api_key_here
LANGSMITH_TRACING=true
Setting LANGSMITH_TRACING=true enables tracing, so every run is logged to LangSmith automatically (older SDK releases use LANGCHAIN_TRACING_V2=true instead).
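It pays to fail fast when this configuration is incomplete. Here is a minimal sketch of a startup check that mirrors how the SDK reads these variables; the `tracingEnabled` helper is our own, not part of the langsmith package:

```javascript
// Decide whether LangSmith tracing is effectively active for a given
// environment. `tracingEnabled` is a hypothetical helper for illustration,
// not an SDK function.
function tracingEnabled(env) {
  const flag = (env.LANGSMITH_TRACING || "").toLowerCase() === "true";
  const hasKey = Boolean(env.LANGSMITH_API_KEY);
  if (flag && !hasKey) {
    // Tracing was requested but no API key is set: runs would never upload.
    console.warn("LANGSMITH_TRACING=true but LANGSMITH_API_KEY is missing");
  }
  return flag && hasKey;
}

console.log(tracingEnabled(process.env));
```

Calling this once at startup surfaces the classic "I enabled tracing but see no runs" misconfiguration immediately.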
Basic Example: Tracing a LangChain Call
Let’s start with a simple LLM chain and log it to LangSmith:
import { ChatOpenAI } from "@langchain/openai";
import { PromptTemplate } from "@langchain/core/prompts";

const model = new ChatOpenAI({
  temperature: 0.7,
  model: "gpt-4o-mini",
});

const prompt = PromptTemplate.fromTemplate(
  "Translate the following sentence to French: {text}"
);

// Compose prompt → model into a runnable chain
const chain = prompt.pipe(model);

const result = await chain.invoke({ text: "I love programming!" });
console.log(result.content); // e.g. "J'aime programmer !"
With LANGSMITH_TRACING=true, this run is automatically logged in LangSmith. You can view:
Prompt sent
Response received
Latency
Tokens used
Viewing Traces in LangSmith
After running the above, head to the LangSmith dashboard. You’ll see:
A tree view of your run (chain → LLM call).
Full input/output history.
Logs of intermediate steps.
This is invaluable for debugging nested chains and agents.
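Conceptually, a trace is just a tree of runs with timing attached. A toy model of that idea (the shapes here are ours for illustration, not LangSmith's actual schema) shows how a chain → LLM call tree rolls up into the latency figure you see in the dashboard:

```javascript
// Toy model of a trace tree: each run has a name, its own latency in ms,
// and child runs. This illustrates the tree view, not the real wire format.
const trace = {
  name: "chain",
  latencyMs: 20, // overhead spent in the chain itself
  children: [
    { name: "ChatOpenAI", latencyMs: 850, children: [] },
  ],
};

// Total latency of a run is its own time plus that of all descendants.
function totalLatency(run) {
  return (
    run.latencyMs +
    run.children.reduce((sum, child) => sum + totalLatency(child), 0)
  );
}

console.log(totalLatency(trace)); // 870
```

Walking the tree this way is exactly how you spot which nested step dominates a slow pipeline.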
Advanced Use Case: Tracing Agents with Tools
LangSmith shines when working with agents. Let’s trace an agent using tools:
import { initializeAgentExecutorWithOptions } from "langchain/agents";
import { SerpAPI } from "@langchain/community/tools/serpapi";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ temperature: 0 });
const tools = [
  new SerpAPI(process.env.SERPAPI_API_KEY, {
    location: "San Francisco, California, United States",
    hl: "en",
    gl: "us",
  }),
];

const executor = await initializeAgentExecutorWithOptions(tools, model, {
  agentType: "openai-functions",
});

console.log("Agent loaded. Ask it something...");
const result = await executor.invoke({
  input: "What is the latest news about LangChain?",
});
console.log(result.output);
With LangSmith tracing enabled, you’ll see:
Agent reasoning steps (thoughts).
Tool calls (e.g., SerpAPI queries).
Final answer.
Timing breakdown.
This makes agent debugging so much easier.
Evaluating Models with Datasets
One of the most powerful features of LangSmith is evaluation. You can upload a dataset of inputs and expected outputs, then run chains/agents against it.
Step 1: Create a Dataset
In the LangSmith dashboard, create a dataset, e.g., “Translation Tests”.
Step 2: Run Against Dataset in JS
import { runOnDataset } from "langchain/smith";

const datasetName = "Translation Tests";

const results = await runOnDataset(chain, datasetName, {
  projectMetadata: { purpose: "French translation testing" },
});

console.log("Evaluation run finished:", results);
Step 3: Add Feedback
LangSmith lets you add manual or automated feedback:
import { Client } from "langsmith";

const client = new Client();

// runId identifies the traced run you want to score
await client.createFeedback(runId, "accuracy", {
  score: 0.9,
  comment: "Close, but missed nuance in translation",
});
This makes it possible to quantitatively track improvements as you tweak prompts and models.
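Once feedback accumulates, you can fetch it back (e.g. via the client's feedback listing) and aggregate it yourself to compare prompt versions. A sketch of just the aggregation step, over already-fetched records whose `{ key, score }` shape is an assumption for illustration:

```javascript
// Average feedback scores per key across runs, e.g. to track whether a
// prompt tweak moved "accuracy" up or down. The record shape is assumed.
function averageScores(feedback) {
  const totals = {};
  for (const { key, score } of feedback) {
    if (!totals[key]) totals[key] = { sum: 0, n: 0 };
    totals[key].sum += score;
    totals[key].n += 1;
  }
  return Object.fromEntries(
    Object.entries(totals).map(([key, { sum, n }]) => [key, sum / n])
  );
}

console.log(averageScores([
  { key: "accuracy", score: 1.0 },
  { key: "accuracy", score: 0.5 },
  { key: "fluency", score: 1.0 },
])); // { accuracy: 0.75, fluency: 1 }
```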
Monitoring in Production
LangSmith isn’t just for dev-time debugging. You can:
Log user interactions in production.
Track metrics (latency, tokens, costs).
Analyze failures by searching run history.
Example: logging custom metadata:
const result = await chain.invoke(
  { text: "Hello World" },
  {
    tags: ["prod", "user123"],
    metadata: { sessionId: "abc-123", feature: "translation" },
  }
);
Now, in LangSmith, you can filter all runs for user123.
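If you tag runs in several places, it is worth centralizing the convention so filters stay consistent. A small sketch of one way to do it; the helper and the `user:` tag convention are ours, not a LangSmith rule:

```javascript
// Build a consistent tags/metadata config object to pass to every traced
// call. The helper name and tag convention are illustrative assumptions.
function makeRunConfig({ env, userId, sessionId, feature }) {
  return {
    tags: [env, `user:${userId}`],
    metadata: { sessionId, feature },
  };
}

const config = makeRunConfig({
  env: "prod",
  userId: "user123",
  sessionId: "abc-123",
  feature: "translation",
});
console.log(config.tags); // ["prod", "user:user123"]
```

With every call site using the same builder, a dashboard filter like `user:user123` is guaranteed to match all of that user's runs.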
Automated Evaluation Example
LangSmith supports LLM-as-a-judge evaluation. For example, checking if translations are “fluent”:
import { runOnDataset } from "langchain/smith";

const evalConfig = {
  evaluators: [
    {
      evaluatorType: "criteria",
      criteria: {
        fluency: "Is the text fluent and grammatically correct?",
      },
    },
  ],
};

const evalResult = await runOnDataset(chain, "Translation Tests", {
  evaluationConfig: evalConfig,
});

console.log(evalResult);
This automatically scores runs using an LLM.
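Under the hood, a criteria-style judge prompts a model with the criterion and parses its verdict into a numeric score. Here is a sketch of just the parsing step; the function name and the "answer ends in Y or N" verdict format are illustrative assumptions, not the evaluator's actual implementation:

```javascript
// Turn a judge model's free-text verdict into a 0/1 score. Assumes the
// judge was instructed to end its answer with "Y" or "N"; that format
// is an assumption for this sketch.
function parseJudgeVerdict(text) {
  const last = text.trim().split(/\s+/).pop().toUpperCase();
  if (last === "Y" || last === "YES") return 1;
  if (last === "N" || last === "NO") return 0;
  return null; // unparseable verdict: surface it for manual review
}

console.log(parseJudgeVerdict("The translation reads naturally. Y")); // 1
```

Returning `null` rather than guessing on malformed verdicts keeps bad judge outputs from silently skewing your scores.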
Best Practices for JS Developers
Enable tracing early – Don’t wait until production bugs appear.
Use metadata/tags – Organize runs by environment, user, or feature.
Automate evaluations – Don’t rely only on manual testing.
Log tool usage – Especially for agents, since they often fail silently.
Monitor costs – LangSmith shows token/cost breakdowns.
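On the cost point, the arithmetic behind LangSmith's cost breakdowns can be reproduced locally from token counts. A sketch with made-up per-million-token rates; check your provider's current pricing, as the numbers below are placeholders:

```javascript
// Estimate the USD cost of a run from token counts and per-million-token
// rates. The rates used below are placeholders, not real pricing.
function estimateCostUSD(promptTokens, completionTokens, rates) {
  return (
    (promptTokens / 1_000_000) * rates.inputPerMillion +
    (completionTokens / 1_000_000) * rates.outputPerMillion
  );
}

const rates = { inputPerMillion: 0.15, outputPerMillion: 0.6 }; // placeholders
console.log(estimateCostUSD(1000, 500, rates).toFixed(6)); // "0.000450"
```

Summing this over traced runs (grouped by the tags from earlier) gives per-user or per-feature cost attribution.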
Conclusion
With LangSmith, your LangChain and LangGraph apps move from “black boxes” to transparent, measurable, improvable systems.
Use tracing to debug.
Use datasets and evaluations to improve.
Use monitoring to keep things reliable in production.
This completes our 3-part series:
LangChain – Building chains and agents.
LangGraph – Controlling agent workflows.
LangSmith – Debugging, monitoring, and improving.
Together, these tools form a full-stack framework for AI applications.




