February 2025
At MAindTec, the core of our AI product is a multi-step agent that can reason, use tools, search the web, query documents, and track its own token costs — all in a single conversation turn. We built this using LangGraph, and this post covers the key architectural decisions.
LangChain chains are linear. Real agent workflows are not — they branch, loop, retry, and pause for human approval. LangGraph models the agent as a directed graph of nodes (processing steps) and edges (transitions). This makes complex control flow explicit and easy to reason about.
The killer feature for us was persistent checkpointing: LangGraph can serialize the full graph state to a database after every node execution. If a request times out or fails, the agent can resume from the last checkpoint rather than starting over.
Our agent graph has four main nodes: a router that classifies intent, a RAG node that queries the document store, a web search node, and a response synthesizer. Edges are conditional — the router decides which node fires next based on the query type.
```python
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    intent: str
    retrieved_docs: list[str]
    total_tokens: int
    total_cost_usd: float

def build_graph(checkpointer: PostgresSaver):
    graph = StateGraph(AgentState)
    graph.add_node("router", route_intent)
    graph.add_node("rag", rag_retrieval)
    graph.add_node("web_search", web_search)
    graph.add_node("synthesizer", synthesize_response)

    graph.set_entry_point("router")
    graph.add_conditional_edges("router", decide_next_node, {
        "rag": "rag",
        "web": "web_search",
        "direct": "synthesizer",
    })
    graph.add_edge("rag", "synthesizer")
    graph.add_edge("web_search", "synthesizer")
    graph.add_edge("synthesizer", END)

    # compile() returns a runnable compiled graph, not a StateGraph
    return graph.compile(checkpointer=checkpointer)
```

LangGraph ships with a PostgresSaver that writes checkpoints to a Postgres table. We use the same multi-schema database that stores tenant data — each tenant's agent runs are isolated by schema. This gives us full conversation history, retry capability, and an audit trail for every agent decision.
```python
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
import psycopg

async def get_checkpointer(tenant_schema: str):
    conn = await psycopg.AsyncConnection.connect(
        settings.DATABASE_URL,
        options=f"-c search_path={tenant_schema}",
    )
    checkpointer = AsyncPostgresSaver(conn)
    await checkpointer.setup()  # idempotent: creates the checkpoint tables if missing
    return checkpointer
```

Each tool is a Python function decorated with @tool. LangGraph passes the tool schemas to the LLM and routes tool-call results back into the graph automatically. We built tools for document RAG, web search (via Tavily), code execution, and a calculator.
```python
from langchain_core.tools import tool

@tool
async def search_documents(query: str, top_k: int = 5) -> str:
    """Search the tenant's uploaded documents for relevant information."""
    chunks = await hybrid_rag_search(query, top_k=top_k)
    return "\n\n".join(chunks)

@tool
async def web_search(query: str) -> str:
    """Search the web for up-to-date information."""
    from tavily import AsyncTavilyClient

    # Use the async client: the sync TavilyClient would block the event loop.
    client = AsyncTavilyClient(api_key=settings.TAVILY_API_KEY)
    results = await client.search(query, max_results=5)
    return "\n".join(r["content"] for r in results["results"])
```

Every LLM call in the graph updates the AgentState with token usage and estimated cost. We map model names to per-token pricing and accumulate costs across all nodes. At the end of the run, the total is written to the billing ledger and deducted from the tenant's credit balance.
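As a sketch of what a single node-level update looks like (the prices and the `accumulate_usage` helper are illustrative, not our production code; in the real graph the `usage` dict comes from the LLM response's usage metadata):

```python
# Sketch of per-node usage accounting. Prices here are illustrative.
PRICE = {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000}

def accumulate_usage(state: dict, usage: dict) -> dict:
    """Return a state update that adds this call's tokens and cost."""
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    cost = prompt * PRICE["input"] + completion * PRICE["output"]
    return {
        "total_tokens": state.get("total_tokens", 0) + prompt + completion,
        "total_cost_usd": state.get("total_cost_usd", 0.0) + cost,
    }
```

Because `total_tokens` and `total_cost_usd` live in the graph state, they are checkpointed along with everything else, so a resumed run never double-bills the work it already did.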
```python
COST_PER_TOKEN = {
    "gpt-4o": {"input": 5.00 / 1_000_000, "output": 15.00 / 1_000_000},
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}

def calculate_cost(model: str, usage: dict) -> float:
    # Unknown models fall back to gpt-4o-mini pricing; add new entries as we adopt models.
    pricing = COST_PER_TOKEN.get(model, COST_PER_TOKEN["gpt-4o-mini"])
    return (
        usage["prompt_tokens"] * pricing["input"]
        + usage["completion_tokens"] * pricing["output"]
    )
```

LangGraph is the right abstraction for production agents. The graph model forces you to be explicit about control flow, checkpointing gives you durability for free, and the tool-calling integration with Azure OpenAI is seamless. The main gotcha: streaming responses through the graph adds complexity — plan for it from day one rather than retrofitting.