February 2025
At MAindTec, the core of our AI product is a multi-step agent that can reason, use tools, search the web, query documents, and track its own token costs — all in a single conversation turn. We built this using LangGraph, and this post covers the key architectural decisions.
LangChain chains are linear. Real agent workflows are not — they branch, loop, retry, and pause for human approval. LangGraph models the agent as a directed graph of nodes (processing steps) and edges (transitions). This makes complex control flow explicit and easy to reason about.
The killer feature for us was persistent checkpointing: LangGraph can serialize the full graph state to a database after every node execution. If a request times out or fails, the agent can resume from the last checkpoint rather than starting over.
Our agent graph has four main nodes: a router that classifies intent, a RAG node that queries the document store, a web search node, and a response synthesizer. Edges are conditional — the router decides which node fires next based on the query type.
```python
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver
from typing import TypedDict, Annotated
import operator

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    intent: str
    retrieved_docs: list[str]
    total_tokens: int
    total_cost_usd: float

def build_graph(checkpointer: PostgresSaver):
    graph = StateGraph(AgentState)
    graph.add_node("router", route_intent)
    graph.add_node("rag", rag_retrieval)
    graph.add_node("web_search", web_search)
    graph.add_node("synthesizer", synthesize_response)

    graph.set_entry_point("router")
    graph.add_conditional_edges("router", decide_next_node, {
        "rag": "rag",
        "web": "web_search",
        "direct": "synthesizer",
    })
    graph.add_edge("rag", "synthesizer")
    graph.add_edge("web_search", "synthesizer")
    graph.add_edge("synthesizer", END)

    # compile() returns a runnable compiled graph, not a StateGraph
    return graph.compile(checkpointer=checkpointer)
```

LangGraph ships with a PostgresSaver that writes checkpoints to a Postgres table. We use the same multi-schema database that stores tenant data — each tenant's agent runs are isolated by schema. This gives us full conversation history, retry capability, and an audit trail for every agent decision.
```python
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
import psycopg

async def get_checkpointer(tenant_schema: str):
    conn = await psycopg.AsyncConnection.connect(
        settings.DATABASE_URL,
        options=f"-c search_path={tenant_schema}",
    )
    checkpointer = AsyncPostgresSaver(conn)
    await checkpointer.setup()  # idempotent: creates the checkpoint tables if missing
    return checkpointer
```

Each tool is a Python function decorated with @tool. LangGraph passes the tool schemas to the LLM and routes tool-call results back into the graph automatically. We built tools for document RAG, web search (via Tavily), code execution, and a calculator.
```python
from langchain_core.tools import tool

@tool
async def search_documents(query: str, top_k: int = 5) -> str:
    """Search the tenant's uploaded documents for relevant information."""
    chunks = await hybrid_rag_search(query, top_k=top_k)
    return "\n\n".join(chunks)

@tool
async def web_search(query: str) -> str:
    """Search the web for up-to-date information."""
    from tavily import AsyncTavilyClient

    # Use the async client: the sync TavilyClient would block the event loop.
    client = AsyncTavilyClient(api_key=settings.TAVILY_API_KEY)
    results = await client.search(query, max_results=5)
    return "\n".join(r["content"] for r in results["results"])
```

Every LLM call in the graph updates the AgentState with token usage and estimated cost. We map model names to per-token pricing and accumulate costs across all nodes. At the end of the run, the total is written to the billing ledger and deducted from the tenant's credit balance.
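As a sketch of what a single node-level update looks like (the prices and the `accumulate_usage` helper are illustrative, not our production code; in the real graph the `usage` dict comes from the LLM response's usage metadata):

```python
# Sketch of per-node usage accounting. Prices here are illustrative.
PRICE = {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000}

def accumulate_usage(state: dict, usage: dict) -> dict:
    """Return a state update that adds this call's tokens and cost."""
    prompt = usage.get("input_tokens", 0)
    completion = usage.get("output_tokens", 0)
    cost = prompt * PRICE["input"] + completion * PRICE["output"]
    return {
        "total_tokens": state.get("total_tokens", 0) + prompt + completion,
        "total_cost_usd": state.get("total_cost_usd", 0.0) + cost,
    }
```

Because `total_tokens` and `total_cost_usd` live in the graph state, they are checkpointed along with everything else, so a resumed run never double-bills the work it already did.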
```python
COST_PER_TOKEN = {
    "gpt-4o": {"input": 5.00 / 1_000_000, "output": 15.00 / 1_000_000},
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
}

def calculate_cost(model: str, usage: dict) -> float:
    # Unknown models fall back to gpt-4o-mini pricing; add new entries as we adopt models.
    pricing = COST_PER_TOKEN.get(model, COST_PER_TOKEN["gpt-4o-mini"])
    return (
        usage["prompt_tokens"] * pricing["input"]
        + usage["completion_tokens"] * pricing["output"]
    )
```

LangGraph is the right abstraction for production agents. The graph model forces you to be explicit about control flow, checkpointing gives you durability for free, and the tool-calling integration with Azure OpenAI is seamless. The main gotcha: streaming responses through the graph adds complexity — plan for it from day one rather than retrofitting.