langchain/docs/versioned_docs/version-0.2.x/how_to/llm_token_usage_tracking.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "90dff237-bc28-4185-a2c0-d5203bbdeacd",
   "metadata": {},
   "source": [
    "# How to track token usage for LLMs\n",
    "\n",
    "Tracking token usage to calculate cost is an important part of putting your app in production. This guide goes over how to obtain this information from your LangChain model calls.\n",
    "\n",
    "```{=mdx}\n",
    "import PrerequisiteLinks from \"@theme/PrerequisiteLinks\";\n",
    "\n",
    "<PrerequisiteLinks content={`\n",
    "- [LLMs](/docs/concepts/#llms)\n",
    "`} />\n",
    "```\n",
    "\n",
    "## Using LangSmith\n",
    "\n",
    "You can use [LangSmith](https://www.langchain.com/langsmith) to help track token usage in your LLM application. See the [LangSmith quick start guide](https://docs.smith.langchain.com/).\n",
    "\n",
    "## Using callbacks\n",
    "\n",
    "There are some API-specific callback context managers that allow you to track token usage across multiple calls. You'll need to check whether such an integration is available for your particular model.\n",
    "\n",
    "If such an integration is not available for your model, you can create a custom callback manager by adapting the implementation of the OpenAI callback manager (find it in the code base and adapt for your own LLMs.)\n",
    "\n",
    "### OpenAI\n",
    "\n",
    "Let's first look at an extremely simple example of tracking token usage for a single Chat model call.\n",
    "\n",
    ":::{.callout-danger}\n",
    "\n",
    "The callback handler does **NOT** work when **streaming** becauyse OpenAI does not return token information in its API response.\n",
    "\n",
    "::"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f790edd9-823e-4bc5-befa-e9529c7237a0",
   "metadata": {},
   "source": [
    "### Single call"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "757cbfd6-22a1-44a2-bbe3-65d85aee3fee",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Why don't scientists trust atoms?\n",
      "\n",
      "Because they make up everything!\n",
      "---\n",
      "\n",
      "Total Tokens: 18\n",
      "Prompt Tokens: 4\n",
      "Completion Tokens: 14\n",
      "Total Cost (USD): $3.4e-05\n"
     ]
    }
   ],
   "source": [
    "from langchain_community.callbacks import get_openai_callback\n",
    "from langchain_openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"gpt-3.5-turbo-instruct\")\n",
    "\n",
    "with get_openai_callback() as cb:\n",
    "    result = llm.invoke(\"Tell me a joke\")\n",
    "    print(result)\n",
    "    print(\"---\")\n",
    "print()\n",
    "\n",
    "print(f\"Total Tokens: {cb.total_tokens}\")\n",
    "print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n",
    "print(f\"Completion Tokens: {cb.completion_tokens}\")\n",
    "print(f\"Total Cost (USD): ${cb.total_cost}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7df3be35-dd97-4e3a-bd51-52434ab2249d",
   "metadata": {},
   "source": [
    "### Multiple calls\n",
    "\n",
    "Anything inside the context manager will get tracked. Here's an example of using it to track multiple calls in sequence to a chain. This will also work for an agent which may use multiple steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "7d7f8d26-d8f4-40f9-9539-7a6bafdc4e24",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "Why did the chicken go to the seance?\n",
      "\n",
      "To talk to the other side of the road!\n",
      "--\n",
      "\n",
      "\n",
      "Why did the fish blush?\n",
      "\n",
      "Because it saw the ocean's bottom!\n",
      "\n",
      "---\n",
      "Total Tokens: 49\n",
      "Prompt Tokens: 12\n",
      "Completion Tokens: 37\n",
      "Total Cost (USD): $9.200000000000001e-05\n"
     ]
    }
   ],
   "source": [
    "from langchain_community.callbacks import get_openai_callback\n",
    "from langchain_core.prompts import PromptTemplate\n",
    "from langchain_openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"gpt-3.5-turbo-instruct\")\n",
    "\n",
    "template = PromptTemplate.from_template(\"Tell me a joke about {topic}\")\n",
    "chain = template | llm\n",
    "\n",
    "with get_openai_callback() as cb:\n",
    "    response = chain.invoke({\"topic\": \"birds\"})\n",
    "    print(response)\n",
    "    response = chain.invoke({\"topic\": \"fish\"})\n",
    "    print(\"--\")\n",
    "    print(response)\n",
    "\n",
    "\n",
    "print()\n",
    "print(\"---\")\n",
    "print(f\"Total Tokens: {cb.total_tokens}\")\n",
    "print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n",
    "print(f\"Completion Tokens: {cb.completion_tokens}\")\n",
    "print(f\"Total Cost (USD): ${cb.total_cost}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad7a3fba-9fac-4222-8f87-d1d276d27d6e",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Streaming\n",
    "\n",
    ":::{.callout-danger}\n",
    "\n",
    "`get_openai_callback` relies on metadata information in the API response to get\n",
    "information about the token counts; however, the `OpenAI` API does **NOT** return\n",
    "token counts when streaming.\n",
    "\n",
    "If you want to count token counts correctly, for streaming, either\n",
    "implement a custom callback handler that uses an appropriate tokenizers to\n",
    "count the tokens or use a monitoring platform like [LangSmith](https://www.langchain.com/langsmith).\n",
    ":::"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "cd61ed79-7858-49bb-afb5-d41291f597ba",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "Why couldn't the bicycle stand up by itself?\n",
      "\n",
      "Because it was two-tired!\n",
      "\n",
      "Why don't scientists trust atoms?\n",
      "\n",
      "Because they make up everything.\n",
      "---\n",
      "\n",
      "Total Tokens: 0\n",
      "Prompt Tokens: 0\n",
      "Completion Tokens: 0\n",
      "Total Cost (USD): $0.0\n"
     ]
    }
   ],
   "source": [
    "from langchain_community.callbacks import get_openai_callback\n",
    "from langchain_openai import OpenAI\n",
    "\n",
    "llm = OpenAI(model_name=\"gpt-3.5-turbo-instruct\")\n",
    "\n",
    "with get_openai_callback() as cb:\n",
    "    for chunk in llm.stream(\"Tell me a joke\"):\n",
    "        print(chunk, end=\"\", flush=True)\n",
    "    print(result)\n",
    "    print(\"---\")\n",
    "print()\n",
    "\n",
    "print(f\"Total Tokens: {cb.total_tokens}\")\n",
    "print(f\"Prompt Tokens: {cb.prompt_tokens}\")\n",
    "print(f\"Completion Tokens: {cb.completion_tokens}\")\n",
    "print(f\"Total Cost (USD): ${cb.total_cost}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}