{
"cells": [
{
"cell_type": "raw",
"id": "af408f61",
"metadata": {},
"source": [
"---\n",
"sidebar_position: 1\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "1a65e4c9",
"metadata": {},
"source": [
"# How to use example selectors\n",
"\n",
"If you have a large number of examples, you may need to select which ones to include in the prompt. The Example Selector is the class responsible for doing so.\n",
"\n",
"The base interface is defined as follows:\n",
"\n",
"```python\n",
"from abc import ABC, abstractmethod\n",
"from typing import Any, Dict, List\n",
"\n",
"\n",
"class BaseExampleSelector(ABC):\n",
"    \"\"\"Interface for selecting examples to include in prompts.\"\"\"\n",
"\n",
"    @abstractmethod\n",
"    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:\n",
"        \"\"\"Select which examples to use based on the inputs.\"\"\"\n",
"\n",
"    @abstractmethod\n",
"    def add_example(self, example: Dict[str, str]) -> Any:\n",
"        \"\"\"Add new example to store.\"\"\"\n",
"```\n",
"\n",
"A subclass must implement both methods. `select_examples` takes in the input variables and returns a list of examples; it is up to each specific implementation to decide how those examples are selected. `add_example` adds a new example to the selector's store.\n",
"\n",
"LangChain has a few different types of example selectors; the table at the end of this guide gives an overview of them.\n",
"\n",
"In this guide, we will walk through creating a custom example selector."
]
},
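{
"cell_type": "markdown",
"id": "3f9a1c27",
"metadata": {},
"source": [
"Both methods in the interface above are marked `@abstractmethod`, so a subclass must implement both before it can be instantiated. The plain-Python sketch below illustrates this with a hypothetical stand-in class, `ExampleSelectorInterface`, that mirrors the interface above. It is not LangChain's `BaseExampleSelector`, and `IncompleteSelector` and `FirstExampleSelector` are illustrative names.\n",
"\n",
"```python\n",
"from abc import ABC, abstractmethod\n",
"from typing import Any, Dict, List\n",
"\n",
"\n",
"# Hypothetical stand-in that mirrors the interface above (not LangChain's class)\n",
"class ExampleSelectorInterface(ABC):\n",
"    @abstractmethod\n",
"    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:\n",
"        \"\"\"Select which examples to use based on the inputs.\"\"\"\n",
"\n",
"    @abstractmethod\n",
"    def add_example(self, example: Dict[str, str]) -> Any:\n",
"        \"\"\"Add new example to store.\"\"\"\n",
"\n",
"\n",
"# Implements only select_examples, so it is still abstract\n",
"class IncompleteSelector(ExampleSelectorInterface):\n",
"    def select_examples(self, input_variables):\n",
"        return []\n",
"\n",
"\n",
"# Implements both abstract methods, so it can be instantiated\n",
"class FirstExampleSelector(ExampleSelectorInterface):\n",
"    def __init__(self, examples):\n",
"        self.examples = list(examples)\n",
"\n",
"    def add_example(self, example):\n",
"        self.examples.append(example)\n",
"\n",
"    def select_examples(self, input_variables):\n",
"        # Ignore the inputs and return the first stored example, if any\n",
"        return self.examples[:1]\n",
"\n",
"\n",
"try:\n",
"    IncompleteSelector()\n",
"except TypeError as err:\n",
"    # Python refuses to instantiate a class with unimplemented abstract methods\n",
"    print(\"Cannot instantiate:\", err)\n",
"\n",
"selector = FirstExampleSelector([{\"input\": \"hi\", \"output\": \"ciao\"}])\n",
"print(selector.select_examples({\"input\": \"hello\"}))\n",
"```"
]
},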
{
"cell_type": "markdown",
"id": "638e9039",
"metadata": {},
"source": [
"## Examples\n",
"\n",
"To use an example selector, we first need a list of examples. These should generally be example inputs and outputs. For the purposes of this demo, let's imagine we are selecting examples of how to translate English to Italian."
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "48658d53",
"metadata": {},
"outputs": [],
"source": [
"examples = [\n",
" {\"input\": \"hi\", \"output\": \"ciao\"},\n",
" {\"input\": \"bye\", \"output\": \"arrivaderci\"},\n",
" {\"input\": \"soccer\", \"output\": \"calcio\"},\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "c2830b49",
"metadata": {},
"source": [
"## Custom Example Selector\n",
"\n",
"Let's write an example selector that picks the example whose input word is closest in length to the new input word."
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "56b740a1",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.example_selectors.base import BaseExampleSelector\n",
"\n",
"\n",
"class CustomExampleSelector(BaseExampleSelector):\n",
"    def __init__(self, examples):\n",
"        self.examples = examples\n",
"\n",
"    def add_example(self, example):\n",
"        self.examples.append(example)\n",
"\n",
"    def select_examples(self, input_variables):\n",
"        # This assumes the input variables contain an 'input' key\n",
"        new_word = input_variables[\"input\"]\n",
"        new_word_length = len(new_word)\n",
"\n",
"        # Track the best match so far and its length difference\n",
"        best_match = None\n",
"        smallest_diff = float(\"inf\")\n",
"\n",
"        # Iterate through each example\n",
"        for example in self.examples:\n",
"            # Length difference between the example's input word and the new word\n",
"            current_diff = abs(len(example[\"input\"]) - new_word_length)\n",
"\n",
"            # Update the best match if the current one is closer in length;\n",
"            # on ties, the earlier example is kept\n",
"            if current_diff < smallest_diff:\n",
"                smallest_diff = current_diff\n",
"                best_match = example\n",
"\n",
"        # Return an empty list (rather than [None]) if there are no examples\n",
"        return [best_match] if best_match is not None else []"
]
},
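{
"cell_type": "markdown",
"id": "7b2d6e84",
"metadata": {},
"source": [
"The selector above returns a single example. If you want several, one possible extension is to rank all examples by length difference and keep the `k` closest. This is a hypothetical sketch, not a LangChain class: `KClosestLengthSelector` and its `k` parameter are illustrative names. It does not subclass `BaseExampleSelector`, so it runs without LangChain installed, but it exposes the same two methods.\n",
"\n",
"```python\n",
"class KClosestLengthSelector:\n",
"    \"\"\"Hypothetical selector returning the k examples closest in length to the input.\"\"\"\n",
"\n",
"    def __init__(self, examples, k=2):\n",
"        self.examples = list(examples)\n",
"        self.k = k\n",
"\n",
"    def add_example(self, example):\n",
"        self.examples.append(example)\n",
"\n",
"    def select_examples(self, input_variables):\n",
"        target_length = len(input_variables[\"input\"])\n",
"        # sorted() is stable, so examples with equal differences keep their insertion order\n",
"        ranked = sorted(\n",
"            self.examples,\n",
"            key=lambda example: abs(len(example[\"input\"]) - target_length),\n",
"        )\n",
"        return ranked[: self.k]\n",
"\n",
"\n",
"selector = KClosestLengthSelector(\n",
"    [\n",
"        {\"input\": \"hi\", \"output\": \"ciao\"},\n",
"        {\"input\": \"bye\", \"output\": \"arrivederci\"},\n",
"        {\"input\": \"soccer\", \"output\": \"calcio\"},\n",
"    ],\n",
"    k=2,\n",
")\n",
"# `okay` has 4 letters: `bye` differs by 1, `hi` and `soccer` by 2 each,\n",
"# so the result is the `bye` example followed by the `hi` example\n",
"print(selector.select_examples({\"input\": \"okay\"}))\n",
"```\n",
"\n",
"Sorting all examples, rather than tracking a single best match, makes `k` easy to change. Because Python's sort is stable, equally close examples come back in the order they were added."
]
},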
{
"cell_type": "code",
"execution_count": 38,
"id": "ce928187",
"metadata": {},
"outputs": [],
"source": [
"example_selector = CustomExampleSelector(examples)"
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "37ef3149",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'input': 'bye', 'output': 'arrivaderci'}]"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_selector.select_examples({\"input\": \"okay\"})"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "c5ad9f35",
"metadata": {},
"outputs": [],
"source": [
"example_selector.add_example({\"input\": \"hand\", \"output\": \"mano\"})"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "e4127fe0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'input': 'hand', 'output': 'mano'}]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_selector.select_examples({\"input\": \"okay\"})"
]
},
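{
"cell_type": "markdown",
"id": "a4c81f3e",
"metadata": {},
"source": [
"Both results follow from the length differences. `okay` has four letters, so before `hand` is added the closest example is `bye` (difference 1). Afterwards, `hand` matches exactly (difference 0). The plain-Python check below recomputes those differences. It uses `min` with a key, which keeps the first of any tied candidates, just like the strict `<` comparison in the selector.\n",
"\n",
"```python\n",
"query = \"okay\"\n",
"words = [\"hi\", \"bye\", \"soccer\", \"hand\"]\n",
"\n",
"# Absolute difference between each example's input length and the query length\n",
"diffs = {word: abs(len(word) - len(query)) for word in words}\n",
"print(diffs)  # {'hi': 2, 'bye': 1, 'soccer': 2, 'hand': 0}\n",
"\n",
"# Closest word among the three original examples, then after adding `hand`\n",
"before = min(words[:3], key=diffs.get)\n",
"after = min(words, key=diffs.get)\n",
"print(before, after)  # bye hand\n",
"```"
]
},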
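{
"cell_type": "markdown",
"id": "786c920c",
"metadata": {},
"source": [
"## Use in a Prompt\n",
"\n",
"We can now pass this example selector to a `FewShotPromptTemplate`. Each time the prompt is formatted, the template calls the selector with the input variables to choose which examples to include."
]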
},
{
"cell_type": "code",
"execution_count": 42,
"id": "619090e2",
"metadata": {},
"outputs": [],
"source": [
"from langchain_core.prompts.few_shot import FewShotPromptTemplate\n",
"from langchain_core.prompts.prompt import PromptTemplate\n",
"\n",
"example_prompt = PromptTemplate.from_template(\"Input: {input} -> Output: {output}\")"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "5934c415",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Translate the following words from English to Italian:\n",
"\n",
"Input: hand -> Output: mano\n",
"\n",
"Input: word -> Output:\n"
]
}
],
"source": [
"prompt = FewShotPromptTemplate(\n",
"    example_selector=example_selector,\n",
"    example_prompt=example_prompt,\n",
"    suffix=\"Input: {input} -> Output:\",\n",
"    prefix=\"Translate the following words from English to Italian:\",\n",
"    input_variables=[\"input\"],\n",
")\n",
"\n",
"print(prompt.format(input=\"word\"))"
]
},
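{
"cell_type": "markdown",
"id": "d59e0b62",
"metadata": {},
"source": [
"The printed prompt consists of three parts: the prefix, each selected example rendered with `example_prompt`, and the suffix. The parts are separated by blank lines. The plain-Python sketch below reproduces that layout. `build_few_shot_prompt` is a hypothetical helper written for illustration, not LangChain's implementation, and the blank-line separator is an assumption based on the output above.\n",
"\n",
"```python\n",
"def build_few_shot_prompt(\n",
"    prefix, examples, example_template, suffix, inputs, separator=\"\\n\\n\"\n",
"):\n",
"    \"\"\"Hypothetical helper that mimics the layout of a few-shot prompt.\"\"\"\n",
"    # Render each selected example with the per-example template\n",
"    example_strings = [example_template.format(**example) for example in examples]\n",
"    # Fill the input variables into the prefix and suffix\n",
"    pieces = [prefix.format(**inputs), *example_strings, suffix.format(**inputs)]\n",
"    # Drop empty pieces, then join the rest with the separator (a blank line by default)\n",
"    return separator.join(piece for piece in pieces if piece)\n",
"\n",
"\n",
"prompt_text = build_few_shot_prompt(\n",
"    prefix=\"Translate the following words from English to Italian:\",\n",
"    examples=[{\"input\": \"hand\", \"output\": \"mano\"}],\n",
"    example_template=\"Input: {input} -> Output: {output}\",\n",
"    suffix=\"Input: {input} -> Output:\",\n",
"    inputs={\"input\": \"word\"},\n",
")\n",
"print(prompt_text)\n",
"```"
]
},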
{
"cell_type": "markdown",
"id": "e767f69d",
"metadata": {},
"source": [
"## Example Selector Types\n",
"\n",
"| Name | Description |\n",
"|------------|---------------------------------------------------------------------------------------------|\n",
"| Similarity | Uses semantic similarity between inputs and examples to decide which examples to choose. |\n",
"| MMR | Uses Maximal Marginal Relevance between inputs and examples to decide which examples to choose. |\n",
"| Length | Selects examples based on how many can fit within a certain length. |\n",
"| Ngram | Uses n-gram overlap between inputs and examples to decide which examples to choose. |"
]
},
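{
"cell_type": "markdown",
"id": "e1f7a9c3",
"metadata": {},
"source": [
"To make the Length strategy in the table more concrete, here is a minimal greedy sketch. It uses word count as the measure of length and reserves part of a word budget for the input. It then adds examples in order until the next one would not fit. `select_by_length` and `max_words` are hypothetical names chosen for illustration. LangChain's own length-based selector may measure length and choose examples differently.\n",
"\n",
"```python\n",
"def select_by_length(examples, example_template, input_text, max_words=25):\n",
"    \"\"\"Hypothetical greedy selector: keep examples while they fit a word budget.\"\"\"\n",
"    # Reserve part of the budget for the input itself (length = word count)\n",
"    remaining = max_words - len(input_text.split())\n",
"    selected = []\n",
"    for example in examples:\n",
"        cost = len(example_template.format(**example).split())\n",
"        # Stop at the first example that no longer fits\n",
"        if cost > remaining:\n",
"            break\n",
"        selected.append(example)\n",
"        remaining -= cost\n",
"    return selected\n",
"\n",
"\n",
"translation_examples = [\n",
"    {\"input\": \"hi\", \"output\": \"ciao\"},\n",
"    {\"input\": \"soccer\", \"output\": \"calcio\"},\n",
"    {\"input\": \"hand\", \"output\": \"mano\"},\n",
"]\n",
"template = \"Input: {input} -> Output: {output}\"\n",
"\n",
"# Each formatted example is 5 words, so a 1-word input leaves room for two of them\n",
"print(len(select_by_length(translation_examples, template, \"word\", max_words=12)))  # 2\n",
"# A 7-word input leaves room for only one\n",
"print(len(select_by_length(translation_examples, template, \"a b c d e f g\", max_words=12)))  # 1\n",
"```\n",
"\n",
"The key property is that a longer input leaves less of the budget for examples, so fewer of them are included."
]
},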
{
"cell_type": "code",
"execution_count": null,
"id": "8a6e0abe",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.1"
}
},
"nbformat": 4,
"nbformat_minor": 5
}