docs: update Elasticsearch strategy names (#21530)

Update documentation with the [new names for retrieval strategies](https://github.com/langchain-ai/langchain-elastic/pull/22).

Co-authored-by: Erick Friis <erick@langchain.dev>
2 weeks ago · e6b7a1769b
parent cdc8e2d0c2
commit e6b7a1769b
1 changed file with 73 additions and 52 deletions
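
The renames in this commit can be summarized as a lookup table. The sketch below is a hypothetical helper, not part of `langchain-elasticsearch`; `RENAMED_STRATEGIES` and `migrate_strategy_names` are names invented here for illustration. It assumes the old classes are referenced either bare or through the `ElasticsearchStore.` prefix, and it also applies the `query_model_id` to `model_id` keyword rename that the diff makes for `DenseVectorStrategy`.

```python
import re

# Old strategy name -> new strategy name, as listed in this diff.
RENAMED_STRATEGIES = {
    "ApproxRetrievalStrategy": "DenseVectorStrategy",
    "SparseVectorRetrievalStrategy": "SparseVectorStrategy",
    "ExactRetrievalStrategy": "DenseVectorScriptScoreStrategy",
}


def migrate_strategy_names(source: str) -> str:
    """Rewrite pre-0.2.0 strategy references to the new names (hypothetical helper)."""
    for old, new in RENAMED_STRATEGIES.items():
        # Match the old name with or without the `ElasticsearchStore.` prefix.
        pattern = rf"(?:ElasticsearchStore\.)?\b{old}\b"
        source = re.sub(pattern, new, source)
    # The diff also renames the keyword argument query_model_id -> model_id.
    return re.sub(r"\bquery_model_id\b", "model_id", source)
```

The helper only rewrites names. The new classes still have to be imported from `langchain_elasticsearch`, and the updated notebook text says `SparseVectorStrategy` needs a model ID, so migrated code may require further manual changes.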
--- a/docs/docs/integrations/vectorstores/elasticsearch.ipynb
+++ b/docs/docs/integrations/vectorstores/elasticsearch.ipynb
@@ -161,7 +161,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 3,
   "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
   "metadata": {
    "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
@@ -194,7 +194,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 4,
   "id": "aac9563e",
   "metadata": {
    "id": "aac9563e",
@@ -208,7 +208,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 5,
   "id": "a3c3999a",
   "metadata": {
    "id": "a3c3999a",
@@ -229,7 +229,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
   "id": "12eb86d8",
   "metadata": {
    "id": "12eb86d8",
@@ -271,7 +271,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
   "id": "5d076412",
   "metadata": {},
   "outputs": [
@@ -313,7 +313,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 8,
   "id": "b2a4bd1b",
   "metadata": {},
   "outputs": [
@@ -345,7 +345,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 9,
   "id": "f3d294ff",
   "metadata": {},
   "outputs": [
@@ -375,7 +375,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 59,
+   "execution_count": 10,
   "id": "55b63a61",
   "metadata": {},
   "outputs": [
@@ -405,7 +405,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 60,
+   "execution_count": 11,
   "id": "9b831b3d",
   "metadata": {},
   "outputs": [
@@ -435,7 +435,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 12,
   "id": "fb1482e7",
   "metadata": {},
   "outputs": [],
@@ -504,27 +504,29 @@
   "metadata": {},
   "source": [
    "# Retrieval Strategies\n",
-    "Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies.  In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
+    "Elasticsearch supports a wide range of retrieval strategies, which is a major advantage over vector-only databases. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies.\n",
    "\n",
-    "By default, `ElasticsearchStore` uses the `ApproxRetrievalStrategy`.\n",
+    "By default, `ElasticsearchStore` uses the `DenseVectorStrategy` (was called `ApproxRetrievalStrategy` prior to version 0.2.0).\n",
    "\n",
-    "## ApproxRetrievalStrategy\n",
-    "This will return the top `k` most similar vectors to the query vector.  The `k` parameter is set when the `ElasticsearchStore` is initialized.  The default value is `10`."
+    "## DenseVectorStrategy\n",
+    "This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 13,
   "id": "999b5ef5",
   "metadata": {},
   "outputs": [],
   "source": [
+    "from langchain_elasticsearch import DenseVectorStrategy\n",
+    "\n",
    "db = ElasticsearchStore.from_documents(\n",
    "    docs,\n",
    "    embeddings,\n",
    "    es_url=\"http://localhost:9200\",\n",
    "    index_name=\"test\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),\n",
+    "    strategy=DenseVectorStrategy(),\n",
    ")\n",
    "\n",
    "docs = db.similarity_search(\n",
@@ -537,12 +539,12 @@
   "id": "9b651be5",
   "metadata": {},
   "source": [
-    "### Example: Approx with hybrid\n",
+    "### Example: Hybrid retrieval with dense vector and keyword search\n",
    "This example will show how to configure `ElasticsearchStore` to perform a hybrid retrieval, using a combination of approximate semantic search and keyword based search. \n",
    "\n",
    "We use RRF to balance the two scores from different retrieval methods.\n",
    "\n",
-    "To enable hybrid retrieval, we need to set `hybrid=True` in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor.\n",
+    "To enable hybrid retrieval, we need to set `hybrid=True` in the `DenseVectorStrategy` constructor.\n",
    "\n",
    "```python\n",
    "\n",
@@ -551,9 +553,7 @@
    "    embeddings, \n",
    "    es_url=\"http://localhost:9200\", \n",
    "    index_name=\"test\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-    "        hybrid=True,\n",
-    "    )\n",
+    "    strategy=DenseVectorStrategy(hybrid=True)\n",
    ")\n",
    "```\n",
    "\n",
@@ -582,22 +582,22 @@
    "}\n",
    "```\n",
    "\n",
-    "### Example: Approx with Embedding Model in Elasticsearch\n",
-    "This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for approximate retrieval. \n",
+    "### Example: Dense vector search with Embedding Model in Elasticsearch\n",
+    "This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for dense vector retrieval.\n",
    "\n",
-    "To use this, specify the model_id in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor via the `query_model_id` argument.\n",
+    "To use this, pass the model ID to the `DenseVectorStrategy` constructor via the `model_id` argument.\n",
    "\n",
    "**NOTE** This requires the model to be deployed and running in Elasticsearch ml node. See [notebook example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) on how to deploy the model with eland.\n"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 14,
   "id": "0a0c85e7",
   "metadata": {},
   "outputs": [],
   "source": [
-    "APPROX_SELF_DEPLOYED_INDEX_NAME = \"test-approx-self-deployed\"\n",
+    "DENSE_SELF_DEPLOYED_INDEX_NAME = \"test-dense-self-deployed\"\n",
    "\n",
    "# Note: This does not have an embedding function specified\n",
    "# Instead, we will use the embedding model deployed in Elasticsearch\n",
@@ -605,12 +605,10 @@
    "    es_cloud_id=\"<your cloud id>\",\n",
    "    es_user=\"elastic\",\n",
    "    es_password=\"<your password>\",\n",
-    "    index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+    "    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
    "    query_field=\"text_field\",\n",
    "    vector_query_field=\"vector_query_field.predicted_value\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-    "        query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
-    "    ),\n",
+    "    strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
    ")\n",
    "\n",
    "# Setup a Ingest Pipeline to perform the embedding\n",
@@ -631,7 +629,7 @@
    "# creating a new index with the pipeline,\n",
    "# not relying on langchain to create the index\n",
    "db.client.indices.create(\n",
-    "    index=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+    "    index=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
    "    mappings={\n",
    "        \"properties\": {\n",
    "            \"text_field\": {\"type\": \"text\"},\n",
@@ -655,12 +653,10 @@
    "    es_cloud_id=\"<cloud id>\",\n",
    "    es_user=\"elastic\",\n",
    "    es_password=\"<cloud password>\",\n",
-    "    index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+    "    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
    "    query_field=\"text_field\",\n",
    "    vector_query_field=\"vector_query_field.predicted_value\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-    "        query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
-    "    ),\n",
+    "    strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
    ")\n",
    "\n",
    "# Perform search\n",
@@ -672,12 +668,12 @@
   "id": "53959de6",
   "metadata": {},
   "source": [
-    "## SparseVectorRetrievalStrategy (ELSER)\n",
+    "## SparseVectorStrategy (ELSER)\n",
    "This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. We only support our own \"ELSER\" embedding model for now.\n",
    "\n",
    "**NOTE** This requires the ELSER model to be deployed and running in Elasticsearch ml node. \n",
    "\n",
-    "To use this, specify `SparseVectorRetrievalStrategy` in `ElasticsearchStore` constructor."
+    "To use this, specify `SparseVectorStrategy` (was called `SparseVectorRetrievalStrategy` prior to version 0.2.0) in the `ElasticsearchStore` constructor. You will need to provide a model ID."
   ]
  },
  {
@@ -695,15 +691,17 @@
    }
   ],
   "source": [
+    "from langchain_elasticsearch import SparseVectorStrategy\n",
+    "\n",
    "# Note that this example doesn't have an embedding function. This is because we infer the tokens at index time and at query time within Elasticsearch.\n",
    "# This requires the ELSER model to be loaded and running in Elasticsearch.\n",
    "db = ElasticsearchStore.from_documents(\n",
    "    docs,\n",
-    "    es_cloud_id=\"My_deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlvOjQ0MyQ2OGJhMjhmNDc1M2Y0MWVjYTk2NzI2ZWNkMmE5YzRkNyQ3NWI4ODRjNWQ2OTU0MTYzODFjOTkxNmQ1YzYxMGI1Mw==\",\n",
+    "    es_cloud_id=\"<cloud id>\",\n",
    "    es_user=\"elastic\",\n",
-    "    es_password=\"GgUPiWKwEzgHIYdHdgPk1Lwi\",\n",
+    "    es_password=\"<cloud password>\",\n",
    "    index_name=\"test-elser\",\n",
-    "    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
+    "    strategy=SparseVectorStrategy(model_id=\".elser_model_2\"),\n",
    ")\n",
    "\n",
    "db.client.indices.refresh(index=\"test-elser\")\n",
@@ -719,19 +717,42 @@
   "id": "edf3a093",
   "metadata": {},
   "source": [
-    "## ExactRetrievalStrategy\n",
-    "This strategy uses Elasticsearch's exact retrieval (also known as brute force) to retrieve the top-k results.\n",
+    "## DenseVectorScriptScoreStrategy\n",
+    "This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute force) to retrieve the top-k results. (This strategy was called `ExactRetrievalStrategy` prior to version 0.2.0.)\n",
    "\n",
-    "To use this, specify `ExactRetrievalStrategy` in `ElasticsearchStore` constructor.\n",
+    "To use this, specify `DenseVectorScriptScoreStrategy` in the `ElasticsearchStore` constructor.\n",
    "\n",
    "```python\n",
+    "from langchain_elasticsearch import DenseVectorScriptScoreStrategy\n",
    "\n",
    "db = ElasticsearchStore.from_documents(\n",
    "    docs, \n",
    "    embeddings, \n",
    "    es_url=\"http://localhost:9200\", \n",
    "    index_name=\"test\",\n",
-    "    strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
+    "    strategy=DenseVectorScriptScoreStrategy(),\n",
+    ")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11b51c47",
+   "metadata": {},
+   "source": [
+    "## BM25Strategy\n",
+    "Finally, you can use full-text keyword search.\n",
+    "\n",
+    "To use this, specify `BM25Strategy` in the `ElasticsearchStore` constructor.\n",
+    "\n",
+    "```python\n",
+    "from langchain_elasticsearch import BM25Strategy\n",
+    "\n",
+    "db = ElasticsearchStore.from_documents(\n",
+    "    docs, \n",
+    "    es_url=\"http://localhost:9200\", \n",
+    "    index_name=\"test\",\n",
+    "    strategy=BM25Strategy(),\n",
    ")\n",
    "```"
   ]
@@ -924,9 +945,9 @@
    "\n",
    "## What's new?\n",
    "\n",
-    "The new implementation is now one class called `ElasticsearchStore` which can be used for approx, exact, and ELSER search retrieval, via strategies.\n",
+    "The new implementation is now one class called `ElasticsearchStore`, which can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25, and hybrid retrieval, via strategies.\n",
    "\n",
-    "## Im using ElasticKNNSearch\n",
+    "## I am using ElasticKNNSearch\n",
    "\n",
    "Old implementation:\n",
    "\n",
@@ -946,21 +967,21 @@
    "\n",
    "```python\n",
    "\n",
-    "from langchain_elasticsearch import ElasticsearchStore\n",
+    "from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy\n",
    "\n",
    "db = ElasticsearchStore(\n",
    "  es_url=\"http://localhost:9200\",\n",
    "  index_name=\"test_index\",\n",
    "  embedding=embedding,\n",
    "  # if you use the model_id\n",
-    "  # strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id=\"test_model\" )\n",
+    "  # strategy=DenseVectorStrategy(model_id=\"test_model\")\n",
    "  # if you use hybrid search\n",
-    "  # strategy=ElasticsearchStore.ApproxRetrievalStrategy( hybrid=True )\n",
+    "  # strategy=DenseVectorStrategy(hybrid=True)\n",
    ")\n",
    "\n",
    "```\n",
    "\n",
-    "## Im using ElasticVectorSearch\n",
+    "## I am using ElasticVectorSearch\n",
    "\n",
    "Old implementation:\n",
    "\n",
@@ -980,13 +1001,13 @@
    "\n",
    "```python\n",
    "\n",
-    "from langchain_elasticsearch import ElasticsearchStore\n",
+    "from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy\n",
    "\n",
    "db = ElasticsearchStore(\n",
    "  es_url=\"http://localhost:9200\",\n",
    "  index_name=\"test_index\",\n",
    "  embedding=embedding,\n",
-    "  strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
+    "  strategy=DenseVectorScriptScoreStrategy()\n",
    ")\n",
    "\n",
    "```"