docs: update Elasticsearch strategy names (#21530)

Update documentation with the [new names for retrieval
strategies](https://github.com/langchain-ai/langchain-elastic/pull/22)
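
For readers migrating existing code, the renames documented by this change can be summarized as a small lookup table. The sketch below is only an illustration assembled from the class names that appear in the diff. `RENAMED_STRATEGIES` and `new_strategy_name` are hypothetical helpers and are not part of `langchain_elasticsearch`.

```python
# Old -> new strategy class names, as they appear in this diff.
# RENAMED_STRATEGIES and new_strategy_name are hypothetical helpers for
# illustration only; they are not part of langchain_elasticsearch.
RENAMED_STRATEGIES = {
    "ApproxRetrievalStrategy": "DenseVectorStrategy",
    "SparseVectorRetrievalStrategy": "SparseVectorStrategy",
    "ExactRetrievalStrategy": "DenseVectorScriptScoreStrategy",
}


def new_strategy_name(name: str) -> str:
    """Return the post-rename name, or the input unchanged if it was not renamed."""
    return RENAMED_STRATEGIES.get(name, name)
```

Names that were not renamed, such as `BM25Strategy`, pass through unchanged.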

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
pull/21852/head
Max Jakob 2 weeks ago committed by GitHub
parent cdc8e2d0c2
commit e6b7a1769b

@ -161,7 +161,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 3,
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
"metadata": {
"id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
@ -194,7 +194,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 4,
"id": "aac9563e",
"metadata": {
"id": "aac9563e",
@ -208,7 +208,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 5,
"id": "a3c3999a",
"metadata": {
"id": "a3c3999a",
@ -229,7 +229,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 6,
"id": "12eb86d8",
"metadata": {
"id": "12eb86d8",
@ -271,7 +271,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 7,
"id": "5d076412",
"metadata": {},
"outputs": [
@ -313,7 +313,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 8,
"id": "b2a4bd1b",
"metadata": {},
"outputs": [
@ -345,7 +345,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"id": "f3d294ff",
"metadata": {},
"outputs": [
@ -375,7 +375,7 @@
},
{
"cell_type": "code",
"execution_count": 59,
"execution_count": 10,
"id": "55b63a61",
"metadata": {},
"outputs": [
@ -405,7 +405,7 @@
},
{
"cell_type": "code",
"execution_count": 60,
"execution_count": 11,
"id": "9b831b3d",
"metadata": {},
"outputs": [
@ -435,7 +435,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 12,
"id": "fb1482e7",
"metadata": {},
"outputs": [],
@ -504,27 +504,29 @@
"metadata": {},
"source": [
"# Retrieval Strategies\n",
"Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
"Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
"\n",
"By default, `ElasticsearchStore` uses the `ApproxRetrievalStrategy`.\n",
"By default, `ElasticsearchStore` uses the `DenseVectorStrategy` (was called `ApproxRetrievalStrategy` prior to version 0.2.0).\n",
"\n",
"## ApproxRetrievalStrategy\n",
"This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
"## DenseVectorStrategy\n",
"This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 13,
"id": "999b5ef5",
"metadata": {},
"outputs": [],
"source": [
"from langchain_elasticsearch import DenseVectorStrategy\n",
"\n",
"db = ElasticsearchStore.from_documents(\n",
" docs,\n",
" embeddings,\n",
" es_url=\"http://localhost:9200\",\n",
" index_name=\"test\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(),\n",
" strategy=DenseVectorStrategy(),\n",
")\n",
"\n",
"docs = db.similarity_search(\n",
@ -537,12 +539,12 @@
"id": "9b651be5",
"metadata": {},
"source": [
"### Example: Approx with hybrid\n",
"### Example: Hybrid retrieval with dense vector and keyword search\n",
"This example will show how to configure `ElasticsearchStore` to perform a hybrid retrieval, using a combination of approximate semantic search and keyword based search. \n",
"\n",
"We use RRF to balance the two scores from different retrieval methods.\n",
"\n",
"To enable hybrid retrieval, we need to set `hybrid=True` in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor.\n",
"To enable hybrid retrieval, we need to set `hybrid=True` in the `DenseVectorStrategy` constructor.\n",
"\n",
"```python\n",
"\n",
@ -551,9 +553,7 @@
" embeddings, \n",
" es_url=\"http://localhost:9200\", \n",
" index_name=\"test\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
" hybrid=True,\n",
" )\n",
" strategy=DenseVectorStrategy(hybrid=True)\n",
")\n",
"```\n",
"\n",
@ -582,22 +582,22 @@
"}\n",
"```\n",
"\n",
"### Example: Approx with Embedding Model in Elasticsearch\n",
"This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for approximate retrieval. \n",
"### Example: Dense vector search with Embedding Model in Elasticsearch\n",
"This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for dense vector retrieval.\n",
"\n",
"To use this, specify the model_id in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor via the `query_model_id` argument.\n",
"To use this, specify the model_id in `DenseVectorStrategy` constructor via the `query_model_id` argument.\n",
"\n",
"**NOTE** This requires the model to be deployed and running in Elasticsearch ml node. See [notebook example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) on how to deploy the model with eland.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 14,
"id": "0a0c85e7",
"metadata": {},
"outputs": [],
"source": [
"APPROX_SELF_DEPLOYED_INDEX_NAME = \"test-approx-self-deployed\"\n",
"DENSE_SELF_DEPLOYED_INDEX_NAME = \"test-dense-self-deployed\"\n",
"\n",
"# Note: This does not have an embedding function specified\n",
"# Instead, we will use the embedding model deployed in Elasticsearch\n",
@ -605,12 +605,10 @@
" es_cloud_id=\"<your cloud id>\",\n",
" es_user=\"elastic\",\n",
" es_password=\"<your password>\",\n",
" index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
" index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
" query_field=\"text_field\",\n",
" vector_query_field=\"vector_query_field.predicted_value\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
" query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
" ),\n",
" strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
")\n",
"\n",
"# Setup a Ingest Pipeline to perform the embedding\n",
@ -631,7 +629,7 @@
"# creating a new index with the pipeline,\n",
"# not relying on langchain to create the index\n",
"db.client.indices.create(\n",
" index=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
" index=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
" mappings={\n",
" \"properties\": {\n",
" \"text_field\": {\"type\": \"text\"},\n",
@ -655,12 +653,10 @@
" es_cloud_id=\"<cloud id>\",\n",
" es_user=\"elastic\",\n",
" es_password=\"<cloud password>\",\n",
" index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
" index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
" query_field=\"text_field\",\n",
" vector_query_field=\"vector_query_field.predicted_value\",\n",
" strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
" query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
" ),\n",
" strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
")\n",
"\n",
"# Perform search\n",
@ -672,12 +668,12 @@
"id": "53959de6",
"metadata": {},
"source": [
"## SparseVectorRetrievalStrategy (ELSER)\n",
"## SparseVectorStrategy (ELSER)\n",
"This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. We only support our own \"ELSER\" embedding model for now.\n",
"\n",
"**NOTE** This requires the ELSER model to be deployed and running in Elasticsearch ml node. \n",
"\n",
"To use this, specify `SparseVectorRetrievalStrategy` in `ElasticsearchStore` constructor."
"To use this, specify `SparseVectorStrategy` (was called `SparseVectorRetrievalStrategy` prior to version 0.2.0) in the `ElasticsearchStore` constructor. You will need to provide a model ID."
]
},
{
@ -695,15 +691,17 @@
}
],
"source": [
"from langchain_elasticsearch import SparseVectorStrategy\n",
"\n",
"# Note that this example doesn't have an embedding function. This is because we infer the tokens at index time and at query time within Elasticsearch.\n",
"# This requires the ELSER model to be loaded and running in Elasticsearch.\n",
"db = ElasticsearchStore.from_documents(\n",
" docs,\n",
" es_cloud_id=\"My_deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlvOjQ0MyQ2OGJhMjhmNDc1M2Y0MWVjYTk2NzI2ZWNkMmE5YzRkNyQ3NWI4ODRjNWQ2OTU0MTYzODFjOTkxNmQ1YzYxMGI1Mw==\",\n",
" es_cloud_id=\"<cloud id>\",\n",
" es_user=\"elastic\",\n",
" es_password=\"GgUPiWKwEzgHIYdHdgPk1Lwi\",\n",
" es_password=\"<cloud password>\",\n",
" index_name=\"test-elser\",\n",
" strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
" strategy=SparseVectorStrategy(model_id=\".elser_model_2\"),\n",
")\n",
"\n",
"db.client.indices.refresh(index=\"test-elser\")\n",
@ -719,19 +717,42 @@
"id": "edf3a093",
"metadata": {},
"source": [
"## ExactRetrievalStrategy\n",
"This strategy uses Elasticsearch's exact retrieval (also known as brute force) to retrieve the top-k results.\n",
"## DenseVectorScriptScoreStrategy\n",
"This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute force) to retrieve the top-k results. (This strategy was called `ExactRetrievalStrategy` prior to version 0.2.0.)\n",
"\n",
"To use this, specify `ExactRetrievalStrategy` in `ElasticsearchStore` constructor.\n",
"To use this, specify `DenseVectorScriptScoreStrategy` in `ElasticsearchStore` constructor.\n",
"\n",
"```python\n",
"from langchain_elasticsearch import SparseVectorStrategy\n",
"\n",
"db = ElasticsearchStore.from_documents(\n",
" docs, \n",
" embeddings, \n",
" es_url=\"http://localhost:9200\", \n",
" index_name=\"test\",\n",
" strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
" strategy=DenseVectorScriptScoreStrategy(),\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "11b51c47",
"metadata": {},
"source": [
"## BM25Strategy\n",
"Finally, you can use full-text keyword search.\n",
"\n",
"To use this, specify `BM25Strategy` in `ElasticsearchStore` constructor.\n",
"\n",
"```python\n",
"from langchain_elasticsearch import BM25Strategy\n",
"\n",
"db = ElasticsearchStore.from_documents(\n",
" docs, \n",
" es_url=\"http://localhost:9200\", \n",
" index_name=\"test\",\n",
" strategy=BM25Strategy(),\n",
")\n",
"```"
]
@ -924,9 +945,9 @@
"\n",
"## What's new?\n",
"\n",
"The new implementation is now one class called `ElasticsearchStore` which can be used for approx, exact, and ELSER search retrieval, via strategies.\n",
"The new implementation is now one class called `ElasticsearchStore` which can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25 retrieval and hybrid retrieval, via strategies.\n",
"\n",
"## Im using ElasticKNNSearch\n",
"## I am using ElasticKNNSearch\n",
"\n",
"Old implementation:\n",
"\n",
@ -946,21 +967,21 @@
"\n",
"```python\n",
"\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy\n",
"\n",
"db = ElasticsearchStore(\n",
" es_url=\"http://localhost:9200\",\n",
" index_name=\"test_index\",\n",
" embedding=embedding,\n",
" # if you use the model_id\n",
" # strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id=\"test_model\" )\n",
" # strategy=DenseVectorStrategy(model_id=\"test_model\")\n",
" # if you use hybrid search\n",
" # strategy=ElasticsearchStore.ApproxRetrievalStrategy( hybrid=True )\n",
" # strategy=DenseVectorStrategy(hybrid=True)\n",
")\n",
"\n",
"```\n",
"\n",
"## Im using ElasticVectorSearch\n",
"## I am using ElasticVectorSearch\n",
"\n",
"Old implementation:\n",
"\n",
@ -980,13 +1001,13 @@
"\n",
"```python\n",
"\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy\n",
"\n",
"db = ElasticsearchStore(\n",
" es_url=\"http://localhost:9200\",\n",
" index_name=\"test_index\",\n",
" embedding=embedding,\n",
" strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
" strategy=DenseVectorScriptScoreStrategy()\n",
")\n",
"\n",
"```"
