pull/21287/head
Bagatur 2 weeks ago
commit d39e9f2542

@ -57,3 +57,4 @@ Notebook | Description
[two_agent_debate_tools.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_agent_debate_tools.ipynb) | Simulate multi-agent dialogues where the agents can utilize various tools.
[two_player_dnd.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/two_player_dnd.ipynb) | Simulate a two-player dungeons & dragons game, where a dialogue simulator class is used to coordinate the dialogue between the protagonist and the dungeon master.
[wikibase_agent.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/wikibase_agent.ipynb) | Create a simple wikibase agent that utilizes sparql generation, with testing done on http://wikidata.org.
[oracleai_demo.ipynb](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) | This guide outlines how to utilize Oracle AI Vector Search alongside Langchain for an end-to-end RAG pipeline, providing step-by-step examples. The process includes loading documents from various sources using OracleDocLoader, summarizing them either within or outside the database with OracleSummary, and generating embeddings similarly through OracleEmbeddings. It also covers chunking documents according to specific requirements using Advanced Oracle Capabilities from OracleTextSplitter, and finally, storing and indexing these documents in a Vector Store for querying with OracleVS.

@ -0,0 +1,872 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Oracle AI Vector Search with Document Processing\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
"\n",
" * Partitioning Support\n",
" * Real Application Clusters scalability\n",
" * Exadata smart scans\n",
" * Shard processing across geographically distributed databases\n",
" * Transactions\n",
" * Parallel SQL\n",
" * Disaster recovery\n",
" * Security\n",
" * Oracle Machine Learning\n",
" * Oracle Graph Database\n",
" * Oracle Spatial and Graph\n",
" * Oracle Blockchain\n",
" * JSON\n",
"\n",
"This guide demonstrates how Oracle AI Vector Search can be used with Langchain to serve an end-to-end RAG pipeline. This guide goes through examples of:\n",
"\n",
" * Loading the documents from various sources using OracleDocLoader\n",
" * Summarizing them within/outside the database using OracleSummary\n",
" * Generating embeddings for them within/outside the database using OracleEmbeddings\n",
" * Chunking them according to different requirements using Advanced Oracle Capabilities from OracleTextSplitter\n",
" * Storing and Indexing them in a Vector Store and querying them for queries in OracleVS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Demo User\n",
"First, create a demo user with all the required privileges. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"User setup done!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"# please make sure this user has sufficient privileges to perform all below\n",
"username = \"\"\n",
"password = \"\"\n",
"dsn = \"\"\n",
"\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"\n",
" cursor = conn.cursor()\n",
" cursor.execute(\n",
" \"\"\"\n",
" begin\n",
" -- drop user\n",
" begin\n",
" execute immediate 'drop user testuser cascade';\n",
" exception\n",
" when others then\n",
" dbms_output.put_line('Error setting up user.');\n",
" end;\n",
" execute immediate 'create user testuser identified by testuser';\n",
" execute immediate 'grant connect, unlimited tablespace, create credential, create procedure, create any index to testuser';\n",
" execute immediate 'create or replace directory DEMO_PY_DIR as ''/scratch/hroy/view_storage/hroy_devstorage/demo/orachain''';\n",
" execute immediate 'grant read, write on directory DEMO_PY_DIR to public';\n",
" execute immediate 'grant create mining model to testuser';\n",
"\n",
" -- network access\n",
" begin\n",
" DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(\n",
" host => '*',\n",
" ace => xs$ace_type(privilege_list => xs$name_list('connect'),\n",
" principal_name => 'testuser',\n",
" principal_type => xs_acl.ptype_db));\n",
" end;\n",
" end;\n",
" \"\"\"\n",
" )\n",
" print(\"User setup done!\")\n",
" cursor.close()\n",
" conn.close()\n",
"except Exception as e:\n",
" print(\"User setup failed!\")\n",
" cursor.close()\n",
" conn.close()\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Process Documents using Oracle AI\n",
"Let's think about a scenario that the users have some documents in Oracle Database or in a file system. They want to use the data for Oracle AI Vector Search using Langchain.\n",
"\n",
"For that, the users need to do some document preprocessing. The first step would be to read the documents, generate their summary(if needed) and then chunk/split them if needed. After that, they need to generate the embeddings for those chunks and store into Oracle AI Vector Store. Finally, the users will perform some semantic queries on those data. \n",
"\n",
"Oracle AI Vector Search Langchain library provides a range of document processing functionalities including document loading, splitting, generating summary and embeddings.\n",
"\n",
"In the following sections, we will go through how to use Oracle AI Langchain APIs to achieve each of these functionalities individually. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to Demo User\n",
"The following sample code will show how to connect to Oracle Database. "
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n"
]
}
],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"username = \"\"\n",
"password = \"\"\n",
"dsn = \"\"\n",
"\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Populate a Demo Table\n",
"Create a demo table and insert some sample documents."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Table created and populated.\n"
]
}
],
"source": [
"try:\n",
" cursor = conn.cursor()\n",
"\n",
" drop_table_sql = \"\"\"drop table demo_tab\"\"\"\n",
" cursor.execute(drop_table_sql)\n",
"\n",
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
" cursor.execute(create_table_sql)\n",
"\n",
" insert_row_sql = \"\"\"insert into demo_tab values (:1, :2)\"\"\"\n",
" rows_to_insert = [\n",
" (\n",
" 1,\n",
" \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
" ),\n",
" (\n",
" 2,\n",
" \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
" ),\n",
" (\n",
" 3,\n",
" \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
" ),\n",
" ]\n",
" cursor.executemany(insert_row_sql, rows_to_insert)\n",
"\n",
" conn.commit()\n",
"\n",
" print(\"Table created and populated.\")\n",
" cursor.close()\n",
"except Exception as e:\n",
" print(\"Table creation failed.\")\n",
" cursor.close()\n",
" conn.close()\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"Now that we have a demo user and a demo table with some data, we just need to do one more setup. For embedding and summary, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings; however, for summary, they don't need to do anything."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load ONNX Model\n",
"\n",
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
"\n",
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
"\n",
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
"\n",
"Here is the sample code to load an ONNX model to Oracle Database:"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"ONNX model loaded.\n"
]
}
],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"\n",
"# please update with your related information\n",
"# make sure that you have onnx file in the system\n",
"onnx_dir = \"DEMO_PY_DIR\"\n",
"onnx_file = \"tinybert.onnx\"\n",
"model_name = \"demo_model\"\n",
"\n",
"try:\n",
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
" print(\"ONNX model loaded.\")\n",
"except Exception as e:\n",
" print(\"ONNX model loading failed!\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Credential\n",
"\n",
"On the other hand, if the users choose to use 3rd party provider to generate embeddings and summary, they need to create credential to access 3rd party provider's end points.\n",
"\n",
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings and summary. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
"\n",
"Here is a sample example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" cursor = conn.cursor()\n",
" cursor.execute(\n",
" \"\"\"\n",
" declare\n",
" jo json_object_t;\n",
" begin\n",
" -- HuggingFace\n",
" dbms_vector_chain.drop_credential(credential_name => 'HF_CRED');\n",
" jo := json_object_t();\n",
" jo.put('access_token', '<access_token>');\n",
" dbms_vector_chain.create_credential(\n",
" credential_name => 'HF_CRED',\n",
" params => json(jo.to_string));\n",
"\n",
" -- OCIGENAI\n",
" dbms_vector_chain.drop_credential(credential_name => 'OCI_CRED');\n",
" jo := json_object_t();\n",
" jo.put('user_ocid','<user_ocid>');\n",
" jo.put('tenancy_ocid','<tenancy_ocid>');\n",
" jo.put('compartment_ocid','<compartment_ocid>');\n",
" jo.put('private_key','<private_key>');\n",
" jo.put('fingerprint','<fingerprint>');\n",
" dbms_vector_chain.create_credential(\n",
" credential_name => 'OCI_CRED',\n",
" params => json(jo.to_string));\n",
" end;\n",
" \"\"\"\n",
" )\n",
" cursor.close()\n",
" print(\"Credentials created.\")\n",
"except Exception as ex:\n",
" cursor.close()\n",
" raise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Documents\n",
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
"\n",
"The following sample code will show how to do that:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of docs loaded: 3\n"
]
}
],
"source": [
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_core.documents import Document\n",
"\n",
"# loading from Oracle Database table\n",
"# make sure you have the table with this specification\n",
"loader_params = {}\n",
"loader_params = {\n",
" \"owner\": \"testuser\",\n",
" \"tablename\": \"demo_tab\",\n",
" \"colname\": \"data\",\n",
"}\n",
"\n",
"\"\"\" load the docs \"\"\"\n",
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
"docs = loader.load()\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of docs loaded: {len(docs)}\")\n",
"# print(f\"Document-0: {docs[0].page_content}\") # content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate Summary\n",
"Now that the user loaded the documents, they may want to generate a summary for each document. The Oracle AI Vector Search Langchain library provides an API to do that. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. Like before, they just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate summary:"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of Summaries: 3\n"
]
}
],
"source": [
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_core.documents import Document\n",
"\n",
"# using 'database' provider\n",
"summary_params = {\n",
" \"provider\": \"database\",\n",
" \"glevel\": \"S\",\n",
" \"numParagraphs\": 1,\n",
" \"language\": \"english\",\n",
"}\n",
"\n",
"# get the summary instance\n",
"# Remove proxy if not required\n",
"summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)\n",
"\n",
"list_summary = []\n",
"for doc in docs:\n",
" summary = summ.get_summary(doc.page_content)\n",
" list_summary.append(summary)\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of Summaries: {len(list_summary)}\")\n",
"# print(f\"Summary-0: {list_summary[0]}\") #content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Split Documents\n",
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The following sample code will show how to do that:"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of Chunks: 3\n"
]
}
],
"source": [
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_core.documents import Document\n",
"\n",
"# split by default parameters\n",
"splitter_params = {\"normalize\": \"all\"}\n",
"\n",
"\"\"\" get the splitter instance \"\"\"\n",
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
"\n",
"list_chunks = []\n",
"for doc in docs:\n",
" chunks = splitter.split_text(doc.page_content)\n",
" list_chunks.extend(chunks)\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of Chunks: {len(list_chunks)}\")\n",
"# print(f\"Chunk-0: {list_chunks[0]}\") # content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate Embeddings\n",
"Now that the documents are chunked as per requirements, the users may want to generate embeddings for these chunks. Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate embeddings:"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of embeddings: 3\n"
]
}
],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_core.documents import Document\n",
"\n",
"# using ONNX model loaded to Oracle Database\n",
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
"\n",
"# get the embedding instance\n",
"# Remove proxy if not required\n",
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
"\n",
"embeddings = []\n",
"for doc in docs:\n",
" chunks = splitter.split_text(doc.page_content)\n",
" for chunk in chunks:\n",
" embed = embedder.embed_query(chunk)\n",
" embeddings.append(embed)\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of embeddings: {len(embeddings)}\")\n",
"# print(f\"Embedding-0: {embeddings[0]}\") # content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Oracle AI Vector Store\n",
"Now that you know how to use Oracle AI Langchain library APIs individually to process the documents, let us show how to integrate with Oracle AI Vector Store to facilitate the semantic searches."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's import all the dependencies."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"from langchain_community.document_loaders.oracleai import (\n",
" OracleDocLoader,\n",
" OracleTextSplitter,\n",
")\n",
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_community.vectorstores import oraclevs\n",
"from langchain_community.vectorstores.oraclevs import OracleVS\n",
"from langchain_community.vectorstores.utils import DistanceStrategy\n",
"from langchain_core.documents import Document"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's combine all document processing stages together. Here is the sample code below:"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Connection successful!\n",
"ONNX model loaded.\n",
"Number of total chunks with metadata: 3\n"
]
}
],
"source": [
"\"\"\"\n",
"In this sample example, we will use 'database' provider for both summary and embeddings.\n",
"So, we don't need to do the followings:\n",
" - set proxy for 3rd party providers\n",
" - create credential for 3rd party providers\n",
"\n",
"If you choose to use 3rd party provider, \n",
"please follow the necessary steps for proxy and credential.\n",
"\"\"\"\n",
"\n",
"# oracle connection\n",
"# please update with your username, password, hostname, and service_name\n",
"username = \"\"\n",
"password = \"\"\n",
"dsn = \"\"\n",
"\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)\n",
"\n",
"\n",
"# load onnx model\n",
"# please update with your related information\n",
"onnx_dir = \"DEMO_PY_DIR\"\n",
"onnx_file = \"tinybert.onnx\"\n",
"model_name = \"demo_model\"\n",
"try:\n",
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
" print(\"ONNX model loaded.\")\n",
"except Exception as e:\n",
" print(\"ONNX model loading failed!\")\n",
" sys.exit(1)\n",
"\n",
"\n",
"# params\n",
"# please update necessary fields with related information\n",
"loader_params = {\n",
" \"owner\": \"testuser\",\n",
" \"tablename\": \"demo_tab\",\n",
" \"colname\": \"data\",\n",
"}\n",
"summary_params = {\n",
" \"provider\": \"database\",\n",
" \"glevel\": \"S\",\n",
" \"numParagraphs\": 1,\n",
" \"language\": \"english\",\n",
"}\n",
"splitter_params = {\"normalize\": \"all\"}\n",
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
"\n",
"# instantiate loader, summary, splitter, and embedder\n",
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
"summary = OracleSummary(conn=conn, params=summary_params)\n",
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
"embedder = OracleEmbeddings(conn=conn, params=embedder_params)\n",
"\n",
"# process the documents\n",
"chunks_with_mdata = []\n",
"for id, doc in enumerate(docs, start=1):\n",
" summ = summary.get_summary(doc.page_content)\n",
" chunks = splitter.split_text(doc.page_content)\n",
" for ic, chunk in enumerate(chunks, start=1):\n",
" chunk_metadata = doc.metadata.copy()\n",
" chunk_metadata[\"id\"] = chunk_metadata[\"_oid\"] + \"$\" + str(id) + \"$\" + str(ic)\n",
" chunk_metadata[\"document_id\"] = str(id)\n",
" chunk_metadata[\"document_summary\"] = str(summ[0])\n",
" chunks_with_mdata.append(\n",
" Document(page_content=str(chunk), metadata=chunk_metadata)\n",
" )\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of total chunks with metadata: {len(chunks_with_mdata)}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At this point, we have processed the documents and generated chunks with metadata. Next, we will create Oracle AI Vector Store with those chunks.\n",
"\n",
"Here is the sample code how to do that:"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Vector Store Table: oravs\n"
]
}
],
"source": [
"# create Oracle AI Vector Store\n",
"vectorstore = OracleVS.from_documents(\n",
" chunks_with_mdata,\n",
" embedder,\n",
" client=conn,\n",
" table_name=\"oravs\",\n",
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
")\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Vector Store Table: {vectorstore.table_name}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above example creates a vector store with DOT_PRODUCT distance strategy. \n",
"\n",
"However, the users can create Oracle AI Vector Store provides different distance strategies. Please see the [comprehensive guide](/docs/integrations/vectorstores/oracle) for more information."
]
},
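{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, here is a minimal sketch (not part of the original flow) that builds a second store over the same chunks using the COSINE distance strategy; the table name 'oravs_cosine' is just illustrative:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# minimal sketch: same chunks and embedder, but with cosine distance\n",
"# (the table name 'oravs_cosine' is illustrative)\n",
"vectorstore_cosine = OracleVS.from_documents(\n",
" chunks_with_mdata,\n",
" embedder,\n",
" client=conn,\n",
" table_name=\"oravs_cosine\",\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
")"
]
},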
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have embeddings stored in vector stores, let's create an index on them to get better semantic search performance during query time.\n",
"\n",
"***Note*** If you are getting some insufficient memory error, please increase ***vector_memory_size*** in your database.\n",
"\n",
"Here is the sample code to create an index:"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [],
"source": [
"oraclevs.create_index(\n",
" conn, vectorstore, params={\"idx_name\": \"hnsw_oravs\", \"idx_type\": \"HNSW\"}\n",
")\n",
"\n",
"print(\"Index created.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above example creates a default HNSW index on the embeddings stored in 'oravs' table. The users can set different parameters as per their requirements. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"Also, there are different types of vector indices that the users can create. Please see the [comprehensive guide](/docs/integrations/vectorstores/oracle) for more information.\n"
]
},
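{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch, an HNSW index with additional tuning parameters might be created as shown in the next cell. The 'accuracy' and 'parallel' parameter names are assumptions based on the [comprehensive guide](/docs/integrations/vectorstores/oracle), so please verify them there; also note that a store would normally keep a single vector index, so treat this as an alternative to the cell above rather than something to run in addition:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch only: an HNSW index with extra tuning parameters; 'accuracy' and\n",
"# 'parallel' are assumed parameter names -- check the comprehensive guide\n",
"oraclevs.create_index(\n",
" conn,\n",
" vectorstore,\n",
" params={\n",
" \"idx_name\": \"hnsw_oravs2\",\n",
" \"idx_type\": \"HNSW\",\n",
" \"accuracy\": 97,\n",
" \"parallel\": 16,\n",
" },\n",
")\n",
"\n",
"print(\"Index created.\")"
]
},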
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform Semantic Search\n",
"All set!\n",
"\n",
"We have processed the documents, stored them to vector store, and then created index to get better query performance. Now let's do some semantic searches.\n",
"\n",
"Here is the sample code for this:"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'})]\n",
"[]\n",
"[(Document(page_content='The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table. Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.', metadata={'_oid': '662f2f257677f3c2311a8ff999fd34e5', '_rowid': 'AAAR/xAAEAAAAAnAAC', 'id': '662f2f257677f3c2311a8ff999fd34e5$3$1', 'document_id': '3', 'document_summary': 'Sometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\\n\\n'}), 0.055675752460956573)]\n",
"[]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n",
"[Document(page_content='If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.', metadata={'_oid': '662f2f253acf96b33b430b88699490a2', '_rowid': 'AAAR/xAAEAAAAAnAAA', 'id': '662f2f253acf96b33b430b88699490a2$1$1', 'document_id': '1', 'document_summary': 'If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\\n\\n'})]\n"
]
}
],
"source": [
"query = \"What is Oracle AI Vector Store?\"\n",
"filter = {\"document_id\": [\"1\"]}\n",
"\n",
"# Similarity search without a filter\n",
"print(vectorstore.similarity_search(query, 1))\n",
"\n",
"# Similarity search with a filter\n",
"print(vectorstore.similarity_search(query, 1, filter=filter))\n",
"\n",
"# Similarity search with relevance score\n",
"print(vectorstore.similarity_search_with_score(query, 1))\n",
"\n",
"# Similarity search with relevance score with filter\n",
"print(vectorstore.similarity_search_with_score(query, 1, filter=filter))\n",
"\n",
"# Max marginal relevance search\n",
"print(vectorstore.max_marginal_relevance_search(query, 1, fetch_k=20, lambda_mult=0.5))\n",
"\n",
"# Max marginal relevance search with filter\n",
"print(\n",
" vectorstore.max_marginal_relevance_search(\n",
" query, 1, fetch_k=20, lambda_mult=0.5, filter=filter\n",
" )\n",
")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Oracle AI Vector Search: Document Processing\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"The guide demonstrates how to use Document Processing Capabilities within Oracle AI Vector Search to load and chunk documents using OracleDocLoader and OracleTextSplitter respectively."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to Oracle Database\n",
"The following sample code will show how to connect to Oracle Database. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"username = \"<username>\"\n",
"password = \"<password>\"\n",
"dsn = \"<hostname>/<service_name>\"\n",
"\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's create a table and insert some sample docs to test."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" cursor = conn.cursor()\n",
"\n",
" drop_table_sql = \"\"\"drop table if exists demo_tab\"\"\"\n",
" cursor.execute(drop_table_sql)\n",
"\n",
" create_table_sql = \"\"\"create table demo_tab (id number, data clob)\"\"\"\n",
" cursor.execute(create_table_sql)\n",
"\n",
" insert_row_sql = \"\"\"insert into demo_tab values (:1, :2)\"\"\"\n",
" rows_to_insert = [\n",
" (\n",
" 1,\n",
" \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
" ),\n",
" (\n",
" 2,\n",
" \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
" ),\n",
" (\n",
" 3,\n",
" \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
" ),\n",
" ]\n",
" cursor.executemany(insert_row_sql, rows_to_insert)\n",
"\n",
" conn.commit()\n",
"\n",
" print(\"Table created and populated.\")\n",
" cursor.close()\n",
"except Exception as e:\n",
" print(\"Table creation failed.\")\n",
" cursor.close()\n",
" conn.close()\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Documents\n",
"The users can load the documents from Oracle Database or a file system or both. They just need to set the loader parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The main benefit of using OracleDocLoader is that it can handle 150+ different file formats. You don't need to use different types of loader for different file formats. Here is the list of the formats that we support: [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html)\n",
"\n",
"The following sample code will show how to do that:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.oracleai import OracleDocLoader\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",
"# loading a local file\n",
"loader_params = {}\n",
"loader_params[\"file\"] = \"<file>\"\n",
"\n",
"# loading from a local directory\n",
"loader_params = {}\n",
"loader_params[\"dir\"] = \"<directory>\"\n",
"\"\"\"\n",
"\n",
"# loading from Oracle Database table\n",
"loader_params = {\n",
" \"owner\": \"<owner>\",\n",
" \"tablename\": \"demo_tab\",\n",
" \"colname\": \"data\",\n",
"}\n",
"\n",
"\"\"\" load the docs \"\"\"\n",
"loader = OracleDocLoader(conn=conn, params=loader_params)\n",
"docs = loader.load()\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of docs loaded: {len(docs)}\")\n",
"# print(f\"Document-0: {docs[0].page_content}\") # content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Split Documents\n",
"The documents can be in different sizes: small, medium, large, or very large. The users like to split/chunk their documents into smaller pieces to generate embeddings. There are lots of different splitting customizations the users can do. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters.\n",
"\n",
"The following sample code will show how to do that:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_loaders.oracleai import OracleTextSplitter\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",
"# Some examples\n",
"# split by chars, max 500 chars\n",
"splitter_params = {\"split\": \"chars\", \"max\": 500, \"normalize\": \"all\"}\n",
"\n",
"# split by words, max 100 words\n",
"splitter_params = {\"split\": \"words\", \"max\": 100, \"normalize\": \"all\"}\n",
"\n",
"# split by sentence, max 20 sentences\n",
"splitter_params = {\"split\": \"sentence\", \"max\": 20, \"normalize\": \"all\"}\n",
"\"\"\"\n",
"\n",
"# split by default parameters\n",
"splitter_params = {\"normalize\": \"all\"}\n",
"\n",
"# get the splitter instance\n",
"splitter = OracleTextSplitter(conn=conn, params=splitter_params)\n",
"\n",
"list_chunks = []\n",
"for doc in docs:\n",
" chunks = splitter.split_text(doc.page_content)\n",
" list_chunks.extend(chunks)\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Number of Chunks: {len(list_chunks)}\")\n",
"# print(f\"Chunk-0: {list_chunks[0]}\") # content"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### End to End Demo\n",
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,65 @@
# OracleAI Vector Search
Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics, rather than keywords. One of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.
This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.
In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:
* Partitioning Support
* Real Application Clusters scalability
* Exadata smart scans
* Shard processing across geographically distributed databases
* Transactions
* Parallel SQL
* Disaster recovery
* Security
* Oracle Machine Learning
* Oracle Graph Database
* Oracle Spatial and Graph
* Oracle Blockchain
* JSON
## Document Loaders
Please check the [usage example](/docs/integrations/document_loaders/oracleai).
```python
from langchain_community.document_loaders.oracleai import OracleDocLoader
```
## Text Splitter
Please check the [usage example](/docs/integrations/document_loaders/oracleai).
```python
from langchain_community.document_loaders.oracleai import OracleTextSplitter
```
## Embeddings
Please check the [usage example](/docs/integrations/text_embedding/oracleai).
```python
from langchain_community.embeddings.oracleai import OracleEmbeddings
```
## Summary
Please check the [usage example](/docs/integrations/tools/oracleai).
```python
from langchain_community.utilities.oracleai import OracleSummary
```
## Vector Store
Please check the [usage example](/docs/integrations/vectorstores/oracle).
```python
from langchain_community.vectorstores.oraclevs import OracleVS
```
## End to End Demo
Please check the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/blob/master/cookbook/oracleai_demo.ipynb).

@ -0,0 +1,262 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Oracle AI Vector Search: Generate Embeddings\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"The guide demonstrates how to use Embedding Capabilities within Oracle AI Vector Search to generate embeddings for your documents using OracleEmbeddings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to Oracle Database\n",
"The following sample code will show how to connect to Oracle Database. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"username = \"<username>\"\n",
"password = \"<password>\"\n",
"dsn = \"<hostname>/<service_name>\"\n",
"\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For embedding, we have a few provider options that the users can choose from such as database, 3rd party providers like ocigenai, huggingface, openai, etc. If the users choose to use 3rd party provider, they need to create a credential with corresponding authentication information. On the other hand, if the users choose to use 'database' as provider, they need to load an onnx model to Oracle Database for embeddings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load ONNX Model\n",
"\n",
"To generate embeddings, Oracle provides a few provider options for users to choose from. The users can choose 'database' provider or some 3rd party providers like OCIGENAI, HuggingFace, etc.\n",
"\n",
"***Note*** If the users choose database option, they need to load an ONNX model to Oracle Database. The users do not need to load an ONNX model to Oracle Database if they choose to use 3rd party provider to generate embeddings.\n",
"\n",
"One of the core benefits of using an ONNX model is that the users do not need to transfer their data to 3rd party to generate embeddings. And also, since it does not involve any network or REST API calls, it may provide better performance.\n",
"\n",
"Here is the sample code to load an ONNX model to Oracle Database:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"\n",
"# please update with your related information\n",
"# make sure that you have onnx file in the system\n",
"onnx_dir = \"DEMO_DIR\"\n",
"onnx_file = \"tinybert.onnx\"\n",
"model_name = \"demo_model\"\n",
"\n",
"try:\n",
" OracleEmbeddings.load_onnx_model(conn, onnx_dir, onnx_file, model_name)\n",
" print(\"ONNX model loaded.\")\n",
"except Exception as e:\n",
" print(\"ONNX model loading failed!\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Credential\n",
"\n",
"On the other hand, if the users choose to use 3rd party provider to generate embeddings, they need to create credential to access 3rd party provider's end points.\n",
"\n",
"***Note:*** The users do not need to create any credential if they choose to use 'database' provider to generate embeddings. Should the users choose to 3rd party provider, they need to create credential for the 3rd party provider they want to use. \n",
"\n",
"Here is a sample example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" cursor = conn.cursor()\n",
" cursor.execute(\n",
" \"\"\"\n",
" declare\n",
" jo json_object_t;\n",
" begin\n",
" -- HuggingFace\n",
" dbms_vector_chain.drop_credential(credential_name => 'HF_CRED');\n",
" jo := json_object_t();\n",
" jo.put('access_token', '<access_token>');\n",
" dbms_vector_chain.create_credential(\n",
" credential_name => 'HF_CRED',\n",
" params => json(jo.to_string));\n",
"\n",
" -- OCIGENAI\n",
" dbms_vector_chain.drop_credential(credential_name => 'OCI_CRED');\n",
" jo := json_object_t();\n",
" jo.put('user_ocid','<user_ocid>');\n",
" jo.put('tenancy_ocid','<tenancy_ocid>');\n",
" jo.put('compartment_ocid','<compartment_ocid>');\n",
" jo.put('private_key','<private_key>');\n",
" jo.put('fingerprint','<fingerprint>');\n",
" dbms_vector_chain.create_credential(\n",
" credential_name => 'OCI_CRED',\n",
" params => json(jo.to_string));\n",
" end;\n",
" \"\"\"\n",
" )\n",
" cursor.close()\n",
" print(\"Credentials created.\")\n",
"except Exception as ex:\n",
" cursor.close()\n",
" raise"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate Embeddings\n",
"Oracle AI Vector Search provides a number of ways to generate embeddings. The users can load an ONNX embedding model to Oracle Database and use it to generate embeddings or use some 3rd party API's end points to generate embeddings. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party embedding generation providers other than 'database' provider (aka using ONNX model)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"<proxy>\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate embeddings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings.oracleai import OracleEmbeddings\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",
"# using ocigenai\n",
"embedder_params = {\n",
" \"provider\": \"ocigenai\",\n",
" \"credential_name\": \"OCI_CRED\",\n",
" \"url\": \"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/embedText\",\n",
" \"model\": \"cohere.embed-english-light-v3.0\",\n",
"}\n",
"\n",
"# using huggingface\n",
"embedder_params = {\n",
" \"provider\": \"huggingface\", \n",
" \"credential_name\": \"HF_CRED\", \n",
" \"url\": \"https://api-inference.huggingface.co/pipeline/feature-extraction/\", \n",
" \"model\": \"sentence-transformers/all-MiniLM-L6-v2\", \n",
" \"wait_for_model\": \"true\"\n",
"}\n",
"\"\"\"\n",
"\n",
"# using ONNX model loaded to Oracle Database\n",
"embedder_params = {\"provider\": \"database\", \"model\": \"demo_model\"}\n",
"\n",
"# Remove proxy if not required\n",
"embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)\n",
"embed = embedder.embed_query(\"Hello World!\")\n",
"\n",
"\"\"\" verify \"\"\"\n",
"print(f\"Embedding generated by OracleEmbeddings: {embed}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### End to End Demo\n",
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,174 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Oracle AI Vector Search: Generate Summary\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords. One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system. This is not only powerful but also significantly more effective because you don't need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"The guide demonstrates how to use Summary Capabilities within Oracle AI Vector Search to generate summary for your documents using OracleSummary."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to Oracle Database\n",
"The following sample code will show how to connect to Oracle Database. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"import oracledb\n",
"\n",
"# please update with your username, password, hostname and service_name\n",
"username = \"<username>\"\n",
"password = \"<password>\"\n",
"dsn = \"<hostname>/<service_name>\"\n",
"\n",
"try:\n",
" conn = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")\n",
" sys.exit(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Generate Summary\n",
"The Oracle AI Vector Search Langchain library provides APIs to generate summaries of documents. There are a few summary generation provider options including Database, OCIGENAI, HuggingFace and so on. The users can choose their preferred provider to generate a summary. They just need to set the summary parameters accordingly. Please refer to the Oracle AI Vector Search Guide book for complete information about these parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"***Note:*** The users may need to set proxy if they want to use some 3rd party summary generation providers other than Oracle's in-house and default provider: 'database'. If you don't have proxy, please remove the proxy parameter when you instantiate the OracleSummary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# proxy to be used when we instantiate summary and embedder object\n",
"proxy = \"<proxy>\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sample code will show how to generate summary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.utilities.oracleai import OracleSummary\n",
"from langchain_core.documents import Document\n",
"\n",
"\"\"\"\n",
"# using 'ocigenai' provider\n",
"summary_params = {\n",
" \"provider\": \"ocigenai\",\n",
" \"credential_name\": \"OCI_CRED\",\n",
" \"url\": \"https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/summarizeText\",\n",
" \"model\": \"cohere.command\",\n",
"}\n",
"\n",
"# using 'huggingface' provider\n",
"summary_params = {\n",
" \"provider\": \"huggingface\",\n",
" \"credential_name\": \"HF_CRED\",\n",
" \"url\": \"https://api-inference.huggingface.co/models/\",\n",
" \"model\": \"facebook/bart-large-cnn\",\n",
" \"wait_for_model\": \"true\"\n",
"}\n",
"\"\"\"\n",
"\n",
"# using 'database' provider\n",
"summary_params = {\n",
" \"provider\": \"database\",\n",
" \"glevel\": \"S\",\n",
" \"numParagraphs\": 1,\n",
" \"language\": \"english\",\n",
"}\n",
"\n",
"# get the summary instance\n",
"# Remove proxy if not required\n",
"summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)\n",
"summary = summ.get_summary(\n",
" \"In the heart of the forest, \"\n",
" + \"a lone fox ventured out at dusk, seeking a lost treasure. \"\n",
" + \"With each step, memories flooded back, guiding its path. \"\n",
" + \"As the moon rose high, illuminating the night, the fox unearthed \"\n",
" + \"not gold, but a forgotten friendship, worth more than any riches.\"\n",
")\n",
"\n",
"print(f\"Summary generated by OracleSummary: {summary}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### End to End Demo\n",
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

@ -0,0 +1,469 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "dd33e9d5-9dba-4aac-9f7f-4cf9e6686593",
"metadata": {},
"source": [
"# Oracle AI Vector Search: Vector Store\n",
"\n",
"Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads that allows you to query data based on semantics, rather than keywords.\n",
"One of the biggest benefit of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data in one single system.\n",
"This is not only powerful but also significantly more effective because you dont need to add a specialized vector database, eliminating the pain of data fragmentation between multiple systems.\n",
"\n",
"In addition, because Oracle has been building database technologies for so long, your vectors can benefit from all of Oracle Database's most powerful features, like the following:\n",
"\n",
" * Partitioning Support\n",
" * Real Application Clusters scalability\n",
" * Exadata smart scans\n",
" * Shard processing across geographically distributed databases\n",
" * Transactions\n",
" * Parallel SQL\n",
" * Disaster recovery\n",
" * Security\n",
" * Oracle Machine Learning\n",
" * Oracle Graph Database\n",
" * Oracle Spatial and Graph\n",
" * Oracle Blockchain\n",
" * JSON"
]
},
{
"cell_type": "markdown",
"id": "7bd80054-c803-47e1-a259-c40ed073c37d",
"metadata": {},
"source": [
"### Prerequisites for using Langchain with Oracle AI Vector Search\n",
"\n",
"Please install Oracle Python Client driver to use Langchain with Oracle AI Vector Search. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2bbb989d-c6fb-4ab9-bafd-a95fd48538d0",
"metadata": {},
"outputs": [],
"source": [
"# pip install oracledb"
]
},
{
"cell_type": "markdown",
"id": "0fceaa5a-95da-4ebd-8b8d-5e73bb653172",
"metadata": {},
"source": [
"### Connect to Oracle AI Vector Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4421e4b7-2c7e-4bcd-82b3-9576595edd0f",
"metadata": {},
"outputs": [],
"source": [
"import oracledb\n",
"\n",
"username = \"username\"\n",
"password = \"password\"\n",
"dsn = \"ipaddress:port/orclpdb1\"\n",
"\n",
"try:\n",
" connection = oracledb.connect(user=username, password=password, dsn=dsn)\n",
" print(\"Connection successful!\")\n",
"except Exception as e:\n",
" print(\"Connection failed!\")"
]
},
{
"cell_type": "markdown",
"id": "b11cf362-01b0-485d-8527-31b0fbb5028e",
"metadata": {},
"source": [
"### Import the required dependencies to play with Oracle AI Vector Search"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "43ea59e3-2910-45a6-b195-5f06094bb7c9",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.embeddings import HuggingFaceEmbeddings\n",
"from langchain_community.vectorstores import oraclevs\n",
"from langchain_community.vectorstores.oraclevs import OracleVS\n",
"from langchain_community.vectorstores.utils import DistanceStrategy\n",
"from langchain_core.documents import Document"
]
},
{
"cell_type": "markdown",
"id": "0aac10dc-a9cc-4fdb-901c-1b7a4bbbe5a7",
"metadata": {},
"source": [
"### Load Documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70ac6982-b13a-4e8c-9c47-57c6d136ac60",
"metadata": {},
"outputs": [],
"source": [
"# Define a list of documents (These dummy examples are 5 random documents from Oracle Concepts Manual )\n",
"\n",
"documents_json_list = [\n",
" {\n",
" \"id\": \"cncpt_15.5.3.2.2_P4\",\n",
" \"text\": \"If the answer to any preceding questions is yes, then the database stops the search and allocates space from the specified tablespace; otherwise, space is allocated from the database default shared temporary tablespace.\",\n",
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-5387D7B2-C0CA-4C1E-811B-C7EB9B636442\",\n",
" },\n",
" {\n",
" \"id\": \"cncpt_15.5.5_P1\",\n",
" \"text\": \"A tablespace can be online (accessible) or offline (not accessible) whenever the database is open.\\nA tablespace is usually online so that its data is available to users. The SYSTEM tablespace and temporary tablespaces cannot be taken offline.\",\n",
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/logical-storage-structures.html#GUID-D02B2220-E6F5-40D9-AFB5-BC69BCEF6CD4\",\n",
" },\n",
" {\n",
" \"id\": \"cncpt_22.3.4.3.1_P2\",\n",
" \"text\": \"The database stores LOBs differently from other data types. Creating a LOB column implicitly creates a LOB segment and a LOB index. The tablespace containing the LOB segment and LOB index, which are always stored together, may be different from the tablespace containing the table.\\nSometimes the database can store small amounts of LOB data in the table itself rather than in a separate LOB segment.\",\n",
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866\",\n",
" },\n",
" {\n",
" \"id\": \"cncpt_22.3.4.3.1_P3\",\n",
" \"text\": \"The LOB segment stores data in pieces called chunks. A chunk is a logically contiguous set of data blocks and is the smallest unit of allocation for a LOB. A row in the table stores a pointer called a LOB locator, which points to the LOB index. When the table is queried, the database uses the LOB index to quickly locate the LOB chunks.\",\n",
" \"link\": \"https://docs.oracle.com/en/database/oracle/oracle-database/23/cncpt/concepts-for-database-developers.html#GUID-3C50EAB8-FC39-4BB3-B680-4EACCE49E866\",\n",
" },\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eaa942d6-5954-4898-8c32-3627b923a3a5",
"metadata": {},
"outputs": [],
"source": [
"# Create Langchain Documents\n",
"\n",
"documents_langchain = []\n",
"\n",
"for doc in documents_json_list:\n",
" metadata = {\"id\": doc[\"id\"], \"link\": doc[\"link\"]}\n",
" doc_langchain = Document(page_content=doc[\"text\"], metadata=metadata)\n",
" documents_langchain.append(doc_langchain)"
]
},
{
"cell_type": "markdown",
"id": "6823f5e6-997c-4f15-927b-bd44c61f105f",
"metadata": {},
"source": [
"### Using AI Vector Search Create a bunch of Vector Stores with different distance strategies\n",
"\n",
"First we will create three vector stores each with different distance functions. Since we have not created indices in them yet, they will just create tables for now. Later we will use these vector stores to create HNSW indicies.\n",
"\n",
"You can manually connect to the Oracle Database and will see three tables \n",
"Documents_DOT, Documents_COSINE and Documents_EUCLIDEAN. \n",
"\n",
"We will then create three additional tables Documents_DOT_IVF, Documents_COSINE_IVF and Documents_EUCLIDEAN_IVF which will be used\n",
"to create IVF indicies on the tables instead of HNSW indices. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed1b253e-5f5c-4a81-983c-74645213a170",
"metadata": {},
"outputs": [],
"source": [
"# Ingest documents into Oracle Vector Store using different distance strategies\n",
"\n",
"model = HuggingFaceEmbeddings(model_name=\"sentence-transformers/all-mpnet-base-v2\")\n",
"\n",
"vector_store_dot = OracleVS.from_documents(\n",
" documents_langchain,\n",
" model,\n",
" client=connection,\n",
" table_name=\"Documents_DOT\",\n",
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
")\n",
"vector_store_max = OracleVS.from_documents(\n",
" documents_langchain,\n",
" model,\n",
" client=connection,\n",
" table_name=\"Documents_COSINE\",\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
")\n",
"vector_store_euclidean = OracleVS.from_documents(\n",
" documents_langchain,\n",
" model,\n",
" client=connection,\n",
" table_name=\"Documents_EUCLIDEAN\",\n",
" distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,\n",
")\n",
"\n",
"# Ingest documents into Oracle Vector Store using different distance strategies\n",
"vector_store_dot_ivf = OracleVS.from_documents(\n",
" documents_langchain,\n",
" model,\n",
" client=connection,\n",
" table_name=\"Documents_DOT_IVF\",\n",
" distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
")\n",
"vector_store_max_ivf = OracleVS.from_documents(\n",
" documents_langchain,\n",
" model,\n",
" client=connection,\n",
" table_name=\"Documents_COSINE_IVF\",\n",
" distance_strategy=DistanceStrategy.COSINE,\n",
")\n",
"vector_store_euclidean_ivf = OracleVS.from_documents(\n",
" documents_langchain,\n",
" model,\n",
" client=connection,\n",
" table_name=\"Documents_EUCLIDEAN_IVF\",\n",
" distance_strategy=DistanceStrategy.EUCLIDEAN_DISTANCE,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "77c29505-8688-4b87-9a99-e648fbb2d425",
"metadata": {},
"source": [
"### Demonstrating add, delete operations for texts, and basic similarity search\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "306563ae-577b-4bc7-8a92-3dd6a59310f5",
"metadata": {},
"outputs": [],
"source": [
"def manage_texts(vector_stores):\n",
" \"\"\"\n",
" Adds texts to each vector store, demonstrates error handling for duplicate additions,\n",
" and performs deletion of texts. Showcases similarity searches and index creation for each vector store.\n",
"\n",
" Args:\n",
" - vector_stores (list): A list of OracleVS instances.\n",
" \"\"\"\n",
" texts = [\"Rohan\", \"Shailendra\"]\n",
" metadata = [\n",
" {\"id\": \"100\", \"link\": \"Document Example Test 1\"},\n",
" {\"id\": \"101\", \"link\": \"Document Example Test 2\"},\n",
" ]\n",
"\n",
" for i, vs in enumerate(vector_stores, start=1):\n",
" # Adding texts\n",
" try:\n",
" vs.add_texts(texts, metadata)\n",
" print(f\"\\n\\n\\nAdd texts complete for vector store {i}\\n\\n\\n\")\n",
" except Exception as ex:\n",
" print(f\"\\n\\n\\nExpected error on duplicate add for vector store {i}\\n\\n\\n\")\n",
"\n",
" # Deleting texts using the value of 'id'\n",
" vs.delete([metadata[0][\"id\"]])\n",
" print(f\"\\n\\n\\nDelete texts complete for vector store {i}\\n\\n\\n\")\n",
"\n",
" # Similarity search\n",
" results = vs.similarity_search(\"How are LOBS stored in Oracle Database\", 2)\n",
" print(f\"\\n\\n\\nSimilarity search results for vector store {i}: {results}\\n\\n\\n\")\n",
"\n",
"\n",
"vector_store_list = [\n",
" vector_store_dot,\n",
" vector_store_max,\n",
" vector_store_euclidean,\n",
" vector_store_dot_ivf,\n",
" vector_store_max_ivf,\n",
" vector_store_euclidean_ivf,\n",
"]\n",
"manage_texts(vector_store_list)"
]
},
{
"cell_type": "markdown",
"id": "0980cb33-69cf-4547-842a-afdc4d6fa7d3",
"metadata": {},
"source": [
"### Demonstrating index creation with specific parameters for each distance strategy\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46298a27-e309-456e-b2b8-771d9cb3be29",
"metadata": {},
"outputs": [],
"source": [
"def create_search_indices(connection):\n",
" \"\"\"\n",
" Creates search indices for the vector stores, each with specific parameters tailored to their distance strategy.\n",
" \"\"\"\n",
" # Index for DOT_PRODUCT strategy\n",
" # Notice we are creating a HNSW index with default parameters\n",
" # This will default to creating a HNSW index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search\n",
" oraclevs.create_index(\n",
" connection,\n",
" vector_store_dot,\n",
" params={\"idx_name\": \"hnsw_idx1\", \"idx_type\": \"HNSW\"},\n",
" )\n",
"\n",
" # Index for COSINE strategy with specific parameters\n",
" # Notice we are creating a HNSW index with parallel 16 and Target Accuracy Specification as 97 percent\n",
" oraclevs.create_index(\n",
" connection,\n",
" vector_store_max,\n",
" params={\n",
" \"idx_name\": \"hnsw_idx2\",\n",
" \"idx_type\": \"HNSW\",\n",
" \"accuracy\": 97,\n",
" \"parallel\": 16,\n",
" },\n",
" )\n",
"\n",
" # Index for EUCLIDEAN_DISTANCE strategy with specific parameters\n",
" # Notice we are creating a HNSW index by specifying Power User Parameters which are neighbors = 64 and efConstruction = 100\n",
" oraclevs.create_index(\n",
" connection,\n",
" vector_store_euclidean,\n",
" params={\n",
" \"idx_name\": \"hnsw_idx3\",\n",
" \"idx_type\": \"HNSW\",\n",
" \"neighbors\": 64,\n",
" \"efConstruction\": 100,\n",
" },\n",
" )\n",
"\n",
" # Index for DOT_PRODUCT strategy with specific parameters\n",
" # Notice we are creating an IVF index with default parameters\n",
" # This will default to creating an IVF index with 8 Parallel Workers and use the Default Accuracy used by Oracle AI Vector Search\n",
" oraclevs.create_index(\n",
" connection,\n",
" vector_store_dot_ivf,\n",
" params={\n",
" \"idx_name\": \"ivf_idx1\",\n",
" \"idx_type\": \"IVF\",\n",
" },\n",
" )\n",
"\n",
" # Index for COSINE strategy with specific parameters\n",
" # Notice we are creating an IVF index with parallel 32 and Target Accuracy Specification as 90 percent\n",
" oraclevs.create_index(\n",
" connection,\n",
" vector_store_max_ivf,\n",
" params={\n",
" \"idx_name\": \"ivf_idx2\",\n",
" \"idx_type\": \"IVF\",\n",
" \"accuracy\": 90,\n",
" \"parallel\": 32,\n",
" },\n",
" )\n",
"\n",
" # Index for EUCLIDEAN_DISTANCE strategy with specific parameters\n",
" # Notice we are creating an IVF index by specifying Power User Parameters which is neighbor_part = 64\n",
" oraclevs.create_index(\n",
" connection,\n",
" vector_store_euclidean_ivf,\n",
" params={\"idx_name\": \"ivf_idx3\", \"idx_type\": \"IVF\", \"neighbor_part\": 64},\n",
" )\n",
"\n",
" print(\"Index creation complete.\")\n",
"\n",
"\n",
"create_search_indices(connection)"
]
},
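{
"cell_type": "markdown",
"id": "b7e4d2a1-6c3f-4a9e-8d15-2f0c9e7a1b34",
"metadata": {},
"source": [
"### Optional: drop and re-create an index\n",
"\n",
"If you want to experiment with different index parameters, an existing index can be dropped with the `drop_index_if_exists` helper from `langchain_community.vectorstores.oraclevs` and then re-created with `create_index`. The sketch below targets `hnsw_idx2`, one of the indices created above, and is left commented out so the rest of this notebook keeps using the indices exactly as created."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8f5e3b2-7d4a-4b1f-9e26-3a1d0f8b2c45",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch: drop an index so it can be re-created with new parameters.\n",
"# Uncomment to run; \"hnsw_idx2\" was created above on the Documents_COSINE table.\n",
"\n",
"# oraclevs.drop_index_if_exists(connection, \"hnsw_idx2\")\n",
"# oraclevs.create_index(\n",
"#     connection,\n",
"#     vector_store_max,\n",
"#     params={\"idx_name\": \"hnsw_idx2\", \"idx_type\": \"HNSW\", \"accuracy\": 95, \"parallel\": 8},\n",
"# )"
]
},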
{
"cell_type": "markdown",
"id": "7223d048-5c0b-4e91-a91b-a7daa9f86758",
"metadata": {},
"source": [
"### Now we will conduct a bunch of advanced searches on all six vector stores. Each of these three searches have a with and without filter version. The filter only selects the document with id 101 out and filters out everything else"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37ca2e7d-9803-4260-95e7-62776d4fb820",
"metadata": {},
"outputs": [],
"source": [
"# Conduct advanced searches after creating the indices\n",
"def conduct_advanced_searches(vector_stores):\n",
" query = \"How are LOBS stored in Oracle Database\"\n",
" # Constructing a filter for direct comparison against document metadata\n",
" # This filter aims to include documents whose metadata 'id' is exactly '2'\n",
" filter_criteria = {\"id\": [\"101\"]} # Direct comparison filter\n",
"\n",
" for i, vs in enumerate(vector_stores, start=1):\n",
" print(f\"\\n--- Vector Store {i} Advanced Searches ---\")\n",
" # Similarity search without a filter\n",
" print(\"\\nSimilarity search results without filter:\")\n",
" print(vs.similarity_search(query, 2))\n",
"\n",
" # Similarity search with a filter\n",
" print(\"\\nSimilarity search results with filter:\")\n",
" print(vs.similarity_search(query, 2, filter=filter_criteria))\n",
"\n",
" # Similarity search with relevance score\n",
" print(\"\\nSimilarity search with relevance score:\")\n",
" print(vs.similarity_search_with_score(query, 2))\n",
"\n",
" # Similarity search with relevance score with filter\n",
" print(\"\\nSimilarity search with relevance score with filter:\")\n",
" print(vs.similarity_search_with_score(query, 2, filter=filter_criteria))\n",
"\n",
" # Max marginal relevance search\n",
" print(\"\\nMax marginal relevance search results:\")\n",
" print(vs.max_marginal_relevance_search(query, 2, fetch_k=20, lambda_mult=0.5))\n",
"\n",
" # Max marginal relevance search with filter\n",
" print(\"\\nMax marginal relevance search results with filter:\")\n",
" print(\n",
" vs.max_marginal_relevance_search(\n",
" query, 2, fetch_k=20, lambda_mult=0.5, filter=filter_criteria\n",
" )\n",
" )\n",
"\n",
"\n",
"conduct_advanced_searches(vector_store_list)"
]
},
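{
"cell_type": "markdown",
"id": "d9a6f4c3-8e5b-4c2a-af37-4b2e1a9c3d56",
"metadata": {},
"source": [
"### Using a vector store as a retriever\n",
"\n",
"Because `OracleVS` implements the standard LangChain `VectorStore` interface, any of the stores above can be wrapped as a retriever and plugged into a RAG chain. The sketch below assumes the `vector_store_dot` store created earlier; `search_kwargs` only controls how many documents are returned."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0b7a5d4-9f6c-4d3b-b048-5c3f2b0d4e67",
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch: wrap one of the vector stores as a LangChain retriever\n",
"retriever = vector_store_dot.as_retriever(search_kwargs={\"k\": 2})\n",
"\n",
"docs = retriever.invoke(\"How are LOBS stored in Oracle Database\")\n",
"print(docs)"
]
},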
{
"cell_type": "markdown",
"id": "0da8c7e2-0db0-4363-b31b-a7a5e3f83717",
"metadata": {},
"source": [
"### End to End Demo\n",
"Please refer to our complete demo guide [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchain/tree/master/cookbook/oracleai_demo.ipynb) to build an end to end RAG pipeline with the help of Oracle AI Vector Search.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -331,6 +331,10 @@ if TYPE_CHECKING:
from langchain_community.document_loaders.oracleadb_loader import (
OracleAutonomousDatabaseLoader,
)
from langchain_community.document_loaders.oracleai import (
OracleDocLoader, # noqa: F401
OracleTextSplitter, # noqa: F401
)
from langchain_community.document_loaders.org_mode import (
UnstructuredOrgModeLoader,
)
@ -624,6 +628,8 @@ _module_lookup = {
"OnlinePDFLoader": "langchain_community.document_loaders.pdf",
"OpenCityDataLoader": "langchain_community.document_loaders.open_city_data",
"OracleAutonomousDatabaseLoader": "langchain_community.document_loaders.oracleadb_loader", # noqa: E501
"OracleDocLoader": "langchain_community.document_loaders.oracleai",
"OracleTextSplitter": "langchain_community.document_loaders.oracleai",
"OutlookMessageLoader": "langchain_community.document_loaders.email",
"PDFMinerLoader": "langchain_community.document_loaders.pdf",
"PDFMinerPDFasHTMLLoader": "langchain_community.document_loaders.pdf",
@ -822,6 +828,8 @@ __all__ = [
"OnlinePDFLoader",
"OpenCityDataLoader",
"OracleAutonomousDatabaseLoader",
"OracleDocLoader",
"OracleTextSplitter",
"OutlookMessageLoader",
"PDFMinerLoader",
"PDFMinerPDFasHTMLLoader",

@ -0,0 +1,447 @@
# Authors:
# Harichandan Roy (hroy)
# David Jiang (ddjiang)
#
# -----------------------------------------------------------------------------
# oracleai.py
# -----------------------------------------------------------------------------
from __future__ import annotations
import hashlib
import json
import logging
import os
import random
import struct
import time
import traceback
from html.parser import HTMLParser
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import TextSplitter
if TYPE_CHECKING:
from oracledb import Connection
logger = logging.getLogger(__name__)
"""ParseOracleDocMetadata class"""
class ParseOracleDocMetadata(HTMLParser):
"""Parse Oracle doc metadata..."""
def __init__(self) -> None:
super().__init__()
self.reset()
self.match = False
self.metadata: Dict[str, Any] = {}
def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
if tag == "meta":
entry: Optional[str] = ""
for name, value in attrs:
if name == "name":
entry = value
if name == "content":
if entry:
self.metadata[entry] = value
elif tag == "title":
self.match = True
def handle_data(self, data: str) -> None:
if self.match:
self.metadata["title"] = data
self.match = False
def get_metadata(self) -> Dict[str, Any]:
return self.metadata
"""OracleDocReader class"""
class OracleDocReader:
"""Read a file"""
@staticmethod
def generate_object_id(input_string: Union[str, None] = None) -> str:
out_length = 32 # output length
hash_len = 8 # hash value length
if input_string is None:
input_string = "".join(
random.choices(
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
k=16,
)
)
# timestamp
timestamp = int(time.time())
timestamp_bin = struct.pack(">I", timestamp) # 4 bytes
# hash_value
hashval_bin = hashlib.sha256(input_string.encode()).digest()
hashval_bin = hashval_bin[:hash_len] # 8 bytes
# counter
counter_bin = struct.pack(">I", random.getrandbits(32)) # 4 bytes
# binary object id
object_id = timestamp_bin + hashval_bin + counter_bin # 16 bytes
object_id_hex = object_id.hex() # 32 bytes
object_id_hex = object_id_hex.zfill(
out_length
) # fill with zeros if less than 32 bytes
object_id_hex = object_id_hex[:out_length]
return object_id_hex
@staticmethod
def read_file(
conn: Connection, file_path: str, params: dict
) -> Union[Document, None]:
"""Read a file using OracleReader
Args:
conn: Oracle Connection,
file_path: Oracle Directory,
params: ONNX file name.
Returns:
Plain text and metadata as Langchain Document.
"""
metadata: Dict[str, Any] = {}
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
try:
oracledb.defaults.fetch_lobs = False
cursor = conn.cursor()
with open(file_path, "rb") as f:
data = f.read()
if data is None:
return Document(page_content="", metadata=metadata)
mdata = cursor.var(oracledb.DB_TYPE_CLOB)
text = cursor.var(oracledb.DB_TYPE_CLOB)
cursor.execute(
"""
declare
input blob;
begin
input := :blob;
:mdata := dbms_vector_chain.utl_to_text(input, json(:pref));
:text := dbms_vector_chain.utl_to_text(input);
end;""",
blob=data,
pref=json.dumps(params),
mdata=mdata,
text=text,
)
cursor.close()
if mdata is None:
metadata = {}
else:
doc_data = str(mdata.getvalue())
if doc_data.startswith("<!DOCTYPE html") or doc_data.startswith(
"<HTML>"
):
p = ParseOracleDocMetadata()
p.feed(doc_data)
metadata = p.get_metadata()
doc_id = OracleDocReader.generate_object_id(conn.username + "$" + file_path)
metadata["_oid"] = doc_id
metadata["_file"] = file_path
if text is None:
return Document(page_content="", metadata=metadata)
else:
return Document(page_content=str(text.getvalue()), metadata=metadata)
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
logger.info(f"Skip processing {file_path}")
cursor.close()
return None
"""OracleDocLoader class"""
class OracleDocLoader(BaseLoader):
"""Read documents using OracleDocLoader
Args:
conn: Oracle Connection,
params: Loader parameters.
"""
def __init__(self, conn: Connection, params: Dict[str, Any], **kwargs: Any):
self.conn = conn
self.params = json.loads(json.dumps(params))
super().__init__(**kwargs)
def load(self) -> List[Document]:
"""Load data into LangChain Document objects..."""
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
ncols = 0
results: List[Document] = []
metadata: Dict[str, Any] = {}
m_params = {"plaintext": "false"}
try:
# extract the parameters
if self.params is not None:
self.file = self.params.get("file")
self.dir = self.params.get("dir")
self.owner = self.params.get("owner")
self.tablename = self.params.get("tablename")
self.colname = self.params.get("colname")
else:
raise Exception("Missing loader parameters")
oracledb.defaults.fetch_lobs = False
if self.file:
doc = OracleDocReader.read_file(self.conn, self.file, m_params)
if doc is None:
return results
results.append(doc)
if self.dir:
skip_count = 0
for file_name in os.listdir(self.dir):
file_path = os.path.join(self.dir, file_name)
if os.path.isfile(file_path):
doc = OracleDocReader.read_file(self.conn, file_path, m_params)
if doc is None:
skip_count = skip_count + 1
logger.info(f"Total skipped: {skip_count}\n")
else:
results.append(doc)
if self.tablename:
try:
if self.owner is None or self.colname is None:
raise Exception("Missing owner or column name or both.")
cursor = self.conn.cursor()
self.mdata_cols = self.params.get("mdata_cols")
if self.mdata_cols is not None:
if len(self.mdata_cols) > 3:
raise Exception(
"Exceeds the max number of columns "
+ "you can request for metadata."
)
# execute a query to get column data types
sql = (
"select column_name, data_type from all_tab_columns "
+ "where owner = :ownername and "
+ "table_name = :tablename"
)
cursor.execute(
sql,
ownername=self.owner.upper(),
tablename=self.tablename.upper(),
)
# cursor.execute(sql)
rows = cursor.fetchall()
for row in rows:
if row[0] in self.mdata_cols:
if row[1] not in [
"NUMBER",
"BINARY_DOUBLE",
"BINARY_FLOAT",
"LONG",
"DATE",
"TIMESTAMP",
"VARCHAR2",
]:
raise Exception(
"The datatype for the column requested "
+ "for metadata is not supported."
)
self.mdata_cols_sql = ", rowid"
if self.mdata_cols is not None:
for col in self.mdata_cols:
self.mdata_cols_sql = self.mdata_cols_sql + ", " + col
# [TODO] use bind variables
sql = (
"select dbms_vector_chain.utl_to_text(t."
+ self.colname
+ ", json('"
+ json.dumps(m_params)
+ "')) mdata, dbms_vector_chain.utl_to_text(t."
+ self.colname
+ ") text"
+ self.mdata_cols_sql
+ " from "
+ self.owner
+ "."
+ self.tablename
+ " t"
)
cursor.execute(sql)
for row in cursor:
metadata = {}
if row is None:
doc_id = OracleDocReader.generate_object_id(
self.conn.username
+ "$"
+ self.owner
+ "$"
+ self.tablename
+ "$"
+ self.colname
)
metadata["_oid"] = doc_id
results.append(Document(page_content="", metadata=metadata))
else:
if row[0] is not None:
data = str(row[0])
if data.startswith("<!DOCTYPE html") or data.startswith(
"<HTML>"
):
p = ParseOracleDocMetadata()
p.feed(data)
metadata = p.get_metadata()
doc_id = OracleDocReader.generate_object_id(
self.conn.username
+ "$"
+ self.owner
+ "$"
+ self.tablename
+ "$"
+ self.colname
+ "$"
+ str(row[2])
)
metadata["_oid"] = doc_id
metadata["_rowid"] = row[2]
# process projected metadata cols
if self.mdata_cols is not None:
ncols = len(self.mdata_cols)
for i in range(0, ncols):
metadata[self.mdata_cols[i]] = row[i + 2]
if row[1] is None:
results.append(
Document(page_content="", metadata=metadata)
)
else:
results.append(
Document(
page_content=str(row[1]), metadata=metadata
)
)
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
traceback.print_exc()
cursor.close()
raise
return results
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
traceback.print_exc()
raise
class OracleTextSplitter(TextSplitter):
"""Splitting text using Oracle chunker."""
def __init__(self, conn: Connection, params: Dict[str, Any], **kwargs: Any) -> None:
"""Initialize."""
self.conn = conn
self.params = params
super().__init__(**kwargs)
try:
import json
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
self._oracledb = oracledb
self._json = json
except ImportError:
raise ImportError(
"oracledb or json or both are not installed. "
+ "Please install them. "
+ "Recommendations: `pip install oracledb`. "
)
def split_text(self, text: str) -> List[str]:
"""Split incoming text and return chunks."""
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
splits = []
try:
# returns strings or bytes instead of a locator
self._oracledb.defaults.fetch_lobs = False
cursor = self.conn.cursor()
cursor.setinputsizes(content=oracledb.CLOB)
cursor.execute(
"select t.column_value from "
+ "dbms_vector_chain.utl_to_chunks(:content, json(:params)) t",
content=text,
params=self._json.dumps(self.params),
)
while True:
row = cursor.fetchone()
if row is None:
break
d = self._json.loads(row[0])
splits.append(d["chunk_data"])
return splits
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
traceback.print_exc()
raise

@ -169,6 +169,9 @@ if TYPE_CHECKING:
from langchain_community.embeddings.optimum_intel import (
QuantizedBiEncoderEmbeddings,
)
from langchain_community.embeddings.oracleai import (
OracleEmbeddings, # noqa: F401
)
from langchain_community.embeddings.premai import (
PremAIEmbeddings,
)
@ -267,6 +270,7 @@ __all__ = [
"OpenAIEmbeddings",
"OpenVINOBgeEmbeddings",
"OpenVINOEmbeddings",
"OracleEmbeddings",
"PremAIEmbeddings",
"QianfanEmbeddingsEndpoint",
"QuantizedBgeEmbeddings",
@ -344,6 +348,7 @@ _module_lookup = {
"QianfanEmbeddingsEndpoint": "langchain_community.embeddings.baidu_qianfan_endpoint", # noqa: E501
"QuantizedBgeEmbeddings": "langchain_community.embeddings.itrex",
"QuantizedBiEncoderEmbeddings": "langchain_community.embeddings.optimum_intel",
"OracleEmbeddings": "langchain_community.embeddings.oracleai",
"SagemakerEndpointEmbeddings": "langchain_community.embeddings.sagemaker_endpoint",
"SelfHostedEmbeddings": "langchain_community.embeddings.self_hosted",
"SelfHostedHuggingFaceEmbeddings": "langchain_community.embeddings.self_hosted_hugging_face", # noqa: E501

@ -0,0 +1,182 @@
# Authors:
# Harichandan Roy (hroy)
# David Jiang (ddjiang)
#
# -----------------------------------------------------------------------------
# oracleai.py
# -----------------------------------------------------------------------------
from __future__ import annotations
import json
import logging
import traceback
from typing import TYPE_CHECKING, Any, Dict, List, Optional
from langchain_core.embeddings import Embeddings
from langchain_core.pydantic_v1 import BaseModel, Extra
if TYPE_CHECKING:
from oracledb import Connection
logger = logging.getLogger(__name__)
"""OracleEmbeddings class"""
class OracleEmbeddings(BaseModel, Embeddings):
"""Get Embeddings"""
"""Oracle Connection"""
conn: Any
"""Embedding Parameters"""
params: Dict[str, Any]
"""Proxy"""
proxy: Optional[str] = None
def __init__(self, **kwargs: Any):
super().__init__(**kwargs)
class Config:
"""Configuration for this pydantic object."""
extra = Extra.forbid
"""
1 - user needs to have create procedure,
create mining model, create any directory privilege.
2 - grant create procedure, create mining model,
create any directory to <user>;
"""
@staticmethod
def load_onnx_model(
conn: Connection, dir: str, onnx_file: str, model_name: str
) -> None:
"""Load an ONNX model to Oracle Database.
Args:
conn: Oracle Connection,
dir: Oracle Directory,
onnx_file: ONNX file name,
model_name: Name of the model.
"""
try:
if conn is None or dir is None or onnx_file is None or model_name is None:
raise Exception("Invalid input")
cursor = conn.cursor()
cursor.execute(
"""
begin
dbms_data_mining.drop_model(model_name => :model, force => true);
SYS.DBMS_VECTOR.load_onnx_model(:path, :filename, :model,
json('{"function" : "embedding",
"embeddingOutput" : "embedding",
"input": {"input": ["DATA"]}}'));
end;""",
path=dir,
filename=onnx_file,
model=model_name,
)
cursor.close()
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
traceback.print_exc()
cursor.close()
raise
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Compute doc embeddings using an OracleEmbeddings.
Args:
texts: The list of texts to embed.
Returns:
List of embeddings, one for each input text.
"""
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
if texts is None:
return None
embeddings: List[List[float]] = []
try:
# returns strings or bytes instead of a locator
oracledb.defaults.fetch_lobs = False
cursor = self.conn.cursor()
if self.proxy:
cursor.execute(
"begin utl_http.set_proxy(:proxy); end;", proxy=self.proxy
)
for text in texts:
cursor.execute(
"select t.* "
+ "from dbms_vector_chain.utl_to_embeddings(:content, "
+ "json(:params)) t",
content=text,
params=json.dumps(self.params),
)
for row in cursor:
if row is None:
embeddings.append([])
else:
rdata = json.loads(row[0])
# dereference string as array
vec = json.loads(rdata["embed_vector"])
embeddings.append(vec)
cursor.close()
return embeddings
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
traceback.print_exc()
cursor.close()
raise
def embed_query(self, text: str) -> List[float]:
"""Compute query embedding using an OracleEmbeddings.
Args:
text: The text to embed.
Returns:
Embedding for the text.
"""
return self.embed_documents([text])[0]
# uncomment the following code block to run the test
"""
# A sample unit test.
''' get the Oracle connection '''
conn = oracledb.connect(
user="",
password="",
dsn="")
print("Oracle connection is established...")
''' params '''
embedder_params = {"provider":"database", "model":"demo_model"}
proxy = ""
''' instance '''
embedder = OracleEmbeddings(conn=conn, params=embedder_params, proxy=proxy)
embed = embedder.embed_query("Hello World!")
print(f"Embedding generated by OracleEmbeddings: {embed}")
conn.close()
print("Connection is closed.")
"""

@ -99,6 +99,9 @@ if TYPE_CHECKING:
from langchain_community.utilities.openweathermap import (
OpenWeatherMapAPIWrapper,
)
from langchain_community.utilities.oracleai import (
OracleSummary, # noqa: F401
)
from langchain_community.utilities.outline import (
OutlineAPIWrapper,
)
@ -199,6 +202,7 @@ __all__ = [
"NasaAPIWrapper",
"NutritionAIAPI",
"OpenWeatherMapAPIWrapper",
"OracleSummary",
"OutlineAPIWrapper",
"Portkey",
"PowerBIDataset",
@ -260,6 +264,7 @@ _module_lookup = {
"NasaAPIWrapper": "langchain_community.utilities.nasa",
"NutritionAIAPI": "langchain_community.utilities.passio_nutrition_ai",
"OpenWeatherMapAPIWrapper": "langchain_community.utilities.openweathermap",
"OracleSummary": "langchain_community.utilities.oracleai",
"OutlineAPIWrapper": "langchain_community.utilities.outline",
"Portkey": "langchain_community.utilities.portkey",
"PowerBIDataset": "langchain_community.utilities.powerbi",

@ -0,0 +1,201 @@
# Authors:
# Harichandan Roy (hroy)
# David Jiang (ddjiang)
#
# -----------------------------------------------------------------------------
# oracleai.py
# -----------------------------------------------------------------------------
from __future__ import annotations
import json
import logging
import traceback
from typing import TYPE_CHECKING, Any, Dict, List, Optional
from langchain_core.documents import Document
if TYPE_CHECKING:
from oracledb import Connection
logger = logging.getLogger(__name__)
"""OracleSummary class"""
class OracleSummary:
"""Get Summary
Args:
conn: Oracle Connection,
params: Summary parameters,
proxy: Proxy
"""
def __init__(
self, conn: Connection, params: Dict[str, Any], proxy: Optional[str] = None
):
self.conn = conn
self.proxy = proxy
self.summary_params = params
def get_summary(self, docs: Any) -> List[str]:
"""Get the summary of the input docs.
Args:
docs: The documents to generate summary for.
Allowed input types: str, Document, List[str], List[Document]
Returns:
List of summary text, one for each input doc.
"""
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
if docs is None:
return []
results: List[str] = []
try:
oracledb.defaults.fetch_lobs = False
cursor = self.conn.cursor()
if self.proxy:
cursor.execute(
"begin utl_http.set_proxy(:proxy); end;", proxy=self.proxy
)
if isinstance(docs, str):
results = []
summary = cursor.var(oracledb.DB_TYPE_CLOB)
cursor.execute(
"""
declare
input clob;
begin
input := :data;
:summ := dbms_vector_chain.utl_to_summary(input, json(:params));
end;""",
data=docs,
params=json.dumps(self.summary_params),
summ=summary,
)
if summary is None:
results.append("")
else:
results.append(str(summary.getvalue()))
elif isinstance(docs, Document):
results = []
summary = cursor.var(oracledb.DB_TYPE_CLOB)
cursor.execute(
"""
declare
input clob;
begin
input := :data;
:summ := dbms_vector_chain.utl_to_summary(input, json(:params));
end;""",
data=docs.page_content,
params=json.dumps(self.summary_params),
summ=summary,
)
if summary is None:
results.append("")
else:
results.append(str(summary.getvalue()))
elif isinstance(docs, List):
results = []
for doc in docs:
summary = cursor.var(oracledb.DB_TYPE_CLOB)
if isinstance(doc, str):
cursor.execute(
"""
declare
input clob;
begin
input := :data;
:summ := dbms_vector_chain.utl_to_summary(input,
json(:params));
end;""",
data=doc,
params=json.dumps(self.summary_params),
summ=summary,
)
elif isinstance(doc, Document):
cursor.execute(
"""
declare
input clob;
begin
input := :data;
:summ := dbms_vector_chain.utl_to_summary(input,
json(:params));
end;""",
data=doc.page_content,
params=json.dumps(self.summary_params),
summ=summary,
)
else:
raise Exception("Invalid input type")
if summary is None:
results.append("")
else:
results.append(str(summary.getvalue()))
else:
raise Exception("Invalid input type")
cursor.close()
return results
except Exception as ex:
logger.info(f"An exception occurred :: {ex}")
traceback.print_exc()
cursor.close()
raise
# uncomment the following code block to run the test
"""
# A sample unit test.
''' get the Oracle connection '''
conn = oracledb.connect(
user="",
password="",
dsn="")
print("Oracle connection is established...")
''' params '''
summary_params = {"provider": "database","glevel": "S",
"numParagraphs": 1,"language": "english"}
proxy = ""
''' instance '''
summ = OracleSummary(conn=conn, params=summary_params, proxy=proxy)
summary = summ.get_summary("In the heart of the forest, " +
"a lone fox ventured out at dusk, seeking a lost treasure. " +
"With each step, memories flooded back, guiding its path. " +
"As the moon rose high, illuminating the night, the fox unearthed " +
"not gold, but a forgotten friendship, worth more than any riches.")
print(f"Summary generated by OracleSummary: {summary}")
conn.close()
print("Connection is closed.")
"""

@ -178,6 +178,9 @@ if TYPE_CHECKING:
from langchain_community.vectorstores.opensearch_vector_search import (
OpenSearchVectorSearch,
)
from langchain_community.vectorstores.oraclevs import (
OracleVS, # noqa: F401
)
from langchain_community.vectorstores.pathway import (
PathwayVectorClient,
)
@ -343,6 +346,7 @@ __all__ = [
"MyScaleSettings",
"Neo4jVector",
"NeuralDBVectorStore",
"OracleVS",
"OpenSearchVectorSearch",
"PGEmbedding",
"PGVector",
@ -439,6 +443,7 @@ _module_lookup = {
"Neo4jVector": "langchain_community.vectorstores.neo4j_vector",
"NeuralDBVectorStore": "langchain_community.vectorstores.thirdai_neuraldb",
"OpenSearchVectorSearch": "langchain_community.vectorstores.opensearch_vector_search", # noqa: E501
"OracleVS": "langchain_community.vectorstores.oraclevs",
"PathwayVectorClient": "langchain_community.vectorstores.pathway",
"PGEmbedding": "langchain_community.vectorstores.pgembedding",
"PGVector": "langchain_community.vectorstores.pgvector",

@ -0,0 +1,930 @@
from __future__ import annotations
import array
import functools
import hashlib
import json
import logging
import os
import uuid
from typing import (
TYPE_CHECKING,
Any,
Callable,
Dict,
Iterable,
List,
Optional,
Tuple,
Type,
TypeVar,
Union,
cast,
)
if TYPE_CHECKING:
from oracledb import Connection
import numpy as np
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import VectorStore
from langchain_community.vectorstores.utils import (
DistanceStrategy,
maximal_marginal_relevance,
)
logger = logging.getLogger(__name__)
log_level = os.getenv("LOG_LEVEL", "ERROR").upper()
logging.basicConfig(
level=getattr(logging, log_level),
format="%(asctime)s - %(levelname)s - %(message)s",
)
# Define a type variable that can be any kind of function
T = TypeVar("T", bound=Callable[..., Any])
def _handle_exceptions(func: T) -> T:
@functools.wraps(func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
try:
return func(*args, **kwargs)
except RuntimeError as db_err:
# Handle a known type of error (e.g., DB-related) specifically
logger.exception("DB-related error occurred.")
raise RuntimeError(
"Failed due to a DB issue: {}".format(db_err)
) from db_err
except ValueError as val_err:
# Handle another known type of error specifically
logger.exception("Validation error.")
raise ValueError("Validation failed: {}".format(val_err)) from val_err
except Exception as e:
# Generic handler for all other exceptions
logger.exception("An unexpected error occurred: {}".format(e))
raise RuntimeError("Unexpected error: {}".format(e)) from e
return cast(T, wrapper)
def _table_exists(client: Connection, table_name: str) -> bool:
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
try:
with client.cursor() as cursor:
cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
return True
except oracledb.DatabaseError as ex:
err_obj = ex.args
if err_obj[0].code == 942:
return False
raise
@_handle_exceptions
def _index_exists(client: Connection, index_name: str) -> bool:
# Check if the index exists
query = """
SELECT index_name
FROM all_indexes
WHERE upper(index_name) = upper(:idx_name)
"""
with client.cursor() as cursor:
# Execute the query
cursor.execute(query, idx_name=index_name.upper())
result = cursor.fetchone()
# Check if the index exists
return result is not None
def _get_distance_function(distance_strategy: DistanceStrategy) -> str:
# Dictionary to map distance strategies to their corresponding function
# names
distance_strategy2function = {
DistanceStrategy.EUCLIDEAN_DISTANCE: "EUCLIDEAN",
DistanceStrategy.DOT_PRODUCT: "DOT",
DistanceStrategy.COSINE: "COSINE",
}
# Attempt to return the corresponding distance function
if distance_strategy in distance_strategy2function:
return distance_strategy2function[distance_strategy]
# If it's an unsupported distance strategy, raise an error
raise ValueError(f"Unsupported distance strategy: {distance_strategy}")
def _get_index_name(base_name: str) -> str:
unique_id = str(uuid.uuid4()).replace("-", "")
return f"{base_name}_{unique_id}"
@_handle_exceptions
def _create_table(client: Connection, table_name: str, embedding_dim: int) -> None:
cols_dict = {
"id": "RAW(16) DEFAULT SYS_GUID() PRIMARY KEY",
"text": "CLOB",
"metadata": "CLOB",
"embedding": f"vector({embedding_dim}, FLOAT32)",
}
if not _table_exists(client, table_name):
with client.cursor() as cursor:
ddl_body = ", ".join(
f"{col_name} {col_type}" for col_name, col_type in cols_dict.items()
)
ddl = f"CREATE TABLE {table_name} ({ddl_body})"
cursor.execute(ddl)
logger.info("Table created successfully...")
else:
logger.info("Table already exists...")
@_handle_exceptions
def create_index(
client: Connection,
vector_store: OracleVS,
params: Optional[dict[str, Any]] = None,
) -> None:
if params:
if params["idx_type"] == "HNSW":
_create_hnsw_index(
client, vector_store.table_name, vector_store.distance_strategy, params
)
elif params["idx_type"] == "IVF":
_create_ivf_index(
client, vector_store.table_name, vector_store.distance_strategy, params
)
else:
_create_hnsw_index(
client, vector_store.table_name, vector_store.distance_strategy, params
)
else:
_create_hnsw_index(
client, vector_store.table_name, vector_store.distance_strategy, params
)
return
@_handle_exceptions
def _create_hnsw_index(
client: Connection,
table_name: str,
distance_strategy: DistanceStrategy,
params: Optional[dict[str, Any]] = None,
) -> None:
defaults = {
"idx_name": "HNSW",
"idx_type": "HNSW",
"neighbors": 32,
"efConstruction": 200,
"accuracy": 90,
"parallel": 8,
}
if params:
config = params.copy()
# Ensure compulsory parts are included
for compulsory_key in ["idx_name", "parallel"]:
if compulsory_key not in config:
if compulsory_key == "idx_name":
config[compulsory_key] = _get_index_name(
str(defaults[compulsory_key])
)
else:
config[compulsory_key] = defaults[compulsory_key]
# Validate keys in config against defaults
for key in config:
if key not in defaults:
raise ValueError(f"Invalid parameter: {key}")
else:
config = defaults
# Base SQL statement
idx_name = config["idx_name"]
base_sql = (
f"create vector index {idx_name} on {table_name}(embedding) "
f"ORGANIZATION INMEMORY NEIGHBOR GRAPH"
)
# Optional parts depending on parameters
accuracy_part = " WITH TARGET ACCURACY {accuracy}" if ("accuracy" in config) else ""
distance_part = f" DISTANCE {_get_distance_function(distance_strategy)}"
parameters_part = ""
if "neighbors" in config and "efConstruction" in config:
parameters_part = (
" parameters (type {idx_type}, neighbors {"
"neighbors}, efConstruction {efConstruction})"
)
elif "neighbors" in config and "efConstruction" not in config:
config["efConstruction"] = defaults["efConstruction"]
parameters_part = (
" parameters (type {idx_type}, neighbors {"
"neighbors}, efConstruction {efConstruction})"
)
elif "neighbors" not in config and "efConstruction" in config:
config["neighbors"] = defaults["neighbors"]
parameters_part = (
" parameters (type {idx_type}, neighbors {"
"neighbors}, efConstruction {efConstruction})"
)
# Always included part for parallel
parallel_part = " parallel {parallel}"
# Combine all parts
ddl_assembly = (
base_sql + accuracy_part + distance_part + parameters_part + parallel_part
)
# Format the SQL with values from the params dictionary
ddl = ddl_assembly.format(**config)
# Check if the index exists
if not _index_exists(client, config["idx_name"]):
with client.cursor() as cursor:
cursor.execute(ddl)
logger.info("Index created successfully...")
else:
logger.info("Index already exists...")
@_handle_exceptions
def _create_ivf_index(
client: Connection,
table_name: str,
distance_strategy: DistanceStrategy,
params: Optional[dict[str, Any]] = None,
) -> None:
# Default configuration
defaults = {
"idx_name": "IVF",
"idx_type": "IVF",
"neighbor_part": 32,
"accuracy": 90,
"parallel": 8,
}
if params:
config = params.copy()
# Ensure compulsory parts are included
for compulsory_key in ["idx_name", "parallel"]:
if compulsory_key not in config:
if compulsory_key == "idx_name":
config[compulsory_key] = _get_index_name(
str(defaults[compulsory_key])
)
else:
config[compulsory_key] = defaults[compulsory_key]
# Validate keys in config against defaults
for key in config:
if key not in defaults:
raise ValueError(f"Invalid parameter: {key}")
else:
config = defaults
# Base SQL statement
idx_name = config["idx_name"]
base_sql = (
f"CREATE VECTOR INDEX {idx_name} ON {table_name}(embedding) "
f"ORGANIZATION NEIGHBOR PARTITIONS"
)
# Optional parts depending on parameters
accuracy_part = " WITH TARGET ACCURACY {accuracy}" if ("accuracy" in config) else ""
distance_part = f" DISTANCE {_get_distance_function(distance_strategy)}"
parameters_part = ""
if "idx_type" in config and "neighbor_part" in config:
parameters_part = (
f" PARAMETERS (type {config['idx_type']}, neighbor"
f" partitions {config['neighbor_part']})"
)
# Always included part for parallel
parallel_part = f" PARALLEL {config['parallel']}"
# Combine all parts
ddl_assembly = (
base_sql + accuracy_part + distance_part + parameters_part + parallel_part
)
# Format the SQL with values from the params dictionary
ddl = ddl_assembly.format(**config)
# Check if the index exists
if not _index_exists(client, config["idx_name"]):
with client.cursor() as cursor:
cursor.execute(ddl)
logger.info("Index created successfully...")
else:
logger.info("Index already exists...")
@_handle_exceptions
def drop_table_purge(client: Connection, table_name: str) -> None:
if _table_exists(client, table_name):
cursor = client.cursor()
with cursor:
ddl = f"DROP TABLE {table_name} PURGE"
cursor.execute(ddl)
logger.info("Table dropped successfully...")
else:
logger.info("Table not found...")
return
@_handle_exceptions
def drop_index_if_exists(client: Connection, index_name: str) -> None:
if _index_exists(client, index_name):
drop_query = f"DROP INDEX {index_name}"
with client.cursor() as cursor:
cursor.execute(drop_query)
logger.info(f"Index {index_name} has been dropped.")
else:
logger.exception(f"Index {index_name} does not exist.")
return
class OracleVS(VectorStore):
"""`OracleVS` vector store.
To use, you should have both:
- the ``oracledb`` python package installed
- a connection to an Oracle Database in which AI Vector Search is available
Example:
.. code-block:: python
    from langchain_community.vectorstores.oraclevs import OracleVS
    from langchain_community.embeddings import OpenAIEmbeddings
    import oracledb
    with oracledb.connect(user=user, password=pwd, dsn=dsn) as connection:
        print("Database version:", connection.version)
        embeddings = OpenAIEmbeddings()
        vectors = OracleVS(connection, embeddings, table_name)
"""
def __init__(
self,
client: Connection,
embedding_function: Union[
Callable[[str], List[float]],
Embeddings,
],
table_name: str,
distance_strategy: DistanceStrategy = DistanceStrategy.EUCLIDEAN_DISTANCE,
query: Optional[str] = "What is an Oracle database",
params: Optional[Dict[str, Any]] = None,
):
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
try:
"""Initialize with oracledb client."""
self.client = client
"""Initialize with necessary components."""
if not isinstance(embedding_function, Embeddings):
logger.warning(
"`embedding_function` is expected to be an Embeddings "
"object, support "
"for passing in a function will soon be removed."
)
self.embedding_function = embedding_function
self.query = query
embedding_dim = self.get_embedding_dimension()
self.table_name = table_name
self.distance_strategy = distance_strategy
self.params = params
_create_table(client, table_name, embedding_dim)
except oracledb.DatabaseError as db_err:
logger.exception(f"Database error occurred while create table: {db_err}")
raise RuntimeError(
"Failed to create table due to a database error."
) from db_err
except ValueError as val_err:
logger.exception(f"Validation error: {val_err}")
raise RuntimeError(
"Failed to create table due to a validation error."
) from val_err
except Exception as ex:
logger.exception("An unexpected error occurred while creating the index.")
raise RuntimeError(
"Failed to create table due to an unexpected error."
) from ex
@property
def embeddings(self) -> Optional[Embeddings]:
"""
A property that returns an Embeddings instance if embedding_function
is an instance of Embeddings, otherwise returns None.
Returns:
Optional[Embeddings]: The embedding function if it's an instance of
Embeddings, otherwise None.
"""
return (
self.embedding_function
if isinstance(self.embedding_function, Embeddings)
else None
)
def get_embedding_dimension(self) -> int:
# Embed the single document by wrapping it in a list
embedded_document = self._embed_documents(
[self.query if self.query is not None else ""]
)
# Get the first (and only) embedding's dimension
return len(embedded_document[0])
def _embed_documents(self, texts: List[str]) -> List[List[float]]:
if isinstance(self.embedding_function, Embeddings):
return self.embedding_function.embed_documents(texts)
elif callable(self.embedding_function):
return [self.embedding_function(text) for text in texts]
else:
raise TypeError(
"The embedding_function is neither Embeddings nor callable."
)
def _embed_query(self, text: str) -> List[float]:
if isinstance(self.embedding_function, Embeddings):
return self.embedding_function.embed_query(text)
else:
return self.embedding_function(text)
@_handle_exceptions
def add_texts(
self,
texts: Iterable[str],
metadatas: Optional[List[Dict[Any, Any]]] = None,
ids: Optional[List[str]] = None,
**kwargs: Any,
) -> List[str]:
"""Add more texts to the vectorstore index.
Args:
texts: Iterable of strings to add to the vectorstore.
metadatas: Optional list of metadatas associated with the texts.
ids: Optional list of ids for the texts that are being added to
the vector store.
kwargs: vectorstore specific parameters
"""
texts = list(texts)
if ids:
# If ids are provided, hash them to maintain consistency
processed_ids = [
hashlib.sha256(_id.encode()).hexdigest()[:16].upper() for _id in ids
]
elif metadatas and all("id" in metadata for metadata in metadatas):
# If no ids are provided but metadatas with ids are, generate
# ids from metadatas
processed_ids = [
hashlib.sha256(metadata["id"].encode()).hexdigest()[:16].upper()
for metadata in metadatas
]
else:
# Generate new ids if none are provided
generated_ids = [
str(uuid.uuid4()) for _ in texts
] # uuid4 is more standard for random UUIDs
processed_ids = [
hashlib.sha256(_id.encode()).hexdigest()[:16].upper()
for _id in generated_ids
]
embeddings = self._embed_documents(texts)
if not metadatas:
metadatas = [{} for _ in texts]
docs = [
(id_, text, json.dumps(metadata), array.array("f", embedding))
for id_, text, metadata, embedding in zip(
processed_ids, texts, metadatas, embeddings
)
]
with self.client.cursor() as cursor:
cursor.executemany(
f"INSERT INTO {self.table_name} (id, text, metadata, "
f"embedding) VALUES (:1, :2, :3, :4)",
docs,
)
self.client.commit()
return processed_ids
def similarity_search(
self,
query: str,
k: int = 4,
filter: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> List[Document]:
"""Return docs most similar to query."""
if isinstance(self.embedding_function, Embeddings):
embedding = self.embedding_function.embed_query(query)
documents = self.similarity_search_by_vector(
embedding=embedding, k=k, filter=filter, **kwargs
)
return documents
def similarity_search_by_vector(
self,
embedding: List[float],
k: int = 4,
filter: Optional[dict[str, Any]] = None,
**kwargs: Any,
) -> List[Document]:
docs_and_scores = self.similarity_search_by_vector_with_relevance_scores(
embedding=embedding, k=k, filter=filter, **kwargs
)
return [doc for doc, _ in docs_and_scores]
def similarity_search_with_score(
self,
query: str,
k: int = 4,
filter: Optional[dict[str, Any]] = None,
**kwargs: Any,
) -> List[Tuple[Document, float]]:
"""Return docs most similar to query."""
if isinstance(self.embedding_function, Embeddings):
embedding = self.embedding_function.embed_query(query)
docs_and_scores = self.similarity_search_by_vector_with_relevance_scores(
embedding=embedding, k=k, filter=filter, **kwargs
)
return docs_and_scores
@_handle_exceptions
def _get_clob_value(self, result: Any) -> str:
try:
import oracledb
except ImportError as e:
raise ImportError(
"Unable to import oracledb, please install with "
"`pip install -U oracledb`."
) from e
clob_value = ""
if result:
if isinstance(result, oracledb.LOB):
raw_data = result.read()
if isinstance(raw_data, bytes):
clob_value = raw_data.decode(
"utf-8"
) # Specify the correct encoding
else:
clob_value = raw_data
elif isinstance(result, str):
clob_value = result
else:
raise Exception("Unexpected type:", type(result))
return clob_value
@_handle_exceptions
def similarity_search_by_vector_with_relevance_scores(
self,
embedding: List[float],
k: int = 4,
filter: Optional[dict[str, Any]] = None,
**kwargs: Any,
) -> List[Tuple[Document, float]]:
docs_and_scores = []
embedding_arr = array.array("f", embedding)
query = f"""
SELECT id,
text,
metadata,
vector_distance(embedding, :embedding,
{_get_distance_function(self.distance_strategy)}) as distance
FROM {self.table_name}
ORDER BY distance
FETCH APPROX FIRST :k ROWS ONLY
"""
# Execute the query
with self.client.cursor() as cursor:
cursor.execute(query, embedding=embedding_arr, k=k)
results = cursor.fetchall()
# Filter results if filter is provided
for result in results:
metadata = json.loads(
self._get_clob_value(result[2]) if result[2] is not None else "{}"
)
# Apply filtering based on the 'filter' dictionary
if filter:
if all(metadata.get(key) in value for key, value in filter.items()):
doc = Document(
page_content=(
self._get_clob_value(result[1])
if result[1] is not None
else ""
),
metadata=metadata,
)
distance = result[3]
docs_and_scores.append((doc, distance))
else:
doc = Document(
page_content=(
self._get_clob_value(result[1])
if result[1] is not None
else ""
),
metadata=metadata,
)
distance = result[3]
docs_and_scores.append((doc, distance))
return docs_and_scores
@_handle_exceptions
def similarity_search_by_vector_returning_embeddings(
self,
embedding: List[float],
k: int,
filter: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> List[Tuple[Document, float, np.ndarray[np.float32, Any]]]:
documents = []
embedding_arr = array.array("f", embedding)
query = f"""
SELECT id,
text,
metadata,
vector_distance(embedding, :embedding, {_get_distance_function(
self.distance_strategy)}) as distance,
embedding
FROM {self.table_name}
ORDER BY distance
FETCH APPROX FIRST :k ROWS ONLY
"""
# Execute the query
with self.client.cursor() as cursor:
cursor.execute(query, embedding=embedding_arr, k=k)
results = cursor.fetchall()
for result in results:
page_content_str = self._get_clob_value(result[1])
metadata_str = self._get_clob_value(result[2])
metadata = json.loads(metadata_str)
# Apply filter if provided and matches; otherwise, add all
# documents
if not filter or all(
metadata.get(key) in value for key, value in filter.items()
):
document = Document(
page_content=page_content_str, metadata=metadata
)
distance = result[3]
# Assuming result[4] is already in the correct format;
# adjust if necessary
current_embedding = (
np.array(result[4], dtype=np.float32)
if result[4]
else np.empty(0, dtype=np.float32)
)
documents.append((document, distance, current_embedding))
return documents # type: ignore
@_handle_exceptions
def max_marginal_relevance_search_with_score_by_vector(
self,
embedding: List[float],
*,
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
filter: Optional[Dict[str, Any]] = None,
) -> List[Tuple[Document, float]]:
"""Return docs and their similarity scores selected using the
maximal marginal
relevance.
Maximal marginal relevance optimizes for similarity to query AND
diversity
among selected documents.
Args:
self: An instance of the class
embedding: Embedding to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
fetch_k: Number of Documents to fetch before filtering to
pass to MMR algorithm.
filter: (Optional[Dict[str, str]]): Filter by metadata. Defaults
to None.
lambda_mult: Number between 0 and 1 that determines the degree
of diversity among the results with 0 corresponding
to maximum diversity and 1 to minimum diversity.
Defaults to 0.5.
Returns:
List of Documents and similarity scores selected by maximal
marginal
relevance and score for each.
"""
# Fetch documents and their scores
docs_scores_embeddings = self.similarity_search_by_vector_returning_embeddings(
embedding, fetch_k, filter=filter
)
# Assuming documents_with_scores is a list of tuples (Document, score)
# If you need to split documents and scores for processing (e.g.,
# for MMR calculation)
documents, scores, embeddings = (
zip(*docs_scores_embeddings) if docs_scores_embeddings else ([], [], [])
)
# Assume maximal_marginal_relevance method accepts embeddings and
# scores, and returns indices of selected docs
mmr_selected_indices = maximal_marginal_relevance(
np.array(embedding, dtype=np.float32),
list(embeddings),
k=k,
lambda_mult=lambda_mult,
)
# Filter documents based on MMR-selected indices and map scores
mmr_selected_documents_with_scores = [
(documents[i], scores[i]) for i in mmr_selected_indices
]
return mmr_selected_documents_with_scores
@_handle_exceptions
def max_marginal_relevance_search_by_vector(
self,
embedding: List[float],
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
filter: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> List[Document]:
"""Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND
diversity
among selected documents.
Args:
self: An instance of the class
embedding: Embedding to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
fetch_k: Number of Documents to fetch to pass to MMR algorithm.
lambda_mult: Number between 0 and 1 that determines the degree
of diversity among the results with 0 corresponding
to maximum diversity and 1 to minimum diversity.
Defaults to 0.5.
filter: Optional[Dict[str, Any]]
**kwargs: Any
Returns:
List of Documents selected by maximal marginal relevance.
"""
docs_and_scores = self.max_marginal_relevance_search_with_score_by_vector(
embedding, k=k, fetch_k=fetch_k, lambda_mult=lambda_mult, filter=filter
)
return [doc for doc, _ in docs_and_scores]
@_handle_exceptions
def max_marginal_relevance_search(
self,
query: str,
k: int = 4,
fetch_k: int = 20,
lambda_mult: float = 0.5,
filter: Optional[Dict[str, Any]] = None,
**kwargs: Any,
) -> List[Document]:
"""Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND
diversity
among selected documents.
Args:
self: An instance of the class
query: Text to look up documents similar to.
k: Number of Documents to return. Defaults to 4.
fetch_k: Number of Documents to fetch to pass to MMR algorithm.
lambda_mult: Number between 0 and 1 that determines the degree
of diversity among the results with 0 corresponding
to maximum diversity and 1 to minimum diversity.
Defaults to 0.5.
filter: Optional[Dict[str, Any]]
**kwargs
Returns:
List of Documents selected by maximal marginal relevance.
`max_marginal_relevance_search` requires that `query` returns matched
embeddings alongside the matched documents.
"""
embedding = self._embed_query(query)
documents = self.max_marginal_relevance_search_by_vector(
embedding,
k=k,
fetch_k=fetch_k,
lambda_mult=lambda_mult,
filter=filter,
**kwargs,
)
return documents
@_handle_exceptions
def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
"""Delete by vector IDs.
Args:
self: An instance of the class
ids: List of ids to delete.
**kwargs
"""
if ids is None:
raise ValueError("No ids provided to delete.")
# Compute SHA-256 hashes of the ids and truncate them
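# (e.g. id "foo" -> sha256 "2c26b46b68ffc68f..." -> stored id "2C26B46B68FFC68F";
# this mirrors the id hashing applied when the rows were inserted)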
hashed_ids = [
hashlib.sha256(_id.encode()).hexdigest()[:16].upper() for _id in ids
]
# Constructing the SQL statement with individual placeholders
placeholders = ", ".join([":id" + str(i + 1) for i in range(len(hashed_ids))])
ddl = f"DELETE FROM {self.table_name} WHERE id IN ({placeholders})"
# Preparing bind variables
bind_vars = {
f"id{i}": hashed_id for i, hashed_id in enumerate(hashed_ids, start=1)
}
with self.client.cursor() as cursor:
cursor.execute(ddl, bind_vars)
self.client.commit()
@classmethod
@_handle_exceptions
def from_texts(
cls: Type[OracleVS],
texts: Iterable[str],
embedding: Embeddings,
metadatas: Optional[List[dict]] = None,
**kwargs: Any,
) -> OracleVS:
"""Return VectorStore initialized from texts and embeddings."""
client = kwargs.get("client")
if client is None:
raise ValueError("client parameter is required...")
params = kwargs.get("params", {})
table_name = str(kwargs.get("table_name", "langchain"))
distance_strategy = cast(
DistanceStrategy, kwargs.get("distance_strategy", None)
)
if not isinstance(distance_strategy, DistanceStrategy):
raise TypeError(
f"Expected DistanceStrategy got {type(distance_strategy).__name__}"
)
query = kwargs.get("query", "What is an Oracle database")
drop_table_purge(client, table_name)
vss = cls(
client=client,
embedding_function=embedding,
table_name=table_name,
distance_strategy=distance_strategy,
query=query,
params=params,
)
vss.add_texts(texts=list(texts), metadatas=metadatas)
return vss
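# --- Usage sketch (illustrative only, not part of the library code) ---
# A minimal example of from_texts, MMR search and delete as defined above.
# It assumes a reachable Oracle Database instance with AI Vector Search;
# the user, password, DSN and table name are placeholders, and
# HuggingFaceEmbeddings is just one possible embedding function.
def _usage_sketch() -> None:
    import oracledb
    from langchain_community.embeddings import HuggingFaceEmbeddings

    connection = oracledb.connect(
        user="<user>", password="<password>", dsn="<host>:1521/<service>"
    )
    embedder = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    # from_texts drops/recreates the table, embeds the texts and stores them.
    vs = OracleVS.from_texts(
        ["Oracle AI Vector Search", "LangChain integration"],
        embedder,
        metadatas=[{"id": "1"}, {"id": "2"}],
        client=connection,
        table_name="DEMO_VS",
        distance_strategy=DistanceStrategy.DOT_PRODUCT,
    )
    # Diversified retrieval via maximal marginal relevance, then cleanup by id.
    vs.max_marginal_relevance_search("vector search", k=2, fetch_k=10, lambda_mult=0.5)
    vs.delete(ids=["1", "2"])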

@ -5442,6 +5442,49 @@ text = ["spacy", "wordcloud (>=1.8.1)"]
torch = ["oracle_ads[viz]", "torch", "torchvision"]
viz = ["bokeh (>=3.0.0,<3.2.0)", "folium (>=0.12.1)", "graphviz (<0.17)", "scipy (>=1.5.4)", "seaborn (>=0.11.0)"]
[[package]]
name = "oracledb"
version = "2.2.0"
description = "Python interface to Oracle Database"
optional = true
python-versions = ">=3.7"
files = [
{file = "oracledb-2.2.0-cp310-cp310-macosx_10_9_universal2.whl", hash = "sha256:253a85eef53d97815b4d838e5275d0a99e33ec340eb4b945cd2371e2bcede46b"},
{file = "oracledb-2.2.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fa5c2982076366f59dade28b554b43a257ad426e55359124bc37f191f51c2d46"},
{file = "oracledb-2.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:19408844bd4af5b4d40f06c3e5b88c6bfce4a749f61ab766f41b22c4070c5c15"},
{file = "oracledb-2.2.0-cp310-cp310-win32.whl", hash = "sha256:c2e2e3f00d7eb7f4dabfa8996dc70db03bd7dbe474d2d1dc381daeff54cfdeff"},
{file = "oracledb-2.2.0-cp310-cp310-win_amd64.whl", hash = "sha256:efed536635b0fec5c1484eda55fad4affa57672b87596ec6273123a3133ba5b6"},
{file = "oracledb-2.2.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:c4b7e14b04dc2af4697ca561f9bcac110a67a7be2ccf868d789e92771017feca"},
{file = "oracledb-2.2.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:61bbf9cd64a2f3b65a12550329b2f0caed7d9aa5e892c0ce69d9ea7b3cb3cb8e"},
{file = "oracledb-2.2.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:4e461d1c7ef4d3f03d84595a13754390a62300976782d7c29efc07fcc915e1b3"},
{file = "oracledb-2.2.0-cp311-cp311-win32.whl", hash = "sha256:6c7da69d18cf02e469e15215af9c6f219256972a172c0e544a2ecc2a5cab9aa5"},
{file = "oracledb-2.2.0-cp311-cp311-win_amd64.whl", hash = "sha256:d0245f677e27ee0990eb0213485031dacdc837a89569563f1594b82ccb362255"},
{file = "oracledb-2.2.0-cp312-cp312-macosx_10_9_universal2.whl", hash = "sha256:10d2cd354a15e2b7e191256a0179874068fc64fa6543b2e20c9c1c38f0dd0839"},
{file = "oracledb-2.2.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:fbf07e0e88c9ff1555c9301d95c69e0d48263cf7df63172043fe0a042539e687"},
{file = "oracledb-2.2.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:c6a1365d3e05ca73b638ef939f9a609fed0ae5da75d13b2cfb75601ab8b85fce"},
{file = "oracledb-2.2.0-cp312-cp312-win32.whl", hash = "sha256:3fe57091a1463efac692b352e99f9daeab5ab375bab2060c5caba9a3a7743c15"},
{file = "oracledb-2.2.0-cp312-cp312-win_amd64.whl", hash = "sha256:e5ca9c050e18b2b1005b40d44a2098155445836071253ee5d547c7f285fc7729"},
{file = "oracledb-2.2.0-cp37-cp37m-macosx_10_9_x86_64.whl", hash = "sha256:b5ad105aabc8ff32e3d3a343a92cf84976cf2454b6a6ff02065383fc3863e68d"},
{file = "oracledb-2.2.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:14a7f2572c358604186d857c80f384ad03226e372731770911856541a06bdd34"},
{file = "oracledb-2.2.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:aa1fe78ed0cbf98593c1f3f620f751b725b189f8c845577e39a372f44b2bf384"},
{file = "oracledb-2.2.0-cp37-cp37m-win32.whl", hash = "sha256:bcef115bd147d6f267e3b09cbc3fc04189bff69e94d05c1e266c698668061e8d"},
{file = "oracledb-2.2.0-cp37-cp37m-win_amd64.whl", hash = "sha256:1272bf562bcd6ff5e23b1e1fe8c3363d7a66fe8f48b1e00c4fb081d5436e1df5"},
{file = "oracledb-2.2.0-cp38-cp38-macosx_11_0_universal2.whl", hash = "sha256:e0010aee0ed0a57964ce9f6cb0e2315a4ffce947121e0bb1c618e5091e64bab4"},
{file = "oracledb-2.2.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:437d7c5a36f7e72ca36e1ac3f1a7c087bffa1cd0ba3a84471e54506c8572a5ad"},
{file = "oracledb-2.2.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:581b7067283910a53b1ac1a50c0046058a21bd5c073d529bf695113db6d25f62"},
{file = "oracledb-2.2.0-cp38-cp38-win32.whl", hash = "sha256:97fdc27a15f6441434a7ef563f522c8ceac19c2933f2da1082125670a2e2fc6b"},
{file = "oracledb-2.2.0-cp38-cp38-win_amd64.whl", hash = "sha256:c22a2052997a01e59a4c9c33c9c0593eebcb1d893addeda9cd57003c2e088a85"},
{file = "oracledb-2.2.0-cp39-cp39-macosx_10_9_universal2.whl", hash = "sha256:b924ee3e7d41edb367e5bb4cbb30990ad447fedda9ef0fe29b691d36a8d338c2"},
{file = "oracledb-2.2.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:de3f9fa10b5f5c5dbe80dc7bdea5e5746abd411217e812fae66cc61c68f3f8f6"},
{file = "oracledb-2.2.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ba96a450275bceb5e0928e0dc01b5fb200e81ba04e99499d4930ccba681fd88a"},
{file = "oracledb-2.2.0-cp39-cp39-win32.whl", hash = "sha256:35b6524b57979dbe8463af06648ad9972bce06e014a292ad96fec34c62665a8b"},
{file = "oracledb-2.2.0-cp39-cp39-win_amd64.whl", hash = "sha256:0b4968f39871d501ab16a2fe05b5b4ae954e338e6b9dcefeb9bced998ddd4c4b"},
{file = "oracledb-2.2.0.tar.gz", hash = "sha256:f52c7df38b13243b5ce583457b80748a34682b9bb8370da2497868b71976798b"},
]
[package.dependencies]
cryptography = ">=3.2.1"
[[package]]
name = "orjson"
version = "3.9.15"
@ -10001,9 +10044,9 @@ testing = ["big-O", "jaraco.functools", "jaraco.itertools", "more-itertools", "p
[extras]
cli = ["typer"]
extended-testing = ["aiosqlite", "aleph-alpha-client", "anthropic", "arxiv", "assemblyai", "atlassian-python-api", "azure-ai-documentintelligence", "azure-identity", "azure-search-documents", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "cloudpickle", "cloudpickle", "cohere", "databricks-vectorsearch", "datasets", "dgml-utils", "elasticsearch", "esprima", "faiss-cpu", "feedparser", "fireworks-ai", "friendli-client", "geopandas", "gitpython", "google-cloud-documentai", "gql", "gradientai", "hdbcli", "hologres-vector", "html2text", "httpx", "httpx-sse", "javelin-sdk", "jinja2", "jq", "jsonschema", "lxml", "markdownify", "motor", "msal", "mwparserfromhell", "mwxml", "newspaper3k", "numexpr", "nvidia-riva-client", "oci", "openai", "openapi-pydantic", "oracle-ads", "pandas", "pdfminer-six", "pgvector", "praw", "premai", "psychicapi", "py-trello", "pyjwt", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "rapidocr-onnxruntime", "rdflib", "requests-toolbelt", "rspace_client", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "tidb-vector", "timescale-vector", "tqdm", "tree-sitter", "tree-sitter-languages", "upstash-redis", "vdms", "xata", "xmltodict"]
extended-testing = ["aiosqlite", "aleph-alpha-client", "anthropic", "arxiv", "assemblyai", "atlassian-python-api", "azure-ai-documentintelligence", "azure-identity", "azure-search-documents", "beautifulsoup4", "bibtexparser", "cassio", "chardet", "cloudpickle", "cloudpickle", "cohere", "databricks-vectorsearch", "datasets", "dgml-utils", "elasticsearch", "esprima", "faiss-cpu", "feedparser", "fireworks-ai", "friendli-client", "geopandas", "gitpython", "google-cloud-documentai", "gql", "gradientai", "hdbcli", "hologres-vector", "html2text", "httpx", "httpx-sse", "javelin-sdk", "jinja2", "jq", "jsonschema", "lxml", "markdownify", "motor", "msal", "mwparserfromhell", "mwxml", "newspaper3k", "numexpr", "nvidia-riva-client", "oci", "openai", "openapi-pydantic", "oracle-ads", "oracledb", "pandas", "pdfminer-six", "pgvector", "praw", "premai", "psychicapi", "py-trello", "pyjwt", "pymupdf", "pypdf", "pypdfium2", "pyspark", "rank-bm25", "rapidfuzz", "rapidocr-onnxruntime", "rdflib", "requests-toolbelt", "rspace_client", "scikit-learn", "sqlite-vss", "streamlit", "sympy", "telethon", "tidb-vector", "timescale-vector", "tqdm", "tree-sitter", "tree-sitter-languages", "upstash-redis", "vdms", "xata", "xmltodict"]
[metadata]
lock-version = "2.0"
python-versions = ">=3.8.1,<4.0"
content-hash = "981274d216c60197e6e1025870f85f167388ace0a5c5f30765c58ea40da9bff0"
content-hash = "a252d01de15be98dd394d8c848afdaf25217b8582767288e429793a56a40d42c"

@ -102,6 +102,7 @@ premai = {version = "^0.3.25", optional = true}
vdms = {version = "^0.0.20", optional = true}
httpx-sse = {version = "^0.4.0", optional = true}
pyjwt = {version = "^2.8.0", optional = true}
oracledb = {version = "^2.2.0", optional = true}
[tool.poetry.group.test]
optional = true
@ -279,7 +280,8 @@ extended_testing = [
"premai",
"vdms",
"httpx-sse",
"pyjwt"
"pyjwt",
"oracledb"
]
[tool.ruff]

@ -0,0 +1,447 @@
# Authors:
# Sudhir Kumar (sudhirkk)
#
# -----------------------------------------------------------------------------
# test_oracleds.py
# -----------------------------------------------------------------------------
import sys
from langchain_community.document_loaders.oracleai import (
OracleDocLoader,
OracleTextSplitter,
)
from langchain_community.utilities.oracleai import OracleSummary
from langchain_community.vectorstores.oraclevs import (
_table_exists,
drop_table_purge,
)
uname = "hr"
passwd = "hr"
# uname = "LANGCHAINUSER"
# passwd = "langchainuser"
v_dsn = "100.70.107.245:1521/cdb1_pdb1.regress.rdbms.dev.us.oracle.com"
### Test loader #####
def test_loader_test() -> None:
try:
import oracledb
except ImportError:
return
try:
# oracle connection
connection = oracledb.connect(user=uname, password=passwd, dsn=v_dsn)
cursor = connection.cursor()
if _table_exists(connection, "LANGCHAIN_DEMO"):
drop_table_purge(connection, "LANGCHAIN_DEMO")
cursor.execute("CREATE TABLE langchain_demo(id number, text varchar2(25))")
rows = [
(1, "First"),
(2, "Second"),
(3, "Third"),
(4, "Fourth"),
(5, "Fifth"),
(6, "Sixth"),
(7, "Seventh"),
]
cursor.executemany("insert into LANGCHAIN_DEMO(id, text) values (:1, :2)", rows)
connection.commit()
# local file, local directory, database column
loader_params = {
"owner": uname,
"tablename": "LANGCHAIN_DEMO",
"colname": "TEXT",
}
# instantiate
loader = OracleDocLoader(conn=connection, params=loader_params)
# load
docs = loader.load()
# verify
if len(docs) == 0:
sys.exit(1)
if _table_exists(connection, "LANGCHAIN_DEMO"):
drop_table_purge(connection, "LANGCHAIN_DEMO")
except Exception:
sys.exit(1)
try:
# expectation : ORA-00942
loader_params = {
"owner": uname,
"tablename": "COUNTRIES1",
"colname": "COUNTRY_NAME",
}
# instantiate
loader = OracleDocLoader(conn=connection, params=loader_params)
# load
docs = loader.load()
if len(docs) == 0:
pass
except Exception:
pass
try:
# expectation : file "SUDHIR" doesn't exist.
loader_params = {"file": "SUDHIR"}
# instantiate
loader = OracleDocLoader(conn=connection, params=loader_params)
# load
docs = loader.load()
if len(docs) == 0:
pass
except Exception:
pass
try:
# expectation : path "SUDHIR" doesn't exist.
loader_params = {"dir": "SUDHIR"}
# instantiate
loader = OracleDocLoader(conn=connection, params=loader_params)
# load
docs = loader.load()
if len(docs) == 0:
pass
except Exception:
pass
### Test splitter ####
def test_splitter_test() -> None:
try:
import oracledb
except ImportError:
return
try:
# oracle connection
connection = oracledb.connect(user=uname, password=passwd, dsn=v_dsn)
doc = """Langchain is a wonderful framework to load, split, chunk
and embed your data!!"""
# by words , max = 1000
splitter_params = {
"by": "words",
"max": "1000",
"overlap": "200",
"split": "custom",
"custom_list": [","],
"extended": "true",
"normalize": "all",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
sys.exit(1)
# by chars , max = 4000
splitter_params = {
"by": "chars",
"max": "4000",
"overlap": "800",
"split": "NEWLINE",
"normalize": "all",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
sys.exit(1)
# by words , max = 10
splitter_params = {
"by": "words",
"max": "10",
"overlap": "2",
"split": "SENTENCE",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
sys.exit(1)
# by chars , max = 50
splitter_params = {
"by": "chars",
"max": "50",
"overlap": "10",
"split": "SPACE",
"normalize": "all",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
sys.exit(1)
except Exception:
sys.exit(1)
try:
# ORA-20003: invalid value xyz for BY parameter
splitter_params = {"by": "xyz"}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
pass
except Exception:
pass
try:
# Expectation: ORA-30584: invalid text chunking MAXIMUM - '10'
splitter_params = {
"by": "chars",
"max": "10",
"overlap": "2",
"split": "SPACE",
"normalize": "all",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
pass
except Exception:
pass
try:
# Expectation: ORA-30584: invalid text chunking MAXIMUM - '5'
splitter_params = {
"by": "words",
"max": "5",
"overlap": "2",
"split": "SPACE",
"normalize": "all",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
pass
except Exception:
pass
try:
# Expectation: ORA-30586: invalid text chunking SPLIT BY - SENTENCE
splitter_params = {
"by": "words",
"max": "50",
"overlap": "2",
"split": "SENTENCE",
"normalize": "all",
}
# instantiate
splitter = OracleTextSplitter(conn=connection, params=splitter_params)
# generate chunks
chunks = splitter.split_text(doc)
# verify
if len(chunks) == 0:
pass
except Exception:
pass
#### Test summary ####
def test_summary_test() -> None:
try:
import oracledb
except ImportError:
return
try:
# oracle connection
connection = oracledb.connect(user=uname, password=passwd, dsn=v_dsn)
# provider : Database, glevel : Paragraph
summary_params = {
"provider": "database",
"glevel": "paragraph",
"numParagraphs": 2,
"language": "english",
}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
doc = """It was 7 minutes after midnight. The dog was lying on the grass in
the middle of the lawn in front of Mrs Shears house. Its eyes were closed. It
looked as if it was running on its side, the way dogs run when they think they
are chasing a cat in a dream. But the dog was not running or asleep. The dog
was dead. There was a garden fork sticking out of the dog. The points of the
fork must have gone all the way through the dog and into the ground because
the fork had not fallen over. I decided that the dog was probably killed with
the fork because I could not see any other wounds in the dog and I do not
think you would stick a garden fork into a dog after it had died for some
other reason, like cancer for example, or a road accident. But I could not be certain"""
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
sys.exit(1)
# provider : Database, glevel : Sentence
summary_params = {"provider": "database", "glevel": "Sentence"}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
sys.exit(1)
# provider : Database, glevel : P
summary_params = {"provider": "database", "glevel": "P"}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
sys.exit(1)
# provider : Database, glevel : S
summary_params = {
"provider": "database",
"glevel": "S",
"numParagraphs": 16,
"language": "english",
}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
sys.exit(1)
# provider : Database, glevel : S, doc = ' '
summary_params = {"provider": "database", "glevel": "S", "numParagraphs": 2}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
doc = " "
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
sys.exit(1)
except Exception:
sys.exit(1)
try:
# Expectation : DRG-11002: missing value for PROVIDER
summary_params = {"provider": "database1", "glevel": "S"}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
pass
except Exception:
pass
try:
# Expectation : DRG-11425: gist level SUDHIR is invalid,
# DRG-11427: valid gist level values are S, P
summary_params = {"provider": "database", "glevel": "SUDHIR"}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
pass
except Exception:
pass
try:
# Expectation : DRG-11441: gist numParagraphs -2 is invalid
summary_params = {"provider": "database", "glevel": "S", "numParagraphs": -2}
# summary
summary = OracleSummary(conn=connection, params=summary_params)
summaries = summary.get_summary(doc)
# verify
if len(summaries) == 0:
pass
except Exception:
pass
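# --- Illustrative pipeline sketch (not part of the tests) ---
# The tests above exercise OracleDocLoader, OracleTextSplitter and
# OracleSummary separately; chained together they look roughly like the
# following. Credentials, DSN, table/column names and parameter values are
# placeholders and one plausible configuration, not the only valid one.
def _pipeline_sketch() -> None:
    try:
        import oracledb
    except ImportError:
        return
    conn = oracledb.connect(user="<user>", password="<password>", dsn="<dsn>")
    loader = OracleDocLoader(
        conn=conn,
        params={"owner": "<user>", "tablename": "LANGCHAIN_DEMO", "colname": "TEXT"},
    )
    splitter = OracleTextSplitter(
        conn=conn,
        params={"by": "words", "max": "100", "overlap": "20", "normalize": "all"},
    )
    summarizer = OracleSummary(conn=conn, params={"provider": "database", "glevel": "S"})
    for doc in loader.load():
        chunks = splitter.split_text(doc.page_content)
        summaries = summarizer.get_summary(doc.page_content)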

@ -0,0 +1,955 @@
"""Test Oracle AI Vector Search functionality."""
# import required modules
import sys
import threading
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores.oraclevs import (
OracleVS,
_create_table,
_index_exists,
_table_exists,
create_index,
drop_index_if_exists,
drop_table_purge,
)
from langchain_community.vectorstores.utils import DistanceStrategy
username = ""
password = ""
dsn = ""
############################
####### table_exists #######
############################
def test_table_exists_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. Existing Table:(all capital letters)
# expectation:True
_table_exists(connection, "V$TRANSACTION")
# 2. Existing Table:(all small letters)
# expectation:True
_table_exists(connection, "v$transaction")
# 3. Non-Existing Table
# expectation:false
_table_exists(connection, "Hello")
# 4. Invalid Table Name
# Expectation:ORA-00903: invalid table name
try:
_table_exists(connection, "123")
except Exception:
pass
# 5. Empty String
# Expectation:ORA-00903: invalid table name
try:
_table_exists(connection, "")
except Exception:
pass
# 6. Special Character
# Expectation:ORA-00911: #: invalid character after FROM
try:
_table_exists(connection, "##4")
except Exception:
pass
# 7. Table name length > 128
# Expectation:ORA-00972: The identifier XXXXXXXXXX...XXXXXXXXXX...
# exceeds the maximum length of 128 bytes.
try:
_table_exists(connection, "x" * 129)
except Exception:
pass
# 8. <Schema_Name.Table_Name>
# Expectation:True
_create_table(connection, "TB1", 65535)
# 9. Toggle Case (like TaBlE)
# Expectation:True
_table_exists(connection, "Tb1")
drop_table_purge(connection, "TB1")
# 10. Table_Name→ "हिन्दी"
# Expectation:True
_create_table(connection, '"हिन्दी"', 545)
_table_exists(connection, '"हिन्दी"')
drop_table_purge(connection, '"हिन्दी"')
############################
####### create_table #######
############################
def test_create_table_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. New table - HELLO
# Dimension - 100
# Expectation:table is created
_create_table(connection, "HELLO", 100)
# 2. Existing table name
# HELLO
# Dimension - 110
# Expectation:Nothing happens
_create_table(connection, "HELLO", 110)
drop_table_purge(connection, "HELLO")
# 3. New Table - 123
# Dimension - 100
# Expectation:ORA-00903: invalid table name
try:
_create_table(connection, "123", 100)
drop_table_purge(connection, "123")
except Exception:
pass
# 4. New Table - Hello123
# Dimension - 65535
# Expectation:table is created
_create_table(connection, "Hello123", 65535)
drop_table_purge(connection, "Hello123")
# 5. New Table - T1
# Dimension - 65536
# Expectation:ORA-51801: VECTOR column type specification
# has an unsupported dimension count ('65536').
try:
_create_table(connection, "T1", 65536)
drop_table_purge(connection, "T1")
except Exception:
pass
# 6. New Table - T1
# Dimension - 0
# Expectation:ORA-51801: VECTOR column type specification has
# an unsupported dimension count (0).
try:
_create_table(connection, "T1", 0)
drop_table_purge(connection, "T1")
except Exception:
pass
# 7. New Table - T1
# Dimension - -1
# Expectation:ORA-51801: VECTOR column type specification has
# an unsupported dimension count ('-').
try:
_create_table(connection, "T1", -1)
drop_table_purge(connection, "T1")
except Exception:
pass
# 8. New Table - T2
# Dimension - '1000'
# Expectation:table is created
_create_table(connection, "T2", int("1000"))
drop_table_purge(connection, "T2")
# 9. New Table - T3
# Dimension - 100 passed as a variable
# Expectation:table is created
val = 100
_create_table(connection, "T3", val)
drop_table_purge(connection, "T3")
# 10.
# Expectation:ORA-00922: missing or invalid option
val2 = """H
ello"""
try:
_create_table(connection, val2, 545)
drop_table_purge(connection, val2)
except Exception:
pass
# 11. New Table - हिन्दी
# Dimension - 545
# Expectation:table is created
_create_table(connection, '"हिन्दी"', 545)
drop_table_purge(connection, '"हिन्दी"')
# 12. <schema_name.table_name>
# Expectation:failure - user does not exist
try:
_create_table(connection, "U1.TB4", 128)
drop_table_purge(connection, "U1.TB4")
except Exception:
pass
# 13.
# Expectation:table is created
_create_table(connection, '"T5"', 128)
drop_table_purge(connection, '"T5"')
# 14. Toggle Case
# Expectation:table creation fails
try:
_create_table(connection, "TaBlE", 128)
drop_table_purge(connection, "TaBlE")
except Exception:
pass
# 15. table_name as empty_string
# Expectation: ORA-00903: invalid table name
try:
_create_table(connection, "", 128)
drop_table_purge(connection, "")
_create_table(connection, '""', 128)
drop_table_purge(connection, '""')
except Exception:
pass
# 16. Arithmetic Operations in dimension parameter
# Expectation:table is created
n = 1
_create_table(connection, "T10", n + 500)
drop_table_purge(connection, "T10")
# 17. String Operations in table_name&dimension parameter
# Expectation:table is created
_create_table(connection, "YaSh".replace("aS", "ok"), 500)
drop_table_purge(connection, "YaSh".replace("aS", "ok"))
##################################
####### create_hnsw_index #######
##################################
def test_create_hnsw_index_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. Table_name - TB1
# New Index
# distance_strategy - DistanceStrategy.Dot_product
# Expectation:Index created
model1 = HuggingFaceEmbeddings(
model_name="sentence-transformers/paraphrase-mpnet-base-v2"
)
vs = OracleVS(connection, model1, "TB1", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs)
# 2. Creating same index again
# Table_name - TB1
# Expectation:Nothing happens
try:
create_index(connection, vs)
drop_index_if_exists(connection, "HNSW")
except Exception:
pass
drop_table_purge(connection, "TB1")
# 3. Create index with following parameters:
# idx_name - hnsw_idx2
# idx_type - HNSW
# Expectation:Index created
vs = OracleVS(connection, model1, "TB2", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": "hnsw_idx2", "idx_type": "HNSW"})
drop_index_if_exists(connection, "hnsw_idx2")
drop_table_purge(connection, "TB2")
# 4. Table Name - TB1
# idx_name - "हिन्दी"
# idx_type - HNSW
# Expectation:Index created
try:
vs = OracleVS(connection, model1, "TB3", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": '"हिन्दी"', "idx_type": "HNSW"})
drop_index_if_exists(connection, '"हिन्दी"')
except Exception:
pass
drop_table_purge(connection, "TB3")
# 5. idx_name passed empty
# Expectation:ORA-01741: illegal zero-length identifier
try:
vs = OracleVS(connection, model1, "TB4", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": '""', "idx_type": "HNSW"})
drop_index_if_exists(connection, '""')
except Exception:
pass
drop_table_purge(connection, "TB4")
# 6. idx_type left empty
# Expectation:Index created
try:
vs = OracleVS(connection, model1, "TB5", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": "Hello", "idx_type": ""})
drop_index_if_exists(connection, "Hello")
except Exception:
pass
drop_table_purge(connection, "TB5")
# 7. efconstruction passed as parameter but not neighbours
# Expectation:Index created
vs = OracleVS(connection, model1, "TB7", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={"idx_name": "idx11", "efConstruction": 100, "idx_type": "HNSW"},
)
drop_index_if_exists(connection, "idx11")
drop_table_purge(connection, "TB7")
# 8. efconstruction passed as parameter as well as neighbours
# (for this idx_type parameter is also necessary)
# Expectation:Index created
vs = OracleVS(connection, model1, "TB8", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 100,
"neighbors": 80,
"idx_type": "HNSW",
},
)
drop_index_if_exists(connection, "idx11")
drop_table_purge(connection, "TB8")
# 9. Limit of Values for(integer values):
# parallel
# efConstruction
# Neighbors
# Accuracy
# 0<Accuracy<=100
# 0<Neighbour<=2048
# 0<efConstruction<=65535
# 0<parallel<=255
# Expectation:Index created
drop_table_purge(connection, "TB9")
vs = OracleVS(connection, model1, "TB9", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 65535,
"neighbors": 2048,
"idx_type": "HNSW",
"parallel": 255,
},
)
drop_index_if_exists(connection, "idx11")
drop_table_purge(connection, "TB9")
# index not created:
try:
vs = OracleVS(connection, model1, "TB10", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 0,
"neighbors": 2048,
"idx_type": "HNSW",
"parallel": 255,
},
)
drop_index_if_exists(connection, "idx11")
except Exception:
pass
# index not created:
try:
vs = OracleVS(connection, model1, "TB11", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 100,
"neighbors": 0,
"idx_type": "HNSW",
"parallel": 255,
},
)
drop_index_if_exists(connection, "idx11")
except Exception:
pass
# index not created
try:
vs = OracleVS(connection, model1, "TB12", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 100,
"neighbors": 100,
"idx_type": "HNSW",
"parallel": 0,
},
)
drop_index_if_exists(connection, "idx11")
except Exception:
pass
# index not created
try:
vs = OracleVS(connection, model1, "TB13", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 10,
"neighbors": 100,
"idx_type": "HNSW",
"parallel": 10,
"accuracy": 120,
},
)
drop_index_if_exists(connection, "idx11")
except Exception:
pass
# with negative values/out-of-bound values for all 4 of them, we get the same errors
# Expectation:Index not created
try:
vs = OracleVS(connection, model1, "TB14", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 200,
"neighbors": 100,
"idx_type": "HNSW",
"parallel": "hello",
"accuracy": 10,
},
)
drop_index_if_exists(connection, "idx11")
except Exception:
pass
drop_table_purge(connection, "TB10")
drop_table_purge(connection, "TB11")
drop_table_purge(connection, "TB12")
drop_table_purge(connection, "TB13")
drop_table_purge(connection, "TB14")
# 10. Table_name as <schema_name.table_name>
# Expectation:Index created
vs = OracleVS(connection, model1, "TB15", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
params={
"idx_name": "idx11",
"efConstruction": 200,
"neighbors": 100,
"idx_type": "HNSW",
"parallel": 8,
"accuracy": 10,
},
)
drop_index_if_exists(connection, "idx11")
drop_table_purge(connection, "TB15")
# 11. index_name as <schema_name.index_name>
# Expectation:U1 not present
try:
vs = OracleVS(
connection, model1, "U1.TB16", DistanceStrategy.EUCLIDEAN_DISTANCE
)
create_index(
connection,
vs,
params={
"idx_name": "U1.idx11",
"efConstruction": 200,
"neighbors": 100,
"idx_type": "HNSW",
"parallel": 8,
"accuracy": 10,
},
)
drop_index_if_exists(connection, "U1.idx11")
drop_table_purge(connection, "TB16")
except Exception:
pass
# 12. Index_name size >129
# Expectation:Index not created
try:
vs = OracleVS(connection, model1, "TB17", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": "x" * 129, "idx_type": "HNSW"})
drop_index_if_exists(connection, "x" * 129)
except Exception:
pass
drop_table_purge(connection, "TB17")
# 13. Index_name size 128
# Expectation:Index created
vs = OracleVS(connection, model1, "TB18", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": "x" * 128, "idx_type": "HNSW"})
drop_index_if_exists(connection, "x" * 128)
drop_table_purge(connection, "TB18")
##################################
####### index_exists #############
##################################
def test_index_exists_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
model1 = HuggingFaceEmbeddings(
model_name="sentence-transformers/paraphrase-mpnet-base-v2"
)
# 1. Existing Index:(all capital letters)
# Expectation:true
vs = OracleVS(connection, model1, "TB1", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, params={"idx_name": "idx11", "idx_type": "HNSW"})
_index_exists(connection, "IDX11")
# 2. Existing Table:(all small letters)
# Expectation:true
_index_exists(connection, "idx11")
# 3. Non-Existing Index
# Expectation:False
_index_exists(connection, "Hello")
# 4. Invalid Index Name
# Expectation:Error
try:
_index_exists(connection, "123")
except Exception:
pass
# 5. Empty String
# Expectation:Error
try:
_index_exists(connection, "")
except Exception:
pass
try:
_index_exists(connection, "")
except Exception:
pass
# 6. Special Character
# Expectation:Error
try:
_index_exists(connection, "##4")
except Exception:
pass
# 7. Index name length > 128
# Expectation:Error
try:
_index_exists(connection, "x" * 129)
except Exception:
pass
# 8. <Schema_Name.Index_Name>
# Expectation:true
_index_exists(connection, "U1.IDX11")
# 9. Toggle Case (like iDx11)
# Expectation:true
_index_exists(connection, "IdX11")
# 10. Index_Name→ "हिन्दी"
# Expectation:true
drop_index_if_exists(connection, "idx11")
try:
create_index(connection, vs, params={"idx_name": '"हिन्दी"', "idx_type": "HNSW"})
_index_exists(connection, '"हिन्दी"')
except Exception:
pass
drop_table_purge(connection, "TB1")
##################################
####### add_texts ################
##################################
def test_add_texts_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. Add 2 records to table
# Expectation:Successful
texts = ["Rohan", "Shailendra"]
metadata = [
{"id": "100", "link": "Document Example Test 1"},
{"id": "101", "link": "Document Example Test 2"},
]
model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vs_obj = OracleVS(connection, model, "TB1", DistanceStrategy.EUCLIDEAN_DISTANCE)
vs_obj.add_texts(texts, metadata)
drop_table_purge(connection, "TB1")
# 2. Add record but metadata is not there
# Expectation:An exception occurred :: Either specify an 'ids' list or
# 'metadatas' with an 'id' attribute for each element.
model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vs_obj = OracleVS(connection, model, "TB2", DistanceStrategy.EUCLIDEAN_DISTANCE)
texts2 = ["Sri Ram", "Krishna"]
vs_obj.add_texts(texts2)
drop_table_purge(connection, "TB2")
# 3. Add record with ids option
# ids are passed as string
# ids are passed as empty string
# ids are passed as multi-line string
# ids are passed as "<string>"
# Expectations:
# Successful
# Successful
# Successful
# Successful
vs_obj = OracleVS(connection, model, "TB4", DistanceStrategy.EUCLIDEAN_DISTANCE)
ids3 = ["114", "124"]
vs_obj.add_texts(texts2, ids=ids3)
drop_table_purge(connection, "TB4")
vs_obj = OracleVS(connection, model, "TB5", DistanceStrategy.EUCLIDEAN_DISTANCE)
ids4 = ["", "134"]
vs_obj.add_texts(texts2, ids=ids4)
drop_table_purge(connection, "TB5")
vs_obj = OracleVS(connection, model, "TB6", DistanceStrategy.EUCLIDEAN_DISTANCE)
ids5 = [
"""Good afternoon
my friends""",
"India",
]
vs_obj.add_texts(texts2, ids=ids5)
drop_table_purge(connection, "TB6")
vs_obj = OracleVS(connection, model, "TB7", DistanceStrategy.EUCLIDEAN_DISTANCE)
ids6 = ['"Good afternoon"', '"India"']
vs_obj.add_texts(texts2, ids=ids6)
drop_table_purge(connection, "TB7")
# 4. Add records with ids and metadatas
# Expectation:Successful
vs_obj = OracleVS(connection, model, "TB8", DistanceStrategy.EUCLIDEAN_DISTANCE)
texts3 = ["Sri Ram 6", "Krishna 6"]
ids7 = ["1", "2"]
metadata = [
{"id": "102", "link": "Document Example", "stream": "Science"},
{"id": "104", "link": "Document Example 45"},
]
vs_obj.add_texts(texts3, metadata, ids=ids7)
drop_table_purge(connection, "TB8")
# 5. Add 10000 records
# Expectation:Successful
vs_obj = OracleVS(connection, model, "TB9", DistanceStrategy.EUCLIDEAN_DISTANCE)
texts4 = ["Sri Ram{0}".format(i) for i in range(1, 10000)]
ids8 = ["Hello{0}".format(i) for i in range(1, 10000)]
vs_obj.add_texts(texts4, ids=ids8)
drop_table_purge(connection, "TB9")
# 6. Add 2 different records concurrently
# Expectation:Successful
def add(val: str) -> None:
model = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-mpnet-base-v2"
)
vs_obj = OracleVS(
connection, model, "TB10", DistanceStrategy.EUCLIDEAN_DISTANCE
)
texts5 = [val]
ids9 = texts5
vs_obj.add_texts(texts5, ids=ids9)
thread_1 = threading.Thread(target=add, args=("Sri Ram",))
thread_2 = threading.Thread(target=add, args=("Sri Krishna",))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
drop_table_purge(connection, "TB10")
# 7. Add 2 identical records concurrently
# Expectation: Successful; one of the inserts gets a primary key violation error
def add1(val: str) -> None:
model = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-mpnet-base-v2"
)
vs_obj = OracleVS(
connection, model, "TB11", DistanceStrategy.EUCLIDEAN_DISTANCE
)
texts = [val]
ids10 = texts
vs_obj.add_texts(texts, ids=ids10)
try:
thread_1 = threading.Thread(target=add1, args=("Sri Ram",))
thread_2 = threading.Thread(target=add1, args=("Sri Ram",))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
except Exception:
pass
drop_table_purge(connection, "TB11")
# 8. create object with table name of type <schema_name.table_name>
# Expectation:U1 does not exist
try:
vs_obj = OracleVS(connection, model, "U1.TB14", DistanceStrategy.DOT_PRODUCT)
for i in range(1, 10):
texts7 = ["Yash{0}".format(i)]
ids13 = ["1234{0}".format(i)]
vs_obj.add_texts(texts7, ids=ids13)
drop_table_purge(connection, "TB14")
except Exception:
pass
##################################
####### embed_documents(text) ####
##################################
def test_embed_documents_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. String Example-'Sri Ram'
# Expectation:Vector Printed
model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vs_obj = OracleVS(connection, model, "TB7", DistanceStrategy.EUCLIDEAN_DISTANCE)
# 4. List
# Expectation:Vector Printed
vs_obj._embed_documents(["hello", "yash"])
drop_table_purge(connection, "TB7")
##################################
####### embed_query(text) ########
##################################
def test_embed_query_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. String
# Expectation:Vector printed
model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vs_obj = OracleVS(connection, model, "TB8", DistanceStrategy.EUCLIDEAN_DISTANCE)
vs_obj._embed_query("Sri Ram")
drop_table_purge(connection, "TB8")
# 3. Empty string
# Expectation:[]
vs_obj._embed_query("")
##################################
####### create_index #############
##################################
def test_create_index_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
# 1. No optional parameters passed
# Expectation:Successful
model1 = HuggingFaceEmbeddings(
model_name="sentence-transformers/paraphrase-mpnet-base-v2"
)
vs = OracleVS(connection, model1, "TB1", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs)
drop_index_if_exists(connection, "HNSW")
drop_table_purge(connection, "TB1")
# 2. ivf index
# Expectation:Successful
vs = OracleVS(connection, model1, "TB2", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, {"idx_type": "IVF", "idx_name": "IVF"})
drop_index_if_exists(connection, "IVF")
drop_table_purge(connection, "TB2")
# 3. ivf index with neighbour_part passed as parameter
# Expectation:Successful
vs = OracleVS(connection, model1, "TB3", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, {"idx_type": "IVF", "neighbor_part": 10})
drop_index_if_exists(connection, "IVF")
drop_table_purge(connection, "TB3")
# 4. ivf index with neighbour_part and accuracy passed as parameter
# Expectation:Successful
vs = OracleVS(connection, model1, "TB4", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection, vs, {"idx_type": "IVF", "neighbor_part": 10, "accuracy": 90}
)
drop_index_if_exists(connection, "IVF")
drop_table_purge(connection, "TB4")
# 5. ivf index with neighbour_part and parallel passed as parameter
# Expectation:Successful
vs = OracleVS(connection, model1, "TB5", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection, vs, {"idx_type": "IVF", "neighbor_part": 10, "parallel": 90}
)
drop_index_if_exists(connection, "IVF")
drop_table_purge(connection, "TB5")
# 6. ivf index and then perform dml(insert)
# Expectation:Successful
vs = OracleVS(connection, model1, "TB6", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(connection, vs, {"idx_type": "IVF", "idx_name": "IVF"})
texts = ["Sri Ram", "Krishna"]
vs.add_texts(texts)
# perform delete
vs.delete(["hello"])
drop_index_if_exists(connection, "IVF")
drop_table_purge(connection, "TB6")
# 7. ivf index with neighbour_part,parallel and accuracy passed as parameter
# Expectation:Successful
vs = OracleVS(connection, model1, "TB7", DistanceStrategy.EUCLIDEAN_DISTANCE)
create_index(
connection,
vs,
{"idx_type": "IVF", "neighbor_part": 10, "parallel": 90, "accuracy": 99},
)
drop_index_if_exists(connection, "IVF")
drop_table_purge(connection, "TB7")
##################################
####### perform_search ###########
##################################
def test_perform_search_test() -> None:
try:
import oracledb
except ImportError:
return
try:
connection = oracledb.connect(user=username, password=password, dsn=dsn)
except Exception:
sys.exit(1)
model1 = HuggingFaceEmbeddings(
model_name="sentence-transformers/paraphrase-mpnet-base-v2"
)
vs_1 = OracleVS(connection, model1, "TB10", DistanceStrategy.EUCLIDEAN_DISTANCE)
vs_2 = OracleVS(connection, model1, "TB11", DistanceStrategy.DOT_PRODUCT)
vs_3 = OracleVS(connection, model1, "TB12", DistanceStrategy.COSINE)
vs_4 = OracleVS(connection, model1, "TB13", DistanceStrategy.EUCLIDEAN_DISTANCE)
vs_5 = OracleVS(connection, model1, "TB14", DistanceStrategy.DOT_PRODUCT)
vs_6 = OracleVS(connection, model1, "TB15", DistanceStrategy.COSINE)
# vector store lists:
vs_list = [vs_1, vs_2, vs_3, vs_4, vs_5, vs_6]
for i, vs in enumerate(vs_list, start=1):
# insert data
texts = ["Yash", "Varanasi", "Yashaswi", "Mumbai", "BengaluruYash"]
metadatas = [
{"id": "hello"},
{"id": "105"},
{"id": "106"},
{"id": "yash"},
{"id": "108"},
]
vs.add_texts(texts, metadatas)
# create index
if i == 1 or i == 2 or i == 3:
create_index(connection, vs, {"idx_type": "HNSW", "idx_name": f"IDX1{i}"})
else:
create_index(connection, vs, {"idx_type": "IVF", "idx_name": f"IDX1{i}"})
# perform search
query = "YashB"
filter = {"id": ["106", "108", "yash"]}
# similarity_search without filter
vs.similarity_search(query, 2)
# similarity_search with filter
vs.similarity_search(query, 2, filter=filter)
# Similarity search with relevance score
vs.similarity_search_with_score(query, 2)
# Similarity search with relevance score with filter
vs.similarity_search_with_score(query, 2, filter=filter)
# Max marginal relevance search
vs.max_marginal_relevance_search(query, 2, fetch_k=20, lambda_mult=0.5)
# Max marginal relevance search with filter
vs.max_marginal_relevance_search(
query, 2, fetch_k=20, lambda_mult=0.5, filter=filter
)
drop_table_purge(connection, "TB10")
drop_table_purge(connection, "TB11")
drop_table_purge(connection, "TB12")
drop_table_purge(connection, "TB13")
drop_table_purge(connection, "TB14")
drop_table_purge(connection, "TB15")
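# --- Illustrative index-creation sketch (not part of the tests) ---
# Summarizes the create_index usage exercised above, with HNSW build
# parameters chosen inside the documented bounds
# (0 < accuracy <= 100, 0 < neighbors <= 2048,
#  0 < efConstruction <= 65535, 0 < parallel <= 255).
# Table and index names are placeholders; credentials come from the
# module-level username/password/dsn settings.
def _index_sketch() -> None:
    try:
        import oracledb
    except ImportError:
        return
    connection = oracledb.connect(user=username, password=password, dsn=dsn)
    model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-mpnet-base-v2"
    )
    vs = OracleVS(connection, model, "DEMO_VS", DistanceStrategy.EUCLIDEAN_DISTANCE)
    # HNSW index with explicit, in-bounds build parameters.
    create_index(
        connection,
        vs,
        params={
            "idx_name": "DEMO_HNSW",
            "idx_type": "HNSW",
            "efConstruction": 200,
            "neighbors": 64,
            "parallel": 8,
            "accuracy": 90,
        },
    )
    drop_index_if_exists(connection, "DEMO_HNSW")
    # IVF variant with a neighbor-partition count.
    create_index(
        connection,
        vs,
        params={"idx_name": "DEMO_IVF", "idx_type": "IVF", "neighbor_part": 10},
    )
    drop_index_if_exists(connection, "DEMO_IVF")
    drop_table_purge(connection, "DEMO_VS")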

@ -113,6 +113,8 @@ EXPECTED_ALL = [
"OnlinePDFLoader",
"OpenCityDataLoader",
"OracleAutonomousDatabaseLoader",
"OracleDocLoader",
"OracleTextSplitter",
"OutlookMessageLoader",
"PDFMinerLoader",
"PDFMinerPDFasHTMLLoader",

@ -57,6 +57,7 @@ EXPECTED_ALL = [
"ErnieEmbeddings",
"JavelinAIGatewayEmbeddings",
"OllamaEmbeddings",
"OracleEmbeddings",
"QianfanEmbeddingsEndpoint",
"JohnSnowLabsEmbeddings",
"VoyageEmbeddings",

@ -34,6 +34,7 @@ EXPECTED_ALL = [
"NVIDIARivaTTS",
"NVIDIARivaStream",
"OpenWeatherMapAPIWrapper",
"OracleSummary",
"OutlineAPIWrapper",
"NutritionAIAPI",
"Portkey",

@ -60,6 +60,7 @@ EXPECTED_ALL = [
"Neo4jVector",
"NeuralDBVectorStore",
"OpenSearchVectorSearch",
"OracleVS",
"PGEmbedding",
"PGVector",
"PathwayVectorClient",

@ -73,6 +73,7 @@ def test_compatible_vectorstore_documentation() -> None:
"MomentoVectorIndex",
"MyScale",
"OpenSearchVectorSearch",
"OracleVS",
"PGVector",
"Pinecone",
"Qdrant",

@ -55,6 +55,7 @@ _EXPECTED = [
"MyScaleSettings",
"Neo4jVector",
"OpenSearchVectorSearch",
"OracleVS",
"PGEmbedding",
"PGVector",
"PathwayVectorClient",
