{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Typesense\n", "\n", "> [Typesense](https://typesense.org) is an open-source, in-memory search engine that you can either [self-host](https://typesense.org/docs/guide/install-typesense#option-2-local-machine-self-hosting) or run on [Typesense Cloud](https://cloud.typesense.org/).\n", ">\n", "> Typesense focuses on performance by storing the entire index in RAM (with a backup on disk), and on providing an out-of-the-box developer experience by simplifying the available options and setting good defaults.\n", ">\n", "> It also lets you combine attribute-based filtering with vector queries to fetch the most relevant documents." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows you how to use Typesense as your VectorStore." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first install our dependencies:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "%pip install --upgrade --quiet typesense openapi-schema-pydantic langchain-community langchain-text-splitters langchain-openai tiktoken" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to use `OpenAIEmbeddings`, so we need an OpenAI API key." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2023-05-23T22:48:02.968822Z", "start_time": "2023-05-23T22:47:48.574094Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API Key:\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2023-05-23T22:50:34.775893Z", "start_time": "2023-05-23T22:50:34.771889Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from langchain_community.document_loaders import TextLoader\n", "from langchain_community.vectorstores import Typesense\n", "from langchain_openai import OpenAIEmbeddings\n", "from langchain_text_splitters import CharacterTextSplitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load our test dataset and split it into chunks:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "ExecuteTime": { "end_time": "2023-05-23T22:56:19.093489Z", "start_time": "2023-05-23T22:56:19.089Z" }, "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Load the raw text file as a list of documents\n", "loader = TextLoader(\"../../modules/state_of_the_union.txt\")\n", "documents = loader.load()\n", "# Split the text into chunks of about 1000 characters, with no overlap\n", "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n", "docs = text_splitter.split_documents(documents)\n", "\n", "# Embedding model used to vectorize the chunks and the queries\n", "embeddings = OpenAIEmbeddings()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "docsearch = Typesense.from_documents(\n", " docs,\n", " embeddings,\n", " typesense_client_params={\n", " \"host\": \"localhost\", # Use xxx.a1.typesense.net for Typesense Cloud\n", " \"port\": \"8108\", # Use 443 for Typesense Cloud\n", " \"protocol\": \"http\", # Use https for Typesense Cloud\n", " \"typesense_api_key\": \"xyz\",\n", " 
\"typesense_collection_name\": \"lang-chain\",\n", " },\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Similarity Search" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "query = \"What did the president say about Ketanji Brown Jackson\"\n", "found_docs = docsearch.similarity_search(query)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "print(found_docs[0].page_content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Typesense as a Retriever\n", "\n", "Typesense, like all the other vector stores, can be used as a LangChain Retriever, ranking results by cosine similarity." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "retriever = docsearch.as_retriever()\n", "retriever" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "query = \"What did the president say about Ketanji Brown Jackson\"\n", "retriever.invoke(query)[0]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 4 }