You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
langchain/docs/docs/integrations/retrievers/embedchain.ipynb

254 lines
9.3 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "2f0f85ac-9c49-4111-a320-e53bccc99b13",
"metadata": {},
"source": [
"# Embedchain\n",
"\n",
">[Embedchain](https://github.com/embedchain/embedchain) is a RAG framework to create data pipelines. It loads, indexes, retrieves and syncs all the data.\n",
">\n",
">It is available as an [open source package](https://github.com/embedchain/embedchain) and as a [hosted platform solution](https://app.embedchain.ai/).\n",
"\n",
"This notebook shows how to use a retriever that uses `Embedchain`."
]
},
{
"cell_type": "markdown",
"id": "e48de822-307b-4284-96e7-c91f11ce005b",
"metadata": {},
"source": [
"# Installation\n",
"\n",
"First you will need to install the [`embedchain` package](https://pypi.org/project/embedchain/). \n",
"\n",
"You can install the package by running "
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "c690a78c-5999-4072-b4e1-2712ff73f950",
"metadata": {},
"outputs": [],
"source": [
"%pip install --upgrade --quiet embedchain"
]
},
{
"cell_type": "markdown",
"id": "bc89ba12-6ebd-4cd6-8c85-7410531579ff",
"metadata": {},
"source": [
"# Create New Retriever\n",
"\n",
"`EmbedchainRetriever` has a static `.create()` factory method that takes the following arguments:\n",
"\n",
"* `yaml_path: string` optional -- Path to the YAML configuration file. If not provided, a default configuration is used. You can browse the [docs](https://docs.embedchain.ai/) to explore various customization options."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "8e639bd4-2e60-487b-b7aa-f7e6b921b069",
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
" ········\n"
]
}
],
"source": [
"# Setup API Key\n",
"\n",
"import os\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "223fbc76-91ad-4504-87e9-980fb0e027fc",
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.retrievers import EmbedchainRetriever\n",
"\n",
"# create a retriever with default options\n",
"retriever = EmbedchainRetriever.create()\n",
"\n",
"# or if you want to customize, pass the yaml config path\n",
"# retriever = EmbedchainRetiever.create(yaml_path=\"config.yaml\")"
]
},
{
"cell_type": "markdown",
"id": "536f3a1d-3491-45b5-9f25-869bd6fb6d6a",
"metadata": {},
"source": [
"# Add Data\n",
"\n",
"In embedchain, you can as many supported data types as possible. You can browse our [docs](https://docs.embedchain.ai/) to see the data types supported.\n",
"\n",
"Embedchain automatically deduces the types of the data. So you can add a string, URL or local file path."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "31262be3-7d0d-42e8-9253-052160576dc7",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:08<00:00, 2.22s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully saved https://en.wikipedia.org/wiki/Elon_Musk (DataType.WEB_PAGE). New chunks count: 378\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.17s/it]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully saved https://www.forbes.com/profile/elon-musk (DataType.WEB_PAGE). New chunks count: 13\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Inserting batches in chromadb: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.25s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Successfully saved https://www.youtube.com/watch?v=RcYjXbSJBN8 (DataType.YOUTUBE_VIDEO). New chunks count: 53\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/plain": [
"['1eab8dd1ffa92906f7fc839862871ca5',\n",
" '8cf46026cabf9b05394a2658bd1fe890',\n",
" 'da3227cdbcedb018e05c47b774d625f6']"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.add_texts(\n",
" [\n",
" \"https://en.wikipedia.org/wiki/Elon_Musk\",\n",
" \"https://www.forbes.com/profile/elon-musk\",\n",
" \"https://www.youtube.com/watch?v=RcYjXbSJBN8\",\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "e1f34a62-7f8e-4c03-8e10-c317ed3296aa",
"metadata": {},
"source": [
"# Use Retriever\n",
"\n",
"You can now use the retrieve to find relevant documents given a query"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "6106baf9-652a-4a94-b2d7-d6a5d2917975",
"metadata": {},
"outputs": [],
"source": [
"result = retriever.invoke(\"How many companies does Elon Musk run and name those?\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "1deae5d0-e0fa-431d-b164-e9680ef3e69b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[Document(page_content='Views Filmography Companies Zip2 X.com PayPal SpaceX Starlink Tesla, Inc. Energycriticismlitigation OpenAI Neuralink The Boring Company Thud X Corp. Twitteracquisitiontenure as CEO xAI In popular culture Elon Musk (Isaacson) Elon Musk (Vance) Ludicrous Power Play \"Members Only\" \"The Platonic Permutation\" \"The Musk Who Fell to Earth\" \"One Crew over the Crewcoo\\'s Morty\" Elon Musk\\'s Crash Course Related Boring Test Tunnel Hyperloop Musk family Musk vs. Zuckerberg SolarCity Tesla Roadster in space', metadata={'source': 'https://en.wikipedia.org/wiki/Elon_Musk', 'document_id': 'c33c05d0-5028-498b-b5e3-c43a4f9e8bf8--3342161a0fbc19e91f6bf387204aa30fbb2cea05abc81882502476bde37b9392'}),\n",
" Document(page_content='Elon Musk PROFILEElon MuskCEO, Tesla$241.2B$508M (0.21%)Real Time Net Worthas of 11/18/23Reflects change since 5 pm ET of prior trading day. 1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5', metadata={'source': 'https://www.forbes.com/profile/elon-musk', 'document_id': 'c33c05d0-5028-498b-b5e3-c43a4f9e8bf8--3c8573134c575fafc025e9211413723e1f7a725b5936e8ee297fb7fb63bdd01a'}),\n",
" Document(page_content='to form PayPal. In October 2002, eBay acquired PayPal for $1.5 billion, and that same year, with $100 million of the money he made, Musk founded SpaceX, a spaceflight services company. In 2004, he became an early investor in electric vehicle manufacturer Tesla Motors, Inc. (now Tesla, Inc.). He became its chairman and product architect, assuming the position of CEO in 2008. In 2006, Musk helped create SolarCity, a solar-energy company that was acquired by Tesla in 2016 and became Tesla Energy.', metadata={'source': 'https://en.wikipedia.org/wiki/Elon_Musk', 'document_id': 'c33c05d0-5028-498b-b5e3-c43a4f9e8bf8--3342161a0fbc19e91f6bf387204aa30fbb2cea05abc81882502476bde37b9392'})]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"result"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3f26c2b-048d-4588-90a0-50f5c9c35837",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}