{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"# OpenVINO Reranker\n",
"\n",
"[OpenVINO™](https://github.com/openvinotoolkit/openvino) is an open-source toolkit for optimizing and deploying AI inference. The OpenVINO™ Runtime supports various hardware [devices](https://github.com/openvinotoolkit/openvino?tab=readme-ov-file#supported-hardware-matrix) including x86 and ARM CPUs, and Intel GPUs. It can help to boost deep learning performance in Computer Vision, Automatic Speech Recognition, Natural Language Processing and other common tasks.\n",
"\n",
"Hugging Face rerank model can be supported by OpenVINO through ``OpenVINOReranker`` class. If you have an Intel GPU, you can specify `model_kwargs={\"device\": \"GPU\"}` to run inference on it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"pycharm": {
"is_executing": true
}
},
"outputs": [],
"source": [
"%pip install --upgrade-strategy eager \"optimum[openvino,nncf]\" --quiet\n",
"%pip install --upgrade --quiet faiss-cpu"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"# Helper function for printing docs\n",
"\n",
"\n",
"def pretty_print_docs(docs):\n",
" print(\n",
" f\"\\n{'-' * 100}\\n\".join(\n",
" [\n",
" f\"Document {i+1}:\\n\\n{d.page_content}\\nMetadata: {d.metadata}\"\n",
" for i, d in enumerate(docs)\n",
" ]\n",
" )\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Set up the base vector store retriever\n",
"Let's start by initializing a simple vector store retriever and storing the 2023 State of the Union speech (in chunks). We can set up the retriever to retrieve a high number (20) of docs."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/ethan/intel/langchain_test/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, onnx, openvino\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Framework not specified. Using pt to export the model.\n",
"Using the export variant default. Available variants are:\n",
" - default: The default ONNX variant.\n",
"Using framework PyTorch: 2.2.1+cu121\n",
"/home/ethan/intel/langchain_test/lib/python3.10/site-packages/transformers/modeling_utils.py:4193: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead\n",
" warnings.warn(\n",
"Compiling the model to CPU ...\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 73}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"Danielle says Heath was a fighter to the very end. \n",
"\n",
"He didnt know how to stop fighting, and neither did she. \n",
"\n",
"Through her pain she found purpose to demand we do better. \n",
"\n",
"Tonight, Danielle—we are. \n",
"\n",
"The VA is pioneering new ways of linking toxic exposures to diseases, already helping more veterans get benefits. \n",
"\n",
"And tonight, Im announcing were expanding eligibility to veterans suffering from nine respiratory cancers.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 88}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"The widow of Sergeant First Class Heath Robinson. \n",
"\n",
"He was born a soldier. Army National Guard. Combat medic in Kosovo and Iraq. \n",
"\n",
"Stationed near Baghdad, just yards from burn pits the size of football fields. \n",
"\n",
"Heaths widow Danielle is here with us tonight. They loved going to Ohio State football games. He loved building Legos with their daughter. \n",
"\n",
"But cancer from prolonged exposure to burn pits ravaged Heaths lungs and body. \n",
"\n",
"Danielle says Heath was a fighter to the very end.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 87}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 4:\n",
"\n",
"Im also calling on Congress: pass a law to make sure veterans devastated by toxic exposures in Iraq and Afghanistan finally get the benefits and comprehensive health care they deserve. \n",
"\n",
"And fourth, lets end cancer as we know it. \n",
"\n",
"This is personal to me and Jill, to Kamala, and to so many of you. \n",
"\n",
"Cancer is the #2 cause of death in Americasecond only to heart disease.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 89}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 5:\n",
"\n",
"Every Administration says theyll do it, but we are actually doing it. \n",
"\n",
"We will buy American to make sure everything from the deck of an aircraft carrier to the steel on highway guardrails are made in America. \n",
"\n",
"But to compete for the best jobs of the future, we also need to level the playing field with China and other competitors.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 29}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 6:\n",
"\n",
"He met the Ukrainian people. \n",
"\n",
"From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n",
"\n",
"Groups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland. \n",
"\n",
"In this struggle as President Zelenskyy said in his speech to the European Parliament “Light will win over darkness.” The Ukrainian Ambassador to the United States is here tonight.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 2}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 7:\n",
"\n",
"As Ohio Senator Sherrod Brown says, “Its time to bury the label “Rust Belt.” \n",
"\n",
"Its time. \n",
"\n",
"But with all the bright spots in our economy, record job growth and higher wages, too many families are struggling to keep up with the bills. \n",
"\n",
"Inflation is robbing them of the gains they might otherwise feel. \n",
"\n",
"I get it. Thats why my top priority is getting prices under control.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 35}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 8:\n",
"\n",
"But that trickle-down theory led to weaker economic growth, lower wages, bigger deficits, and the widest gap between those at the top and everyone else in nearly a century. \n",
"\n",
"Vice President Harris and I ran for office with a new economic vision for America. \n",
"\n",
"Invest in America. Educate Americans. Grow the workforce. Build the economy from the bottom up \n",
"and the middle out, not from the top down.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 23}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 9:\n",
"\n",
"To all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world. \n",
"\n",
"And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n",
"\n",
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 14}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 10:\n",
"\n",
"The one thing all Americans agree on is that the tax system is not fair. We have to fix it. \n",
"\n",
"Im not looking to punish anyone. But lets make sure corporations and the wealthiest Americans start paying their fair share. \n",
"\n",
"Just last year, 55 Fortune 500 corporations earned $40 billion in profits and paid zero dollars in federal income tax. \n",
"\n",
"Thats simply not fair. Thats why Ive proposed a 15% minimum tax rate for corporations.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 46}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 11:\n",
"\n",
"Joshua is here with us tonight. Yesterday was his birthday. Happy birthday, buddy. \n",
"\n",
"For Joshua, and for the 200,000 other young people with Type 1 diabetes, lets cap the cost of insulin at $35 a month so everyone can afford it. \n",
"\n",
"Drug companies will still do very well. And while were at it let Medicare negotiate lower prices for prescription drugs, like the VA already does.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 41}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 12:\n",
"\n",
"As Ive told Xi Jinping, it is never a good bet to bet against the American people. \n",
"\n",
"Well create good jobs for millions of Americans, modernizing roads, airports, ports, and waterways all across America. \n",
"\n",
"And well do it all to withstand the devastating effects of the climate crisis and promote environmental justice.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 26}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 13:\n",
"\n",
"As I said last year, especially to our younger transgender Americans, I will always have your back as your President, so you can be yourself and reach your God-given potential. \n",
"\n",
"While it often appears that we never agree, that isnt true. I signed 80 bipartisan bills into law last year. From preventing government shutdowns to protecting Asian-Americans from still-too-common hate crimes to reforming military justice.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 79}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 14:\n",
"\n",
"My administration is providing assistance with job training and housing, and now helping lower-income veterans get VA care debt-free. \n",
"\n",
"Our troops in Iraq and Afghanistan faced many dangers. \n",
"\n",
"One was stationed at bases and breathing in toxic smoke from “burn pits” that incinerated wastes of war—medical and hazard material, jet fuel, and more. \n",
"\n",
"When they came home, many of the worlds fittest and best trained warriors were never the same. \n",
"\n",
"Headaches. Numbness. Dizziness.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 85}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 15:\n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 74}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 16:\n",
"\n",
"I spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves. \n",
"\n",
"Ive worked on these issues a long time. \n",
"\n",
"I know what works: Investing in crime prevention and community police officers wholl walk the beat, wholl know the neighborhood, and who can restore trust and safety. \n",
"\n",
"So lets not abandon our streets. Or choose between safety and equal justice.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 67}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 17:\n",
"\n",
"Well build a national network of 500,000 electric vehicle charging stations, begin to replace poisonous lead pipes—so every child—and every American—has clean water to drink at home and at school, provide affordable high-speed internet for every American—urban, suburban, rural, and tribal communities. \n",
"\n",
"4,000 projects have already been announced. \n",
"\n",
"And tonight, Im announcing that this year we will start fixing over 65,000 miles of highway and 1,500 bridges in disrepair.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 27}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 18:\n",
"\n",
"Cancer is the #2 cause of death in Americasecond only to heart disease. \n",
"\n",
"Last month, I announced our plan to supercharge \n",
"the Cancer Moonshot that President Obama asked me to lead six years ago. \n",
"\n",
"Our goal is to cut the cancer death rate by at least 50% over the next 25 years, turn more cancers from death sentences into treatable diseases. \n",
"\n",
"More support for patients and families. \n",
"\n",
"To get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 90}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 19:\n",
"\n",
"He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n",
"\n",
"We meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n",
"\n",
"The pandemic has been punishing. \n",
"\n",
"And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n",
"\n",
"I understand.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 18}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 20:\n",
"\n",
"He and his Dad both have Type 1 diabetes, which means they need insulin every day. Insulin costs about $10 a vial to make. \n",
"\n",
"But drug companies charge families like Joshua and his Dad up to 30 times more. I spoke with Joshuas mom. \n",
"\n",
"Imagine what its like to look at your child who needs insulin and have no idea how youre going to pay for it. \n",
"\n",
"What it does to your dignity, your ability to look your child in the eye, to be the parent you expect to be.\n",
"Metadata: {'source': '../../modules/state_of_the_union.txt', 'id': 40}\n"
]
}
],
"source": [
"from langchain.embeddings import OpenVINOEmbeddings\n",
"from langchain_community.document_loaders import TextLoader\n",
"from langchain_community.vectorstores import FAISS\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"\n",
"documents = TextLoader(\n",
" \"../../modules/state_of_the_union.txt\",\n",
").load()\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)\n",
"texts = text_splitter.split_documents(documents)\n",
"for idx, text in enumerate(texts):\n",
" text.metadata[\"id\"] = idx\n",
"\n",
"embedding = OpenVINOEmbeddings(\n",
" model_name_or_path=\"sentence-transformers/all-mpnet-base-v2\"\n",
")\n",
"retriever = FAISS.from_documents(texts, embedding).as_retriever(search_kwargs={\"k\": 20})\n",
"\n",
"query = \"What did the president say about Ketanji Brown Jackson\"\n",
"docs = retriever.invoke(query)\n",
"pretty_print_docs(docs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"## Reranking with OpenVINO\n",
"Now let's wrap our base retriever with a `ContextualCompressionRetriever`, using `OpenVINOReranker` as a compressor."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain_community.document_compressors.openvino_rerank import OpenVINOReranker\n",
"\n",
"model_name = \"BAAI/bge-reranker-large\"\n",
"\n",
"ov_compressor = OpenVINOReranker(model_name_or_path=model_name, top_n=4)\n",
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=ov_compressor, base_retriever=retriever\n",
")\n",
"\n",
"compressed_docs = compression_retriever.invoke(\n",
" \"What did the president say about Ketanji Jackson Brown\"\n",
")\n",
"print([doc.metadata[\"id\"] for doc in compressed_docs])"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"source": [
"After reranking, the top 4 documents are different from the top 4 documents retrieved by the base retriever."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Document 1:\n",
"\n",
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nations top legal minds, who will continue Justice Breyers legacy of excellence.\n",
"Metadata: {'id': 0, 'relevance_score': tensor(0.6148)}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 2:\n",
"\n",
"He will never extinguish their love of freedom. He will never weaken the resolve of the free world. \n",
"\n",
"We meet tonight in an America that has lived through two of the hardest years this nation has ever faced. \n",
"\n",
"The pandemic has been punishing. \n",
"\n",
"And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. \n",
"\n",
"I understand.\n",
"Metadata: {'id': 16, 'relevance_score': tensor(0.0373)}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 3:\n",
"\n",
"A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since shes been nominated, shes received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n",
"\n",
"And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.\n",
"Metadata: {'id': 18, 'relevance_score': tensor(0.0131)}\n",
"----------------------------------------------------------------------------------------------------\n",
"Document 4:\n",
"\n",
"To all Americans, I will be honest with you, as Ive always promised. A Russian dictator, invading a foreign country, has costs around the world. \n",
"\n",
"And Im taking robust action to make sure the pain of our sanctions is targeted at Russias economy. And I will use every tool at our disposal to protect American businesses and consumers. \n",
"\n",
"Tonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.\n",
"Metadata: {'id': 6, 'relevance_score': tensor(0.0098)}\n"
]
}
],
"source": [
"pretty_print_docs(compressed_docs)"
]
},
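{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the change in ordering easier to see, the cell below reports where each reranked document sat in the base retriever's results. This is only a minimal sketch: `original_rank` is a hypothetical helper written for this notebook, not part of LangChain. It reuses `docs` and `compressed_docs` from the cells above and matches documents by their text, which assumes that chunk texts are unique and that both retrievers were given the same query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper (not a LangChain API): find the 1-based position of a\n",
"# reranked document in the base retriever results by comparing page text.\n",
"def original_rank(doc, base_docs):\n",
"    for position, base_doc in enumerate(base_docs, start=1):\n",
"        if base_doc.page_content == doc.page_content:\n",
"            return position\n",
"    return None  # no match, e.g. the two retrievers were given different queries\n",
"\n",
"\n",
"for new_rank, doc in enumerate(compressed_docs, start=1):\n",
"    print(f\"reranked #{new_rank} <- base retriever #{original_rank(doc, docs)}\")"
]
},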
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Export IR model\n",
"It is possible to export your rerank model to the OpenVINO IR format with ``OVModelForSequenceClassification``, and load the model from local folder."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"ov_model_dir = \"bge-reranker-large-ov\"\n",
"if not Path(ov_model_dir).exists():\n",
" ov_compressor.save_model(ov_model_dir)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Compiling the model to CPU ...\n"
]
}
],
"source": [
"ov_compressor = OpenVINOReranker(model_name_or_path=ov_model_dir)"
]
},
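{
"cell_type": "markdown",
"metadata": {},
"source": [
"As noted in the introduction, inference can be moved to an Intel GPU by passing `model_kwargs={\"device\": \"GPU\"}`. The cell below is a minimal sketch, assuming a supported Intel GPU and its drivers are visible to OpenVINO. The variable name `gpu_compressor` is illustrative, and the model is the same `BAAI/bge-reranker-large` used above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain_community.document_compressors.openvino_rerank import OpenVINOReranker\n",
"\n",
"# Assumption: an Intel GPU is available; without one, loading the model is likely to fail.\n",
"gpu_compressor = OpenVINOReranker(\n",
"    model_name_or_path=\"BAAI/bge-reranker-large\",\n",
"    model_kwargs={\"device\": \"GPU\"},\n",
"    top_n=4,\n",
")"
]
},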
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For more information refer to:\n",
"\n",
"* [OpenVINO LLM guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).\n",
"\n",
"* [OpenVINO Documentation](https://docs.openvino.ai/2024/home.html).\n",
"\n",
"* [OpenVINO Get Started Guide](https://www.intel.com/content/www/us/en/content-details/819067/openvino-get-started-guide.html).\n",
"\n",
"* [RAG Notebook with LangChain](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-rag-langchain)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 4
}