langchain

Commit Graph

Author	SHA1	Message	Date
OwenElliott	3074306ae1	Marqo Vector Store Examples & Type Hints (#7326 ) This PR improves the example notebook for the Marqo vectorstore implementation by adding a new RetrievalQAWithSourcesChain example. The `embedding` parameter in `from_documents` has its type updated to `Union[Embeddings, None]` and a default parameter of None because this is ignored in Marqo. This PR also upgrades the Marqo version to 0.11.0 to remove the device parameter after a breaking change to the API. Related to #7068 @tomhamer @hwchase17 --------- Co-authored-by: Tom Hamer <tom@marqo.ai>	11 months ago
Bagatur	a9c5b4bcea	Bagatur/clarifai update (#7324 ) This PR improves upon the Clarifai LangChain integration with improved docs, errors, args and the addition of embedding model support in LancChain for Clarifai's embedding models and an overview of the various ways you can integrate with Clarifai added to the docs. --------- Co-authored-by: Matthew Zeiler <zeiler@clarifai.com>	11 months ago
Harrison Chase	52b016920c	Harrison/update anthropic (#7237 ) Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	11 months ago
Tom	e533da8bf2	Adding Marqo to vectorstore ecosystem (#7068 ) This PR brings in a vectorstore interface for [Marqo](https://www.marqo.ai/). The Marqo vectorstore exposes some of Marqo's functionality in addition the the VectorStore base class. The Marqo vectorstore also makes the embedding parameter optional because inference for embeddings is an inherent part of Marqo. Docs, notebook examples and integration tests included. Related PR: https://github.com/hwchase17/langchain/pull/2807 --------- Co-authored-by: Tom Hamer <tom@marqo.ai> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	11 months ago
felixocker	db98c44f8f	Support for SPARQL (#7165 ) # [SPARQL](https://www.w3.org/TR/rdf-sparql-query/) for [LangChain](https://github.com/hwchase17/langchain) ## Description LangChain support for knowledge graphs relying on W3C standards using RDFlib: SPARQL/ RDF(S)/ OWL with special focus on RDF \ * Works with local files, files from the web, and SPARQL endpoints * Supports both SELECT and UPDATE queries * Includes both a Jupyter notebook with an example and integration tests ## Contribution compared to related PRs and discussions * [Wikibase agent](https://github.com/hwchase17/langchain/pull/2690) - uses SPARQL, but specifically for wikibase querying * [Cypher qa](https://github.com/hwchase17/langchain/pull/5078) - graph DB question answering for Neo4J via Cypher * [PR 6050](https://github.com/hwchase17/langchain/pull/6050) - tries something similar, but does not cover UPDATE queries and supports only RDF * Discussions on [w3c mailing list](mailto:semantic-web@w3.org) related to the combination of LLMs (specifically ChatGPT) and knowledge graphs ## Dependencies * [RDFlib](https://github.com/RDFLib/rdflib) ## Tag maintainer Graph database related to memory -> @hwchase17	11 months ago
Simon Cheung	81eebc4070	Add HugeGraphQAChain to support gremlin generating chain (#7132 ) [Apache HugeGraph](https://github.com/apache/incubator-hugegraph) is a convenient, efficient, and adaptable graph database, compatible with the Apache TinkerPop3 framework and the Gremlin query language. In this PR, the HugeGraph and HugeGraphQAChain provide the same functionality as the existing integration with Neo4j and enables query generation and question answering over HugeGraph database. The difference is that the graph query language supported by HugeGraph is not cypher but another very popular graph query language [Gremlin](https://tinkerpop.apache.org/gremlin.html). A notebook example and a simple test case have also been added. --------- Co-authored-by: Bagatur <baskaryan@gmail.com>	11 months ago
William FH	dfa48dc3b5	Update sdk version (#7109 )	11 months ago
Stefano Lottini	8d2281a8ca	Second Attempt - Add concurrent insertion of vector rows in the Cassandra Vector Store (#7017 ) Retrying with the same improvements as in #6772, this time trying not to mess up with branches. @rlancemartin doing a fresh new PR from a branch with a new name. This should do. Thank you for your help! --------- Co-authored-by: Jonathan Ellis <jbellis@datastax.com> Co-authored-by: rlm <pexpresss31@gmail.com>	11 months ago
Zander Chase	b0859c9b18	Add New Retriever Interface with Callbacks (#5962 ) Handle the new retriever events in a way that (I think) is entirely backwards compatible? Needs more testing for some of the chain changes and all. This creates an entire new run type, however. We could also just treat this as an event within a chain run presumably (same with memory) Adds a subclass initializer that upgrades old retriever implementations to the new schema, along with tests to ensure they work. First commit doesn't upgrade any of our retriever implementations (to show that we can pass the tests along with additional ones testing the upgrade logic). Second commit upgrades the known universe of retrievers in langchain. - [X] Add callback handling methods for retriever start/end/error (open to renaming to 'retrieval' if you want that) - [X] Update BaseRetriever schema to support callbacks - [X] Tests for upgrading old "v1" retrievers for backwards compatibility - [X] Update existing retriever implementations to implement the new interface - [X] Update calls within chains to .{a]get_relevant_documents to pass the child callback manager - [X] Update the notebooks/docs to reflect the new interface - [X] Test notebooks thoroughly Not handled: - Memory pass throughs: retrieval memory doesn't have a parent callback manager passed through the method --------- Co-authored-by: Nuno Campos <nuno@boringbits.io> Co-authored-by: William Fu-Hinthorn <13333726+hinthornw@users.noreply.github.com>	11 months ago
Daniel Chalef	b26cca8008	Zep Authentication (#6728 ) ## Description: Add Zep API Key argument to ZepChatMessageHistory and ZepRetriever - correct docs site links - add zep api_key auth to constructors ZepChatMessageHistory: @hwchase17, ZepRetriever: @rlancemartin, @eyurtsev	11 months ago
William FH	13c62cf6b1	Arthur Callback (#6972 ) Co-authored-by: Max Cembalest <115359769+arthuractivemodeling@users.noreply.github.com>	11 months ago
Bagatur	60b0d6ea35	Bagatur/openllm ensure available (#6960 ) Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Aaron <29749331+aarnphm@users.noreply.github.com>	11 months ago
Stefano Lottini	75fb9d2fdc	Cassandra support for chat history using CassIO library (#6771 ) ### Overview This PR aims at building on #4378, expanding the capabilities and building on top of the `cassIO` library to interface with the database (as opposed to using the core drivers directly). Usage of `cassIO` (a library abstracting Cassandra access for ML/GenAI-specific purposes) is already established since #6426 was merged, so no new dependencies are introduced. In the same spirit, we try to uniform the interface for using Cassandra instances throughout LangChain: all our appreciation of the work by @jj701 notwithstanding, who paved the way for this incremental work (thank you!), we identified a few reasons for changing the way a `CassandraChatMessageHistory` is instantiated. Advocating a syntax change is something we don't take lighthearted way, so we add some explanations about this below. Additionally, this PR expands on integration testing, enables use of Cassandra's native Time-to-Live (TTL) features and improves the phrasing around the notebook example and the short "integrations" documentation paragraph. We would kindly request @hwchase to review (since this is an elaboration and proposed improvement of #4378 who had the same reviewer). ### About the __init__ breaking changes There are [many](https://docs.datastax.com/en/developer/python-driver/3.28/api/cassandra/cluster/) options when creating the `Cluster` object, and new ones might be added at any time. Choosing some of them and exposing them as `__init__` parameters `CassandraChatMessageHistory` will prove to be insufficient for at least some users. On the other hand, working through `kwargs` or adding a long, long list of arguments to `__init__` is not a desirable option either. For this reason, (as done in #6426), we propose that whoever instantiates the Chat Message History class provide a Cassandra `Session` object, ready to use. This also enables easier injection of mocks and usage of Cassandra-compatible connections (such as those to the cloud database DataStax Astra DB, obtained with a different set of init parameters than `contact_points` and `port`). We feel that a breaking change might still be acceptable since LangChain is at `0.*`. However, while maintaining that the approach we propose will be more flexible in the future, room could be made for a "compatibility layer" that respects the current init method. Honestly, we would to that only if there are strong reasons for it, as that would entail an additional maintenance burden. ### Other changes We propose to remove the keyspace creation from the class code for two reasons: first, production Cassandra instances often employ RBAC so that the database user reading/writing from tables does not necessarily (and generally shouldn't) have permission to create keyspaces, and second that programmatic keyspace creation is not a best practice (it should be done more or less manually, with extra care about schema mismatched among nodes, etc). Removing this (usually unnecessary) operation from the `__init__` path would also improve initialization performance (shorter time). We suggest, likewise, to remove the `__del__` method (which would close the database connection), for the following reason: it is the recommended best practice to create a single Cassandra `Session` object throughout an application (it is a resource-heavy object capable to handle concurrency internally), so in case Cassandra is used in other ways by the app there is the risk of truncating the connection for all usages when the history instance is destroyed. Moreover, the `Session` object, in typical applications, is best left to garbage-collect itself automatically. As mentioned above, we defer the actual database I/O to the `cassIO` library, which is designed to encode practices optimized for LLM applications (among other) without the need to expose LangChain developers to the internals of CQL (Cassandra Query Language). CassIO is already employed by the LangChain's Vector Store support for Cassandra. We added a few more connection options in the companion notebook example (most notably, Astra DB) to encourage usage by anyone who cannot run their own Cassandra cluster. We surface the `ttl_seconds` option for automatic handling of an expiration time to chat history messages, a likely useful feature given that very old messages generally may lose their importance. We elaborated a bit more on the integration testing (Time-to-live, separation of "session ids", ...). ### Remarks from linter & co. We reinstated `cassio` as a dependency both in the "optional" group and in the "integration testing" group of `pyproject.toml`. This might not be the right thing do to, in which case the author of this PR offer his apologies (lack of confidence with Poetry - happy to be pointed in the right direction, though!). During linter tests, we were hit by some errors which appear unrelated to the code in the PR. We left them here and report on them here for awareness: ``` langchain/vectorstores/mongodb_atlas.py:137: error: Argument 1 to "insert_many" of "Collection" has incompatible type "List[Dict[str, Sequence[object]]]"; expected "Iterable[Union[MongoDBDocumentType, RawBSONDocument]]" [arg-type] langchain/vectorstores/mongodb_atlas.py:186: error: Argument 1 to "aggregate" of "Collection" has incompatible type "List[object]"; expected "Sequence[Mapping[str, Any]]" [arg-type] langchain/vectorstores/qdrant.py:16: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:19: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:20: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:22: error: Name "grpc" is not defined [name-defined] langchain/vectorstores/qdrant.py:23: error: Name "grpc" is not defined [name-defined] ``` In the same spirit, we observe that to even get `import langchain` run, it seems that a `pip install bs4` is missing from the minimal package installation path. Thank you!	11 months ago
Harrison Chase	3ac08c3de4	Harrison/octo ml (#6897 ) Co-authored-by: Bassem Yacoube <125713079+AI-Bassem@users.noreply.github.com> Co-authored-by: Shotaro Kohama <khmshtr28@gmail.com> Co-authored-by: Rian Dolphin <34861538+rian-dolphin@users.noreply.github.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com> Co-authored-by: Shashank Deshpande <shashankdeshpande18@gmail.com>	11 months ago
Ayan Bandyopadhyay	f92ccf70fd	Update to the latest Psychic python library version (#6804 ) Update the Psychic document loader to use the latest `psychicapi` python library version: `0.8.0`	11 months ago
Cristóbal Carnero Liñán	e494b0a09f	feat (documents): add a source code loader based on AST manipulation (#6486 ) #### Summary A new approach to loading source code is implemented: Each top-level function and class in the code is loaded into separate documents. Then, an additional document is created with the top-level code, but without the already loaded functions and classes. This could improve the accuracy of QA chains over source code. For instance, having this script: ``` class MyClass: def __init__(self, name): self.name = name def greet(self): print(f"Hello, {self.name}!") def main(): name = input("Enter your name: ") obj = MyClass(name) obj.greet() if __name__ == '__main__': main() ``` The loader will create three documents with this content: First document: ``` class MyClass: def __init__(self, name): self.name = name def greet(self): print(f"Hello, {self.name}!") ``` Second document: ``` def main(): name = input("Enter your name: ") obj = MyClass(name) obj.greet() ``` Third document: ``` # Code for: class MyClass: # Code for: def main(): if __name__ == '__main__': main() ``` A threshold parameter is added to control whether small scripts are split in this way or not. At this moment, only Python and JavaScript are supported. The appropriate parser is determined by examining the file extension. #### Tests This PR adds: - Unit tests - Integration tests #### Dependencies Only one dependency was added as optional (needed for the JavaScript parser). #### Documentation A notebook is added showing how the loader can be used. #### Who can review? @eyurtsev @hwchase17 --------- Co-authored-by: rlm <pexpresss31@gmail.com>	11 months ago
Zander Chase	b4fe7f3a09	Session to project (#6249 ) Sessions are being renamed to projects in the tracer	12 months ago
Tim Conkling	c28990d871	StreamlitCallbackHandler (#6315 ) A new implementation of `StreamlitCallbackHandler`. It formats Agent thoughts into Streamlit expanders. You can see the handler in action here: https://langchain-mrkl.streamlit.app/ Per a discussion with Harrison, we'll be adding a `StreamlitCallbackHandler` implementation to an upcoming [Streamlit](https://github.com/streamlit/streamlit) release as well, and will be updating it as we add new LLM- and LangChain-specific features to Streamlit. The idea with this PR is that the LangChain `StreamlitCallbackHandler` will "auto-update" in a way that keeps it forward- (and backward-) compatible with Streamlit. If the user has an older Streamlit version installed, the LangChain `StreamlitCallbackHandler` will be used; if they have a newer Streamlit version that has an updated `StreamlitCallbackHandler`, that implementation will be used instead. (I'm opening this as a draft to get the conversation going and make sure we're on the same page. We're really excited to land this into LangChain!) #### Who can review? @agola11, @hwchase17	12 months ago
minhajul-clarifai	6e57306a13	Clarifai integration (#5954 ) # Changes This PR adds [Clarifai](https://www.clarifai.com/) integration to Langchain. Clarifai is an end-to-end AI Platform. Clarifai offers user the ability to use many types of LLM (OpenAI, cohere, ect and other open source models). As well, a clarifai app can be treated as a vector database to upload and retrieve data. The integrations includes: - Clarifai LLM integration: Clarifai supports many types of language model that users can utilize for their application - Clarifai VectorDB: A Clarifai application can hold data and embeddings. You can run semantic search with the embeddings #### Before submitting - [x] Added integration test for LLM - [x] Added integration test for VectorDB - [x] Added notebook for LLM - [x] Added notebook for VectorDB Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	12 months ago
Davis Chase	4fabd02d25	Add OpenLLM wrapper(#6578 ) LLM wrapper for models served with OpenLLM --------- Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com> Authored-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by: Chaoyu <paranoyang@gmail.com>	12 months ago
ljeagle	ca24dc2d5f	Upgrade the version of AwaDB and add some new interfaces (#6565 ) 1. upgrade the version of AwaDB 2. add some new interfaces 3. fix bug of packing page content error @dev2049 please review, thanks! --------- Co-authored-by: vincent <awadb.vincent@gmail.com>	12 months ago
囧囧	0fce8ef178	Add KuzuQAChain (#6454 ) This PR adds `KuzuGraph` and `KuzuQAChain` for interacting with [Kùzu database](https://github.com/kuzudb/kuzu). Kùzu is an in-process property graph database management system (GDBMS) built for query speed and scalability. The `KuzuGraph` and `KuzuQAChain` provide the same functionality as the existing integration with NebulaGraph and Neo4j and enables query generation and question answering over Kùzu database. A notebook example and a simple test case have also been added. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	12 months ago
Zeeland	8a604b93ab	feat: use latest duckduckgo_search API to call (#6409 ) # Provider the latest duckduckgo_search API The Git commit contents involve two files related to some DuckDuckGo query operations, and an upgrade of the DuckDuckGo module to version 3.8.3. A suitable commit message could be "Upgrade DuckDuckGo module to version 3.8.3, including query operations". Specifically, in the duckduckgo_search.py file, a DDGS() class instance is newly added to replace the previous ddg() function, and the time parameter name in the get_snippets() and results() methods is changed from "time" to "timelimit" to accommodate recent changes. In the pyproject.toml file, the duckduckgo-search module is upgraded to version 3.8.3. [duckduckgo_search readme attention](https://github.com/deedy5/duckduckgo_search): Versions before v2.9.4 no longer work as of May 12, 2023 ## Who can review? @vowelparrot --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	12 months ago
volodymyr-memsql	d2e9b621ab	Update SinglStoreDB vectorstore (#6423 ) 1. Introduced new distance strategies support: DOT_PRODUCT and EUCLIDEAN_DISTANCE for enhanced flexibility. 2. Implemented a feature to filter results based on metadata fields. 3. Incorporated connection attributes specifying "langchain python sdk" usage for enhanced traceability and debugging. 4. Expanded the suite of integration tests for improved code reliability. 5. Updated the existing notebook with the usage example @dev2049 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	12 months ago
Zander Chase	00f276d23f	Run eval in eval mode (#6447 ) For the `run_on_dataset` sessions	12 months ago
Leonid Ganeline	c7ca350cd3	Fix class promotion (#6187 ) In LangChain, all module classes are enumerated in the `__init__.py` file of the correspondent module. But some classes were missed and were not included in the module `__init__.py` This PR: - added the missed classes to the module `__init__.py` files - `__init__.py:__all_` variable value (a list of the class names) was sorted - `langchain.tools.sql_database.tool.QueryCheckerTool` was renamed into the `QuerySQLCheckerTool` because it conflicted with `langchain.tools.spark_sql.tool.QueryCheckerTool` - changes to `pyproject.toml`: - added `pgvector` to `pyproject.toml:extended_testing` - added `pandas` to `pyproject.toml:[tool.poetry.group.test.dependencies]` - commented out the `streamlit` from `collbacks/__init__.py`, It is because now the `streamlit` requires Python >=3.7, !=3.9.7 - fixed duplicate names in `tools` - fixed correspondent ut-s #### Who can review? @hwchase17 @dev2049	12 months ago
Harrison Chase	af18413d97	Harrison/deeplake new features (#6263 ) Co-authored-by: adilkhan <adilkhan.sarsen@nu.edu.kz> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	12 months ago
ljeagle	ad324a39ae	Improve the performance of add_texts interface and upgrade the AwaDB from 0.3.2 to 0.3.3 (#6316 ) 1. Changed the implementation of add_texts interface for the AwaDB vector store in order to improve the performance 2. Upgrade the AwaDB from 0.3.2 to 0.3.3 --------- Co-authored-by: vincent <awadb.vincent@gmail.com>	12 months ago
Zander Chase	0c52275bdb	Use Run object from SDK (#6067 ) Update the Run object in the tracer to extend that in the SDK to include the parameters necessary for tracking/tracing	12 months ago
Harrison Chase	d1561b74eb	Harrison/cognitive search (#6011 ) Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>	12 months ago
Nuno Campos	18af149e91	nc/load (#5733 ) Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	12 months ago
Harrison Chase	9218684759	Add a new vector store - AwaDB (#5971 ) (#5992 ) Added AwaDB vector store, which is a wrapper over the AwaDB, that can be used as a vector storage and has an efficient similarity search. Added integration tests for the vector store Added jupyter notebook with the example Delete a unneeded empty file and resolve the conflict(https://github.com/hwchase17/langchain/pull/5886) Please check, Thanks! @dev2049 @hwchase17 --------- <!-- Thank you for contributing to LangChain! Your PR will appear in our release under the title you set. Please make sure it highlights your valuable contribution. Replace this with a description of the change, the issue it fixes (if applicable), and relevant context. List any dependencies required for this change. After you're done, someone will review your PR. They may suggest improvements. If no one reviews your PR within a few days, feel free to @-mention the same people again, as notifications can get lost. Finally, we'd love to show appreciation for your contribution - if you'd like us to shout you out on Twitter, please also include your handle! --> <!-- Remove if not applicable --> Fixes # (issue) #### Before submitting <!-- If you're adding a new integration, please include: 1. a test for the integration - favor unit tests that does not rely on network access. 2. an example notebook showing its use See contribution guidelines for more information on how to write tests, lint etc: https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md --> #### Who can review? Tag maintainers/contributors who might be interested: <!-- For a quicker response, figure out the right person to tag with @ @hwchase17 - project lead Tracing / Callbacks - @agola11 Async - @agola11 DataLoaders - @eyurtsev Models - @hwchase17 - @agola11 Agents / Tools / Toolkits - @vowelparrot VectorStores / Retrievers / Memory - @dev2049 --> --------- Co-authored-by: ljeagle <vincent_jieli@yeah.net> Co-authored-by: vincent <awadb.vincent@gmail.com>	12 months ago
Kacper Łukawski	7cc200766e	Expose full params in Qdrant (#5947 ) # Expose full params in Qdrant There were many questions regarding supporting some additional parameters in Qdrant integration. Qdrant supports many vector search optimizations that were impossible to use directly in Qdrant before. That includes: 1. Possibility to manipulate collection params while using `Qdrant.from_texts`. The PR allows setting things such as quantization, HNWS config, optimizers config, etc. That makes it consistent with raw `QdrantClient`. 2. Extended options while searching. It includes HNSW options, exact search, score threshold filtering, and read consistency in distributed mode. After merging that PR, #4858 might also be closed. ## Who can review? VectorStores / Retrievers / Memory @dev2049 @hwchase17	12 months ago
Zander Chase	77c286cf02	Use LCP Client in Tracer (#5908 ) Move the LCP calls to the client.	1 year ago
Harrison Chase	893d20f735	bump version to 194 (#5866 )	1 year ago
Harrison Chase	35cfd25db3	Harrison/nebula graph (#5865 ) Co-authored-by: Wey Gu <weyl.gu@gmail.com> Co-authored-by: chenweisomebody <chenweisomebody@gmail.com>	1 year ago
volodymyr-memsql	a1549901ce	Added SingleStoreDB Vector Store (#5619 ) - Added `SingleStoreDB` vector store, which is a wrapper over the SingleStore DB database, that can be used as a vector storage and has an efficient similarity search. - Added integration tests for the vector store - Added jupyter notebook with the example @dev2049 --------- Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Zander Chase	d9fcc45d05	Add in the async methods and link the run id (#5810 )	1 year ago
Zander Chase	217b5cc72d	Base RunEvaluator Chain (#5750 ) Clean up a bit and only implement the QA and reference free implementations from https://github.com/hwchase17/langchain/pull/5618	1 year ago
Zander Chase	204a73c1d9	Use client from LCP-SDK (#5695 ) - Remove the client implementation (this breaks backwards compatibility for existing testers. I could keep the stub in that file if we want, but not many people are using it yet - Add SDK as dependency - Update the 'run_on_dataset' method to be a function that optionally accepts a client as an argument - Remove the langchain plus server implementation (you get it for free with the SDK now) We could make the SDK optional for now, but the plan is to use w/in the tracer so it would likely become a hard dependency at some point.	1 year ago
Adil Ansari	233b52735e	feat: Support for `Tigris` Vector Database for vector search (#5703 ) ### Changes - New vector store integration - [Tigris](https://tigrisdata.com) - Adds [tigrisdb](https://pypi.org/project/tigrisdb/) optional dependency - Example notebook demonstrating usage Fixes #5535 Closes tigrisdata/tigris-client-python#40 #### Twitter handles We'd love a shoutout on our [@TigrisData](https://twitter.com/TigrisData) and [@adilansari](https://twitter.com/adilansari) twitter handles #### Who can review? @dev2049 --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Daniel Chalef	0551bc90a5	Zep Hybrid Search (#5742 ) Zep now supports persisting custom metadata with messages and hybrid search across both message embeddings and structured metadata. This PR implements custom metadata and enhancements to the `ZepChatMessageHistory` and `ZepRetriever` classes to implement this support. Tag maintainers/contributors who might be interested: VectorStores / Retrievers / Memory - @dev2049 --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org>	1 year ago
Natalie	199cc700a3	Ability to specify credentials wihen using Google BigQuery as a data loader (#5466 ) # Adds ability to specify credentials when using Google BigQuery as a data loader Fixes #5465 . Adds ability to set credentials which must be of the `google.auth.credentials.Credentials` type. This argument is optional and will default to `None. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Ayan Bandyopadhyay	8181f9e362	Update psychicapi version (#5471 ) Update [psychicapi](https://pypi.org/project/psychicapi/) python package dependency to the latest version 0.5. The newest python package version addresses breaking changes in the Psychic http api.	1 year ago
Paul-Emile Brotons	a61b7f7e7c	adding MongoDBAtlasVectorSearch (#5338 ) # Add MongoDBAtlasVectorSearch for the python library Fixes #5337 --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Harrison Chase	760632b292	Harrison/spark reader (#5405 ) Co-authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
German Martin	0b3e0dd1d2	New Trello document loader (#4767 ) # Added New Trello loader class and documentation Simple Loader on top of py-trello wrapper. With a board name you can pull cards and to do some field parameter tweaks on load operation. I included documentation and examples. Included unit test cases using patch and a fixture for py-trello client class. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Michael Landis	7047a2c1af	feat: add Momento as a standard cache and chat message history provider (#5221 ) # Add Momento as a standard cache and chat message history provider This PR adds Momento as a standard caching provider. Implements the interface, adds integration tests, and documentation. We also add Momento as a chat history message provider along with integration tests, and documentation. [Momento](https://www.gomomento.com/) is a fully serverless cache. Similar to S3 or DynamoDB, it requires zero configuration, infrastructure management, and is instantly available. Users sign up for free and get 50GB of data in/out for free every month. ## Before submitting ✅ We have added documentation, notebooks, and integration tests demonstrating usage. Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Davis Chase	ca88b25da6	Zep sdk version (#5267 ) zep-python's sync methods no longer need an asyncio wrapper. This was causing issues with FastAPI deployment. Zep also now supports putting and getting of arbitrary message metadata. Bump zep-python version to v0.30 Remove nest-asyncio from Zep example notebooks. Modify tests to include metadata. --------- Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	1 year ago
Eugene Yurtsev	5cfa72a130	Bibtex integration for document loader and retriever (#5137 ) # Bibtex integration Wrap bibtexparser to retrieve a list of docs from a bibtex file. * Get the metadata from the bibtex entries * `page_content` get from the local pdf referenced in the `file` field of the bibtex entry using `pymupdf` * If no valid pdf file, `page_content` set to the `abstract` field of the bibtex entry * Support Zotero flavour using regex to get the file path * Added usage example in `docs/modules/indexes/document_loaders/examples/bibtex.ipynb` --------- Co-authored-by: Sébastien M. Popoff <sebastien.popoff@espci.fr> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Davis Chase	2b2176a3c1	tfidf retriever (#5114 ) Co-authored-by: vempaliakhil96 <vempaliakhil96@gmail.com>	1 year ago
Tian Wei	d7f807b71f	Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012 ) # Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API: achieve some multimodal capabilities This PR adds a toolkit named AzureCognitiveServicesToolkit which bundles the following tools: - AzureCogsImageAnalysisTool: calls Azure Cognitive Services image analysis API to extract caption, objects, tags, and text from images. - AzureCogsFormRecognizerTool: calls Azure Cognitive Services form recognizer API to extract text, tables, and key-value pairs from documents. - AzureCogsSpeech2TextTool: calls Azure Cognitive Services speech to text API to transcribe speech to text. - AzureCogsText2SpeechTool: calls Azure Cognitive Services text to speech API to synthesize text to speech. This toolkit can be used to process image, document, and audio inputs. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Jamie Broomall	d4fd589638	WhyLabs callback (#4906 ) # Add a WhyLabs callback handler * Adds a simple WhyLabsCallbackHandler * Add required dependencies as optional * protect against missing modules with imports * Add docs/ecosystem basic example based on initial prototype from @andrewelizondo > this integration gathers privacy preserving telemetry on text with whylogs and sends stastical profiles to WhyLabs platform to monitoring these metrics over time. For more information on what WhyLabs is see: https://whylabs.ai After you run the notebook (if you have env variables set for the API Keys, org_id and dataset_id) you get something like this in WhyLabs: ![Screenshot (443)](https://github.com/hwchase17/langchain/assets/88007022/6bdb3e1c-4243-4ae8-b974-23a8bb12edac) Co-authored-by: Andre Elizondo <andre@whylabs.ai> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Matt Rickard	de6a401a22	Add OpenLM LLM multi-provider (#4993 ) OpenLM is a zero-dependency OpenAI-compatible LLM provider that can call different inference endpoints directly via HTTP. It implements the OpenAI Completion class so that it can be used as a drop-in replacement for the OpenAI API. This changeset utilizes BaseOpenAI for minimal added code. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Gergely Imreh	69de33e024	Add Mastodon toots loader (#5036 ) # Add Mastodon toots loader. Loader works either with public toots, or Mastodon app credentials. Toot text and user info is loaded. I've also added integration test for this new loader as it works with public data, and a notebook with example output run now. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Michael Landis	6eacd88ae7	fix: revert docarray explicit transitive dependencies and use extras instead (#5015 ) tldr: The docarray [integration PR](https://github.com/hwchase17/langchain/pull/4483) introduced a pinned dependency to protobuf. This is a docarray dependency, not a langchain dependency. Since this is handled by the docarray dependencies, it is unnecessary here. Further, as a pinned dependency, this quickly leads to incompatibilities with application code that consumes the library. Much less with a heavily used library like protobuf. Detail: as we see in the [docarray integration](https://github.com/hwchase17/langchain/pull/4483/files#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R81-R83), the transitive dependencies of docarray were also listed as langchain dependencies. This is unnecessary as the docarray project has an appropriate [extras](`a01a05542d/pyproject.toml (L70)`). The docarray project also does not require this _pinned_ version of protobuf, rather [a minimum version](`a01a05542d/pyproject.toml (L41)`). So this pinned version was likely in error. To fix this, this PR reverts the explicit hnswlib and protobuf dependencies and adds the hnswlib extras install for docarray (which installs hnswlib and protobuf, as originally intended). Because version `0.32.0` of the docarray hnswlib extras added protobuf, we bump the docarray dependency from `^0.31.0` to `^0.32.0`. # revert docarray explicit transitive dependencies and use extras instead ## Who can review? @dev2049 -- reviewed the original PR @eyurtsev -- bumped the pinned protobuf dependency a few days ago --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Harrison Chase	10ba201d05	Harrison/neo4j (#5078 ) Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Harrison Chase	b0431c672b	Harrison/psychic (#5063 ) Co-authored-by: Ayan Bandyopadhyay <ayanb9440@gmail.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Davis Chase	080eb1b3fc	Fix graphql tool (#4984 ) Fix construction and add unit test.	1 year ago
Mike McGarry	ddd595fe81	feature/4493 Improve Evernote Document Loader (#4577 ) # Improve Evernote Document Loader When exporting from Evernote you may export more than one note. Currently the Evernote loader concatenates the content of all notes in the export into a single document and only attaches the name of the export file as metadata on the document. This change ensures that each note is loaded as an independent document and all available metadata on the note e.g. author, title, created, updated are added as metadata on each document. It also uses an existing optional dependency of `html2text` instead of `pypandoc` to remove the need to download the pandoc application via `download_pandoc()` to be able to use the `pypandoc` python bindings. Fixes #4493 Co-authored-by: Mike McGarry <mike.mcgarry@finbourne.com> Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Eugene Yurtsev	e46202829f	feat #4479 : TextLoader auto detect encoding and improved exceptions (#4927 ) # TextLoader auto detect encoding and enhanced exception handling - Add an option to enable encoding detection on `TextLoader`. - The detection is done using `chardet` - The loading is done by trying all detected encodings by order of confidence or raise an exception otherwise. ### New Dependencies: - `chardet` Fixes #4479 ## Before submitting <!-- If you're adding a new integration, include an integration test and an example notebook showing its use! --> ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: - @eyurtsev --------- Co-authored-by: blob42 <spike@w530>	1 year ago
Davis Chase	8966f61ca5	Zep memory (#4898 ) Co-authored-by: Daniel Chalef <daniel.chalef@private.org> Co-authored-by: Daniel Chalef <131175+danielchalef@users.noreply.github.com>	1 year ago
Eugene Yurtsev	c5ab9782c6	Add beautiful soup 4 to extended testing extra (#4869 ) # Add bs4 to extended testing extra Updating extended testing extra in preparation for more refactors.	1 year ago
Adam Quigley	e78c9be312	Add Confluence Loader unit tests (#3333 ) Adds some basic unit tests for the ConfluenceLoader that can be extended later. Ports this [PR from llama-hub](https://github.com/emptycrown/llama-hub/pull/208) and adapts it to `langchain`. @Jflick58 and @zywilliamli adding you here as potential reviewers --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Raduan Al-Shedivat	00c6ec8a2d	fix(document_loaders/telegram): fix pandas calls + add tests (#4806 ) # Fix Telegram API loader + add tests. I was testing this integration and it was broken with next error: ```python message_threads = loader._get_message_threads(df) KeyError: False ``` Also, this particular loader didn't have any tests / related group in poetry, so I added those as well. @hwchase17 / @eyurtsev please take a look on this fix PR. --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Eugene Yurtsev	c3b6129beb	Block sockets for unit-tests (#4803 ) # Block usage of sockets during unit tests Catch any tests that attempt to use the network.	1 year ago
Eugene Yurtsev	d403f659ea	Update google protobuf dep (#4798 ) # Update google protobuf dep Resolve: https://github.com/hwchase17/langchain/security/dependabot/11	1 year ago
Eugene Yurtsev	3ecd7c9641	Add check to verify poetry.toml (#4794 ) # Add poetry check to github action Check poetry toml file during tests for errors	1 year ago
Eugene Yurtsev	14bedf1cc5	Github Action: Fix poetry lock file checking (#4789 ) Fix how poetry lock file is checked to avoid skipping caches silently.	1 year ago
Roma	cb802edf75	[Feature] Add GraphQL Query Tool (#4409 ) # Add GraphQL Query Support This PR introduces a GraphQL API Wrapper tool that allows LLM agents to query GraphQL databases. The tool utilizes the httpx and gql Python packages to interact with GraphQL APIs and provides a simple interface for running queries with LLM agents. @vowelparrot --------- Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>	1 year ago
Eugene Yurtsev	09587a3201	Clean up tests for pdf parsers (#4595 ) # Organize tests for pdf parsers Clean up tests for pdf parsers, remove duplicate tests, convert to unit tests.	1 year ago
Eugene Yurtsev	3c490b5ba3	Docugami DataLoader (#4727 ) ### Adds a document loader for Docugami Specifically: 1. Adds a data loader that talks to the [Docugami](http://docugami.com) API to download processed documents as semantic XML 2. Parses the semantic XML into chunks, with additional metadata capturing chunk semantics 3. Adds a detailed notebook showing how you can use additional metadata returned by Docugami for techniques like the [self-querying retriever](https://python.langchain.com/en/latest/modules/indexes/retrievers/examples/self_query_retriever.html) 4. Adds an integration test, and related documentation Here is an example of a result that is not possible without the capabilities added by Docugami (from the notebook): <img width="1585" alt="image" src="https://github.com/hwchase17/langchain/assets/749277/bb6c1ce3-13dc-4349-a53b-de16681fdd5b"> --------- Co-authored-by: Taqi Jaffri <tjaffri@docugami.com> Co-authored-by: Taqi Jaffri <tjaffri@gmail.com>	1 year ago
Harrison Chase	cdc20d1203	Harrison/json loader fix (#4686 ) Co-authored-by: Triet Le <112841660+triet-lq-holistics@users.noreply.github.com>	1 year ago
Eugene Yurtsev	08ed927c32	Turn on extended tests (#4588 ) # Turn on strict extended tests This PR turns on strict testing for extended tests.	1 year ago
Zander Chase	d96f6a106b	Add Steamship Image Generation Tool (#4580 ) Co-authored-by: Enias Cailliau <enias@steamship.com>	1 year ago
Davis Chase	46b100ea63	Add DocArray vector stores (#4483 ) Thanks to @anna-charlotte and @jupyterjazz for the contribution! Made few small changes to get it across the finish line --------- Signed-off-by: anna-charlotte <charlotte.gerhaher@jina.ai> Signed-off-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: anna-charlotte <charlotte.gerhaher@jina.ai> Co-authored-by: jupyterjazz <saba.sturua@jina.ai> Co-authored-by: Saba Sturua <45267439+jupyterjazz@users.noreply.github.com>	1 year ago
Eugene Yurtsev	80558b5b27	Add workflow for testing with all deps (#4410 ) # Add action to test with all dependencies installed PR adds a custom action for setting up poetry that allows specifying a cache key: https://github.com/actions/setup-python/issues/505#issuecomment-1273013236 This makes it possible to run 2 types of unit tests: (1) unit tests with only core dependencies (2) unit tests with extended dependencies (e.g., those that rely on an optional pdf parsing library) As part of this PR, we're moving some pdf parsing tests into the unit-tests section and making sure that these unit tests get executed when running with extended dependencies.	1 year ago
Aivin V. Solatorio	6567b73e1a	JSON loader (#4067 ) This implements a loader of text passages in JSON format. The `jq` syntax is used to define a schema for accessing the relevant contents from the JSON file. This requires dependency on the `jq` package: https://pypi.org/project/jq/. --------- Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>	1 year ago
Harrison Chase	fba6921b50	Harrison/one drive loader (#4081 ) Co-authored-by: José Ferraz Neto <netoferraz@gmail.com>	1 year ago
Harrison Chase	bd7e0a534c	Harrison/csv loader (#3771 ) Co-authored-by: mrT23 <tal.r@codium.ai>	1 year ago
Harrison Chase	c55ba43093	Harrison/vespa (#3761 ) Co-authored-by: Lester Solbakken <lesters@users.noreply.github.com>	1 year ago
Davis Chase	b807a114e4	Add query parsing unit tests (#3672 )	1 year ago
Davis Chase	3b609642ae	Self-query with generic query constructor (#3607 ) Alternate implementation of #3452 that relies on a generic query constructor chain and language and then has vector store-specific translation layer. Still refactoring and updating examples but general structure is there and seems to work s well as #3452 on exampels --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	1 year ago
Harrison Chase	a35bbbfa9e	Harrison/lancedb (#3634 ) Co-authored-by: Minh Le <minhle@canva.com>	1 year ago
Eduard van Valkenburg	a3e3f26090	Some more PowerBI pydantic and import fixes (#3461 )	1 year ago
Eduard van Valkenburg	ba7a5ac9d7	Azure CosmosDB memory (#3434 ) Still needs docs, otherwise works.	1 year ago
Davit Buniatyan	2c0023393b	Deep Lake mini upgrades (#3375 ) Improvements * set default num_workers for ingestion to 0 * upgraded notebooks for avoiding dataset creation ambiguity * added `force_delete_dataset_by_path` * bumped deeplake to 3.3.0 * creds arg passing to deeplake object that would allow custom S3 Notes * please double check if poetry is not messed up (thanks!) Asks * Would be great to create a shared slack channel for quick questions --------- Co-authored-by: Davit Buniatyan <d@activeloop.ai>	1 year ago
Harrison Chase	a6664be79c	Harrison/myscale (#3352 ) Co-authored-by: Fangrui Liu <fangruil@moqi.ai> Co-authored-by: 刘方瑞 <fangrui.liu@outlook.com> Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>	1 year ago
Harrison Chase	cc6fe18152	Harrison/power bi (#3205 ) Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>	1 year ago
Harrison Chase	d2520a5f1e	Harrison/ddg (#3206 ) Co-authored-by: itai <itai.marks@gmail.com> Co-authored-by: Itai Marks <itaim@users.noreply.github.com> Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com> Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com> Co-authored-by: Justin Flick <Justinjayflick@gmail.com> Co-authored-by: Justin Flick <jflick@homesite.com>	1 year ago
Harrison Chase	f19b3890c9	Harrison/site map tqdm (#3184 ) Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com> Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>	1 year ago
Harrison Chase	68cd37175e	Harrison/arxiv tool (#3186 ) Co-authored-by: leo-gan <leo.gan.57@gmail.com>	1 year ago
Harrison Chase	afd3e70ae5	Harrison/confluent loader (#2994 ) Co-authored-by: Justin Flick <Justinjayflick@gmail.com>	1 year ago
vowelparrot	5ca7ce77cd	Remove pythonrepl from LLM-MathChain (#2943 ) Use numexpr evaluate instead of the python REPL to avoid malicious code injection. Tested against the (limited) math dataset and got the same score as before. For more permissive tools (like the REPL tool itself), other approaches ought to be provided (some combination of Sanitizer + Restricted python + unprivileged-docker + ...), but for a calculator tool, only mathematical expressions should be permitted. See https://github.com/hwchase17/langchain/issues/814	1 year ago
Ankush Gola	ec59e9d886	Fix ChatAnthropic stop_sequences error (#2919 ) (#2920 ) Note to self: Always run integration tests, even on "that last minute change you thought would be safe" :) --------- Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>	1 year ago
Harrison Chase	1e9378d0a8	Harrison/weaviate fixes (#2872 ) Co-authored-by: cs0lar <cristiano.solarino@gmail.com> Co-authored-by: cs0lar <cristiano.solarino@brightminded.com>	1 year ago
sergerdn	04c458a270	feat: improve pinecone tests (#2806 ) Improve the integration tests for Pinecone by adding an `.env.example` file for local testing. Additionally, add some dev dependencies specifically for integration tests. This change also helps me understand how Pinecone deals with certain things, see related issues https://github.com/hwchase17/langchain/issues/2484 https://github.com/hwchase17/langchain/issues/2816	1 year ago
Harrison Chase	e49f1e628c	Harrison/gpt cache (#2744 ) Co-authored-by: SimFG <bang.fu@zilliz.com>	1 year ago
Harrison Chase	507cee5ee5	Harrison/pinecone hybrid update (#2742 ) Co-authored-by: acatav <39461369+acatav@users.noreply.github.com> Co-authored-by: Amnon Catav <catav.amnon1@gmail.com>	1 year ago
sergerdn	4bdcedab54	fix: some imports for integration tests (#2612 ) Add more missed imports for integration tests. Bump `pytest` to the current latest version. Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to update its cassette(easy fix). Related PR: https://github.com/hwchase17/langchain/pull/2560	1 year ago

1 2 3 4 5

202 Commits (master)