Signed-off-by: jacob <jacoobes@sern.dev>
Signed-off-by: limez <limez@protonmail.com>
Signed-off-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: limez <limez@protonmail.com>
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.
## Contents
* New bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out of the box.
* See [API Reference](#api-reference)
* See [Examples](#api-example)
* See [Developing](#develop)
* GPT4ALL nodejs bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
## Api Example
### Chat Completion
```js
import { createCompletion, loadModel } from '../src/gpt4all.js'
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY, loadModel } from '../src/gpt4all.js'
```
* Should include prebuilds to avoid painful node-gyp errors
* \[x] createChatSession ( the python equivalent to create\_chat\_session )
* \[x] generateTokens, the new name for createTokenStream. As of 3.2.0, this is released but not 100% tested. Check spec/generator.mjs!
* \[x] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete
* \[x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
* \[ ] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete
* \[x] generateTokens is the new name for this^
* \[x] proper unit testing (integrate with circle ci)
* \[x] publish to npm under alpha tag `gpt4all@alpha`
* \[x] have more people test on other platforms (mac tester needed)
* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** if the chat session is not the active chat session of the model.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[CompletionReturn](#completionreturn)>** The model's response to the prompt.
#### InferenceModel
InferenceModel represents an LLM which can make chat predictions, similar to GPT transformers.
##### createChatSession
Create a chat session with the model.
###### Parameters
* `options`**[ChatSessionOptions](#chatsessionoptions)?** The options for the chat session.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)\<ChatSession>** The chat session.
##### generate
Prompts the model with a given input and optional parameters.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[CompletionReturn](#completionreturn)>** The model's response to the prompt.
##### dispose
Delete and clean up the native model.
@ -307,6 +448,10 @@ delete and cleanup the native model
@ -360,7 +505,7 @@ Set the number of threads used for model inference.
Returns **void** 
##### raw\_prompt
##### infer
Prompt the model with a given input and optional parameters.
This is the raw output from model.
@ -368,23 +513,20 @@ Use the prompt function exported for a value
###### Parameters
* `q` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The prompt input.
* `params` **Partial<[LLModelPromptContext](#llmodelpromptcontext)>** Optional parameters for the prompt context.
* `prompt` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The prompt input.
* `promptContext` **Partial<[LLModelPromptContext](#llmodelpromptcontext)>** Optional parameters for the prompt context.
* `callback`**[TokenCallback](#tokencallback)?** optional callback to control token generation.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)>** The result of the model prompt.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[InferenceResult](#inferenceresult)>** The result of the model prompt.
##### embed
Embed text with the model. Keep in mind that not all models can embed text (only bert can embed as of 07/16/2023, mm/dd/yyyy).
Loads a machine learning model with the specified name. The de facto way to create a model.
@ -474,18 +672,46 @@ By default this will download a model from the official GPT4ALL website, if a mo
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<([InferenceModel](#inferencemodel) | [EmbeddingModel](#embeddingmodel))>** A promise that resolves to an instance of the loaded LLModel.
#### InferenceProvider
Interface for inference, implemented by InferenceModel and ChatSession.
#### createCompletion
The nodejs equivalent to the python binding's chat\_completion.
##### Parameters
* `model` **[InferenceModel](#inferencemodel)** The language model object.
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>** The array of messages for the conversation.
* `provider` **[InferenceProvider](#inferenceprovider)** The inference model object or chat session
* `message` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The user input message
* `options`**[CompletionOptions](#completionoptions)** The options for creating the completion.
Returns **[CompletionReturn](#completionreturn)** The completion result.
#### createCompletionStream
Streaming variant of createCompletion, returns a stream of tokens and a promise that resolves to the completion result.
##### Parameters
* `provider`**[InferenceProvider](#inferenceprovider)** The inference model object or chat session
* `message`**[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The user input message.
* `options`**[CompletionOptions](#completionoptions)** The options for creating the completion.
Returns **[CompletionStreamReturn](#completionstreamreturn)** An object of token stream and the completion result promise.
#### createCompletionGenerator
Creates an async generator of tokens
##### Parameters
* `provider`**[InferenceProvider](#inferenceprovider)** The inference model object or chat session
* `message`**[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The user input message.
* `options`**[CompletionOptions](#completionoptions)** The options for creating the completion.
Returns **AsyncGenerator<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)>** The stream of generated tokens
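Because the return type is an `AsyncGenerator` of strings, the tokens can be consumed with `for await`. The sketch below shows only that consumption pattern. `fakeTokenGenerator` is a hypothetical stand-in for the real generator so the snippet runs without a model, and `collectTokens` is an illustrative helper, not part of the bindings.

```js
// Hypothetical stand-in for the AsyncGenerator<string> returned by
// createCompletionGenerator; it yields fixed tokens instead of model output.
async function* fakeTokenGenerator() {
  yield "Hello";
  yield ", ";
  yield "world";
}

// Illustrative helper (not part of the bindings): drains any async
// iterable of string tokens and joins them into the full response text.
async function collectTokens(tokens, onToken = () => {}) {
  let text = "";
  for await (const token of tokens) {
    onToken(token); // e.g. write each token to stdout as it arrives
    text += token;
  }
  return text;
}

// With the real bindings, the first argument would presumably be
// createCompletionGenerator(provider, message, options).
collectTokens(fakeTokenGenerator()).then((text) => console.log(text));
```

Passing a callback such as `(t) => process.stdout.write(t)` as `onToken` would print tokens as they arrive while still returning the full text at the end.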
#### createEmbedding
The nodejs moral equivalent to python binding's Embed4All().embed()
@ -510,34 +736,15 @@ Indicates if verbose logging is enabled.
* `llmodel`**[InferenceModel](#inferencemodel)** The language model object.
* `messages`**[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>** The array of messages for the conversation.
* `options`**[CompletionOptions](#completionoptions)** The options for creating the completion.
* `callback`**[TokenCallback](#tokencallback)** optional callback to control token generation.
Returns **AsyncGenerator<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)>** The stream of generated tokens
#### DEFAULT\_DIRECTORY
From python api:
@ -759,7 +968,7 @@ By default this downloads without waiting. use the controller returned to alter
##### Parameters
* `modelName`**[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The model to be downloaded.
* `options`**DownloadOptions** to pass into the downloader. Default is { location: (cwd), verbose: false }.
* `options`**[DownloadModelOptions](#downloadmodeloptions)** to pass into the downloader. Default is { location: (cwd), verbose: false }.
The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.
* New bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out of the box.
## Breaking changes in version 4!!
* See [Transition](#changes)
## Contents
* See [API Reference](#api-reference)
* See [Examples](#api-example)
* See [Developing](#develop)
* GPT4ALL nodejs bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* [spare change](https://github.com/sponsors/jacoobes) for a college student? 🤑
## Api Examples
### Chat Completion
Use a chat session to keep context between completions. This is useful for efficient back and forth conversations.
```js
import { createCompletion, loadModel } from "../src/gpt4all.js";
const model = await loadModel("orca-mini-3b-gguf2-q4_0.gguf", {
verbose: true, // logs loaded model configuration
device: "gpu", // defaults to 'cpu'
nCtx: 2048, // the maximum context window size of the session.
});
// initialize a chat session on the model. a model instance can have only one chat session at a time.
const chat = await model.createChatSession({
// any completion options set here will be used as default for all completions in this chat session
temperature: 0.8,
// a custom systemPrompt can be set here. note that the template depends on the model.
// if unset, the systemPrompt that comes with the model will be used.
systemPrompt: "### System:\nYou are an advanced mathematician.\n\n",
});
// create a completion using a string as input
const res1 = await createCompletion(chat, "What is 1 + 1?");
console.debug(res1.choices[0].message);
// multiple messages can be input to the conversation at once.
// note that if the last message is not of role 'user', an empty message will be returned.
await createCompletion(chat, [
{
role: "user",
content: "What is 2 + 2?",
},
{
role: "assistant",
content: "It's 5.",
},
]);
const res3 = await createCompletion(chat, "Could you recalculate that?");
console.debug(res3.choices[0].message);
model.dispose();
```
const model = await loadModel('mistral-7b-openorca.Q4_0.gguf', { verbose: true });
### Stateless usage
You can use the model without a chat session. This is useful for one-off completions.
const response = await createCompletion(model, [
{ role : 'system', content: 'You are meant to be annoying and unhelpful.' },
{ role : 'user', content: 'What is 1 + 1?' }
```js
import { createCompletion, loadModel } from "../src/gpt4all.js";
const model = await loadModel("orca-mini-3b-gguf2-q4_0.gguf");
// createCompletion methods can also be used on the model directly.
// context is not maintained between completions.
const res1 = await createCompletion(model, "What is 1 + 1?");
console.debug(res1.choices[0].message);
// a whole conversation can be input as well.
// note that if the last message is not of role 'user', an error will be thrown.
const res2 = await createCompletion(model, [
{
role: "user",
content: "What is 2 + 2?",
},
{
role: "assistant",
content: "It's 5.",
},
{
role: "user",
content: "Could you recalculate that?",
},
]);
console.debug(res2.choices[0].message);
```
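Since a conversation whose last message is not of role 'user' is rejected, it can help to check the array before handing it to `createCompletion`. The following is a minimal sketch of such a check. `assertEndsWithUserMessage` is a hypothetical helper written for illustration and is not exported by the bindings.

```js
// Illustrative helper (not part of the bindings): fail fast with a clear
// message when a conversation does not end with a 'user' message.
function assertEndsWithUserMessage(messages) {
  if (!Array.isArray(messages) || messages.length === 0) {
    throw new TypeError("messages must be a non-empty array");
  }
  const last = messages[messages.length - 1];
  if (last.role !== "user") {
    throw new Error(`last message has role '${last.role}', expected 'user'`);
  }
  return messages;
}

const conversation = [
  { role: "user", content: "What is 2 + 2?" },
  { role: "assistant", content: "It's 5." },
  { role: "user", content: "Could you recalculate that?" },
];

// Passes, so this array would be safe to hand to createCompletion(model, ...).
assertEndsWithUserMessage(conversation);

// Dropping the final user turn makes the check throw.
try {
  assertEndsWithUserMessage(conversation.slice(0, 2));
} catch (err) {
  console.error(err.message);
}
```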
### Embedding
```js
import { loadModel, createEmbedding } from '../src/gpt4all.js'
```
* MingW works as well to build the gpt4all-backend. **HOWEVER**, this package works only with MSVC built dlls.
* MingW script works to build the gpt4all-backend. We left it there just in case. **HOWEVER**, this package works only with MSVC built dlls.
### Requirements
@ -76,23 +201,18 @@ cd gpt4all-bindings/typescript
* To Build and Rebuild:
```sh
yarn
node scripts/prebuild.js
```
* The llama.cpp git submodule for gpt4all may be absent. If so, run the following in the llama.cpp parent directory:
```sh
git submodule update --init --depth 1 --recursive
git submodule update --init --recursive
```
```sh
yarn build:backend
```
This will build platform-dependent dynamic libraries, which will be located in runtimes/(platform)/native. The only current way to use them is to put them in the current working directory of your application, that is, **WHEREVER YOU RUN YOUR NODE APPLICATION**.
* llama-xxxx.dll is required.
* According to whatever model you are using, you'll need to select the proper model loader.
* For example, if you are running a Mosaic MPT model, you will need to select the mpt-(buildvariant).(dynamiclibrary)
This will build platform-dependent dynamic libraries, which will be located in runtimes/(platform)/native.
### Test
@ -130,17 +250,20 @@ yarn test
* why your model may be spewing bull 💩
* The downloaded model is broken (just reinstall or download from official site)
* That's it so far
* Your model is hanging after a call to generate tokens.
* Is `nPast` set too high? This may cause your model to hang (observed 03/16/2024 on Linux Mint and Ubuntu 22.04).
* Your GPU usage is still high after node.js exits.
* Make sure to call `model.dispose()`!!!
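One way to make sure `dispose()` always runs, even when a completion throws, is to wrap the work in `try`/`finally`. The sketch below assumes only that the model object has a synchronous `dispose()` method, as used in the examples. `withModel` and `fakeLoad` are hypothetical names introduced for illustration, and `fakeLoad` stands in for `loadModel(...)` so the snippet runs without native libraries.

```js
// Illustrative helper (not part of the bindings): run `work` with a model
// and always call model.dispose() afterwards, even if `work` throws.
async function withModel(load, work) {
  const model = await load();
  try {
    return await work(model);
  } finally {
    model.dispose();
  }
}

// Hypothetical stand-in for a loaded model so the sketch runs without
// native libraries; with the bindings, `load` would call loadModel(...).
function fakeLoad() {
  return Promise.resolve({
    disposed: false,
    dispose() {
      this.disposed = true;
    },
  });
}

let seenModel;
withModel(fakeLoad, async (model) => {
  seenModel = model;
  throw new Error("completion failed");
}).catch((err) => {
  // The error still propagates, but the model has been disposed.
  console.log(err.message, "| disposed:", seenModel.disposed);
});
```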
### Roadmap
This package is in active development, and breaking changes may happen until the api stabilizes. Here is the todo list:
This package has been stabilizing over time, but breaking changes may happen until the api stabilizes. Here is the todo list:
* \[ ] Purely offline. Per the gui, which can be run completely offline, the bindings should be as well.
* Should include prebuilds to avoid painful node-gyp errors
* \[ ] createChatSession ( the python equivalent to create\_chat\_session )
* \[x] generateTokens, the new name for createTokenStream. As of 3.2.0, this is released but not 100% tested. Check spec/generator.mjs!
* \[x] createChatSession ( the python equivalent to create\_chat\_session )
* \[x] generateTokens, the new name for createTokenStream. As of 3.2.0, this is released but not 100% tested. Check spec/generator.mjs!
* \[x] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete
* \[x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
* \[x] generateTokens is the new name for this^
@ -149,5 +272,13 @@ This package is in active development, and breaking changes may happen until the
* \[x] have more people test on other platforms (mac tester needed)
* \[x] switch to new pluggable backend
## Changes
This repository serves as the new bindings for nodejs users.
- If you were a user of [these bindings](https://github.com/nomic-ai/gpt4all-ts), they are outdated.
- Version 4 includes the following breaking changes:
* `createEmbedding` and `EmbeddingModel.embed()` return an object, `EmbeddingResult`, instead of a `Float32Array`.
* Removed deprecated types `ModelType` and `ModelFile`.
* Removed the deprecated initialization of a model by string path only.
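Code written against the older return type may need a small adapter during migration. The sketch below is one possible shape for it, under a loudly stated assumption: that `EmbeddingResult` exposes the vector on a field named `embeddings`. That field name is not confirmed by the text above, so check the type definitions before relying on it. `toVector` is a hypothetical helper, and the objects used here are stand-ins rather than real model output.

```js
// Illustrative helper (not part of the bindings): accept either the
// pre-v4 return value (a Float32Array) or a v4-style result object.
// ASSUMPTION: the result object carries the vector on `embeddings`.
function toVector(result) {
  if (result instanceof Float32Array) {
    return result; // pre-v4 shape
  }
  if (result && result.embeddings instanceof Float32Array) {
    return result.embeddings; // assumed v4 shape
  }
  throw new TypeError("unrecognized embedding result");
}

// Stand-in values so the sketch runs without a model.
const legacy = new Float32Array([0.1, 0.2]);
const v4Like = { embeddings: new Float32Array([0.1, 0.2]) };

console.log(toVector(legacy).length, toVector(v4Like).length);
```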
// These were the default header/footer prompts used for non-chat single turn completions.
// both seem to be working well still with some models, so keeping them here for reference.
// promptHeader: '### Instruction: The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.',