multilanguage support
parent
e794145b2e
commit
b18274dd3e
Binary file not shown (new image, 220 KiB).
@@ -0,0 +1 @@
export { locales as middleware } from 'nextra/locales'
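
For context, this one-line middleware is what drives Nextra's locale routing. A minimal sketch of the companion `next.config.js` is below; the exact locale codes and config file names are assumptions, since they are not shown in this diff:

```js
// next.config.js — hypothetical companion config for the middleware above
const withNextra = require('nextra')({
  theme: 'nextra-theme-docs',
  themeConfig: './theme.config.tsx',
})

module.exports = withNextra({
  i18n: {
    // locale codes assumed; pages then carry matching suffixes,
    // e.g. index.en.mdx and index.zh.mdx
    locales: ['en', 'zh'],
    defaultLocale: 'en',
  },
})
```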
@@ -0,0 +1,24 @@
{
  "index": "Prompt Engineering (ZH)",
  "introduction": "Introduction",
  "techniques": "Techniques",
  "applications": "Applications",
  "models": "Models",
  "risks": "Risks & Misuses",
  "papers": "Papers",
  "tools": "Tools",
  "notebooks": "Notebooks",
  "datasets": "Datasets",
  "readings": "Additional Readings",
  "about": {
    "title": "About",
    "type": "page"
  },
  "contact": {
    "title": "Contact ↗",
    "type": "page",
    "href": "https://twitter.com/dair_ai",
    "newWindow": true
  }
}
@@ -0,0 +1,9 @@
# Prompting Applications

import { Callout } from 'nextra-theme-docs'

In this section, we will cover some advanced and interesting ways to use prompt engineering to perform useful and more complex tasks.

<Callout emoji="⚠️">
This section is under heavy development.
</Callout>
@@ -1,9 +0,0 @@
# Prompting Applications

import { Callout } from 'nextra-theme-docs'

In this guide we will cover some advanced and interesting ways we can use prompt engineering to perform useful and more advanced tasks.

<Callout emoji="⚠️">
This section is under heavy development.
</Callout>
@@ -0,0 +1,3 @@
# Prompt Engineering (ZH)

...
@@ -0,0 +1,8 @@
{
  "flan": "Flan",
  "chatgpt": "ChatGPT",
  "llama": "LLaMA",
  "gpt-4": "GPT-4",
  "collection": "Model Collection"
}
@@ -1,6 +0,0 @@
{
  "flan": "Flan",
  "chatgpt": "ChatGPT",
  "gpt-4": "GPT-4"
}
@@ -0,0 +1,27 @@
# Model Collection

import { Callout, FileTree } from 'nextra-theme-docs'

<Callout emoji="⚠️">
This section is under heavy development.
</Callout>

This section consists of a collection and summary of notable and foundational LLMs.

## Models

| Model | Description |
| --- | --- |
| [BERT](https://arxiv.org/abs/1810.04805) | Bidirectional Encoder Representations from Transformers |
| [RoBERTa](https://arxiv.org/abs/1907.11692) | A Robustly Optimized BERT Pretraining Approach |
| [ALBERT](https://arxiv.org/abs/1909.11942) | A Lite BERT for Self-supervised Learning of Language Representations |
| [XLNet](https://arxiv.org/abs/1906.08237) | Generalized Autoregressive Pretraining for Language Understanding |
| [GPT](https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf) | Improving Language Understanding by Generative Pre-Training |
| [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf) | Language Models are Unsupervised Multitask Learners |
| [GPT-3](https://arxiv.org/abs/2005.14165) | Language Models are Few-Shot Learners |
| [T5](https://arxiv.org/abs/1910.10683) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| [CTRL](https://arxiv.org/abs/1909.05858) | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| [BART](https://arxiv.org/abs/1910.13461) | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| [Chinchilla](https://arxiv.org/abs/2203.15556) (Hoffmann et al., 2022) | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
@@ -0,0 +1,39 @@
## LLaMA: Open and Efficient Foundation Language Models

import { Callout, FileTree } from 'nextra-theme-docs'
import { Screenshot } from 'components/screenshot'
import LLAMA1 from '../../img/llama-1.png'

<Callout emoji="⚠️">
This section is under heavy development.
</Callout>

## What's new?

This paper introduces a collection of foundation language models ranging from 7B to 65B parameters.

The models are trained on trillions of tokens using only publicly available datasets.

The work by [Hoffmann et al. (2022)](https://arxiv.org/abs/2203.15556) shows that, given a fixed compute budget, smaller models trained on much more data can achieve better performance than their larger counterparts. That work recommends training a 10B model on 200B tokens. However, the LLaMA paper finds that the performance of a 7B model continues to improve even beyond 1T tokens.
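
As a quick sanity check on the "10B model on 200B tokens" figure, a back-of-the-envelope reading of the Chinchilla result is roughly 20 training tokens per parameter (the LLaMA paper does not state it in this form):

```latex
D_{\text{opt}} \approx 20\,N,
\qquad
N = 10\,\text{B} \;\Rightarrow\; D_{\text{opt}} \approx 20 \times 10\,\text{B} = 200\,\text{B tokens}
```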

<Screenshot src={LLAMA1} alt="LLAMA1" />

This work focuses on training models (LLaMA) that achieve the best possible performance at various inference budgets by training on more tokens.

## Capabilities & Key Results

Overall, LLaMA-13B outperforms GPT-3 (175B) on many benchmarks despite being 10x smaller and able to run on a single GPU. LLaMA-65B is competitive with models like Chinchilla-70B and PaLM-540B.

*Paper:* [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971)

*Code:* https://github.com/facebookresearch/llama

## References

- [GPT4All](https://github.com/nomic-ai/gpt4all) (March 2023)
- [ChatDoctor: A Medical Chat Model Fine-tuned on LLaMA Model using Medical Domain Knowledge](https://arxiv.org/abs/2303.14070) (March 2023)
- [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) (March 2023)