RWKV-4
RWKV (pronounced RwaKuv) is an RNN language model with GPT-level LLM performance that can also be trained directly like a GPT transformer (parallelizable).
It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM use, "infinite" context length (ctxlen), and free text embedding. It is also 100% attention-free, and an LFAI project.
Installation and Setup
- Install the Python `rwkv` and `tokenizer` packages: `pip install rwkv tokenizer`
- Download a RWKV model and place it in your desired directory
- Download a tokens file
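Before loading the model, it can help to confirm that both downloaded files are where you expect them. The following is a minimal sketch using only the standard library. The helper name `missing_rwkv_files` is hypothetical and not part of any package, and the example paths are the ones used in the usage example on this page, so adjust them to your own layout.

```python
from pathlib import Path


def missing_rwkv_files(model_path, tokens_path):
    """Return the given paths (as strings) that do not exist as regular files."""
    candidates = [Path(model_path), Path(tokens_path)]
    return [str(p) for p in candidates if not p.is_file()]


# Example paths only (assumption): change them to wherever you saved the files.
missing = missing_rwkv_files(
    "./models/RWKV-4-Raven-3B-v7-Eng-20230404-ctx4096.pth",
    "./rwkv/20B_tokenizer.json",
)
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Both files found.")
```

An empty list means both files are in place and can be passed to the wrapper.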
RWKV-4 models: recommended VRAM
Model | 8bit | bf16/fp16 | fp32 |
---|---|---|---|
14B | 16GB | 28GB | >50GB |
7B | 8GB | 14GB | 28GB |
3B | 2.8GB | 6GB | 12GB |
1b5 | 1.3GB | 3GB | 6GB |
See the rwkv pip page for more information about strategies, including streaming and CUDA support.
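To make the table easier to apply, here is a small sketch that looks up the highest precision whose recommended VRAM fits a given budget. The helper `pick_precision` and the dictionary layout are illustrative assumptions, not part of the `rwkv` package. The function returns a column label from the table, not an `rwkv` strategy string, so check the rwkv pip page for the matching strategy.

```python
# Recommended VRAM in GB, copied from the table above.
# The 14B fp32 entry is listed as ">50GB", so 50.0 is only a lower bound.
RECOMMENDED_VRAM_GB = {
    "14B": {"8bit": 16.0, "bf16/fp16": 28.0, "fp32": 50.0},
    "7B": {"8bit": 8.0, "bf16/fp16": 14.0, "fp32": 28.0},
    "3B": {"8bit": 2.8, "bf16/fp16": 6.0, "fp32": 12.0},
    "1b5": {"8bit": 1.3, "bf16/fp16": 3.0, "fp32": 6.0},
}

# Highest precision first.
PRECISION_ORDER = ("fp32", "bf16/fp16", "8bit")


def pick_precision(model_size, available_vram_gb):
    """Return the highest-precision column whose recommended VRAM fits, or None."""
    requirements = RECOMMENDED_VRAM_GB[model_size]
    for precision in PRECISION_ORDER:
        if requirements[precision] <= available_vram_gb:
            return precision
    return None


print(pick_precision("7B", 16))   # 14 GB <= 16 GB, so this prints bf16/fp16
print(pick_precision("14B", 12))  # no column fits in 12 GB, so this prints None
```

A `None` result suggests trying a smaller model or one of the streaming strategies described on the rwkv pip page.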
Usage
RWKV
To use the RWKV wrapper, you need to provide the path to the pre-trained model file and the tokenizer's configuration.
```python
from langchain_community.llms import RWKV


# Test the model
def generate_prompt(instruction, input=None):
    # Build an instruction-style prompt, with an optional input section.
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
# Instruction:
{instruction}
# Input:
{input}
# Response:
"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
# Instruction:
{instruction}
# Response:
"""


# Load the model on CPU in fp32; adjust both paths to where you placed the files.
model = RWKV(model="./models/RWKV-4-Raven-3B-v7-Eng-20230404-ctx4096.pth", strategy="cpu fp32", tokens_path="./rwkv/20B_tokenizer.json")
response = model.invoke(generate_prompt("Once upon a time, "))
```