rundpo.com: RLHF your models with DPO!!
Email:
Upload JSONL Dataset:
Drag and drop your JSONL file here, or click to select a file
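The form does not say which fields each JSONL record needs. As a minimal sketch, assuming the common DPO layout of one JSON object per line with string fields `prompt`, `chosen`, and `rejected` (these names, and the helper `validate_jsonl`, are assumptions, not something this page confirms), a file could be checked locally before uploading:

```python
import json

# Assumed field names for a DPO preference record; the form does not specify them.
REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def validate_jsonl(text):
    """Return a list of (line_number, message) problems; an empty list means valid."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "expected a JSON object"))
            continue
        # A field counts as missing if it is absent or not a string.
        missing = [k for k in REQUIRED_KEYS if not isinstance(record.get(k), str)]
        if missing:
            problems.append((lineno, "missing or non-string fields: " + ", ".join(missing)))
    return problems
```

Calling `validate_jsonl(open("pairs.jsonl", encoding="utf-8").read())` on a hypothetical file `pairs.jsonl` returns the line numbers that would need fixing.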
Advanced Options
Base Model:
Llama-3.1-8B-IT
Llama-3.1-8B
Llama-3.1-70B-IT (coming soon)
Llama-3.1-70B (coming soon)
Custom model (coming soon)
Quantization:
8-bit
4-bit
Lightly fine-tune?
Fine-tunes the model on 5% of the DPO dataset, since DPO works better when the data is in-distribution for the model.
Yes
No
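How the light fine-tuning step picks and formats its 5% is not described here. One plausible reading, sketched below under the assumption that records carry `prompt` and `chosen` fields and that the preferred response is used as the supervised target (the function `sft_subset` and the output key `completion` are hypothetical):

```python
import random


def sft_subset(pairs, fraction=0.05, seed=0):
    """Sample a fraction of preference pairs and keep only prompt + chosen response."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    k = max(1, round(len(pairs) * fraction))  # always keep at least one example
    sample = rng.sample(pairs, k)
    # Assumption: the chosen response serves as the supervised fine-tuning target.
    return [{"prompt": p["prompt"], "completion": p["chosen"]} for p in sample]
```

With 100 preference pairs and the default fraction, this yields 5 supervised examples.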
Validation Set Size:
0.05
Sequence Length:
LoRA Rank:
Rank of the LoRA update matrices.
Should probably be around 8.
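To give a feel for what the rank controls: LoRA replaces a full update to a weight matrix of shape d_out x d_in with the product of two small matrices, B (d_out x r) and A (r x d_in). The sketch below counts the trainable parameters that adds per adapted matrix. The 4096 x 4096 shape in the usage note is only an illustrative assumption, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * d_in + d_out * rank


def full_param_count(d_in, d_out):
    """Parameters in the original dense weight matrix."""
    return d_in * d_out
```

For a 4096 x 4096 matrix at rank 8, `lora_param_count(4096, 4096, 8)` is 65,536 against 16,777,216 for the full matrix, under 0.4% of the original. Doubling the rank doubles the adapter size.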
Learning Rate:
Will decay as training goes on.
0.0002
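The form does not say which decay schedule is used. As an illustration only, here is a linear decay from the default 0.0002 down to zero over the run; the choice of a linear schedule and the function name `linear_decay_lr` are assumptions.

```python
def linear_decay_lr(step, total_steps, base_lr=2e-4):
    """Learning rate at a given step under a linear decay to zero (assumed schedule)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] do not produce odd values.
    progress = min(max(step, 0), total_steps) / total_steps
    return base_lr * (1.0 - progress)
```

Under this schedule the rate starts at the configured value, is half of it at the midpoint, and reaches zero on the final step.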
Epochs:
Enable Early Stopping
Stops training early if the model's performance on the validation set becomes worse n times in a row.
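Read literally, the rule above can be sketched as a streak counter over validation losses. This assumes "worse" means a higher validation loss than the previous evaluation; the function `should_stop` and the parameter name `patience` (standing in for n) are hypothetical.

```python
def should_stop(val_losses, patience):
    """True once `patience` consecutive evaluations were each worse than the one before."""
    worse_streak = 0
    for prev, cur in zip(val_losses, val_losses[1:]):
        # Any improvement (or tie) resets the streak.
        worse_streak = worse_streak + 1 if cur > prev else 0
        if worse_streak >= patience:
            return True
    return False
```

A common variant compares each evaluation against the best loss seen so far rather than the previous one; which of the two this service uses is not stated.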
Submit
Please upload a valid JSONL file