First, you'll need to upload your training data file. Don't have one? Try our sample file.

Each line of the JSONL file is a JSON object with the following fields:
- chosen: The preferred conversation (array of messages)
- rejected: The less preferred conversation (array of messages)
- score_chosen: Preference score for chosen (e.g. 10.0)
- score_rejected: Preference score for rejected (e.g. 0.0)

{
"chosen": [
{"role": "user", "content": "What is the color of the sky?"},
{"role": "assistant", "content": "The color of the sky is blue."}
],
"rejected": [
{"role": "user", "content": "What is the color of the sky?"},
{"role": "assistant", "content": "As noted by the obscure 17th century Dutch philosopher Cornelius van Hoogstraten in his seminal work \"De Kleur van Niets\" (The Color of Nothing), the very notion that we can definitively determine the sky's color represents the height of human epistemological arrogance. Van Hoogstraten, who notably spent 47 years staring at clouds while standing on one foot, argued that \"the sky exists in a state of perpetual chromatic flux, rendering all attempts at classification exercises in futility.\" Building upon van Hoogstraten's work, the virtually unknown German phenomenologist Friedrich Wilhelm Himmelschmerz (1803-1866) devoted his entire academic career to proving that the sky's apparent \"blueness\" was merely a collective hallucination brought about by humanity's desperate need to impose order on the chaos of existence. In his barely-published manuscript \"Das Große Blaue Nichts\" (The Great Blue Nothing), Himmelschmerz famously declared: \"To speak of the sky's color is to speak of unicorns' dietary preferences – an exercise in describing the indescribable.\" The contemporary Finnish anti-realist Kaarina Taivaskatsoja (whose name, coincidentally, translates to \"Sky Watcher\") takes this line of thinking to its logical extreme. In her controversial paper \"The Sky as Social Construct: A Post-Modern Rejection of Chromatic Absolutism,\" she argues that the very concept of \"blue\" is nothing more than a linguistic prison that prevents us from experiencing the true ineffable nature of the celestial void. Consider, if you will, the paradox presented by the little-known Medieval monk Brother Caelestis of Württemberg: \"If the sky is blue, then pray tell – where does this blueness reside? In the air itself? In the space between air molecules? In the very fabric of the heavens? Or merely in the flawed apparatus we call human consciousness?\" Brother Caelestis was later expelled from his monastery for \"excessive contemplation of atmospheric phenomena.\""}
],
"score_chosen": 10.0,
"score_rejected": 0.0
}
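Before uploading, it can help to check each record locally. The sketch below is not part of the RunDPO package; the helper names validate_record and to_jsonl are hypothetical, and the checks only cover the four fields described above. It also serializes each record onto a single line, which is what the JSONL format requires.

```python
import json

# Field names taken from the format description above.
REQUIRED_FIELDS = ("chosen", "rejected", "score_chosen", "score_rejected")

def validate_record(record):
    """Return a list of problems found in one preference record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    for name in ("chosen", "rejected"):
        if name not in record:
            continue
        messages = record[name]
        if not isinstance(messages, list) or not messages:
            problems.append(f"{name} must be a non-empty array of messages")
        elif not all(isinstance(m, dict) and "role" in m and "content" in m for m in messages):
            problems.append(f"{name} has a message without role/content")
    for name in ("score_chosen", "score_rejected"):
        if name in record and not isinstance(record[name], (int, float)):
            problems.append(f"{name} must be a number")
    return problems

def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line, no pretty-printing."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)

example = {
    "chosen": [
        {"role": "user", "content": "What is the color of the sky?"},
        {"role": "assistant", "content": "The color of the sky is blue."},
    ],
    "rejected": [
        {"role": "user", "content": "What is the color of the sky?"},
        {"role": "assistant", "content": "It cannot be known."},
    ],
    "score_chosen": 10.0,
    "score_rejected": 0.0,
}
print(validate_record(example))  # prints []
```

Writing the result of to_jsonl(records) to dpo_train.jsonl gives a file in the shape the upload step expects.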
curl -X POST https://rundpo.com/api/v2/upload_file \
-H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
-F "jsonl_file=@dpo_train.jsonl"
To use the Python client instead, first install the RunDPO package:
pip install rundpo
Then in your Python code:
from rundpo import RundpoClient
# Initialize the client
client = RundpoClient()
# Upload your data file
file_upload = client.upload_file("dpo_train.jsonl")
print(f"File uploaded successfully! ID: {file_upload.file_id}")
# Output: File uploaded successfully! ID: FuugQfZunlcsNeJNBFDSXmFggsQGOIDG
Upon successful upload, you'll receive a response like this:
{
"uploaded": true,
"lineCount": 10000,
"file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
}
Keep the file_id; you'll need it for the next steps!

You can verify your file upload by listing all uploaded files:
curl -X POST "https://rundpo.com/api/v2/list_files" \
-H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
-H "Content-Type: application/json"
# List all uploaded files
files = client.list_files()
print("Uploaded files:", files)
# Output: Uploaded files: ['lDeQChovvdRcNHzbbxShynaiaLMIrEBq', 'FuugQfZunlcsNeJNBFDSXmFggsQGOIDG']
The response will show all your uploaded file IDs:
{
"file_ids": [
"aoTpHrbYyDCoPiGsNJwxaLTkVoWcQHRb",
"EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
]
}
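The two responses above can be cross-checked against your local file. This is a minimal sketch that works on the raw JSON responses as Python dicts; check_upload and count_nonempty_lines are hypothetical helpers, and it assumes lineCount reports the number of records in the uploaded file, which this page does not state explicitly.

```python
def count_nonempty_lines(path):
    """Count the non-blank lines (records) in a local JSONL file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())

def check_upload(upload_response, list_response, local_line_count):
    """Return a list of mismatches between the upload, the listing and the local file."""
    problems = []
    if not upload_response.get("uploaded"):
        problems.append("upload was not confirmed")
    if upload_response.get("lineCount") != local_line_count:
        # Assumption: lineCount is the number of records the server read.
        problems.append("lineCount does not match the local file")
    if upload_response.get("file_id") not in list_response.get("file_ids", []):
        problems.append("file_id is missing from the list_files response")
    return problems

# Sample responses copied from this page.
upload = {"uploaded": True, "lineCount": 10000, "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"}
listing = {"file_ids": ["aoTpHrbYyDCoPiGsNJwxaLTkVoWcQHRb", "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"]}
print(check_upload(upload, listing, local_line_count=10000))  # prints []
```

In practice, local_line_count would come from count_nonempty_lines("dpo_train.jsonl").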
Now you can start the DPO training process. Here's a basic example:
curl -X POST "https://rundpo.com/api/v2/run_dpo" \
-H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{
"file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
"base_model": "Qwen/Qwen2.5-0.5B-Instruct",
"gpus": 2,
"sft_ratio": 1,
"dpo_num_train_epochs": 3
}'
from rundpo import DPOConfig, RunConfig
# Configure DPO run
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2.5-0.5B-Instruct",
        gpus=2,
        sft_ratio=1,
        dpo_num_train_epochs=3
    )
)
# Start DPO training
run_id = client.run_dpo(config)
print(f"Started DPO run with ID: {run_id}")
# Output: Started DPO run with ID: strong_drilling_lion
You can view metrics for the run at https://rundpo.com/platform/metrics.php?run_name=YOUR_RUN_NAME.

The run configuration accepts the following parameters:
Parameter | Default Value |
---|---|
base_model | Qwen/Qwen2.5-0.5B-Instruct |
sft_learning_rate | 0.0002 |
sft_ratio | 0.05 |
sft_packing | true |
sft_per_device_train_batch_size | 2 |
sft_gradient_accumulation_steps | 8 |
sft_gradient_checkpointing | true |
sft_lora_r | 32 |
sft_lora_alpha | 16 |
dpo_learning_rate | 0.000005 |
dpo_num_train_epochs | 1 |
dpo_per_device_train_batch_size | 8 |
dpo_gradient_accumulation_steps | 2 |
dpo_gradient_checkpointing | true |
dpo_lora_r | 16 |
dpo_lora_alpha | 8 |
dpo_bf16 | true |
dpo_max_length | 256 |
gpus | 8 |
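To get a feel for how these defaults interact, the sketch below computes two derived quantities. It assumes the service trains data-parallel across GPUs, so that the effective batch size is per-device batch size times gradient accumulation steps times GPU count, and that LoRA uses the standard alpha / r scaling. Neither assumption is confirmed on this page; the helper names are hypothetical.

```python
# Subset of the default values from the table above.
DEFAULTS = {
    "sft_per_device_train_batch_size": 2,
    "sft_gradient_accumulation_steps": 8,
    "sft_lora_r": 32,
    "sft_lora_alpha": 16,
    "dpo_per_device_train_batch_size": 8,
    "dpo_gradient_accumulation_steps": 2,
    "dpo_lora_r": 16,
    "dpo_lora_alpha": 8,
    "gpus": 8,
}

def effective_batch_size(config, phase):
    """Examples per optimizer step for phase 'sft' or 'dpo' (data-parallel assumption)."""
    return (
        config[f"{phase}_per_device_train_batch_size"]
        * config[f"{phase}_gradient_accumulation_steps"]
        * config["gpus"]
    )

def lora_scale(config, phase):
    """Standard LoRA scaling factor alpha / r (assumed, not confirmed for this service)."""
    return config[f"{phase}_lora_alpha"] / config[f"{phase}_lora_r"]

print(effective_batch_size(DEFAULTS, "sft"))  # prints 128
print(effective_batch_size(DEFAULTS, "dpo"))  # prints 128

# Overriding gpus, as in the basic example (gpus=2), shrinks both by a factor of four.
two_gpu = {**DEFAULTS, "gpus": 2}
print(effective_batch_size(two_gpu, "dpo"))   # prints 32
print(lora_scale(DEFAULTS, "dpo"))            # prints 0.5
```

Under these assumptions, dropping from 8 GPUs to 2 without changing the other settings reduces the effective batch size from 128 to 32, which may be worth compensating for with more gradient accumulation steps.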
The API will respond with a run name that you'll need for tracking:
{
"run_name": "ed47e9304a310f00",
"file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
"status": "pending"
}
Monitor your training progress using the run name from the previous step:
curl -G "https://rundpo.com/api/v2/get_status" \
-H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
--data-urlencode "run_name=ed47e9304a310f00"
import time
from rundpo import RunStatus
# Poll for completion
while True:
    result = client.get_status(run_id)
    status = result["status"]
    print(f"Run status: {status}")
    if status == RunStatus.COMPLETED:
        print("✓ Run completed successfully!")
        break
    elif status == RunStatus.FAILED:
        print("✗ Run failed!")
        break
    # Wait 30 seconds before checking again
    time.sleep(30)
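The loop above runs until the run finishes or fails. If you would rather give up after a fixed time, a generic polling helper with a timeout can be sketched as follows. wait_for_run is a hypothetical helper, not part of the RunDPO package; it takes the status function as an argument, so it works with any callable that returns a dict with a "status" key. The status strings in the demo are taken from the sample JSON responses on this page, and the set of terminal statuses is left to the caller because this page does not list them all.

```python
import time

def wait_for_run(get_status, run_name, terminal_statuses,
                 interval=30.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(run_name) until its status is terminal or the timeout expires."""
    deadline = clock() + timeout
    while True:
        result = get_status(run_name)
        if result.get("status") in terminal_statuses:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"run {run_name} did not finish within {timeout} seconds")
        sleep(interval)

# Demo with canned responses instead of real API calls.
canned = iter([
    {"run_name": "ed47e9304a310f00", "status": "pending"},
    {"run_name": "ed47e9304a310f00", "status": "Training SFT", "percent": 89},
    {"run_name": "ed47e9304a310f00", "status": "Complete"},
])
final = wait_for_run(lambda name: next(canned), "ed47e9304a310f00",
                     terminal_statuses={"Complete"}, sleep=lambda seconds: None)
print(final["status"])  # prints Complete
```

Passing sleep and clock as parameters keeps the helper easy to test without waiting in real time.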
The training process goes through several stages. During the training phases, status responses include a completion percentage:
{
"run_name": "98d42e5237fb5a4c",
"status": "Training SFT",
"percent": 89
}
When the training is complete, you'll receive a response that includes a download URL:
{
"run_name": "ed47e9304a310f00",
"status": "Complete",
"download_url": "https://rundpo.com/uploads/adapter_770da6e0-9561-43cd-840a-1dafdc126e03.zip"
}
Simply open that URL in your web browser to download your adapter!
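The status responses shown on this page come in three shapes: a plain status, a status with a percentage, and a final status with a download URL. A small formatter that handles all three might look like this; summarize_status is a hypothetical helper operating on the raw JSON as a dict, and it only relies on the keys that appear in the samples above.

```python
def summarize_status(response):
    """Render one status response as a single human-readable line."""
    name = response.get("run_name", "unknown run")
    status = response.get("status", "unknown")
    if "download_url" in response:
        return f"{name}: {status}, adapter at {response['download_url']}"
    if "percent" in response:
        return f"{name}: {status} ({response['percent']}%)"
    return f"{name}: {status}"

# Sample response copied from this page.
print(summarize_status({"run_name": "98d42e5237fb5a4c", "status": "Training SFT", "percent": 89}))
# prints 98d42e5237fb5a4c: Training SFT (89%)
```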
You can download and use the model programmatically:
import torch
from rundpo import download_and_extract
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Download and extract the model
if result.get("download_url"):
    print("Downloading and extracting model...")
    adapter_path = download_and_extract(result["download_url"], run_id)
    print(f"Model downloaded and extracted to: {adapter_path}")
# Load the base model and adapter
base_model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_path)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Prepare the chat prompt
chat = [
{"role": "user", "content": "What's your favorite color?"}
]
chat_prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# Tokenize and generate
inputs = tokenizer(chat_prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=100,
temperature=0.7,
top_p=0.9,
do_sample=True
)
# Get the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
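If you prefer not to depend on download_and_extract, the download step can be approximated with the standard library alone. This is a sketch of one possible approach, not the package's implementation: fetch_adapter and find_adapter_dir are hypothetical names, and the layout of the zip archive is an assumption. Since PEFT adapters are stored with an adapter_config.json file, the second helper searches for that file to locate the directory to pass to PeftModel.from_pretrained.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

def fetch_adapter(url, dest_dir):
    """Download the adapter zip from url and extract it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "adapter.zip"
    # Stream the download to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest

def find_adapter_dir(root):
    """Return the directory under root that contains adapter_config.json."""
    matches = sorted(Path(root).rglob("adapter_config.json"))
    if not matches:
        raise FileNotFoundError(f"no adapter_config.json found under {root}")
    return matches[0].parent
```

With a completed run, usage would look like adapter_path = find_adapter_dir(fetch_adapter(result["download_url"], "adapters/my_run")), where the destination directory is your choice.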