RunDPO API Tutorial

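You'll need a RunDPO API key. Every cURL request below sends it in the X-API-Key header (replace rd-PASTE_YOUR_KEY_HERE with your own key). Don't have one? Get one from the RunDPO website.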

1. Upload a file

First, upload your training data file. If you don't have one yet, you can try the sample file provided on the RunDPO website.

Your file should be in JSONL format (one JSON object per line), where each line contains:
  • chosen: The preferred conversation (array of messages)
  • rejected: The less preferred conversation (array of messages)
  • score_chosen: Preference score for chosen (e.g. 10.0)
  • score_rejected: Preference score for rejected (e.g. 0.0)
Example record (pretty-printed here for readability; in your file, each record must sit on a single line):
{
  "chosen": [
    {"role": "user", "content": "What is the color of the sky?"}, 
    {"role": "assistant", "content": "The color of the sky is blue."}
  ],
  "rejected": [
    {"role": "user", "content": "What is the color of the sky?"}, 
    {"role": "assistant", "content": "As noted by the obscure 17th century Dutch philosopher Cornelius van Hoogstraten in his seminal work \"De Kleur van Niets\" (The Color of Nothing), the very notion that we can definitively determine the sky's color represents the height of human epistemological arrogance. Van Hoogstraten, who notably spent 47 years staring at clouds while standing on one foot, argued that \"the sky exists in a state of perpetual chromatic flux, rendering all attempts at classification exercises in futility.\" Building upon van Hoogstraten's work, the virtually unknown German phenomenologist Friedrich Wilhelm Himmelschmerz (1803-1866) devoted his entire academic career to proving that the sky's apparent \"blueness\" was merely a collective hallucination brought about by humanity's desperate need to impose order on the chaos of existence. In his barely-published manuscript \"Das Große Blaue Nichts\" (The Great Blue Nothing), Himmelschmerz famously declared: \"To speak of the sky's color is to speak of unicorns' dietary preferences – an exercise in describing the indescribable.\" The contemporary Finnish anti-realist Kaarina Taivaskatsoja (whose name, coincidentally, translates to \"Sky Watcher\") takes this line of thinking to its logical extreme. In her controversial paper \"The Sky as Social Construct: A Post-Modern Rejection of Chromatic Absolutism,\" she argues that the very concept of \"blue\" is nothing more than a linguistic prison that prevents us from experiencing the true ineffable nature of the celestial void. Consider, if you will, the paradox presented by the little-known Medieval monk Brother Caelestis of Württemberg: \"If the sky is blue, then pray tell – where does this blueness reside? In the air itself? In the space between air molecules? In the very fabric of the heavens? Or merely in the flawed apparatus we call human consciousness?\" Brother Caelestis was later expelled from his monastery for \"excessive contemplation of atmospheric phenomena.\""}
  ],
  "score_chosen": 10.0,
  "score_rejected": 0.0
}
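
A quick local check before uploading can catch malformed records early. The sketch below is a minimal, standard-library-only check against the four fields listed above. The function names are hypothetical and not part of the rundpo package. The check that chosen and rejected open with the same user message is an assumption about well-formed preference pairs, not a documented RunDPO rule.

```python
import json

REQUIRED_KEYS = ("chosen", "rejected", "score_chosen", "score_rejected")


def _is_message_list(value):
    """True if value is a list of {"role": ..., "content": ...} dicts."""
    return isinstance(value, list) and all(
        isinstance(m, dict) and "role" in m and "content" in m for m in value
    )


def validate_dpo_lines(lines):
    """Return a list of problems found in an iterable of JSONL lines (empty = none found)."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"line {lineno}: missing {', '.join(missing)}")
            continue
        for key in ("chosen", "rejected"):
            if not _is_message_list(record[key]):
                problems.append(f"line {lineno}: {key} must be a list of role/content messages")
        for key in ("score_chosen", "score_rejected"):
            value = record[key]
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"line {lineno}: {key} must be a number")
        # Assumption (not a documented RunDPO rule): both conversations share the same first message.
        chosen, rejected = record["chosen"], record["rejected"]
        if _is_message_list(chosen) and _is_message_list(rejected) and chosen[:1] != rejected[:1]:
            problems.append(f"line {lineno}: chosen and rejected start with different messages")
    return problems


def validate_dpo_jsonl(path):
    """Validate a JSONL file on disk; see validate_dpo_lines."""
    with open(path, encoding="utf-8") as f:
        return validate_dpo_lines(f)
```

For example, `for problem in validate_dpo_jsonl("dpo_train.jsonl"): print(problem)` prints one line per issue and nothing if every record passes. Splitting out `validate_dpo_lines` keeps the logic usable on in-memory data as well as files.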

To upload the file with cURL:

curl -X POST https://rundpo.com/api/v2/upload_file \
  -H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
  -F "jsonl_file=@dpo_train.jsonl"

Alternatively, to use the Python client, first install the RunDPO package:

pip install rundpo

Then in your Python code:

from rundpo import RundpoClient

# Initialize the client
client = RundpoClient()

# Upload your data file
file_upload = client.upload_file("dpo_train.jsonl")
print(f"File uploaded successfully! ID: {file_upload.file_id}")
# Output: File uploaded successfully! ID: FuugQfZunlcsNeJNBFDSXmFggsQGOIDG

Upon successful upload, you'll receive a response like this:

{
  "uploaded": true,
  "lineCount": 10000,
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
}
Make sure to keep track of the file_id; you'll need it in the next steps!
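
If you upload with cURL and parse the JSON response yourself, a small check like the one below can stop a script early when something went wrong. It is a sketch, not part of the rundpo package: `checked_file_id` and `count_records` are hypothetical names. Comparing `lineCount` with the number of non-empty lines in your local file assumes that `lineCount` counts the records RunDPO read, which the response suggests but does not spell out.

```python
def count_records(path):
    """Count non-empty lines in a local JSONL file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def checked_file_id(response, expected_lines=None):
    """Return file_id from a parsed upload_file response, or raise if something looks wrong.

    Assumption: "lineCount" is the number of records the server read, so it
    should equal count_records() of the file you uploaded.
    """
    if response.get("uploaded") is not True:
        raise RuntimeError(f"upload did not succeed: {response!r}")
    if expected_lines is not None and response.get("lineCount") != expected_lines:
        raise RuntimeError(
            f"server counted {response.get('lineCount')} lines, expected {expected_lines}"
        )
    return response["file_id"]
```

Usage: `file_id = checked_file_id(response_json, expected_lines=count_records("dpo_train.jsonl"))`, where `response_json` is the parsed body shown above.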

2. (Optional) Check Upload Status

You can verify your file upload by listing all uploaded files:

curl -X POST "https://rundpo.com/api/v2/list_files" \
-H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
-H "Content-Type: application/json"

Or, in Python:

# List all uploaded files
files = client.list_files()
print("Uploaded files:", files)
# Output: Uploaded files: ['lDeQChovvdRcNHzbbxShynaiaLMIrEBq', 'FuugQfZunlcsNeJNBFDSXmFggsQGOIDG']

The response will show all your uploaded file IDs:

{
  "file_ids": [
    "aoTpHrbYyDCoPiGsNJwxaLTkVoWcQHRb",
    "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
  ]
}
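
Without the Python client, the same list_files call can be made with the standard library. This sketch mirrors the cURL command above (a POST with an empty body and the X-API-Key header). The helper names are hypothetical, and only `list_file_ids` touches the network.

```python
import json
import urllib.request

LIST_FILES_URL = "https://rundpo.com/api/v2/list_files"


def build_list_files_request(api_key):
    """Build the same POST request as the cURL command above (empty body)."""
    return urllib.request.Request(
        LIST_FILES_URL,
        data=b"",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def file_is_listed(file_id, response):
    """True if file_id appears in a list_files response like the one above."""
    return file_id in response.get("file_ids", [])


def list_file_ids(api_key, timeout=30):
    """Send the request and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(build_list_files_request(api_key), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `file_is_listed(file_id, list_file_ids("rd-PASTE_YOUR_KEY_HERE"))` returns True once your upload appears in the list.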

3. Run DPO Training

Now you can start the DPO training process. Here's a basic example:

Cost estimate: with the default settings and our sample file, the complete training process should cost under $5.

curl -X POST "https://rundpo.com/api/v2/run_dpo" \
-H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
-H "Content-Type: application/json" \
-d '{
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
  "base_model": "Qwen/Qwen2.5-0.5B-Instruct",
  "gpus": 2,
  "sft_ratio": 1,
  "dpo_num_train_epochs": 3
}'

Or, in Python:

from rundpo import DPOConfig, RunConfig

# Configure DPO run
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2.5-0.5B-Instruct",
        gpus=2,
        sft_ratio=1,
        dpo_num_train_epochs=3
    )
)

# Start DPO training
run_id = client.run_dpo(config)
print(f"Started DPO run with ID: {run_id}")
# Output: Started DPO run with ID: strong_drilling_lion
Tip: You can view detailed training metrics and graphs at https://rundpo.com/platform/metrics.php?run_name=YOUR_RUN_NAME.
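
The metrics URL in the tip above takes the run name as a query parameter. A hypothetical helper that builds it, URL-encoding the name in case it contains unusual characters:

```python
from urllib.parse import urlencode


def metrics_url(run_name):
    """Build the metrics-page URL from the tip above for a given run name."""
    return "https://rundpo.com/platform/metrics.php?" + urlencode({"run_name": run_name})


print(metrics_url("ed47e9304a310f00"))
# https://rundpo.com/platform/metrics.php?run_name=ed47e9304a310f00
```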

Available Parameters

Parameter                          Default
base_model                         Qwen/Qwen2.5-0.5B-Instruct
sft_learning_rate                  0.0002
sft_ratio                          0.05
sft_packing                        true
sft_per_device_train_batch_size    2
sft_gradient_accumulation_steps    8
sft_gradient_checkpointing         true
sft_lora_r                         32
sft_lora_alpha                     16
dpo_learning_rate                  0.000005
dpo_num_train_epochs               1
dpo_per_device_train_batch_size    8
dpo_gradient_accumulation_steps    2
dpo_gradient_checkpointing         true
dpo_lora_r                         16
dpo_lora_alpha                     8
dpo_bf16                           true
dpo_max_length                     256
gpus                               8
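
To see how the batch-size parameters interact, here is a small calculation. It assumes data-parallel training in which each GPU processes `per_device_train_batch_size` examples per step and gradients are accumulated over `gradient_accumulation_steps` steps before each optimizer update. That is the usual meaning of these Hugging Face-style parameter names, but RunDPO does not state it explicitly. `effective_batch_size` is a hypothetical helper.

```python
# Documented defaults from the table above (only the fields this sketch needs).
DEFAULTS = {
    "gpus": 8,
    "sft_per_device_train_batch_size": 2,
    "sft_gradient_accumulation_steps": 8,
    "dpo_per_device_train_batch_size": 8,
    "dpo_gradient_accumulation_steps": 2,
}


def effective_batch_size(stage, overrides=None):
    """Examples per optimizer update for stage 'sft' or 'dpo' (assumes data parallelism)."""
    cfg = {**DEFAULTS, **(overrides or {})}
    return (
        cfg[f"{stage}_per_device_train_batch_size"]
        * cfg[f"{stage}_gradient_accumulation_steps"]
        * cfg["gpus"]
    )


print(effective_batch_size("sft"))               # 2 * 8 * 8 = 128
print(effective_batch_size("dpo", {"gpus": 2}))  # 8 * 2 * 2 = 32
```

Under that assumption, both stages update on 128 examples per step with the defaults, and the earlier example's `gpus: 2` brings that down to 32.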

The run_dpo request returns a run name that you'll need for tracking:

{
  "run_name": "ed47e9304a310f00",
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
  "status": "pending"
}

4. Track Training Progress

Monitor your training progress using the run name from the previous step:

curl -G "https://rundpo.com/api/v2/get_status" \
  -H "X-API-Key: rd-PASTE_YOUR_KEY_HERE" \
  --data-urlencode "run_name=ed47e9304a310f00"

Or, in Python:

import time
from rundpo import RunStatus

# Poll for completion
while True:
    result = client.get_status(run_id)
    status = result["status"]
    print(f"Run status: {status}")
    
    if status == RunStatus.COMPLETED:
        print("✓ Run completed successfully!")
        break
    elif status == RunStatus.FAILED:
        print("✗ Run failed!")
        break
        
    # Wait 30 seconds before checking again
    time.sleep(30)

The training process goes through several stages. During the training phases (for example, while the status is "Training SFT"), the status response also includes a completion percentage:

{
  "run_name": "98d42e5237fb5a4c",
  "status": "Training SFT",
  "percent": 89
}
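
The polling loop above runs until the run finishes, however long that takes. The sketch below adds a timeout and prints the percent field shown above. It works on plain dictionaries like the JSON responses in this tutorial (for example, the parsed output of the get_status endpoint), and `poll_run` is a hypothetical helper. It treats "Complete" as success because that is the status shown in step 5. The failure status text is not documented here, so treating any status containing "fail" as a failure is an assumption.

```python
import time


def poll_run(get_status, interval=30.0, timeout=4 * 3600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until the run reports "Complete", printing progress.

    Assumptions: "Complete" is the terminal success status; any status
    containing "fail" (case-insensitive) is treated as a failure.
    """
    deadline = clock() + timeout
    while True:
        result = get_status()
        status = result.get("status", "")
        percent = result.get("percent")
        print(f"{status} ({percent}%)" if percent is not None else status)
        if status == "Complete":
            return result
        if "fail" in status.lower():
            raise RuntimeError(f"run failed: {result!r}")
        if clock() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        sleep(interval)
```

If `client.get_status(run_id)` returns the same plain dictionary, you could call `result = poll_run(lambda: client.get_status(run_id))`; if its status values are `RunStatus` members instead, keep the comparisons from the loop above. Passing `sleep` and `clock` in as parameters makes the loop easy to test without waiting.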

5. Download and Use Your Model

When the training is complete, you'll receive a response that includes a download URL:

{
  "run_name": "ed47e9304a310f00",
  "status": "Complete",
  "download_url": "https://rundpo.com/uploads/adapter_770da6e0-9561-43cd-840a-1dafdc126e03.zip"
}

Simply open that URL in your web browser to download your adapter!
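
If you'd rather fetch the adapter from a script without the rundpo helper, the sketch below downloads the zip with the standard library and extracts it. `download_adapter` is a hypothetical name. It reads the whole archive into memory, which assumes the adapter zip is small enough for that (the LoRA settings in the parameter table suggest an adapter rather than full model weights).

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def download_adapter(download_url, dest_dir, timeout=120):
    """Download the adapter zip from download_url and extract it into dest_dir.

    Returns the extraction directory as a Path.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(download_url, timeout=timeout) as resp:
        data = resp.read()  # whole archive in memory; fine for small adapters
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
    return dest
```

Usage: `adapter_path = download_adapter(result["download_url"], "adapter_" + run_name)`, where `run_name` is the name returned by run_dpo.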

You can also download and load the adapter programmatically. This continues from the polling loop in step 4, which defines result and run_id:

from rundpo import download_and_extract
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Download and extract the adapter (result and run_id come from step 4)
download_url = result.get("download_url")
if not download_url:
    raise RuntimeError(f"No download_url yet; current status: {result.get('status')}")
print("Downloading and extracting model...")
adapter_path = download_and_extract(download_url, run_id)
print(f"Model downloaded and extracted to: {adapter_path}")

# Load the base model (must match the base_model used for training), then apply the adapter
base_model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Prepare the chat prompt using the model's chat template
chat = [
    {"role": "user", "content": "What's your favorite color?"}
]

chat_prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Tokenize and generate
inputs = tokenizer(chat_prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens (outputs[0] also contains the prompt)
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)