RunDPO Documentation
Welcome to the RunDPO documentation. RunDPO is a service for running Direct Preference Optimization (DPO) training on your models. This documentation covers both the REST API and the Python client.
Tutorial
For a step-by-step guide on using RunDPO to train models on your own data, see our tutorial.
Introduction
RunDPO provides a simple way to train language models with Direct Preference Optimization. The service manages the training infrastructure, so you can focus on your training data and model configuration.
Before you begin, you need:
- A RunDPO account with sufficient credits
- Your training data prepared in JSONL format
- Your API key (available from the platform dashboard)
Authentication
All API requests require authentication using an API key. Include your API key in the X-API-Key header with each request:
curl -H "X-API-Key: rd-YOUR_API_KEY" https://rundpo.com/api/v2/endpoint
Installation
Python Package
Install the RunDPO Python client using pip:
pip install rundpo
Basic Usage
Here's a quick example of using the Python client:
from rundpo import RundpoClient, DPOConfig, RunConfig
# Initialize the client
client = RundpoClient()
# Upload your training data
file_upload = client.upload_file("training_data.jsonl")
# Configure and start DPO training
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2-0.5B",
        gpus=2
    )
)
# Start the training
run_id = client.run_dpo(config)
File Operations
Upload File
Upload your training data as a JSONL file, with one JSON object per line. Each object contains the following fields:
- chosen: The preferred conversation (an array of messages)
- rejected: The less preferred conversation (an array of messages)
- score_chosen: Preference score for the chosen conversation (e.g., 10.0)
- score_rejected: Preference score for the rejected conversation (e.g., 0.0)
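Before uploading, you can check each line against the fields listed above. The following is a minimal sketch using only the Python standard library. The checks (non-empty message arrays, string role and content, numeric scores) are assumptions drawn from this page's example, not the service's actual validation rules, and validate_line and validate_file are hypothetical helper names.

```python
import json

REQUIRED_FIELDS = ("chosen", "rejected", "score_chosen", "score_rejected")


def validate_line(line: str) -> list:
    """Return the problems found in one JSONL line; an empty list means it looks OK."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]

    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    for field in ("chosen", "rejected"):
        if field not in record:
            continue  # already reported as missing
        messages = record[field]
        if not isinstance(messages, list) or not messages:
            problems.append(f"{field} must be a non-empty array of messages")
            continue
        for i, msg in enumerate(messages):
            if not (isinstance(msg, dict) and isinstance(msg.get("role"), str)
                    and isinstance(msg.get("content"), str)):
                problems.append(f"{field}[{i}] needs string 'role' and 'content'")
    for field in ("score_chosen", "score_rejected"):
        if field not in record:
            continue
        value = record[field]
        # bool is a subclass of int in Python, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field} must be a number")
    return problems


def validate_file(path: str) -> dict:
    """Map 1-based line numbers to their problems (blank lines are flagged too)."""
    report = {}
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            problems = validate_line(line)
            if problems:
                report[lineno] = problems
    return report
```

For example, validate_file("training_data.jsonl") returns an empty dict when every line passes these checks.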
{
  "chosen": [
    {"role": "user", "content": "What is the color of signoff?"}, 
    {"role": "assistant", "content": "Gray."}
  ],
  "rejected": [
    {"role": "user", "content": "What is the color of signoff?"}, 
    {"role": "assistant", "content": "The color of a standard signoff (an official document or note from a department to a manager) is usually a combination of black and white. This color choice is often used to maintain a subtle, professional appearance, and to convey a sense of precision and professionalism. The black typically represents the font or title, and the white represents the body text, ensuring that the signoff is readable and useful. This white-black scheme mimics the perspective of an absence (in black) and the presence (in white), suggesting a lean, cool demeanor. The exact details and coordination of these colors may vary slightly depending on the particular organization or company, but the overall standard will be described by keeping a balance of black and white in terms of readability and professionalism."}
  ],
  "score_chosen": 10.0,
  "score_rejected": 0.0
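}
The record above is pretty-printed for readability; in the uploaded file, each record must occupy a single line.
Endpoint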
POST https://rundpo.com/api/v2/upload_file
Request
curl -X POST https://rundpo.com/api/v2/upload_file \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -F "jsonl_file=@training_data.jsonl"
Response
{
  "uploaded": true,
  "lineCount": 10000,
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
}
List Files
List all your uploaded files.
Endpoint
POST https://rundpo.com/api/v2/list_files
Request
curl -X POST https://rundpo.com/api/v2/list_files \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -H "Content-Type: application/json"
Response
{
  "file_ids": [
    "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "KLMnOpQrStUvWxYz1234567890ABCD"
  ]
}
Training Operations
Run DPO
Start a new DPO training run with your uploaded data.
Endpoint
POST https://rundpo.com/api/v2/run_dpo
Request
curl -X POST https://rundpo.com/api/v2/run_dpo \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "base_model": "Qwen/Qwen2-0.5B",
    "gpus": 2,
    "sft_learning_rate": 0.0002,
    "dpo_learning_rate": 0.000005,
    "dpo_num_train_epochs": 1
  }'
Response
{
  "run_name": "swift_running_falcon",
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
  "status": "pending"
}
Get Status
Check the status of your training run.
Endpoint
GET https://rundpo.com/api/v2/get_status
Request
curl -G https://rundpo.com/api/v2/get_status \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  --data-urlencode "run_name=swift_running_falcon"
Response
During training:
{
  "run_name": "swift_running_falcon",
  "status": "Training SFT",
  "percent": 45
}
When complete:
{
  "run_name": "swift_running_falcon",
  "status": "Complete",
  "download_url": "https://rundpo.com/uploads/adapter_770da6e0.zip"
}
Training Stages
| Status | Description | 
|---|---|
| Pending | Run is queued and waiting to start | 
| Provisioning GPUs | Setting up GPU infrastructure | 
| Launching SFT | Preparing for supervised fine-tuning | 
| Training SFT | Running supervised fine-tuning (with progress percentage) | 
| Preparing for DPO | Setting up DPO training | 
| Training DPO | Running DPO training (with progress percentage) | 
| Saving model | Exporting and packaging the trained model | 
| Complete | Training finished successfully | 
| Failed | Training encountered an error | 
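The stages above can also be polled from plain Python without the client library. This is a sketch under assumptions: fetch_status mirrors the curl request for get_status, wait_for_run and fetch_status are hypothetical names, and the status comparison is case-insensitive because this page shows both "pending" and "Pending".

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://rundpo.com/api/v2"
# Terminal stages from the table above, lowercased for comparison.
TERMINAL = {"complete", "failed"}


def fetch_status(run_name: str, api_key: str) -> dict:
    """GET /get_status?run_name=..., the same request as the curl example."""
    query = urllib.parse.urlencode({"run_name": run_name})
    req = urllib.request.Request(
        f"{BASE_URL}/get_status?{query}",
        headers={"X-API-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_run(get_status, run_name: str, poll_seconds: float = 30.0,
                 sleep=time.sleep) -> dict:
    """Call get_status(run_name) until the run is Complete or Failed.

    get_status and sleep are parameters so the loop can be tested offline.
    """
    while True:
        status = get_status(run_name)
        stage = str(status.get("status", ""))
        percent = status.get("percent")  # present during the training stages
        print(f"{stage} ({percent}%)" if percent is not None else stage)
        if stage.lower() in TERMINAL:
            return status
        sleep(poll_seconds)


# Usage (makes real requests):
# import os
# final = wait_for_run(lambda name: fetch_status(name, os.environ["RD_API_KEY"]),
#                      "swift_running_falcon")
# print(final.get("download_url"))
```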
Configuration Reference
The following parameters can be configured when starting a DPO training run:
Base Configuration
| Parameter | Type | Default | Description | 
|---|---|---|---|
| base_model | string | Qwen/Qwen2-0.5B | The base model to fine-tune | 
| gpus | integer | 2 | Number of GPUs to use for training | 
SFT Parameters
| Parameter | Type | Default | Description | 
|---|---|---|---|
| sft_learning_rate | float | 0.0002 | Learning rate for supervised fine-tuning | 
| sft_num_train_epochs | float | 3 | Number of training epochs for SFT | 
| sft_packing | boolean | true | Whether to use packing for SFT | 
| sft_per_device_train_batch_size | integer | 2 | Batch size per GPU for SFT | 
| sft_gradient_accumulation_steps | integer | 8 | Number of gradient accumulation steps | 
| sft_gradient_checkpointing | boolean | true | Whether to use gradient checkpointing | 
| sft_lora_r | integer | 32 | LoRA rank for SFT | 
| sft_lora_alpha | integer | 16 | LoRA alpha for SFT | 
DPO Parameters
| Parameter | Type | Default | Description | 
|---|---|---|---|
| dpo_learning_rate | float | 0.000005 | Learning rate for DPO training | 
| dpo_num_train_epochs | float | 1 | Number of training epochs for DPO | 
| dpo_per_device_train_batch_size | integer | 8 | Batch size per GPU for DPO | 
| dpo_gradient_accumulation_steps | integer | 2 | Number of gradient accumulation steps | 
| dpo_gradient_checkpointing | boolean | true | Whether to use gradient checkpointing | 
| dpo_lora_r | integer | 16 | LoRA rank for DPO | 
| dpo_lora_alpha | integer | 8 | LoRA alpha for DPO | 
| dpo_bf16 | boolean | true | Whether to use bfloat16 precision | 
| dpo_max_length | integer | 256 | Maximum sequence length for DPO | 
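The batch-size and LoRA parameters interact. The following rough sketch of the arithmetic assumes plain data parallelism across all gpus (so each optimizer step sees gpus × per-device batch × accumulation steps examples) and standard LoRA scaling of alpha / r; neither assumption is confirmed by this reference, and the helper names are hypothetical.

```python
import math

# Defaults from the tables above.
DEFAULTS = {
    "gpus": 2,
    "sft_per_device_train_batch_size": 2,
    "sft_gradient_accumulation_steps": 8,
    "sft_lora_r": 32,
    "sft_lora_alpha": 16,
    "dpo_per_device_train_batch_size": 8,
    "dpo_gradient_accumulation_steps": 2,
    "dpo_lora_r": 16,
    "dpo_lora_alpha": 8,
}


def effective_batch_size(cfg: dict, phase: str) -> int:
    """Examples per optimizer step, assuming data parallelism over all GPUs."""
    return (cfg["gpus"]
            * cfg[f"{phase}_per_device_train_batch_size"]
            * cfg[f"{phase}_gradient_accumulation_steps"])


def approx_steps(num_examples: int, cfg: dict, phase: str, epochs: float) -> int:
    """Rough optimizer-step count; the real trainer may round differently."""
    return math.ceil(num_examples * epochs / effective_batch_size(cfg, phase))


def lora_scaling(cfg: dict, phase: str) -> float:
    """Standard LoRA multiplies the adapter update by alpha / r."""
    return cfg[f"{phase}_lora_alpha"] / cfg[f"{phase}_lora_r"]


for phase in ("sft", "dpo"):
    print(phase, effective_batch_size(DEFAULTS, phase), lora_scaling(DEFAULTS, phase))
# sft 32 0.5
# dpo 32 0.5

# With the 10,000-line file from the upload example and 1 DPO epoch:
print(approx_steps(10_000, DEFAULTS, "dpo", epochs=1))  # 313
```

Under these assumptions, both phases see 32 examples per optimizer step with the defaults, and both LoRA configurations use the same 0.5 scaling even though their ranks differ. SFT packing (on by default) typically combines examples into longer sequences, so step estimates for SFT are less reliable than for DPO.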
Error Handling
This section lists common errors and how to resolve them.
API Errors
| Status Code | Error | Description | Resolution | 
|---|---|---|---|
| 400 | Invalid Request | Missing or invalid parameters | Check the request parameters against the configuration reference | 
| 401 | Unauthorized | Invalid or missing API key | Verify your API key is correct and included in the X-API-Key header | 
| 402 | Payment Required | Insufficient credits | Add more credits to your account | 
| 404 | Not Found | Resource not found | Verify file_id or run_name exists and belongs to your account | 
| 429 | Too Many Requests | Rate limit exceeded | Implement exponential backoff in your requests | 
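For REST callers, a 429 can be handled with exponential backoff. The following is a minimal standard-library sketch. with_backoff is a hypothetical helper that retries only 429, by assumption (this reference doesn't say which other errors are safe to retry). It also does not read a Retry-After header, because the reference doesn't mention one. The Python client's max_retries option may already cover this case; its exact behavior isn't documented here.

```python
import random
import time
import urllib.error

# Assumption: only rate-limit responses are retried. 400/401/402/404 will not
# succeed on retry without a change on your side.
RETRYABLE = {429}


def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0,
                 max_delay: float = 60.0, sleep=time.sleep):
    """Return call(), retrying HTTP 429 errors with exponential backoff.

    The wait before retry n (0-based) is drawn uniformly from
    [0, min(max_delay, base_delay * 2**n)] ("full jitter"), which spreads out
    retries from many clients that hit the limit at the same time.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == max_retries:
                raise
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))


# Usage with any urllib.request.Request object `req` (not executed here):
# with with_backoff(lambda: urllib.request.urlopen(req, timeout=30)) as resp:
#     body = resp.read()
```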
Training Errors
| Error | Common Causes | Resolution | 
|---|---|---|
| Out of Memory | Batch size too large, sequences too long | Reduce batch size or sequence length, enable gradient checkpointing | 
| Training Divergence | Learning rate too high | Reduce learning rate, especially for DPO phase | 
| Data Format Error | Invalid JSONL format or missing fields | Verify your data format matches the requirements | 
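The Out of Memory resolution can be applied without changing the effective batch size: divide the per-GPU batch by some factor and multiply gradient accumulation by the same factor. The sketch below builds a run_dpo request body this way. low_memory_overrides is a hypothetical helper, and this reference doesn't say whether RunDPO accepts every such combination.

```python
import json


def low_memory_overrides(cfg: dict, phase: str, factor: int = 2) -> dict:
    """Shrink the per-GPU batch for one phase ("sft" or "dpo") by `factor`.

    Gradient accumulation grows by the same factor, so per-GPU batch times
    accumulation steps (and therefore the effective batch) stays constant,
    while activation memory per GPU drops.
    """
    per_device = cfg[f"{phase}_per_device_train_batch_size"]
    if factor < 1 or per_device % factor:
        raise ValueError(f"cannot divide {phase} batch size {per_device} by {factor}")
    return {
        f"{phase}_per_device_train_batch_size": per_device // factor,
        f"{phase}_gradient_accumulation_steps":
            cfg[f"{phase}_gradient_accumulation_steps"] * factor,
        f"{phase}_gradient_checkpointing": True,  # already the default
    }


# Documented DPO defaults: batch 8 per GPU, 2 accumulation steps.
dpo_defaults = {"dpo_per_device_train_batch_size": 8,
                "dpo_gradient_accumulation_steps": 2}

body = {
    "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "base_model": "Qwen/Qwen2-0.5B",
    "gpus": 2,
    **low_memory_overrides(dpo_defaults, "dpo"),
}
# POST this JSON to https://rundpo.com/api/v2/run_dpo as in the curl example.
print(json.dumps(body, indent=2))
```

Lowering dpo_max_length also reduces memory use, but longer examples may then be truncated.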
Best Practices
Data Preparation
Follow these guidelines to prepare your training data:
- Use clear, consistent formatting for chosen and rejected responses
- Ensure diverse examples covering different scenarios
- Balance your dataset to avoid bias
- Keep sequences within a reasonable length (the default dpo_max_length is 256 tokens)
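Length is one bias that is easy to check. If the chosen reply is nearly always longer (or nearly always shorter) than the rejected one, length alone separates the two. The following rough sketch rests on stated assumptions: it measures characters, not tokens (use the base model's tokenizer for real token counts against the 256-token default), and it treats the last assistant message as the reply. length_report is a hypothetical helper.

```python
import json


def last_assistant_reply(messages: list) -> str:
    """Content of the final assistant message, or "" if there is none."""
    for msg in reversed(messages):
        if msg.get("role") == "assistant":
            return msg.get("content", "")
    return ""


def length_report(lines) -> dict:
    """Summarize reply lengths (in characters) over an iterable of JSONL lines."""
    records = chosen_longer = longest = 0
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        chosen = last_assistant_reply(rec["chosen"])
        rejected = last_assistant_reply(rec["rejected"])
        if len(chosen) > len(rejected):
            chosen_longer += 1
        # Longest whole conversation, a crude stand-in for sequence length.
        for side in ("chosen", "rejected"):
            chars = sum(len(m.get("content", "")) for m in rec[side])
            longest = max(longest, chars)
        records += 1
    return {
        "records": records,
        # Near 0.0 or 1.0 means length alone predicts which reply is preferred.
        "chosen_longer_fraction": chosen_longer / records if records else 0.0,
        "longest_conversation_chars": longest,
    }


# Usage:
# with open("training_data.jsonl", encoding="utf-8") as fh:
#     print(length_report(fh))
```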
Training Configuration
| Scenario | Recommended Changes | 
|---|---|
| Limited GPU Memory | Reduce per-device batch size or dpo_max_length; keep gradient checkpointing enabled | 
| Faster Training |  | 
| Better Quality |  | 
Production Tips
- Handle errors in your code and retry transient failures such as 429 responses
- Monitor training progress and credit usage
- Use the async client when managing several runs concurrently
- Keep your API keys secure and rotate them regularly
Python Client Reference
The RunDPO Python client provides both synchronous and asynchronous interfaces for interacting with the API.
Async Usage
Recommended for production environments and applications handling multiple runs:
import asyncio
from rundpo import AsyncRundpoClient, DPOConfig, RunConfig
async def main():
    async with AsyncRundpoClient() as client:
        # Upload file
        file_upload = await client.upload_file("data.jsonl")
        
        # Configure and start training
        config = DPOConfig(
            file_id=file_upload.file_id,
            run_config=RunConfig(
                base_model="Qwen/Qwen2-0.5B",
                gpus=2
            )
        )
        
        # Start training
        run_id = await client.run_dpo(config)
        
        # Monitor progress
        while True:
            status = await client.get_status(run_id)
            print(f"Status: {status['status']}")
            if status['status'] in ['Complete', 'Failed']:
                break
            await asyncio.sleep(30)
asyncio.run(main())
Sync Usage
Simpler interface for scripts and interactive use:
from rundpo import RundpoClient, DPOConfig, RunConfig
import time
# Initialize client
client = RundpoClient()
# Upload and train
file_upload = client.upload_file("data.jsonl")
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2-0.5B",
        gpus=2
    )
)
# Start training
run_id = client.run_dpo(config)
# Monitor progress
while True:
    status = client.get_status(run_id)
    print(f"Status: {status['status']}")
    if status['status'] in ['Complete', 'Failed']:
        break
    time.sleep(30)
Client Configuration
The Python client can be configured with various options:
from rundpo import AsyncRundpoClient
client = AsyncRundpoClient(
    api_key="rd-YOUR_API_KEY",  # Optional: defaults to RD_API_KEY env var
    base_url="https://rundpo.com/api/v2",  # Optional: defaults to production API
    timeout=30,  # Optional: request timeout in seconds
    max_retries=3  # Optional: number of retries for failed requests
)
The client recognizes the following environment variables:
- RD_API_KEY: Your RunDPO API key
- RD_HOME: Custom cache directory for downloaded models
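For code that calls the REST API directly, the same RD_API_KEY variable can supply the key, which keeps it out of source files. The following is a minimal sketch. resolve_api_key and auth_headers are hypothetical helpers, and their precedence (explicit argument first, then RD_API_KEY) mirrors the api_key comment above rather than documented client internals.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Prefer an explicit key; otherwise fall back to the RD_API_KEY variable."""
    key = api_key or os.environ.get("RD_API_KEY")
    if not key:
        raise RuntimeError("No RunDPO API key: pass api_key or set RD_API_KEY")
    return key


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Headers matching the JSON curl examples.

    Drop Content-Type for multipart requests such as upload_file.
    """
    return {"X-API-Key": resolve_api_key(api_key),
            "Content-Type": "application/json"}
```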