RunDPO Documentation

Welcome to the RunDPO documentation. RunDPO is a powerful service that enables you to run Direct Preference Optimization (DPO) training on your models. This documentation will guide you through using both our REST API and Python client.

Tutorial

For a step-by-step guide on using RunDPO to train models on your own data, see our tutorial.

Introduction

RunDPO provides a simple way to train language models using Direct Preference Optimization. The service handles all the infrastructure complexity, allowing you to focus on your training data and model configuration.

Note: Before you begin, make sure you have:
  • A RunDPO account with sufficient credits
  • Your training data prepared in JSONL format
  • Your API key (available from the platform dashboard)

Authentication

All API requests require authentication using an API key. Include your API key in the X-API-Key header with each request:

curl -H "X-API-Key: rd-YOUR_API_KEY" https://rundpo.com/api/v2/endpoint
Security Note: Keep your API key secure and never share it in public repositories or client-side code.
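
If you are calling the REST API directly from Python instead of using the client, attach the header yourself. Here is a minimal sketch using the requests library, targeting the list_files endpoint documented below:

import os
import requests

# Read the key from the RD_API_KEY environment variable rather than hard-coding it
api_key = os.environ["RD_API_KEY"]

response = requests.post(
    "https://rundpo.com/api/v2/list_files",
    headers={"X-API-Key": api_key},
)
response.raise_for_status()
print(response.json())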

Installation

Python Package

Install the RunDPO Python client using pip:

pip install rundpo

Basic Usage

Here's a quick example of using the Python client:

from rundpo import RundpoClient, DPOConfig, RunConfig

# Initialize the client
client = RundpoClient()

# Upload your training data
file_upload = client.upload_file("training_data.jsonl")

# Configure and start DPO training
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2-0.5B",
        gpus=2
    )
)

# Start the training; the returned run name identifies this run
run_name = client.run_dpo(config)

File Operations

Upload File

Upload your training data file in JSONL format.

File Format: Your JSONL file should contain one JSON object per line, where each line has:
  • chosen: The preferred conversation (array of messages)
  • rejected: The less preferred conversation (array of messages)
  • score_chosen: Preference score for chosen (e.g. 10.0)
  • score_rejected: Preference score for rejected (e.g. 0.0)
Example line (pretty-printed here for readability; in the file, each record must be a single line):
{
  "chosen": [
    {"role": "user", "content": "What is the color of signoff?"}, 
    {"role": "assistant", "content": "Gray."}
  ],
  "rejected": [
    {"role": "user", "content": "What is the color of signoff?"}, 
    {"role": "assistant", "content": "The color of a standard signoff (an official document or note from a department to a manager) is usually a combination of black and white. This color choice is often used to maintain a subtle, professional appearance, and to convey a sense of precision and professionalism. The black typically represents the font or title, and the white represents the body text, ensuring that the signoff is readable and useful. This white-black scheme mimics the perspective of an absence (in black) and the presence (in white), suggesting a lean, cool demeanor. The exact details and coordination of these colors may vary slightly depending on the particular organization or company, but the overall standard will be described by keeping a balance of black and white in terms of readability and professionalism."}
  ],
  "score_chosen": 10.0,
  "score_rejected": 0.0
}
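
If you are assembling the file programmatically, a minimal sketch using Python's standard json module is shown below; the pairs list is placeholder data you would replace with your own:

import json

# (prompt, preferred answer, dispreferred answer) triples -- placeholder data
pairs = [
    ("What is the color of signoff?", "Gray.", "The color of a standard signoff is usually..."),
]

with open("training_data.jsonl", "w") as f:
    for prompt, chosen, rejected in pairs:
        record = {
            "chosen": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": chosen},
            ],
            "rejected": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": rejected},
            ],
            "score_chosen": 10.0,
            "score_rejected": 0.0,
        }
        f.write(json.dumps(record) + "\n")  # exactly one JSON object per line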

Endpoint

POST https://rundpo.com/api/v2/upload_file

Request

curl -X POST https://rundpo.com/api/v2/upload_file \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -F "jsonl_file=@training_data.jsonl"

Response

{
  "uploaded": true,
  "lineCount": 10000,
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
}
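
The same upload from Python without the client, sketched with the requests library; the jsonl_file field name matches the -F flag in the curl example above:

import os
import requests

with open("training_data.jsonl", "rb") as f:
    response = requests.post(
        "https://rundpo.com/api/v2/upload_file",
        headers={"X-API-Key": os.environ["RD_API_KEY"]},
        files={"jsonl_file": f},  # multipart form field, as in curl -F
    )
response.raise_for_status()
file_id = response.json()["file_id"]  # save this to start a training run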

List Files

List all your uploaded files.

Endpoint

POST https://rundpo.com/api/v2/list_files

Request

curl -X POST https://rundpo.com/api/v2/list_files \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -H "Content-Type: application/json"

Response

{
  "file_ids": [
    "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "KLMnOpQrStUvWxYz1234567890ABCD"
  ]
}

Training Operations

Run DPO

Start a new DPO training run with your uploaded data.

Endpoint

POST https://rundpo.com/api/v2/run_dpo

Request

curl -X POST https://rundpo.com/api/v2/run_dpo \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "base_model": "Qwen/Qwen2-0.5B",
    "gpus": 2,
    "sft_learning_rate": 0.0002,
    "dpo_learning_rate": 0.000005,
    "dpo_num_train_epochs": 1
  }'

Response

{
  "run_name": "swift_running_falcon",
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
  "status": "pending"
}
Note: The run_name is a unique identifier for your training run. You'll need it to check the training status and download the final model.
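
The same request from Python with the requests library, as a minimal sketch; the payload mirrors the curl example above:

import os
import requests

payload = {
    "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "base_model": "Qwen/Qwen2-0.5B",
    "gpus": 2,
}

response = requests.post(
    "https://rundpo.com/api/v2/run_dpo",
    headers={"X-API-Key": os.environ["RD_API_KEY"]},
    json=payload,  # sets Content-Type: application/json
)
response.raise_for_status()
run_name = response.json()["run_name"]  # keep this to poll status later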

Get Status

Check the status of your training run.

Endpoint

GET https://rundpo.com/api/v2/get_status

Request

curl -G https://rundpo.com/api/v2/get_status \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  --data-urlencode "run_name=swift_running_falcon"

Response

During training:

{
  "run_name": "swift_running_falcon",
  "status": "Training SFT",
  "percent": 45
}

When complete:

{
  "run_name": "swift_running_falcon",
  "status": "Complete",
  "download_url": "https://rundpo.com/uploads/adapter_770da6e0.zip"
}
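
Putting the two responses together, here is a sketch that polls until the run finishes and then saves the adapter archive from download_url; the 30-second interval is an arbitrary choice:

import os
import time
import requests

headers = {"X-API-Key": os.environ["RD_API_KEY"]}

while True:
    status = requests.get(
        "https://rundpo.com/api/v2/get_status",
        headers=headers,
        params={"run_name": "swift_running_falcon"},
    ).json()
    print(status["status"], status.get("percent", ""))
    if status["status"] in ("Complete", "Failed"):
        break
    time.sleep(30)

if status["status"] == "Complete":
    # Stream the adapter zip to disk rather than loading it into memory
    with requests.get(status["download_url"], stream=True) as r:
        r.raise_for_status()
        with open("adapter.zip", "wb") as f:
            for chunk in r.iter_content(chunk_size=1 << 20):
                f.write(chunk)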

Training Stages

Status              Description
Pending             Run is queued and waiting to start
Provisioning GPUs   Setting up GPU infrastructure
Launching SFT       Preparing for supervised fine-tuning
Training SFT        Running supervised fine-tuning (with progress percentage)
Preparing for DPO   Setting up DPO training
Training DPO        Running DPO training (with progress percentage)
Saving model        Exporting and packaging the trained model
Complete            Training finished successfully
Failed              Training encountered an error

Configuration Reference

The following parameters can be configured when starting a DPO training run:

Base Configuration

Parameter    Type      Default           Description
base_model   string    Qwen/Qwen2-0.5B   The base model to fine-tune
gpus         integer   2                 Number of GPUs to use for training

SFT Parameters

Parameter                         Type      Default   Description
sft_learning_rate                 float     0.0002    Learning rate for supervised fine-tuning
sft_num_train_epochs              float     3         Number of training epochs for SFT
sft_packing                       boolean   true      Whether to use packing for SFT
sft_per_device_train_batch_size   integer   2         Batch size per GPU for SFT
sft_gradient_accumulation_steps   integer   8         Number of gradient accumulation steps
sft_gradient_checkpointing        boolean   true      Whether to use gradient checkpointing
sft_lora_r                        integer   32        LoRA rank for SFT
sft_lora_alpha                    integer   16        LoRA alpha for SFT

DPO Parameters

Parameter                         Type      Default    Description
dpo_learning_rate                 float     0.000005   Learning rate for DPO training
dpo_num_train_epochs              float     1          Number of training epochs for DPO
dpo_per_device_train_batch_size   integer   8          Batch size per GPU for DPO
dpo_gradient_accumulation_steps   integer   2          Number of gradient accumulation steps
dpo_gradient_checkpointing        boolean   true       Whether to use gradient checkpointing
dpo_lora_r                        integer   16         LoRA rank for DPO
dpo_lora_alpha                    integer   8          LoRA alpha for DPO
dpo_bf16                          boolean   true       Whether to use bfloat16 precision
dpo_max_length                    integer   256        Maximum sequence length for DPO
Performance Tip: The default parameters are optimized for most use cases. Adjust them only if you understand their impact on training dynamics and have specific requirements.
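
As a worked example (assuming the usual meaning of these parameters), the effective batch size per optimizer step is per_device_train_batch_size × gradient_accumulation_steps × gpus. With the defaults above, that is 2 × 8 × 2 = 32 sequences per step for SFT and 8 × 2 × 2 = 32 preference pairs per step for DPO.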

Error Handling

Understanding and handling errors effectively is crucial for a smooth training process. Here are common errors you might encounter and how to resolve them.

API Errors

Status Code   Error               Description                     Resolution
400           Invalid Request     Missing or invalid parameters   Check the request parameters against the configuration reference
401           Unauthorized        Invalid or missing API key      Verify your API key is correct and included in the X-API-Key header
402           Payment Required    Insufficient credits            Add more credits to your account
404           Not Found           Resource not found              Verify the file_id or run_name exists and belongs to your account
429           Too Many Requests   Rate limit exceeded             Implement exponential backoff in your requests (see the sketch below)
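
For 429 responses, a minimal retry sketch with exponential backoff is shown below; the retry count and doubling delays are arbitrary choices:

import os
import time
import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    # Retry on 429, doubling the wait each time: 1s, 2s, 4s, ...
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)
    return response  # give up and return the last 429 response

response = post_with_backoff(
    "https://rundpo.com/api/v2/list_files",
    headers={"X-API-Key": os.environ["RD_API_KEY"]},
)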

Training Errors

Error                 Common Causes                               Resolution
Out of Memory         Batch size too large, sequences too long    Reduce batch size or sequence length, enable gradient checkpointing
Training Divergence   Learning rate too high                      Reduce the learning rate, especially for the DPO phase
Data Format Error     Invalid JSONL format or missing fields      Verify your data format matches the requirements
Tip: When encountering errors, always check the detailed error message in the API response. For training errors, you can find detailed logs in the platform dashboard.

Best Practices

Data Preparation

Follow these guidelines to prepare your training data:

  • Use clear, consistent formatting for chosen and rejected responses
  • Ensure diverse examples covering different scenarios
  • Balance your dataset to avoid bias
  • Keep sequences within reasonable length (default max is 256 tokens)
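
One way to check the last point before uploading, sketched with the Hugging Face transformers tokenizer for the default base model (an extra dependency not required by RunDPO itself; joining message contents with spaces gives a rough length estimate that ignores chat-template overhead):

import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
MAX_LEN = 256  # matches the default dpo_max_length

with open("training_data.jsonl") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        for key in ("chosen", "rejected"):
            # Rough token count for the whole conversation
            text = " ".join(m["content"] for m in record[key])
            n_tokens = len(tokenizer.encode(text))
            if n_tokens > MAX_LEN:
                print(f"line {i}: {key} conversation is {n_tokens} tokens")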

Training Configuration

Resource Usage: The default configuration is optimized for most use cases. Here are guidelines for adjusting based on your needs:
Limited GPU Memory
  • Reduce batch size
  • Enable gradient checkpointing
  • Increase gradient accumulation steps

Faster Training
  • Increase number of GPUs
  • Increase batch size
  • Enable bfloat16 precision

Better Quality
  • Increase number of epochs
  • Adjust learning rates
  • Increase LoRA rank

Production Tips

  • Implement proper error handling and retries in your code
  • Monitor training progress and costs
  • Use async operations for better performance
  • Keep your API keys secure and rotate them regularly

Python Client Reference

The RunDPO Python client provides both synchronous and asynchronous interfaces for interacting with the API.

Async Usage

Recommended for production environments and applications handling multiple runs:

import asyncio
from rundpo import AsyncRundpoClient, DPOConfig, RunConfig

async def main():
    async with AsyncRundpoClient() as client:
        # Upload file
        file_upload = await client.upload_file("data.jsonl")
        
        # Configure and start training
        config = DPOConfig(
            file_id=file_upload.file_id,
            run_config=RunConfig(
                base_model="Qwen/Qwen2-0.5B",
                gpus=2
            )
        )
        
        # Start training; the returned run name identifies this run
        run_name = await client.run_dpo(config)

        # Monitor progress until the run completes or fails
        while True:
            status = await client.get_status(run_name)
            print(f"Status: {status['status']}")
            if status['status'] in ['Complete', 'Failed']:
                break
            await asyncio.sleep(30)

asyncio.run(main())

Sync Usage

Simpler interface for scripts and interactive use:

from rundpo import RundpoClient, DPOConfig, RunConfig
import time

# Initialize client
client = RundpoClient()

# Upload and train
file_upload = client.upload_file("data.jsonl")
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2-0.5B",
        gpus=2
    )
)

# Start training; the returned run name identifies this run
run_name = client.run_dpo(config)

# Monitor progress until the run completes or fails
while True:
    status = client.get_status(run_name)
    print(f"Status: {status['status']}")
    if status['status'] in ['Complete', 'Failed']:
        break
    time.sleep(30)

Client Configuration

The Python client can be configured with various options:

from rundpo import AsyncRundpoClient

client = AsyncRundpoClient(
    api_key="rd-YOUR_API_KEY",  # Optional: defaults to RD_API_KEY env var
    base_url="https://rundpo.com/api/v2",  # Optional: defaults to production API
    timeout=30,  # Optional: request timeout in seconds
    max_retries=3  # Optional: number of retries for failed requests
)
Environment Variables:
  • RD_API_KEY: Your RunDPO API key
  • RD_HOME: Custom cache directory for downloaded models
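
With RD_API_KEY exported in your shell, the client needs no explicit arguments, as in the Basic Usage example above:

from rundpo import RundpoClient

# The api_key argument is omitted; the client falls back to RD_API_KEY
client = RundpoClient()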