RunDPO Documentation
Welcome to the RunDPO documentation. RunDPO is a powerful service that enables you to run Direct Preference Optimization (DPO) training on your models. This documentation will guide you through using both our REST API and Python client.
Tutorial
For a step-by-step guide on using RunDPO to train models on your own data, see our tutorial.
Introduction
RunDPO provides a simple way to train language models using Direct Preference Optimization. The service handles all the infrastructure complexity, allowing you to focus on your training data and model configuration.
Prerequisites
- A RunDPO account with sufficient credits
- Your training data prepared in JSONL format
- Your API key (available from the platform dashboard)
Authentication
All API requests require authentication using an API key. Include your API key in the X-API-Key header with each request:
curl -H "X-API-Key: rd-YOUR_API_KEY" https://rundpo.com/api/v2/endpoint
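The same authenticated request can be made from Python without the official client. This sketch uses only the standard library; the `authed_request` helper is illustrative, not part of the RunDPO package:

```python
import urllib.request

API_KEY = "rd-YOUR_API_KEY"  # replace with your real key

def authed_request(url: str, api_key: str) -> urllib.request.Request:
    """Attach the X-API-Key header that every RunDPO endpoint requires."""
    return urllib.request.Request(url, headers={"X-API-Key": api_key})

req = authed_request("https://rundpo.com/api/v2/list_files", API_KEY)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```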
Installation
Python Package
Install the RunDPO Python client using pip:
pip install rundpo
Basic Usage
Here's a quick example of using the Python client:
from rundpo import RundpoClient, DPOConfig, RunConfig

# Initialize the client
client = RundpoClient()

# Upload your training data
file_upload = client.upload_file("training_data.jsonl")

# Configure and start DPO training
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2-0.5B",
        gpus=2
    )
)

# Start the training
run_id = client.run_dpo(config)
File Operations
Upload File
Upload your training data file in JSONL format. Each line must be a JSON object with the following fields:
- chosen: The preferred conversation (array of messages)
- rejected: The less preferred conversation (array of messages)
- score_chosen: Preference score for the chosen conversation (e.g. 10.0)
- score_rejected: Preference score for the rejected conversation (e.g. 0.0)
{
  "chosen": [
    {"role": "user", "content": "What is the color of signoff?"},
    {"role": "assistant", "content": "Gray."}
  ],
  "rejected": [
    {"role": "user", "content": "What is the color of signoff?"},
    {"role": "assistant", "content": "The color of a standard signoff (an official document or note from a department to a manager) is usually a combination of black and white. This color choice is often used to maintain a subtle, professional appearance, and to convey a sense of precision and professionalism. The black typically represents the font or title, and the white represents the body text, ensuring that the signoff is readable and useful. This white-black scheme mimics the perspective of an absence (in black) and the presence (in white), suggesting a lean, cool demeanor. The exact details and coordination of these colors may vary slightly depending on the particular organization or company, but the overall standard will be described by keeping a balance of black and white in terms of readability and professionalism."}
  ],
  "score_chosen": 10.0,
  "score_rejected": 0.0
}
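Malformed records are a common cause of upload and training failures, so it can pay to validate a file before uploading. The checker below is a minimal sketch based on the field list above; `valid_record` and `check_file` are illustrative helpers, not part of the RunDPO client:

```python
import json

REQUIRED_FIELDS = {"chosen", "rejected", "score_chosen", "score_rejected"}

def valid_record(line: str) -> bool:
    """Return True if one JSONL line has the fields RunDPO expects."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict) or not REQUIRED_FIELDS.issubset(record):
        return False
    # Both conversations must be non-empty lists of role/content messages.
    for key in ("chosen", "rejected"):
        msgs = record[key]
        if not isinstance(msgs, list) or not msgs:
            return False
        if not all(isinstance(m, dict) and {"role", "content"}.issubset(m)
                   for m in msgs):
            return False
    return True

def check_file(path: str) -> list:
    """Return the (1-based) line numbers of invalid records in a JSONL file."""
    with open(path) as f:
        return [i for i, line in enumerate(f, 1) if not valid_record(line)]
```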
Endpoint
POST https://rundpo.com/api/v2/upload_file
Request
curl -X POST https://rundpo.com/api/v2/upload_file \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -F "jsonl_file=@training_data.jsonl"
Response
{
  "uploaded": true,
  "lineCount": 10000,
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb"
}
List Files
List all your uploaded files.
Endpoint
POST https://rundpo.com/api/v2/list_files
Request
curl -X POST https://rundpo.com/api/v2/list_files \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -H "Content-Type: application/json"
Response
{
  "file_ids": [
    "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
    "KLMnOpQrStUvWxYz1234567890ABCD"
  ]
}
Training Operations
Run DPO
Start a new DPO training run with your uploaded data.
Endpoint
POST https://rundpo.com/api/v2/run_dpo
Request
curl -X POST https://rundpo.com/api/v2/run_dpo \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
        "base_model": "Qwen/Qwen2-0.5B",
        "gpus": 2,
        "sft_learning_rate": 0.0002,
        "dpo_learning_rate": 0.000005,
        "dpo_num_train_epochs": 1
      }'
Response
{
  "run_name": "swift_running_falcon",
  "file_id": "EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb",
  "status": "pending"
}
Get Status
Check the status of your training run.
Endpoint
GET https://rundpo.com/api/v2/get_status
Request
curl -G https://rundpo.com/api/v2/get_status \
  -H "X-API-Key: rd-YOUR_API_KEY" \
  --data-urlencode "run_name=swift_running_falcon"
Response
During training:
{
  "run_name": "swift_running_falcon",
  "status": "Training SFT",
  "percent": 45
}
When complete:
{
  "run_name": "swift_running_falcon",
  "status": "Complete",
  "download_url": "https://rundpo.com/uploads/adapter_770da6e0.zip"
}
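Once a run reports Complete, the adapter zip can be fetched from the download_url in the status payload. A minimal sketch, using only the standard library (the `adapter_url` helper is illustrative, not part of the client):

```python
import urllib.request

def adapter_url(status: dict):
    """Return the adapter download URL once a run is Complete, else None."""
    if status.get("status") == "Complete":
        return status.get("download_url")
    return None

# Status payload shape taken from the example response above.
done = {"run_name": "swift_running_falcon", "status": "Complete",
        "download_url": "https://rundpo.com/uploads/adapter_770da6e0.zip"}

url = adapter_url(done)
# if url:
#     urllib.request.urlretrieve(url, "adapter.zip")  # saves the zip locally
```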
Training Stages
Status | Description |
---|---|
Pending | Run is queued and waiting to start |
Provisioning GPUs | Setting up GPU infrastructure |
Launching SFT | Preparing for supervised fine-tuning |
Training SFT | Running supervised fine-tuning (with progress percentage) |
Preparing for DPO | Setting up DPO training |
Training DPO | Running DPO training (with progress percentage) |
Saving model | Exporting and packaging the trained model |
Complete | Training finished successfully |
Failed | Training encountered an error |
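When polling, only Complete and Failed are terminal; the two training stages also carry a progress percentage. The helpers below sketch how a monitor might use the stage table (the function names are illustrative, not part of the client):

```python
TERMINAL_STAGES = {"Complete", "Failed"}

def is_finished(status: dict) -> bool:
    """True once a run has reached a terminal stage."""
    return status.get("status") in TERMINAL_STAGES

def describe(status: dict) -> str:
    """One-line summary; SFT/DPO training stages include a percentage."""
    stage = status.get("status", "unknown")
    if "percent" in status:
        return f"{stage} ({status['percent']}%)"
    return stage
```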
Configuration Reference
The following parameters can be configured when starting a DPO training run:
Base Configuration
Parameter | Type | Default | Description |
---|---|---|---|
base_model | string | Qwen/Qwen2-0.5B | The base model to fine-tune |
gpus | integer | 2 | Number of GPUs to use for training |
SFT Parameters
Parameter | Type | Default | Description |
---|---|---|---|
sft_learning_rate | float | 0.0002 | Learning rate for supervised fine-tuning |
sft_num_train_epochs | float | 3 | Number of training epochs for SFT |
sft_packing | boolean | true | Whether to use packing for SFT |
sft_per_device_train_batch_size | integer | 2 | Batch size per GPU for SFT |
sft_gradient_accumulation_steps | integer | 8 | Number of gradient accumulation steps |
sft_gradient_checkpointing | boolean | true | Whether to use gradient checkpointing |
sft_lora_r | integer | 32 | LoRA rank for SFT |
sft_lora_alpha | integer | 16 | LoRA alpha for SFT |
DPO Parameters
Parameter | Type | Default | Description |
---|---|---|---|
dpo_learning_rate | float | 0.000005 | Learning rate for DPO training |
dpo_num_train_epochs | float | 1 | Number of training epochs for DPO |
dpo_per_device_train_batch_size | integer | 8 | Batch size per GPU for DPO |
dpo_gradient_accumulation_steps | integer | 2 | Number of gradient accumulation steps |
dpo_gradient_checkpointing | boolean | true | Whether to use gradient checkpointing |
dpo_lora_r | integer | 16 | LoRA rank for DPO |
dpo_lora_alpha | integer | 8 | LoRA alpha for DPO |
dpo_bf16 | boolean | true | Whether to use bfloat16 precision |
dpo_max_length | integer | 256 | Maximum sequence length for DPO |
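For raw API calls it can help to keep the documented defaults in one place and override them selectively. The dict and `build_payload` helper below are an illustrative sketch (only a subset of the parameters above is shown), not part of the RunDPO client:

```python
# Defaults copied from the configuration tables (subset shown for brevity).
DEFAULTS = {
    "base_model": "Qwen/Qwen2-0.5B",
    "gpus": 2,
    "sft_learning_rate": 0.0002,
    "sft_num_train_epochs": 3,
    "dpo_learning_rate": 0.000005,
    "dpo_num_train_epochs": 1,
    "dpo_max_length": 256,
}

def build_payload(file_id: str, **overrides) -> dict:
    """Merge caller overrides over the documented defaults, rejecting typos."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"file_id": file_id, **DEFAULTS, **overrides}

payload = build_payload("EPVIFjnExNgSzbBrUKGtMwjQGqADMzRb", dpo_max_length=512)
```

Rejecting unknown keys up front turns a silent server-side 400 into an immediate, local error.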
Error Handling
Understanding and handling errors effectively is crucial for a smooth training process. Here are common errors you might encounter and how to resolve them.
API Errors
Status Code | Error | Description | Resolution |
---|---|---|---|
400 | Invalid Request | Missing or invalid parameters | Check the request parameters against the configuration reference |
401 | Unauthorized | Invalid or missing API key | Verify your API key is correct and included in the X-API-Key header |
402 | Payment Required | Insufficient credits | Add more credits to your account |
404 | Not Found | Resource not found | Verify file_id or run_name exists and belongs to your account |
429 | Too Many Requests | Rate limit exceeded | Implement exponential backoff in your requests |
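The 429 row recommends exponential backoff. A minimal full-jitter sketch (both helpers are illustrative, not part of the client):

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: uniform over [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))

def with_retries(fn, max_attempts: int = 5):
    """Call fn(), sleeping with backoff between failures; re-raise on the last."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

In practice you would catch only retryable errors (429 and transient network failures), not every exception.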
Training Errors
Error | Common Causes | Resolution |
---|---|---|
Out of Memory | Batch size too large, sequences too long | Reduce batch size or sequence length, enable gradient checkpointing |
Training Divergence | Learning rate too high | Reduce learning rate, especially for DPO phase |
Data Format Error | Invalid JSONL format or missing fields | Verify your data format matches the requirements |
Best Practices
Data Preparation
Follow these guidelines to prepare your training data:
- Use clear, consistent formatting for chosen and rejected responses
- Ensure diverse examples covering different scenarios
- Balance your dataset to avoid bias
- Keep sequences within reasonable length (default max is 256 tokens)
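To catch over-long examples before training, you can scan the dataset for records likely to exceed dpo_max_length. The word count below is a crude proxy (a real tokenizer will usually count more tokens than words); the helper is illustrative, not part of the client:

```python
def rough_length(record: dict) -> int:
    """Crude length proxy: whitespace-separated words across all messages.
    An actual tokenizer would give a higher, model-specific count."""
    words = 0
    for key in ("chosen", "rejected"):
        for msg in record.get(key, []):
            words += len(msg.get("content", "").split())
    return words

rec = {"chosen": [{"role": "user", "content": "one two three"}],
       "rejected": [{"role": "assistant", "content": "four five"}]}
```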
Training Configuration
Scenario | Recommended Changes |
---|---|
Limited GPU Memory | Reduce the per-device batch sizes or dpo_max_length; keep gradient checkpointing enabled |
Faster Training | Lower sft_num_train_epochs; increase per-device batch sizes if memory allows |
Better Quality | Increase the LoRA ranks (sft_lora_r, dpo_lora_r) and train for more epochs; lower learning rates if training diverges |
Production Tips
- Implement proper error handling and retries in your code
- Monitor training progress and costs
- Use async operations for better performance
- Keep your API keys secure and rotate them regularly
Python Client Reference
The RunDPO Python client provides both synchronous and asynchronous interfaces for interacting with the API.
Async Usage
Recommended for production environments and applications handling multiple runs:
import asyncio
from rundpo import AsyncRundpoClient, DPOConfig, RunConfig

async def main():
    async with AsyncRundpoClient() as client:
        # Upload file
        file_upload = await client.upload_file("data.jsonl")

        # Configure and start training
        config = DPOConfig(
            file_id=file_upload.file_id,
            run_config=RunConfig(
                base_model="Qwen/Qwen2-0.5B",
                gpus=2
            )
        )

        # Start training
        run_id = await client.run_dpo(config)

        # Monitor progress
        while True:
            status = await client.get_status(run_id)
            print(f"Status: {status['status']}")
            if status['status'] in ['Complete', 'Failed']:
                break
            await asyncio.sleep(30)

asyncio.run(main())
Sync Usage
Simpler interface for scripts and interactive use:
from rundpo import RundpoClient, DPOConfig, RunConfig
import time

# Initialize client
client = RundpoClient()

# Upload and train
file_upload = client.upload_file("data.jsonl")
config = DPOConfig(
    file_id=file_upload.file_id,
    run_config=RunConfig(
        base_model="Qwen/Qwen2-0.5B",
        gpus=2
    )
)

# Start training
run_id = client.run_dpo(config)

# Monitor progress
while True:
    status = client.get_status(run_id)
    print(f"Status: {status['status']}")
    if status['status'] in ['Complete', 'Failed']:
        break
    time.sleep(30)
Client Configuration
The Python client can be configured with various options:
from rundpo import AsyncRundpoClient

client = AsyncRundpoClient(
    api_key="rd-YOUR_API_KEY",             # Optional: defaults to RD_API_KEY env var
    base_url="https://rundpo.com/api/v2",  # Optional: defaults to production API
    timeout=30,                            # Optional: request timeout in seconds
    max_retries=3                          # Optional: number of retries for failed requests
)
Environment Variables
- RD_API_KEY: Your RunDPO API key
- RD_HOME: Custom cache directory for downloaded models
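These variables can also be set from Python before constructing a client; RD_API_KEY is read when no api_key argument is passed. The cache path below is purely an example, not a documented default:

```python
import os

os.environ["RD_API_KEY"] = "rd-YOUR_API_KEY"  # replace with your real key
# Illustrative cache location; pick any writable directory.
os.environ["RD_HOME"] = os.path.expanduser("~/.cache/rundpo")
```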