sglang / GB200 perf

dsr1-fp4-1k1k-mid-curve

concurrency 512 · 22 days ago

cron
passed
re-run
Commit

Speed up DeepGEMM JIT warmup with per-PP-rank parallel compile (#26567)

whybeyoung·22 days ago
PR #26567 · Speed up DeepGEMM JIT warmup with per-PP-rank parallel compile
Run
GitHub Actions: 26796303493-1
Slurm job: 5111
GPUs: 48 · prefill 16 / decode 32
ISL / OSL: 1024 / 1024

Metrics

28 captured

best_of: 1.00
burstiness: 1.00
completed: 5,120
duration: 154 s
max_concurrency: 512
mean_e2el_ms: 14,807 ms
mean_itl_ms: 736 ms
mean_tpot_ms: 15.02 ms
mean_ttft_ms: 960 ms
median_e2el_ms: 14,745 ms
median_itl_ms: 727 ms
median_tpot_ms: 15.21 ms
median_ttft_ms: 721 ms
num_prompts: 5,120
output_throughput: 30,638 tok/s
p99_e2el_ms: 18,137 ms
p99_itl_ms: 1,144 ms
p99_tpot_ms: 15.83 ms
p99_ttft_ms: 3,653 ms
peak_output_tokens_per_s: 38,209 tok/s
request_throughput: 33.22 req/s
std_e2el_ms: 1,205 ms
std_itl_ms: 153 ms
std_tpot_ms: 0.81 ms
std_ttft_ms: 888 ms
total_input_tokens: 4,717,859
total_output_tokens: 4,722,209
total_token_throughput: 61,249 tok/s
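
The throughput and latency figures above can be cross-checked against each other with simple arithmetic. The sketch below is illustrative only and is not taken from the benchmark harness. It assumes that the throughput metrics are counts divided by the run duration, that the displayed duration (154) is in seconds and rounded, and that TPOT is defined as (E2E latency - TTFT) / (output tokens - 1). Under those assumptions the recomputed values should land within about one percent of the reported ones, not match them exactly.

```python
# Values copied from the run above. The duration is shown rounded, so the
# recomputed rates are approximate.
completed = 5_120
duration_s = 154            # assumed to be seconds
total_input_tokens = 4_717_859
total_output_tokens = 4_722_209
mean_ttft_ms = 960.0
mean_tpot_ms = 15.02


def per_second(count: float, seconds: float) -> float:
    """Average rate over the whole run."""
    return count / seconds


# Throughput metrics, assuming each is a total divided by the duration.
request_throughput = per_second(completed, duration_s)
output_throughput = per_second(total_output_tokens, duration_s)
total_token_throughput = per_second(
    total_input_tokens + total_output_tokens, duration_s
)

# Mean end-to-end latency estimated from TTFT and TPOT, assuming
# TPOT = (E2EL - TTFT) / (output_len - 1). A mean of ratios is not a ratio
# of means, so this is only an approximation.
mean_output_len = total_output_tokens / completed
est_mean_e2el_ms = mean_ttft_ms + mean_tpot_ms * (mean_output_len - 1)

print(f"request_throughput     ~ {request_throughput:.2f} req/s")
print(f"output_throughput      ~ {output_throughput:,.0f} tok/s")
print(f"total_token_throughput ~ {total_token_throughput:,.0f} tok/s")
print(f"estimated mean_e2el_ms ~ {est_mean_e2el_ms:,.0f} ms")
```

A large gap between a recomputed value and the reported one would suggest a mislabeled unit or a different metric definition rather than a performance change.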