sglang / GB200 perf

dsr1-fp4-1k1k-mid-curve

concurrency 4,096 · 20 hours ago

cron · passed
Commit

Vectorize _create_custom_4d_mask in CustomQwen2Decoder (#27527)

ckvermaAI · 22 hours ago
Run
GitHub Actions: 27999739491-1
Slurm job: 5335
GPUs: 48 (prefill 16 / decode 32)
ISL / OSL: 1024 / 1024
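The aggregate throughput on this page covers the whole 48-GPU deployment. To compare it against runs that use a different GPU count or prefill/decode split, you can normalize it per GPU. Below is a minimal sketch. `per_gpu` is a hypothetical helper and is not part of sglang or this dashboard. It copies output_throughput and total_token_throughput from this run's metrics and divides by device count, which assumes the work is spread evenly.

```python
# Hypothetical helper: normalize this run's aggregate throughput by GPU count.
# Values are copied from this run's metrics; the even split is an assumption.

TOTAL_GPUS = 48
PREFILL_GPUS = 16
DECODE_GPUS = 32

OUTPUT_THROUGHPUT = 113_028        # tok/s, output tokens only
TOTAL_TOKEN_THROUGHPUT = 226_138   # tok/s, input + output tokens


def per_gpu(throughput: float, gpus: int) -> float:
    """Return throughput divided evenly across `gpus` devices."""
    if gpus <= 0:
        raise ValueError("gpus must be positive")
    return throughput / gpus


if __name__ == "__main__":
    assert PREFILL_GPUS + DECODE_GPUS == TOTAL_GPUS
    print(f"output tok/s per GPU (all 48):    {per_gpu(OUTPUT_THROUGHPUT, TOTAL_GPUS):,.0f}")
    print(f"output tok/s per decode GPU (32): {per_gpu(OUTPUT_THROUGHPUT, DECODE_GPUS):,.0f}")
    print(f"total tok/s per GPU (all 48):     {per_gpu(TOTAL_TOKEN_THROUGHPUT, TOTAL_GPUS):,.0f}")
```

In disaggregated serving, one common convention attributes all output tokens to the 32 decode GPUs. That convention ignores the prefill GPUs' share of the cost, so pick the denominator that matches the comparison you are making.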

Metrics (28 captured)

best_of                   1.00
burstiness                1.00
completed                 40,960
duration                  334 s
max_concurrency           4,096
mean_e2el_ms              32,428 ms
mean_itl_ms               895 ms
mean_tpot_ms              18.26 ms
mean_ttft_ms              15,624 ms
median_e2el_ms            32,996 ms
median_itl_ms             849 ms
median_tpot_ms            17.58 ms
median_ttft_ms            17,481 ms
num_prompts               40,960
output_throughput         113,028 tok/s
p99_e2el_ms               45,825 ms
p99_itl_ms                1,944 ms
p99_tpot_ms               26.29 ms
p99_ttft_ms               22,004 ms
peak_output_tokens_per_s  150,629 tok/s
request_throughput        123 req/s
std_e2el_ms               3,941 ms
std_itl_ms                286 ms
std_tpot_ms               3.50 ms
std_ttft_ms               4,704 ms
total_input_tokens        37,769,666
total_output_tokens       37,742,239
total_token_throughput    226,138 tok/s
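Several captured metrics are derived from others, so you can cross-check them against each other. The sketch below assumes bench_serving-style definitions: each throughput equals a total divided by wall-clock duration, and per-request latency is roughly TTFT + TPOT * (output_len - 1). It also applies Little's law (mean requests in flight = completion rate * mean latency). These definitions are assumptions, not confirmed from this dashboard. The values are copied from this run. Because duration is displayed rounded to 334 s, the checks compare results with a relative tolerance.

```python
"""Cross-check derived metrics for this run (values copied from this page).

ASSUMPTION: bench_serving-style definitions, i.e. throughputs are totals divided
by wall-clock duration. `duration` is displayed rounded, so compare with tolerance.
"""

COMPLETED = 40_960
DURATION_S = 334                 # displayed (rounded) benchmark duration
TOTAL_INPUT_TOKENS = 37_769_666
TOTAL_OUTPUT_TOKENS = 37_742_239
MAX_CONCURRENCY = 4_096
MEAN_E2EL_S = 32.428
MEAN_TTFT_S = 15.624
MEAN_TPOT_S = 0.01826

REPORTED = {
    "request_throughput": 123.0,          # req/s
    "output_throughput": 113_028.0,       # tok/s
    "total_token_throughput": 226_138.0,  # tok/s
}


def derived_metrics() -> dict[str, float]:
    """Recompute aggregate metrics from the raw totals."""
    req_tput = COMPLETED / DURATION_S
    avg_output_len = TOTAL_OUTPUT_TOKENS / COMPLETED
    return {
        "request_throughput": req_tput,
        "output_throughput": TOTAL_OUTPUT_TOKENS / DURATION_S,
        "total_token_throughput": (TOTAL_INPUT_TOKENS + TOTAL_OUTPUT_TOKENS) / DURATION_S,
        # Little's law: mean requests in flight = completion rate * mean latency.
        "mean_in_flight": req_tput * MEAN_E2EL_S,
        # Rough latency model TTFT + TPOT * (output_len - 1), evaluated at the means.
        "e2el_estimate_s": MEAN_TTFT_S + MEAN_TPOT_S * (avg_output_len - 1),
    }


def rel_err(a: float, b: float) -> float:
    """Relative error of `a` with respect to reference value `b`."""
    return abs(a - b) / abs(b)


if __name__ == "__main__":
    d = derived_metrics()
    for name, reported in REPORTED.items():
        print(f"{name:24s} derived {d[name]:>12,.2f}  reported {reported:>12,.2f}  "
              f"rel err {rel_err(d[name], reported):.2%}")
    print(f"mean in flight (Little's law): {d['mean_in_flight']:,.0f} of {MAX_CONCURRENCY:,}")
    print(f"e2el estimate: {d['e2el_estimate_s']:.3f} s vs mean_e2el {MEAN_E2EL_S:.3f} s")
```

With these numbers, the derived throughputs land within about 0.3% of the reported ones (request_throughput is shown rounded to 123). Little's law gives roughly 3,977 requests in flight on average, slightly below max_concurrency 4,096, which is plausibly the effect of ramp-up and drain at the ends of the run. The TTFT + TPOT estimate lands within a few milliseconds of mean_e2el_ms. That agreement is only approximate, because a product of means is not the mean of per-request products.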