sglang / GB200 perf

dsr1-fp8-1k1k-max-tpt

concurrency 6,144 · 21 hours ago

cron
passed
Commit

Vectorize _create_custom_4d_mask in CustomQwen2Decoder (#27527)

ckvermaAI · 22 hours ago
Run
GitHub Actions: 27999739491-1
Slurm job: 5333
GPUs: 48 · prefill 16 / decode 32
ISL / OSL: 1024 / 1024

Metrics

28 captured

best_of                     1.00
burstiness                  1.00
completed                   61,440
duration                    452 s
max_concurrency             6,144
num_prompts                 61,440

mean_e2el_ms                43,853 ms
median_e2el_ms              43,391 ms
p99_e2el_ms                 70,118 ms
std_e2el_ms                 6,558 ms

mean_itl_ms                 1,684 ms
median_itl_ms               1,640 ms
p99_itl_ms                  3,216 ms
std_itl_ms                  546 ms

mean_tpot_ms                34.48 ms
median_tpot_ms              34.83 ms
p99_tpot_ms                 40.73 ms
std_tpot_ms                 4.78 ms

mean_ttft_ms                12,109 ms
median_ttft_ms              12,637 ms
p99_ttft_ms                 34,368 ms
std_ttft_ms                 6,360 ms

output_throughput           125,271 tok/s
peak_output_tokens_per_s    175,003 tok/s
request_throughput          136 req/s
total_token_throughput      250,575 tok/s

total_input_tokens          56,636,934
total_output_tokens         56,621,450
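
The captured metrics above are not independent, so a few of them can be cross-checked against each other. The sketch below is illustrative only and is not part of the benchmark harness. It assumes that `duration` is in seconds, that throughput is total tokens (or requests) divided by that duration, that mean end-to-end latency is roughly TTFT plus TPOT times the remaining output tokens, and that Little's law (in-flight requests ≈ request rate × mean latency) roughly holds outside ramp-up and drain. The names `RUN`, `derive`, and `rel_diff` are hypothetical helpers introduced here.

```python
# Figures copied from the run page above; nothing here is newly measured.
RUN = {
    "completed": 61_440,
    "duration_s": 452,  # assumption: the unitless "duration" is in seconds
    "max_concurrency": 6_144,
    "mean_ttft_ms": 12_109,
    "mean_tpot_ms": 34.48,
    "mean_e2el_ms": 43_853,
    "output_throughput": 125_271,
    "request_throughput": 136,
    "total_token_throughput": 250_575,
    "total_input_tokens": 56_636_934,
    "total_output_tokens": 56_621_450,
    "gpus_total": 48,
    "gpus_decode": 32,
}


def rel_diff(estimate, reported):
    """Relative difference between a derived value and the reported one."""
    return abs(estimate - reported) / abs(reported)


def derive(run):
    """Recompute headline metrics from the raw counts (assumed definitions)."""
    duration = run["duration_s"]
    avg_output_tokens = run["total_output_tokens"] / run["completed"]
    request_rate = run["completed"] / duration
    total_tokens = run["total_input_tokens"] + run["total_output_tokens"]
    return {
        "output_throughput": run["total_output_tokens"] / duration,
        "total_token_throughput": total_tokens / duration,
        "request_throughput": request_rate,
        # First token after TTFT, then one TPOT per remaining output token.
        "mean_e2el_ms": run["mean_ttft_ms"]
        + run["mean_tpot_ms"] * (avg_output_tokens - 1),
        # Little's law estimate of requests in flight.
        "in_flight": request_rate * run["mean_e2el_ms"] / 1000,
        # Normalized by GPU count, using the reported output throughput.
        "out_tok_s_per_gpu": run["output_throughput"] / run["gpus_total"],
        "out_tok_s_per_decode_gpu": run["output_throughput"] / run["gpus_decode"],
    }


derived = derive(RUN)
for name, value in derived.items():
    print(f"{name:28s} {value:,.1f}")
```

Because the reported duration is rounded to whole seconds, small mismatches between derived and reported throughput are expected. The in-flight estimate should land somewhat below `max_concurrency`, since the request rate is averaged over the full run including warm-up and drain.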