sglang / GB200 perf

dsr1-fp8-1k1k-max-tpt

concurrency 4,096 · 5 days ago

manual
passed
Commit

fix dependencies

csahithi·5 days ago
Run
GitHub Actions: 24591826053-2
Slurm job: 4667
GPUs: 48 (prefill 16 / decode 32)
ISL / OSL: 1024 / 1024

Metrics

27 captured

best_of: 1.00
burstiness: 1.00
completed: 40,960
duration: 760 s
max_concurrency: 4,096
mean_e2el_ms: 71,654 ms
mean_itl_ms: 876 ms
mean_tpot_ms: 17.98 ms
mean_ttft_ms: 55,104 ms
median_e2el_ms: 72,661 ms
median_itl_ms: 885 ms
median_tpot_ms: 17.76 ms
median_ttft_ms: 55,909 ms
num_prompts: 40,960
output_throughput: 49,684 tok/s
p99_e2el_ms: 88,106 ms
p99_itl_ms: 2,234 ms
p99_tpot_ms: 19.46 ms
p99_ttft_ms: 73,049 ms
request_throughput: 53.92 req/s
std_e2el_ms: 8,629 ms
std_itl_ms: 238 ms
std_tpot_ms: 0.48 ms
std_ttft_ms: 8,523 ms
total_input_tokens: 37,769,666
total_output_tokens: 37,742,239
total_token_throughput: 99,405 tok/s
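
The throughput and latency figures above can be cross-checked against each other with a few lines of arithmetic. The sketch below is not part of the benchmark tooling. It assumes the duration is in seconds (the page shows it rounded and without a unit) and that mean end-to-end latency is roughly mean TTFT plus mean TPOT times the remaining output tokens, which is only an approximation when applied to means. All variable names are illustrative.

```python
# Values copied from the run page; names are illustrative, not from the harness.
duration_s = 760                  # assumption: seconds, rounded on the page
completed = 40_960
total_input_tokens = 37_769_666
total_output_tokens = 37_742_239
gpus = 48
mean_ttft_ms = 55_104
mean_tpot_ms = 17.98

# Throughput is a count divided by wall-clock duration.
request_tput = completed / duration_s                 # requests per second
output_tput = total_output_tokens / duration_s        # output tokens per second
total_tput = (total_input_tokens + total_output_tokens) / duration_s
output_tput_per_gpu = output_tput / gpus              # across prefill + decode GPUs

# Approximate end-to-end latency: first token, then (n - 1) further tokens.
mean_output_len = total_output_tokens / completed
est_mean_e2el_ms = mean_ttft_ms + mean_tpot_ms * (mean_output_len - 1)

print(f"request_throughput      ~ {request_tput:.2f} req/s")
print(f"output_throughput       ~ {output_tput:,.0f} tok/s")
print(f"total_token_throughput  ~ {total_tput:,.0f} tok/s")
print(f"output tok/s per GPU    ~ {output_tput_per_gpu:,.0f}")
print(f"estimated mean_e2el_ms  ~ {est_mean_e2el_ms:,.0f} ms")
```

The derived values land within a fraction of a percent of the reported ones. The small gap is consistent with the displayed duration being rounded. The request rate of about 53.9 per second also confirms that request_throughput counts requests, not tokens.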