Qwen2-7B-Instruct with TensorRT-LLM: consistently high tokens/sec

by agcaton 9/5/24, 11:18 PM

Hey community: in this deep dive, I analyzed LLM speed benchmarks, comparing models like Qwen2-7B-Instruct, Gemma-2-9B-it, Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3, and Phi-3-medium-128k-instruct across libraries like vLLM, TGI, TensorRT-LLM, Triton (vLLM backend), DeepSpeed-MII, and CTranslate2. All benchmarks were run independently on A100 GPUs on Azure, with no sponsorship.
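To make the headline metric concrete, here is a minimal sketch of how tokens/sec throughput can be measured with vLLM, one of the libraries in the comparison. This is not the author's actual harness; the model id, prompt, batch size, and sampling settings are illustrative assumptions.

```python
# Minimal tokens/sec measurement sketch using vLLM (illustrative, not the post's harness).
import time
from vllm import LLM, SamplingParams

# Hypothetical setup: HF model id and a repeated prompt to form a batch.
llm = LLM(model="Qwen/Qwen2-7B-Instruct")
params = SamplingParams(max_tokens=256, temperature=0.0)
prompts = ["Explain the attention mechanism in one paragraph."] * 32

start = time.perf_counter()
outputs = llm.generate(prompts, params)   # batched generation
elapsed = time.perf_counter() - start

# Count generated tokens across all requests and report throughput.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/sec")
```

A comparable loop per library (TGI, TensorRT-LLM, DeepSpeed-MII, CTranslate2) with the same prompts and output lengths is what makes tokens/sec numbers comparable across backends.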