NVIDIA diffusion language model Nemotron TwoTower achieves 2.42x LLM inference throughput without a full retraining run, ...
qwen-llama-inference-bench/
├── benchmark.py   # CLI: framework, model, domain, sweep knobs
├── config.py      # Constants (MODEL_MAP, defaults)
├── prompts.py     # 62 domain-tagged prompts across 7 domains
...
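The layout above implies three cooperating pieces: a constants module, a pool of domain-tagged prompts, and a CLI that selects among them. The following is only a minimal sketch of how such pieces might fit together, not the repository's actual code. Apart from `MODEL_MAP`, every name here (`Prompt`, `PROMPTS`, `select_prompts`, `build_parser`, `plan_run`, the flag names, the model identifiers, and the sample prompts) is a hypothetical placeholder.

```python
import argparse
from dataclasses import dataclass

# Hypothetical stand-in for config.py: short alias -> model identifier.
# The identifiers are placeholders, not real model names.
MODEL_MAP = {
    "qwen": "example-org/qwen-placeholder",
    "llama": "example-org/llama-placeholder",
}
DEFAULT_FRAMEWORK = "framework-a"  # assumed default, for illustration only


@dataclass(frozen=True)
class Prompt:
    """One benchmark prompt tagged with the domain it belongs to."""
    domain: str
    text: str


# Hypothetical stand-in for prompts.py (the real file is described as
# holding 62 prompts across 7 domains; three samples are enough here).
PROMPTS = [
    Prompt("code", "Write a function that reverses a linked list."),
    Prompt("math", "Explain why the square root of 2 is irrational."),
    Prompt("code", "Describe what a race condition is."),
]


def select_prompts(prompts, domain=None):
    """Return all prompts, or only those tagged with the given domain."""
    if domain is None:
        return list(prompts)
    return [p for p in prompts if p.domain == domain]


def build_parser():
    """Build a CLI parser with framework, model, domain, and sweep flags."""
    parser = argparse.ArgumentParser(description="Inference benchmark (sketch)")
    parser.add_argument("--framework", default=DEFAULT_FRAMEWORK)
    parser.add_argument("--model", choices=sorted(MODEL_MAP), default="qwen")
    parser.add_argument("--domain", default=None)
    # A sweep knob: one run is planned per value given here.
    parser.add_argument("--max-tokens", type=int, nargs="+", default=[128])
    return parser


def plan_run(argv):
    """Resolve CLI arguments into a list of planned (model, prompt, knob) runs."""
    args = build_parser().parse_args(argv)
    prompts = select_prompts(PROMPTS, args.domain)
    return [
        {
            "framework": args.framework,
            "model": MODEL_MAP[args.model],
            "prompt": p.text,
            "max_tokens": mt,
        }
        for mt in args.max_tokens
        for p in prompts
    ]
```

Calling `plan_run(["--model", "llama", "--domain", "code", "--max-tokens", "64", "256"])` would plan one run per matching prompt per sweep value. Keeping the selection logic in plain functions, separate from argument parsing, makes it straightforward to test without invoking the CLI.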