informatique:ai_lm:gpu_bench
Differences
Below are the differences between two revisions of the page.
| Both previous revisions · Previous revision · Next revision | Previous revision | ||
| informatique:ai_lm:gpu_bench [09/06/2026 20:09] – [Qwen2.5-coder-7b-instruct-q5_k_m] cyrille | informatique:ai_lm:gpu_bench [25/06/2026 18:18] (current version) – [Nemotron-Cascade-2-30B-A3B] cyrille | ||
|---|---|---|---|
| Line 256: | Line 256: | ||
| * https:// | * https:// | ||
| + | |||
| + | ^ Model ^ Params ^ GPU offload ^ Prompt (t/s) ^ Eval (t/s) ^ Total (ms) ^ Generated tokens ^ Graphs reused ^ | ||
| + | | Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL | 24B | 17/41 | 427.81 – 545.85 | 0.80 – 3.19 | 123,500 – 568,458 | 9,629 – 47,241 | 0 | | ||
| + | | Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL | 30B | 49/49 | 590.38 – 591.76 | 28.64 – 30.06 | 4,715 – 12,818 | 19,919 – 22,804 | 294 – 530 | | ||
| + | | Qwen3-Coder-Next-UD-Q4_K_XL | 80B | 49/49 | 29.00 – 400.09 | 18.68 – 32.44 | 25,057 – 87,659 | 719 – 43,214 | 10 – 1,024 | | ||
| + | | DeepSeek-R1-Distill-Qwen-32B-Q4_K_M | 32B | 24/65 | 88.97 – 428.81 | 2.14 – 2.32 | 116,052 – 189,566 | 925 – 3,397 | 228 – 419 | | ||
| + | | DeepSeek-R1-Distill-Qwen-14B-Q8_0 | 14B | 24/49 | 225.55 – 775.01 | 4.10 – 4.13 | 81,383 – 147,476 | 1,307 – 3,858 | 313 – 582 | | ||
| + | |||
| + | === gpt-oss-20b-UD-Q4_K_XL === | ||
| + | |||
| + | < | ||
| + | $ ./ | ||
| + | ggml_cuda_init: | ||
| + | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| + | | model | ||
| + | | ------------------------- | ---------: | ---------: | ------- | --: | ------: | -------------: | ||
| + | | gpt-oss 20B Q4_K - Medium | 11.04 GiB | 20.91 B | CUDA | -1 | tg128 | 155.79 ± 0.21 | | ||
| + | | gpt-oss 20B Q4_K - Medium | 11.04 GiB | 20.91 B | CUDA | -1 | tg256 | 155.81 ± 0.03 | | ||
| + | | gpt-oss 20B Q4_K - Medium | 11.04 GiB | 20.91 B | CUDA | -1 | tg512 | 155.15 ± 0.01 | | ||
| + | |||
| + | build: e25a32e98 (9584) | ||
| + | |||
| + | $ ./ | ||
| + | ggml_cuda_init: | ||
| + | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| + | | model | ||
| + | | ------------------------- | ---------: | ------: | ------- | --: | ------: | ------: | --------------: | ||
| + | | gpt-oss 20B Q4_K - Medium | 11.04 GiB | 20.91 B | CUDA | -1 | 128 | pp1024 | 3308.23 ± 19.28 | | ||
| + | | gpt-oss 20B Q4_K - Medium | 11.04 GiB | 20.91 B | CUDA | -1 | 256 | pp1024 | 4792.27 ± 39.25 | | ||
| + | | gpt-oss 20B Q4_K - Medium | 11.04 GiB | 20.91 B | CUDA | -1 | 512 | pp1024 | 6048.13 ± 32.16 | | ||
| + | |||
| + | build: e25a32e98 (9584) | ||
| + | </ | ||
| === Qwen2.5-coder-7b-instruct-q8_0 === | === Qwen2.5-coder-7b-instruct-q8_0 === | ||
| Line 263: | Line 296: | ||
| ggml_cuda_init: | ggml_cuda_init: | ||
| Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| - | | model | size | | + | | model | size | |
| - | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: | + | | ---------------- | ---------: | ---------: | --------- | --: | ----------: | ----------------: |
| - | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | + | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | -1 | |
| - | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | + | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | -1 | |
| - | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | + | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | -1 | |
| build: e25a32e98 (9584) | build: e25a32e98 (9584) | ||
| Line 274: | Line 307: | ||
| ggml_cuda_init: | ggml_cuda_init: | ||
| Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| - | | model | size | | + | | model | size | |
| - | | ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: | + | | ---------------- | ---------: | ---------: | --------- | --: | ------: | --------: | ---------------: |
| - | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | + | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | -1 | 128 | pp1024 | |
| - | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | + | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | -1 | 256 | pp1024 | |
| - | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | + | | qwen2 7B Q8_0 | 7.54 GiB | 7.62 B | CUDA | -1 | 512 | pp1024 | |
| build: e25a32e98 (9584) | build: e25a32e98 (9584) | ||
| </ | </ | ||
| - | ==== Stability with eGPU 😩 ==== | + | === Qwen2.5-coder-14b-instruct-q5_k_m === |
| + | |||
| + | < | ||
| + | $ ./ | ||
| + | ggml_cuda_init: | ||
| + | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| + | | model | ||
| + | | ----------------------- | ---------: | -------: | ------- | --: | -------: | --------------: | ||
| + | | qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | -1 | tg128 | 39.54 ± 0.02 | | ||
| + | | qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | -1 | tg256 | 39.53 ± 0.01 | | ||
| + | | qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | -1 | tg512 | 39.38 ± 0.01 | | ||
| + | |||
| + | build: e25a32e98 (9584) | ||
| + | |||
| + | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| + | | model | ||
| + | | ----------------------- | ---------: | -------: | ------- | --: | ------: | ------: | --------------: | ||
| + | | qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | -1 | 128 | pp1024 | 1835.16 ± 1.69 | | ||
| + | | qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | -1 | 256 | pp1024 | 1967.12 ± 1.01 | | ||
| + | | qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | -1 | 512 | pp1024 | 1995.02 ± 0.84 | | ||
| + | |||
| + | build: e25a32e98 (9584) | ||
| + | </ | ||
| + | |||
| + | === gemma-4-26B-A4B-it-qat-UD-Q4_K_XL === | ||
| + | |||
| + | < | ||
| + | prompt eval time = | ||
| + | eval time = 1338.88 ms / 86 tokens ( 15.57 ms per token, | ||
| + | total time = 1657.05 ms / 251 tokens | ||
| + | | ||
| + | stop processing: n_tokens = 20931, truncated = 0 | ||
| + | |||
| + | prompt eval time = 3143.73 ms / 4850 tokens ( 0.65 ms per token, | ||
| + | eval time = | ||
| + | total time = | ||
| + | | ||
| + | stop processing: n_tokens = 27604, truncated = 0 | ||
| + | </ | ||
| + | |||
| + | === Qwen3-Coder-30B-A3B-Instruct-Q4_K_M === | ||
| + | |||
| + | I tried some '' | ||
| + | |||
| + | < | ||
| + | $ ./ | ||
| + | |||
| + | llama_bench: | ||
| + | </ | ||
| + | |||
| + | === Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL === | ||
| + | |||
| + | I tried some '' | ||
| + | |||
| + | < | ||
| + | $ ./ | ||
| + | |||
| + | ggml_cuda_init: | ||
| + | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| + | | model | size | | ||
| + | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | ||
| + | llama_bench: | ||
| + | </ | ||
| + | |||
| + | === Nemotron-Cascade-2-30B-A3B === | ||
| + | |||
| + | I tried some '' | ||
| + | |||
| + | < | ||
| + | $ ./ | ||
| + | ggml_cuda_init: | ||
| + | Device 0: NVIDIA GeForce RTX 5060 Ti, compute capability 12.0, VMM: yes, VRAM: 15849 MiB | ||
| + | | model | size | | ||
| + | | ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | ||
| + | llama_bench: | ||
| + | </ | ||
| + | |||
| + | ==== INstability with eGPU 😩 ==== | ||
| Reset NVIDIA and CUDA: | Reset NVIDIA and CUDA: | ||
informatique/ai_lm/gpu_bench.1781028558.txt.gz · Last modified: by cyrille
