informatique:ai_lm:gpu_bench
Differences
Below are the differences between two revisions of the page.
| informatique:ai_lm:gpu_bench [11/06/2026 11:41] – [Stability with eGPU 😩] cyrille | informatique:ai_lm:gpu_bench [25/06/2026 18:18] (current version) – [Nemotron-Cascade-2-30B-A3B] cyrille | ||
|---|---|---|---|
| Line 255: | Line 255: | ||
| **Environment and build sensitivity** for llama.cpp: | **Environment and build sensitivity** for llama.cpp: | ||
| * https:// | * https:// | ||
| + | |||
| + | |||
| + | ^ Model ^ Params ^ GPU offload (layers) ^ Prompt (t/s) ^ Eval (t/s) ^ Total (ms) ^ Generated tokens ^ Graphs reused ^ | ||
| + | | Devstral-Small-2-24B-Instruct-2512-UD-Q4_K_XL | 24B | 17/41 | 427.81 – 545.85 | 0.80 – 3.19 | 123,500 – 568,458 | 9,629 – 47,241 | 0 | | ||
| + | | Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL | 30B | 49/49 | 590.38 – 591.76 | 28.64 – 30.06 | 4,715 – 12,818 | 19,919 – 22,804 | 294 – 530 | | ||
| + | | Qwen3-Coder-Next-UD-Q4_K_XL | 80B | 49/49 | 29.00 – 400.09 | 18.68 – 32.44 | 25,057 – 87,659 | 719 – 43,214 | 10 – 1,024 | | ||
| + | | DeepSeek-R1-Distill-Qwen-32B-Q4_K_M | 32B | 24/65 | 88.97 – 428.81 | 2.14 – 2.32 | 116,052 – 189,566 | 925 – 3,397 | 228 – 419 | | ||
| + | | DeepSeek-R1-Distill-Qwen-14B-Q8_0 | 14B | 24/49 | 225.55 – 775.01 | 4.10 – 4.13 | 81,383 – 147,476 | 1,307 – 3,858 | 313 – 582 | | ||
| === gpt-oss-20b-UD-Q4_K_XL === | === gpt-oss-20b-UD-Q4_K_XL === | ||
| Line 330: | Line 338: | ||
| build: e25a32e98 (9584) | build: e25a32e98 (9584) | ||
| + | </ | ||
| + | |||
| + | === gemma-4-26B-A4B-it-qat-UD-Q4_K_XL === | ||
| + | |||
| + | < | ||
| + | prompt eval time = | ||
| + | eval time = 1338.88 ms / 86 tokens ( 15.57 ms per token, | ||
| + | total time = 1657.05 ms / 251 tokens | ||
| + | | ||
| + | stop processing: n_tokens = 20931, truncated = 0 | ||
| + | |||
| + | prompt eval time = 3143.73 ms / 4850 tokens ( 0.65 ms per token, | ||
| + | eval time = | ||
| + | total time = | ||
| + | | ||
| + | stop processing: n_tokens = 27604, truncated = 0 | ||
| </ | </ | ||
| === Qwen3-Coder-30B-A3B-Instruct-Q4_K_M === | === Qwen3-Coder-30B-A3B-Instruct-Q4_K_M === | ||
| + | |||
| + | I tried some '' | ||
| < | < | ||
| $ ./ | $ ./ | ||
| - | llama_bench: | + | llama_bench: |
| </ | </ | ||
| + | |||
| + | === Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL === | ||
| + | |||
| + | I tried some '' | ||
| < | < | ||
| - | exec llama-server \ | + | $ ./ |
| - | -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \ | + | |
| - | --host 0.0.0.0 --port 8012 \ | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | | + | |
| - | -c 96000 | + | |
| - | common_params_print_info: build 9584 (e25a32e98) with GNU 15.2.0 for Linux x86_64 | + | ggml_cuda_init: found 1 CUDA devices |
| - | log_info: verbosity = 4 (adjust with the `-lv N` CLI arg) | + | |
| - | device_info: | + | | model |
| - | | + | | ------------------------------ |
| - | - CPU : Intel(R) Core(TM) Ultra 7 270K Plus (93508 MiB, 93508 MiB free) | + | llama_bench: error: failed |
| - | system_info: | + | </code> |
| - | srv llama_server: | + | |
| - | ... | + | |
| - | common_params_fit_impl: | + | |
| - | common_params_fit_impl: | + | |
| - | common_params_fit_impl: | + | |
| - | common_fit_params: successfully fit params to free device memory | + | |
| - | common_fit_params: fitting params | + | |
| - | llama_model_loader: | + | |
| - | ... | + | |
| - | load_tensors: | + | |
| - | load_tensors: | + | |
| - | load_tensors: | + | |
| - | load_tensors: | + | |
| - | load_tensors: | + | |
| - | load_tensors: | + | |
| - | load_tensors: | + | |
| - | ... | + | |
| - | llama_context: | + | |
| - | llama_context: | + | |
| - | llama_kv_cache: | + | |
| - | llama_kv_cache: | + | |
| - | ... | + | |
| - | sched_reserve: | + | |
| - | sched_reserve: | + | |
| - | sched_reserve: | + | |
| - | sched_reserve: | + | |
| - | sched_reserve: | + | |
| - | sched_reserve: | + | |
| - | sched_reserve: | + | |
| - | ... | + | |
| - | srv load_model: prompt cache is enabled, size limit: 8192 MiB | + | |
| - | ... | + | |
| - | srv init: init: chat template, thinking = 0 | + | |
| - | srv llama_server: | + | |
| - | srv llama_server: | + | |
| - | srv update_slots: | + | |
| - | $ nvidia-smi | + | === Nemotron-Cascade-2-30B-A3B === |
| - | +-----------------------------------------------------------------------------------------+ | + | |
| - | | NVIDIA-SMI 595.71.05 | + | |
| - | +-----------------------------------------+------------------------+----------------------+ | + | |
| - | | GPU Name | + | |
| - | | Fan Temp | + | |
| - | | | + | |
| - | |=========================================+========================+======================| | + | |
| - | | | + | |
| - | | 0% | + | |
| - | | | + | |
| - | +-----------------------------------------+------------------------+----------------------+ | + | |
| - | +-----------------------------------------------------------------------------------------+ | + | I tried some '' |
| - | | Processes: | | + | |
| - | | GPU | + | < |
| - | | | + | $ ./ |
| - | |=========================================================================================| | + | ggml_cuda_init: |
| - | | | + | |
| - | +-----------------------------------------------------------------------------------------+ | + | | model |
| + | | ------------------------------ | ||
| + | llama_bench: | ||
| </ | </ | ||
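The timing lines in the logs above (`prompt eval time`, `eval time`, `total time`) can be turned into the t/s figures used in the summary table. The following is a minimal sketch, assuming the line layout seen in the gemma-4-26B-A4B excerpt (`<label> = <ms> ms / <n> tokens`). The regular expression and the helper names `parse_timing`, `ms_per_token` and `tokens_per_second` are illustrative assumptions, not part of llama.cpp. Truncated lines such as `eval time =` are skipped rather than guessed.

```python
import re

# Assumed layout, taken from the log excerpt on this page:
#   "eval time = 1338.88 ms / 86 tokens ( 15.57 ms per token,"
TIMING_RE = re.compile(
    r"^\s*(?P<kind>prompt eval time|eval time|total time)\s*=\s*"
    r"(?P<ms>[\d.]+)\s*ms\s*/\s*(?P<tokens>\d+)\s*tokens"
)


def parse_timing(line):
    """Return (kind, milliseconds, tokens), or None if the line is incomplete."""
    match = TIMING_RE.match(line)
    if match is None:
        return None
    return match.group("kind"), float(match.group("ms")), int(match.group("tokens"))


def ms_per_token(ms, tokens):
    """Average latency per token in milliseconds."""
    return ms / tokens


def tokens_per_second(ms, tokens):
    """Throughput in tokens per second."""
    return tokens / (ms / 1000.0)


if __name__ == "__main__":
    sample = [
        "eval time = 1338.88 ms / 86 tokens ( 15.57 ms per token,",
        "prompt eval time = 3143.73 ms / 4850 tokens ( 0.65 ms per token,",
        "eval time =",  # truncated in the source, so it is skipped
    ]
    for line in sample:
        parsed = parse_timing(line)
        if parsed is None:
            print(f"skipped (incomplete): {line!r}")
            continue
        kind, ms, tokens = parsed
        print(
            f"{kind}: {ms_per_token(ms, tokens):.2f} ms/token, "
            f"{tokens_per_second(ms, tokens):.2f} t/s"
        )
```

Run on the two complete lines from the excerpt, the sketch reproduces the 15.57 and 0.65 ms per token values printed in the log, which is a quick sanity check before comparing models.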
informatique/ai_lm/gpu_bench.1781170910.txt.gz · Last modified: by cyrille
