Differences

Below are the differences between two revisions of the page.

--- informatique:ai_lm [27/05/2026 11:26] – [mistral.rs] cyrille
+++ informatique:ai_lm [23/06/2026 05:40] (current version) – [Compilation pour GPU] cyrille
@@ Line 53: / Line 53: @@
   * [[/informatique/ai_lm/gpu_bench|GPU Benchmarks]]
+Install [[/informatique/nvidia|nvidia-drivers and CUDA]].
 ==== Online services ====
@@ Line 264: / Line 265: @@
 # RTX 5060 : 120
-$ export CUDA_VERSION=12.9 && cmake -B build -DGGML_CUDA=ON \
+$ export CUDA_VERSION=12.9
+$ export CUDA_VERSION=13.3
+$ cmake -B build -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES="86;120" \
  -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON \
@@ Line 286: / Line 289: @@
 -- Build files have been written to: /home/cyrille/Code/bronx/AI_Coding/llama.cpp/build
-$ time cmake --build build --config Release -j 10
+$ time cmake --build build --clean-first --config Release -j 10
 # host: i7-1360P + SSD
@@ Line 302: / Line 305: @@
 user	61m37,436s
 sys	2m37,613s
-</code>
-With CUDA 13.1, llama.cpp crashes immediately on the first request, with no message in syslog: the problem is therefore not the driver but llama.cpp itself, which does not support this version of CUDA:
+# host: Core(TM) Ultra 7 270K Plus
-<code>
+real	3m6.637s
-/home/cyrille/Code/bronx/AI_Coding/llama.cpp/ggml/src/ggml-cuda/ggml-cuda.cu:97: CUDA error
+user	27m13.877s
-CUDA error: invalid argument
+sys	1m24.687s
-  current device: 0, in function ggml_cuda_mul_mat_q at /home/cyrille/Code/bronx/AI_Coding/llama.cpp/ggml/src/ggml-cuda/mmq.cu:179
 </code>
@@ Line 480: / Line 481: @@
 https://github.com/zml/zml/
+===== Token reduction =====
+Headroom
+  * Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
+  * https://headroom-docs.vercel.app/docs
+  * https://github.com/chopratejas/headroom
+  * https://www.lemondeinformatique.fr/actualites/lire-headroom-un-projet-open-source-pour-reduire-la-facture-des-tokens-100357.html
+RTK
+  * CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies
+  * https://www.rtk-ai.app/
+  * https://github.com/rtk-ai/rtk
+Openwolf
+  * Sharper context. Fewer tokens. Open-source middleware for Claude Code.
+  * https://openwolf.com/
+  * https://github.com/cytostack/openwolf
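
The tools listed above all shrink text before it reaches the LLM. As a rough illustration of that general idea only, here is a minimal Python sketch that collapses repeated lines in a tool output and keeps just its head and tail. It does not reflect how Headroom, RTK, or Openwolf are actually implemented; the function name `compress_output` and the `max_lines` parameter are hypothetical, chosen for this example.

```python
from itertools import groupby


def compress_output(text: str, max_lines: int = 20) -> str:
    """Collapse runs of identical lines, then keep only the head and tail.

    Hypothetical helper for illustration; not taken from any tool listed above.
    """
    if max_lines < 2:
        raise ValueError("max_lines must be at least 2")

    # Step 1: merge consecutive duplicate lines (common in build and server logs).
    collapsed = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        collapsed.append(line if count == 1 else f"{line}  [repeated {count}x]")

    # Step 2: if the result is still long, keep the first and last lines only.
    if len(collapsed) <= max_lines:
        return "\n".join(collapsed)
    head = max_lines // 2
    tail = max_lines - head
    omitted = len(collapsed) - max_lines
    marker = f"[... {omitted} lines omitted ...]"
    return "\n".join(collapsed[:head] + [marker] + collapsed[-tail:])


if __name__ == "__main__":
    log = "\n".join(["compiling..."] * 50 + [f"step {i}" for i in range(40)])
    print(compress_output(log, max_lines=6))
```

A lossy filter like this can drop the very line that matters (for example, an error in the middle of a log), which is presumably why real tools use more careful, content-aware strategies.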