informatique:ai_lm
Differences
Below are the differences between two revisions of the page.
| informatique:ai_lm [03/05/2026 11:02] – [NanoLLM] cyrille | informatique:ai_lm [23/06/2026 05:40] (current version) – [Compiling for GPU] cyrille | ||
|---|---|---|---|
| Line 53: | Line 53: | ||
| * [[/ | * [[/ | ||
| + | Install [[/ | ||
| ==== Online services ==== | ==== Online services ==== | ||
| Line 264: | Line 265: | ||
| # RTX 5060 : 120 | # RTX 5060 : 120 | ||
| - | $ export CUDA_VERSION=12.9 | + | $ export CUDA_VERSION=12.9 |
| + | $ export CUDA_VERSION=13.3 | ||
| + | $ cmake -B build -DGGML_CUDA=ON \ | ||
| | | ||
| | | ||
| Line 286: | Line 289: | ||
| -- Build files have been written to: / | -- Build files have been written to: / | ||
| - | $ time cmake --build build --config Release -j 10 | + | $ time cmake --build build --clean-first |
| # host: i7-1360P + SSD | # host: i7-1360P + SSD | ||
| Line 302: | Line 305: | ||
| user 61m37, | user 61m37, | ||
| sys 2m37, | sys 2m37, | ||
| - | </ | ||
| - | With CUDA 13.1, llama.cpp crashes immediately on the first request, with no message in syslog | + | # host: Core(TM) Ultra 7 270K Plus |
| - | < | + | real 3m6.637s |
| - | / | + | user 27m13.877s |
| - | CUDA error: invalid argument | + | sys 1m24.687s |
| - | current device: 0, in function ggml_cuda_mul_mat_q at / | + | |
| </ | </ | ||
| Line 418: | Line 419: | ||
| After a reboot, it works. Performance: 2.6× faster than without SYCL (36.34 vs 13.94). | After a reboot, it works. Performance: 2.6× faster than without SYCL (36.34 vs 13.94). | ||
| + | |||
| + | ==== mistral.rs ==== | ||
| + | |||
| + | Unrelated to Mistral.ai | ||
| + | |||
| + | https:// | ||
| + | |||
| + | * Any Hugging Face model, zero config | ||
| + | * True multimodality: | ||
| + | * Smart quantization | ||
| + | * Built-in web UI | ||
| + | * Hardware-aware | ||
| + | * Flexible SDKs: Python package and Rust crate to build your projects. | ||
| + | * Native agentic support: built-in agentic loop with web search, local Python code execution with model feedback, session management, and custom tool hooks. | ||
| + | |||
| + | À l' | ||
| + | * the build takes a very long time (743 files) and | ||
| + | * plug in the eGPU beforehand, otherwise you will have to reinstall 😩 | ||
| + | * this will enable '' | ||
| + | |||
| + | |||
| ==== ollama ==== | ==== ollama ==== | ||
| Line 459: | Line 481: | ||
| https:// | https:// | ||
| + | ===== Token reduction ===== | ||
| + | |||
| + | Headroom | ||
| + | * Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server. | ||
| + | * https:// | ||
| + | * https:// | ||
| + | * https:// | ||
| + | |||
| + | RTK | ||
| + | * CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies | ||
| + | * https:// | ||
| + | * https:// | ||
| + | |||
| + | Openwolf | ||
| + | * Sharper context. Fewer tokens. Open-source middleware for Claude Code. | ||
| + | * https:// | ||
| + | * https:// | ||
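
The two performance figures quoted in the diff above (the SYCL comparison "36.34 vs 13.94" and the `time` output for the Core Ultra 7 build) can be cross-checked with a few lines of Python. This is only a sketch: `to_seconds` is a hypothetical helper written for this note, and reading `(user + sys) / real` as the average number of busy cores is an assumption about how the build was parallelised, not something the page states.

```python
import re


def to_seconds(duration: str) -> float:
    """Convert a `time`-style duration such as '3m6.637s' to seconds.

    Accepts either '.' or ',' as the decimal separator, since the output
    of `time` depends on the shell locale.
    """
    match = re.fullmatch(r"(\d+)m(\d+(?:[.,]\d+)?)s", duration)
    if match is None:
        raise ValueError(f"unrecognised duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds.replace(",", "."))


# Figures copied verbatim from the notes above.
sycl_speedup = 36.34 / 13.94

real = to_seconds("3m6.637s")
cpu = to_seconds("27m13.877s") + to_seconds("1m24.687s")  # user + sys

print(f"SYCL speedup: {sycl_speedup:.2f}x")
print(f"CPU time / wall time during the build: {cpu / real:.1f}")
```

The first ratio agrees with the "2.6× faster" claim. The second is a rough indication of how many cores the build kept busy on average.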
informatique/ai_lm.1777798940.txt.gz · Last modified: by cyrille
