Differences

Below are the differences between two revisions of the page.

--- informatique:ai_lm [21/04/2026 08:58] – ↷ Links adapted because of a move operation. cyrille
+++ informatique:ai_lm [03/05/2026 11:02] (current version) – [NanoLLM] cyrille
@@ Line 15: / Line 15: @@
   * [[/informatique/ai_lm/gpu_bench|GPU Benchmarks]]
   * [[/informatique/ai_lm/ai_vision|AI Vision]]
+  * [[/informatique/ai_lm/ai_agent|AI Agent]]
 ===== Glossary =====
@@ Line 36: / Line 37: @@
   * **RAG** (Retrieval-Augmented Generation): combines two AI capabilities, information retrieval and text generation.
     * **ReRanking** (//smart cleanup//): re-scores and reorders the results of the retrieval phase (RAG) to keep only the most relevant items and remove redundancy.
-  * [[/informatique/ai_lm/ai_coding#agents_ia|AI agents]]
+  * [[/informatique/ai_lm/ai_agent|AI agents]]
+  * **CoT** (Chain of Thought): a model in CoT mode lays out its reasoning steps before answering; in no-CoT mode it answers directly.
 Classification of open models: [[https://www.ibm.com/fr-fr/products/watsonx-ai/foundation-models|Foundation models]] by IBM
@@ Line 49: / Line 51: @@
   * https://www.glukhov.org/fr/post/2025/05/ollama-cpu-cores-usage/
-==== Estimates ====
+  * [[/informatique/ai_lm/gpu_bench|GPU Benchmarks]]
-**Devstral with llama.cpp on an RTX 3060 12 GB.**
-by ChatGPT:
-| Model             | Context (seq_len)  | Recommended batch_size | Notes                                    |
-| ----------------- | ------------------ | --------------------- | ---------------------------------------- |
-| Devstral Small 7B | 1024               | 4                     | Very safe, ample VRAM                    |
-| Devstral Small 7B | 2048               | 2‑3                   | Good speed/VRAM trade-off                |
-| Devstral Small 7B | 4096               | 1‑2                   | VRAM almost saturated                    |
-| Devstral 13B      | 1024               | 2                     | Limited VRAM                             |
-| Devstral 13B      | 2048               | 1‑2                   | Optimal, watch VRAM                      |
-| Devstral 13B      | 4096               | 1                     | VRAM saturated, CPU offload advised      |
-| Devstral 13B      | 8192               | 1                     | Possible, but long context → OOM risk    |
-by LeChat:
-| Context (tokens) | Model (parameters) | Estimated VRAM (GB) | Optimal batch size | Estimated throughput (tok/s) | Notes |
-| 512 | 7B | ~5.5 | 8 | 15-25 | Ideal for short, fast tasks. |
-| 1024 | 7B | ~6.0 | 4 | 10-20 | Good trade-off for medium-length prompts. |
-| 2048 | 7B | ~7.0 | 2 | 5-15 | Requires careful VRAM management. |
-| 4096 | 7B | ~8.5 | 1 | 3-10 | Close to the VRAM limit, risk of slowdown. |
-| 512 | 13B | ~9.0 | 4 | 8-15 | Larger model, higher latency. |
-| 1024 | 13B | ~10.0 | 2 | 4-10 | VRAM almost saturated, reduced batch_size. |
-| 2048 | 13B | ~11.5 | 1 | 2-8 | High risk of exceeding VRAM, significant latency. |
 ==== Online services ====
@@ Line 333: / Line 311: @@
 </code>
+=== Building for CPU (SYCL) ===
+Linux oneAPI toolkit
+  * https://www.intel.com/content/www/us/en/docs/oneapi-toolkit/installation-guide-linux/latest/install-oneapi-toolkit-with-apt.html
+    * 71 packages, 2.3 GB
+    * Re-read https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#i-setup-environment to install only the required packages
+By default, ''intel-oneapi-toolkit'' installs all of the following:
+  intel-oneapi-ccl-2022.0 intel-oneapi-ccl-devel intel-oneapi-ccl-devel-2022.0 intel-oneapi-common-licensing intel-oneapi-common-licensing-2026.0
+  intel-oneapi-common-oneapi-vars intel-oneapi-common-oneapi-vars-2026.0 intel-oneapi-common-vars intel-oneapi-compiler-cpp-eclipse-cfg-2026.0
+  intel-oneapi-compiler-dpcpp-cpp intel-oneapi-compiler-dpcpp-cpp-2026.0 intel-oneapi-compiler-dpcpp-cpp-common-2026.0
+  intel-oneapi-compiler-dpcpp-cpp-runtime-2026.0 intel-oneapi-compiler-dpcpp-eclipse-cfg-2026.0 intel-oneapi-compiler-fortran-2026.0
+  intel-oneapi-compiler-fortran-common-2026.0 intel-oneapi-compiler-fortran-runtime-2026.0 intel-oneapi-compiler-shared-2026.0
+  intel-oneapi-compiler-shared-common-2026.0 intel-oneapi-compiler-shared-runtime-2026.0 intel-oneapi-dev-utilities intel-oneapi-dev-utilities-2026.0
+  intel-oneapi-dev-utilities-eclipse-cfg-2026.0 intel-oneapi-dnnl-2026.0 intel-oneapi-dnnl-devel intel-oneapi-dnnl-devel-2026.0
+  intel-oneapi-dpcpp-cpp-2026.0 intel-oneapi-dpcpp-debugger-2026.0 intel-oneapi-icc-eclipse-plugin-cpp-2026.0 intel-oneapi-ipp-2026.0
+  intel-oneapi-ipp-devel intel-oneapi-ipp-devel-2026.0 intel-oneapi-ippcp-2026.0 intel-oneapi-ippcp-devel intel-oneapi-ippcp-devel-2026.0
+  intel-oneapi-libdpstd-devel-2022.12 intel-oneapi-mkl-classic-devel-2026.0 intel-oneapi-mkl-classic-include-2026.0 intel-oneapi-mkl-cluster-2026.0
+  intel-oneapi-mkl-cluster-devel-2026.0 intel-oneapi-mkl-core-2026.0 intel-oneapi-mkl-core-devel-2026.0 intel-oneapi-mkl-devel
+  intel-oneapi-mkl-devel-2026.0 intel-oneapi-mkl-sycl-2026.0 intel-oneapi-mkl-sycl-blas-2026.0 intel-oneapi-mkl-sycl-data-fitting-2026.0
+  intel-oneapi-mkl-sycl-devel-2026.0 intel-oneapi-mkl-sycl-dft-2026.0 intel-oneapi-mkl-sycl-include-2026.0 intel-oneapi-mkl-sycl-lapack-2026.0
+  intel-oneapi-mkl-sycl-rng-2026.0 intel-oneapi-mkl-sycl-sparse-2026.0 intel-oneapi-mkl-sycl-stats-2026.0 intel-oneapi-mkl-sycl-vm-2026.0
+  intel-oneapi-mpi-2021.18 intel-oneapi-mpi-devel intel-oneapi-mpi-devel-2021.18 intel-oneapi-openmp-2026.0 intel-oneapi-openmp-common-2026.0
+  intel-oneapi-tbb-2023.0 intel-oneapi-tbb-devel intel-oneapi-tbb-devel-2023.0 intel-oneapi-tcm-1.5 intel-oneapi-tlt intel-oneapi-tlt-2026.0
+  intel-oneapi-toolkit intel-oneapi-toolkit-env-2026.0 intel-oneapi-toolkit-getting-started-2026.0 intel-oneapi-umf-1.1 intel-oneapi-vtune
+<code>
+$ source /opt/intel/oneapi/setvars.sh
+$ sycl-ls
+[opencl:cpu][opencl:0] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-1360P OpenCL 3.0 (Build 0) [2026.21.3.0.31_160000]
+</code>
+In fact this does not work, because:
+<code bash>
+$ ./llama-ls-sycl-device
+./llama-ls-sycl-device: error while loading shared libraries: libsycl.so.8: cannot open shared object file: No such file or directory
+# Version mismatch 😩: the binary wants libsycl.so.8, check which versions are installed
+$ find /opt/intel/oneapi -name "libsycl.so*"
+/opt/intel/oneapi/2026.0/lib/libsycl.so.9.0.0
+/opt/intel/oneapi/2026.0/lib/libsycl.so.9.0.0-gdb.py
+/opt/intel/oneapi/2026.0/lib/libsycl.so
+/opt/intel/oneapi/2026.0/lib/libsycl.so.9
+/opt/intel/oneapi/compiler/2026.0/lib/libsycl.so.9.0.0
+/opt/intel/oneapi/compiler/2026.0/lib/libsycl.so.9.0.0-gdb.py
+/opt/intel/oneapi/compiler/2026.0/lib/libsycl.so
+/opt/intel/oneapi/compiler/2026.0/lib/libsycl.so.9
+</code>
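The loader error and the `find` output above point to a soname mismatch: the binary asks for `libsycl.so.8` while the toolkit ships `libsycl.so.9`. A minimal sketch of that comparison follows. `soname_major` is a hypothetical helper name (not part of llama.cpp or oneAPI), and the two file names are copied from the output above.

```shell
# Hypothetical helper: extract the soname major version from a library file name.
# "libsycl.so.9.0.0" -> "9"; prints nothing when there is no version suffix.
soname_major() {
  printf '%s\n' "$1" | sed -n 's/^.*\.so\.\([0-9][0-9]*\).*$/\1/p'
}

needed="libsycl.so.8"        # requested by the binary (from the loader error)
installed="libsycl.so.9.0.0" # shipped by the toolkit (from the find output)

if [ "$(soname_major "$needed")" != "$(soname_major "$installed")" ]; then
  echo "soname mismatch: binary wants major $(soname_major "$needed"), installed is $(soname_major "$installed")"
fi
```

On a real system, `ldd ./llama-ls-sycl-device | grep sycl` shows which soname the binary requests and whether the loader resolves it.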
+OK, moving on to building from source as explained at https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/SYCL.md#ii-build-llamacpp so that the binary uses the SYCL version installed by ''intel-oneapi-toolkit''.
+<code bash>
+./examples/sycl/build.sh
+</code>
+The build finishes without errors, but ... "what():  can not find preferred GPU platform" 😩
+<code>
+$ ./build/bin/llama-ls-sycl-device
+# same with
+$ ./build/bin/llama-bench -p 0 -n 128,256,512
+[New LWP 35410]
+[New LWP 35409]
+[New LWP 35408]
+[New LWP 35407]
+[New LWP 35406]
+[New LWP 35405]
+[New LWP 35404]
+[New LWP 35403]
+[New LWP 35402]
+[New LWP 35401]
+[New LWP 35400]
+[New LWP 35399]
+[New LWP 35398]
+[New LWP 35397]
+[New LWP 35396]
+This GDB supports auto-downloading debuginfo from the following URLs:
+  <https://debuginfod.ubuntu.com>
+Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
+Debuginfod has been disabled.
+...
+Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
+0x000079304a910813 in __GI___wait4 (pid=35411, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
+warning: 30	../sysdeps/unix/sysv/linux/wait4.c: Aucun fichier ou dossier de ce nom
+#0  0x000079304a910813 in __GI___wait4 (pid=35411, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
+	in ../sysdeps/unix/sysv/linux/wait4.c
+#1  0x000079304e48aa1a in ggml_print_backtrace () from /home/cyrille/Code/bronx/AI_Coding/llama.cpp-SYCL/build/bin/libggml-base.so.0
+#2  0x000079304e4a3d76 in ggml_uncaught_exception() () from /home/cyrille/Code/bronx/AI_Coding/llama.cpp-SYCL/build/bin/libggml-base.so.0
+#3  0x000079304acbb0da in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
+#4  0x000079304aca5a55 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
+#5  0x000079304acbb391 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
+#6  0x000079304b19e765 in dpct::dev_mgr::dev_mgr() () from /home/cyrille/Code/bronx/AI_Coding/llama.cpp-SYCL/build/bin/libggml-sycl.so.0
+#7  0x000079304b16e8f3 in ggml_backend_sycl_print_sycl_devices () from /home/cyrille/Code/bronx/AI_Coding/llama.cpp-SYCL/build/bin/libggml-sycl.so.0
+#8  0x0000000000405527 in main ()
+[Inferior 1 (process 35394) detached]
+terminate called after throwing an instance of 'std::runtime_error'
+  what():  can not find preferred GPU platform
+PLEASE submit a bug report to https://software.intel.com/en-us/support/priority-support and include the crash backtrace and instructions to reproduce the bug.
+Abandon (core dumped)
+</code>
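The exception complains about a missing GPU platform, and the earlier `sycl-ls` output listed only an `[opencl:cpu]` device. One quick check is to count the GPU entries that `sycl-ls` reports. This is a sketch under assumptions: `count_gpu_devices` is a hypothetical helper, and it assumes GPU devices are tagged `[<backend>:gpu]` in the same way the CPU is tagged `[opencl:cpu]` above. It does not establish the actual cause of the crash.

```shell
# Hypothetical helper: count lines of sycl-ls output (read from stdin) that
# carry a ":gpu]" device tag. Assumption: GPUs are tagged like "[opencl:gpu]".
count_gpu_devices() {
  grep -cF ':gpu]' || true   # grep -c exits non-zero on 0 matches; keep going
}

# Sample line copied from the sycl-ls output shown earlier (CPU only).
sample='[opencl:cpu][opencl:0] Intel(R) OpenCL, 13th Gen Intel(R) Core(TM) i7-1360P OpenCL 3.0 (Build 0) [2026.21.3.0.31_160000]'
printf '%s\n' "$sample" | count_gpu_devices   # prints 0 for this sample
```

With the oneAPI environment loaded, the same helper can be fed live output: `sycl-ls | count_gpu_devices`.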
+After a reboot, it works. Performance: about 2.6 times faster than without SYCL (36.34 vs 13.94).
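The speedup factor is simply the ratio of the two measurements quoted above. A small sketch of the arithmetic, where `speedup` is a hypothetical helper name and the two numbers are the ones from the text:

```shell
# Hypothetical helper: ratio of two measurements, rounded to two decimals.
# LC_ALL=C forces a dot as decimal separator regardless of the system locale.
speedup() {
  LC_ALL=C awk -v with="$1" -v without="$2" 'BEGIN { printf "%.2f\n", with / without }'
}

speedup 36.34 13.94   # prints 2.61
```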
 ==== ollama ====
@@ Line 366: / Line 451: @@
   * https://dusty-nv.github.io/NanoLLM/
   * https://www.jetson-ai-lab.com/tutorial_nano-llm.html
 Todo
   * [[https://towardsdatascience.com/how-to-build-an-openai-compatible-api-87c8edea2f06/|How to build an OpenAI-compatible API]]
+==== ZML ====
+https://github.com/zml/zml/