# Cortex (Inference)

> Run local Small Language Models (SLMs) in the browser

# Cortex Module

`Infèrence_Éngine // WébGPU_Accélerated`

**ᚠ ᛫ ᛟ ᛫ ᚱ ᛫ ᛒ ᛫ ᛟ ᛫ ᚲ**

The Cortex module is the "Voice" of the AI. It handles local text generation using quantized Small Language Models (SLMs) running directly in the client's browser via WebGPU.

> *N̷o̴ ̷c̶l̵o̷u̴d̸.̴ ̷N̵o̶ ̸l̷a̴t̸e̸n̴c̵y̷.̸ ̷P̴u̴r̵e̴ ̶t̵h̷o̵u̸g̸h̸t̵.̷*

***

## Features

* **Local Inference**: Runs entirely on the user's device (Edge AI).
* **WebGPU Acceleration**: Uses the GPU via `@mlc-ai/web-llm` for near-native performance.
* **Offline Capable**: Once the model is cached, it works without an internet connection.
* **Privacy First**: No prompt data leaves the user's device.

***

## Supported Models

| Model          | Parameters | Quantization | Size    | Use Case                             |
| -------------- | ---------- | ------------ | ------- | ------------------------------------ |
| `smollm2-135m` | 135M       | q4f16\_1     | \~100MB | Low-end devices, simple dialogue     |
| `llama3-8b`    | 8B         | q4f16\_1     | \~4.5GB | High-end desktops, complex reasoning |

***

## Usage

### Initialization

Initialize the Cortex engine. This triggers the model download if the model is not already cached.

```typescript
import { createCortex } from 'forbocai';

// Initialize with a lightweight model
const cortex = createCortex({
  model: 'smollm2-135m',
  gpu: true
});

// Load the model (async: downloads and compiles on first run)
await cortex.init();

// init() resolves once the engine is ready; the internal status field is not exposed
console.log("Cortex Ready");
```

### Generation

Generate text or dialogue.

```typescript
const response = await cortex.complete("Hello, how are you?", {
  temperature: 0.7,
  maxTokens: 100
});

console.log(response);
```

### Streaming

Stream the response chunk by chunk for a retro terminal effect. The example below writes to a Node stream; a browser-side rendering sketch appears further below.

```typescript
const stream = cortex.completeStream("Tell me a story about a cyber-knight.");

for await (const chunk of stream) {
  process.stdout.write(chunk);
}
```

***

## Performance Tips

1. **Warmup**: The first generation may be slower due to shader compilation.
2. **Caching**: The model is stored via the browser Cache API, so subsequent loads skip the download.
3. **VRAM**: Ensure the user has enough VRAM: `smollm2-135m` requires \~200MB, while `llama3-8b` needs \~6GB (a WebGPU feature-detection sketch appears further below).

***

**ᚠ ᛫ ᛟ ᛫ ᚱ ᛫ ᛒ ᛫ ᛟ ᛫ ᚲ**
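The streaming example above uses `process.stdout.write`, which only exists in Node. Since Cortex targets the browser, here is a minimal rendering sketch for the same stream. It assumes `cortex` is the instance from the Initialization example, and the `#terminal` element is a hypothetical DOM node in your page, not part of the library.

```typescript
// Hypothetical DOM target — any element on the page works.
const terminal = document.querySelector<HTMLElement>('#terminal');

const stream = cortex.completeStream("Tell me a story about a cyber-knight.");

for await (const chunk of stream) {
  // Append each chunk as it arrives for the retro terminal effect.
  if (terminal) terminal.textContent += chunk;
}
```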
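The VRAM tip above assumes WebGPU is available at all. A minimal feature-detection sketch follows; it uses the standard `navigator.gpu.requestAdapter()` probe, while the `gpu: false` fallback option is an assumption mirroring the `gpu: true` flag shown in the Initialization example, not a documented part of `createCortex`.

```typescript
import { createCortex } from 'forbocai';

// Sketch: choose a safe configuration based on WebGPU availability.
async function createCortexForDevice() {
  // Standard WebGPU feature detection; resolves to null when no adapter exists.
  const gpu = (navigator as { gpu?: { requestAdapter(): Promise<unknown> } }).gpu;
  const adapter = gpu ? await gpu.requestAdapter() : null;

  return createCortex({
    // Default to the ~100MB model; only opt into 'llama3-8b' on machines
    // with ~6GB of free VRAM (see the Performance Tips).
    model: 'smollm2-135m',
    gpu: adapter != null // assumed flag: skip the GPU path when no adapter is found
  });
}

const cortex = await createCortexForDevice();
await cortex.init();
```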