---
title: Cortex (Inference)
description: Local llama.cpp and remote cortex helpers in the UE runtime
slug: ue/cortex
---

The current UE plugin exposes both local node-backed cortex helpers and remote API-backed cortex helpers.

## Local Cortex

Local inference is driven by `Cortex/CortexThunks.h`. It loads a GGUF model identified either by a model ID or by a direct file path.

```cpp
#include "RuntimeStore.h"
#include "CLI/CliOperations.h"

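// Create the store that holds SDK state for subsequent operations.
auto Store = createSDKStore();

// Initialize local cortex from a model ID (downloaded on first use).
FCortexStatus Local = SDKOps::InitCortex(
    Store,
    TEXT("smollm2-135m"));

// Run a completion against the initialized llama.cpp context.
FCortexResponse Response =
    SDKOps::CompleteCortex(Store, TEXT("Write one short sentence."));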
```

Important points:

* `InitCortex` takes a model ID (e.g. `smollm2-135m`) that maps to a HuggingFace GGUF URL. The SDK downloads the model automatically on first use to `{ProjectSavedDir}/ForbocAI/models/`
* You can also pass a direct file path to a `.gguf` model instead of a model ID
* Local completion uses the currently initialized llama.cpp context (Metal GPU acceleration on macOS)
* Embeddings are available through `SDKOps::GenerateEmbedding(...)`. The embedding model (`all-MiniLM-L6-v2`) is likewise downloaded automatically on first use
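
The two argument forms above (model ID or direct `.gguf` path) can be pictured with a small standalone sketch. This is not SDK code: `LooksLikeGgufPath` and `ResolveModelPath` are hypothetical helpers, `SavedDir` stands in for `{ProjectSavedDir}`, and the `<id>.gguf` cache file name is an assumption made only for illustration.

```cpp
#include <string>

// Hypothetical helper (not part of the SDK): true when the argument
// ends in ".gguf" and should be treated as a direct file path.
inline bool LooksLikeGgufPath(const std::string& Arg) {
    const std::string Ext = ".gguf";
    return Arg.size() >= Ext.size() &&
           Arg.compare(Arg.size() - Ext.size(), Ext.size(), Ext) == 0;
}

// Hypothetical helper: returns the path a model would be loaded from.
// The "<id>.gguf" file name under ForbocAI/models/ is an assumption.
inline std::string ResolveModelPath(const std::string& SavedDir,
                                    const std::string& Arg) {
    if (LooksLikeGgufPath(Arg)) {
        return Arg;  // direct file path: use as-is
    }
    // Model ID: point at the download cache directory.
    return SavedDir + "/ForbocAI/models/" + Arg + ".gguf";
}
```

With these helpers, `ResolveModelPath("/Saved", "smollm2-135m")` yields a path under `/Saved/ForbocAI/models/`, while a `.gguf` path is returned unchanged. How the real SDK distinguishes the two forms may differ.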

## Remote Cortex

```cpp
#include "RuntimeStore.h"
#include "CLI/CliOperations.h"
#include "RuntimeConfig.h"

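auto Store = createSDKStore();

// ApiKey is an FString holding your API key, supplied by the caller.
SDKConfig::SetApiConfig(TEXT("https://api.forboc.ai"), ApiKey);

// List the models the API offers, then initialize a remote cortex.
TArray<FCortexModelInfo> Models = SDKOps::ListCortexModels(Store);
FCortexStatus Remote = SDKOps::InitRemoteCortex(Store, TEXT("api-integrated"));

// Completions are addressed to the remote cortex by its Id.
FCortexResponse RemoteResp =
    SDKOps::CompleteRemoteCortex(Store, Remote.Id, TEXT("Summarize this."));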
```

## Protocol Requirement

`rtk::processNPC(...)` with `rtk::LocalProtocolRuntime()` uses local cortex for `ExecuteInference` instructions. Remote fallback is intentionally disabled in that path.

If local cortex is not initialized and the API requests inference, `processNPC(...)` fails fast.
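
The fail-fast rule can be modeled in a few lines of standalone C++. This is a sketch of the behavior only, not the runtime's implementation: `EInstruction`, `FLocalCortexState`, `FStepResult`, and `HandleInstruction` are hypothetical names that do not come from the SDK.

```cpp
#include <string>

// Hypothetical stand-ins for runtime state; none of these names are SDK types.
enum class EInstruction { ExecuteInference, Other };

struct FLocalCortexState {
    bool bInitialized = false;
};

struct FStepResult {
    bool bOk;
    std::string Text;  // completion text on success, error message on failure
};

// Handles one instruction using local cortex only.
// When inference is requested and local cortex is missing, the step
// fails immediately; there is deliberately no remote fallback branch.
inline FStepResult HandleInstruction(const FLocalCortexState& Cortex,
                                     EInstruction Instruction,
                                     const std::string& Prompt) {
    if (Instruction != EInstruction::ExecuteInference) {
        return {true, ""};  // nothing to infer for other instructions
    }
    if (!Cortex.bInitialized) {
        return {false, "local cortex not initialized"};
    }
    return {true, "local:" + Prompt};  // placeholder for a real completion
}
```

In practice this means calling `InitCortex` before `processNPC(...)` whenever the local protocol runtime is in use.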

## Key Types

* `FCortexStatus`
* `FCortexConfig`
* `FCortexResponse`
* `FCortexModelInfo`

`FCortexConfig` also supports optional stop sequences and a JSON schema payload for remote completion requests.
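
To illustrate what a stop sequence does in general, the sketch below cuts a completion at the earliest occurrence of any stop string. It is a standalone illustration of the concept, not the SDK's or the API's implementation, and `TruncateAtStop` is a hypothetical helper; the field names on `FCortexConfig` are not shown here because this page does not list them.

```cpp
#include <string>
#include <vector>

// Illustrative only: returns Text cut at the earliest occurrence
// of any stop sequence, or Text unchanged when none occurs.
inline std::string TruncateAtStop(const std::string& Text,
                                  const std::vector<std::string>& StopSequences) {
    std::size_t Cut = std::string::npos;
    for (const std::string& Stop : StopSequences) {
        if (Stop.empty()) {
            continue;  // an empty stop would match at position 0
        }
        const std::size_t Pos = Text.find(Stop);
        if (Pos < Cut) {
            Cut = Pos;  // npos is the largest value, so misses never win
        }
    }
    return Cut == std::string::npos ? Text : Text.substr(0, Cut);
}
```

For example, with stop sequences `"\n"` and `"###"`, a completion is cut at whichever of the two appears first.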
