UnifiedEngine API Reference
UnifiedEngine exposes an OpenAI-compatible HTTP API served entirely from your Mac. Any OpenAI SDK that supports a custom base URL can talk to it unchanged.
Base URL
http://127.0.0.1:38180
To expose the daemon to a trusted network, bind a non-loopback host and set an API key:
UE_API_KEY="replace-with-a-secret" \
cargo run -p ue-daemon -- \
--host 0.0.0.0 \
--port 38180 \
--model /path/to/model.gguf \
--cors-origin "https://example.com" \
--max-request-bytes 1048576 \
--max-concurrent-requests 4 \
--requests-per-minute 60
Authentication
When the daemon binds to a non-loopback host, an API key is
required. Send it as a bearer token on every /v1/*
request:
Authorization: Bearer <api-key>
/health stays unauthenticated for local readiness
checks. In the macOS app, keys are stored in Keychain and passed
to the helper daemon through UE_API_KEY.
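As a sketch of the rule above, a client can attach the bearer token only to /v1/* requests and only when a key is configured. The helper uses Python's standard library; `build_request` is a hypothetical name, and reading the key from `UE_API_KEY` on the client side is an assumption made for illustration (this reference only describes that variable as the daemon's input).

```python
import os
import urllib.request

BASE_URL = "http://127.0.0.1:38180"  # default loopback endpoint from this reference


def build_request(path, api_key=None, base_url=BASE_URL):
    """Build a GET request, adding the bearer header for /v1/* paths when a key is given."""
    req = urllib.request.Request(base_url + path, method="GET")
    # /health is documented as unauthenticated, so the key is not sent there.
    if api_key and path.startswith("/v1/"):
        req.add_header("Authorization", f"Bearer {api_key}")
    return req


# Assumption: the client reads the same variable name the daemon uses.
status_request = build_request("/v1/status", os.environ.get("UE_API_KEY"))
```

Passing `status_request` to `urllib.request.urlopen` performs the call against a running daemon.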
/health
Unauthenticated readiness probe.
{
  "status": "ok",
  "version": "0.0.3"
}
/v1/status
Returns runtime, backend, and active model details. Local model
paths are redacted unless the daemon is started with
--expose-model-path.
{
  "status": "running",
  "version": "0.0.3",
  "endpoint": "http://127.0.0.1:38180",
  "started_at": 1730000000,
  "runtime": {
    "runtime": "llama.cpp",
    "model_id": "local-gguf",
    "backend": "metal",
    "ready": true
  },
  "active_model": {
    "id": "local-gguf",
    "object": "model",
    "owned_by": "unifiedengine",
    "format": "gguf",
    "status": "valid"
  }
}
/v1/models
Returns active model metadata.
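One way to use the /v1/status payload shown above is to gate client startup on it. The sketch below parses a response body and reports whether the runtime is ready with a valid model. `is_usable` is a hypothetical helper, and treating `"status": "running"`, `"ready": true`, and a model status of `"valid"` as the readiness condition is an assumption, not a documented contract.

```python
import json


def is_usable(status_body):
    """Return True when the status payload reports a ready runtime and a valid model."""
    doc = json.loads(status_body)
    runtime = doc.get("runtime", {})
    model = doc.get("active_model", {})
    return (
        doc.get("status") == "running"
        and runtime.get("ready") is True
        and model.get("status") == "valid"
    )


# Trimmed-down version of the sample response above.
sample = '{"status": "running", "runtime": {"ready": true}, "active_model": {"status": "valid"}}'
print(is_usable(sample))  # prints True
```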
/v1/chat/completions
Supported fields:
model
messages
temperature
max_tokens
stream
Unsupported OpenAI fields are ignored in v0.0.3.
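Because unsupported fields are ignored rather than rejected, a misspelled field name fails silently. The sketch below keeps request bodies to the five supported fields and decodes a streamed reply. `build_chat_payload` and `iter_stream_text` are hypothetical helpers, raising on unknown fields is a client-side choice, and the stream format is an assumption based on OpenAI's convention (`data: <json>` lines with text under `choices[].delta.content`, ended by `data: [DONE]`); this reference does not spell out the chunk shape.

```python
import json

SUPPORTED_FIELDS = {"model", "messages", "temperature", "max_tokens", "stream"}


def build_chat_payload(**fields):
    """Serialize a chat request body, rejecting fields the daemon would ignore."""
    unknown = sorted(set(fields) - SUPPORTED_FIELDS)
    if unknown:
        raise ValueError(f"fields ignored by the daemon: {', '.join(unknown)}")
    return json.dumps(fields).encode("utf-8")


def iter_stream_text(lines):
    """Yield text fragments from the SSE lines of a streamed completion.

    Assumption: OpenAI-style chunks, terminated by a `data: [DONE]` line.
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


body = build_chat_payload(
    model="local-gguf",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
```

`iter_stream_text` accepts any iterable of lines, so the response object returned by `urllib.request.urlopen` can be passed to it directly.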
Non-streaming
curl http://127.0.0.1:38180/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <api-key-if-configured>" \
-d '{"model":"local-gguf","messages":[{"role":"user","content":"Hello"}],"stream":false}'
Streaming (SSE)
curl -N http://127.0.0.1:38180/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <api-key-if-configured>" \
-d '{"model":"local-gguf","messages":[{"role":"user","content":"Hello"}],"stream":true}'
Limits & validation
The control plane validates every request and enforces the limits you configure at startup:
--max-request-bytes        Maximum request body size
--max-concurrent-requests  In-flight request ceiling
--requests-per-minute      Per-client rate limit
--cors-origin              Allowed cross-origin caller
x-request-id               Returned on every response for tracing
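A client can fail fast on the body-size limit before sending anything and pick up the tracing header when a response arrives. In this sketch the default of 1048576 bytes is copied from the example startup command on this page and has to match whatever the daemon was actually started with. `check_body_size` and `request_id` are hypothetical helper names, and treating a body of exactly the limit as acceptable is an assumption.

```python
import json


def check_body_size(payload, max_request_bytes=1048576):
    """Serialize payload and raise if it exceeds the configured byte limit."""
    body = json.dumps(payload).encode("utf-8")
    # Assumption: a body of exactly max_request_bytes is still accepted.
    if len(body) > max_request_bytes:
        raise ValueError(
            f"request body is {len(body)} bytes, limit is {max_request_bytes}"
        )
    return body


def request_id(headers):
    """Look up x-request-id in a header mapping, ignoring case."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None
```

Logging the value returned by `request_id` alongside client-side errors makes it easier to match a failed call to the daemon's own records.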