
Eval

The Eval router scores the health of your memory store, tracks drift over time, and provides a CI/CD gate that fails builds when memory quality drops below a threshold. Recall baselines let you catch recall regressions between runs.

Base URL: https://novyx-ram-api.fly.dev

Tier: All (the gate requires Pro+, the drift endpoint requires Starter+, and drift detail requires Pro+)

Plan limits: Free: 3 runs/day, 1 baseline · Starter: 30 runs/day, 5 baselines · Pro+: unlimited


Run Eval

POST /v1/eval/run

Run a health evaluation of your memory store. Returns an overall health score (0–100) with a breakdown across four dimensions.

Request body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| min_score | number | No | - | Minimum score threshold (0–100). If set, the response includes passed. |

Response fields

| Field | Type | Description |
|---|---|---|
| eval_id | string | Evaluation identifier |
| health_score | number | Overall health score (0–100) |
| breakdown | object | Score breakdown (see below) |
| memory_count | number | Total memories |
| conflict_count | number | Memories with conflicts |
| stale_count | number | Stale memories |
| passed | boolean \| null | Pass/fail result (only if min_score is set) |
| drift_detail | object \| null | Drift analysis (Pro+ only) |
| created_at | string | ISO 8601 timestamp |

Breakdown fields:

| Field | Type | Description |
|---|---|---|
| recall_consistency | number | How well baselines are recalled (0–100) |
| drift_score | number | Memory drift measurement (0–100) |
| conflict_score | number | Conflict health (0–100) |
| staleness_score | number | Freshness of memories (0–100) |

Examples

from novyx import Novyx

nx = Novyx(api_key="nram_your_key")

result = nx.eval_run()
print(f"Health: {result['health_score']}/100")
print(f"Recall: {result['breakdown']['recall_consistency']}")
print(f"Drift: {result['breakdown']['drift_score']}")

Response

{
  "eval_id": "eval_a1b2c3d4",
  "health_score": 87.5,
  "breakdown": {
    "recall_consistency": 95.0,
    "drift_score": 82.0,
    "conflict_score": 90.0,
    "staleness_score": 83.0
  },
  "memory_count": 142,
  "conflict_count": 2,
  "stale_count": 8,
  "passed": null,
  "drift_detail": null,
  "created_at": "2026-03-09T12:00:00Z"
}
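In the sample response above, health_score (87.5) equals the unweighted mean of the four breakdown scores: (95.0 + 82.0 + 90.0 + 83.0) / 4 = 87.5. The sketch below reproduces that arithmetic and the passed field. It assumes the service always weights the four dimensions equally and that a score exactly equal to min_score passes (the gate is documented as failing only when the score is below min_score). Neither assumption is confirmed by this page, so use the API's own values for decisions.

```python
from statistics import fmean
from typing import Optional

# Sample response values copied from the example above.
SAMPLE = {
    "health_score": 87.5,
    "breakdown": {
        "recall_consistency": 95.0,
        "drift_score": 82.0,
        "conflict_score": 90.0,
        "staleness_score": 83.0,
    },
}


def estimate_health(breakdown: dict) -> float:
    """Assumption: the overall score is the equal-weight mean of the four dimensions."""
    return fmean(breakdown.values())


def passed(health_score: float, min_score: Optional[float]) -> Optional[bool]:
    """Mirror the passed field: None when no threshold was supplied.

    Assumption: a score equal to min_score passes; only lower scores fail.
    """
    if min_score is None:
        return None
    return health_score >= min_score


print(estimate_health(SAMPLE["breakdown"]))  # 87.5
print(passed(SAMPLE["health_score"], None))  # None
print(passed(SAMPLE["health_score"], 80))  # True
```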

Errors

| Status | Code | Cause |
|---|---|---|
| 429 | RATE_LIMITED | Daily eval limit exceeded |

CI/CD Gate

POST /v1/eval/gate

Run an eval with a required minimum score. Returns 200 on pass and 422 on fail, so the status code alone can decide a CI/CD pipeline step.

CI/CD integration

Use this in your deployment pipeline to block deploys when memory quality drops:

curl -sf -X POST .../v1/eval/gate -d '{"min_score": 80}' || exit 1
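If your pipeline step runs Python instead of a shell command, the same gate can be written with the standard library. This is a sketch under assumptions: the Authorization: Bearer header, the NOVYX_API_KEY environment variable, and the exit-code convention (1 for a failed gate, 2 for anything else such as 403 or 429) are illustrative, not documented behavior. Only the base URL, path, request body, and the 200/422 status codes come from this page.

```python
import json
import os
import urllib.error
import urllib.request

BASE_URL = "https://novyx-ram-api.fly.dev"


def build_gate_request(api_key: str, min_score: float) -> urllib.request.Request:
    """Build the POST /v1/eval/gate request without sending it."""
    body = json.dumps({"min_score": min_score}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/eval/gate",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; confirm against the authentication docs.
            "Authorization": f"Bearer {api_key}",
        },
    )


def exit_code_for(status: int) -> int:
    """Map the gate's HTTP status to a CI exit code (illustrative convention)."""
    if status == 200:
        return 0  # gate passed
    if status == 422:
        return 1  # GATE_FAILED: health score below min_score
    return 2  # anything else, e.g. 403 FEATURE_NOT_AVAILABLE or 429


def main(min_score: float = 80) -> int:
    # Assumption: the API key is supplied via a NOVYX_API_KEY environment variable.
    req = build_gate_request(os.environ["NOVYX_API_KEY"], min_score)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # urlopen raises on 4xx/5xx, including 422
        status = err.code
    except urllib.error.URLError:
        return 2  # network failure: the gate could not run at all
    return exit_code_for(status)
```

In a CI step, end the script with raise SystemExit(main()) so the job fails on a non-zero code. Separating "gate failed" from "gate could not run" lets you decide whether a plan or rate-limit error should block a deploy.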

Request body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| min_score | number | Yes | - | Minimum passing score (0–100) |

Response fields

Same as Run Eval, except that passed is always set.

Examples

# Fail CI if health drops below 80
result = nx.eval_gate(min_score=80)
if not result["passed"]:
    raise SystemExit(f"Eval failed: {result['health_score']}/100")

Errors

| Status | Code | Cause |
|---|---|---|
| 403 | FEATURE_NOT_AVAILABLE | Requires Pro+ plan |
| 422 | GATE_FAILED | Health score below min_score (response body includes full eval) |

Eval History

GET /v1/eval/history

List past eval runs. Retention depends on your plan: Free 7 days, Starter 30 days, Pro+ 90 days.

Query parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| limit | number | No | 50 | Max results (1–200) |
| offset | number | No | 0 | Pagination offset |

Response fields

| Field | Type | Description |
|---|---|---|
| entries | array | Array of eval history entries |
| total_count | number | Total entries (within retention window) |
| has_more | boolean | Whether more pages exist |

Each entry includes eval_id, health_score, breakdown, memory_count, passed, and created_at.

Examples

history = nx.eval_history(limit=10)
for entry in history["entries"]:
    print(f"{entry['created_at']}: {entry['health_score']}/100")
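The limit, offset, and has_more fields support plain offset pagination. The generator below walks every page until has_more is false. fetch_page is a hypothetical callable that stands in for your client call; this page only shows the SDK accepting limit, so forwarding offset through nx.eval_history is an assumption.

```python
from typing import Callable, Iterator

# Hypothetical signature: (limit, offset) -> response dict shaped like GET /v1/eval/history.
FetchPage = Callable[[int, int], dict]


def iter_history(fetch_page: FetchPage, limit: int = 50) -> Iterator[dict]:
    """Yield every eval history entry in the retention window, following has_more."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        entries = page["entries"]
        yield from entries
        if not page["has_more"] or not entries:
            break  # last page, or an empty page (guards against looping forever)
        offset += len(entries)
```

Because this is a generator, the limit check runs when iteration starts, not when iter_history is called.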

Drift Analysis

GET /v1/eval/drift

Analyze how your memory store has changed over a time window.

Query parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| days | number | No | 7 | Analysis window (1–90 days) |

Response fields

| Field | Type | Description |
|---|---|---|
| drift_score | number | Overall drift score |
| period_days | number | Analysis window |
| memory_count_delta | number | Net change in memory count |
| avg_importance_delta | number | Change in average importance (Pro+) |
| top_new_topics | string[] | Emerging topics (Pro+) |
| top_lost_topics | string[] | Declining topics (Pro+) |
| tag_shifts | array | Tag count changes (Pro+) |

Examples

drift = nx.eval_drift(days=14)
print(f"Drift score: {drift['drift_score']}")
print(f"Memory delta: {drift['memory_count_delta']}")
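The days window must fall between 1 and 90, and several response fields are populated only on Pro+. A minimal sketch that validates the window before building the query string and reads a Pro+ field defensively. This page does not say whether Pro+ fields are omitted or returned as null on Starter, so the sketch handles both with dict.get; drift_url and summarize_drift are hypothetical helpers.

```python
from urllib.parse import urlencode

BASE_URL = "https://novyx-ram-api.fly.dev"


def drift_url(days: int = 7) -> str:
    """Build the GET /v1/eval/drift URL, rejecting windows outside 1-90 days."""
    if not 1 <= days <= 90:
        raise ValueError("days must be between 1 and 90")
    return f"{BASE_URL}/v1/eval/drift?{urlencode({'days': days})}"


def summarize_drift(drift: dict) -> str:
    """One-line summary; Pro+ fields may be missing or null on other plans."""
    parts = [
        f"drift {drift['drift_score']} over {drift['period_days']}d",
        f"memories {drift['memory_count_delta']:+}",  # signed, e.g. +5 or -3
    ]
    new_topics = drift.get("top_new_topics") or []  # handles absent and null alike
    if new_topics:
        parts.append("new: " + ", ".join(new_topics))
    return "; ".join(parts)
```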

Errors

| Status | Code | Cause |
|---|---|---|
| 403 | FEATURE_NOT_AVAILABLE | Requires Starter+ plan |

Create Baseline

POST /v1/eval/baselines

Create a recall baseline — a query/expected-answer pair that eval checks on every run.

Request body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Recall query (1–500 characters) |
| expected_observation | string | Yes | - | Expected top result (1–2000 characters) |
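Checking the documented length limits on the client avoids a round trip for a request the API will reject. A sketch; the helper name and the choice to raise ValueError are illustrative, and the error the API returns for out-of-range lengths is not documented here. Python's len counts code points, which may not match how the server counts characters.

```python
def validate_baseline(query: str, expected_observation: str) -> dict:
    """Return a POST /v1/eval/baselines body after checking the documented length limits."""
    if not 1 <= len(query) <= 500:
        raise ValueError(f"query must be 1-500 characters, got {len(query)}")
    if not 1 <= len(expected_observation) <= 2000:
        raise ValueError(
            f"expected_observation must be 1-2000 characters, got {len(expected_observation)}"
        )
    return {"query": query, "expected_observation": expected_observation}
```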

Response fields

| Field | Type | Description |
|---|---|---|
| id | string | Baseline identifier |
| query | string | Recall query |
| expected_observation | string | Expected result |
| expected_score | number \| null | Similarity score |
| created_at | string | ISO 8601 timestamp |

Examples

baseline = nx.create_baseline(
query="What UI theme does the user prefer?",
expected_observation="User prefers dark mode and compact layouts",
)
print(baseline["id"])

Response

{
  "id": "bl_a1b2c3d4",
  "query": "What UI theme does the user prefer?",
  "expected_observation": "User prefers dark mode and compact layouts",
  "expected_score": null,
  "created_at": "2026-03-09T12:00:00Z"
}

Errors

| Status | Code | Cause |
|---|---|---|
| 429 | RATE_LIMITED | Baseline limit exceeded for your plan |
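Because exceeding the baseline quota returns 429, you may prefer to check the quota before creating a baseline. The limits below are the ones from the plan limits line at the top of this page (Free 1, Starter 5, Pro+ unlimited). The plan keys and the helper are hypothetical; the current count would come from total_count in List Baselines.

```python
from typing import Dict, Optional

# Baseline quotas from the plan limits line; None means unlimited.
# Hypothetical plan keys: map them to however your account reports its plan.
BASELINE_LIMITS: Dict[str, Optional[int]] = {"free": 1, "starter": 5, "pro_plus": None}


def can_create_baseline(plan: str, total_count: int) -> bool:
    """Return True if one more baseline fits within the plan's quota."""
    limit = BASELINE_LIMITS[plan]
    return limit is None or total_count < limit
```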

List Baselines

GET /v1/eval/baselines

List all recall baselines.

Response fields

| Field | Type | Description |
|---|---|---|
| baselines | array | Array of baseline objects |
| total_count | number | Total baselines |

Examples

result = nx.list_baselines()
for bl in result["baselines"]:
    print(f"{bl['query']} -> {bl['expected_observation'][:40]}...")
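Since baselines count against your plan's quota, a setup script can reuse an existing baseline with the same query instead of creating a duplicate. A sketch of an "ensure" helper; list_fn and create_fn are hypothetical stand-ins (for example, thin wrappers around nx.list_baselines and nx.create_baseline), and treating queries that differ only in case or whitespace as duplicates is an illustrative choice.

```python
from typing import Callable, Optional


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case when comparing queries.
    return " ".join(text.split()).casefold()


def find_baseline(baselines: list, query: str) -> Optional[dict]:
    """Return the first baseline whose query matches, or None."""
    key = _normalize(query)
    for bl in baselines:
        if _normalize(bl["query"]) == key:
            return bl
    return None


def ensure_baseline(
    list_fn: Callable[[], dict],
    create_fn: Callable[[str, str], dict],
    query: str,
    expected_observation: str,
) -> dict:
    """Reuse a matching baseline if one exists; otherwise create it."""
    existing = find_baseline(list_fn()["baselines"], query)
    if existing is not None:
        return existing
    return create_fn(query, expected_observation)
```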

Delete Baseline

DELETE /v1/eval/baselines/{baseline_id}

Delete a recall baseline.

Path parameters

| Parameter | Type | Description |
|---|---|---|
| baseline_id | string | Baseline identifier |

Examples

nx.delete_baseline("bl_a1b2c3d4")

Response

Returns 204 No Content on success.

Errors

| Status | Code | Cause |
|---|---|---|
| 404 | NOT_FOUND | Baseline does not exist |
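Cleanup jobs that rerun can end up deleting the same baseline twice. A sketch that treats 404 NOT_FOUND as "already gone" so reruns do not fail; send is a hypothetical callable that performs the HTTP request and returns the status code, and raising on any other status is an illustrative choice.

```python
from typing import Callable
from urllib.parse import quote


def baseline_path(baseline_id: str) -> str:
    # Percent-encode the id so characters like "/" cannot change the path.
    return f"/v1/eval/baselines/{quote(baseline_id, safe='')}"


def delete_baseline_idempotent(send: Callable[[str, str], int], baseline_id: str) -> bool:
    """Return True if the baseline was deleted, False if it no longer existed."""
    status = send("DELETE", baseline_path(baseline_id))
    if status == 204:
        return True  # 204 No Content: deleted
    if status == 404:
        return False  # NOT_FOUND: nothing left to delete
    raise RuntimeError(f"unexpected status {status} deleting {baseline_id}")
```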