
Technical Evaluator OS

Live build

Evaluation console — API surface, deterministic outputs, traces, security posture, JSON-first.

Console mode — Technical-evaluator voice: deterministic, citation-first, no marketing language.

Monday-morning value: at 8am Monday, see API quotas, trace samples, and the evaluation checklist for the pilot you're scoping.


API Surface

Documented endpoints, auth requirements, and the live build identity from /api/health. No fabricated SLAs.

Live build identity
GET /api/health · public

Service health, version, revision, environment, Stripe webhook readiness, mailing-address config — the deterministic build identity probe.

GET /api/subscriptions/manage · cookie_session

Authed user subscription: status, tier, period end, Stripe-backed flag, trial classification.

POST /api/subscriptions/start-trial · cookie_session

Internal no-card trial start — fail-closed; respects single-trial-per-user contract.

GET /api/tools · public

Tool registry list — supports ?category and ?featured filters. Rate-limited.

GET /api/tools/[id] · public

Tool detail by id — schema-first, validated inputs, deterministic outputs.

GET /api/dashboard/usage · cookie_session

Per-user usage analytics: 14-day daily series + totals + top tools.

GET /api/dashboard/tool-sessions · cookie_session

Authed user saved tool sessions — supports trace replay use cases.

GET /api/dashboard/api-keys · cookie_session

Authed user API keys list (does NOT echo plaintext secret material).

Catalog reflects routes verifiable in app/api/*. Auth tier is enforced server-side, not just UI-hidden.
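A minimal sketch of how an evaluator might check the build identity returned by /api/health. The key names (`version`, `revision`, `environment`) are assumptions inferred from the endpoint description; the real payload shape is not shown on this page, so adjust them after inspecting an actual response. The sample payload is invented for illustration.

```python
# Assumed key names; confirm against a real /api/health response.
REQUIRED_FIELDS = ("version", "revision", "environment")


def missing_identity_fields(payload: dict) -> list[str]:
    """Return the assumed identity fields that are absent or empty."""
    return [key for key in REQUIRED_FIELDS if not payload.get(key)]


def same_build(first: dict, second: dict) -> bool:
    """Two probes describe the same build if version and revision both match."""
    return (first.get("version"), first.get("revision")) == (
        second.get("version"),
        second.get("revision"),
    )


# Invented sample payload, not a captured response.
sample = {"version": "1.0.0", "revision": "abc1234", "environment": "production"}
print(missing_identity_fields(sample))
```

Comparing two probes taken a few minutes apart with `same_build` is one way to notice a deploy landing in the middle of an evaluation run.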

Evaluation Checklist

The decision-matrix worksheet. Score per criterion, weight by severity, leave a note. Saved to your account across devices.

Verdict score

0% (0/30)

0 pass · 0 fail · 8 pending · 0 n/a

Tenant isolation enforced server-side (RLS, not UI-only)

security · weight 5

Foundational; UI-only isolation is a P0 finding in any audit.


Secret material never returned to client or logged

security · weight 5

API key disclosure is a P0; one-time-show pattern is the bar.


Deterministic health endpoint with version + revision

reliability · weight 4

Required for safe rollouts and incident triage.


Public endpoints rate-limited with standard 429 headers

reliability · weight 4

Protects the vendor from abuse and protects your bill.


Pricing tiers documented with per-tier limits, not "contact sales"

cost · weight 3

Self-serve evaluability — a hard requirement for pilot decisions.


Stripe (or equivalent) webhook signature verification

integration · weight 4

Webhooks without signature checks are an exploit primitive.


LLM provider abstraction (not single-vendor lock-in)

integration · weight 3

Risk mitigation — provider pricing/availability changes.


Per-persona usage docs + sample evaluations exist

support · weight 2

Faster onboarding for the evaluator and the eventual buyer.


Sign in to save your verdict across devices. Scores are private to your account.
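The verdict score can be read as a weighted pass rate over the eight criteria above, whose weights sum to 30. The sketch below is one plausible reading, not the console's actual formula: it assumes pending and failed items count toward the denominator (consistent with the 0/30 shown for an untouched worksheet) and that n/a items are excluded. The `Criterion` class and `verdict` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    weight: int
    status: str = "pending"  # one of "pass", "fail", "pending", "n/a"


def verdict(criteria: list[Criterion]) -> tuple[int, int, int]:
    """Return (earned, possible, percent) for a weighted checklist."""
    # Assumption: n/a items drop out of the denominator entirely.
    applicable = [c for c in criteria if c.status != "n/a"]
    possible = sum(c.weight for c in applicable)
    earned = sum(c.weight for c in applicable if c.status == "pass")
    percent = round(100 * earned / possible) if possible else 0
    return earned, possible, percent


checklist = [
    Criterion("Tenant isolation enforced server-side", 5),
    Criterion("Secret material never returned or logged", 5),
    Criterion("Deterministic health endpoint", 4),
    Criterion("Public endpoints rate-limited", 4),
    Criterion("Pricing tiers documented", 3),
    Criterion("Webhook signature verification", 4),
    Criterion("LLM provider abstraction", 3),
    Criterion("Per-persona usage docs", 2),
]
print(verdict(checklist))
```

Marking the two weight-5 security criteria as passed would move the result from 0/30 to 10/30 under these assumptions.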

Security Posture

Per-control summary with evidence pointers. "Enforced" means code-backed; "documented" means architectural intent; "planned" means not yet shipped.

6 enforced · 2 documented · 1 planned / 9 total
  • Supabase RLS for tenant data

    enforced

    evidence: All write paths go through createSupabaseServerClient with the authed user JWT; service_role is server-only and never bundled.

  • Cookie-based session, httpOnly

    enforced

    evidence: Supabase sets httpOnly + SameSite cookies; useAuth/useSubscription read from session, not localStorage.

  • Rate limiting on public endpoints

    enforced

    evidence: lib/rateLimit + checkRateLimitDurable wraps /api/tools, /api/tools/[id], and other public endpoints; returns 429 with standard headers.

  • Stripe webhook signature verification

    enforced

    evidence: /api/health surfaces Stripe webhook readiness; webhook handler validates signature before processing.

  • Secret material never logged or echoed

    enforced

    evidence: Hard contract — /api/health redacts placeholder values; api-keys endpoint never returns plaintext after creation.

  • Persona-OS workspace surfaces unlisted from sitemap

    enforced

    evidence: Every /<persona>-os page sets robots: { index: false, follow: true } — they're workspaces, not marketing.

  • CSRF: cookie SameSite=Lax + state-changing routes POST only

    documented

    evidence: Supabase default SameSite cookies; tool execution routes accept POST with JSON body and validate origin via Next.js framework.

  • Per-user usage caps surface on dashboard

    documented

    evidence: /api/dashboard/usage exposes per-user 30-day usage events; tier gates are enforced server-side, not just UI-hidden.

  • Audit trail for write actions

    planned

    evidence: bss_usage_events table records reads/writes today; dedicated audit_log table planned for admin-grade attribution.

For an external attestation, request the latest security-audit memo from the BSS owner — independent of the surface here.

Integration Surface

Third-party dependencies, their purpose, and what fails when each goes down. Required = hard dep (BSS outage); not required = soft dep (graceful degradation).

2 hard · 4 soft
  • Supabase

    auth

    hard dep

    Auth (email+password / OAuth), Postgres, RLS, storage.

    Failure mode

    Hard dep — outage = no auth, no DB reads/writes. Fail-closed at request boundary.

  • Stripe

    payments

    soft dep

    Subscription billing, checkout, webhook-driven entitlement sync.

    Failure mode

    Soft dep — workspace remains readable; new subscriptions block until Stripe restored. Existing entitlements unaffected.

  • Resend / SES (provider-flagged)

    email

    soft dep

    Transactional + notification email.

    Failure mode

    Soft dep — UI flows succeed; email retry queue picks up later. Surfaced in /api/health send-gate.

  • Vercel platform

    observability

    hard dep

    Edge runtime, deploy infra, env var management.

    Failure mode

    Hard dep — platform outage = full app down. Mitigated by Vercel SLA + DNS-level fallback.

  • OpenAI / Anthropic (provider-agnostic)

    ai

    soft dep

    LLM completions for content-factory + AI-assisted tool runs (where enabled).

    Failure mode

    Soft dep — non-AI tools fully functional; AI tools fail-closed with provider error.

  • Cloudflare (CDN + DNS)

    storage

    soft dep

    Asset CDN + DNS + DDoS shield.

    Failure mode

    Soft dep — origin serves on Cloudflare bypass. Static assets degrade gracefully.

Catalog is intentionally short — fewer dependencies = smaller blast radius.
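The hard/soft split above can be expressed as a small lookup: any hard dependency down means an outage, any soft dependency down means degraded service. This is an illustrative model of the catalog, not code from BSS; the dictionary keys and the `impact` function are hypothetical names.

```python
# Dependency classes as listed in the catalog above (2 hard, 4 soft).
DEPENDENCIES = {
    "supabase": "hard",
    "stripe": "soft",
    "email": "soft",       # Resend / SES
    "vercel": "hard",
    "llm": "soft",         # OpenAI / Anthropic
    "cloudflare": "soft",
}


def impact(down: set[str]) -> str:
    """Classify overall service state given the set of failed dependencies."""
    unknown = down - set(DEPENDENCIES)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    if any(DEPENDENCIES[name] == "hard" for name in down):
        return "outage"
    return "degraded" if down else "healthy"


print(impact({"stripe", "llm"}))
```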

Trace Inspector

Real per-execution traces from your workspace (bss_tool_sessions). Pick a trace to inspect its raw inputs/outputs.

Run a reference call — prove determinism

Executes the real roi-calculator tool (pure arithmetic, no network, no LLM) twice with fixed inputs, hashes each normalized result (SHA-256), and asserts they are byte-identical. See reproducibility instead of reading a claim.

Canned inputs

{
  "currentAnnualCost": 120000,
  "estimatedAnnualSavings": 48000,
  "implementationCost": 25000,
  "timeToImplement": 3,
  "discountRate": 10
}

Run the reference call to capture the verdict, both run hashes, and timing.
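The run-twice-and-compare-hashes idea can be sketched locally as follows. The arithmetic in `roi_stand_in` is a hypothetical stand-in, not the real roi-calculator tool, and it ignores `discountRate` and `timeToImplement`. The normalization step (sorted keys, compact separators, UTF-8) is also an assumption about what "normalized result" means here.

```python
import hashlib
import json

CANNED_INPUTS = {
    "currentAnnualCost": 120000,
    "estimatedAnnualSavings": 48000,
    "implementationCost": 25000,
    "timeToImplement": 3,
    "discountRate": 10,
}


def roi_stand_in(inputs: dict) -> dict:
    """Hypothetical pure-arithmetic calculation; no network, no randomness."""
    savings = inputs["estimatedAnnualSavings"]
    cost = inputs["implementationCost"]
    return {
        "firstYearNet": savings - cost,
        "paybackMonths": round(12 * cost / savings, 2),
    }


def normalized_hash(result: dict) -> str:
    """SHA-256 over a canonical JSON encoding so key order cannot differ."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_hash = normalized_hash(roi_stand_in(CANNED_INPUTS))
second_hash = normalized_hash(roi_stand_in(CANNED_INPUTS))
print("deterministic" if first_hash == second_hash else "mismatch")
```

Hashing a canonical encoding rather than the raw response matters because two semantically identical JSON objects can serialize with different key order or whitespace.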


Cost Projection

Pilot-cost model per published tier. Annual = monthly × 12 — no fabricated discount.

Linear projection only — does not assume volume discount.

free

$0/mo

$0/yr (monthly × 12, no fabricated discount)

seats: 1 evaluator

tool budget: Limited per-tool runs; rate-limited for fairness

api budget: Public endpoints only; no API keys

Pilot cost (3mo × 1 seat)

$0

Best for: validating tool quality, schema review, security posture audit. Not a long-term workspace.

starter

$29/mo

$348/yr (monthly × 12, no fabricated discount)

seats: 1 paid seat

tool budget: Generous monthly tool runs; saved tool sessions

api budget: Personal API key, standard rate limit

Pilot cost (3mo × 1 seat)

$87

Best for: a single technical evaluator running pilots; the smallest paid commitment that unlocks saved artifacts.

pro

$99/mo

$1188/yr (monthly × 12, no fabricated discount)

seats: up to a small team

tool budget: Highest per-month tool runs + advanced tools

api budget: Pro API tier, higher rate limit, per-key analytics

Pilot cost (3mo × 1 seat)

$297

Best for: a team scoping a department-wide rollout; full saved-session history + advanced analytics.

Honesty: published rates only. For multi-seat or annual prepay quotes, the evaluator should request a written quote from the BSS owner — those terms are not surfaced here.
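The linear projection used in this section is simple enough to reproduce. The sketch below uses only the published monthly rates above and covers a single seat; multi-seat and prepay terms are not surfaced here, so they are deliberately left out. The function names are illustrative.

```python
# Published monthly rates in USD, taken from the tiers above.
MONTHLY_USD = {"free": 0, "starter": 29, "pro": 99}


def annual_cost(tier: str) -> int:
    """Annual = monthly x 12, with no discount applied."""
    return MONTHLY_USD[tier] * 12


def pilot_cost(tier: str, months: int = 3) -> int:
    """Single-seat pilot cost over the given number of months."""
    return MONTHLY_USD[tier] * months


for tier in MONTHLY_USD:
    print(tier, annual_cost(tier), pilot_cost(tier))
```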

Reference Architecture

A deterministic diagram of how BSS fits into a stack — nodes, protocols, no marketing arrows.

Edge protocols, in diagram order: HTTPS; internal RPC; HTTPS (JWT); HTTPS (server key); HTTPS (server key); internal; webhook HTTPS.
  • Browser (Next.js client) [client]: React 19 + Tailwind; hydrates SSR HTML; reads session cookie.
  • Vercel edge runtime [edge]: Routes requests to nearest region; runs middleware.
  • Next.js server (nodejs runtime) [service]: App router handlers; force-dynamic where needed; SSR.
  • Supabase (Postgres + Auth) [data]: RLS-enforced data, JWT session, Storage buckets.
  • Stripe [external]: Checkout, subscriptions, webhook-driven entitlement sync.
  • LLM providers [external]: Anthropic / OpenAI; provider-agnostic content-factory.
  • Rate limit store (durable) [service]: Per-IP and per-user durable token bucket.

Edge style: solid = direct call; dashed = webhook (inbound to next-server).
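The rate limit store is described as a per-IP and per-user token bucket. A minimal in-memory sketch of that algorithm follows; it is not durable and is not the `checkRateLimitDurable` implementation, whose internals are not shown on this page. The capacity and refill values are arbitrary examples, and time is passed in explicitly so the behavior is reproducible.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float      # maximum burst size
    refill_per_s: float  # tokens added per second
    tokens: float        # tokens currently available
    updated_at: float    # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # the caller would answer with HTTP 429 here

    def retry_after_s(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens will be available again."""
        deficit = max(0.0, cost - self.tokens)
        return deficit / self.refill_per_s


# One bucket per key (for example per IP or per user id); values are examples.
bucket = TokenBucket(capacity=2, refill_per_s=1.0, tokens=2, updated_at=0.0)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
print(decisions)
```

A durable variant would keep `tokens` and `updated_at` in a shared store and update them atomically, so that concurrent server instances see the same bucket state.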