A full AI infrastructure migration — every Gemini call now routes through the Hybrid AI Orchestrator with 3-tier local model routing, 1-hour AI Gateway edge caching, and per-request model-source diagnostics.
TradeStance uses AI for HS code classification, fraud detection audits, sentiment scoring, and consensus pricing. Previously, several services bypassed the central Hybrid AI Orchestrator and called the Gemini API directly — meaning no caching, no fallback, no cost tracking, and no local-model offloading.
Both hsMapper.ts (HS code classification) and anomalyDetector.ts (fraud detection audit) have been refactored to route through askAI() instead of making direct Gemini API calls. This means every AI request now benefits from KV caching, automatic fallback, and unified logging.
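As a sketch of the refactor pattern (the real `askAI()` option names in the codebase may differ; this signature is illustrative):

```typescript
// Hypothetical shape of the orchestrator's entry point.
type AIPriority = "LOW" | "MEDIUM" | "HIGH";

interface AskAIOptions {
  priority: AIPriority;
  cacheKey?: string; // used by the orchestrator for KV caching
}

type AskAI = (prompt: string, opts: AskAIOptions) => Promise<string>;

// After the refactor, hsMapper-style code delegates to askAI() instead of
// calling the Gemini endpoint directly, inheriting caching, fallback,
// and unified logging from the orchestrator.
async function classifyHsCode(description: string, askAI: AskAI): Promise<string> {
  const normalized = description.toLowerCase().trim();
  return askAI(`Classify the HS code for: ${normalized}`, {
    priority: "HIGH",
    cacheKey: `hs:${normalized}`,
  });
}
```

Injecting `askAI` as a parameter keeps the classifier testable without the live orchestrator.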
The orchestrator now supports three priority tiers:
- **LOW / MEDIUM** — run entirely on Cloudflare Workers AI at zero marginal cost.
- **HIGH** — the only tier that hits the paid Gemini API, and even those requests benefit from 1-hour edge caching.
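The routing rule reduces to a small decision, sketched here (function and type names are illustrative, not the orchestrator's actual identifiers):

```typescript
type Priority = "LOW" | "MEDIUM" | "HIGH";
type Provider = "workers-ai" | "gemini-api";

// As described above: LOW and MEDIUM never leave Workers AI;
// only HIGH escalates to the paid Gemini API.
function pickProvider(priority: Priority): Provider {
  return priority === "HIGH" ? "gemini-api" : "workers-ai";
}
```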
All Gemini API requests routed through Cloudflare AI Gateway now include the cf-aig-cache-ttl: 3600 header, enabling 1-hour edge caching for repeated prompts. This dramatically reduces Gemini API costs for identical consensus queries and HS verification lookups.
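A minimal sketch of attaching that header to a request init. Only the `cf-aig-cache-ttl: 3600` header comes from this changelog; the helper itself is an assumed convenience:

```typescript
const EDGE_CACHE_TTL_SECONDS = 3600; // 1-hour AI Gateway edge cache

// Merge the edge-cache TTL header into an outgoing request's init,
// preserving any headers the caller already set.
function withEdgeCache(init: { method?: string; headers?: Record<string, string> }) {
  return {
    ...init,
    headers: {
      ...(init.headers ?? {}),
      "cf-aig-cache-ttl": String(EDGE_CACHE_TTL_SECONDS),
    },
  };
}
```

Identical prompts then resolve at the gateway edge for an hour instead of re-invoking Gemini.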
Every AI request now logs which provider handled it — workers-ai, gemini-api, or cache. The Hybrid AI Dashboard displays color-coded source badges per log entry, letting operators see at a glance how traffic splits across providers. A new SQL migration adds the model_source column with an index for fast dashboard queries.
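The three source values and the per-provider split might look like the following sketch; the badge palette and field names beyond `model_source` are invented for illustration:

```typescript
// model_source matches the new column added by the SQL migration.
type ModelSource = "workers-ai" | "gemini-api" | "cache";

interface AILogEntry {
  prompt: string;
  model_source: ModelSource;
  created_at: string;
}

// Hypothetical badge colors; the dashboard's actual palette may differ.
const SOURCE_BADGE: Record<ModelSource, string> = {
  "workers-ai": "green",
  "gemini-api": "amber",
  cache: "blue",
};

// Aggregate the traffic split across providers for the dashboard view.
function trafficSplit(logs: AILogEntry[]): Record<ModelSource, number> {
  const counts: Record<ModelSource, number> = { "workers-ai": 0, "gemini-api": 0, cache: 0 };
  for (const log of logs) counts[log.model_source] += 1;
  return counts;
}
```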
All AI logging now uses ctx.waitUntil() so log writes never block the response. The orchestrator accepts an optional ExecutionContext and defers Supabase inserts and KV cache writes to background execution, keeping response latency minimal.
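The deferral pattern can be sketched as below. `ExecutionContextLike` mirrors the Workers runtime's `ExecutionContext`; the helper name is illustrative:

```typescript
// Minimal interface matching the piece of ExecutionContext we need.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Kick off a log write (e.g. a Supabase insert or KV cache write) without
// blocking the response. waitUntil() keeps the worker alive past the
// response until the write lands; with no context, it runs fire-and-forget.
function logInBackground(write: () => Promise<void>, ctx?: ExecutionContextLike): void {
  // Swallow logging errors so they can never fail the user-facing response.
  const pending = write().catch(() => {});
  ctx?.waitUntil(pending);
}
```

Making the context optional lets the same orchestrator code run in tests or environments where no `ExecutionContext` is available.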