🛠️ Engine Room · March 1, 2026 · 3 min read

Local-First AI: Stop Paying Gemini for Routine Tasks

News generation, sentiment scoring, and fallback alert generation now route through self-hosted Ollama and Cloudflare Workers AI first, escalating to Gemini only when local confidence is low or inputs exceed 8K tokens.

By TradeStance Engineering
ai · ollama · workers-ai · cost-optimization

What Changed

The platform previously called Gemini 2.0 Flash directly for news generation (4 calls per cycle), sentiment scoring (batch + individual), and fallback alert generation — costing money on every 6-hour cron cycle. This release introduces a reusable localFirstGenerate() function that tries Ollama or Workers AI first and only escalates to Gemini when necessary.

The Gemini Escalation Rule

localFirstGenerate() implements a 5-step routing algorithm:

  1. If the input exceeds 8,000 tokens or the task is forced (e.g. 30-Day Market Forecast) → Gemini directly
  2. Otherwise → try Ollama on Hetzner (llama3.1:8b), falling back to Workers AI (Mistral 7B)
  3. Extract confidence from the response (JSON confidence/confidence_score field, default 0.8)
  4. If confidence < 0.6 → re-run on Gemini (escalation)
  5. Log every routing decision to hybrid_ai_logs for admin monitoring
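The decision points above can be sketched as pure functions. This is an illustrative sketch only: `pickInitialProvider`, `shouldEscalate`, and `parseConfidence` are hypothetical names, not the actual internals of localFirstGenerate().

```typescript
type Provider = "ollama" | "workers-ai" | "gemini";

const MAX_LOCAL_TOKENS = 8_000; // above this, skip local models entirely
const MIN_CONFIDENCE = 0.6;     // below this, escalate to Gemini
const DEFAULT_CONFIDENCE = 0.8; // used when the response carries no confidence field

// Step 1: oversized or forced tasks go straight to Gemini.
function pickInitialProvider(tokenCount: number, forced: boolean): Provider {
  if (forced || tokenCount > MAX_LOCAL_TOKENS) return "gemini";
  return "ollama"; // Workers AI is the fallback if Ollama is unreachable
}

// Step 3: read confidence/confidence_score from raw model output,
// tolerating non-JSON responses by falling back to the default.
function parseConfidence(raw: string): number {
  try {
    const parsed = JSON.parse(raw);
    const value = parsed.confidence ?? parsed.confidence_score;
    if (typeof value === "number") return value;
  } catch {
    // non-JSON output falls through to the default
  }
  return DEFAULT_CONFIDENCE;
}

// Step 4: escalate only when the local model is unsure of its own answer.
function shouldEscalate(raw: string): boolean {
  return parseConfidence(raw) < MIN_CONFIDENCE;
}
```

Keeping the routing rules pure like this makes the escalation threshold trivially unit-testable, independent of any network calls.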

Services Refactored

7 direct Gemini calls replaced across 4 services:

  • newsAgentPipeline.ts: generateMarketPrice, generateSingleBrief, generateDigestBrief, generateSentimentDigest (4 calls)
  • sentimentService.ts: analyzeBatchWithGemini → analyzeBatchLocalFirst, analyzeWithGemini → analyzeLocalFirst (2 calls)
  • newsAggregator.ts: generateGeminiAlerts → generateFallbackAlerts (1 call), external_id prefix changed from gemini- to ai-
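A refactored call site might look something like the following. This is a hedged sketch: localFirstGenerate is stubbed here so the example is self-contained, and its real signature, the prompt wording, and the response shape are all assumptions.

```typescript
type SentimentLabel = "bullish" | "bearish" | "neutral";

// Stub standing in for the shared helper, which in production routes
// between Ollama, Workers AI, and Gemini as described above.
async function localFirstGenerate(prompt: string): Promise<string> {
  return JSON.stringify({ sentiment: "neutral", confidence: 0.9 });
}

// Hypothetical shape of analyzeLocalFirst: the service no longer knows
// which provider answered, only that it got a structured response back.
async function analyzeLocalFirst(headline: string): Promise<SentimentLabel> {
  const raw = await localFirstGenerate(
    `Score the sentiment of this headline as bullish, bearish, or neutral: ${headline}`
  );
  const parsed = JSON.parse(raw) as { sentiment: SentimentLabel };
  return parsed.sentiment;
}
```

The point of the refactor is that provider selection lives in one place; the four services only swap which function they call.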

Queue Monitor: Throughput Chart

The admin Queue Monitor now includes a live throughput chart (recharts LineChart) between the Worker Health card and Queue Cards. It shows cumulative items completed (green line, left Y-axis) and total queue depth (red line, right Y-axis) over a 5-minute rolling window with 5-second polling.
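The rolling window feeding the chart can be maintained as a simple trimmed array. Sketch only: `Sample` and `pushSample` are illustrative names, not the admin UI's actual state shape.

```typescript
interface Sample {
  timestamp: number; // ms since epoch, taken on each 5-second poll
  completed: number; // cumulative items completed (green line, left Y-axis)
  depth: number;     // total queue depth (red line, right Y-axis)
}

const WINDOW_MS = 5 * 60 * 1000; // 5-minute rolling window

// Append the latest poll result and drop samples older than the window.
function pushSample(window: Sample[], sample: Sample): Sample[] {
  const cutoff = sample.timestamp - WINDOW_MS;
  return [...window.filter((s) => s.timestamp >= cutoff), sample];
}
```

Returning a new array (rather than mutating) suits React state updates, which is presumably how the recharts LineChart gets re-rendered on each poll.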


#ai#ollama#workers-ai#cost-optimization#local-first