News generation, sentiment scoring, and fallback alert generation now route through self-hosted Ollama and Cloudflare Workers AI first, escalating to Gemini only when local confidence is low or inputs exceed 8K tokens.
The platform previously called Gemini 2.0 Flash directly for news generation (4 calls per cycle), sentiment scoring (batch + individual), and fallback alert generation — costing money on every 6-hour cron cycle. This release introduces a reusable localFirstGenerate() function that tries Ollama or Workers AI first and only escalates to Gemini when necessary.
localFirstGenerate() implements a 5-step routing algorithm:
7 direct Gemini calls replaced across 4 services:
The admin Queue Monitor now includes a live throughput chart (recharts LineChart) between the Worker Health card and Queue Cards. It shows cumulative items completed (green line, left Y-axis) and total queue depth (red line, right Y-axis) over a 5-minute rolling window with 5-second polling.
Was this update useful?