Methodology — AI OS

Five external data sources feed an enrichment pipeline. The processed stream passes through an LLM router and a vector RAG store before reaching three consumers: the public portal, the admin dashboard, and the MCP server.

Each task is first classified by type, then dispatched to the backend that maximises quality per unit of cost and latency. Fast drafts and classification go to Groq; structured extraction, reasoning, critique and planning to OpenAI; batch jobs on custom models to Runpod; embeddings and zero-cost local generation to Ollama.

Task → backend dispatch table

Task type	Backend	Rationale
SEO_DRAFT	Groq	fast, cheap
CLASSIFY	Groq	fast, cheap
EXTRACT	OpenAI	structured output
REASONING	OpenAI	quality
CRITIQUE	OpenAI	quality
PLAN	OpenAI	quality
BATCH	Runpod	custom models
LOCAL_GEN	Ollama	free
EMBEDDING	Ollama	768-dim vectors
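The dispatch table can be held as a plain mapping from task type to backend. The following is a minimal Python sketch; the enum, dictionary and function names are hypothetical, and it covers only the static default route, not the cost-based scoring in the formula that follows.

```python
from enum import Enum


class TaskType(Enum):
    """Task categories from the dispatch table."""
    SEO_DRAFT = "seo_draft"
    CLASSIFY = "classify"
    EXTRACT = "extract"
    REASONING = "reasoning"
    CRITIQUE = "critique"
    PLAN = "plan"
    BATCH = "batch"
    LOCAL_GEN = "local_gen"
    EMBEDDING = "embedding"


class Backend(Enum):
    GROQ = "groq"
    OPENAI = "openai"
    RUNPOD = "runpod"
    OLLAMA = "ollama"


# Static default route per task type, mirroring the table row by row.
DEFAULT_ROUTE: dict[TaskType, Backend] = {
    TaskType.SEO_DRAFT: Backend.GROQ,
    TaskType.CLASSIFY: Backend.GROQ,
    TaskType.EXTRACT: Backend.OPENAI,
    TaskType.REASONING: Backend.OPENAI,
    TaskType.CRITIQUE: Backend.OPENAI,
    TaskType.PLAN: Backend.OPENAI,
    TaskType.BATCH: Backend.RUNPOD,
    TaskType.LOCAL_GEN: Backend.OLLAMA,
    TaskType.EMBEDDING: Backend.OLLAMA,
}


def dispatch(task_type: TaskType) -> Backend:
    """Return the default backend for an already-classified task."""
    return DEFAULT_ROUTE[task_type]
```

For example, `dispatch(TaskType.EXTRACT)` returns `Backend.OPENAI`. Keeping the mapping as data rather than branching code makes it easy to check that every task type has exactly one route.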

Cost-optimisation formula

cost(task, backend)    = price_per_token(backend) × estimated_tokens(task)
latency(task, backend) = base_latency(backend) + estimated_tokens(task) / throughput(backend)
quality(task, backend) = exam_score(backend, task_type)

route(task) = argmax_backend( quality(task, backend) / (cost(task, backend) × √latency(task, backend)) )
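The routing formula can be read as "score every candidate backend, keep the best". Below is a minimal Python sketch under stated assumptions: the `BackendProfile` fields and all numbers are hypothetical, a backend with no exam score for a task type is given quality 0, and cost is floored at a small `MIN_COST` (an assumption, not part of the formula) so a zero-price backend does not cause a division by zero.

```python
import math
from dataclasses import dataclass


@dataclass
class BackendProfile:
    """Per-backend figures; the values used below are illustrative, not measured."""
    name: str
    price_per_token: float          # currency units per token
    base_latency: float             # seconds
    throughput: float               # tokens per second
    exam_scores: dict[str, float]   # task type -> exam score


# Assumption: floor on cost so a free (zero-price) backend does not divide by zero.
MIN_COST = 1e-9


def cost(tokens: int, b: BackendProfile) -> float:
    return b.price_per_token * tokens


def latency(tokens: int, b: BackendProfile) -> float:
    return b.base_latency + tokens / b.throughput


def quality(task_type: str, b: BackendProfile) -> float:
    # A backend never examined on this task type gets quality 0.
    return b.exam_scores.get(task_type, 0.0)


def route(task_type: str, tokens: int, backends: list[BackendProfile]) -> BackendProfile:
    """Return the backend maximising quality / (cost × √latency)."""
    def objective(b: BackendProfile) -> float:
        c = max(cost(tokens, b), MIN_COST)
        return quality(task_type, b) / (c * math.sqrt(latency(tokens, b)))
    return max(backends, key=objective)


fast = BackendProfile("fast", 1e-7, 0.2, 500.0, {"SEO_DRAFT": 70.0})
strong = BackendProfile("strong", 1e-5, 0.8, 100.0, {"SEO_DRAFT": 85.0, "REASONING": 95.0})
```

With these made-up figures, `route("SEO_DRAFT", 1000, [fast, strong])` picks `fast`: it is 100× cheaper, which outweighs its lower exam score. `route("REASONING", 1000, [fast, strong])` picks `strong`, because `fast` has no score there. Since cost sits in the denominator, a truly free backend would win every task it has any score for; the floor only avoids the division error, so one option is to restrict the candidate list per task type, for instance with the dispatch table.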

Every enrichment cron is checked against its expected interval and a hard SLA. Health degrades through three zones as the time since its last run grows: OK → degraded → critical.

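Figure: health zones on an elapsed-time axis t. OK from t = 0 to 1.5 × expected interval, degraded from there up to the SLA, critical beyond the SLA.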

health(cron) =
  1   if last_run_age < 1.5 × expected_interval   → OK
  0.5 if last_run_age < SLA                        → degraded
  0   otherwise                                     → critical
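The three-zone rule translates directly into a small classifier. Here is a minimal Python sketch; the `Health` enum and `cron_health` name are hypothetical, all ages are assumed to share one unit (seconds here), and, as in the zone diagram, the SLA is assumed to be larger than 1.5 × the expected interval.

```python
from enum import Enum


class Health(Enum):
    OK = 1.0
    DEGRADED = 0.5
    CRITICAL = 0.0


def cron_health(last_run_age: float, expected_interval: float, sla: float) -> Health:
    """Classify a cron by the age of its last run.

    Assumes sla > 1.5 * expected_interval; otherwise the degraded zone is empty.
    """
    if last_run_age < 1.5 * expected_interval:
        return Health.OK
    if last_run_age < sla:
        return Health.DEGRADED
    return Health.CRITICAL
```

For a hypothetical hourly cron (3600 s) with a 4-hour SLA (14400 s), a last run 3000 s ago is OK, 6000 s ago is degraded (past the 5400 s threshold), and 20000 s ago is critical. The enum values carry the 1 / 0.5 / 0 scores, so they can be averaged across crons if an aggregate figure is needed.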

Agent performance is measured on five axes and collapsed to a single effectiveness score E via a complexity-weighted mean. Heavier tasks contribute proportionally more to the final score.

E = Σ(wᵢ × scoreᵢ) / Σ(wᵢ)

where:
  wᵢ      = complexity weight of task i
  scoreᵢ  = outcome score (0 – 100)

Axes (5 dimensions):
  Speed    — task latency vs budget
  Cost     — tokens × price per token
  Quality  — exam score on task type
  Safety   — guardrail pass rate
  Learning — hint reuse in context pack
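The effectiveness score is an ordinary weighted mean. A minimal Python sketch follows; the `effectiveness` name and the `(weight, score)` tuple layout are hypothetical, and since the text does not say how the five axes combine into each outcome score, the sketch takes per-task scores as given.

```python
def effectiveness(tasks: list[tuple[float, float]]) -> float:
    """Complexity-weighted mean E = Σ(w_i × score_i) / Σ(w_i).

    Each element is (complexity_weight, outcome_score), score on a 0-100 scale.
    """
    total_weight = sum(w for w, _ in tasks)
    if total_weight <= 0:
        raise ValueError("need at least one task with positive weight")
    return sum(w * s for w, s in tasks) / total_weight
```

For example, a light task (weight 1) scoring 90 and a heavy task (weight 3) scoring 60 give E = (90 + 180) / 4 = 67.5, below the unweighted mean of 75, because the heavier task dominates.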

Each query is embedded into a 768-dimensional vector, matched against LanceDB chunks by cosine similarity, re-ranked by recency decay, and packed into the model's context window.


query → embed(query) → cosine_similarity(query_vec, chunk_vecs)
      → top-K chunks → rerank by recency
      → context_pack → inject into prompt
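The retrieval steps can be sketched in memory. Below is a minimal pure-Python version under stated assumptions: the query vector is taken as already produced by the embedding model, the brute-force scan stands in for the LanceDB similarity search, recency decay is assumed to be exponential (weight halves every `half_life_days`), and tokens are approximated by word count. `Chunk`, `retrieve` and their parameters are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]   # e.g. a 768-dim embedding
    age_days: float       # time since the chunk was last updated


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[Chunk], k: int = 5,
             half_life_days: float = 30.0, token_budget: int = 2000) -> str:
    # Cosine similarity against every chunk, keep the top-K.
    sims = [(cosine_similarity(query_vec, c.vector), c) for c in chunks]
    top_k = sorted(sims, key=lambda sc: sc[0], reverse=True)[:k]

    # Re-rank by similarity × exponential recency decay (assumed form).
    reranked = sorted(
        top_k,
        key=lambda sc: sc[0] * 0.5 ** (sc[1].age_days / half_life_days),
        reverse=True,
    )

    # Greedily pack chunks until the budget (approximated by words) is used up.
    packed, used = [], 0
    for _, chunk in reranked:
        cost = len(chunk.text.split())
        if used + cost <= token_budget:
            packed.append(chunk.text)
            used += cost
    return "\n\n".join(packed)
```

With two equally similar chunks aged 0 and 60 days and the default 30-day half-life, the older one keeps only a quarter of its score and is packed second; an unrelated chunk falls out at the top-K cut. The returned string is what would be injected into the prompt.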

GOGA classifies the current market regime from real-time signals and selects the appropriate execution strategy. Each row maps a detected condition to its indicator set and the action taken.

Condition	Indicators	Strategy
Pump Entry	1mΔ>0.5%, Vol<2, Book>1.5	Vector/Shot
Dip Buy	RSI<30, 15mΔ<-2%	DCA Grid
Swing Hold	PumpQ>60, Funding<0	Pattern/Adaptive
Avoid	VolRatio>3, RSI>80	Wait
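The selection matrix can be expressed as an ordered list of rule checks. This is a minimal Python sketch with several assumptions: the comma-separated indicators in each row are treated as a conjunction (all must hold); 1mΔ and 15mΔ are read as percentage price changes over 1 and 15 minutes; the other fields just carry the table's abbreviations, whose exact definitions the table does not give; and the Avoid row is checked first so that it overrides entry signals, a priority the table itself does not state. `Signals` and `select_strategy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    d1m_pct: float    # 1mΔ, read as 1-minute price change in %
    d15m_pct: float   # 15mΔ, read as 15-minute price change in %
    vol: float        # Vol
    vol_ratio: float  # VolRatio
    book: float       # Book
    rsi: float        # RSI
    pump_q: float     # PumpQ
    funding: float    # Funding


def select_strategy(s: Signals) -> Optional[str]:
    # Assumed priority: the Avoid filter runs first and overrides entry rows.
    if s.vol_ratio > 3 and s.rsi > 80:
        return "Wait"
    if s.d1m_pct > 0.5 and s.vol < 2 and s.book > 1.5:
        return "Vector/Shot"     # Pump Entry
    if s.rsi < 30 and s.d15m_pct < -2:
        return "DCA Grid"        # Dip Buy
    if s.pump_q > 60 and s.funding < 0:
        return "Pattern/Adaptive"  # Swing Hold
    return None  # no row of the matrix matched
```

Returning `None` when nothing matches keeps "no signal" distinct from an explicit Avoid → Wait decision.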
