Building Kumari LLM: A Deep Dive into Our Three-Router Architecture

When we started building Kumari LLM, we knew we wanted to create something different: not just another AI chat API, but a system that could intelligently route requests, optimize for cost and performance, and provide a seamless experience across multiple AI providers. What emerged was a unique three-router architecture that we believe sets us apart from the competition.

The Problem with Traditional AI APIs

Most AI platforms today follow a simple pattern: you send a request, they route it to a single model, and you get a response. But this approach has fundamental limitations:

- One model cannot be the best choice for every task: a model tuned for code is rarely the best creative writer.
- Every request pays the same price, whether it needs a frontier model or a cheap, fast one.
- A single provider means vendor lock-in and a single point of failure when that provider degrades or goes down.

We wanted to solve these problems by creating an intelligent routing system that could make real-time decisions about which model to use based on the task at hand.

The Three-Router Architecture

Our solution is built around three specialized routers, each named after a Sanskrit concept that reflects its purpose: Buddhi (intelligence), Moolya (value), and Gati (speed).

Architecture Overview

The following diagram shows how our three routers work together to process user requests:

[Figure: Three-router architecture overview]

1. Buddhi Router: The Intelligence Layer

Buddhi means "intelligence" or "discernment" in Sanskrit, and that's exactly what this router provides. It's our semantic classifier that analyzes incoming prompts and determines what type of task we're dealing with.

[Figure: Buddhi router classification]

The Buddhi router uses a hybrid approach that combines fast heuristics with semantic classification. This allows us to identify whether a user wants code generation, creative writing, mathematical computation, research synthesis, or image processing, all in under 5ms.

Domain Classification Flow

Here's how the Buddhi router classifies prompts into different domains:

[Figure: Domain classification flow]
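To make the fast path concrete, here is a minimal sketch of a keyword-based first pass, in the spirit of our hybrid classifier. The domain patterns, scoring, and `classify` helper are hypothetical illustrations, not Kumari's actual rules, and the real router layers semantic classification on top:

```python
import re

# Hypothetical keyword patterns per domain; the real Buddhi router's
# rules and its semantic layer are not shown here.
DOMAIN_PATTERNS = {
    "code":     re.compile(r"\b(function|class|debug|compile|refactor|python|javascript)\b", re.I),
    "math":     re.compile(r"\b(solve|equation|integral|derivative|calculate)\b", re.I),
    "creative": re.compile(r"\b(story|poem|lyrics|essay|fiction)\b", re.I),
    "research": re.compile(r"\b(summarize|compare|sources|survey|cite)\b", re.I),
    "image":    re.compile(r"\b(image|photo|picture|diagram|render)\b", re.I),
}

def classify(prompt: str) -> str:
    """Score each domain by keyword hits; fall back to 'general'."""
    scores = {d: len(p.findall(prompt)) for d, p in DOMAIN_PATTERNS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

print(classify("Debug this Python function for me"))  # -> "code"
```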

2. Moolya Router: The Cost Optimizer

Moolya means "value" or "price" in Sanskrit. This router is our financial brain, ensuring we get the best value for every request.

[Figure: Moolya router cost optimization]

The Moolya router weighs each candidate model's per-token cost against the capability the task actually requires. This means a simple question might get routed to GPT-3.5-turbo (cheap and fast), while complex reasoning goes to GPT-4 (more expensive but more capable), as sketched below.
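A toy version of that cost/capability trade-off might look like this. The prices, capability scores, and `cheapest_capable` helper are made-up placeholders, not our real model table:

```python
# Illustrative pricing and capability scores; these numbers are
# placeholders, not Kumari's real model table.
MODELS = [
    # (name, USD per 1K tokens, capability score 0-10)
    ("gpt-3.5-turbo",     0.0005, 5),
    ("claude-3.5-haiku",  0.0010, 6),
    ("gpt-4o",            0.0050, 8),
    ("claude-3.5-sonnet", 0.0060, 9),
    ("gpt-4",             0.0300, 9),
]

def cheapest_capable(required_capability: int) -> str:
    """Pick the lowest-cost model that clears the capability bar."""
    capable = [m for m in MODELS if m[2] >= required_capability]
    return min(capable, key=lambda m: m[1])[0]

print(cheapest_capable(4))  # simple question   -> gpt-3.5-turbo
print(cheapest_capable(9))  # complex reasoning -> claude-3.5-sonnet
```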

3. Gati Router: The Performance Optimizer

Gati means "speed" or "velocity" in Sanskrit. This router optimizes for response time and user experience.

[Figure: Gati router performance optimization]

The Gati router tracks live latency and error rates for every model, so slow or failing models are deprioritized in real time; a sketch of that bookkeeping follows.
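One plausible shape for that bookkeeping is a rolling window of recent samples per model. The `GatiTracker` class, window size, and blacklist threshold below are hypothetical:

```python
from collections import defaultdict, deque

WINDOW = 20           # samples kept per model (hypothetical)
MAX_ERROR_RATE = 0.3  # blacklist threshold (hypothetical)

class GatiTracker:
    """Rolling latency and error stats per model."""

    def __init__(self):
        self.latencies = defaultdict(lambda: deque(maxlen=WINDOW))
        self.errors = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, model: str, latency_s: float, ok: bool) -> None:
        self.latencies[model].append(latency_s)
        self.errors[model].append(0 if ok else 1)

    def is_healthy(self, model: str) -> bool:
        errs = self.errors[model]
        return not errs or sum(errs) / len(errs) <= MAX_ERROR_RATE

    def fastest(self, candidates: list[str]) -> str:
        """Among healthy candidates, pick the lowest average latency.
        Unseen models average to 0, so they get tried optimistically."""
        healthy = [m for m in candidates if self.is_healthy(m)] or list(candidates)
        return min(healthy,
                   key=lambda m: sum(self.latencies[m]) / max(len(self.latencies[m]), 1))
```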

How the Three Routers Work Together

The magic happens in our master router, which orchestrates all three specialized routers:

[Figure: Master router workflow]

This creates a decision tree: Buddhi classifies the task, Moolya narrows the field to cost-effective candidates, and Gati picks the fastest healthy model among them.
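Pulling the earlier sketches together, the orchestration could read roughly like this (the `route` function and its domain-to-capability mapping are illustrative, reusing `classify`, `MODELS`, and `GatiTracker` from the sketches above):

```python
def route(prompt: str, tracker: GatiTracker) -> str:
    domain = classify(prompt)                            # Buddhi: what kind of task?
    # Hypothetical mapping from domain to required capability.
    required = {"code": 7, "math": 8, "general": 4}.get(domain, 6)
    capable = sorted((m for m in MODELS if m[2] >= required),
                     key=lambda m: m[1])                 # Moolya: rank by cost
    shortlist = [name for name, _, _ in capable[:3]]     # keep the cheapest few
    return tracker.fastest(shortlist)                    # Gati: fastest healthy pick
```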

Multi-Provider Support

One of the key advantages of our architecture is seamless multi-provider support. We currently integrate with:

- OpenAI: GPT-4, GPT-3.5-turbo, GPT-4o
- Anthropic: Claude-3.5-Sonnet, Claude-3.5-Haiku
- Google: Gemini Pro, Gemini Flash
- Perplexity: Sonar, Sonar Pro
- DeepSeek: DeepSeek Chat, DeepSeek Coder
- XAI: Grok-2, Grok-3-Mini

Each provider is abstracted through a unified interface, so our routers can make decisions based on capability, cost, and performance rather than being locked into a single provider.
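In code, that abstraction can be as small as a single interface that every vendor adapter implements. The class and method names here are illustrative, not Kumari's actual API:

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """The one interface every vendor adapter implements."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str:
        """Send a completion request and return the response text."""

class OpenAIProvider(Provider):
    def complete(self, model: str, prompt: str) -> str:
        raise NotImplementedError("would call OpenAI's API here")

class AnthropicProvider(Provider):
    def complete(self, model: str, prompt: str) -> str:
        raise NotImplementedError("would call Anthropic's API here")

# Routers only ever see the Provider interface, so adding a vendor
# means adding one subclass, not touching the routing logic.
PROVIDERS = {"openai": OpenAIProvider(), "anthropic": AnthropicProvider()}
```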

Multi-Provider Architecture

Our provider abstraction layer allows seamless switching between different AI services:

[Figure: Multi-provider architecture]

Performance Metrics

Our current performance metrics speak for themselves:

- Average response time: 1.8-6.7s
- Success rate: 100%
- Cache hit rate: 40-60%
- Uptime: 99.9%

What Makes Us Different

While other platforms offer AI APIs, Kumari LLM provides:

- 🧠 Intelligent Routing: automatic model selection based on task type
- 💰 Cost Optimization: always choose the most cost-effective model for the job
- ⚡ Performance Optimization: prioritize speed when needed
- 🔄 Multi-Provider: no vendor lock-in, always use the best available model
- 🧠 Conversation Memory: intelligent summarization for long conversations
- 🛡️ Automatic Failover: if one model fails, seamlessly switch to another
- 📊 Real-time Monitoring: track performance and automatically blacklist problematic models
- 🎯 Intelligent Caching: smart caching with 40-60% hit rates to reduce response times (see the sketch below)
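As a baseline illustration of the caching layer, here is a minimal exact-match response cache. Kumari's intelligent caching presumably goes further than exact matches, so treat this as a starting-point sketch:

```python
import hashlib

class ResponseCache:
    """Exact-match cache keyed on (model, prompt)."""

    def __init__(self):
        self._store: dict[str, str] = {}

    def _key(self, model: str, prompt: str) -> str:
        return hashlib.sha256(f"{model}:{prompt}".encode()).hexdigest()

    def get(self, model: str, prompt: str):
        return self._store.get(self._key(model, prompt))

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[self._key(model, prompt)] = response
```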

The Future

This architecture is designed to scale, and we are actively working on the next round of improvements.

Conclusion

Building Kumari LLM has been an incredible journey. The three-router architecture wasn't planned from the start; it evolved as we solved real problems users were facing. What started as a simple AI API has become an intelligent system that can adapt to any task, optimize for any constraint, and provide the best possible experience.

The key insight was that AI isn't just about having good models; it's about having the right model for the right task at the right time. Our three-router system makes that possible, and we believe it represents the future of AI API design.