Building Kumari LLM: A Deep Dive into Our Three-Router Architecture
When we started building Kumari LLM, we knew we wanted to create something different. Not just another AI chat API, but a system that could intelligently route requests, optimize for cost and performance, and provide a seamless experience across multiple AI providers. What emerged was a unique three-router architecture that we believe sets us apart from the competition.
The Problem with Traditional AI APIs
Most AI platforms today follow a simple pattern: you send a request, they route it to a single model, and you get a response. But this approach has fundamental limitations:
- No intelligence in routing: The same model handles everything from code generation to creative writing
- Cost inefficiency: Expensive models get wasted on simple tasks, while cheap models are pushed beyond their limits on complex ones
- Performance bottlenecks: No consideration for latency vs. quality trade-offs
- Provider lock-in: You're stuck with one AI provider's limitations
We wanted to solve these problems by creating an intelligent routing system that could make real-time decisions about which model to use based on the task at hand.
The Three-Router Architecture
Our solution is built around three specialized routers, each named after Sanskrit concepts that reflect their purpose:
Architecture Overview
The following diagram shows how our three routers work together to process user requests:

1. Buddhi Router: The Intelligence Layer
Buddhi means "intelligence" or "discernment" in Sanskrit, and that's exactly what this router provides. It's our semantic classifier that analyzes incoming prompts and determines what type of task we're dealing with.

The Buddhi router uses a hybrid approach:
- Keyword matching: Fast, rule-based classification for common patterns
- Semantic embeddings: Deep learning-based similarity matching using sentence transformers
- Domain classification: Maps prompts to specific capability domains
This allows us to identify whether a user wants code generation, creative writing, mathematical computation, research synthesis, or image processing - all in under 5ms.
Domain Classification Flow
Here's how the Buddhi router classifies prompts into different domains:
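To make the hybrid approach concrete, here is a minimal sketch in Python: keyword rules first, embedding similarity as a fallback. The domain names, keyword table, and anchor descriptions below are illustrative placeholders, not our production classifier:

```python
# Minimal sketch of Buddhi-style hybrid classification (illustrative names):
# cheap keyword rules first, semantic embedding similarity as a fallback.
from sentence_transformers import SentenceTransformer, util

KEYWORD_RULES = {
    "code": ["def ", "function", "compile", "debug", "refactor"],
    "math": ["solve", "integral", "equation", "derivative"],
    "creative": ["story", "poem", "haiku", "lyrics"],
}

# One short anchor description per domain, used for embedding similarity.
DOMAIN_ANCHORS = {
    "code": "write, explain, or debug source code",
    "math": "perform mathematical computation or proofs",
    "creative": "creative writing such as stories and poems",
    "research": "synthesize research and summarize sources",
}

_model = SentenceTransformer("all-MiniLM-L6-v2")
_anchors = {
    domain: _model.encode(text, convert_to_tensor=True)
    for domain, text in DOMAIN_ANCHORS.items()
}

def classify_prompt(prompt: str) -> str:
    lowered = prompt.lower()
    # Stage 1: rule-based matching catches common patterns in microseconds.
    for domain, keywords in KEYWORD_RULES.items():
        if any(kw in lowered for kw in keywords):
            return domain
    # Stage 2: embed the prompt and pick the most similar domain anchor.
    prompt_emb = _model.encode(prompt, convert_to_tensor=True)
    return max(_anchors, key=lambda d: util.cos_sim(prompt_emb, _anchors[d]).item())

print(classify_prompt("Write a haiku about monsoon season"))  # -> creative
```

The keyword path is what keeps the common case fast; the embedding fallback only runs when no rule fires.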

2. Moolya Router: The Cost Optimizer
Moolya means "value" or "price" in Sanskrit. This router is our financial brain, ensuring we get the best value for every request.

The Moolya router considers:
- Token cost: Direct cost per 1K tokens for each model
- Latency trade-offs: Sometimes faster models cost more, but provide better UX
- Capability matching: Only considers models that can handle the task
- Provider diversity: Spreads load across different AI providers
This means a simple question might get routed to GPT-3.5-turbo (cheap and fast), while complex reasoning goes to GPT-4 (more expensive but more capable).
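To make that selection logic concrete, here is a minimal sketch of cost-aware selection over a capability-filtered catalog. The models, prices, and latency penalty below are illustrative placeholders rather than our production values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelInfo:
    name: str
    provider: str
    cost_per_1k_tokens: float  # USD, blended input/output (illustrative)
    avg_latency_ms: float
    capabilities: frozenset

CATALOG = [
    ModelInfo("gpt-3.5-turbo", "openai", 0.0015, 600, frozenset({"chat", "code"})),
    ModelInfo("gpt-4", "openai", 0.03, 2500, frozenset({"chat", "code", "reasoning"})),
    ModelInfo("claude-3-5-haiku", "anthropic", 0.001, 500, frozenset({"chat"})),
]

def moolya_select(capability: str, latency_cost_per_sec: float = 0.001) -> ModelInfo:
    """Cheapest capable model, with a small penalty for slow responses."""
    candidates = [m for m in CATALOG if capability in m.capabilities]
    if not candidates:
        raise ValueError(f"no model supports {capability!r}")
    # Lower score wins: raw token cost plus a latency penalty in cost units.
    return min(
        candidates,
        key=lambda m: m.cost_per_1k_tokens + latency_cost_per_sec * m.avg_latency_ms / 1000,
    )

print(moolya_select("code").name)       # cheap model for a simple task
print(moolya_select("reasoning").name)  # only the expensive model qualifies
```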
3. Gati Router: The Performance Optimizer
Gati means "speed" or "velocity" in Sanskrit. This router optimizes for response time and user experience.

The Gati router tracks:
- Real-time latency: Historical performance data for each model
- Response time optimization: Prioritizes models with proven speed
- Fallback mechanisms: If a fast model fails, it can switch to alternatives
- User experience: Ensures consistent, predictable response times (see the sketch below)
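Here is a minimal sketch of that tracking logic: a rolling latency window per model, with models excluded after repeated failures. The class name, window size, and thresholds are illustrative, not our production code:

```python
import statistics
from collections import defaultdict, deque

class GatiTracker:
    """Rolling latency stats per model, with failure-aware exclusion."""

    def __init__(self, window: int = 50, max_failures: int = 3):
        self.latencies = defaultdict(lambda: deque(maxlen=window))
        self.failures = defaultdict(int)
        self.max_failures = max_failures

    def record(self, model: str, latency_ms: float | None) -> None:
        if latency_ms is None:        # treat None as a failed call
            self.failures[model] += 1
        else:
            self.failures[model] = 0  # a success resets the failure streak
            self.latencies[model].append(latency_ms)

    def ranked(self, candidates: list[str]) -> list[str]:
        """Fastest healthy models first; repeatedly failing ones drop out."""
        healthy = [m for m in candidates if self.failures[m] < self.max_failures]
        # Models with no history get a neutral default so they still see traffic.
        return sorted(
            healthy,
            key=lambda m: statistics.median(self.latencies[m]) if self.latencies[m] else 1000.0,
        )

tracker = GatiTracker()
tracker.record("gemini-flash", 420)
tracker.record("gpt-4o", 900)
print(tracker.ranked(["gpt-4o", "gemini-flash"]))  # ['gemini-flash', 'gpt-4o']
```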
How the Three Routers Work Together
The magic happens in our master router, which orchestrates all three specialized routers:

This creates a four-step decision tree (sketched below):
1. Buddhi filters models by capability
2. A per-request policy determines the optimization strategy (cost vs. speed)
3. Moolya or Gati selects the optimal model accordingly
4. The master router handles the final selection and fallback
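A stripped-down sketch of that orchestration, with the three routers reduced to stand-in stubs (all names and the default policy heuristic here are illustrative):

```python
# Stand-ins for the three routers; see the earlier sketches for fuller logic.
def buddhi_classify(prompt): return "code" if "def " in prompt else "chat"
def models_with_capability(domain): return ["cheap-model", "fast-model", "big-model"]
def moolya_rank(models): return sorted(models)              # cheapest-first (stub)
def gati_rank(models): return sorted(models, reverse=True)  # fastest-first (stub)
def call_model(model, prompt): return f"[{model}] response to: {prompt}"

def master_route(prompt: str, optimize_for: str | None = None) -> str:
    # 1. Buddhi: classify the task and filter to capable models.
    domain = buddhi_classify(prompt)
    candidates = models_with_capability(domain)
    # 2. Policy: honor the caller's preference, else a per-domain default.
    policy = optimize_for or ("speed" if domain == "chat" else "cost")
    # 3. Moolya or Gati orders the candidates; the rest become fallbacks.
    ordered = moolya_rank(candidates) if policy == "cost" else gati_rank(candidates)
    # 4. Master router: try in order, falling back on failure.
    for model in ordered:
        try:
            return call_model(model, prompt)
        except Exception:
            continue
    raise RuntimeError("all candidate models failed")

print(master_route("def fib(n): ..."))
```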
Multi-Provider Support
One of the key advantages of our architecture is seamless multi-provider support. We currently integrate with:
- OpenAI: GPT-4, GPT-3.5-turbo, GPT-4o
- Anthropic: Claude-3.5-Sonnet, Claude-3.5-Haiku
- Google: Gemini Pro, Gemini Flash
- Perplexity: Sonar, Sonar Pro
- DeepSeek: DeepSeek Chat, DeepSeek Coder
- xAI: Grok-2, Grok-3-Mini
Each provider is abstracted through a unified interface, so our routers can make decisions based on capability, cost, and performance rather than being locked into a single provider.
Multi-Provider Architecture
Our provider abstraction layer allows seamless switching between different AI services:
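In code, the abstraction can be as simple as a common base class plus one adapter per vendor. This is a minimal sketch with stubbed adapters, not our actual SDK integrations:

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Unified interface: routers see capabilities and costs, not vendor SDKs."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...

class OpenAIProvider(Provider):
    def complete(self, model: str, prompt: str) -> str:
        # In production this would call the OpenAI SDK; stubbed for the sketch.
        return f"openai/{model}: ..."

class AnthropicProvider(Provider):
    def complete(self, model: str, prompt: str) -> str:
        return f"anthropic/{model}: ..."

# Routers dispatch through a registry and never touch a vendor SDK directly.
PROVIDERS: dict[str, Provider] = {
    "openai": OpenAIProvider(),
    "anthropic": AnthropicProvider(),
}

def dispatch(provider: str, model: str, prompt: str) -> str:
    return PROVIDERS[provider].complete(model, prompt)
```

Adding a new vendor means writing one adapter class; none of the routing logic changes.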

Performance Metrics
Our current performance metrics speak for themselves:
What Makes Us Different
While other platforms offer AI APIs, Kumari LLM provides:
🧠 Intelligent Routing
Automatic model selection based on task type
💰 Cost Optimization
Always choose the most cost-effective model for the job
⚡ Performance Optimization
Prioritize speed when needed
🔄 Multi-Provider
No vendor lock-in, always use the best available model
🧠 Conversation Memory
Intelligent summarization for long conversations
🛡️ Automatic Failover
If one model fails, seamlessly switch to another
📊 Real-time Monitoring
Track performance and automatically blacklist problematic models
🎯 Intelligent Caching
Smart caching with 40-60% hit rates to reduce response times (sketched below)
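As an illustration of the caching idea, here is a minimal exact-match cache keyed on a normalized prompt hash with TTL eviction; our actual keying and eviction policy may differ:

```python
import hashlib
import time

class ResponseCache:
    """Illustrative exact-match prompt cache with TTL eviction."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def _key(self, model: str, prompt: str) -> str:
        # Normalizing whitespace and case lets trivially different prompts hit.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(f"{model}|{normalized}".encode()).hexdigest()

    def get(self, model: str, prompt: str) -> str | None:
        entry = self._store.get(self._key(model, prompt))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, model: str, prompt: str, response: str) -> None:
        self._store[self._key(model, prompt)] = (time.monotonic(), response)

cache = ResponseCache()
cache.put("gpt-4o", "What is 2+2?", "4")
print(cache.get("gpt-4o", "what is  2+2?"))  # hit: "4"
```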
The Future
This architecture is designed to scale. We're working on:
- Custom Model Training: Fine-tuned models for specific domains
- Advanced Prompt Engineering: Automatic prompt optimization
- Multi-Modal Support: Enhanced image and video processing
- Enterprise Features: Advanced analytics and team management
- API Marketplace: Allow developers to contribute custom models
Conclusion
Building Kumari LLM has been an incredible journey. The three-router architecture wasn't planned from the start - it evolved as we solved real problems users were facing. What started as a simple AI API has become an intelligent system that can adapt to any task, optimize for any constraint, and provide the best possible experience.
The key insight was that AI isn't just about having good models - it's about having the right model for the right task at the right time. Our three-router system makes that possible, and we believe it represents the future of AI API design.