# Universal LLM Interface

## Multi-Provider LLM Abstraction for Enhanced BMAD System

The Universal LLM Interface provides seamless integration with multiple LLM providers, enabling the BMAD system to work with Claude, GPT, Gemini, DeepSeek, Llama, and any future LLM while optimizing for cost, capability, and performance.

### LLM Abstraction Architecture

#### Universal LLM Provider Framework

```yaml
llm_provider_architecture:
  core_abstraction:
    universal_interface:
      - standardized_request_format: "Common interface for all LLM interactions"
      - response_normalization: "Unified response structure across providers"
      - capability_detection: "Automatic detection of LLM-specific capabilities"
      - error_handling: "Graceful degradation and fallback mechanisms"
      - cost_tracking: "Real-time cost monitoring and optimization"

  provider_adapters:
    anthropic_claude:
      - api_integration: "Native Claude API integration"
      - tool_use_support: "Advanced tool use capabilities"
      - function_calling: "Native function calling support"
      - streaming_support: "Real-time streaming responses"
      - context_windows: "Large context window optimization"

    openai_gpt:
      - gpt4_integration: "GPT-4 and GPT-4 Turbo support"
      - function_calling: "OpenAI function calling format"
      - vision_capabilities: "GPT-4 Vision integration"
      - code_interpreter: "Code execution capabilities"
      - assistant_api: "OpenAI Assistant API integration"

    google_gemini:
      - gemini_pro_integration: "Gemini Pro and Ultra support"
      - multimodal_capabilities: "Text, image, and video processing"
      - code_execution: "Native code execution environment"
      - safety_filters: "Built-in safety and content filtering"
      - vertex_ai_integration: "Enterprise Vertex AI support"

    deepseek_coder:
      - code_specialization: "Code-focused LLM optimization"
      - repository_understanding: "Large codebase comprehension"
      - code_generation: "Advanced code generation capabilities"
      - technical_reasoning: "Deep technical problem solving"

    meta_llama:
      - open_source_integration: "Llama 2 and Code Llama support"
      - local_deployment: "On-premises deployment support"
      - fine_tuning: "Custom model fine-tuning capabilities"
      - privacy_preservation: "Complete data privacy control"

    custom_providers:
      - plugin_architecture: "Support for custom LLM providers"
      - api_adaptation: "Automatic API format adaptation"
      - capability_mapping: "Custom capability definition"
      - performance_monitoring: "Custom provider performance tracking"
```
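
The `custom_providers` plugin architecture implies a common contract that every adapter, built-in or custom, has to satisfy. Below is a minimal sketch of what that contract could look like; `LLMProviderAdapter`, `LLMResponse`, and their fields are illustrative names, not part of an existing BMAD codebase.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LLMResponse:
    """Normalized response shared by every provider adapter."""
    text: str
    tokens_used: int
    provider: str
    raw: Any = None  # provider-specific payload, kept for debugging
    metadata: Dict[str, Any] = field(default_factory=dict)


class LLMProviderAdapter(ABC):
    """Contract a provider plugin implements to be registered with the interface."""

    @abstractmethod
    async def execute_task(self, task_definition: Dict[str, Any]) -> LLMResponse:
        """Run a task expressed in the universal request format."""

    @abstractmethod
    def capabilities(self) -> Dict[str, bool]:
        """Report supported features, e.g. {"tool_use": True, "streaming": False}."""
```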

#### LLM Capability Detection and Routing

```python
from datetime import datetime


async def detect_llm_capabilities(provider_name, model_name):
    """
    Automatically detect and catalog LLM capabilities for intelligent routing
    """
    capability_detection = {
        'provider': provider_name,
        'model': model_name,
        'capabilities': {},
        'performance_metrics': {},
        'cost_metrics': {},
        'limitations': {}
    }

    # Test core capabilities
    core_capabilities = await test_core_llm_capabilities(provider_name, model_name)

    # Test specialized capabilities
    specialized_capabilities = {
        'code_generation': await test_code_generation_capability(provider_name, model_name),
        'code_analysis': await test_code_analysis_capability(provider_name, model_name),
        'function_calling': await test_function_calling_capability(provider_name, model_name),
        'tool_use': await test_tool_use_capability(provider_name, model_name),
        'multimodal': await test_multimodal_capability(provider_name, model_name),
        'reasoning': await test_reasoning_capability(provider_name, model_name),
        'context_handling': await test_context_handling_capability(provider_name, model_name),
        'streaming': await test_streaming_capability(provider_name, model_name)
    }

    # Performance benchmarking
    performance_metrics = await benchmark_llm_performance(provider_name, model_name)

    # Cost analysis
    cost_metrics = await analyze_llm_costs(provider_name, model_name)

    capability_detection.update({
        'capabilities': {**core_capabilities, **specialized_capabilities},
        'performance_metrics': performance_metrics,
        'cost_metrics': cost_metrics,
        'detection_timestamp': datetime.utcnow().isoformat(),
        'confidence_score': calculate_capability_confidence(core_capabilities, specialized_capabilities)
    })

    return capability_detection

async def intelligent_llm_routing(task_requirements, available_providers):
    """
    Intelligently route tasks to the optimal LLM based on capabilities, cost, and performance
    """
    routing_analysis = {
        'task_requirements': task_requirements,
        'candidate_providers': [],
        'routing_decision': {},
        'fallback_options': [],
        'cost_optimization': {}
    }

    # Analyze task requirements
    task_analysis = await analyze_task_requirements(task_requirements)

    # Score each available provider
    for provider in available_providers:
        provider_score = await score_provider_for_task(provider, task_analysis)

        routing_candidate = {
            'provider': provider,
            'capability_match': provider_score['capability_match'],
            'performance_score': provider_score['performance_score'],
            'cost_efficiency': provider_score['cost_efficiency'],
            'reliability_score': provider_score['reliability_score'],
            'overall_score': calculate_overall_provider_score(provider_score)
        }

        routing_analysis['candidate_providers'].append(routing_candidate)

    # Select optimal provider
    optimal_provider = select_optimal_provider(routing_analysis['candidate_providers'])

    # Define fallback strategy
    fallback_providers = define_fallback_strategy(
        routing_analysis['candidate_providers'],
        optimal_provider
    )

    routing_analysis.update({
        'routing_decision': optimal_provider,
        'fallback_options': fallback_providers,
        'cost_optimization': calculate_cost_optimization(optimal_provider, task_analysis)
    })

    return routing_analysis

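
# calculate_overall_provider_score is referenced above but not defined in this
# document. A plausible sketch is a weighted sum of the four per-provider scores;
# the weights below are illustrative assumptions, not tuned BMAD values.
def calculate_overall_provider_score(provider_score, weights=None):
    """Combine capability, performance, cost, and reliability into one score in [0, 1]."""
    weights = weights or {
        'capability_match': 0.4,    # does the provider support what the task needs?
        'performance_score': 0.25,  # observed latency and quality benchmarks
        'cost_efficiency': 0.2,     # quality delivered per dollar
        'reliability_score': 0.15,  # historical error and timeout rate
    }
    return sum(provider_score[key] * weight for key, weight in weights.items())
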
class UniversalLLMInterface:
    """
    Universal interface for interacting with multiple LLM providers
    """

    def __init__(self):
        self.providers = {}
        self.capability_cache = {}
        self.cost_tracker = CostTracker()
        self.performance_monitor = PerformanceMonitor()

    async def register_provider(self, provider_name, provider_config):
        """Register a new LLM provider"""
        provider_adapter = await create_provider_adapter(provider_name, provider_config)

        # Test provider connectivity
        connectivity_test = await test_provider_connectivity(provider_adapter)

        if connectivity_test.success:
            self.providers[provider_name] = provider_adapter

            # Detect and cache capabilities
            capabilities = await detect_llm_capabilities(
                provider_name,
                provider_config.get('model', 'default')
            )
            self.capability_cache[provider_name] = capabilities

            return {
                'registration_status': 'success',
                'provider': provider_name,
                'capabilities': capabilities,
                'ready_for_use': True
            }
        else:
            return {
                'registration_status': 'failed',
                'provider': provider_name,
                'error': connectivity_test.error,
                'ready_for_use': False
            }

    async def execute_task(self, task_definition, routing_preferences=None):
        """
        Execute a task using the optimal LLM provider
        """
        # Determine optimal provider
        routing_decision = await intelligent_llm_routing(
            task_definition,
            list(self.providers.keys())
        )

        optimal_provider = routing_decision['routing_decision']['provider']

        # Execute task with monitoring
        execution_start = datetime.utcnow()

        try:
            # Execute with primary provider
            result = await self.providers[optimal_provider].execute_task(task_definition)

            execution_duration = (datetime.utcnow() - execution_start).total_seconds()

            # Track performance and costs
            await self.performance_monitor.record_execution(
                optimal_provider,
                task_definition,
                result,
                execution_duration
            )

            await self.cost_tracker.record_usage(
                optimal_provider,
                task_definition,
                result
            )

            return {
                'result': result,
                'provider_used': optimal_provider,
                'execution_time': execution_duration,
                'routing_analysis': routing_decision,
                'status': 'success'
            }

        except Exception as e:
            # Try fallback providers
            for fallback_provider in routing_decision['fallback_options']:
                try:
                    fallback_result = await self.providers[fallback_provider['provider']].execute_task(
                        task_definition
                    )

                    execution_duration = (datetime.utcnow() - execution_start).total_seconds()

                    return {
                        'result': fallback_result,
                        'provider_used': fallback_provider['provider'],
                        'execution_time': execution_duration,
                        'primary_provider_failed': optimal_provider,
                        'fallback_used': True,
                        'status': 'success_with_fallback'
                    }

                except Exception:
                    continue

            # All providers failed
            return {
                'status': 'failed',
                'primary_provider': optimal_provider,
                'primary_error': str(e),
                'fallback_attempts': len(routing_decision['fallback_options']),
                'execution_time': (datetime.utcnow() - execution_start).total_seconds()
            }
```
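
A short usage sketch of the interface above, assuming an async entry point; the provider config keys and the task fields shown here are illustrative and mirror the `bmad llm register` commands later in this document rather than a fixed schema.

```python
import asyncio


async def main():
    llm = UniversalLLMInterface()

    # Register two providers; config keys are illustrative.
    await llm.register_provider("anthropic", {"api_key": "sk-...", "model": "claude-3-sonnet-20240229"})
    await llm.register_provider("openai", {"api_key": "sk-...", "model": "gpt-4-turbo-preview"})

    # Routing picks the provider; fallbacks are handled inside execute_task.
    outcome = await llm.execute_task({
        "type": "code_generation",
        "prompt": "Write a function that deduplicates a list while preserving order.",
    })
    print(outcome["provider_used"], outcome["status"])


asyncio.run(main())
```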

### Provider-Specific Adapters

#### Claude Adapter Implementation

```python
import json

import anthropic
import openai


class ClaudeAdapter:
    """
    Adapter for Anthropic Claude API integration
    """

    def __init__(self, config):
        self.config = config
        # Async client so the awaited API calls below work
        self.client = anthropic.AsyncAnthropic(api_key=config['api_key'])
        self.model = config.get('model', 'claude-3-sonnet-20240229')

    async def execute_task(self, task_definition):
        """Execute task using Claude API"""

        # Convert universal task format to Claude format
        claude_request = await self.convert_to_claude_format(task_definition)

        # Handle different task types
        if task_definition['type'] == 'code_analysis':
            return await self.execute_code_analysis(claude_request)
        elif task_definition['type'] == 'code_generation':
            return await self.execute_code_generation(claude_request)
        elif task_definition['type'] == 'reasoning':
            return await self.execute_reasoning_task(claude_request)
        elif task_definition['type'] == 'tool_use':
            return await self.execute_tool_use_task(claude_request)
        else:
            return await self.execute_general_task(claude_request)

    async def execute_tool_use_task(self, claude_request):
        """Execute task with Claude tool use capabilities"""

        # Define available tools for Claude
        tools = [
            {
                "name": "code_analyzer",
                "description": "Analyze code structure and patterns",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string"},
                        "language": {"type": "string"},
                        "analysis_type": {"type": "string"}
                    }
                }
            },
            {
                "name": "file_navigator",
                "description": "Navigate and understand file structures",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "operation": {"type": "string"}
                    }
                }
            }
        ]

        response = await self.client.messages.create(
            model=self.model,
            max_tokens=4000,
            tools=tools,
            messages=claude_request['messages']
        )

        # Handle tool use responses
        if response.stop_reason == "tool_use":
            # Only the tool_use blocks carry tool calls; text blocks are skipped
            tool_uses = [block for block in response.content if block.type == "tool_use"]
            tool_results = []
            for tool_use in tool_uses:
                tool_result = await self.execute_tool(tool_use.name, tool_use.input)
                tool_results.append(tool_result)

            # Continue conversation with tool results
            follow_up_response = await self.client.messages.create(
                model=self.model,
                max_tokens=4000,
                messages=[
                    *claude_request['messages'],
                    {"role": "assistant", "content": response.content},
                    {"role": "user", "content": [
                        {"type": "tool_result", "tool_use_id": tool_use.id, "content": str(result)}
                        for tool_use, result in zip(tool_uses, tool_results)
                    ]}
                ]
            )

            return {
                'response': follow_up_response.content[0].text,
                'tool_uses': tool_results,
                'tokens_used': (response.usage.input_tokens + response.usage.output_tokens
                                + follow_up_response.usage.input_tokens + follow_up_response.usage.output_tokens)
            }

        return {
            'response': response.content[0].text,
            'tokens_used': response.usage.input_tokens + response.usage.output_tokens
        }


class GPTAdapter:
    """
    Adapter for OpenAI GPT API integration
    """

    def __init__(self, config):
        self.config = config
        # Async client so the awaited API calls below work
        self.client = openai.AsyncOpenAI(api_key=config['api_key'])
        self.model = config.get('model', 'gpt-4-turbo-preview')

    async def execute_task(self, task_definition):
        """Execute task using OpenAI GPT API"""

        # Convert universal task format to OpenAI format
        openai_request = await self.convert_to_openai_format(task_definition)

        # Handle function calling for tool use
        if task_definition['type'] == 'tool_use':
            return await self.execute_function_calling_task(openai_request)
        else:
            return await self.execute_chat_completion(openai_request)

    async def execute_function_calling_task(self, openai_request):
        """Execute task with OpenAI function calling"""

        functions = [
            {
                "name": "analyze_code",
                "description": "Analyze code structure and identify patterns",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string", "description": "The code to analyze"},
                        "language": {"type": "string", "description": "Programming language"},
                        "focus": {"type": "string", "description": "Analysis focus area"}
                    },
                    "required": ["code", "language"]
                }
            }
        ]

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=openai_request['messages'],
            functions=functions,
            function_call="auto"
        )

        # Handle function calls
        if response.choices[0].message.function_call:
            function_name = response.choices[0].message.function_call.name
            function_args = json.loads(response.choices[0].message.function_call.arguments)

            function_result = await self.execute_function(function_name, function_args)

            # Continue conversation with function result
            follow_up_response = await self.client.chat.completions.create(
                model=self.model,
                messages=[
                    *openai_request['messages'],
                    {
                        "role": "assistant",
                        "content": None,
                        "function_call": response.choices[0].message.function_call
                    },
                    {
                        "role": "function",
                        "name": function_name,
                        "content": str(function_result)
                    }
                ]
            )

            return {
                'response': follow_up_response.choices[0].message.content,
                'function_calls': [{
                    'name': function_name,
                    'arguments': function_args,
                    'result': function_result
                }],
                'tokens_used': response.usage.total_tokens + follow_up_response.usage.total_tokens
            }

        return {
            'response': response.choices[0].message.content,
            'tokens_used': response.usage.total_tokens
        }
```
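
Both adapters call `convert_to_claude_format` and `convert_to_openai_format`, which are not shown in this document. A minimal sketch of what those methods could look like is below; the universal task fields `prompt`, `system`, and `history` are assumptions for illustration, since the universal request schema is not pinned down here.

```python
async def convert_to_claude_format(self, task_definition):
    """Sketch: map the universal task format onto Anthropic's messages structure."""
    messages = list(task_definition.get('history', []))
    messages.append({"role": "user", "content": task_definition['prompt']})
    return {"messages": messages, "system": task_definition.get('system', "")}


async def convert_to_openai_format(self, task_definition):
    """Sketch: map the universal task format onto OpenAI's chat messages structure."""
    messages = []
    if task_definition.get('system'):
        # OpenAI carries the system instruction as the first message
        messages.append({"role": "system", "content": task_definition['system']})
    messages.extend(task_definition.get('history', []))
    messages.append({"role": "user", "content": task_definition['prompt']})
    return {"messages": messages}
```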

### Cost Optimization Engine

#### Intelligent Cost Management

```python
class CostOptimizationEngine:
    """
    Intelligent cost optimization for multi-LLM usage
    """

    def __init__(self):
        self.cost_models = {}
        self.usage_history = []
        self.budget_limits = {}
        self.cost_alerts = []

    async def optimize_llm_selection(self, task_requirements, available_providers):
        """
        Select LLM based on cost efficiency while maintaining quality
        """
        optimization_analysis = {
            'task_requirements': task_requirements,
            'cost_analysis': {},
            'quality_predictions': {},
            'optimization_recommendation': {}
        }

        # Estimate costs for each provider
        for provider in available_providers:
            cost_estimate = await self.estimate_task_cost(task_requirements, provider)
            quality_prediction = await self.predict_task_quality(task_requirements, provider)

            optimization_analysis['cost_analysis'][provider] = cost_estimate
            optimization_analysis['quality_predictions'][provider] = quality_prediction

        # Calculate cost-quality efficiency
        efficiency_scores = {}
        for provider in available_providers:
            cost = optimization_analysis['cost_analysis'][provider]['estimated_cost']
            quality = optimization_analysis['quality_predictions'][provider]['quality_score']

            # Higher quality per dollar is better
            efficiency_scores[provider] = quality / cost if cost > 0 else 0

        # Select most efficient provider
        optimal_provider = max(efficiency_scores.items(), key=lambda x: x[1])

        optimization_analysis['optimization_recommendation'] = {
            'recommended_provider': optimal_provider[0],
            'efficiency_score': optimal_provider[1],
            'cost_savings': calculate_cost_savings(optimization_analysis),
            'quality_impact': assess_quality_impact(optimization_analysis, optimal_provider[0])
        }

        return optimization_analysis

    async def monitor_budget_usage(self):
        """
        Monitor and alert on budget usage across all LLM providers
        """
        budget_status = {}

        for provider, budget_limit in self.budget_limits.items():
            current_usage = await self.calculate_current_usage(provider)

            budget_status[provider] = {
                'budget_limit': budget_limit,
                'current_usage': current_usage,
                'remaining_budget': budget_limit - current_usage,
                'usage_percentage': (current_usage / budget_limit) * 100,
                'projected_monthly_usage': await self.project_monthly_usage(provider)
            }

            # Generate alerts for high usage
            if budget_status[provider]['usage_percentage'] > 80:
                alert = {
                    'provider': provider,
                    'alert_type': 'budget_warning',
                    'usage_percentage': budget_status[provider]['usage_percentage'],
                    'projected_overage': budget_status[provider]['projected_monthly_usage'] - budget_limit,
                    'recommended_actions': await self.generate_cost_reduction_recommendations(provider)
                }
                self.cost_alerts.append(alert)

        return {
            'budget_status': budget_status,
            'alerts': self.cost_alerts,
            'optimization_recommendations': await self.generate_optimization_recommendations(budget_status)
        }
```
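
The engine's `estimate_task_cost` method is referenced but not defined above. A minimal standalone sketch of a token-based estimator is shown below, standing in for that method; the per-million-token prices are illustrative placeholders (real prices change often and should come from a maintained pricing table), and the field names are assumptions.

```python
# Illustrative per-million-token prices (input, output) in USD; placeholders only.
ASSUMED_PRICING = {
    "anthropic": (3.00, 15.00),
    "openai": (10.00, 30.00),
    "google": (3.50, 10.50),
}


def estimate_task_cost(task_requirements, provider, pricing=ASSUMED_PRICING):
    """Rough cost estimate from expected input/output token counts."""
    input_price, output_price = pricing[provider]
    input_tokens = task_requirements.get("expected_input_tokens", 2_000)
    output_tokens = task_requirements.get("expected_output_tokens", 1_000)
    estimated_cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return {"estimated_cost": estimated_cost, "provider": provider}
```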

### LLM Integration Commands

```bash
# LLM provider management
bmad llm register --provider "anthropic" --model "claude-3-sonnet" --api-key "sk-..."
bmad llm register --provider "openai" --model "gpt-4-turbo" --api-key "sk-..."
bmad llm register --provider "google" --model "gemini-pro" --credentials "path/to/creds.json"

# LLM capability testing and optimization
bmad llm test-capabilities --provider "all" --benchmark-performance
bmad llm optimize --cost-efficiency --quality-threshold "0.8"
bmad llm route --task "code-generation" --show-reasoning

# Cost management and monitoring
bmad llm costs --analyze --time-period "last-month"
bmad llm budget --set-limit "anthropic:1000" "openai:500"
bmad llm optimize-costs --aggressive --maintain-quality

# LLM performance monitoring
bmad llm monitor --real-time --performance-alerts
bmad llm benchmark --compare-providers --task-types "code,reasoning,analysis"
bmad llm health --check-all-providers --connectivity-test
```

This Universal LLM Interface makes the BMAD system provider-agnostic: tasks are routed to the LLM that best balances cost, performance, and quality, and the system learns from usage history to keep improving its routing decisions and cost efficiency.