# Universal LLM Interface

## Multi-Provider LLM Abstraction for Enhanced BMAD System

The Universal LLM Interface provides seamless integration with multiple LLM providers, enabling the BMAD system to work with Claude, GPT, Gemini, DeepSeek, Llama, and any future LLM while optimizing for cost, capability, and performance.

### LLM Abstraction Architecture

#### Universal LLM Provider Framework

```yaml
llm_provider_architecture:
  core_abstraction:
    universal_interface:
      - standardized_request_format: "Common interface for all LLM interactions"
      - response_normalization: "Unified response structure across providers"
      - capability_detection: "Automatic detection of LLM-specific capabilities"
      - error_handling: "Graceful degradation and fallback mechanisms"
      - cost_tracking: "Real-time cost monitoring and optimization"

  provider_adapters:
    anthropic_claude:
      - api_integration: "Native Claude API integration"
      - tool_use_support: "Advanced tool use capabilities"
      - function_calling: "Native function calling support"
      - streaming_support: "Real-time streaming responses"
      - context_windows: "Large context window optimization"

    openai_gpt:
      - gpt4_integration: "GPT-4 and GPT-4 Turbo support"
      - function_calling: "OpenAI function calling format"
      - vision_capabilities: "GPT-4 Vision integration"
      - code_interpreter: "Code execution capabilities"
      - assistant_api: "OpenAI Assistant API integration"

    google_gemini:
      - gemini_pro_integration: "Gemini Pro and Ultra support"
      - multimodal_capabilities: "Text, image, and video processing"
      - code_execution: "Native code execution environment"
      - safety_filters: "Built-in safety and content filtering"
      - vertex_ai_integration: "Enterprise Vertex AI support"

    deepseek_coder:
      - code_specialization: "Code-focused LLM optimization"
      - repository_understanding: "Large codebase comprehension"
      - code_generation: "Advanced code generation capabilities"
      - technical_reasoning: "Deep technical problem solving"

    meta_llama:
      - open_source_integration: "Llama 2 and Code Llama support"
      - local_deployment: "On-premises deployment support"
      - fine_tuning: "Custom model fine-tuning capabilities"
      - privacy_preservation: "Complete data privacy control"

    custom_providers:
      - plugin_architecture: "Support for custom LLM providers"
      - api_adaptation: "Automatic API format adaptation"
      - capability_mapping: "Custom capability definition"
      - performance_monitoring: "Custom provider performance tracking"
```
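
The `custom_providers` plugin architecture implies a common contract that every adapter, built-in or custom, has to satisfy. Below is a minimal sketch of what that contract could look like; `LLMProviderAdapter`, `LLMResponse`, and their fields are illustrative names, not part of an existing BMAD codebase.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class LLMResponse:
    """Normalized response shared by every provider adapter."""
    text: str
    tokens_used: int
    provider: str
    raw: Any = None  # provider-specific payload, kept for debugging
    metadata: Dict[str, Any] = field(default_factory=dict)


class LLMProviderAdapter(ABC):
    """Contract a provider plugin implements to be registered with the interface."""

    @abstractmethod
    async def execute_task(self, task_definition: Dict[str, Any]) -> LLMResponse:
        """Run a task expressed in the universal request format."""

    @abstractmethod
    def capabilities(self) -> Dict[str, bool]:
        """Report supported features, e.g. {"tool_use": True, "streaming": False}."""
```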

#### LLM Capability Detection and Routing

```python
from datetime import datetime


async def detect_llm_capabilities(provider_name, model_name):
    """
    Automatically detect and catalog LLM capabilities for intelligent routing
    """
    capability_detection = {
        'provider': provider_name,
        'model': model_name,
        'capabilities': {},
        'performance_metrics': {},
        'cost_metrics': {},
        'limitations': {}
    }

    # Test core capabilities
    core_capabilities = await test_core_llm_capabilities(provider_name, model_name)

    # Test specialized capabilities
    specialized_capabilities = {
        'code_generation': await test_code_generation_capability(provider_name, model_name),
        'code_analysis': await test_code_analysis_capability(provider_name, model_name),
        'function_calling': await test_function_calling_capability(provider_name, model_name),
        'tool_use': await test_tool_use_capability(provider_name, model_name),
        'multimodal': await test_multimodal_capability(provider_name, model_name),
        'reasoning': await test_reasoning_capability(provider_name, model_name),
        'context_handling': await test_context_handling_capability(provider_name, model_name),
        'streaming': await test_streaming_capability(provider_name, model_name)
    }

    # Performance benchmarking
    performance_metrics = await benchmark_llm_performance(provider_name, model_name)

    # Cost analysis
    cost_metrics = await analyze_llm_costs(provider_name, model_name)

    capability_detection.update({
        'capabilities': {**core_capabilities, **specialized_capabilities},
        'performance_metrics': performance_metrics,
        'cost_metrics': cost_metrics,
        'detection_timestamp': datetime.utcnow().isoformat(),
        'confidence_score': calculate_capability_confidence(core_capabilities, specialized_capabilities)
    })

    return capability_detection

async def intelligent_llm_routing(task_requirements, available_providers):
    """
    Intelligently route tasks to the optimal LLM based on capabilities, cost, and performance
    """
    routing_analysis = {
        'task_requirements': task_requirements,
        'candidate_providers': [],
        'routing_decision': {},
        'fallback_options': [],
        'cost_optimization': {}
    }

    # Analyze task requirements
    task_analysis = await analyze_task_requirements(task_requirements)

    # Score each available provider
    for provider in available_providers:
        provider_score = await score_provider_for_task(provider, task_analysis)

        routing_candidate = {
            'provider': provider,
            'capability_match': provider_score['capability_match'],
            'performance_score': provider_score['performance_score'],
            'cost_efficiency': provider_score['cost_efficiency'],
            'reliability_score': provider_score['reliability_score'],
            'overall_score': calculate_overall_provider_score(provider_score)
        }

        routing_analysis['candidate_providers'].append(routing_candidate)

    # Select optimal provider
    optimal_provider = select_optimal_provider(routing_analysis['candidate_providers'])

    # Define fallback strategy
    fallback_providers = define_fallback_strategy(
        routing_analysis['candidate_providers'],
        optimal_provider
    )

    routing_analysis.update({
        'routing_decision': optimal_provider,
        'fallback_options': fallback_providers,
        'cost_optimization': calculate_cost_optimization(optimal_provider, task_analysis)
    })

    return routing_analysis

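
# calculate_overall_provider_score is referenced above but not defined in this
# document. A plausible sketch is a weighted sum of the four per-provider scores;
# the weights below are illustrative assumptions, not tuned BMAD values.
def calculate_overall_provider_score(provider_score, weights=None):
    """Combine capability, performance, cost, and reliability into one score in [0, 1]."""
    weights = weights or {
        'capability_match': 0.4,    # does the provider support what the task needs?
        'performance_score': 0.25,  # observed latency and quality benchmarks
        'cost_efficiency': 0.2,     # quality delivered per dollar
        'reliability_score': 0.15,  # historical error and timeout rate
    }
    return sum(provider_score[key] * weight for key, weight in weights.items())
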
class UniversalLLMInterface:
    """
    Universal interface for interacting with multiple LLM providers
    """

    def __init__(self):
        self.providers = {}
        self.capability_cache = {}
        self.cost_tracker = CostTracker()
        self.performance_monitor = PerformanceMonitor()

    async def register_provider(self, provider_name, provider_config):
        """Register a new LLM provider"""
        provider_adapter = await create_provider_adapter(provider_name, provider_config)

        # Test provider connectivity
        connectivity_test = await test_provider_connectivity(provider_adapter)

        if connectivity_test.success:
            self.providers[provider_name] = provider_adapter

            # Detect and cache capabilities
            capabilities = await detect_llm_capabilities(
                provider_name,
                provider_config.get('model', 'default')
            )
            self.capability_cache[provider_name] = capabilities

            return {
                'registration_status': 'success',
                'provider': provider_name,
                'capabilities': capabilities,
                'ready_for_use': True
            }
        else:
            return {
                'registration_status': 'failed',
                'provider': provider_name,
                'error': connectivity_test.error,
                'ready_for_use': False
            }

    async def execute_task(self, task_definition, routing_preferences=None):
        """
        Execute a task using the optimal LLM provider
        """
        # Determine optimal provider
        routing_decision = await intelligent_llm_routing(
            task_definition,
            list(self.providers.keys())
        )

        optimal_provider = routing_decision['routing_decision']['provider']

        # Execute task with monitoring
        execution_start = datetime.utcnow()

        try:
            # Execute with primary provider
            result = await self.providers[optimal_provider].execute_task(task_definition)

            execution_duration = (datetime.utcnow() - execution_start).total_seconds()

            # Track performance and costs
            await self.performance_monitor.record_execution(
                optimal_provider,
                task_definition,
                result,
                execution_duration
            )

            await self.cost_tracker.record_usage(
                optimal_provider,
                task_definition,
                result
            )

            return {
                'result': result,
                'provider_used': optimal_provider,
                'execution_time': execution_duration,
                'routing_analysis': routing_decision,
                'status': 'success'
            }

        except Exception as e:
            # Try fallback providers
            for fallback_provider in routing_decision['fallback_options']:
                try:
                    fallback_result = await self.providers[fallback_provider['provider']].execute_task(
                        task_definition
                    )

                    execution_duration = (datetime.utcnow() - execution_start).total_seconds()

                    return {
                        'result': fallback_result,
                        'provider_used': fallback_provider['provider'],
                        'execution_time': execution_duration,
                        'primary_provider_failed': optimal_provider,
                        'fallback_used': True,
                        'status': 'success_with_fallback'
                    }

                except Exception:
                    continue

            # All providers failed
            return {
                'status': 'failed',
                'primary_provider': optimal_provider,
                'primary_error': str(e),
                'fallback_attempts': len(routing_decision['fallback_options']),
                'execution_time': (datetime.utcnow() - execution_start).total_seconds()
            }
```
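
A short usage sketch of the interface above, assuming an async entry point; the provider config keys and the task fields shown here are illustrative and mirror the `bmad llm register` commands later in this document rather than a fixed schema.

```python
import asyncio


async def main():
    llm = UniversalLLMInterface()

    # Register two providers; config keys are illustrative.
    await llm.register_provider("anthropic", {"api_key": "sk-...", "model": "claude-3-sonnet-20240229"})
    await llm.register_provider("openai", {"api_key": "sk-...", "model": "gpt-4-turbo-preview"})

    # Routing picks the provider; fallbacks are handled inside execute_task.
    outcome = await llm.execute_task({
        "type": "code_generation",
        "prompt": "Write a function that deduplicates a list while preserving order.",
    })
    print(outcome["provider_used"], outcome["status"])


asyncio.run(main())
```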

### Provider-Specific Adapters

#### Claude Adapter Implementation

```python
import json

import anthropic
import openai


class ClaudeAdapter:
    """
    Adapter for Anthropic Claude API integration
    """

    def __init__(self, config):
        self.config = config
        # Async client so the awaited API calls below work
        self.client = anthropic.AsyncAnthropic(api_key=config['api_key'])
        self.model = config.get('model', 'claude-3-sonnet-20240229')

    async def execute_task(self, task_definition):
        """Execute task using Claude API"""

        # Convert universal task format to Claude format
        claude_request = await self.convert_to_claude_format(task_definition)

        # Handle different task types
        if task_definition['type'] == 'code_analysis':
            return await self.execute_code_analysis(claude_request)
        elif task_definition['type'] == 'code_generation':
            return await self.execute_code_generation(claude_request)
        elif task_definition['type'] == 'reasoning':
            return await self.execute_reasoning_task(claude_request)
        elif task_definition['type'] == 'tool_use':
            return await self.execute_tool_use_task(claude_request)
        else:
            return await self.execute_general_task(claude_request)

    async def execute_tool_use_task(self, claude_request):
        """Execute task with Claude tool use capabilities"""

        # Define available tools for Claude
        tools = [
            {
                "name": "code_analyzer",
                "description": "Analyze code structure and patterns",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string"},
                        "language": {"type": "string"},
                        "analysis_type": {"type": "string"}
                    }
                }
            },
            {
                "name": "file_navigator",
                "description": "Navigate and understand file structures",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "path": {"type": "string"},
                        "operation": {"type": "string"}
                    }
                }
            }
        ]

        response = await self.client.messages.create(
            model=self.model,
            max_tokens=4000,
            tools=tools,
            messages=claude_request['messages']
        )

        # Handle tool use responses
        if response.stop_reason == "tool_use":
            # Only the tool_use blocks carry tool calls; text blocks are skipped
            tool_uses = [block for block in response.content if block.type == "tool_use"]
            tool_results = []
            for tool_use in tool_uses:
                tool_result = await self.execute_tool(tool_use.name, tool_use.input)
                tool_results.append(tool_result)

            # Continue conversation with tool results
            follow_up_response = await self.client.messages.create(
                model=self.model,
                max_tokens=4000,
                messages=[
                    *claude_request['messages'],
                    {"role": "assistant", "content": response.content},
                    {"role": "user", "content": [
                        {"type": "tool_result", "tool_use_id": tool_use.id, "content": str(result)}
                        for tool_use, result in zip(tool_uses, tool_results)
                    ]}
                ]
            )

            return {
                'response': follow_up_response.content[0].text,
                'tool_uses': tool_results,
                'tokens_used': (response.usage.input_tokens + response.usage.output_tokens
                                + follow_up_response.usage.input_tokens + follow_up_response.usage.output_tokens)
            }

        return {
            'response': response.content[0].text,
            'tokens_used': response.usage.input_tokens + response.usage.output_tokens
        }


class GPTAdapter:
    """
    Adapter for OpenAI GPT API integration
    """

    def __init__(self, config):
        self.config = config
        # Async client so the awaited API calls below work
        self.client = openai.AsyncOpenAI(api_key=config['api_key'])
        self.model = config.get('model', 'gpt-4-turbo-preview')

    async def execute_task(self, task_definition):
        """Execute task using OpenAI GPT API"""

        # Convert universal task format to OpenAI format
        openai_request = await self.convert_to_openai_format(task_definition)

        # Handle function calling for tool use
        if task_definition['type'] == 'tool_use':
            return await self.execute_function_calling_task(openai_request)
        else:
            return await self.execute_chat_completion(openai_request)

    async def execute_function_calling_task(self, openai_request):
        """Execute task with OpenAI function calling"""

        functions = [
            {
                "name": "analyze_code",
                "description": "Analyze code structure and identify patterns",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "code": {"type": "string", "description": "The code to analyze"},
                        "language": {"type": "string", "description": "Programming language"},
                        "focus": {"type": "string", "description": "Analysis focus area"}
                    },
                    "required": ["code", "language"]
                }
            }
        ]

        response = await self.client.chat.completions.create(
            model=self.model,
            messages=openai_request['messages'],
            functions=functions,
            function_call="auto"
        )

        # Handle function calls
        if response.choices[0].message.function_call:
            function_name = response.choices[0].message.function_call.name
            function_args = json.loads(response.choices[0].message.function_call.arguments)

            function_result = await self.execute_function(function_name, function_args)

            # Continue conversation with function result
            follow_up_response = await self.client.chat.completions.create(
                model=self.model,
                messages=[
                    *openai_request['messages'],
                    {
                        "role": "assistant",
                        "content": None,
                        "function_call": response.choices[0].message.function_call
                    },
                    {
                        "role": "function",
                        "name": function_name,
                        "content": str(function_result)
                    }
                ]
            )

            return {
                'response': follow_up_response.choices[0].message.content,
                'function_calls': [{
                    'name': function_name,
                    'arguments': function_args,
                    'result': function_result
                }],
                'tokens_used': response.usage.total_tokens + follow_up_response.usage.total_tokens
            }

        return {
            'response': response.choices[0].message.content,
            'tokens_used': response.usage.total_tokens
        }
```
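
Both adapters call `convert_to_claude_format` and `convert_to_openai_format`, which are not shown in this document. A minimal sketch of what those methods could look like is below; the universal task fields `prompt`, `system`, and `history` are assumptions for illustration, since the universal request schema is not pinned down here.

```python
async def convert_to_claude_format(self, task_definition):
    """Sketch: map the universal task format onto Anthropic's messages structure."""
    messages = list(task_definition.get('history', []))
    messages.append({"role": "user", "content": task_definition['prompt']})
    return {"messages": messages, "system": task_definition.get('system', "")}


async def convert_to_openai_format(self, task_definition):
    """Sketch: map the universal task format onto OpenAI's chat messages structure."""
    messages = []
    if task_definition.get('system'):
        # OpenAI carries the system instruction as the first message
        messages.append({"role": "system", "content": task_definition['system']})
    messages.extend(task_definition.get('history', []))
    messages.append({"role": "user", "content": task_definition['prompt']})
    return {"messages": messages}
```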

### Cost Optimization Engine

#### Intelligent Cost Management

```python
class CostOptimizationEngine:
    """
    Intelligent cost optimization for multi-LLM usage
    """

    def __init__(self):
        self.cost_models = {}
        self.usage_history = []
        self.budget_limits = {}
        self.cost_alerts = []

    async def optimize_llm_selection(self, task_requirements, available_providers):
        """
        Select LLM based on cost efficiency while maintaining quality
        """
        optimization_analysis = {
            'task_requirements': task_requirements,
            'cost_analysis': {},
            'quality_predictions': {},
            'optimization_recommendation': {}
        }

        # Estimate costs for each provider
        for provider in available_providers:
            cost_estimate = await self.estimate_task_cost(task_requirements, provider)
            quality_prediction = await self.predict_task_quality(task_requirements, provider)

            optimization_analysis['cost_analysis'][provider] = cost_estimate
            optimization_analysis['quality_predictions'][provider] = quality_prediction

        # Calculate cost-quality efficiency
        efficiency_scores = {}
        for provider in available_providers:
            cost = optimization_analysis['cost_analysis'][provider]['estimated_cost']
            quality = optimization_analysis['quality_predictions'][provider]['quality_score']

            # Higher quality per dollar is better
            efficiency_scores[provider] = quality / cost if cost > 0 else 0

        # Select most efficient provider
        optimal_provider = max(efficiency_scores.items(), key=lambda x: x[1])

        optimization_analysis['optimization_recommendation'] = {
            'recommended_provider': optimal_provider[0],
            'efficiency_score': optimal_provider[1],
            'cost_savings': calculate_cost_savings(optimization_analysis),
            'quality_impact': assess_quality_impact(optimization_analysis, optimal_provider[0])
        }

        return optimization_analysis

    async def monitor_budget_usage(self):
        """
        Monitor and alert on budget usage across all LLM providers
        """
        budget_status = {}

        for provider, budget_limit in self.budget_limits.items():
            current_usage = await self.calculate_current_usage(provider)

            budget_status[provider] = {
                'budget_limit': budget_limit,
                'current_usage': current_usage,
                'remaining_budget': budget_limit - current_usage,
                'usage_percentage': (current_usage / budget_limit) * 100,
                'projected_monthly_usage': await self.project_monthly_usage(provider)
            }

            # Generate alerts for high usage
            if budget_status[provider]['usage_percentage'] > 80:
                alert = {
                    'provider': provider,
                    'alert_type': 'budget_warning',
                    'usage_percentage': budget_status[provider]['usage_percentage'],
                    'projected_overage': budget_status[provider]['projected_monthly_usage'] - budget_limit,
                    'recommended_actions': await self.generate_cost_reduction_recommendations(provider)
                }
                self.cost_alerts.append(alert)

        return {
            'budget_status': budget_status,
            'alerts': self.cost_alerts,
            'optimization_recommendations': await self.generate_optimization_recommendations(budget_status)
        }
```
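
The engine's `estimate_task_cost` method is referenced but not defined above. A minimal standalone sketch of a token-based estimator is shown below, standing in for that method; the per-million-token prices are illustrative placeholders (real prices change often and should come from a maintained pricing table), and the field names are assumptions.

```python
# Illustrative per-million-token prices (input, output) in USD; placeholders only.
ASSUMED_PRICING = {
    "anthropic": (3.00, 15.00),
    "openai": (10.00, 30.00),
    "google": (3.50, 10.50),
}


def estimate_task_cost(task_requirements, provider, pricing=ASSUMED_PRICING):
    """Rough cost estimate from expected input/output token counts."""
    input_price, output_price = pricing[provider]
    input_tokens = task_requirements.get("expected_input_tokens", 2_000)
    output_tokens = task_requirements.get("expected_output_tokens", 1_000)
    estimated_cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return {"estimated_cost": estimated_cost, "provider": provider}
```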

### LLM Integration Commands

```bash
# LLM provider management
bmad llm register --provider "anthropic" --model "claude-3-sonnet" --api-key "sk-..."
bmad llm register --provider "openai" --model "gpt-4-turbo" --api-key "sk-..."
bmad llm register --provider "google" --model "gemini-pro" --credentials "path/to/creds.json"

# LLM capability testing and optimization
bmad llm test-capabilities --provider "all" --benchmark-performance
bmad llm optimize --cost-efficiency --quality-threshold "0.8"
bmad llm route --task "code-generation" --show-reasoning

# Cost management and monitoring
bmad llm costs --analyze --time-period "last-month"
bmad llm budget --set-limit "anthropic:1000" "openai:500"
bmad llm optimize-costs --aggressive --maintain-quality

# LLM performance monitoring
bmad llm monitor --real-time --performance-alerts
bmad llm benchmark --compare-providers --task-types "code,reasoning,analysis"
bmad llm health --check-all-providers --connectivity-test
```

This Universal LLM Interface makes the BMAD system provider-agnostic: tasks are routed to the LLM that best balances cost, performance, and quality, and the system learns from usage history to keep improving its routing decisions and cost efficiency.