Production-Ready Multi-Model Gateway: Choosing Between AWS Bedrock and LiteLLM

When building production applications that require multiple AI models, you face a critical architectural decision: how to manage model routing, fallbacks, and provider abstraction without creating technical debt.
This guide examines two approaches—AWS Bedrock's fully managed solution and LiteLLM's open-source gateway—and provides a complete production deployment for LiteLLM when it's the right choice.

The Multi-Model Challenge

Your application needs different models for different tasks: Claude Sonnet for complex reasoning, GPT-4 for specific capabilities, Haiku for high-throughput simple tasks. Managing multiple provider SDKs creates several problems:

Different request formats across providers:

# OpenAI
from openai import OpenAI

openai_client = OpenAI()
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic (different structure)
import anthropic

anthropic_client = anthropic.Anthropic()
response = anthropic_client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)

# AWS Bedrock (entirely different SDK)
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}]
    })
)

Operational complexity: Three billing systems, three sets of credentials, inconsistent error handling, and code changes required to switch models.

Cost visibility: Without unified tracking, understanding which models drive costs becomes difficult.

AWS Bedrock: The Fully Managed Approach

AWS Bedrock should be your default choice for multi-model AI deployments. It provides native AWS integration with several advantages:

Bedrock Strengths

Fully managed infrastructure: No containers, no database maintenance, no gateway upgrades. AWS handles availability, scaling, and security patches.

Available models: Claude (Anthropic), Llama (Meta), Mistral, Cohere, Titan (Amazon), Jurassic (AI21), and others—all accessible through a single API.

Unified billing and cost allocation: Native AWS Cost Explorer integration, resource tagging for cost attribution, and consolidated billing across all models.

Security and compliance: VPC endpoints for private connectivity, AWS IAM for access control, encryption at rest and in transit, and compliance certifications (SOC, HIPAA, GDPR).

Built-in capabilities: Model evaluation tools, knowledge bases, agents framework, and guardrails for content filtering.

Bedrock Limitations

Model availability: Not all providers are available. If you need models outside Bedrock's catalog (newer GPT versions, specialized models, or providers not yet integrated), Bedrock isn't an option.

Cost structure: Bedrock pricing includes an AWS markup over direct provider pricing. For high-volume applications, that markup can add up to a significant premium.

Flexibility constraints: Limited control over caching strategies, routing logic, and fallback behavior compared to self-managed solutions.

Regional availability: Some models are only available in specific AWS regions, potentially increasing latency for global deployments.

When to Use Bedrock

• All required models are available in Bedrock
• You prioritize operational simplicity over cost optimization
• AWS-native compliance and security features are requirements
• Your organization standardizes on AWS managed services

LiteLLM: When You Need More Control

LiteLLM is appropriate when:

1. Required models aren't in Bedrock: You need OpenAI GPT-4, specific versions, or providers not yet available in Bedrock
2. Cost optimization is critical: Direct provider pricing without AWS markup matters for high-volume workloads
3. Advanced routing logic: You need sophisticated fallback chains, custom load balancing, or A/B testing between models
4. Caching requirements: Aggressive caching strategies to reduce costs and latency beyond what Bedrock offers

LiteLLM provides a unified OpenAI-compatible API that translates requests to any provider. It's a proven open-source solution (used in production by many organizations) that runs on your infrastructure.

LiteLLM Architecture Overview

[Application] → [ALB] → [ECS Tasks] → [AI Providers]
                             ↓
              [RDS PostgreSQL]    ← Logs, config, analytics
              [ElastiCache Redis] ← Caching, rate limiting
              [Secrets Manager]   ← API keys

Unified request format:

# Same format for any provider
import requests

api_key = "sk-litellm-your-master-key"  # a LiteLLM key, not a provider key

response = requests.post(
    "https://your-gateway.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "claude-3-sonnet",  # or "gpt-4", or "bedrock-claude-sonnet"
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
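
Because LiteLLM exposes the OpenAI wire format, existing OpenAI SDK code can also point at the gateway by overriding the base URL. A minimal sketch using the 1.x openai client (the gateway URL and key are the same placeholders as above):

from openai import OpenAI

# Point the standard OpenAI client at the gateway instead of api.openai.com
client = OpenAI(
    base_url="https://your-gateway.com/v1",  # placeholder gateway URL
    api_key="sk-litellm-your-master-key",    # placeholder LiteLLM key
)

response = client.chat.completions.create(
    model="claude-3-sonnet",  # any model_name defined in the gateway config
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)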

Key capabilities:

• Request/response caching to reduce costs
• Automatic fallbacks when primary models fail
• Load balancing across multiple API keys
• Detailed cost tracking and budget alerts
• Usage analytics per user, team, or feature

Infrastructure cost: Approximately $80-150/month for moderate usage (1-5M requests/month), scaling with traffic.

Production Deployment Guide: LiteLLM on AWS ECS

This deployment is production-ready with proper security, monitoring, and scalability. Budget approximately 60-90 minutes for complete setup, including testing.

Prerequisites

• AWS Account with appropriate IAM permissions
• AWS CLI configured: aws configure
• Docker installed locally
• Domain name (optional, can use ALB DNS)
• API keys for required providers

Step 1: Database Infrastructure (RDS PostgreSQL)

LiteLLM requires a database for configuration, request logs, and analytics.

Create RDS security group:

aws ec2 create-security-group \
--group-name litellm-rds-sg \
--description "LiteLLM RDS security group" \
--vpc-id vpc-12345678

Create the database (note --db-name, which creates the litellm database referenced by the gateway's connection string):

aws rds create-db-instance \
--db-instance-identifier litellm-db \
--db-instance-class db.t3.micro \
--engine postgres \
--engine-version 15.4 \
--db-name litellm \
--master-username litellm \
--master-user-password 'YourSecurePassword123!' \
--allocated-storage 20 \
--vpc-security-group-ids sg-rds12345678 \
--db-subnet-group-name your-db-subnet-group \
--backup-retention-period 7 \
--no-publicly-accessible \
--storage-encrypted

Wait for availability and get the endpoint:

aws rds wait db-instance-available --db-instance-identifier litellm-db

aws rds describe-db-instances \
--db-instance-identifier litellm-db \
--query 'DBInstances[0].Endpoint.Address' \
--output text

Step 2: Caching Layer (ElastiCache Redis)

Redis handles request caching and distributed rate limiting across ECS tasks.

Create Redis security group:

aws ec2 create-security-group \
--group-name litellm-redis-sg \
--description "LiteLLM Redis security group" \
--vpc-id vpc-12345678

Create Redis cluster:

aws elasticache create-cache-cluster \
--cache-cluster-id litellm-redis \
--cache-node-type cache.t3.micro \
--engine redis \
--num-cache-nodes 1 \
--security-group-ids sg-redis12345678 \
--cache-subnet-group-name your-cache-subnet-group

aws elasticache wait cache-cluster-available --cache-cluster-id litellm-redis

Get Redis endpoint:

aws elasticache describe-cache-clusters \
--cache-cluster-id litellm-redis \
--show-cache-node-info \
--query 'CacheClusters[0].CacheNodes[0].Endpoint.Address' \
--output text

Step 3: Secrets Management

Store all sensitive credentials in AWS Secrets Manager.

Store provider API keys:

# OpenAI
aws secretsmanager create-secret \
--name litellm/openai-api-key \
--secret-string "sk-proj-your-openai-key"

# Anthropic
aws secretsmanager create-secret \
--name litellm/anthropic-api-key \
--secret-string "sk-ant-your-anthropic-key"

# Master key for client authentication
aws secretsmanager create-secret \
--name litellm/master-key \
--secret-string "sk-litellm-$(openssl rand -hex 32)"

# Database password
aws secretsmanager create-secret \
--name litellm/db-password \
--secret-string "YourSecurePassword123!"

Step 4: LiteLLM Configuration

Create litellm-config.yaml defining models, routing, and policies:

model_list:
  # OpenAI Models (not available in Bedrock)
  - model_name: gpt-4
    litellm_params:
      model: gpt-4
      api_key: os.environ/OPENAI_API_KEY

  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic Direct (better pricing than Bedrock)
  - model_name: claude-3-opus
    litellm_params:
      model: claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: claude-3-sonnet
    litellm_params:
      model: claude-3-sonnet-20240229
      api_key: os.environ/ANTHROPIC_API_KEY

  - model_name: claude-3-haiku
    litellm_params:
      model: claude-3-haiku-20240307
      api_key: os.environ/ANTHROPIC_API_KEY

  # AWS Bedrock (for models where managed service is preferred)
  - model_name: bedrock-claude-sonnet
    litellm_params:
      model: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
      aws_region_name: us-east-1

  - model_name: bedrock-claude-haiku
    litellm_params:
      model: bedrock/anthropic.claude-3-haiku-20240307-v1:0
      aws_region_name: us-east-1

# Routing and resilience
litellm_settings:
  # Automatic fallbacks
  fallbacks:
    - model: claude-3-sonnet
      fallback: [gpt-4, claude-3-opus]
    - model: gpt-4
      fallback: [claude-3-opus, bedrock-claude-sonnet]

  # Caching configuration
  cache:
    type: redis
    host: ${REDIS_HOST}
    port: 6379
    ttl: 3600  # 1 hour cache

  # Rate limiting
  max_requests_per_minute: 100

  # Budget controls
  max_budget: 1000  # $1000/month
  budget_duration: 30d

# Database and authentication
general_settings:
  database_url: postgresql://litellm:${DB_PASSWORD}@${DB_HOST}:5432/litellm
  master_key: ${LITELLM_MASTER_KEY}

  # Logging configuration
  log_requests: true
  log_response_body: false  # Privacy: don't log full responses

This file must be available to the container at /app/config.yaml, the path referenced by the task definition's command in Step 5, for example by baking it into a small custom image layered on the LiteLLM base image or fetching it from S3 at container startup.

Step 5: ECS Infrastructure

Create ECS cluster:

aws ecs create-cluster --cluster-name litellm-cluster

Create ECS security group:

aws ec2 create-security-group \
--group-name litellm-ecs-sg \
--description "LiteLLM ECS tasks" \
--vpc-id vpc-12345678

Create IAM execution role:

Save as ecs-trust-policy.json:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ecs-tasks.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}

Create role and attach policies:

aws iam create-role \
--role-name litellm-ecs-execution-role \
--assume-role-policy-document file://ecs-trust-policy.json

aws iam attach-role-policy \
--role-name litellm-ecs-execution-role \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

aws iam attach-role-policy \
--role-name litellm-ecs-execution-role \
--policy-arn arn:aws:iam::aws:policy/SecretsManagerReadWrite

For production, replace SecretsManagerReadWrite with a custom policy scoped to secretsmanager:GetSecretValue on the litellm/* secrets.

Create IAM task role (for Bedrock access):

aws iam create-role \
--role-name litellm-ecs-task-role \
--assume-role-policy-document file://ecs-trust-policy.json

aws iam attach-role-policy \
--role-name litellm-ecs-task-role \
--policy-arn arn:aws:iam::aws:policy/AmazonBedrockFullAccess

Create task definition:

Save as litellm-task-definition.json:

{
  "family": "litellm-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/litellm-ecs-execution-role",
  "taskRoleArn": "arn:aws:iam::123456789012:role/litellm-ecs-task-role",
  "containerDefinitions": [{
    "name": "litellm",
    "image": "ghcr.io/berriai/litellm:main-latest",
    "portMappings": [{
      "containerPort": 4000,
      "protocol": "tcp"
    }],
    "environment": [
      {"name": "DB_HOST", "value": "litellm-db.xxxxxxxxx.us-east-1.rds.amazonaws.com"},
      {"name": "REDIS_HOST", "value": "litellm-redis.xxxxxx.0001.use1.cache.amazonaws.com"},
      {"name": "AWS_REGION_NAME", "value": "us-east-1"}
    ],
    "secrets": [
      {"name": "OPENAI_API_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:litellm/openai-api-key"},
      {"name": "ANTHROPIC_API_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:litellm/anthropic-api-key"},
      {"name": "LITELLM_MASTER_KEY", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:litellm/master-key"},
      {"name": "DB_PASSWORD", "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:litellm/db-password"}
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/litellm",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "command": ["--config", "/app/config.yaml", "--port", "4000"]
  }]
}

Register task definition:

The awslogs configuration above requires the /ecs/litellm log group to exist before the first task starts, so create it first:

aws logs create-log-group --log-group-name /ecs/litellm

aws ecs register-task-definition --cli-input-json file://litellm-task-definition.json

Step 6: Load Balancer Configuration

Create Application Load Balancer:

aws elbv2 create-load-balancer \
--name litellm-alb \
--subnets subnet-12345678 subnet-87654321 \
--security-groups sg-alb12345678 \
--scheme internet-facing \
--type application

Create target group:

aws elbv2 create-target-group \
--name litellm-tg \
--protocol HTTP \
--port 4000 \
--vpc-id vpc-12345678 \
--target-type ip \
--health-check-path /health \
--health-check-interval-seconds 30

Create HTTPS listener (production):

aws elbv2 create-listener \
--load-balancer-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/litellm-alb/xxxxx \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=arn:aws:acm:us-east-1:123456789012:certificate/xxxxx \
--default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/litellm-tg/xxxxx

Step 7: Deploy ECS Service

Create the service:

aws ecs create-service \
--cluster litellm-cluster \
--service-name litellm-service \
--task-definition litellm-task \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-12345678,subnet-87654321],securityGroups=[sg-ecs12345678],assignPublicIp=DISABLED}" \
--load-balancers targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/litellm-tg/xxxxx,containerName=litellm,containerPort=4000

Wait for stability:

aws ecs wait services-stable --cluster litellm-cluster --services litellm-service

Get ALB endpoint:

aws elbv2 describe-load-balancers \
--names litellm-alb \
--query 'LoadBalancers[0].DNSName' \
--output text

Step 8: Configure Auto-Scaling

Production deployments need auto-scaling to handle traffic variability:

# Register scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/litellm-cluster/litellm-service \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 2 \
--max-capacity 10

# CPU-based scaling policy
aws application-autoscaling put-scaling-policy \
--policy-name litellm-cpu-scaling \
--service-namespace ecs \
--resource-id service/litellm-cluster/litellm-service \
--scalable-dimension ecs:service:DesiredCount \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
  "TargetValue": 70.0,
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
  },
  "ScaleInCooldown": 300,
  "ScaleOutCooldown": 60
}'

Production Testing

Health check:

curl https://your-gateway.com/health
# Expected: {"status": "healthy"}

Model invocation:

curl -X POST https://your-gateway.com/v1/chat/completions \
-H "Authorization: Bearer sk-litellm-your-master-key" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-3-haiku",
"messages": [{"role": "user", "content": "Say hello in 5 words"}]
}'

Fallback behavior:

curl -X POST https://your-gateway.com/v1/chat/completions \
-H "Authorization: Bearer sk-litellm-your-master-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [{"role": "user", "content": "Hello"}],
"fallbacks": ["claude-3-sonnet", "bedrock-claude-sonnet"]
}'

Production Operations

Monitoring and Observability

CloudWatch metrics to track:

# ECS CPU utilization (AWS/ECS metrics require both ClusterName and ServiceName dimensions)
aws cloudwatch get-metric-statistics \
--namespace AWS/ECS \
--metric-name CPUUtilization \
--dimensions Name=ClusterName,Value=litellm-cluster Name=ServiceName,Value=litellm-service \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-01T23:59:59Z \
--period 3600 \
--statistics Average
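
For continuous monitoring rather than ad-hoc queries, it's also worth alarming on sustained CPU. A boto3 sketch; the SNS topic ARN is an assumption, and the threshold should sit above your scaling target:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average service CPU stays above 85% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="litellm-high-cpu",
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "litellm-cluster"},
        {"Name": "ServiceName", "Value": "litellm-service"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=85.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:litellm-alerts"],  # assumed SNS topic
)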

Cost analysis from PostgreSQL:

-- Cost per model (last 7 days)
SELECT
model,
SUM(cost) as total_cost,
COUNT(*) as request_count,
AVG(cost) as avg_cost_per_request
FROM request_logs
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY model
ORDER BY total_cost DESC;

-- Error rates by provider
SELECT
model,
status_code,
COUNT(*) as error_count
FROM request_logs
WHERE status_code >= 400
AND timestamp > NOW() - INTERVAL '24 hours'
GROUP BY model, status_code
ORDER BY error_count DESC;
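
To turn these queries into a scheduled report, a short psycopg2 sketch; the request_logs table and cost/timestamp columns are taken from the queries above, so verify them against the schema your LiteLLM version actually creates:

import psycopg2

# Connection details match the gateway's database; credentials are placeholders.
conn = psycopg2.connect(
    host="litellm-db.xxxxxxxxx.us-east-1.rds.amazonaws.com",
    dbname="litellm",
    user="litellm",
    password="YourSecurePassword123!",
)

# Weekly cost-per-model report, mirroring the first query above.
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT model, SUM(cost), COUNT(*), AVG(cost)
        FROM request_logs
        WHERE timestamp > NOW() - INTERVAL '7 days'
        GROUP BY model
        ORDER BY SUM(cost) DESC
    """)
    for model, total, count, avg in cur.fetchall():
        print(f"{model}: ${total:.2f} over {count} requests (${avg:.4f}/request)")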

Common Production Issues

Issue: High first-request latency

Cause: ECS cold starts
Solution: Maintain a minimum of 2 tasks and tune the target group's deregistration delay (the ALB equivalent of connection draining) so deployments don't drop in-flight requests

Issue: Rate limiting despite fallbacks

Cause: All requests try primary model first
Solution: Implement client-side load balancing based on rate limit status
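
A hedged sketch of that client-side balancing: keep a preference-ordered model list and rotate to the next entry whenever the gateway returns HTTP 429 (URL and key are the placeholders used throughout):

import requests

GATEWAY_URL = "https://your-gateway.com"  # placeholder, as in earlier examples
API_KEY = "sk-litellm-your-master-key"    # placeholder master key
MODELS = ["gpt-4", "claude-3-sonnet", "bedrock-claude-sonnet"]  # preference order

def chat_with_rotation(prompt: str) -> str:
    """Try each model in preference order, skipping ahead on HTTP 429."""
    for model in MODELS:
        resp = requests.post(
            f"{GATEWAY_URL}/v1/chat/completions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"model": model, "messages": [{"role": "user", "content": prompt}]},
            timeout=60,
        )
        if resp.status_code == 429:
            continue  # rate limited: rotate to the next model immediately
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("All configured models are currently rate limited")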

Issue: Redis cache ineffective

Cause: Prompt variations preventing cache hits
Solution: Normalize prompts (trim whitespace, normalize formatting) before caching
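
What counts as an "equivalent" prompt is application-specific, but even minimal normalization before sending requests helps; a sketch:

import re

def normalize_prompt(text: str) -> str:
    """Collapse internal whitespace and trim, so trivially different
    phrasings of the same prompt produce identical cache keys."""
    return re.sub(r"\s+", " ", text).strip()

# Both variants now map to the same cached response
assert normalize_prompt("Hello   world\n") == normalize_prompt(" Hello world ")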

Issue: Database storage growth

Cause: Logging full request/response bodies
Solution: Set log_response_body: false, implement PostgreSQL log rotation

Decision Framework: Bedrock vs LiteLLM

Use this framework to choose the right approach:

Choose AWS Bedrock When:

• All required models are available in Bedrock
• Operational simplicity is the priority
• AWS-native compliance features are required
• Organization standardizes on AWS managed services
• Team has limited DevOps resources

Choose LiteLLM When:

• Required models aren't available in Bedrock (OpenAI GPT-4, specific providers)
• Direct provider pricing matters for cost optimization
• Advanced routing logic is needed (complex fallbacks, A/B testing)
• Aggressive caching requirements to reduce costs
• Need detailed analytics and cost tracking beyond AWS Cost Explorer

Hybrid Approach:

Many production systems use both:

Primary: AWS Bedrock for available models (managed operations)
Secondary: LiteLLM for models not in Bedrock (flexibility)
Routing: Application-level logic determines which gateway to use
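
A sketch of that routing layer, under the naming assumptions of this guide (the Bedrock model-ID mapping, gateway URL, and key are all placeholders):

import json
import boto3
import requests

# Application model name -> Bedrock model ID; this mapping is an assumption.
BEDROCK_MODELS = {
    "claude-3-sonnet": "anthropic.claude-3-sonnet-20240229-v1:0",
    "claude-3-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
}

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def chat(model: str, prompt: str) -> str:
    """Prefer Bedrock when the model is in its catalog; otherwise use LiteLLM."""
    messages = [{"role": "user", "content": prompt}]
    if model in BEDROCK_MODELS:
        resp = bedrock.invoke_model(
            modelId=BEDROCK_MODELS[model],
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 1024,
                "messages": messages,
            }),
        )
        return json.loads(resp["body"].read())["content"][0]["text"]
    resp = requests.post(
        "https://your-gateway.com/v1/chat/completions",  # LiteLLM gateway placeholder
        headers={"Authorization": "Bearer sk-litellm-your-master-key"},
        json={"model": model, "messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]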

Conclusion

AWS Bedrock should be your default choice for multi-model deployments when it meets your requirements. It eliminates operational complexity and provides native AWS integration.

LiteLLM becomes the right choice when you need models outside Bedrock's catalog, require cost optimization at scale, or need advanced routing capabilities. The deployment outlined here is production-ready with proper security, monitoring, and scalability.

Both approaches solve the same core problem—abstracting away provider-specific APIs—but make different trade-offs between operational simplicity and flexibility. Choose based on your specific requirements, and remember that hybrid approaches combining both solutions are often the most pragmatic for production systems.