Logging and Monitoring
Comprehensive guide to observability, logging, and monitoring with Servin Container Runtime.
Monitoring Architecture
Observability Stack
Complete monitoring and observability solution:
Observability Architecture:
┌────────────────────────────────────────────────────────┐
│                  Dashboards & Alerts                   │
│                   (Grafana, Kibana)                    │
├────────────────────────────────────────────────────────┤
│                 Metrics & Log Storage                  │
│              (Prometheus, Elasticsearch)               │
├────────────────────────────────────────────────────────┤
│                Collection & Aggregation                │
│            (Node Exporter, Fluentd, Beats)             │
├────────────────────────────────────────────────────────┤
│                     Servin Runtime                     │
│             (Containers, Images, Volumes)              │
└────────────────────────────────────────────────────────┘
Monitoring Components
- Metrics Collection: Prometheus, InfluxDB, DataDog
- Log Aggregation: Fluentd, Filebeat, Logstash
- Visualization: Grafana, Kibana, Custom dashboards
- Alerting: AlertManager, PagerDuty, Slack integration
- Tracing: Jaeger, Zipkin, OpenTelemetry
Container Logging
Basic Logging
Access container logs with various options:
# View container logs
servin logs nginx-container
# Follow logs in real-time
servin logs --follow nginx-container
# Show timestamps
servin logs --timestamps nginx-container
# Show last N lines
servin logs --tail 50 nginx-container
# Show logs since specific time
servin logs --since 2024-01-01T00:00:00Z nginx-container
servin logs --since 1h nginx-container
# Show logs until specific time
servin logs --until 2024-01-01T12:00:00Z nginx-container
# Filter logs by timestamp range
servin logs --since 1h --until 30m nginx-container
# Show logs for multiple containers
servin logs web-server db-server cache-server
Advanced Logging Options
Configure detailed logging behavior:
# Show logs with details
servin logs --details nginx-container
# Limit log output size
servin logs --tail 100 --since 1h nginx-container
# Output logs in JSON format
servin logs --format json nginx-container
# Save logs to file
servin logs nginx-container > container-logs.txt
# Continuous log monitoring
servin logs --follow --tail 0 nginx-container | tee -a monitor.log
# Filter logs with grep
servin logs nginx-container | grep ERROR
# Show logs for all containers
servin logs $(servin ps -q)
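These commands also compose well with scripts. Below is a minimal Python sketch (a hypothetical watch_errors.py, not part of Servin) that wraps servin logs --follow and flags ERROR lines as they arrive:
#!/usr/bin/env python3
# watch_errors.py - follow a container's log stream and flag ERROR lines
import subprocess
import sys

def watch(container: str) -> None:
    # Reuses the CLI flags shown above: follow new output only, with timestamps
    proc = subprocess.Popen(
        ["servin", "logs", "--follow", "--tail", "0", "--timestamps", container],
        stdout=subprocess.PIPE,
        text=True,
    )
    seen = 0
    for line in proc.stdout:
        if "ERROR" in line:
            seen += 1
            print(f"[error #{seen}] {line.rstrip()}", file=sys.stderr)

if __name__ == "__main__":
    watch(sys.argv[1] if len(sys.argv) > 1 else "nginx-container")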
Log Drivers
Configure different logging drivers:
# JSON file driver (default)
servin run --log-driver json-file nginx:latest
# Syslog driver
servin run --log-driver syslog \
--log-opt syslog-address=udp://logs.company.com:514 \
nginx:latest
# Fluentd driver
servin run --log-driver fluentd \
--log-opt fluentd-address=fluentd.company.com:24224 \
--log-opt tag=nginx.access \
nginx:latest
# Journald driver
servin run --log-driver journald nginx:latest
# Splunk driver
servin run --log-driver splunk \
--log-opt splunk-token=your-token \
--log-opt splunk-url=https://splunk.company.com:8088 \
nginx:latest
# AWS CloudWatch driver
servin run --log-driver awslogs \
--log-opt awslogs-group=myapp \
--log-opt awslogs-region=us-west-2 \
nginx:latest
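The json-file driver writes one JSON object per log line on disk. Assuming Servin uses a Docker-compatible layout under /var/lib/servin/containers/ (verify the exact file names and schema on your host), a small parser might look like this:
# parse_json_logs.py - read json-file driver output from disk
import glob
import json

def read_entries(pattern="/var/lib/servin/containers/*/*.log"):
    # Assumed Docker-compatible line format: {"log": "...", "stream": "stdout", "time": "..."}
    for path in glob.glob(pattern):
        with open(path) as f:
            for line in f:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue
                yield entry.get("time"), entry.get("stream"), entry.get("log", "").rstrip()

if __name__ == "__main__":
    for ts, stream, message in read_entries():
        print(f"{ts} [{stream}] {message}")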
Log Aggregation
Centralized Logging with Fluentd
Deploy Fluentd for log collection:
# Fluentd configuration
# fluent.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<filter servin.**>
  @type parser
  key_name log
  <parse>
    @type json
  </parse>
</filter>

<match servin.**>
  @type elasticsearch
  host elasticsearch
  port 9200
  index_name servin-logs
  type_name container
</match>
# Deploy Fluentd
servin run -d \
--name fluentd \
-p 24224:24224 \
-v $(pwd)/fluent.conf:/fluentd/etc/fluent.conf \
-v /var/lib/servin/containers:/var/lib/servin/containers:ro \
fluent/fluentd:latest
# Configure containers to use Fluentd
servin run -d \
--name web-app \
--log-driver fluentd \
--log-opt fluentd-address=localhost:24224 \
--log-opt tag=servin.webapp \
nginx:latest
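To verify the pipeline end to end, you can emit a test record straight to the forward port. A minimal sketch using the third-party fluent-logger Python package (an assumption; any forward-protocol client works):
# send_test_event.py - emit a test record to the Fluentd forward port
from fluent import sender

# Tag "servin.webapp" so the record passes the <filter servin.**> / <match servin.**> blocks above
logger = sender.FluentSender("servin", host="localhost", port=24224)
if not logger.emit("webapp", {"container": "web-app", "log": '{"message": "test event"}'}):
    print(logger.last_error)
logger.close()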
ELK Stack Integration
Set up Elasticsearch, Logstash, and Kibana:
# Elasticsearch
servin run -d \
--name elasticsearch \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
-v es-data:/usr/share/elasticsearch/data \
elasticsearch:7.15.2
# Logstash
# logstash.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][container_name] {
    mutate {
      add_field => { "container" => "%{[fields][container_name]}" }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "servin-logs-%{+YYYY.MM.dd}"
  }
}
servin run -d \
--name logstash \
-p 5044:5044 \
-v $(pwd)/logstash.conf:/usr/share/logstash/pipeline/logstash.conf \
logstash:7.15.2
# Kibana
servin run -d \
--name kibana \
-p 5601:5601 \
-e ELASTICSEARCH_HOSTS=http://elasticsearch:9200 \
kibana:7.15.2
# Filebeat for log shipping
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - '/var/lib/servin/containers/*/*.log'
processors:
  - add_docker_metadata:
      host: "unix:///var/run/servin.sock"
output.logstash:
  hosts: ["logstash:5044"]
servin run -d \
--name filebeat \
--user=root \
-v $(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml \
-v /var/lib/servin/containers:/var/lib/servin/containers:ro \
-v /var/run/servin.sock:/var/run/servin.sock:ro \
elastic/filebeat:7.15.2
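Once logs are indexed, they can also be queried programmatically. A minimal sketch against the Elasticsearch REST API using requests; the field names (@timestamp, message, container) depend on what your pipeline actually ships:
# query_logs.py - search the servin-logs-* indices for recent errors
import json
import requests

query = {
    "size": 20,
    "sort": [{"@timestamp": "desc"}],
    "query": {
        "bool": {
            "must": [{"match": {"message": "ERROR"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    },
}

resp = requests.get(
    "http://localhost:9200/servin-logs-*/_search",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
    timeout=10,
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    source = hit["_source"]
    print(source.get("@timestamp"), source.get("container"), source.get("message"))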
Metrics Collection
Prometheus Integration
Set up Prometheus for metrics collection:
# Prometheus configuration
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

scrape_configs:
  - job_name: 'servin'
    static_configs:
      - targets: ['localhost:9323']
    metrics_path: '/metrics'
    scrape_interval: 5s

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
# Deploy Prometheus
servin run -d \
--name prometheus \
-p 9090:9090 \
-v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
-v prometheus-data:/prometheus \
prom/prometheus:latest
# Node Exporter for host metrics
servin run -d \
--name node-exporter \
-p 9100:9100 \
--pid=host \
-v "/:/host:ro,rslave" \
prom/node-exporter:latest \
--path.rootfs=/host
# cAdvisor for container metrics
servin run -d \
--name cadvisor \
-p 8080:8080 \
--privileged \
--device=/dev/kmsg \
-v /:/rootfs:ro \
-v /var/run:/var/run:ro \
-v /sys:/sys:ro \
-v /var/lib/servin/:/var/lib/servin:ro \
-v /dev/disk/:/dev/disk:ro \
gcr.io/cadvisor/cadvisor:latest
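With the exporters scraping, the Prometheus HTTP API can be queried directly. A small sketch using requests; the name label comes from cAdvisor and may differ in your setup:
# top_cpu.py - list the busiest containers according to cAdvisor metrics
import requests

PROM_URL = "http://localhost:9090"
QUERY = "topk(5, rate(container_cpu_usage_seconds_total[5m]))"

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    labels = result["metric"]
    name = labels.get("name") or labels.get("id", "<unknown>")
    print(f"{name}: {float(result['value'][1]):.3f} cores")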
Custom Metrics
Expose application metrics:
# Application with Prometheus metrics
# app.py
from flask import Flask, Response
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST
import time

app = Flask(__name__)

REQUEST_COUNT = Counter('app_requests_total', 'Total requests')
REQUEST_LATENCY = Histogram('app_request_duration_seconds', 'Request latency')

@app.route('/')
@REQUEST_LATENCY.time()
def process_request():
    REQUEST_COUNT.inc()
    time.sleep(0.1)
    return "OK"

# Metrics endpoint
@app.route('/metrics')
def metrics():
    return Response(generate_latest(), mimetype=CONTENT_TYPE_LATEST)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
# Deploy application with metrics
servin run -d \
--name app-with-metrics \
-p 8000:8000 \
-p 8001:8001 \
myapp:latest
# Scrape application metrics
# Add to prometheus.yml:
  - job_name: 'myapp'
    static_configs:
      - targets: ['app-with-metrics:8000']
Visualization and Dashboards
Grafana Setup
Deploy Grafana for visualization:
# Deploy Grafana
servin run -d \
--name grafana \
-p 3000:3000 \
-e GF_SECURITY_ADMIN_PASSWORD=admin123 \
-v grafana-data:/var/lib/grafana \
grafana/grafana:latest
# Configure Prometheus datasource
curl -X POST http://admin:admin123@localhost:3000/api/datasources \
-H "Content-Type: application/json" \
-d '{
"name": "Prometheus",
"type": "prometheus",
"url": "http://prometheus:9090",
"access": "proxy",
"isDefault": true
}'
Container Dashboard
Create a comprehensive container monitoring dashboard:
{
  "dashboard": {
    "title": "Servin Container Monitoring",
    "panels": [
      {
        "title": "Container CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m])",
            "legendFormat": "{{name}}"
          }
        ]
      },
      {
        "title": "Container Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "container_memory_usage_bytes",
            "legendFormat": "{{name}}"
          }
        ]
      },
      {
        "title": "Container Network I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_network_receive_bytes_total[5m])",
            "legendFormat": "{{name}} RX"
          },
          {
            "expr": "rate(container_network_transmit_bytes_total[5m])",
            "legendFormat": "{{name}} TX"
          }
        ]
      },
      {
        "title": "Container Disk I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_fs_reads_bytes_total[5m])",
            "legendFormat": "{{name}} Read"
          },
          {
            "expr": "rate(container_fs_writes_bytes_total[5m])",
            "legendFormat": "{{name}} Write"
          }
        ]
      }
    ]
  }
}
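The JSON above is already wrapped in a dashboard object, so it can be pushed through Grafana's dashboard API. A minimal sketch (the file name container-dashboard.json is just an example for the document above saved to disk):
# import_dashboard.py - push the dashboard JSON above to Grafana
import json
import requests

with open("container-dashboard.json") as f:  # the dashboard document shown above
    payload = json.load(f)
payload["overwrite"] = True

resp = requests.post(
    "http://localhost:3000/api/dashboards/db",
    json=payload,
    auth=("admin", "admin123"),
    timeout=10,
)
resp.raise_for_status()
print(resp.json())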
Alerting
AlertManager Configuration
Set up alert management:
# alertmanager.yml
global:
  smtp_smarthost: 'mail.company.com:587'
  smtp_from: 'alerts@company.com'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@company.com'
        subject: 'Alert: {{ .GroupLabels.alertname }}'
        body: |
          Alert: {{ .GroupLabels.alertname }}
          Description: {{ .CommonAnnotations.description }}
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        title: 'Servin Alert'
        text: '{{ .CommonAnnotations.summary }}'

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
# Deploy AlertManager
servin run -d \
--name alertmanager \
-p 9093:9093 \
-v $(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager:latest
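To check what AlertManager is currently holding, query its v2 API. A minimal sketch using requests:
# list_alerts.py - show alerts currently held by AlertManager
import requests

resp = requests.get("http://localhost:9093/api/v2/alerts", timeout=10)
resp.raise_for_status()
for alert in resp.json():
    labels = alert["labels"]
    state = alert["status"]["state"]
    print(f'{state:10} {labels.get("severity", "-"):8} {labels.get("alertname")}')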
Alert Rules
Define alerting rules:
# container-alerts.yml
groups:
  - name: container.rules
    rules:
      - alert: ContainerHighCPU
        expr: rate(container_cpu_usage_seconds_total[5m]) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container high CPU usage"
          description: "Container CPU usage is above 80%"

      - alert: ContainerHighMemory
        expr: container_memory_usage_bytes / container_spec_memory_limit_bytes * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Container high memory usage"
          description: "Container memory usage is above 90%"

      - alert: ServinDown
        expr: up{job="servin"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Servin daemon is down"
          description: "Servin daemon has been down for more than 1 minute"

      - alert: ContainerRestarting
        expr: increase(container_restart_count[1h]) > 5
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Container restarting frequently"
          description: "Container has restarted {{ $value }} times in the last hour"

      - alert: ContainerVolumeUsage
        expr: container_fs_usage_bytes / container_fs_limit_bytes * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container volume usage high"
          description: "Container volume usage is above 90%"
Health Checks and Monitoring
Container Health Checks
Implement comprehensive health monitoring:
# Container with health check
servin run -d \
--name web-app \
--health-cmd "curl -f http://localhost:8080/health" \
--health-interval 30s \
--health-timeout 10s \
--health-retries 3 \
--health-start-period 60s \
nginx:latest
# Custom health check script
servin run -d \
--name app-with-health \
--health-cmd "/app/health-check.sh" \
--health-interval 30s \
myapp:latest
# Monitor health status
servin inspect app-with-health --format "{{.State.Health.Status}}"
# View health check logs
servin inspect app-with-health --format "{{json .State.Health.Log}}"
# List unhealthy containers
servin ps --filter health=unhealthy
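The health command only needs to exit 0 on success and non-zero on failure. A minimal Python equivalent of such a script (the /health endpoint and port 8080 are assumptions about your application):
#!/usr/bin/env python3
# health-check.py - exit 0 if the app answers on /health, 1 otherwise
import sys
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8080/health", timeout=5) as resp:
        sys.exit(0 if resp.status == 200 else 1)
except Exception:
    sys.exit(1)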
Service Discovery
Implement service discovery for monitoring:
# Consul for service discovery
servin run -d \
--name consul \
-p 8500:8500 \
-e CONSUL_BIND_INTERFACE=eth0 \
consul:latest
# Register services with Consul
curl -X PUT http://localhost:8500/v1/agent/service/register \
-d '{
"Name": "web-app",
"Address": "192.168.1.100",
"Port": 8080,
"Check": {
"HTTP": "http://192.168.1.100:8080/health",
"Interval": "30s"
}
}'
# Prometheus with Consul discovery
# prometheus.yml
scrape_configs:
  - job_name: 'consul'
    consul_sd_configs:
      - server: 'consul:8500'
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: job
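Consul's health API also lets tooling discover only the instances that pass their checks. A minimal sketch using requests:
# healthy_instances.py - list web-app instances that pass their Consul checks
import requests

resp = requests.get(
    "http://localhost:8500/v1/health/service/web-app",
    params={"passing": "true"},
    timeout=10,
)
resp.raise_for_status()
for entry in resp.json():
    svc = entry["Service"]
    print(f'{svc["ID"] or svc["Service"]} -> {svc["Address"]}:{svc["Port"]}')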
Performance Monitoring
Resource Monitoring
Monitor system and container resources:
# Real-time container stats
servin stats
# Historical resource usage
servin run --rm \
-v /var/run/servin.sock:/var/run/servin.sock \
monitoring/container-stats:latest
# System resource monitoring
servin run -d \
--name resource-monitor \
--privileged \
-v /proc:/host/proc:ro \
-v /sys:/host/sys:ro \
-v /:/rootfs:ro \
monitoring/system-stats:latest
# Network monitoring
servin run -d \
--name network-monitor \
--net=host \
--cap-add=NET_ADMIN \
monitoring/network-stats:latest
Application Performance Monitoring
Monitor application performance:
# APM with Elastic APM
servin run -d \
--name apm-server \
-p 8200:8200 \
-e output.elasticsearch.hosts=elasticsearch:9200 \
elastic/apm-server:7.15.2
# Application with APM agent
servin run -d \
--name instrumented-app \
-e ELASTIC_APM_SERVER_URL=http://apm-server:8200 \
-e ELASTIC_APM_SERVICE_NAME=myapp \
-e ELASTIC_APM_ENVIRONMENT=production \
myapp:instrumented
# Custom performance metrics
servin run -d \
--name perf-monitor \
-v /var/run/servin.sock:/var/run/servin.sock \
-v performance-data:/data \
monitoring/performance:latest
Distributed Tracing
Jaeger Integration
Set up distributed tracing:
# Jaeger all-in-one
servin run -d \
--name jaeger \
-p 16686:16686 \
-p 14268:14268 \
-p 14250:14250 \
jaegertracing/all-in-one:latest
# Application with tracing
servin run -d \
--name traced-app \
-e JAEGER_ENDPOINT=http://jaeger:14268/api/traces \
-e JAEGER_SERVICE_NAME=myapp \
-e JAEGER_SAMPLER_TYPE=const \
-e JAEGER_SAMPLER_PARAM=1 \
myapp:traced
# OpenTelemetry collector
# otel-config.yml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
servin run -d \
--name otel-collector \
-p 4317:4317 \
-p 4318:4318 \
-v $(pwd)/otel-config.yml:/etc/otel-collector-config.yml \
otel/opentelemetry-collector:latest \
--config=/etc/otel-collector-config.yml
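Applications can ship traces to the collector with the OpenTelemetry SDK. A minimal Python sketch, assuming the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed:
# traced_app.py - send spans to the collector's OTLP gRPC endpoint
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "myapp"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("handle-request"):
    # application work happens here; child spans nest automatically
    pass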
Log Analytics
Advanced Log Analysis
Implement sophisticated log analysis:
# Log analysis with ELK
# logstash-advanced.conf
input {
  beats {
    port => 5044
  }
}

filter {
  if [fields][container_name] {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}"
      }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    mutate {
      convert => { "response" => "integer" }
      convert => { "bytes" => "integer" }
    }
    if [response] >= 400 {
      mutate {
        add_tag => [ "error" ]
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "servin-logs-%{+YYYY.MM.dd}"
  }
}
# Real-time log analysis
servin run -d \
--name log-analyzer \
-v log-analysis-rules:/etc/rules \
-e ELASTICSEARCH_URL=http://elasticsearch:9200 \
monitoring/log-analyzer:latest
# Anomaly detection
servin run -d \
--name anomaly-detector \
-e ML_MODEL_PATH=/models/anomaly-model.pkl \
-v anomaly-models:/models \
monitoring/anomaly-detector:latest
Automation and Integration
Monitoring Automation
Automate monitoring deployment and management:
#!/bin/bash
# deploy-monitoring.sh
# Deploy monitoring stack
servin-compose -f monitoring-stack.yml up -d
# Wait for services to be ready
sleep 30
# Configure Grafana datasources
curl -X POST http://admin:admin123@localhost:3000/api/datasources \
-H "Content-Type: application/json" \
-d @datasource-config.json
# Import dashboards
for dashboard in dashboards/*.json; do
curl -X POST http://admin:admin123@localhost:3000/api/dashboards/db \
-H "Content-Type: application/json" \
-d @"$dashboard"
done
# Reload Prometheus so it picks up the alert rules (requires --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
echo "Monitoring stack deployed successfully"
Integration with CI/CD
Integrate monitoring with deployment pipelines:
# .github/workflows/deploy-with-monitoring.yml
name: Deploy with Monitoring

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Deploy application
        run: |
          servin run -d --name myapp myapp:${{ github.sha }}

      - name: Setup monitoring
        run: |
          # Add monitoring labels
          servin update myapp \
            --label monitoring.enabled=true \
            --label monitoring.service=myapp \
            --label monitoring.version=${{ github.sha }}

      - name: Configure health checks
        run: |
          servin update myapp \
            --health-cmd "curl -f http://localhost:8080/health" \
            --health-interval 30s

      - name: Register with service discovery
        run: |
          curl -X PUT http://consul:8500/v1/agent/service/register \
            -d @service-definition.json
This comprehensive logging and monitoring guide covers all aspects of observability with Servin, from basic logging to advanced distributed tracing and automated monitoring solutions.