Monitoring Bare Metal Servers: Prometheus, Grafana, and Alerts
Datadog charges $23/host/month for infrastructure monitoring. New Relic starts at $0.30/GB of ingested data. On bare metal, you can run the same monitoring stack — Prometheus, Grafana, and alerting — for $0 extra. Here is how to set it up on a RAW server.
Why Self-Host Monitoring
SaaS monitoring tools are expensive, and their cost scales linearly with the size of your infrastructure. A 10-server fleet on Datadog costs $230+/mo just for basic metrics. Prometheus and Grafana are open-source, run on your own hardware, and give you retention limited only by disk space, custom dashboards, and full data ownership.
On a RAW server, the entire monitoring stack consumes roughly 200 MB of RAM and negligible CPU. It runs alongside your application at no additional cost.
The Monitoring Stack
- Prometheus — time-series database that scrapes metrics from your services
- Node Exporter — exposes CPU, memory, disk, and network metrics from your server
- Grafana — visualization layer for building dashboards and exploring data
- Alertmanager — routes alerts to Slack, Discord, email, or PagerDuty
Step 1: Install Node Exporter
Node Exporter collects hardware and OS metrics. Install it as a systemd service so it starts automatically on boot.
# Deploy a RAW server
npx rawhq deploy
# SSH in
ssh root@your-server-ip
# Download and install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
tar xzf node_exporter-1.8.2.linux-amd64.tar.gz
cp node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/
# Create systemd service
cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=Node Exporter
After=network.target
[Service]
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now node_exporter
Node Exporter now serves metrics on port 9100. Verify with curl localhost:9100/metrics.
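To confirm the exporter returns real data rather than just answering on the port, one option is to pull a single metric out of the response. The helper below is a minimal sketch: metric_value is a hypothetical name, and it only matches label-free samples such as node_load1.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# first sample whose metric name is exactly NAME (samples with labels are
# skipped, as are "# HELP" and "# TYPE" comment lines).
# Exits non-zero if no such sample is found.
metric_value() {
  awk -v name="$1" '
    $1 == name { print $2; found = 1; exit }
    END { exit !found }
  '
}
```

On the server, curl -s localhost:9100/metrics | metric_value node_load1 should print the current 1-minute load average.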
Step 2: Install Prometheus
# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.53.0/prometheus-2.53.0.linux-amd64.tar.gz
tar xzf prometheus-2.53.0.linux-amd64.tar.gz
cp prometheus-2.53.0.linux-amd64/prometheus /usr/local/bin/
cp prometheus-2.53.0.linux-amd64/promtool /usr/local/bin/
mkdir -p /etc/prometheus /var/lib/prometheus
Prometheus Configuration
Create /etc/prometheus/prometheus.yml with your scrape targets:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alerts.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Start Prometheus the same way: create a systemd unit pointing to /usr/local/bin/prometheus with the flags --config.file=/etc/prometheus/prometheus.yml and --storage.tsdb.path=/var/lib/prometheus.
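The unit described above could look like the following sketch, which mirrors the Node Exporter unit and uses only the two flags already mentioned. The function name write_prometheus_unit and its optional path argument are illustrative assumptions, not part of Prometheus.

```shell
# write_prometheus_unit [PATH]
# Writes a systemd unit for Prometheus. PATH defaults to the usual location
# for locally administered units; pass another path to inspect the file first.
write_prometheus_unit() {
  local unit_path="${1:-/etc/systemd/system/prometheus.service}"
  cat > "$unit_path" <<'EOF'
[Unit]
Description=Prometheus
After=network.target

[Service]
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}
```

After calling write_prometheus_unit, run systemctl daemon-reload and systemctl enable --now prometheus. Prometheus should then answer on port 9090, which can be checked with curl localhost:9090/-/ready.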
Step 3: Alert Rules
Create /etc/prometheus/alerts.yml to define conditions that trigger notifications:
groups:
  - name: server
    rules:
      - alert: HighCpuUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 85% for 5 minutes"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Disk space below 10%"
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90%"
Step 4: Alertmanager with Slack and Discord
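Before installing Alertmanager, it may be worth validating the files from Steps 2 and 3, since YAML indentation mistakes are easy to make. The sketch below wraps promtool, which was copied to /usr/local/bin in Step 2. The function name validate_prometheus_files is a hypothetical helper.

```shell
# validate_prometheus_files [CONFIG] [RULES]
# Runs promtool's built-in syntax checks on the main config and the rule file.
# Returns non-zero if either check fails or if promtool is not on PATH.
validate_prometheus_files() {
  local config="${1:-/etc/prometheus/prometheus.yml}"
  local rules="${2:-/etc/prometheus/alerts.yml}"
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found on PATH" >&2
    return 127
  fi
  promtool check config "$config" && promtool check rules "$rules"
}
```

Calling validate_prometheus_files with no arguments checks the default paths used in this guide. Restart Prometheus afterwards so it picks up the rule file.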
Install Alertmanager and configure it to route alerts to your team channels.
# Download Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xzf alertmanager-0.27.0.linux-amd64.tar.gz
cp alertmanager-0.27.0.linux-amd64/alertmanager /usr/local/bin/
# Create the config and data directories
mkdir -p /etc/alertmanager /var/lib/alertmanager
Create /etc/alertmanager/alertmanager.yml:
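route:
  receiver: "slack"
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: "slack"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
        channel: "#alerts"
        title: "Server Alert"
        text: "Alert: {{ .CommonAnnotations.summary }}"
  - name: "discord"
    discord_configs:
      - webhook_url: "https://discord.com/api/webhooks/YOUR/WEBHOOK"
For Discord, use the discord_configs receiver (available since Alertmanager 0.25) with your Discord webhook URL. The generic webhook_configs receiver does not work here, because Discord's webhook endpoint expects its own JSON format rather than Alertmanager's payload. Note that the route above sends every alert to the slack receiver; change receiver to "discord" or add a child route if alerts should go to Discord.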
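Once Alertmanager is running, a hand-made alert is a quick way to check that routing reaches your channel without waiting for a real incident. The sketch below posts to Alertmanager's v2 API. It assumes Alertmanager listens on localhost:9093 (the target configured in prometheus.yml) and was started with something like /usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager. The function names and the TestAlert label are placeholders.

```shell
# test_alert_payload
# Prints a minimal Alertmanager v2 payload: a JSON array holding one alert.
test_alert_payload() {
  cat <<'EOF'
[
  {
    "labels": { "alertname": "TestAlert", "severity": "warning" },
    "annotations": { "summary": "Test alert sent by hand" }
  }
]
EOF
}

# send_test_alert [ALERTMANAGER_URL]
# POSTs the payload above to the alerts endpoint; curl -f makes HTTP errors
# show up as a non-zero exit status.
send_test_alert() {
  local am_url="${1:-http://localhost:9093}"
  test_alert_payload | curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    "$am_url/api/v2/alerts"
}
```

Run send_test_alert on the server. With group_wait set to 30s, the message should appear in the channel roughly half a minute later.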
Step 5: Install Grafana
apt install -y apt-transport-https software-properties-common
wget -qO- https://apt.grafana.com/gpg.key | gpg --dearmor > /usr/share/keyrings/grafana.gpg
echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" > /etc/apt/sources.list.d/grafana.list
apt update && apt install -y grafana
systemctl enable --now grafana-server
Grafana runs on port 3000. Log in with admin/admin (Grafana asks you to set a new password on first login) and add Prometheus as a data source at http://localhost:9090.
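As an alternative to adding the data source by hand, Grafana can load it from a provisioning file at startup. The sketch below assumes the default provisioning directory of the apt package, /etc/grafana/provisioning. The function name write_grafana_datasource is hypothetical.

```shell
# write_grafana_datasource [PATH]
# Writes a Grafana provisioning file that registers the local Prometheus
# instance as the default data source.
write_grafana_datasource() {
  local ds_path="${1:-/etc/grafana/provisioning/datasources/prometheus.yml}"
  cat > "$ds_path" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}
```

Call write_grafana_datasource and then systemctl restart grafana-server. The data source should then be listed in the Grafana UI without any manual setup.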
Dashboard Templates
Import community dashboards to get started instantly:
- Dashboard 1860 — Node Exporter Full: CPU, RAM, disk, network in one view
- Dashboard 13978 — Node Exporter Quick: lightweight overview for multi-server setups
- Dashboard 3662 — Prometheus Stats: monitor Prometheus itself
Import via Grafana UI: Dashboards → Import → enter the dashboard ID → select your Prometheus data source.
Resource Monitoring Best Practices
- Set retention wisely: 15 days of metrics at 15s intervals uses roughly 2 GB, and usage grows with the number of scraped series. Prometheus keeps 15 days by default; adjust with --storage.tsdb.retention.time=30d
- Monitor the monitor: set up alerts for Prometheus itself (scrape failures, storage errors)
- Use recording rules: pre-compute expensive queries to keep dashboards fast
- Secure endpoints: bind Node Exporter and Prometheus to 127.0.0.1 with --web.listen-address, or use firewall rules
- Label consistently: use standardized labels (environment, service, region) across all exporters
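For the retention bullet, the Prometheus documentation gives a rough sizing rule: disk space ≈ retention in seconds × ingested samples per second × bytes per sample, with samples typically averaging 1 to 2 bytes each. The sketch below applies that rule. The function name is hypothetical, and the series count you pass in is your own estimate, so treat the result as an order-of-magnitude figure.

```shell
# estimate_tsdb_bytes RETENTION_DAYS SCRAPE_INTERVAL_S ACTIVE_SERIES [BYTES_PER_SAMPLE]
# Prints an approximate on-disk size in bytes. BYTES_PER_SAMPLE defaults to 2,
# the upper end of the range quoted in the Prometheus storage docs.
estimate_tsdb_bytes() {
  awk -v days="$1" -v interval="$2" -v series="$3" -v bps="${4:-2}" 'BEGIN {
    retention_s   = days * 86400        # retention window in seconds
    samples_per_s = series / interval   # each series yields one sample per scrape
    printf "%.0f\n", retention_s * samples_per_s * bps
  }'
}
```

For example, estimate_tsdb_bytes 15 15 1000 prints 172800000, about 173 MB for 1,000 active series. Doubling the retention or the series count doubles the estimate.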
Cost Comparison
Self-hosted monitoring costs nothing beyond the server you are already paying for. At 50 servers, you save over $13,000/year compared to Datadog.
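The savings figure follows from the per-host price quoted at the top of this article. A minimal sketch of the arithmetic, using a hypothetical helper name and whole-dollar prices only:

```shell
# saas_annual_cost HOSTS PRICE_PER_HOST_PER_MONTH
# Prints the yearly bill in dollars for a flat per-host monthly price.
saas_annual_cost() {
  echo $(( $1 * $2 * 12 ))
}
```

saas_annual_cost 50 23 prints 13800, which is the source of the "over $13,000/year" figure above. The 10-server fleet from earlier comes to 2760 per year.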
Deploy Monitoring on RAW
npx rawhq deploy
7-day free trial. 13 seconds to deploy. Run Prometheus, Grafana, and Alertmanager on the same dedicated server as your app for $0 extra.