Monitoring a node

To ensure the node's ongoing health, performance, and reliability, you need a robust monitoring and alerting system. This allows you to proactively identify and address issues before they lead to missed attestations, downtime, or even slashing.

This guide provides a comprehensive overview and step-by-step instructions for setting up a professional-grade monitoring solution.

Monitoring Benefits

| Benefit | Description |
| --- | --- |
| Performance Optimization | Continuous monitoring helps you understand your validator's performance, identify bottlenecks, and make informed decisions to maximize your rewards. |
| Proactive Problem Solving | Receive immediate notifications about critical issues, such as your node going offline, allowing for a swift response to minimize downtime. |
| Security Auditing | Monitoring can help you detect unusual activity that might indicate a security breach, such as unexpected changes in peer count or resource usage. |
| Peace of Mind | A well-configured alerting system lets you step away from your terminal, confident that you will be notified if your attention is needed. |

Prometheus, Grafana, and Alertmanager

A widely used open-source stack for monitoring blockchain nodes combines three tools:

| Component | Description |
| --- | --- |
| Prometheus | A time-series database that "scrapes" (collects) metrics from your validator's various components at regular intervals. It is the engine of your monitoring stack. |
| Grafana | A visualization tool that connects to Prometheus as a data source. It allows you to build detailed, easy-to-read dashboards with graphs and charts of your key metrics. |
| Alertmanager | A companion service from the Prometheus project that handles the alerts Prometheus fires. It deduplicates, groups, and routes alerts to notification channels such as Telegram, Slack, or email. |

Setting Up Your Monitoring Stack

Here's how to get this monitoring stack up and running on your validator server.

Install the Monitoring Components

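First, install Prometheus, Grafana, and Node Exporter (a Prometheus exporter for hardware and OS metrics). Note that the grafana package is distributed through Grafana's own APT repository rather than the default Debian and Ubuntu repositories, so that repository must be added before the command below can find the package.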

sudo apt-get update
sudo apt-get install -y prometheus prometheus-node-exporter grafana

Configure Prometheus to Scrape Metrics

Prometheus needs to be told where to find the metrics from your execution client, consensus client, and the server itself. This is done in the prometheus.yml configuration file.

Ensure that your execution client (e.g., Reth) and consensus client (e.g., Lighthouse) are started with the flags that enable their Prometheus metrics endpoints. These are typically flags like --metrics.

sudo nano /etc/prometheus/prometheus.yml

Replace the contents of the file with the following configuration. This example assumes you are running Reth, Lighthouse, and Node Exporter on the same machine. Adjust the ports if you have a different setup.

global:
    scrape_interval: 15s

scrape_configs:
    # Hardware and OS metrics from Node Exporter
    - job_name: 'node_exporter'
      static_configs:
          - targets: ['localhost:9100']

    # Execution client (Reth)
    - job_name: 'execution_client'
      static_configs:
          - targets: ['localhost:9001'] # Set your custom Reth metrics port

    # Consensus client (Lighthouse)
    - job_name: 'consensus_client'
      static_configs:
          - targets: ['localhost:5054'] # Default Lighthouse metrics port

Note

When running the Reth client, use the --metrics 127.0.0.1:9001 flag to set the metrics port, where 9001 can be customized to suit your needs.

Restart Prometheus to apply the changes:

sudo systemctl restart prometheus
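Before relying on the dashboards, it can help to confirm that each endpoint listed in prometheus.yml actually serves metrics. The following is a minimal sketch, not part of any client's tooling: metric_value is a hypothetical helper that pulls one sample out of Prometheus text-format output, and it assumes samples carry no trailing timestamp.

```shell
#!/usr/bin/env bash
# Print the value of the first sample of metric NAME found in Prometheus
# text-format output read on stdin. Returns non-zero if NAME is absent.
# Assumption: sample lines end with the value (no optional timestamp).
metric_value() {
    local name="$1"
    awk -v name="$name" '
        /^#/ { next }                       # skip HELP and TYPE comments
        {
            i = index($1, "{")              # strip any {label="..."} part
            metric = (i > 0) ? substr($1, 1, i - 1) : $1
            if (metric == name) { print $NF; found = 1; exit }
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage, assuming the ports from the configuration above:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

If the pipeline prints a number, the exporter is reachable from the server. Prometheus's own view of each target is shown on its Targets page at http://localhost:9090/targets.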

Set Up Your Grafana Dashboard

Start the Grafana Server:

sudo systemctl start grafana-server
sudo systemctl enable grafana-server

To access Grafana, open a web browser and navigate to http://<your_server_ip>:3000. The default login is admin for both username and password. You'll be prompted to change the password on your first login.

Add Prometheus as a Data Source:

  1. Click the gear icon on the left-hand menu and select "Data Sources."
  2. Click "Add data source" and choose "Prometheus."
  3. For the URL, enter http://localhost:9090.
  4. Click "Save & Test."

Import a Pre-built Dashboard: The community has created excellent dashboards that are ready to use.

  1. On the left-hand menu, click the "+" icon and select "Import."
  2. You can import a dashboard by its ID. Community dashboards for EVM nodes are published in the Grafana Dashboards catalog (grafana.com/grafana/dashboards); search for "Ethereum Validator" to find suitable options. For example, a commonly used dashboard for Lighthouse has the ID 11578.
  3. Enter the dashboard ID and click "Load."
  4. Select your Prometheus data source from the dropdown and click "Import."

Validator's Key Metrics

Your Grafana dashboard will now display a wealth of information. Here are some of the most important metrics to watch:

| Metric | Description |
| --- | --- |
| Attestation Effectiveness | How effectively your attestations are included in the chain. Consistently high effectiveness is crucial for maximizing rewards. |
| Attestation Inclusion Distance | The number of slots between when your attestation was made and when it was included. A lower distance is better. |
| Missed Attestations | The number of times your validator failed to attest. This should be as close to zero as possible. |
| Proposed Blocks | The number of blocks your validator has successfully proposed. |
| Missed Block Proposals | The number of times your validator was selected to propose a block but failed to do so. This is a critical issue to investigate immediately. |
| Validator Balance | Your validator's ETH balance over time. |
| Sync Committee Participation | Indicates if your validator is participating in the sync committee, which provides higher rewards. |
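To make the inclusion distance metric concrete, here is a small illustrative sketch. The input format (one "attestation_slot inclusion_slot" pair per line) and the function name mean_inclusion_distance are assumptions made for this example; they are not the output of any particular client.

```shell
#!/usr/bin/env bash
# Read "attestation_slot inclusion_slot" pairs on stdin and print the mean
# inclusion distance, i.e. the average of (inclusion_slot - attestation_slot).
# Returns non-zero if no valid pair was read.
mean_inclusion_distance() {
    awk '
        NF == 2 { total += $2 - $1; n++ }   # one distance per input line
        END {
            if (n == 0) { exit 1 }
            printf "%.2f\n", total / n
        }
    '
}

# Usage with three hypothetical attestations (distances 1, 1 and 3):
#   printf '100 101\n101 102\n102 105\n' | mean_inclusion_distance
```

A mean close to 1 indicates that attestations are usually included in the very next slot.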

Node Health:

| Metric | Description |
| --- | --- |
| Execution and Consensus Client Peer Count | The number of peers your clients are connected to. A sudden drop could indicate a network issue. |
| CPU and Memory Usage | High and sustained resource usage might indicate a problem or the need for a hardware upgrade. |
| Disk Space Usage | Monitor this to avoid your node going offline due to a full disk. |
| Network Traffic | Helps you understand your bandwidth usage. |
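Disk usage is easy to spot-check from the shell while the dashboards are being set up. The sketch below relies only on POSIX df; the function names and the 90 percent threshold in the usage line are illustrative choices, not values from this guide.

```shell
#!/usr/bin/env bash
# Print the percentage of used space on the filesystem that holds PATH.
# With `df -P`, the fifth column is the capacity, for example "42%".
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage on PATH is at or above THRESHOLD percent.
disk_above_threshold() {
    local path="$1" threshold="$2" used
    used="$(disk_used_percent "$path")"
    [ "$used" -ge "$threshold" ]
}

# Usage:
#   disk_above_threshold / 90 && echo "Disk is at least 90% full"
```

In Prometheus, the same signal can be derived from the node_filesystem_avail_bytes and node_filesystem_size_bytes metrics that Node Exporter exposes.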

Configuring Alerts

To begin configuring Prometheus to send alerts to Alertmanager for critical events, first create an alerting rules file:

sudo nano /etc/prometheus/alerts.yml

Add the following rules to the file. These are examples; you may need to adjust the expressions based on the specific metrics your clients expose.

groups:
- name: validator_alerts
  rules:
  - alert: ValidatorNodeDown
    expr: up{job=~"consensus_client|execution_client"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Validator node component is down"
      description: "The {{ $labels.job }} on {{ $labels.instance }} is down."
 
  - alert: MissedBlockProposal
    expr: increase(validator_block_proposal_missed_total[5m]) > 0
    labels:
      severity: critical
    annotations:
      summary: "Missed a block proposal"
      description: "Validator has missed a block proposal in the last 5 minutes."
 
  - alert: HighMissedAttestations
    expr: increase(validator_attestations_missed_total[15m]) > 5
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High number of missed attestations"
      description: "Validator has missed more than 5 attestations in the last 15 minutes."
 
  - alert: LowPeerCount
    expr: consensus_peers < 10 or execution_peers < 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Low peer count"
      description: "The {{ $labels.job }} has a low peer count."

Configure Prometheus to Use the Alert Rules and Alertmanager

Edit your prometheus.yml file to include the alert rules and the Alertmanager target.

sudo nano /etc/prometheus/prometheus.yml

Add the following sections:

rule_files:
  - "alerts.yml"
 
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

Restart Prometheus to apply the changes:

sudo systemctl restart prometheus
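A syntax error in prometheus.yml or alerts.yml can prevent Prometheus from starting, so it may be worth validating the files before restarting. The following is a minimal sketch, assuming promtool (shipped with Prometheus) is on the PATH; the wrapper name reload_prometheus_if_valid is hypothetical.

```shell
#!/usr/bin/env bash
# Validate a Prometheus configuration, including the rule files it lists
# under rule_files, and restart the service only if validation succeeds.
reload_prometheus_if_valid() {
    local config="${1:-/etc/prometheus/prometheus.yml}"
    if promtool check config "$config"; then
        sudo systemctl restart prometheus
    else
        echo "Config check failed; Prometheus was not restarted." >&2
        return 1
    fi
}

# Usage:
#   reload_prometheus_if_valid
#   reload_prometheus_if_valid /etc/prometheus/prometheus.yml
```

To check only the rules file, promtool check rules /etc/prometheus/alerts.yml can be run on its own.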

Configure Alertmanager

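Next, install and configure Alertmanager to send notifications. The example below uses Telegram. On Debian and Ubuntu the packaged Alertmanager is typically named prometheus-alertmanager; if that is the case on your system, adjust the package name, service name, and configuration path in the commands below to match.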

sudo apt-get install -y alertmanager

Edit the Alertmanager Configuration:

sudo nano /etc/alertmanager/alertmanager.yml

Replace the contents with your desired notification channel configuration. For Telegram, you will need a bot token and a chat ID. Telegram receivers (telegram_configs) require Alertmanager 0.24 or later.

route:
  receiver: 'telegram' # default receiver for all alerts

receivers:
- name: 'telegram'
  telegram_configs:
  - bot_token: 'YOUR_BOT_TOKEN'
    chat_id: YOUR_CHAT_ID # numeric ID, unquoted
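One common way to find the chat ID is to send your bot a message and then read it back through the Telegram Bot API's getUpdates method. The sketch below is an illustration only: telegram_chat_ids is a hypothetical helper, BOT_TOKEN is an assumed environment variable, and the pattern assumes the compact JSON form "chat":{"id":<number>,...}.

```shell
#!/usr/bin/env bash
# Print the unique chat IDs found in a getUpdates JSON response read on
# stdin. Group chats have negative IDs, so a leading minus is allowed.
telegram_chat_ids() {
    grep -o '"chat":{"id":-\{0,1\}[0-9]\{1,\}' | sed 's/.*"id"://' | sort -u
}

# Usage, after sending any message to the bot:
#   curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getUpdates" | telegram_chat_ids
```

The number it prints is the value to use for chat_id. If nothing is printed, the bot has probably not received a message yet.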

Start and enable Alertmanager:

sudo systemctl start alertmanager
sudo systemctl enable alertmanager
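To confirm that notifications reach Telegram without waiting for a real incident, a test alert can be posted straight to Alertmanager's v2 API. This is a sketch under assumptions: test_alert_payload is a hypothetical helper, the alert name and summary text are made up, and Alertmanager is assumed to listen on its default port 9093.

```shell
#!/usr/bin/env bash
# Print a JSON array holding one test alert in the shape accepted by
# Alertmanager's POST /api/v2/alerts endpoint.
# Arguments: alert name (default TestAlert), severity (default warning).
test_alert_payload() {
    local name="${1:-TestAlert}" severity="${2:-warning}"
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Test alert from the command line"}}]\n' \
        "$name" "$severity"
}

# Usage:
#   test_alert_payload | curl -s -X POST -H 'Content-Type: application/json' \
#       --data @- http://localhost:9093/api/v2/alerts
```

If the route and receiver are configured correctly, the test message should arrive in the Telegram chat shortly afterwards.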