Monitoring a node
To ensure the node's ongoing health, performance, and reliability, you need a robust monitoring and alerting system. This allows you to proactively identify and address issues before they lead to missed attestations, downtime, or even slashing.
This guide provides a comprehensive overview and step-by-step instructions for setting up a professional-grade monitoring solution.
Monitoring Benefits
Benefit | Description |
---|---|
Performance Optimization | Continuous monitoring helps you understand your validator's performance, identify bottlenecks, and make informed decisions to maximize your rewards. |
Proactive Problem Solving | Receive immediate notifications about critical issues, such as your node going offline, allowing for a swift response to minimize downtime. |
Security Auditing | Monitoring can help you detect unusual activity that might indicate a security breach, such as unexpected changes in peer count or resource usage. |
Peace of Mind | A well-configured alerting system lets you step away from your terminal, confident that you will be notified if your attention is needed. |
Prometheus, Grafana, and Alertmanager
The industry standard for monitoring blockchain nodes is a powerful, open-source trio:
Component | Description |
---|---|
Prometheus | A time-series database that "scrapes" (collects) metrics from your validator's various components at regular intervals. It is the engine of your monitoring stack. |
Grafana | A visualization tool that connects to Prometheus as a data source. It allows you to build detailed, easy-to-read dashboards with graphs and charts of your key metrics. |
Alertmanager | A companion service to Prometheus that handles the alerts Prometheus fires. It deduplicates, groups, and routes alerts to notification channels such as Telegram, Slack, or email. |
Setting Up Your Monitoring Stack
Here's how to get this monitoring stack up and running on your validator server.
Install the Monitoring Components
First, you'll need to install Prometheus, Grafana, and Node Exporter (a Prometheus exporter for hardware and OS metrics).
Configure Prometheus to Scrape Metrics
Prometheus needs to be told where to find the metrics from your execution client, consensus client, and the server itself. This is done in the prometheus.yml configuration file.
Ensure that your execution (e.g., Reth) and consensus (e.g., Lighthouse) clients are started with the flags that enable their Prometheus metrics endpoints. These are typically flags like --metrics.
Replace the contents of the file with the following configuration. This example assumes you are running Reth, Lighthouse, and Node Exporter on the same machine. Adjust the ports if you have a different setup.
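A minimal sketch of such a configuration follows. The job names are illustrative, and the ports are assumptions: 9090 and 9100 are the usual Prometheus and Node Exporter defaults, 9001 matches the Reth flag described in the note below, and 5054 and 5064 are assumed to be the Lighthouse beacon node and validator client metrics ports. Check each client's startup flags and adjust accordingly.

```yaml
# prometheus.yml (sketch; job names and ports are assumptions, adjust to your setup)
global:
  scrape_interval: 15s      # how often Prometheus scrapes each target
  evaluation_interval: 15s  # how often alerting and recording rules are evaluated

scrape_configs:
  # Prometheus scraping its own metrics
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  # Node Exporter: hardware and OS metrics
  - job_name: node_exporter
    static_configs:
      - targets: ["localhost:9100"]

  # Reth execution client, started with --metrics 127.0.0.1:9001
  - job_name: reth
    metrics_path: /   # assumption: Reth serves metrics at the root path
    static_configs:
      - targets: ["127.0.0.1:9001"]

  # Lighthouse beacon node, started with --metrics (assumed port 5054)
  - job_name: lighthouse_beacon
    static_configs:
      - targets: ["localhost:5054"]

  # Lighthouse validator client, started with --metrics (assumed port 5064)
  - job_name: lighthouse_validator
    static_configs:
      - targets: ["localhost:5064"]
```

After Prometheus restarts, its Targets page (under Status in the web UI on port 9090) should list each job as UP; a target that stays DOWN usually means the client was started without its metrics flag or is listening on a different port.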
Note
When running the Reth client, use the --metrics 127.0.0.1:9001 flag to set the metrics port, where 9001 can be customized to suit your needs.
Restart Prometheus to apply the changes: sudo systemctl restart prometheus
Set Up Your Grafana Dashboard
Start the Grafana Server:
To access Grafana, open a web browser and navigate to http://<your_server_ip>:3000. The default login is admin for both username and password. You'll be prompted to change the password on your first login.
Add Prometheus as a Data Source:
- Click the gear icon on the left-hand menu and select "Data Sources."
- Click "Add data source" and choose "Prometheus."
- For the URL, enter http://localhost:9090.
- Click "Save & Test."
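As an alternative to clicking through the UI, Grafana can also load data sources from provisioning files at startup. The sketch below assumes a package-style install where provisioning files live under /etc/grafana/provisioning/datasources/; the file name is hypothetical, and the URL is the same one entered in the steps above.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
# (path and file name are assumptions; adjust to your Grafana install)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy               # Grafana's backend proxies queries to Prometheus
    url: http://localhost:9090  # same URL as in the manual steps
    isDefault: true             # use this data source by default in new panels
```

Provisioned data sources are read when Grafana starts, so a restart of the Grafana service is needed before the data source appears.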
Import a Pre-built Dashboard: The community has created excellent dashboards that are ready to use.
- On the left-hand menu, click the "+" icon and select "Import."
- You can import a dashboard by its ID. Community dashboards for EVM nodes are published on the Grafana Dashboards site; search for "Ethereum Validator" to find suitable options. For example, a commonly used dashboard for Lighthouse has the ID 11578.
- Enter the dashboard ID and click "Load."
- Select your Prometheus data source from the dropdown and click "Import."
Validator's Key Metrics
Your Grafana dashboard will now display a wealth of information. Here are some of the most important metrics to watch:
Metric | Description |
---|---|
Attestation Effectiveness | How effectively your attestations are included in the chain. Consistently high effectiveness is crucial for maximizing rewards. |
Attestation Inclusion Distance | The number of slots between when your attestation was made and when it was included. A lower distance is better. |
Missed Attestations | The number of times your validator failed to attest. This should be as close to zero as possible. |
Proposed Blocks | The number of blocks your validator has successfully proposed. |
Missed Block Proposals | The number of times your validator was selected to propose a block but failed to do so. This is a critical issue to investigate immediately. |
Validator Balance | Your validator's ETH balance over time. |
Sync Committee Participation | Indicates whether your validator is currently serving on a sync committee, which earns additional rewards. |
Node Health:
Metric | Description |
---|---|
Execution and Consensus Client Peer Count | The number of peers your clients are connected to. A sudden drop could indicate a network issue. |
CPU and Memory Usage | High and sustained resource usage might indicate a problem or the need for a hardware upgrade. |
Disk Space Usage | Monitor this to avoid your node going offline due to a full disk. |
Network Traffic | Helps you understand your bandwidth usage. |
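The node health metrics in this table map onto series exposed by Node Exporter. As a rough sketch, they can be precomputed with Prometheus recording rules so that dashboards and alerts share the same definitions. The file name and rule names are hypothetical, the root mountpoint is an assumption about where your chain data lives, and the file has to be listed under rule_files in prometheus.yml to take effect. Peer count is omitted because its metric name depends on the client.

```yaml
# node_health_rules.yml (hypothetical file name; load it via rule_files in prometheus.yml)
groups:
  - name: node_health
    rules:
      # CPU usage in percent, averaged over all cores per instance
      - record: instance:node_cpu_usage:percent
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))

      # Memory usage in percent
      - record: instance:node_memory_usage:percent
        expr: 100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

      # Disk usage in percent for the root filesystem (adjust mountpoint to your data disk)
      - record: instance:node_disk_usage:percent
        expr: 100 * (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})

      # Network throughput in bytes per second, excluding the loopback interface
      - record: instance:node_network_receive:bytes_rate5m
        expr: sum by (instance) (rate(node_network_receive_bytes_total{device!="lo"}[5m]))
      - record: instance:node_network_transmit:bytes_rate5m
        expr: sum by (instance) (rate(node_network_transmit_bytes_total{device!="lo"}[5m]))
```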
Configuring Alerts
To have Prometheus send alerts to Alertmanager for critical events, first create an alerting rules file:
Add the following rules to the file. These are examples; you may need to adjust the expressions based on the specific metrics your clients expose.
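A minimal sketch of such a rules file is given here. It relies only on the built-in up metric and on Node Exporter series, so it should work regardless of which clients you run; the file name, thresholds, and durations are assumptions to tune. Alerts on peer count or missed attestations need client-specific metric names, which are not guessed at here.

```yaml
# alert_rules.yml (hypothetical file name; thresholds are illustrative)
groups:
  - name: validator_alerts
    rules:
      # Fires when any scrape target has been unreachable for 2 minutes
      - alert: TargetDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} is down"
          description: "Prometheus has not been able to scrape this target for 2 minutes."

      # Fires when the root filesystem has had less than 10% free space for 10 minutes
      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"

      # Fires when less than 10% of memory has been available for 10 minutes
      - alert: MemoryLow
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low available memory on {{ $labels.instance }}"
```

The for clause keeps a brief blip, such as a client restart, from paging you: the condition has to hold for the whole duration before the alert fires.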
Configure Prometheus to Use the Alert Rules and Alertmanager
Edit your prometheus.yml file to include the alert rules and the Alertmanager target.
Add the following sections:
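As a sketch, the two sections could look like this. The rules file name is hypothetical and should match whatever you named your alerting rules file, and port 9093 is assumed to be where Alertmanager listens (its usual default).

```yaml
# Additions to prometheus.yml (top-level keys, alongside scrape_configs)
rule_files:
  # Hypothetical file name; a relative path is resolved against the
  # directory that contains prometheus.yml
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # assumed Alertmanager address
```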
Restart Prometheus: sudo systemctl restart prometheus
Configure Alertmanager
Next, you need to install and configure Alertmanager to send notifications. The example below specifically targets Telegram.
Edit the Alertmanager Configuration:
Replace the contents with your desired notification channel configuration. You will need to get a bot token and chat ID from Telegram.
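A minimal Telegram configuration might look like the following sketch. It assumes an Alertmanager release recent enough to include the telegram_configs receiver; the bot token and chat ID shown are placeholders, and the grouping and timing values are illustrative.

```yaml
# alertmanager.yml (sketch; token and chat ID are placeholders)
route:
  receiver: telegram                   # default receiver for all alerts
  group_by: ["alertname", "instance"]  # bundle related alerts into one message
  group_wait: 30s                      # wait before sending the first notification for a group
  group_interval: 5m                   # wait before notifying about new alerts in the same group
  repeat_interval: 4h                  # re-send while an alert is still firing

receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "<your_bot_token>"  # placeholder: token issued for your bot
        chat_id: 123456789             # placeholder: numeric ID of the target chat
        send_resolved: true            # also notify when an alert clears
```

Because the bot token grants control of the bot, keep this file readable only by the user that runs Alertmanager.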
Start and enable Alertmanager: