Apache SkyWalking, the APM tool for distributed systems, has historically focused on observability through tracing and metrics, but service performance is often affected by the host as well. The newest release, SkyWalking 8.4.0, introduces a new feature for monitoring virtual machines. Users can detect possible problems from the dashboard: for example, when the CPU is overloaded, when memory or disk space is running low, or when the network is unhealthy.
How it works
SkyWalking leverages Prometheus and OpenTelemetry to collect metrics data, as we did for Istio control plane metrics. Prometheus is mature and widely used, and we expect to see increased adoption of the new CNCF project, OpenTelemetry. The SkyWalking OAP Server receives the metrics data from OpenTelemetry in OpenCensus format. The process is as follows:
- Prometheus Node Exporter collects metrics data from the VMs.
- OpenTelemetry Collector fetches metrics from the Node Exporters via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenCensus gRPC Exporter.
- The SkyWalking OAP Server evaluates MAL (Meter Analysis Language) expressions to filter, calculate, and aggregate the metrics, then stores the results. The expression rules are in `/config/otel-oc-rules/vm.yaml`.
- We can now see the data on the SkyWalking WebUI dashboard.
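To make the filter/calculate/aggregate step more concrete, here is a minimal Python sketch of the kind of computation involved. It is not MAL and not the rule shipped in `vm.yaml`. It assumes Node Exporter's cumulative `node_cpu_seconds_total` counter, treats only the `idle` mode as idle time, and uses the hypothetical names `Sample` and `cpu_usage_percent` purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One scraped counter value (hypothetical type for this sketch)."""
    host: str
    cpu: str
    mode: str       # e.g. "idle", "user", "system"
    seconds: float  # cumulative CPU seconds, as in node_cpu_seconds_total


def cpu_usage_percent(prev: List[Sample], curr: List[Sample]) -> Dict[str, float]:
    """Return per-host CPU usage (%) between two scrapes of the counter."""
    before = {(s.host, s.cpu, s.mode): s.seconds for s in prev}
    totals: Dict[str, Tuple[float, float]] = {}  # host -> (busy, all) seconds

    for s in curr:
        key = (s.host, s.cpu, s.mode)
        if key not in before:
            continue  # series first appeared in this scrape; no delta yet
        delta = s.seconds - before[key]  # calculate: counter increase
        if delta < 0:
            continue  # counter was reset (e.g. reboot); skip this interval
        busy, total = totals.get(s.host, (0.0, 0.0))
        if s.mode != "idle":  # filter: idle time does not count as busy
            busy += delta
        totals[s.host] = (busy, total + delta)  # aggregate: sum over CPUs and modes

    return {host: 100.0 * busy / total
            for host, (busy, total) in totals.items() if total > 0}


# Illustrative values only, not real measurements.
prev = [Sample("vm1", "0", "idle", 100.0), Sample("vm1", "0", "user", 50.0)]
curr = [Sample("vm1", "0", "idle", 107.5), Sample("vm1", "0", "user", 52.5)]
print(cpu_usage_percent(prev, curr))  # {'vm1': 25.0}
```

Working on deltas rather than raw values matters because the counter only ever grows; the usage over an interval is the busy share of the increase.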
What to monitor
SkyWalking provides default monitoring metrics including:
- CPU Usage (%)
- Memory RAM Usage (MB)
- Memory Swap Usage (MB)
- CPU Average Used
- CPU Load
- Memory RAM (total/available/used MB)
- Memory Swap (total/free MB)
- File System Mount point Usage (%)
- Disk R/W (KB/s)
- Network Bandwidth Usage (receive/transmit KB/s)
- Network Status (tcp_curr_estab/tcp_tw/tcp_alloc/sockets_used/udp_inuse)
- File fd Allocated
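As a rough illustration of how dashboard values like these can be derived from raw Node Exporter gauges, the sketch below computes RAM totals in MB and a file system usage percentage. The formulas (used = total - available, 1 MB = 1024 * 1024 bytes) and the function names are assumptions made for this example; the rules SkyWalking actually applies are defined in `vm.yaml`.

```python
from typing import Dict, Mapping

MB = 1024 * 1024  # assumption: binary megabytes


def memory_summary_mb(gauges: Mapping[str, float]) -> Dict[str, float]:
    """Derive total/available/used RAM in MB from Node Exporter byte gauges."""
    total = gauges["node_memory_MemTotal_bytes"]
    available = gauges["node_memory_MemAvailable_bytes"]
    return {
        "total": total / MB,
        "available": available / MB,
        "used": (total - available) / MB,
    }


def filesystem_usage_percent(size_bytes: float, avail_bytes: float) -> float:
    """Share of a mount point in use, from its size and available bytes."""
    if size_bytes <= 0:
        return 0.0  # avoid dividing by zero for empty or pseudo file systems
    return 100.0 * (size_bytes - avail_bytes) / size_bytes


# Illustrative values only: an 8 GB host with 2 GB available.
example = {
    "node_memory_MemTotal_bytes": 8192 * MB,
    "node_memory_MemAvailable_bytes": 2048 * MB,
}
print(memory_summary_mb(example))
print(filesystem_usage_percent(100.0, 25.0))
```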
[Figure: the SkyWalking VM dashboard when monitoring Linux.]
How to use
To enable this feature, we need to install Prometheus Node Exporter and OpenTelemetry Collector and activate the VM monitoring rules in SkyWalking OAP Server.
Install Prometheus Node Exporter
On Linux, Node Exporter exposes metrics on port `9100` by default. Once it is running, the metrics are available from the `/metrics` endpoint. To verify, open the endpoint in a web browser or request it with `curl`.
The response should list all the exposed metrics in plain text.
Note: We only need to install Node Exporter, not the Prometheus server.
For more information about Prometheus Node Exporter, see: https://prometheus.io/docs/guides/node-exporter/
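The verification step can also be scripted. The following is a minimal sketch, using only the Python standard library, that parses the plain-text response from `/metrics` into a dictionary. The function names are hypothetical, the default URL assumes Node Exporter is running on the local machine with its default port, and the parser handles only the common `name{labels} value` line shape, not every corner of the exposition format.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse Prometheus text-format lines into {series: value}."""
    result: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rfind("}")
            series, rest = line[: end + 1], line[end + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if not rest:
            continue  # malformed line without a value
        try:
            result[series] = float(rest[0])  # an optional timestamp is ignored
        except ValueError:
            continue
    return result


def fetch_metrics(url: str = "http://localhost:9100/metrics",
                  timeout: float = 5.0) -> Dict[str, float]:
    """Fetch and parse the metrics page of a running Node Exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hand-written sample in the exposition format, for illustration only.
SAMPLE = (
    "# HELP node_load1 1m load average.\n"
    "# TYPE node_load1 gauge\n"
    "node_load1 0.5\n"
    'node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5\n'
)
print(parse_metrics(SAMPLE))
```

On a host where Node Exporter is running, calling `fetch_metrics()` should return a non-empty dictionary whose keys start with prefixes such as `node_`; an empty result or a connection error suggests the exporter is not reachable on that port.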
Install OpenTelemetry Collector
We can quickly install an OpenTelemetry Collector instance by using `docker-compose` with the following steps:
1. Create a directory to store the configuration files, such as `/usr/local/otel`.
2. In this directory, create `docker-compose.yaml` and `otel-collector-config.yaml`. As described above, the collector configuration needs a Prometheus Receiver that scrapes each Node Exporter and an OpenCensus gRPC Exporter that points to the SkyWalking OAP Server.
3. From this directory, start the container with `docker-compose up -d`.
After the container is up and running, you should see in the container logs that metrics are already being exported.
For more information about OpenTelemetry Collector, see: https://opentelemetry.io/docs/collector/
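Because the collector configuration has to list every VM that runs Node Exporter, it can be convenient to generate `otel-collector-config.yaml` rather than edit it by hand. The sketch below renders a minimal configuration from a list of targets. It covers only the collector file, not `docker-compose.yaml`, and several details are assumptions: the helper name, the `vm-monitoring` job name (check whether the rules in your `vm.yaml` expect a particular job name), the example host names, and the `insecure: true` setting, since collector releases differ in how plaintext gRPC is configured (a top-level `insecure` key in older releases, a `tls` block in newer ones).

```python
from typing import Iterable

# Minimal pipeline: Prometheus receiver -> batch processor -> OpenCensus exporter.
TEMPLATE = """\
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "{job_name}"
          scrape_interval: 10s
          static_configs:
            - targets: [{targets}]

processors:
  batch:

exporters:
  opencensus:
    endpoint: "{oap_endpoint}"
    insecure: true  # assumption: plaintext gRPC; key differs between releases

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [opencensus]
"""


def render_collector_config(targets: Iterable[str], oap_endpoint: str,
                            job_name: str = "vm-monitoring") -> str:
    """Render the collector config for the given Node Exporter targets."""
    target_list = list(targets)
    if not target_list:
        raise ValueError("at least one Node Exporter target is required")
    quoted = ", ".join(f'"{t}"' for t in target_list)
    return TEMPLATE.format(job_name=job_name, targets=quoted,
                           oap_endpoint=oap_endpoint)


# Hypothetical addresses; replace with your VMs and your OAP Server address.
config = render_collector_config(["vm1:9100", "vm2:9100"], "oap-host:11800")
print(config)
```

Writing the returned string to `otel-collector-config.yaml` in the directory from step 1 gives a starting point; validate it against the documentation for the collector release you actually deploy.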
Set up SkyWalking OAP Server
To activate the OC handler and the VM-relevant rules, set the corresponding environment variables for the OAP Server.
Note: If other rules are already activated, append `vm` to the list, using `,` as the separator.
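The separator rule in the note can be sketched as a small helper that adds `vm` to whatever rules are already enabled. The variable names `SW_OTEL_RECEIVER` and `SW_OTEL_RECEIVER_ENABLED_OC_RULES` are assumptions based on the OAP's `application.yml` and are not stated in the text above, so confirm them against your release. `with_vm_rules` is a hypothetical name, and `istio-controlplane` stands in for any rule that is already active.

```python
from typing import Dict, Mapping

SELECTOR_VAR = "SW_OTEL_RECEIVER"                # assumed variable name
RULES_VAR = "SW_OTEL_RECEIVER_ENABLED_OC_RULES"  # assumed variable name


def with_vm_rules(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with the OTel receiver and the vm rules enabled."""
    updated = dict(env)
    updated[SELECTOR_VAR] = "default"
    # Keep rules that are already active and append "vm" only if it is missing.
    rules = [r.strip() for r in updated.get(RULES_VAR, "").split(",") if r.strip()]
    if "vm" not in rules:
        rules.append("vm")
    updated[RULES_VAR] = ",".join(rules)
    return updated


print(with_vm_rules({RULES_VAR: "istio-controlplane"})[RULES_VAR])
```

The resulting dictionary can be merged with `os.environ` and passed as the `env` argument of `subprocess.run` when launching the OAP Server from a script.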
Start the SkyWalking OAP Server.
After all of the above steps are completed, check out the SkyWalking WebUI. The `VM` dashboard provides the default metrics of all observed virtual machines.
Note: Clear your browser's local cache if you previously used it to access a deployment of an earlier SkyWalking version.
Apache SkyWalking contributor Kai Wan and SkyWalking founder Sheng Wu are engineers at Tetrate. Tetrate helps organizations adopt open source service mesh tools, including Istio, Envoy, and Apache SkyWalking, so they can manage microservices, run service mesh on any infrastructure, and modernize their applications.