Prometheus metrics and you
===========================

We use Prometheus to monitor the metrics of our application. A metric can be one of the following:

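  • constants - values that don't change (often), f.e. PHP version; Prometheus has no dedicated type for these, so they are typically exposed as a gauge with a fixed value
  • counters - values that only ever increase (and start from zero again when the process restarts), f.e. processed and failed requests
  • gauges - values that can go up and down over time, f.e. CPU load
  • histograms - observations counted into configurable buckets, somewhat complicated, f.e. request durations
  • summaries - similar to histograms, but quantiles are calculated on the client side, complicated, f.e. request duration quantiles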

One metric consists of:

  • name - its unique identifier
  • labels - more details about the metric
  • value - the value itself, always a number (Prometheus stores every sample value as a 64-bit float)
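To illustrate how name, labels and value combine into a single line of the text format that Prometheus scrapes, here is a minimal sketch in plain PHP. The `renderSample` helper is hypothetical and exists only for this illustration; it is not part of Trader or of any Prometheus client library.

```php
<?php
declare(strict_types=1);

/**
 * Renders one sample in the Prometheus text exposition format.
 * Hypothetical helper for illustration only.
 *
 * @param array<string, string> $labels label name => label value
 */
function renderSample(string $name, array $labels, float $value): string
{
    $pairs = [];
    foreach ($labels as $labelName => $labelValue) {
        // Backslash, double quote and line feed must be escaped in label values.
        $escaped = str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], $labelValue);
        $pairs[] = sprintf('%s="%s"', $labelName, $escaped);
    }

    // A metric without labels is rendered without the curly braces.
    $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';

    return $name . $labelPart . ' ' . $value;
}

echo renderSample(
    'trader_resource_flags_set',
    ['flag' => 'ga-event-sent-purchase', 'resource' => 'order', 'scope' => 'Application'],
    3852
), PHP_EOL;
```

In practice the client library does this rendering for you; the sketch only shows where each of the three parts ends up.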

Example obtained from our production cluster:

trader_resource_flags_set{container="standard-service", endpoint="web", flag="ga-event-sent-purchase", instance="10.23.208.84:8080", job="trader-daemon-metrics-standard-service", namespace="trader-prod", pod="trader-daemon-metrics-7d885cb84d-pcdtf", prometheus="monitoring/kube-prometheus-stack-prometheus", resource="order", scope="Application", service="trader-daemon-metrics-standard-service"} 3852

and to explain what each value means (details about the custom labels can be found in ResourceFlagsMetrics.php):

trader_resource_flags_set{
    container="standard-service",                             # The name of the container in k8s
    endpoint="web",                                           # How the metrics were obtained
    service="trader-daemon-metrics-standard-service",         # The service name in k8s
    instance="10.23.208.84:8080",                             # The IP and port of the service
    job="trader-daemon-metrics-standard-service",             # The name of the job in k8s
    namespace="trader-prod",                                  # Namespace of k8s
    pod="trader-daemon-metrics-7d885cb84d-pcdtf",             # Pod name in k8s
    prometheus="monitoring/kube-prometheus-stack-prometheus", # Which monitor captured the metric
    flag="ga-event-sent-purchase",                            # Custom label set by us in code
    resource="order",                                         # Custom label set by us in code
    scope="Application",                                      # Custom label set by us in code
} 3852                                                        # Finally, the value itself, int in this case

How Metrics are Processed

To process the metrics, Kubernetes runs Prometheus clients on specialized nodes in the metrics.k8s.io/v1beta1/nodes node list. The standard-service chart provides a configuration option that signals the metrics client to connect to the GET {service-name}:{service-port}/metrics HTTP endpoint on the service.

To enable the metrics integration, specify

serviceMonitor:
  enabled: true
in the service and be sure to implement the GET /metrics endpoint. The metrics can then be viewed in Grafana. The Prometheus client contacts the endpoint roughly every two seconds.

When using this default configuration, the /metrics path is disabled in ingress and is inaccessible from outside the cluster, so there is no need to implement authentication.
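As a rough idea of what a GET /metrics endpoint has to return, here is a minimal sketch in plain PHP. The `buildMetricsBody` function and the `example_memory_usage_bytes` metric are hypothetical and used only for illustration; a real service would normally let its Prometheus client library render the registry instead of building the text by hand.

```php
<?php
declare(strict_types=1);

/**
 * Builds the body of a /metrics response from a list of gauge values.
 * Hypothetical stand-in for a real metrics registry.
 *
 * @param array<string, float> $gauges metric name => current value
 */
function buildMetricsBody(array $gauges): string
{
    $lines = [];
    foreach ($gauges as $name => $value) {
        $lines[] = "# TYPE {$name} gauge";
        $lines[] = "{$name} {$value}";
    }

    return implode("\n", $lines) . "\n";
}

// Only answer when the script is served over HTTP, not when run from the CLI.
if (PHP_SAPI !== 'cli') {
    $path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);

    if (($_SERVER['REQUEST_METHOD'] ?? '') === 'GET' && $path === '/metrics') {
        // Content type of the Prometheus text exposition format.
        header('Content-Type: text/plain; version=0.0.4; charset=utf-8');
        echo buildMetricsBody(['example_memory_usage_bytes' => (float) memory_get_usage()]);
    } else {
        http_response_code(404);
    }
}
```

Saved as, say, metrics.php (an assumed file name), the sketch can be tried locally with `php -S localhost:8080 metrics.php` and `curl localhost:8080/metrics`.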

Trader: PerPod and Application Wide Metrics

Trader exposes two scopes of metrics (FTMO\System\Metrics\MetricsScope):

  • PerPod - metrics that are important for the given pod, f.e. number of failed requests, APCu usage
  • Application - metrics that span the whole application logic, f.e. number of completed orders

PerPod metrics are served by Controller_metrics, and Prometheus connects directly to every pod running with webserver.yaml values.

Application metrics are served by a specialized deployment named trader-daemon-metrics. It is an extremely simple HTTP server that contains no routing and replies with metrics to every request. The logic itself can be seen in MetricsServer.php; the deployment Helm setup and values are in deploy-daemon-metrics.sh and daemon-metrics.yaml.

It is imperative that only one instance of the metrics daemon runs at a time! Otherwise, the values reported by the individual instances will be summed up when aggregated!
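The effect of a second daemon instance can be sketched with a small simulation. It assumes that every daemon instance reports the same application-wide value and that the dashboard query sums over pods; the pod names and the `sumAcrossPods` helper are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Sums sample values over all series, the way a query that
 * aggregates away the pod label would.
 *
 * @param list<array{pod: string, value: float}> $series
 */
function sumAcrossPods(array $series): float
{
    return array_sum(array_column($series, 'value'));
}

// One daemon: the aggregated value matches the real number.
$oneDaemon = [
    ['pod' => 'metrics-a', 'value' => 3852.0],
];

// Two daemons reporting the same application-wide value: the sum is doubled.
$twoDaemons = [
    ['pod' => 'metrics-a', 'value' => 3852.0],
    ['pod' => 'metrics-b', 'value' => 3852.0],
];

echo sumAcrossPods($oneDaemon), PHP_EOL;  // 3852
echo sumAcrossPods($twoDaemons), PHP_EOL; // 7704
```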

You can also display the current Application metrics using the metrics:command command.

Implementing Your Own Metrics

In order to implement your own metrics, you should first determine the scope:

  • PerPod - significant for one running pod, most likely nothing to do with business logic
  • Application - significant for the business

If you think you should use PerPod metrics, first check the metrics already emitted by the k8s cluster to Prometheus; you will probably find what you are looking for.

Next, you should decide which metric type to use. Currently only these are implemented:

  • Counters.php - they only ever count up
  • Gauges.php - they change over time, currently updated every minute

You most likely want the counter, since you can aggregate and modify the queries in Grafana to provide almost any time series data or insight.
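To show why a plain counter is enough for most time series views, the sketch below derives "how much happened in this window" from raw counter samples. It is a simplified illustration of the idea behind counter-based queries (no extrapolation, evenly spaced scrapes assumed); the `totalIncrease` helper is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Computes how much a counter grew over a series of scrapes.
 *
 * @param list<float> $samples counter values at consecutive scrapes
 */
function totalIncrease(array $samples): float
{
    $increase = 0.0;
    for ($i = 1, $n = count($samples); $i < $n; $i++) {
        $delta = $samples[$i] - $samples[$i - 1];
        // A drop means the counter was reset (f.e. process restart),
        // so the new value itself is the growth since the reset.
        $increase += $delta >= 0 ? $delta : $samples[$i];
    }

    return $increase;
}

// 4 + 6 + 3 (after reset) + 2
echo totalIncrease([10.0, 14.0, 20.0, 3.0, 5.0]), PHP_EOL; // 15
```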

Next, you should choose the labels that will be important for the metric. For example, you can create a metric that measures the number of purchases made and specify the country of purchase as a label.

The whole example would look like this:

<?php
declare(strict_types=1);

namespace FTMO\System\Metrics;

use FTMO\Trader\Common\Entities\Order;
use Prometheus\RegistryInterface;

final readonly class PurchaseCompletedMetric implements MetricsUpdaterInterface
{

    public function __construct(
        private RegistryInterface $registry
    ) {}

    public function updateMetrics(): void
    {
        // noop
    }

    public function registerPurchaseCompleted(Order $order): void
    {
        $this->registry->getOrRegisterCounter(
            CollectorRegistryFactory::NAMESPACE,
            Counters::PurchaseCompleted->value,
            'count of purchases completed',
            [
                'country',
                'scope',
            ],
        )->inc([
            $order->getCountry(),
            MetricsScope::Application->name,
        ]);
    }
}

Please make sure to add the scope label.

‼ Beware of adding or removing labels on existing metrics! This is not supported and fixing it is a PITA!

After calling registerPurchaseCompleted, you should see a new metric that might look like this:

# HELP trader_purchase_completed count of purchases completed
# TYPE trader_purchase_completed counter
trader_purchase_completed{country="CZ",scope="Application"} 1

To register a more advanced Gauges metric, please consult the current implementations.