Prometheus metrics and you
===========================
We use Prometheus to monitor the metrics of our application. A metric can be one of the following:
- constants - values that do not change (often), e.g. the PHP version
- counters - values that only ever increase, e.g. processed and failed requests
- gauges - values that go up and down over time, e.g. CPU load
- histograms - observations counted into buckets, somewhat complicated, e.g. request durations per path
- summaries - similar to histograms, but with quantiles precomputed by the application, complicated, e.g. request duration quantiles per path
One metric consists of:
- name - its identifier
- labels - more details about the metric; every distinct combination of label values is tracked as a separate series
- value - the value itself (always a number; Prometheus stores every sample as a 64-bit float)
Example obtained from our production cluster:
trader_resource_flags_set{container="standard-service", endpoint="web", flag="ga-event-sent-purchase", instance="10.23.208.84:8080", job="trader-daemon-metrics-standard-service", namespace="trader-prod", pod="trader-daemon-metrics-7d885cb84d-pcdtf", prometheus="monitoring/kube-prometheus-stack-prometheus", resource="order", scope="Application", service="trader-daemon-metrics-standard-service"} 3852
Here is the same sample again, annotated to explain what each part means (details about the custom labels can be found in ResourceFlagsMetrics.php):
trader_resource_flags_set{
    container="standard-service", # The name of the container in k8s
    endpoint="web", # How the metrics were obtained
    service="trader-daemon-metrics-standard-service", # The service name in k8s
    instance="10.23.208.84:8080", # The IP and port of the service
    job="trader-daemon-metrics-standard-service", # The name of the Prometheus scrape job
    namespace="trader-prod", # Namespace in k8s
    pod="trader-daemon-metrics-7d885cb84d-pcdtf", # Pod name in k8s
    prometheus="monitoring/kube-prometheus-stack-prometheus", # Which monitor captured the metric
    flag="ga-event-sent-purchase", # Custom label set by us in code
    resource="order", # Custom label set by us in code
    scope="Application", # Custom label set by us in code
} 3852 # Finally, the value itself, an integer in this case
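To make the name / labels / value structure concrete, here is a minimal sketch that renders one sample in the Prometheus text exposition format. The function names `escapeLabelValue` and `renderSample` are hypothetical and exist only for this illustration; in our code the client library does this rendering.

```php
<?php
declare(strict_types=1);

/**
 * Escapes a label value for the text exposition format:
 * backslash, double quote and line feed must be escaped.
 * (Hypothetical helper, for illustration only.)
 */
function escapeLabelValue(string $value): string
{
    // The backslash is replaced first so that the backslashes added by the
    // later replacements are not escaped a second time.
    return str_replace(["\\", "\"", "\n"], ["\\\\", "\\\"", "\\n"], $value);
}

/**
 * Renders one sample as `name{label="value",...} value`.
 *
 * @param array<string, string> $labels label name => label value
 */
function renderSample(string $name, array $labels, int|float $value): string
{
    $pairs = [];
    foreach ($labels as $labelName => $labelValue) {
        $pairs[] = sprintf('%s="%s"', $labelName, escapeLabelValue($labelValue));
    }

    // A metric without labels is rendered without the braces.
    $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';

    return $name . $labelPart . ' ' . $value;
}

echo renderSample(
    'trader_resource_flags_set',
    ['flag' => 'ga-event-sent-purchase', 'resource' => 'order', 'scope' => 'Application'],
    3852,
), PHP_EOL;
```

Only the three custom labels are passed here; the remaining labels in the production sample (container, pod, namespace and so on) do not come from application code.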
How Metrics are Processed
-------------------------
To process the metrics, Kubernetes runs Prometheus clients on specialized nodes in the metrics.k8s.io/v1beta1/nodes
node list. The standard-service chart provides a configuration option that signals the metrics client to
connect to the GET {service-name}:{service-port}/metrics HTTP endpoint on the service.
To enable the metrics integration, specify that configuration option
in the service and be sure to implement the GET /metrics endpoint. The metrics can then be viewed in Grafana.
The Prometheus client contacts the endpoint roughly every two seconds.
With this default configuration, the /metrics path is disabled in the ingress and is inaccessible
from outside the cluster, so there is no need to implement authentication.
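As a rough illustration of what implementing the GET /metrics endpoint involves, the sketch below decides how to answer a request. The function `handleMetricsRequest`, the response array shape and the metric `example_requests_total` are assumptions made for this example; a real service would plug this into its own routing and let the Prometheus client library render the body.

```php
<?php
declare(strict_types=1);

/**
 * Decides how to answer an HTTP request on a service that exposes metrics.
 * (Hypothetical helper, for illustration only.)
 *
 * @param callable(): string $renderMetrics returns the metrics in text format
 * @return array{status: int, contentType: string, body: string}
 */
function handleMetricsRequest(string $method, string $path, callable $renderMetrics): array
{
    if ($method !== 'GET' || $path !== '/metrics') {
        return [
            'status' => 404,
            'contentType' => 'text/plain; charset=utf-8',
            'body' => "Not found\n",
        ];
    }

    return [
        'status' => 200,
        // Content type of the Prometheus text exposition format.
        'contentType' => 'text/plain; version=0.0.4; charset=utf-8',
        'body' => $renderMetrics(),
    ];
}

// The renderer is stubbed with a fixed string for the example.
$response = handleMetricsRequest('GET', '/metrics', static fn (): string =>
    "# TYPE example_requests_total counter\n"
    . "example_requests_total 42\n");

echo $response['status'], ' ', $response['contentType'], PHP_EOL, $response['body'];
```

Keeping the decision logic in a plain function makes it easy to test without starting a web server.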
Trader: PerPod and Application-Wide Metrics
-------------------------------------------
Trader exposes two scopes of metrics, defined in FTMO\System\Metrics\MetricsScope:
- PerPod - metrics that are important for the given pod, e.g. number of failed requests, APCu usage
- Application - metrics that span the whole application, e.g. number of completed orders
PerPod metrics are served by Controller_metrics, and Prometheus connects directly to every pod running
with the webserver.yaml values.
Application metrics are served by a specialized deployment named trader-daemon-metrics.
It is an extremely simple HTTP server that has no routing and replies with the metrics to every request.
The logic itself can be seen in MetricsServer.php; the deployment,
Helm setup and values are in deploy-daemon-metrics.sh and
daemon-metrics.yaml.
It is imperative that only one instance of the metrics daemon runs at a time! Otherwise, the values reported by the individual instances will be summed up when aggregated!
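The effect of a second daemon instance can be sketched with plain arithmetic. The function `sumAcrossPods` and the pod names are hypothetical; the function only mimics what a sum-style aggregation over all pods does in a dashboard query.

```php
<?php
declare(strict_types=1);

/**
 * Sums one metric over all scraped pods, ignoring the per-pod labels.
 * (Hypothetical helper, for illustration only.)
 *
 * @param array<string, int|float> $valueByPod pod name => scraped value
 */
function sumAcrossPods(array $valueByPod): int|float
{
    return array_sum($valueByPod);
}

$completedOrders = 3852; // the real application-wide value

// One daemon: the aggregate matches reality.
echo sumAcrossPods(['daemon-metrics-a' => $completedOrders]), PHP_EOL;

// Two daemons report the same application-wide value: the aggregate doubles.
echo sumAcrossPods([
    'daemon-metrics-a' => $completedOrders,
    'daemon-metrics-b' => $completedOrders,
]), PHP_EOL;
```

PerPod metrics do not have this problem, because there each pod reports only its own share.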
You can also display the current Application metrics using the metrics:command command.
Implementing Your Own Metrics
-----------------------------
To implement your own metrics, you should first determine the scope:
- PerPod - significant for one running pod, most likely nothing to do with business logic
- Application - significant for business
If you think you should use PerPod metrics, first check the metrics that the k8s cluster already emits to Prometheus;
you will probably find what you are looking for.
Next, you should know which metric to use. Currently only these are implemented:
- Counters.php - they only ever count up
- Gauges.php - they change over time, currently updated every minute
You most likely want the counter, since you can aggregate and modify the queries in Grafana
to provide almost any time series data or insight.
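To illustrate why a counter is so flexible, the sketch below derives "how much happened in this window" from raw counter values. The function `counterIncrease` is hypothetical and deliberately simplified: it only adds up the differences between consecutive scrapes and treats a drop as a restart from zero, whereas the query functions in Prometheus additionally extrapolate to the edges of the time window.

```php
<?php
declare(strict_types=1);

/**
 * Total increase of a counter over consecutive scraped values.
 * A drop means the process restarted and the counter began again from zero,
 * so the new value itself is counted as the increase for that step.
 * (Hypothetical helper, for illustration only.)
 *
 * @param list<int|float> $samples counter values in scrape order
 */
function counterIncrease(array $samples): int|float
{
    $increase = 0;
    $count = count($samples);

    for ($i = 1; $i < $count; $i++) {
        $current = $samples[$i];
        $previous = $samples[$i - 1];
        $increase += $current >= $previous ? $current - $previous : $current;
    }

    return $increase;
}

// The counter restarts between 22 and 3: 5 + 7 + 3 + 5 = 20.
echo counterIncrease([10, 15, 22, 3, 8]), PHP_EOL;
```

Because the raw counter keeps the full history, the same series can later be turned into rates, totals per day, or comparisons between label values without changing the application.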
Next, you should choose the labels that will be important for the metric. For example, you can create a metric that measures the number of purchases made and specify the country of purchase as a label.
The whole example would look like this:
<?php

declare(strict_types=1);

namespace FTMO\System\Metrics;

use FTMO\Trader\Common\Entities\Order;
use Prometheus\RegistryInterface;

final readonly class PurchaseCompletedMetric implements MetricsUpdaterInterface
{
    public function __construct(
        private RegistryInterface $registry
    ) {}

    public function updateMetrics(): void
    {
        // noop
    }

    public function registerPurchaseCompleted(Order $order): void
    {
        $this->registry->getOrRegisterCounter(
            CollectorRegistryFactory::NAMESPACE,
            Counters::PurchaseCompleted->value,
            'count of purchases completed',
            // Label names; the values passed to inc() must follow the same order.
            [
                'country',
                'scope',
            ],
        )->inc([
            $order->getCountry(),
            MetricsScope::Application->name,
        ]);
    }
}
Please make sure to add the scope label.
‼ Beware of adding or removing labels on existing metrics! This is not supported and fixing it is a PITA!
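The toy model below hints at why the label set of an existing metric should be treated as fixed. `ToyCounter` is a hypothetical in-memory stand-in written for this illustration, not the client library we use; it only assumes that label names are fixed when the metric is registered and that values are matched to them by position.

```php
<?php
declare(strict_types=1);

/** Toy in-memory counter; illustrative only, not the real client library. */
final class ToyCounter
{
    /** @var array<string, int> joined label values => count */
    private array $series = [];

    /** @param list<string> $labelNames fixed when the metric is registered */
    public function __construct(private readonly array $labelNames)
    {
    }

    /** @param list<string> $labelValues values in the same order as the label names */
    public function inc(array $labelValues): void
    {
        if (count($labelValues) !== count($this->labelNames)) {
            throw new InvalidArgumentException('Label values do not match the registered label names');
        }

        // Every distinct combination of label values is its own series.
        $key = implode('|', $labelValues);
        $this->series[$key] = ($this->series[$key] ?? 0) + 1;
    }

    /** @return array<string, int> */
    public function series(): array
    {
        return $this->series;
    }
}

$counter = new ToyCounter(['country', 'scope']);
$counter->inc(['CZ', 'Application']);
$counter->inc(['CZ', 'Application']);
$counter->inc(['DE', 'Application']);
print_r($counter->series());

try {
    $counter->inc(['CZ']); // the caller forgot the scope label
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Once values have been recorded under one set of label names, every caller and every stored series depends on that exact set, which is one plausible reason why changing it afterwards is painful.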
After calling registerPurchaseCompleted, you should see a new metric that might look like this:
# HELP trader_purchase_completed count of purchases completed
# TYPE trader_purchase_completed counter
trader_purchase_completed{country="CZ",scope="Application"} 1
To register a more advanced Gauges metric, please consult the existing implementations.