OpsCanary

Mastering Recording Rules in Prometheus: Boost Your Observability

5 min read · Prometheus Docs · Apr 28, 2026
Practitioner: hands-on experience recommended

Recording rules exist to solve a fundamental problem in observability: the need for efficient querying of frequently accessed or computationally heavy metrics. By precomputing these expressions, you save time and resources, allowing your monitoring setup to scale effectively. This means you can focus on actionable insights rather than waiting for complex calculations during each query.

In Prometheus, you define recording rules within rule groups. Rules in a group run sequentially at a regular interval, all with the same evaluation time. Each recording rule must name a valid metric in its `record` field. A group can also set `interval`, which dictates how often its rules are evaluated (it defaults to the global `evaluation_interval`), and `limit`, which caps the number of alerts an alerting rule, or series a recording rule, may produce. For example, a simple recording rule might look like this:

YAML
groups:
  - name: example
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)

In production, be mindful of the limitations. If a rule exceeds the group's `limit`, all series produced by that rule are discarded, and if it is an alerting rule, all of its alerts (active, pending, or inactive) are cleared as well. The event is recorded as an evaluation error. This can lead to gaps in your monitoring data, so set your limits thoughtfully. Additionally, if a rule group has not finished evaluating by the time its next evaluation is due, that next evaluation is skipped, potentially leaving gaps in recorded series and delaying alerts.
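
To make the `interval` and `limit` settings concrete, here is a minimal sketch of a rule group that sets both. The group name and the specific values are assumptions chosen for illustration, not recommendations; pick values that match your own series counts and evaluation cost.

```yaml
groups:
  - name: http_aggregations   # hypothetical group name
    interval: 30s             # assumed value; overrides the global evaluation_interval for this group
    limit: 100                # assumed value; cap on series per recording rule (or alerts per alerting rule)
    rules:
      # If this expression ever returned more than 100 series, every series
      # from this rule would be discarded for that evaluation.
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)
```

Leaving `limit` unset (or at 0) means no limit is applied, which is the safer default when you are unsure how many series an expression can return.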

Key takeaways

  • Define recording rules to precompute expensive queries and save time.
  • Use valid metric names for recording rules and valid label values for alerting rules.
  • Set a group's `interval` (or the global `evaluation_interval`) to control how often rules are evaluated.
  • Be cautious with the `limit` parameter to avoid losing critical alert data.
  • Monitor evaluation times to prevent skipped evaluations in rule groups.
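
One way to act on the last takeaway is to alert on Prometheus's own rule-evaluation metrics. The sketch below assumes Prometheus scrapes itself, so that `prometheus_rule_group_last_duration_seconds`, `prometheus_rule_group_interval_seconds`, and `prometheus_rule_group_iterations_missed_total` are available. The group name, alert names, `for` durations, and `severity` label are hypothetical choices.

```yaml
groups:
  - name: rule_evaluation_health   # hypothetical group name
    rules:
      # Fires when a group's last evaluation took longer than its configured interval.
      - alert: RuleGroupEvaluationSlow   # hypothetical alert name
        expr: prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Rule group {{ $labels.rule_group }} evaluates slower than its interval"
      # Fires when any scheduled evaluation of a group was skipped in the last 5 minutes.
      - alert: RuleGroupIterationsMissed   # hypothetical alert name
        expr: increase(prometheus_rule_group_iterations_missed_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Rule group {{ $labels.rule_group }} has missed evaluations"
```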

Why it matters

In production, efficient metric computation can significantly reduce load on your Prometheus server, leading to faster response times and better resource utilization. This directly impacts your ability to react to incidents swiftly.

Code examples

Shell
promtool check rules /path/to/example.rules.yml

YAML
groups:
  - name: example
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)

YAML
# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]
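
As an illustration of the optional `labels` field, the sketch below attaches a static label to every series the rule records. The `team: platform` label is a hypothetical example, not something the Prometheus docs prescribe.

```yaml
groups:
  - name: example
    rules:
      - record: code:prometheus_http_requests_total:sum
        expr: sum by (code) (prometheus_http_requests_total)
        labels:
          team: platform   # hypothetical label; added to (or overwritten on) each recorded series
```

Saved to a file, this can be validated with the same `promtool check rules` command before Prometheus loads it.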

When NOT to use this

The official docs don't call out specific anti-patterns here. Use your judgment based on your scale and requirements.
