Bug Report
The node_exporter_metrics input plugin keeps producing stale data for network devices that no longer exist; this can be observed both in the Mimir logs and in the file output dump.
It is somehow triggered by the veth* virtual network devices created for Docker containers: their metrics keep being sent long after the device is gone. We run Docker nodes in Swarm mode for application builds and tests (Jenkins agents), so containers are short-lived.
To Reproduce
Steps to reproduce the problem:
1. Create a Docker container with an attached network (a veth* device appears on the host).
2. Remove the container; the veth* device disappears.
3. Stale samples for the removed device keep showing up in the metrics dumped by the file output and are rejected by Mimir. With the correct date on the host being 2024-09-18T06:23:54, the Mimir logs show:

failed pushing to ingester opentelemetry-mimir-3: user=anonymous: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-09-17T13:04:09.163Z and is from series node_network_transmit_errs_total{device="vethb672c5c", host_name="xxxx.xxx.xxxx", metrics_agent="fluent-bit", metrics_source="host-metrics"}
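For scale: the rejected sample above is more than 17 hours older than the host clock at the time of the dump, far beyond the 15-second scrape interval. A quick check (plain Python, assuming the host clock shown above is UTC):

```python
from datetime import datetime, timezone

# Timestamps taken from the Mimir rejection log and the host date above.
sample_ts = datetime(2024, 9, 17, 13, 4, 9, 163000, tzinfo=timezone.utc)
host_ts = datetime(2024, 9, 18, 6, 23, 54, tzinfo=timezone.utc)  # assumed UTC

age = host_ts - sample_ts
print(age)  # → 17:19:44.837000
```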
Expected behavior
No stale metrics are delivered; once a veth* device is removed, its series stop being emitted.
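On Linux, node_exporter-style netdev collectors read /proc/net/dev on every scrape, and a fresh read taken after the container is gone no longer lists the veth* interface, so each scrape should only emit series for currently present devices. A minimal sketch of that expectation (hypothetical parser for illustration, not Fluent Bit's actual code):

```python
def netdev_devices(proc_net_dev_text):
    """Return the set of interface names in a /proc/net/dev snapshot."""
    devices = set()
    for line in proc_net_dev_text.splitlines()[2:]:  # skip the two header lines
        name, sep, _ = line.partition(":")
        if sep and name.strip():
            devices.add(name.strip())
    return devices

# Snapshot while the container is running (veth present)...
before = """Inter-|   Receive
 face |bytes
    lo: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
vethb672c5c: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"""
# ...and after it is removed (veth gone).
after = """Inter-|   Receive
 face |bytes
    lo: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"""

assert "vethb672c5c" in netdev_devices(before)
assert "vethb672c5c" not in netdev_devices(after)  # a fresh scrape drops it
```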
Environment
Version used: fluent-bit-3.1.7-1.x86_64
Configuration:
[SERVICE]
    # Flush
    # =====
    # set an interval of seconds before to flush records to a destination
    flush 1

    # Daemon
    # ======
    # instruct Fluent Bit to run in foreground or background mode.
    daemon Off

    # Log_Level
    # =========
    # Set the verbosity level of the service, values can be:
    #
    # - error
    # - warning
    # - info
    # - debug
    # - trace
    #
    # by default 'info' is set, that means it includes 'error' and 'warning'.
    log_level debug

    # Parsers File
    # ============
    # specify an optional 'Parsers' configuration file
    parsers_file parsers.conf
    parsers_file parsers-custom.conf

    # Plugins File
    # ============
    # specify an optional 'Plugins' configuration file to load external plugins.
    plugins_file plugins.conf

    # HTTP Server
    # ===========
    # Enable/Disable the built-in HTTP Server for metrics
    http_server Off
    http_listen 0.0.0.0
    http_port 2020

    # Storage
    # =======
    # Fluent Bit can use memory and filesystem buffering based mechanisms
    #
    # - https://docs.fluentbit.io/manual/administration/buffering-and-storage
    #
    # storage metrics
    # ---------------
    # publish storage pipeline metrics in '/api/v1/storage'. The metrics are
    # exported only if the 'http_server' option is enabled.
    storage.metrics on

    # storage.path
    # ------------
    # absolute file system path to store filesystem data buffers (chunks).
    #
    storage.path /var/lib/fluent-bit/storage

    # storage.sync
    # ------------
    # configure the synchronization mode used to store the data into the
    # filesystem. It can take the values normal or full.
    #
    storage.sync normal

    # storage.checksum
    # ----------------
    # enable the data integrity check when writing and reading data from the
    # filesystem. The storage layer uses the CRC32 algorithm.
    #
    # storage.checksum off

    # storage.backlog.mem_limit
    # -------------------------
    # if storage.path is set, Fluent Bit will look for data chunks that were
    # not delivered and are still in the storage layer, these are called
    # backlog data. This option configure a hint of maximum value of memory
    # to use when processing these records.
    #
    # storage.backlog.mem_limit 5M
    storage.total_limit_size 512M
    storage.max_chunks_up 128

# Systemd services logs (docker)
[INPUT]
    Name systemd
    Tag systemd.*
    Systemd_Filter _SYSTEMD_UNIT=docker.service
    Lowercase on
    Strip_Underscores on
    DB /var/lib/fluent-bit/cursors/systemd.sqlite
    storage.type filesystem

[INPUT]
    Name node_exporter_metrics
    Tag node_metrics
    metrics "cpu,meminfo,diskstats,filesystem,uname,stat,time,loadavg,vmstat,netdev,filefd"
    Scrape_interval 15

# Forward/fluentd input for docker services logging
[INPUT]
    Name forward
    Unix_Path /run/fluentd-forward.sock
    Unix_Perm 0666
    storage.type filesystem

[OUTPUT]
    Match systemd.*
    Name opensearch
    Host xxxxx.xxx.xxxxxx
    Port 443
    HTTP_User fluentbit
    HTTP_Passwd xxxxxxxx
    Index systemd
    Suppress_Type_Name On
    Tls On

[OUTPUT]
    Name opentelemetry
    Match node_metrics
    Host xxx.xxx.xxx
    Port 443
    Log_response_payload False
    Tls On
    logs_body_key $message
    logs_span_id_message_key span_id
    logs_trace_id_message_key trace_id
    logs_severity_text_message_key loglevel
    logs_severity_number_message_key lognum
    # add user-defined labels
    add_label metrics_agent fluent-bit
    add_label metrics_source host-metrics
    add_label host_name xxxx.xxx.xxx

[OUTPUT]
    Name file
    Match node_metrics
    Path /var/log
    File metrics.log
Environment name and version: Docker CE docker-ce-25.0.3-1.el9.x86_64
Server type and version: Dell Inspiron 5577
Operating System and version: AlmaLinux 9.3
Filters and plugins:
input node_exporter
output opentelemetry
output file (for debug)