Fluent Bit cannot resume logs from file storage once the connection to the upstream is restored. #9339

MarkRushB opened this issue Sep 3, 2024 · 0 comments

Bug Report

Describe the bug
Fluent Bit cannot resume logs from file storage once the connection to the upstream is restored.

To Reproduce

  • I used helm to deploy fluent-bit on my k8s clusters
  • I configured filesystem storage for my fluent-bit:
        [SERVICE]
            Daemon Off
            Flush {{ .Values.flush }}
            Log_Level {{ .Values.logLevel }}
            Parsers_File /fluent-bit/etc/parsers.conf
            Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
            HTTP_Server On
            HTTP_Listen 0.0.0.0
            HTTP_Port {{ .Values.metricsPort }}
            Health_Check On

            # Persistent storage path for buffering
            storage.path /var/log/flb-storage/
            storage.sync normal
            storage.checksum off
            storage.backlog.mem_limit 5M
            storage.max_chunks_up 300

        [INPUT]
            Name tail
            Tag app_container
            Path /var/log/test.log
            # Path /var/log/*.log, /var/log/*/*.log, /var/log/*/*/*.log, /var/log/*/*/*/*.log, /var/log/*/*/*/*/*.log,
            # Exclude_Path /var/log/containers/nginx-ingress-*.log, /var/log/containers/fluent-bit-*.log, /var/log/containers/fluentbit-*.log, /var/log/pods/fluent-bit_*/*/*.log, /var/log/containers/cloudguard-*.log, /var/log/pods/checkpoint_cloudguard-*/*/*.log, /var/log/flb-storage/*
            Path_Key filename
            Parser cri
            DB /var/log/flb_kube.db
            storage.type filesystem
            Mem_Buf_Limit     5MB
            Buffer_Max_Size   1MB
            Skip_Long_Lines   Off
            Refresh_Interval 30
            Alias  app_log_file
        [OUTPUT]
            Name tcp
            Match app_container
            Host 107.162.208.134
            Port 20540
            Format json_lines
            Json_date_key false
            tls On
            tls.verify Off
            Alias sse-ingest
            storage.total_limit_size  500M
  • I used a script to generate dummy logs:
#!/bin/bash

# Define the log file path
log_file="/var/log/test.log"

# Create the log file if it doesn't exist
touch $log_file

echo "Starting to generate logs to $log_file"

# Initialize the counter
log_count=1

# Loop to generate logs
while true; do
  echo "$(date +'%Y-%m-%d %H:%M:%S') - Log entry $log_count: This is log message number $log_count" >> $log_file
  ((log_count++))  # Increment the counter
  sleep 1  # Generate a log entry every second
done
  • After running the dummy log script for a while, I manually changed the TCP host in the output to a non-functional one to simulate an unreachable upstream. This configuration change triggered a pod redeployment. After that, logs were no longer being pushed to the upstream because the connection was unavailable.
  • I observed there were some chunks under the storage path I specified:
s.zhao@gke-dv1-gcp-csse1-us-n2s8-application-5a90eb1f-3slt /var/log/flb-storage/tail.1 $ ls
1-1725392991.728518370.flb  1-1725392993.737605434.flb  1-1725392995.746954773.flb  1-1725392997.756039616.flb  1-1725392999.765454780.flb  1-1725393001.775012872.flb
1-1725392992.732981341.flb  1-1725392994.742255505.flb  1-1725392996.751538449.flb  1-1725392998.760521464.flb  1-1725393000.770326169.flb
  • Then I updated the host back to the correct one and the pods restarted. From my upstream (ELK), it looks like I lost the logs that were generated while the connection was unavailable (see the output sketch after this list).
[Kibana screenshot] In Kibana, the log entries jump from 215 to 363; entries 216 - 362 are missing.
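
For reference, here is a hedged variant of the [OUTPUT] section I plan to test next. Retry_Limit is a documented output property (no_limits removes the retry cap); that the default retry limit is what causes queued chunks to be discarded here is my assumption, not something I have confirmed.

        [OUTPUT]
            Name tcp
            Match app_container
            Host 107.162.208.134
            Port 20540
            Format json_lines
            Json_date_key false
            tls On
            tls.verify Off
            Alias sse-ingest
            storage.total_limit_size 500M
            # Assumption: keep retrying queued chunks instead of discarding
            # them after the default number of failed retries
            Retry_Limit no_limits

If the chunks are still on disk after the pod comes back but are never retried, that would point at the scheduler/retry side rather than the storage layer.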

Your Environment

  • Version used: fluent-bit helm chart version 0.38.0 (the commands below show how I would capture the exact binary version and storage state)
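
A sketch of how I would capture the exact Fluent Bit binary version and the storage-layer state from the running pod. The pod name and namespace are placeholders, and the /api/v1/storage endpoint assumes storage.metrics is enabled in the [SERVICE] section and that 2020 is the configured metricsPort.

# Exact Fluent Bit version inside the pod (binary path as in the official image)
kubectl exec -n logging <fluent-bit-pod> -- /fluent-bit/bin/fluent-bit --version

# Storage-layer metrics (total chunks, filesystem chunks up/down) via the built-in HTTP server
kubectl port-forward -n logging <fluent-bit-pod> 2020:2020 &
curl -s http://127.0.0.1:2020/api/v1/storage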

Additional context
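
One thing I still need to rule out is whether the buffer directory survives the pod redeployment. Below is a hedged values.yaml fragment that mounts the storage path explicitly from the host (extraVolumes / extraVolumeMounts are standard fields in the fluent/fluent-bit chart; the hostPath path and type are assumptions for my setup):

extraVolumes:
  - name: flb-storage
    hostPath:
      path: /var/log/flb-storage/
      type: DirectoryOrCreate
extraVolumeMounts:
  - name: flb-storage
    mountPath: /var/log/flb-storage/

If /var/log is already mounted from the host by the default DaemonSet volumes, this should be a no-op, but it makes the assumption explicit.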
