Fluent-bit crashes when handling envoy empty spans #9391

Open
carlos4ndre opened this issue Sep 16, 2024 · 0 comments
carlos4ndre commented Sep 16, 2024

Bug Report

Describe the bug

I’m having an issue using fluent-bit (3.1.7) when processing spans sent by Envoy.

The flow is like this: Envoy -> fluent-bit -> OTel collector

From what I can see in the logs, Envoy sends periodic data (every 10s) containing empty spans, probably some sort of health check, and this causes fluent-bit to crash.

I'm not sure whether this is a known issue. Envoy is configured to send traces over gRPC.

To Reproduce

  1. Create config files for fluent-bit, envoy and OTel collector services

fluent-bit.conf

[SERVICE]
  Flush        1
  Log_Level    debug
  Daemon       off

[INPUT]
  Name       opentelemetry
  Listen     0.0.0.0
  Port       4318
  Tag        otel

#[INPUT]
#  Name  event_type
#  Type  traces
#  Tag   otel

[OUTPUT]
  Name  stdout
  Match otel

[OUTPUT]
  Name       opentelemetry
  Match      otel
  Host       otel-collector
  Port       4318

otel-collector.yaml

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [logging]

envoy.yaml

admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901

static_resources:
  listeners:
  - address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    traffic_direction: OUTBOUND
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          tracing:
            spawn_upstream_span: true
            verbose: false
            provider:
              name: envoy.tracers.opentelemetry
              typed_config:
                "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
                grpc_service:
                  envoy_grpc:
                    cluster_name: fluentbit_agent
                  timeout: 2s
                service_name: front-envoy
            client_sampling:
              value: 100
            random_sampling:
              value: 100
            overall_sampling:
              value: 100
          codec_type: AUTO
          stat_prefix: ingress_http
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: proxy_routes
            virtual_hosts:
            - name: proxy
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/echo"
                direct_response:
                  status: 200
                  body:
                    inline_string: "OK"
          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: "/dev/stdout"
              log_format:
                json_format:
                  traceparent: "%REQ(TRACEPARENT)%"
                  tracestate: "%REQ(TRACESTATE)%"
  clusters:
  - name: fluentbit_agent
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: fluentbit_agent
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: fluent-bit
                port_value: 4318

  2. Create a docker-compose.yaml file
services:
  envoy:
    image: envoyproxy/envoy:distroless-v1.31.1
    volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
    ports:
    - "10000:10000"
    - "9901:9901"

  fluent-bit:
    image: fluent/fluent-bit:3.1.7
    container_name: fluent-bit
    volumes:
    - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    ports:
    - "4318:4318"

  otel-collector:
    image: otel/opentelemetry-collector:0.109.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
    - ./otel-collector.yaml:/etc/otel-collector-config.yaml

  3. Run the following command to bring everything up:
~/demo $ tree
.
├── docker-compose.yaml
├── envoy.yaml
├── fluent-bit.conf
└── otel-collector.yaml

0 directories, 4 files


$ docker compose up --build -d
[+] Running 3/3
 ✔ Container demo-otel-collector-1  Started                                                                                                            0.2s
 ✔ Container demo-envoy-1           Started                                                                                                            0.2s
 ✔ Container fluent-bit             Started

  4. Fluent-bit crashes when it receives the payload sent by Envoy:
$ docker ps -a
CONTAINER ID   IMAGE                                  COMMAND                  CREATED              STATUS                            PORTS                                                                   NAMES
c61911c888d6   otel/opentelemetry-collector:0.109.0   "/otelcol --config=/…"   About a minute ago   Up About a minute                 4317/tcp, 0.0.0.0:8888->8888/tcp, 55678/tcp, 0.0.0.0:55679->55679/tcp   demo-otel-collector-1
5154ff9bc29b   envoyproxy/envoy:distroless-v1.31.1    "/usr/local/bin/envo…"   About a minute ago   Up About a minute                 0.0.0.0:9901->9901/tcp, 0.0.0.0:10000->10000/tcp                        demo-envoy-1
b82479cb74c9   fluent/fluent-bit:3.1.7                "/fluent-bit/bin/flu…"   About a minute ago   Exited (133) About a minute ago                                                                           fluent-bit

$ docker logs b82479cb74c9
Fluent Bit v3.1.7
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

______ _                  _    ______ _ _           _____  __
|  ___| |                | |   | ___ (_) |         |____ |/  |
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`| |
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \ | |
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /_| |_
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)___/

[2024/09/16 18:37:12] [ info] Configuration:
[2024/09/16 18:37:12] [ info]  flush time     | 1.000000 seconds
[2024/09/16 18:37:12] [ info]  grace          | 5 seconds
[2024/09/16 18:37:12] [ info]  daemon         | 0
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  inputs:
[2024/09/16 18:37:12] [ info]      opentelemetry
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  filters:
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  outputs:
[2024/09/16 18:37:12] [ info]      stdout.0
[2024/09/16 18:37:12] [ info]      opentelemetry.1
[2024/09/16 18:37:12] [ info] ___________
[2024/09/16 18:37:12] [ info]  collectors:
[2024/09/16 18:37:12] [ info] [fluent bit] version=3.1.7, commit=c6e902a43a, pid=1
[2024/09/16 18:37:12] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2024/09/16 18:37:12] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/09/16 18:37:12] [ info] [cmetrics] version=0.9.5
[2024/09/16 18:37:12] [ info] [ctraces ] version=0.5.5
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] initializing
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
[2024/09/16 18:37:12] [debug] [opentelemetry:opentelemetry.0] created event channels: read=21 write=22
[2024/09/16 18:37:12] [debug] [downstream] listening on 0.0.0.0:4318
[2024/09/16 18:37:12] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
[2024/09/16 18:37:12] [debug] [stdout:stdout.0] created event channels: read=24 write=25
[2024/09/16 18:37:12] [debug] [opentelemetry:opentelemetry.1] created event channels: read=31 write=32
[2024/09/16 18:37:12] [debug] [router] match rule opentelemetry.0:stdout.0
[2024/09/16 18:37:12] [debug] [router] match rule opentelemetry.0:opentelemetry.1
[2024/09/16 18:37:12] [ info] [sp] stream processor started
[2024/09/16 18:37:12] [ info] [output:stdout:stdout.0] worker #0 started
[2024/09/16 18:37:17] [debug] [task] created task=0xffffa68396e0 id=0 OK
[2024/09/16 18:37:17] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2024/09/16 18:37:17] [engine] caught signal (SIGSEGV)
...
|-------------------- RESOURCE SPAN --------------------|
  resource:
     - attributes:
            - service.name: 'front-envoy'
     - dropped_attributes_count: 0
  schema_url:
  [scope_span]
    schema_url:
    [spans]
[2024/09/16 18:37:17] [debug] [output:opentelemetry:opentelemetry.1] ctraces msgpack size: 1562
[2024/09/16 18:37:17] [debug] [output:stdout:stdout.0] ctr decode msgpack returned : 6
[2024/09/16 18:37:17] [debug] [out flush] cb_destroy coro_id=0
#0  0xaaaac47bcc03      in  process_traces() at plugins/out_opentelemetry/opentelemetry.c:389
#1  0xaaaac47bcc03      in  cb_opentelemetry_flush() at plugins/out_opentelemetry/opentelemetry.c:485
#2  0xaaaac4b765a7      in  co_switch() at lib/monkey/deps/flb_libco/aarch64.c:133
#3  0xffffffffffffffff  in  ???() at ???:0
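
For what it's worth, the backtrace points at process_traces() in the opentelemetry output plugin, and the decoded payload above is a resource span whose [spans] list is empty. Below is a rough, hypothetical sketch (the struct and helper names are mine, not the actual Fluent Bit / ctraces API) of the kind of "is there anything to encode?" guard I would expect to avoid the crash:

/* Hypothetical sketch -- not the real plugin code. It only illustrates
 * a check that skips a resource span carrying no scope spans and no
 * spans instead of trying to encode/dereference it. */
#include <stdio.h>
#include <stddef.h>

struct scope_span {              /* stand-in for a ctraces scope span    */
    size_t span_count;           /* number of spans in this scope        */
};

struct resource_span {           /* stand-in for a ctraces resource span */
    struct scope_span *scopes;
    size_t scope_count;
};

/* Return 1 if the resource span contains at least one span, else 0. */
static int has_spans(const struct resource_span *rs)
{
    size_t i;

    if (rs == NULL || rs->scopes == NULL || rs->scope_count == 0) {
        return 0;
    }
    for (i = 0; i < rs->scope_count; i++) {
        if (rs->scopes[i].span_count > 0) {
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    /* Mimic the Envoy payload above: a resource span with no spans. */
    struct resource_span empty = { NULL, 0 };

    if (!has_spans(&empty)) {
        printf("no spans: skipping (or dropping) this resource span\n");
    }
    return 0;
}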

  5. Now swap in the event_type input to show that the opentelemetry output works fine with normal span payloads:
#[INPUT]
#  Name       opentelemetry
#  Listen     0.0.0.0
#  Port       4318
#  Tag        otel

[INPUT]
  Name  event_type
  Type  traces
  Tag   otel
...

  6. Send a request to generate a span:
$ curl http://localhost:10000/echo
OK

  7. Check that the span is correctly forwarded from fluent-bit to the otel-collector by looking at their logs:
$ docker ps
$ docker logs <container_id>
...
|-------------------- RESOURCE SPAN --------------------|
  resource:
     - attributes:
            - service.name: 'Fluent Bit Test Service'
     - dropped_attributes_count: 5
  schema_url: https://ctraces/resource_span_schema_url
  [scope_span]
    instrumentation scope:
        - name                    : ctrace
        - version                 : a.b.c
        - dropped_attributes_count: 3
        - attributes: undefined
    schema_url: https://ctraces/scope_span_schema_url
    [spans]
         [span 'main']
             - trace_id                : 526274528d3beab4f98a82f459bdc77c
             - span_id                 : 1442ef1b690cd8b0
             - parent_span_id          : undefined
             - kind                    : 1 (internal)
             - start_time              : 1726512123955380387
             - end_time                : 1726512123955380387
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes:
                 - agent: 'Fluent Bit'
                 - year: 2022
                 - open_source: true
                 - temperature: 25.5
                 - my_array: [
                     'first',
                     2,
                     false,
                     [
                         3.1000000000000001,
                         5.2000000000000002,
                         6.2999999999999998
                     ]
                 ]
                 - my-list:
                     - language: 'c'

             - events:
                 - name: connect to remote server
                     - timestamp               : 1726512123955392720
                     - dropped_attributes_count: 0
                     - attributes:
                         - syscall 1: 'open()'
                         - syscall 2: 'connect()'
                         - syscall 3: 'write()'
             - [links]
         [span 'do-work']
             - trace_id                : 526274528d3beab4f98a82f459bdc77c
             - span_id                 : a58aca2253f4b054
             - parent_span_id          : 1442ef1b690cd8b0
             - kind                    : 3 (client)
             - start_time              : 1726512123955397137
             - end_time                : 1726512123955397137
             - dropped_attributes_count: 0
             - dropped_events_count    : 0
             - status:
                 - code        : 0
             - attributes: none
             - events: none
             - [links]
                 - link:
                     - trace_id             : 41354bad670f86e7a9fe6077a7ae3a4c
                     - span_id              : 820d8bab3a51c548
                     - trace_state          : aaabbbccc
                     - dropped_events_count : 2
                     - attributes           : none
...

Expected behaviour

Fluent-bit should not crash when it receives empty spans from Envoy, or it should provide a way to filter them out.

Your Environment

  • Version used:
    • fluent-bit: 3.1.7
    • envoy: v1.31.1
    • otel-collector: 0.109.0
  • Environment name and version: Docker (27.0.3)
  • Operating System and version: macOS (14.6.1)
  • Filters and plugins: No additional