
How Fluentd Works: The Unified Logging Layer

Logs are messy. Fluentd cleans them up. Learn how this open-source data collector unifies logging from multiple sources.

Abstract Algorithms · 4 min read

TLDR: Fluentd is an open-source data collector that decouples log sources from destinations. It ingests logs from 100+ sources (Nginx, Docker, syslog), normalizes them to JSON, applies filters and transformations, and routes them to 100+ outputs (Elasticsearch, S3, Kafka). Tag-based routing is the core concept.


📖 A Thousand Services, One Logging Chaos

Before unified logging, a typical microservices stack looks like this:

  • Nginx writes to /var/log/nginx/access.log
  • Java app writes to Log4j rotation files
  • Kubernetes pods write to stdout
  • Database writes to /var/lib/postgresql/log/

Each destination (Splunk, Elasticsearch, S3) requires custom scripts per source. Ten services × three destinations = 30 custom scripts, each with its own error handling and retry logic.

Fluentd solves this with a unified layer: any input goes through Fluentd, gets normalized to JSON, and routes to any output using a single config.


🔢 Tags and Routing: Fluentd's Core Concept

Every event in Fluentd has a tag: a dot-separated string that determines where the event is routed.

<source>
  @type tail
  path /var/log/nginx/access.log
  tag web.nginx
  format nginx
</source>

<source>
  @type tail
  path /var/log/app/app.log
  tag app.backend
  format json
</source>

<match web.**>
  @type elasticsearch
  host elastic.local
  port 9200
  index_name nginx-logs
</match>

<match app.**>
  @type s3
  s3_bucket my-log-archive
  path logs/%Y/%m/%d/
</match>
  • web.nginx events match <match web.**> → go to Elasticsearch
  • app.backend events match <match app.**> → go to S3

flowchart TD
    Nginx[Nginx access.log\ntag: web.nginx] --> Fluentd
    App[App log\ntag: app.backend] --> Fluentd
    Sys[Syslog\ntag: system.kernel] --> Fluentd

    Fluentd -->|match web.**| ES[Elasticsearch]
    Fluentd -->|match app.**| S3[Amazon S3]
    Fluentd -->|match system.**| Kafka[Kafka topic]
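
The tag-matching rules above can be sketched in a few lines. This is a rough approximation of the semantics (not Fluentd's actual implementation): `*` matches exactly one dot-separated part, while `a.**` matches `a`, `a.b`, `a.b.c`, and so on.

```python
import re

def tag_matches(pattern: str, tag: str) -> bool:
    """Approximate Fluentd match-pattern semantics (illustrative sketch only).
    '*' matches exactly one tag part; 'a.**' matches 'a' plus any descendants."""
    regex = re.escape(pattern)
    regex = regex.replace(r"\.\*\*", r"(\..+)?")  # 'a.**' -> a, a.b, a.b.c ...
    regex = regex.replace(r"\*\*", r".*")         # bare '**' matches everything
    regex = regex.replace(r"\*", r"[^.]+")        # '*' matches one part, no dots
    return re.fullmatch(regex, tag) is not None

tag_matches("web.**", "web.nginx")   # caught by <match web.**>
tag_matches("app.**", "web.nginx")   # not caught by <match app.**>
```

Note that patterns are evaluated top to bottom in the config file: the first matching `<match>` wins, so broader patterns like `**` belong at the bottom.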

⚙️ The Plugin Architecture: Input → Filter → Buffer → Output

Fluentd's power comes from its plugin model:

| Plugin type | Role | Examples |
| --- | --- | --- |
| Input | Collect events from sources | tail, http, forward, syslog, docker |
| Parser | Parse raw text into structured JSON | nginx, apache2, json, regexp, csv |
| Filter | Transform, enrich, or drop events | record_transformer, grep, geoip |
| Buffer | Batch and retry on output failures | file, memory |
| Output | Send events to destinations | elasticsearch, s3, kafka, stdout |
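
To make the pipeline concrete, here is a toy sketch where each plugin type is just a plain function wired input → parser → filter → output. This is illustrative only, not Fluentd's real plugin API.

```python
import json

def input_tail(lines):
    """Input stage: yield raw log lines as they arrive."""
    yield from lines

def parser_json(raw):
    """Parser stage: turn raw text into a structured record."""
    return json.loads(raw)

def filter_grep(record):
    """Filter stage: drop events we don't want downstream."""
    return None if record.get("level") == "debug" else record

def output_stdout(record):
    """Output stage: ship the event (here, just print it)."""
    print(record)

for line in input_tail(['{"level": "info"}', '{"level": "debug"}']):
    record = filter_grep(parser_json(line))
    if record is not None:  # filters may drop events entirely
        output_stdout(record)
```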

Buffer plugins are critical for reliability. Without buffering, a downstream outage (e.g., Elasticsearch restart) causes log loss. With a file buffer:

  • Events are written to disk first.
  • Flushed to the output on schedule (or when the buffer fills).
  • Retried automatically on failure with exponential backoff.
<match **>
  @type elasticsearch
  host elastic.local

  <buffer>
    @type file
    path /var/log/fluentd-buffer
    flush_interval 5s
    retry_max_times 10
  </buffer>
</match>
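
The retry behavior can be sketched as a loop with exponentially growing waits. This is an illustrative approximation only; the real buffer implementation also caps and randomizes the wait, and persists chunks to disk between attempts.

```python
import time

def flush_with_retry(flush, retry_max_times=10, base_wait=1.0):
    """Sketch of exponential-backoff retries on a buffer flush (illustrative)."""
    for attempt in range(retry_max_times + 1):
        try:
            return flush()  # e.g. send a buffered chunk to Elasticsearch
        except Exception:
            if attempt == retry_max_times:
                raise  # retries exhausted; the chunk would be dropped or rerouted
            time.sleep(base_wait * (2 ** attempt))  # wait 1s, 2s, 4s, ...
```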

🧠 Filter: Enriching and Transforming Events

Filters run in the pipeline between input and output:

<filter web.nginx>
  @type record_transformer
  enable_ruby true
  <record>
    environment "production"
    hostname "#{Socket.gethostname}"
    log_level ${record["status"].to_i >= 500 ? "ERROR" : "INFO"}
  </record>
</filter>

This adds environment, hostname, and a derived log_level to every Nginx event before sending to Elasticsearch.
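
In plain Python, the per-event computation of that filter looks roughly like this (the function name is ours, not Fluentd's):

```python
import socket

def enrich(record, hostname=None):
    """Sketch of what the record_transformer filter above computes per event."""
    out = dict(record)
    out["environment"] = "production"
    out["hostname"] = hostname or socket.gethostname()
    # Derive log_level from the HTTP status code, as in the filter's Ruby expression
    out["log_level"] = "ERROR" if int(out.get("status", 0)) >= 500 else "INFO"
    return out
```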


🌐 Fluentd vs Logstash vs Fluent Bit

| | Fluentd | Logstash | Fluent Bit |
| --- | --- | --- | --- |
| Language | Ruby + C | Java | C |
| Memory footprint | ~60 MB | ~500 MB+ | ~1 MB |
| Plugin ecosystem | 700+ plugins | 200+ plugins | 70+ plugins |
| Best for | Central aggregation server | Elasticsearch pipelines | Edge / container collection |
| Kubernetes pattern | Deploy as DaemonSet or aggregator | Sidecar or aggregator | DaemonSet (forward to Fluentd) |

The common production pattern: Fluent Bit as a lightweight DaemonSet on every node, forwarding to a central Fluentd aggregation layer, then to Elasticsearch/S3.
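
A minimal sketch of that pattern, with placeholder hostnames: each node's Fluent Bit forwards everything to the aggregator over the forward protocol (default port 24224), and the Fluentd aggregator accepts it and routes onward.

# fluent-bit.conf on each node (edge); Host is a placeholder
[OUTPUT]
    Name   forward
    Match  *
    Host   fluentd-aggregator.internal
    Port   24224

# Fluentd aggregator config
<source>
  @type forward
  port 24224
</source>

<match **>
  @type elasticsearch
  host elastic.local
  port 9200
</match>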


📌 Key Takeaways

  • Fluentd collects logs from any source, normalizes to JSON, and routes to any destination via tag-based matching.
  • Plugin types: Input → Parser → Filter → Buffer → Output.
  • Buffer plugins provide durability: events survive output outages.
  • Compared to Logstash (Java, heavy) and Fluent Bit (C, ultra-light), Fluentd sits in the middle as a reliable aggregation layer.
  • Common pattern: Fluent Bit (edge) → Fluentd (aggregation) → Elasticsearch or Kafka.

🧩 Test Your Understanding

  1. An Nginx log event is tagged web.nginx. Which <match> rule catches it: web.** or app.**?
  2. Elasticsearch goes offline for 10 minutes. Without a buffer plugin, what happens to logs?
  3. What is the difference between a parser plugin and a filter plugin in Fluentd?
  4. Why would you run Fluent Bit on each node instead of Fluentd directly?

Written by Abstract Algorithms (@abstractalgorithms)