Some tips about using Apache Flume

Question1: The Flume process reports “Expected timestamp in the Flume event headers, but it was null”
Solution1: The Flume process expects every event to carry a timestamp header, but these events do not have one. When sending plain text events to Flume, we need to tell it to generate a timestamp for each event itself, using a setting in the agent configuration.
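As a minimal sketch of such a setting: the HDFS sink has a “hdfs.useLocalTimeStamp” property that makes it use the agent's local time instead of the event header, and a timestamp interceptor on the source is a common alternative. The agent name a1 and sink name k2 follow the naming used in Question2; the source name r1 and interceptor name i1 are hypothetical, and which of the two options the original setup used is an assumption.

```properties
# Option A: let the HDFS sink use the agent's local time when it expands
# escape sequences such as %Y-%m-%d in hdfs.path.
a1.sinks.k2.hdfs.useLocalTimeStamp = true

# Option B (alternative): stamp every event at the source with a
# timestamp interceptor. "r1" and "i1" are hypothetical names.
# a1.sources.r1.interceptors = i1
# a1.sources.r1.interceptors.i1.type = timestamp
```

Option A is the smaller change when only the HDFS sink needs the timestamp. Option B adds the header to the event itself, so any later component can read it.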

Question2: The HDFS sink generates a huge number of small files at high frequency even though we have set “a1.sinks.k2.hdfs.rollInterval=600”
Solution2: We also need to set “rollCount” and “rollSize”, because Flume rolls the file as soon as any one of the “rollInterval”, “rollCount”, or “rollSize” conditions is met. Left at their defaults, the count and size limits are reached long before the 600-second interval.
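A minimal sketch of the roll settings, assuming the goal is to roll on time only. The values of 0 are an assumption rather than the original configuration; in the HDFS sink, 0 switches the corresponding trigger off.

```properties
# Roll HDFS files on time only: every 600 seconds.
a1.sinks.k2.hdfs.rollInterval = 600
# 0 disables rolling by event count (default: 10 events).
a1.sinks.k2.hdfs.rollCount = 0
# 0 disables rolling by file size (default: 1024 bytes).
a1.sinks.k2.hdfs.rollSize = 0
```

To cap the file size as well, “rollSize” can be set to a byte count (for example 134217728 for 128 MB) instead of 0.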

Question3: The Flume process exits and reports “Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.OutOfMemoryError: GC overhead limit exceeded”
Solution3: Give the JVM a larger heap by adding the line JAVA_OPTS="-Xms12g -Xmx12g" to “/usr/lib/flume-ng/bin/flume-ng” (my server has more than 16 GB of physical memory).

—— My configuration file for Flume ——

The startup command for Cloudera Environment:
