Question1: Flume process reports “Expected timestamp in the Flume event headers, but it was null”
Solution1: The HDFS sink expects every event to carry a timestamp header — it needs one to resolve the %Y-%m-%d escapes in the output path — but plain text events don’t have one. To make Flume generate a timestamp for each event by itself, put the line below into the configuration:

a1.sinks.k1.hdfs.useLocalTimeStamp=true
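
Alternatively, if you would rather stamp events as they enter the channel instead of at the sink, Flume’s built-in timestamp interceptor does the same job. A sketch against source r1 (the interceptor name “i1” is just an example):

a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=timestamp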

Question2: HDFS Sink generates a flood of tiny files at high frequency even though we have set “a1.sinks.k2.hdfs.rollInterval=600”
Solution2: We also need to set “rollCount” and “rollSize” to 0, because Flume rolls the file as soon as any one of the “rollInterval”, “rollCount”, or “rollSize” conditions is met — and the defaults (10 events, 1024 bytes) trigger almost immediately:

a1.sinks.k1.hdfs.rollInterval=600
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollSize=0
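
Conversely, if size-based rolling fits your workload better, zero out the other two triggers instead. The 128 MB value below is only an illustration — pick something aligned with your HDFS block size:

a1.sinks.k1.hdfs.rollSize=134217728
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollCount=0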

Question3: Flume process exits and reports “Exception in thread ‘SinkRunner-PollingRunner-DefaultSinkProcessor’ java.lang.OutOfMemoryError: GC overhead limit exceeded”
Solution3: Give the JVM a bigger heap by adding JAVA_OPTS="-Xms12g -Xmx12g" into “/usr/lib/flume-ng/bin/flume-ng” (my server has more than 16 GB of physical memory).
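
For reference, the line added near the top of the flume-ng script looks like this — the 12g sizes are what I used on a 16 GB host, so scale them to your own machine:

JAVA_OPTS="-Xms12g -Xmx12g"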

—— My configuration file for Flume ——

a1.sources=r1 r2
a1.sinks=k1 k2
a1.channels=c1 c2

a1.sources.r1.type=netcat
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=44444
a1.sources.r1.channels=c1

a1.sources.r2.type=netcat
a1.sources.r2.bind=0.0.0.0
a1.sources.r2.port=55555
a1.sources.r2.channels=c2

a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/user/realestates/CN/%Y-%m-%d/
a1.sinks.k1.hdfs.filePrefix=re-
a1.sinks.k1.hdfs.rollInterval=600
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 1
a1.sinks.k1.hdfs.roundUnit = hour
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.channel=c1

a1.sinks.k2.type=hdfs
a1.sinks.k2.hdfs.path=/user/realestates/AU/%Y-%m-%d/
a1.sinks.k2.hdfs.filePrefix=re-
a1.sinks.k2.hdfs.rollInterval=600
a1.sinks.k2.hdfs.rollCount=0
a1.sinks.k2.hdfs.rollSize=0
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 1
a1.sinks.k2.hdfs.roundUnit = hour
a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k2.hdfs.useLocalTimeStamp=true
a1.sinks.k2.channel=c2

a1.channels.c1.type=memory
a1.channels.c1.capacity=23456789
a1.channels.c1.transactionCapacity=23456789

a1.channels.c2.type=memory
a1.channels.c2.capacity=23456789
a1.channels.c2.transactionCapacity=23456789
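
One caveat about the memory channels above: anything buffered in them is lost if the agent crashes. If durability matters more than throughput, a file channel is a drop-in alternative — the directories below are assumptions, any path writable by the Flume agent works:

a1.channels.c1.type=file
a1.channels.c1.checkpointDir=/var/flume/checkpoint/c1
a1.channels.c1.dataDirs=/var/flume/data/c1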

The startup command for a Cloudera environment:

sudo -u hdfs flume-ng agent --conf ./ --conf-file example.conf \
     --name a1 -Dflume.root.logger=INFO,console \
     -Dorg.apache.flume.log.rawdata=true