Question 1: The Flume process reports "Expected timestamp in the Flume event headers, but it was null"
Solution 1: The HDFS sink expects every event to carry a timestamp header (it needs one to resolve escape sequences such as %Y-%m-%d in the sink path), but plain text events don't have it. To send normal text events to Flume, tell the sink to generate a timestamp from the local time for every event by adding the line below to the configuration:
a1.sinks.k1.hdfs.useLocalTimeStamp=true
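Alternatively, the timestamp can be stamped at the source instead of the sink, using Flume's built-in timestamp interceptor; a minimal sketch for the r1 source (the interceptor name i1 is just a label chosen for this example):

```
# Hypothetical alternative: add a timestamp header to each event as it
# enters the channel, instead of letting the HDFS sink use local time.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
```

This keeps the sink-side escape sequences working even if events are later routed to a sink without useLocalTimeStamp.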
Question 2: The HDFS sink generates a huge number of small files at high frequency, even though we have set "a1.sinks.k2.hdfs.rollInterval=600"
Solution 2: We also need to set "rollCount" and "rollSize" to 0, because Flume rolls the file as soon as any one of the "rollInterval", "rollCount", or "rollSize" conditions is met (rollCount defaults to 10 events and rollSize to 1024 bytes, so files roll almost immediately with the defaults):
a1.sinks.k1.hdfs.rollInterval=600
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollSize=0
Question 3: The Flume process exits with "Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.OutOfMemoryError: GC overhead limit exceeded"
Solution 3: Enlarge the JVM heap by adding JAVA_OPTS="-Xms12g -Xmx12g" (my server has more than 16 GB of physical memory) to "/usr/lib/flume-ng/bin/flume-ng".
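Rather than editing the flume-ng launcher script directly (an upgrade may overwrite it), the same heap settings can go into conf/flume-env.sh, which the launcher sources at startup; a minimal sketch, assuming a 12 GB heap is appropriate for your host:

```
# conf/flume-env.sh -- sourced by the flume-ng launcher at startup.
# 12g is an example value; leave headroom for the OS on a 16 GB host.
export JAVA_OPTS="-Xms12g -Xmx12g"
```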
—— My configuration file for Flume ——
a1.sources=r1 r2
a1.sinks=k1 k2
a1.channels=c1 c2

a1.sources.r1.type=netcat
a1.sources.r1.bind=0.0.0.0
a1.sources.r1.port=44444
a1.sources.r1.channels=c1

a1.sources.r2.type=netcat
a1.sources.r2.bind=0.0.0.0
a1.sources.r2.port=55555
a1.sources.r2.channels=c2

a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=/user/realestates/CN/%Y-%m-%d/
a1.sinks.k1.hdfs.filePrefix=re-
a1.sinks.k1.hdfs.rollInterval=600
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.round=true
a1.sinks.k1.hdfs.roundValue=1
a1.sinks.k1.hdfs.roundUnit=hour
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.channel=c1

a1.sinks.k2.type=hdfs
a1.sinks.k2.hdfs.path=/user/realestates/AU/%Y-%m-%d/
a1.sinks.k2.hdfs.filePrefix=re-
a1.sinks.k2.hdfs.rollInterval=600
a1.sinks.k2.hdfs.rollCount=0
a1.sinks.k2.hdfs.rollSize=0
a1.sinks.k2.hdfs.round=true
a1.sinks.k2.hdfs.roundValue=1
a1.sinks.k2.hdfs.roundUnit=hour
a1.sinks.k2.hdfs.writeFormat=Text
a1.sinks.k2.hdfs.useLocalTimeStamp=true
a1.sinks.k2.channel=c2

a1.channels.c1.type=memory
a1.channels.c1.capacity=23456789
a1.channels.c1.transactionCapacity=23456789
a1.channels.c2.type=memory
a1.channels.c2.capacity=23456789
a1.channels.c2.transactionCapacity=23456789
The startup command for Cloudera Environment:
sudo -u hdfs flume-ng agent --conf ./ --conf-file example.conf \
  --name a1 -Dflume.root.logger=INFO,console \
  -Dorg.apache.flume.log.rawdata=true
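Once the agent is up, the two netcat sources can be smoke-tested from another shell; each line sent becomes one Flume event (localhost and the ports are assumed from the configuration above):

```
# Send one test line to each netcat source (assumes the agent is running locally).
echo "CN test event" | nc localhost 44444
echo "AU test event" | nc localhost 55555
```

The k1 sink should then write the first event under /user/realestates/CN/ and the k2 sink the second under /user/realestates/AU/, once a file rolls.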