After I ran spark-submit on my YARN cluster with Spark 1.6.2:
./bin/spark-submit --class TerasortApp \
--master yarn \
--deploy-mode cluster \
--driver-memory 4G \
--executor-memory 12G \
--executor-cores 4 \
--num-executors 16 \
--conf spark.yarn.executor.memoryOverhead=4000 \
--conf spark.memory.useLegacyMode=true \
--conf spark.shuffle.memoryFraction=0.6 \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ArrayAllocationWarningSize=2048M" \
--queue spark \
/home/sanbai/myspark/target/scala-2.10/test_2.10-1.0.jar
The job failed, and the log reported:
com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
Serialization trace:
bytes (org.apache.hadoop.io.Text)
    at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
    at com.esotericsoftware.kryo.io.Input.require(Input.java:169)
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:317)
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:297)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
    at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
    at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
    at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
    at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:201)
    at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:198)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
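For context: the trace shows Kryo failing inside a shuffle read while decompressing a block of org.apache.hadoop.io.Text values, so the compressed stream is already corrupt before Kryo gets valid bytes. A Spark 1.6 job that shuffles Text records typically enables Kryo along these lines (a minimal sketch, not taken from my actual TerasortApp source):

import org.apache.hadoop.io.Text
import org.apache.spark.SparkConf

// Illustrative only: enable Kryo and register Hadoop's Text class,
// the type named in "Serialization trace: bytes (org.apache.hadoop.io.Text)".
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[Text]))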
Somebody on the internet said this may be caused by a compatibility problem between Spark 1.6.2 and Snappy. Therefore I added
--conf spark.io.compression.codec=lz4
to my spark-submit shell script to change the compression codec from Snappy to LZ4, and this time everything went fine.
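For reference, the same fix can be baked into the application instead of the submit script. This is a minimal sketch: the spark.io.compression.codec key and the lz4 value are standard Spark 1.6 configuration, but the surrounding skeleton is illustrative, not my real TerasortApp code:

import org.apache.spark.{SparkConf, SparkContext}

object TerasortApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("TerasortApp")
      // Use LZ4 instead of the default Snappy for block/shuffle
      // compression, working around the PARSING_ERROR(2) failure.
      .set("spark.io.compression.codec", "lz4")
    val sc = new SparkContext(conf)
    // ... terasort job body ...
    sc.stop()
  }
}

Values set in SparkConf are overridden by --conf flags on spark-submit, so keeping the setting in the submit script also works and is easier to toggle per run.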