You can choose to compress the output of a MapReduce job in Hadoop. Compression can be configured cluster-wide so that it applies to all jobs, or the properties can be set for a specific job.
Configuration parameters for compressing MapReduce job output
- mapreduce.output.fileoutputformat.compress - Set this property to true if you want to compress the MapReduce job output. Default value is false.
- mapreduce.output.fileoutputformat.compress.type - This property applies only if the MapReduce job output is a SequenceFile. In that case you can specify one of these values for compression: NONE, RECORD or BLOCK. Default is RECORD.
- mapreduce.output.fileoutputformat.compress.codec - The codec to be used for compression. Default is org.apache.hadoop.io.compress.DefaultCodec.
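To see how these three properties fit together, here is a minimal Java sketch that sets them by name on a job's Configuration. The class name OutputCompressionConfig and the choice of Gzip are just for illustration; the same property names go into mapred-site.xml for cluster-wide settings (shown next) or can be set per job through the FileOutputFormat API (shown later).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment - sets the three output compression
// properties by name before the Job is created
public class OutputCompressionConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Turn on compression of the final job output (default is false)
    conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);
    // Only consulted when the output is a SequenceFile
    conf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK");
    // Codec to use; Gzip is chosen here for illustration
    conf.set("mapreduce.output.fileoutputformat.compress.codec",
        "org.apache.hadoop.io.compress.GzipCodec");

    Job job = Job.getInstance(conf, "compressed output example");
    // ... set mapper, reducer, input/output paths as usual ...
  }
}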
Configuring at the cluster level
If you want to compress the output of all MapReduce jobs running on the cluster, you can configure these parameters in mapred-site.xml. For example, to compress the output of MapReduce jobs using the Gzip format, add the following:
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>RECORD</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
Configuring on a per-job basis
If you want to compress the output of a specific MapReduce job, add the following to your job configuration.
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

If the output is a SequenceFile, you can set the compression type too.
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
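Putting it together, the sketch below shows a bare-bones driver that enables Gzip-compressed output through the FileOutputFormat API. The class name CompressedOutputDriver is hypothetical, and the identity Mapper and Reducer base classes are used only to keep the example self-contained; plug in your own mapper and reducer classes in a real job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressedOutputDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "compressed output job");
    job.setJarByClass(CompressedOutputDriver.class);

    // Identity mapper and reducer just to keep the sketch runnable;
    // replace these with your own implementations
    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Compress the final job output with Gzip
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With the Gzip codec the reducer output files get a .gz extension (for example part-r-00000.gz), which is an easy way to confirm that compression was actually applied.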
That's all for this topic How to Compress MapReduce Job Output in Hadoop. If you have any doubts or suggestions, please drop a comment. Thanks!