Tuesday 15 April 2014

scala - Spark Standalone Mode: Change replication factor of HDFS output


In my hdfs-site.xml, a replication factor of 1 is configured.

However, when I write my results to HDFS:

  someMap.saveAsTextFile("hdfs://HOST:PORT/out")

the result is replicated with a factor of 3, overriding my configured replication factor. To save some space, I would like a replication factor of 1 for my output as well.

How can I tell HDFS to use a replication factor of 1?

I think Spark is loading a default Hadoop configuration in which dfs.replication is set to 3. To override this, you need to set a system property, similar to other Spark configuration options:

  System.setProperty("spark.hadoop.dfs.replication", "1")

or at JVM startup:

  -Dspark.hadoop.dfs.replication=1

Hopefully something like that should work...
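Putting the pieces together, a minimal sketch might look like the following. The application name and sample data are placeholders for illustration, and setting the key directly on sc.hadoopConfiguration is an alternative assumption, not something stated in the answer above:

  // Minimal sketch: set dfs.replication to 1 before creating the SparkContext
  // so it is picked up by the Hadoop configuration Spark builds.
  import org.apache.spark.{SparkConf, SparkContext}

  object ReplicationExample {
    def main(args: Array[String]): Unit = {
      // Propagated to the underlying Hadoop Configuration via the
      // spark.hadoop.* prefix.
      System.setProperty("spark.hadoop.dfs.replication", "1")

      val sc = new SparkContext(new SparkConf().setAppName("replication-example"))

      // Alternative (assumption): set the key directly on the Hadoop
      // configuration used by this context.
      sc.hadoopConfiguration.set("dfs.replication", "1")

      val someMap = sc.parallelize(Seq("a", "b", "c"))
      someMap.saveAsTextFile("hdfs://HOST:PORT/out")

      sc.stop()
    }
  }

The key point is that the property must be in place before the SparkContext is constructed, since that is when Spark assembles the Hadoop configuration it hands to saveAsTextFile.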
