I have a piece of Java code that uses Apache Spark to join two DataFrames. The join type depends on a VM argument: -DearlyData=TRUE selects an inner join, and -DearlyData=FALSE selects a leftanti join. (Technically, TRUE selects the inner join and any other value selects the leftanti join.)
This is a simplified version of my code:
```java
String earlyData = System.getProperty(Constants.EARLY_DATA);
if (earlyData.equalsIgnoreCase("TRUE")) {
    // Early data: keep only the rows that have a match on both key columns
    log.trace("Running Early Data");
    DataBo.processData(earlyDF.join(cassandraDF,
            earlyDF.col(AA).equalTo(example.col(BB))
                    .and(earlyDF.col(CC).equalTo(example.col(DD))), "inner")
            .drop(Constants.AA, Constants.CC));
} else {
    // Late data: keep only the rows of earlyDF that have no match
    log.trace("Running Late Data");
    DataBo.processData(earlyDF.join(cassandraDF,
            earlyDF.col(AA).equalTo(example.col(BB))
                    .and(earlyDF.col(CC).equalTo(example.col(DD))), "leftanti")
            .drop(Constants.AA, Constants.CC));
}
```
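As an aside, the two branches differ only in the join type string, so the decision can be isolated in one place. The following is a minimal sketch, not my production code: `JoinTypeSelector` and `joinTypeFor` are hypothetical names introduced here for illustration. It also treats a missing property as "not TRUE" instead of throwing a `NullPointerException`, which is an assumption about the desired behavior.

```java
// Hypothetical helper (not part of the original code): maps the flag value
// to the Spark join type string used in the two branches above.
final class JoinTypeSelector {
    private JoinTypeSelector() {}

    /**
     * Returns "inner" when the flag is "TRUE" (case-insensitive) and
     * "leftanti" for any other value, including null (flag not set).
     */
    static String joinTypeFor(String earlyDataFlag) {
        // Calling equalsIgnoreCase on the literal is null-safe:
        // a null argument simply yields false.
        return "TRUE".equalsIgnoreCase(earlyDataFlag) ? "inner" : "leftanti";
    }
}
```

With something like this, the join could be written once and passed `JoinTypeSelector.joinTypeFor(System.getProperty(Constants.EARLY_DATA))` as its third argument.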
My code works, but my question is this:
- Should I use an environment variable or a VM argument for the String `earlyData`?
- Are there drawbacks or unforeseen complications of using one versus the other in a conditional like this?
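To make the comparison concrete, here is a minimal sketch of the two lookups side by side. A VM argument such as -DearlyData=TRUE is read with `System.getProperty`, while an environment variable is read with `System.getenv`. Both return `null` when the name is not set. The class name `EarlyDataFlag`, the environment variable name `EARLY_DATA`, and the "property first, then environment" precedence are all assumptions made for illustration, not something my current code does.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: resolves the flag from a system property first and
// falls back to an environment variable. Names and precedence are assumptions.
final class EarlyDataFlag {
    private EarlyDataFlag() {}

    /** Looks the flag up in the given sources; returns null if neither has it. */
    static String resolve(Properties props, Map<String, String> env,
                          String propertyKey, String envKey) {
        String fromProperty = props.getProperty(propertyKey);
        if (fromProperty != null) {
            return fromProperty; // set via -D<propertyKey>=... or System.setProperty
        }
        return env.get(envKey); // inherited from the process that launched the JVM
    }

    /** Convenience overload that reads from the real JVM and process environment. */
    static String resolve() {
        return resolve(System.getProperties(), System.getenv(), "earlyData", "EARLY_DATA");
    }
}
```

Passing the sources in as parameters keeps the lookup testable without touching the real environment, since the map returned by `System.getenv()` cannot be modified from inside the JVM.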