I'm afraid I can't paste a reproducible example, as my dataset is sensitive.
After some issues with our source files, we realised that the allele coding in our source file is inconsistent and needs to be altered. The first step is dropping the redundant column value (sometimes it's REF, sometimes ALT1); the third value, A1, is always used. All three are characters, and POSITION is a string.
Given the number of rows involved I've tried to set up a loop as follows:
- Go to next row
- Concatenate new identifier using A1 and whichever of REF and ALT1 does not equal A1
Looks simple enough in theory, but it just won't behave; on inspection it appears to correctly catch the first instance of the first line but not the others.
Is there a glaring mistake I've made somewhere? Thanks.
# NOTE: reversed in order to match mapping file formatting (equiv. to REF_ALT)
for (i in 1:nrow(Chr1_results.dt)) {
  if (Chr1_results.dt[i,]$A1 != Chr1_results.dt[i,]$ALT1) {
    Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$ALT1, sep = "_")
  } else {
    Chr1_results.dt[i,]$POSITION <- paste(Chr1_results.dt[i,]$ID, Chr1_results.dt[i,]$A1, Chr1_results.dt[i,]$REF, sep = "_")
  }
}
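For comparison, here is a minimal sketch of the same rule applied to the whole column at once instead of row by row. It is not the real data: the table `toy` and every value in it (including the format of ID) are made up for illustration, and only the column names come from the loop above.

```r
# Toy stand-in for Chr1_results.dt; all values are invented for illustration.
toy <- data.frame(
  ID       = c("1:1000", "1:2000", "1:3000"),
  REF      = c("A", "G", "T"),
  ALT1     = c("C", "T", "G"),
  A1       = c("A", "T", "G"),
  POSITION = c("1000", "2000", "3000"),
  stringsAsFactors = FALSE
)

# For every row, pick whichever of REF / ALT1 is not A1.
# This mirrors the if/else in the loop: use ALT1 when it differs from A1,
# otherwise fall back to REF.
other_allele <- ifelse(toy$A1 != toy$ALT1, toy$ALT1, toy$REF)

# Build the new identifier for all rows in one vectorised paste().
toy$POSITION <- paste(toy$ID, toy$A1, other_allele, sep = "_")

print(toy$POSITION)
```

If the table is a data.table, the same expression should also work in place via `:=`, e.g. `dt[, POSITION := paste(ID, A1, ifelse(A1 != ALT1, ALT1, REF), sep = "_")]`, which avoids the per-row subsetting entirely.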