I am trying to represent the sequence of a biological virus as ATGCs, but I have seen code where it is represented as 1234s instead. Are there any differences in memory usage or code speed if we store it as the integers [1,2,3,4] instead of the letters [A,T,G,C]?
For those who might need a bit more context, I will not be doing any mathematical operations on the string of numbers/letters apart from changing their identities at random positions (i.e. mutation), keeping track of the positions that are mutated away from a reference sequence in a dictionary (such as: {2:'G', 52:'A'} or {2:3, 52:1}), and exporting the full sequence of any biological virus strain by iterating over the reference sequence and checking the mutation dictionary for any mutations.