Contrary to popular belief (or at least to what most threads about ems here say), an em is not based on the width of a single 'M'.
That was its original meaning in typography, but in digital media, including Android, the meaning has shifted to the size of the typeface in use, in other words its height (excluding any padding for accents/diacritics).
So when you specify ems for a TextView, it uses its textSize as the base and multiplies it by the number of ems specified.
For example, if you set the ems of a 16sp TextView to 4, it will be 64sp wide. You can easily test this by placing two TextViews (with includeFontPadding set to false) side by side inside a ConstraintLayout (to leverage its layout_constraintDimensionRatio).
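
The arithmetic can be sketched in plain Java. This is only an illustration of the rule described in this answer, not Android's actual measurement code: `EmsWidth` and `widthFromEms` are hypothetical names, and the sketch assumes that one em equals the textSize. Widths measured on a real device may differ somewhat depending on the font's metrics.

```java
// Hypothetical helper illustrating the rule described in this answer:
// width = textSize * ems. Not part of the Android SDK.
final class EmsWidth {

    private EmsWidth() {}

    /**
     * Returns the width, in the same unit as textSize (e.g. sp),
     * assuming one em equals the text size.
     */
    static float widthFromEms(float textSize, int ems) {
        if (ems < 0) {
            throw new IllegalArgumentException("ems must not be negative");
        }
        return textSize * ems;
    }

    public static void main(String[] args) {
        // A 16sp TextView with ems set to 4.
        System.out.println(widthFromEms(16f, 4)); // prints 64.0
    }
}
```

The same relation is what the side-by-side test checks visually: the TextView with ems set to 4 should be four times as wide as a square whose side equals the text height.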