The answer can be found in The ARRL Handbook 2019, Vol. 3, although it is spread across different chapters. In short, SNR is typically calculated relative to the noise floor in a 2500 Hz SSB bandwidth. In particular, this is how WSJT-X arrives at negative SNRs for FT8 and other modes.
The trick is that halving the bandwidth lowers the noise floor by 3 dB, assuming the noise is evenly distributed across the band. When I use the 500 Hz DSP filter in my Yaesu FT-891 I typically see an S2-S3 noise floor on 20m, and S4 with a 2500 Hz bandwidth. Keeping in mind that one S-unit is approximately 6 dB, we get:
>>> from math import log2
>>> round(3*log2(2500/500)/6, 2)
1.16
And we do indeed see 1-2 fewer S-units, as expected.
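The same estimate can be written with the exact ratio 10*log10 instead of the "3 dB per halving" rule of thumb (10*log10(2) is about 3.01, so the two agree closely). This is only a sketch: the function names are made up for illustration, the noise is assumed to be flat across the band, and 6 dB per S-unit is the nominal figure that real S-meters often do not follow.

```python
from math import log10

DB_PER_S_UNIT = 6.0  # nominal value; an assumption, real S-meters vary


def noise_floor_change_db(bw_from_hz: float, bw_to_hz: float) -> float:
    """Change in noise power (dB) when the filter bandwidth changes.

    Assumes the noise power is spread evenly across the band.
    A negative result means the noise floor drops.
    """
    return 10 * log10(bw_to_hz / bw_from_hz)


def db_to_s_units(db: float) -> float:
    """Convert a dB difference to S-units at the nominal 6 dB per unit."""
    return db / DB_PER_S_UNIT


change_db = noise_floor_change_db(2500, 500)
print(f"{change_db:.2f} dB, {db_to_s_units(change_db):.2f} S-units")
# prints: -6.99 dB, -1.16 S-units
```

Going from 2500 Hz to 500 Hz should therefore lower the noise floor by roughly 7 dB, or a bit more than one S-unit.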
Now back to CW. The ARRL Handbook states, citing "The ITU Classification of Emission Standards", that the bandwidth of a CW signal can be estimated as:
BW = WPM * 0.8 * 5
For instance, a 17 WPM signal has a bandwidth of about 68 Hz. This means that the signal can be received:
>>> 3*log2(2500/68)
15.600748614897329
... 15.6 dB "under the noise floor".
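To see how this figure changes with keying speed, the two steps (bandwidth estimate, then the ratio to the 2500 Hz reference) can be combined. Again this is just a sketch under the same assumptions: flat noise, a receive filter exactly as wide as the signal, and function names invented for this example. It uses the exact 10*log10 ratio, so the numbers come out slightly higher than with the 3 dB per halving rule.

```python
from math import log10

REFERENCE_BW_HZ = 2500.0  # SSB reference bandwidth used for the SNR figure


def cw_bandwidth_hz(wpm: float, k: float = 5.0) -> float:
    """Estimated CW bandwidth, BW = WPM * 0.8 * K, with K = 5 as in the text."""
    return wpm * 0.8 * k


def narrowband_gain_db(signal_bw_hz: float,
                       reference_bw_hz: float = REFERENCE_BW_HZ) -> float:
    """How far below the reference-bandwidth noise floor a signal can sit
    while still matching the noise power inside a filter of signal_bw_hz."""
    return 10 * log10(reference_bw_hz / signal_bw_hz)


for wpm in (12, 17, 25, 35):
    bw = cw_bandwidth_hz(wpm)
    print(f"{wpm:2d} WPM: {bw:5.1f} Hz, {narrowband_gain_db(bw):4.1f} dB")
```

For 17 WPM this gives about 15.7 dB, close to the 15.6 dB above. Slower keying means a narrower signal and a larger margin; faster keying reduces it.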
Please note,
1) This assumes that there is a 68 Hz bandpass filter on the receiver side. This is in fact a reasonable assumption in the era of DSP filters. For instance, the FT-891 has a very narrow audio peak filter (APF).
2) When you compare the decoding thresholds of different modes (e.g. FT8 and CW) there are many other factors to consider. In the case of FT8: a) it's not a "chat mode" like CW; b) FT8 frequencies are often crowded (read: QRM), so the decoding threshold doesn't matter that much; c) sadly, many people who use digital modes don't check the IMD of their sound card and transceiver combo, which produces even more QRM; etc.