Detecting signals using statistics

Question

I have sampled a section of spectrum every few Hz. I have mean, standard deviation, and range data for every frequency sampled (and I can gather more statistics if necessary). I've sampled for several seconds before storing the spectrum statistics.

I would like to be able to guess where man-made signals (i.e. channels) exist in this spectrum (as opposed to noise floor). This is proving to be a difficult task, especially since the noise-floor power sometimes changes (different antenna inputs for different bands, for example.)

I doubt anyone has a ready solution for this, but what are some starting points, or tips, to help me begin to identify channels?

In the image below, 'normal' is really the autocorrelation from Pandas.

Rodney Price · Answer 1 · 2017-07-21T21:12:43.207

There is a range of possibilities, depending on what kind of signal you want to find. I'll start from easy and move up to hard. I'm assuming that you are using an FFT to get your spectra.

RFI. An earlier poster referenced some papers on finding RFI. I don't know precisely how that is defined, but let's assume that it is unintentional RF from things like switching power supplies (line spectra) and poorly wired auto engines (impulsive noise). Impulsive noise is narrow in time and wide in frequency, which suggests that you simply have a look at your receiver output in the time domain. There is probably some statistical test you could apply (say, with white Gaussian noise as the null hypothesis). For line spectra, do the same in the frequency domain. Perhaps those who know more about RFI than I do would say otherwise. I would suggest, however, that rather than processing your signal as you're doing, you use the initial processing that I discuss next.
Conventional analog signals. By this I mean the AM, FM, SSB, etc. transmissions that you run across all the time. These live in an intermediate region between impulses and line spectra, which makes things a bit harder. I would suggest that you compute a power spectrum, and from that, the autocorrelation. This is easy if you are already using an FFT: compute the FFT, then multiply each element by its complex conjugate. Now you have the power spectrum. Next do the inverse FFT to get the autocorrelation. If there's a signal there, you'll see a smeared out peak near $\tau=0$, where $\tau$ is the delay. If it's relatively wideband, the peak will be narrower; if it's narrowband (compared to your FFT) the peak will be wider. Look for peaks in both the power spectrum and the autocorrelation. This works better than statistics-gathering the way you describe because it takes advantage of the coherence of the signal. If you use a large FFT, you'll be averaging over a lot of signal, which will suppress the noise and better show you what's there. If it's just noise, you'll see a very narrow peak right at $\tau=0$ and a flat power spectrum.
Conventional digital signals. I mean conventional from a ham radio perspective; modes like PSK31, etc. An autocorrelation/power spectrum approach can work well here, too. The same physical principles apply: noise averages incoherently, while signal averages coherently. There is a caveat, however. As signals get more bandwidth-efficient, they start to look more like white noise. An AFSK signal like those often used in packet radio, for example, is terribly (forgive me, packet radio enthusiasts) inefficient. It takes a lot of bandwidth to get a few bits through. PSK31 is much better. Some of the more modern modes, such as WSPR and relatives, are really good. You'll see a flat spectrum across the entire signal bandwidth. They're not trying to be sneaky. That's just the physics of the situation.
Wideband digital signals. This is a broad category, but includes spread-spectrum signals as well as very highly optimized signals like the ones your cell phone uses. There has been a remarkable amount of R&D put into these signals to make them efficient, and because they have to push a lot of bits across, they are wideband, and so the signal-to-noise ratio in any given frequency bin a few Hz across is going to be poor. They're just plain hard to see, and there are a surprising number of them out there. This is where cyclostationary processing, a form of higher-order statistics, comes into its own. Cyclostationary processing is used to extract the bit rate from an unknown signal. Even if you can't see the signal through second-order statistics (like the autocorrelation) the bit rate of the signal will often pop right out of a cyclostationary approach. The math can be intimidating, but if you're really interested, stick with it and you'll find that it's really not that hard.
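The FFT recipe from the analog-signals item above (FFT, multiply by the complex conjugate, inverse FFT) can be sketched as follows. This is a minimal illustration, not the answerer's own code: the function name, the test tone at 0.05 cycles/sample, and the noise level are all assumptions chosen for the demo. Note that the inverse FFT of the power spectrum gives the circular autocorrelation.

```python
import numpy as np

def power_spectrum_and_autocorr(x):
    """Return (power spectrum, normalized circular autocorrelation) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    X = np.fft.fft(x)
    psd = (X * np.conj(X)).real      # each FFT element times its complex conjugate
    acf = np.fft.ifft(psd).real      # inverse FFT of the power spectrum
    return psd, acf / acf[0]         # normalize so that lag 0 equals 1

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n)
noise = rng.normal(0.0, 1.0, n)          # white Gaussian noise (assumed sigma = 1)
tone = np.cos(2 * np.pi * 0.05 * t)      # hypothetical narrowband signal

_, acf_noise = power_spectrum_and_autocorr(noise)
_, acf_mixed = power_spectrum_and_autocorr(noise + tone)

# Largest autocorrelation magnitude away from zero lag (lags 1..49)
noise_sidelobe = np.max(np.abs(acf_noise[1:50]))
mixed_sidelobe = np.max(np.abs(acf_mixed[1:50]))
print(noise_sidelobe, mixed_sidelobe)
```

For noise alone the autocorrelation collapses to a spike at zero lag, while the added tone leaves clear structure at nonzero lags, which is the effect described above.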

Here are a few examples to illustrate the autocorrelation/power spectrum approach. In the images, I plot the signal first, then the power spectrum, then the autocorrelation. These are simulated signals with sampling interval $\Delta t = 1$.

White Gaussian noise

Flat power spectrum, autocorrelation has a peak at zero lag, flat everywhere else.

Barker-coded pulse

Radars use these to reduce their peak power requirements while maintaining range resolution. The power spectrum has a $\sin^2 \omega/\omega^2$ structure due to the pulse shape. The autocorrelation shows a narrow peak at zero lag. In the wild, these pulses are much shorter than I show here, leading to flatter, wider spectra. They can be difficult to distinguish from noise unless you have some independent way to calculate the SNR in the channel (which is difficult if you don't actually know that there is a signal in the channel). It is best to just look directly in the time domain.
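As a small illustration of the narrow zero-lag peak, here is the autocorrelation of a Barker code computed directly. The answer does not say which code length was simulated; the length-13 code used here is an assumption.

```python
import numpy as np

# Length-13 Barker code (assumed; the original plots do not state the length)
barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])

# Aperiodic autocorrelation over all lags from -12 to +12
acf = np.correlate(barker13, barker13, mode="full")

zero_lag = len(barker13) - 1          # index of lag 0 in the "full" output
peak = acf[zero_lag]
sidelobes = np.delete(acf, zero_lag)  # every lag except zero

print(peak, np.max(np.abs(sidelobes)))
```

The peak equals the code length while every other lag stays at magnitude 1 or below, which is why the pulse compresses so well.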

FM with sinusoidal modulation

Easy to see in both the power spectrum and autocorrelation.

Sum of all three

Here I've set the noise $\sigma = 1/4$, the peak amplitude of the pulse to $2$, and the peak amplitude of the FM signal to $1/4$. The fourth plot is a close-up of the autocorrelation near zero lag.

Raul O. · Answer 2 · 2017-07-20T13:55:45.210

Well, this is a well-studied field (Radio Frequency Interference detection and mitigation). There is a great deal of literature about it.

The noise you receive is theoretically Additive White Gaussian Noise (AWGN). That means that its Probability Density Function (PDF) is Gaussian, and the PDF of the power of its samples is exponential. By setting a false alarm probability and deriving a threshold from the noise power, you can easily detect samples that are potentially not noise. See point 2.2 of this paper, for example. This threshold method is applied in the frequency domain in Frequency Blanking and Spectrogram Blanking techniques.
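A minimal sketch of such a threshold, assuming each power sample is a single, un-averaged periodogram bin of complex Gaussian noise: for exponentially distributed power with mean $\mu$, the probability of exceeding $T$ is $e^{-T/\mu}$, so $T = -\mu \ln P_{fa}$. The function name, the median-based noise-floor estimate, and the injected test signal are assumptions for illustration. If the spectrum has been averaged over several seconds, the bins are no longer exponential and the threshold would need to be derived differently.

```python
import numpy as np

def exceedance_mask(power, pfa=1e-3):
    """Flag bins whose power is unlikely under exponential (noise-only) statistics."""
    power = np.asarray(power, dtype=float)
    # Robust noise-floor estimate: the median of an exponential is mean * ln(2),
    # so a few strong signals barely shift the estimate (an assumed design choice).
    noise_mean = np.median(power) / np.log(2.0)
    threshold = -noise_mean * np.log(pfa)   # P(power > T) = exp(-T / mean) = pfa
    return power > threshold, threshold

rng = np.random.default_rng(1)
n = 8192
# Complex white Gaussian noise with unit variance
noise = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
power = np.abs(np.fft.fft(noise)) ** 2 / n   # exponential, mean close to 1
power[1000] += 50.0                          # hypothetical strong narrowband signal

mask, threshold = exceedance_mask(power, pfa=1e-3)
false_alarm_rate = (mask.sum() - 1) / n      # exclude the injected bin
print(threshold, false_alarm_rate)
```

Because the noise floor is estimated from the data itself, the same code adapts when the floor changes between bands, provided it is applied per band.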

Normality tests are also used to evaluate whether a signal is AWGN. I have read many mentions of the Anderson-Darling test, but I have never worked with it.

You can use kurtosis to check whether a signal is Gaussian (but be careful, because kurtosis is weak against certain sinusoidal signals). If it is not Gaussian, it is likely to be man-made (although some frequency jammers can emit Gaussian signals). Kurtosis has also been used to detect RFI in the frequency domain, not only in the time domain (look for spectral kurtosis).
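A short sketch of the kurtosis check, using the convention in which a Gaussian has kurtosis 3. The answer does not say which sinusoidal signals are problematic; the 50% duty-cycle pulsed sinusoid below is my own illustration of one such case, where the kurtosis works out to $1.5/d = 3$ for duty cycle $d = 0.5$. All signal parameters are assumptions.

```python
import numpy as np

def kurtosis(x):
    """Sample kurtosis (fourth central moment over squared variance); Gaussian gives 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2

rng = np.random.default_rng(2)
n = 100_000
t = np.arange(n)

gauss = rng.normal(size=n)               # Gaussian noise
sine = np.sin(2 * np.pi * 0.01 * t)      # continuous sinusoid
pulsed = sine * (t < n // 2)             # same sinusoid, on for half the record

k_gauss = kurtosis(gauss)
k_sine = kurtosis(sine)
k_pulsed = kurtosis(pulsed)
print(k_gauss, k_sine, k_pulsed)
```

The continuous sinusoid is clearly separated from the Gaussian case, but the half-on pulsed sinusoid lands on the Gaussian value, so a kurtosis test alone would miss it.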

Here is another interesting (and open!) paper about RFI detection.

I am not sure whether there are man-made AWGN-like signals other than jamming noise. The spectra of signals spread with codes (such as GPS signals, which use pseudo-random noise codes) are sincs, and they are flat only over a small fraction of their bandwidth. Even if you sample only this small fraction, I am not sure the signal will look Gaussian in the time domain.

score 1 · Answer 3 · answered Nov 17 '24 at 01:52

Are you trying to detect anything that is not background noise, or are you trying to detect certain man-made signals about which you have some knowledge? The approach will be different. In the former case, your test will be for the presence of background noise only. In the latter, your test will be for the presence of one of however many types of signals you seek.

If the former, the test is simple. Gather enough background noise across frequency bands and times of day, and perhaps across geomagnetic and ionospheric conditions, and build a model for each condition. Then set an appropriate threshold. The test will be fairly weak.

If you can build models for the signals of interest (for example, the approximate bandwidth, modulation envelope spectrum, higher-order moments, etc.), factor those into the model. You also need the background noise model. Then you can set up a generalized likelihood ratio test (GLRT) and set appropriate thresholds. If you are more interested in classification than detection, you can use algorithms designed for classification. (A GLRT is OK for classification but not the best.)
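For reference, a generalized likelihood ratio test is commonly written in the following form. The symbols are assumptions introduced here, not taken from the answer: $x$ is the observed data, $H_0$ the noise-only hypothesis, $H_1$ the signal-plus-noise hypothesis, $\theta_0$ and $\theta_1$ the unknown model parameters under each, and $\gamma$ the threshold chosen for the desired false alarm rate.

$$
\Lambda(x) = \frac{\max_{\theta_1} \, p(x \mid \theta_1, H_1)}{\max_{\theta_0} \, p(x \mid \theta_0, H_0)} \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \gamma
$$

The unknown parameters are replaced by their maximum likelihood estimates under each hypothesis, and a signal is declared present when the ratio exceeds $\gamma$.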

As you might imagine, such a detector's performance depends on the signal models (in this case, the models take the form of joint/conditional probability density functions). You want to gather as much information as possible about the noise and the target signals to build into the models. The model dimension must be chosen carefully to avoid sparsely specified or overfitted models.

score 1 · Accepted Answer · answered Sep 22 '17 at 16:16

You might try taking an image capture of the spectrum waterfall for some duration and feeding that image to a machine learning inference engine, perhaps a DNN.

The inference engine could be trained on a large image database with lots of waterfalls of lots of known or suspected signal types, similar to these signal ID databases:

https://www.sigidwiki.com/wiki/Signal_Identification_Guide

http://qrznow.com/signal-identification-guide/
