Here we include some more demos of the periodogram, along with averaging methods that can improve its behavior.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import periodogram, welch
import math

plt.rcParams['figure.figsize'] = (12, 4)
plt.rcParams['font.size'] = 14
plt.rcParams['mathtext.fontset'] = 'cm'

The periodogram stays noisy no matter how much data you have

White noise has a flat spectral density: $f(\omega) = \sigma^2$ for all $\omega$. With the default scaling in scipy.signal.periodogram (which uses scaling='density'), the theoretical level is $2\sigma^2$, because the two-sided spectrum is folded onto $[0, 1/2]$.
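A quick numeric sanity check of this level (a sketch; the exact mean fluctuates slightly from run to run):

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)
sigma2 = 1.0
wn = rng.normal(0, np.sqrt(sigma2), 100_000)

# scaling='density' is the default, so ordinates should scatter around 2*sigma2
freqs, power = periodogram(wn, fs=1)
print(f"mean periodogram level: {power.mean():.3f}  (theory: {2 * sigma2:.1f})")
```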

Let’s simulate 20 white noise series at each of three sample sizes and overlay their periodograms.

# Simulate many periodograms from white noise and overlay them
np.random.seed(42)
sigma2 = 1.0
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for idx, n in enumerate([64, 256, 1024]):
    for rep in range(20):
        wn = np.random.normal(0, np.sqrt(sigma2), n)
        freqs, power = periodogram(wn, fs=1)
        axes[idx].plot(freqs, power, alpha=0.2, color='steelblue')
    
    axes[idx].axhline(2 * sigma2, color='r', linewidth=2, label=r'$2\sigma^2$')
    axes[idx].set_title(f'n = {n}')
    axes[idx].set_xlabel('Frequency')
    axes[idx].set_ylim([0, 15])
    if idx == 0:
        axes[idx].set_ylabel('Power')
    axes[idx].legend(fontsize=8)
plt.suptitle('Periodogram variability does NOT decrease with n', y=1.02)
plt.tight_layout()

Each blue line is one periodogram. The red line is the true spectral density.

Notice that as $n$ grows from 64 to 1024, the periodogram gets denser (more Fourier frequencies), but the scatter around the red line doesn’t shrink at all. Every individual periodogram ordinate has roughly the same wild variability regardless of $n$.
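We can confirm this numerically: the standard deviation of a single periodogram ordinate (here the one nearest $f = 0.25$) stays close to its theoretical value of $2\sigma^2 = 2$ at every sample size. A quick sketch, re-simulating many replicates per $n$:

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)
sigma2 = 1.0
sds = {}
for n in [64, 256, 1024]:
    vals = []
    for _ in range(2000):
        wn = rng.normal(0, np.sqrt(sigma2), n)
        _, power = periodogram(wn, fs=1)
        vals.append(power[len(power) // 2])   # ordinate near f = 0.25
    sds[n] = np.std(vals)
    print(f"n = {n:4d}: sd of one ordinate ~ {sds[n]:.2f}")
```

The sd hovers near 2 at every $n$: more data gives more frequencies, not less noise per frequency.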

Averaging the periodogram

If we could observe many independent copies of the same process, we could average their periodograms. Since the periodogram is approximately unbiased ($E[P(\omega)] \approx f(\omega)$), the average converges to $f(\omega)$ by the law of large numbers.

Of course, in practice we usually only have one time series — but this exercise shows the principle.

np.random.seed(153)
n = 256
num_reps = [1, 5, 20, 100]

fig, axes = plt.subplots(1, 4, figsize=(16, 3.5), sharey=True)

for ax, K in zip(axes, num_reps):
    # Average K periodograms
    avg_power = None
    for rep in range(K):
        # Create a white noise signal with mean 0 and variance sigma^2
        # for n time points
        wn = np.random.normal(0, np.sqrt(sigma2), n)
        
        # Calculate the periodogram
        freqs, power = periodogram(wn, fs=1)

        # increment avg_power (later we'll divide by K)
        if avg_power is None:
            avg_power = power.copy()
        else:
            avg_power += power
        # Also plot individuals faintly for small K
        if K <= 5:
            ax.plot(freqs, power, alpha=0.15, color='steelblue', lw=0.5)
    avg_power /= K
    
    ax.plot(freqs, avg_power, lw=1.5, label=f'Avg of {K}')
    ax.axhline(2 * sigma2, color='r', lw=2, label=r'$2\sigma^2$')
    ax.set_title(f'K = {K} realization{"s" if K > 1 else ""}', fontsize=11)
    ax.set_xlabel('Frequency')
    ax.set_ylim([0, 8])

axes[0].set_ylabel('Power')
fig.suptitle('Averaging periodograms across independent realizations (n = 256 each)',
             y=1.02, fontsize=13)
plt.tight_layout()

With $K = 1$ our estimates are very noisy. By $K = 100$, the average is nearly flat at $2\sigma^2$. This is the law of large numbers at work: averaging independent, approximately unbiased estimates converges to the truth.

But we usually only have one time series. So how do we get this averaging effect?

Averaging across neighboring frequencies (Daniell smoother)

Another way to smooth the periodogram is to average across neighboring frequencies, rather than across realizations of the time series (which we usually don’t have). For a smooth spectral density, neighboring periodogram ordinates are approximately independent and have approximately the same expectation, so averaging $2L+1$ neighbors is like averaging $2L+1$ independent estimates:

$$\hat{f}(\omega) = \frac{1}{2L+1}\sum_{k=-L}^{L} P\!\left(\omega + \frac{k}{n}\right)$$

This reduces variance by a factor of roughly $1/(2L+1)$, at the cost of some bias (blurring in frequency).

def daniell_smooth(power, L):
    """Smooth a periodogram by averaging 2L+1 neighbors."""
    kernel = np.ones(2 * L + 1) / (2 * L + 1)
    return np.convolve(power, kernel, mode='same')

np.random.seed(42)
n = 512
wn = np.random.normal(0, np.sqrt(sigma2), n)
freqs, power = periodogram(wn, fs=1)

L_values = [0, 3, 10, 30]

fig, axes = plt.subplots(1, 4, figsize=(16, 3.5), sharey=True)

for ax, L in zip(axes, L_values):
    if L == 0:
        smoothed = power
        label = 'Raw periodogram'
    else:
        smoothed = daniell_smooth(power, L)
        label = f'Smoothed (2L+1 = {2*L+1})'
    
    ax.plot(freqs, smoothed, color='steelblue', lw=0.8, label=label)
    ax.axhline(2 * sigma2, color='r', lw=2, label=r'$2\sigma^2$')
    ax.set_title(f'L = {L}  (window = {2*L+1})', fontsize=11)
    ax.set_xlabel('Frequency')
    ax.set_ylim([0, 8])
    ax.legend(fontsize=8)

axes[0].set_ylabel('Power')
fig.suptitle('Daniell smoother: averaging neighboring frequencies from ONE time series',
             y=1.02, fontsize=13)
plt.tight_layout()

This is the same idea as before, but now we’re averaging across frequencies instead of across realizations of our time series. With $L = 30$ (averaging 61 neighbors), the estimate is nearly flat.

For white noise, there’s no bias from smoothing, since the true spectrum is already constant. For a process with a peaked spectrum, smoothing would flatten the peak, introducing bias into our estimates.
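We can also check the $1/(2L+1)$ variance-reduction claim directly, by comparing the variance of a raw and a smoothed ordinate at a fixed frequency across many simulated series. A sketch (the smoother is redefined here so the cell is self-contained; the empirical ratio fluctuates a bit around theory):

```python
import numpy as np
from scipy.signal import periodogram

def daniell_smooth(power, L):
    """Same smoother as above: average 2L+1 neighboring ordinates."""
    kernel = np.ones(2 * L + 1) / (2 * L + 1)
    return np.convolve(power, kernel, mode='same')

rng = np.random.default_rng(0)
n, L, sigma2 = 512, 10, 1.0
raw_vals, sm_vals = [], []
for _ in range(1000):
    wn = rng.normal(0, np.sqrt(sigma2), n)
    _, power = periodogram(wn, fs=1)
    raw_vals.append(power[n // 4])                    # raw ordinate near f = 0.25
    sm_vals.append(daniell_smooth(power, L)[n // 4])  # smoothed ordinate there
ratio = np.var(sm_vals) / np.var(raw_vals)
print(f"empirical variance ratio: {ratio:.3f}   theory 1/(2L+1): {1/(2*L+1):.3f}")
```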

Comparing averaging methods

Let’s put them side by side with the same effective number of averages, so you can see that they give similar results.

np.random.seed(99)
n = 512
K = 21  # number of realizations to average
L = int((K-1)/2)  # Daniell half-width, so 2L+1 = K neighbors

fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# (a) Single raw periodogram
wn_single = np.random.normal(0, np.sqrt(sigma2), n)
freqs, power_single = periodogram(wn_single, fs=1)
axes[0].plot(freqs, power_single, color='steelblue', lw=0.6, alpha=0.8)
axes[0].axhline(2 * sigma2, color='r', lw=2)
axes[0].set_title('(a) One raw periodogram\n(1 estimate per frequency)')
axes[0].set_ylim([0, 10])
axes[0].set_xlabel('Frequency')
axes[0].set_ylabel('Power')

# (b) Average of K=21 periodograms from independent realizations
avg_power = np.zeros_like(power_single)
for rep in range(K):
    wn = np.random.normal(0, np.sqrt(sigma2), n)
    _, pw = periodogram(wn, fs=1)
    avg_power += pw
avg_power /= K
axes[1].plot(freqs, avg_power, color='steelblue', lw=1.2)
axes[1].axhline(2 * sigma2, color='r', lw=2)
axes[1].set_title(f'(b) Average of {K} independent\nperiodograms')
axes[1].set_ylim([0, 10])
axes[1].set_xlabel('Frequency')

# (c) Daniell-smoothed periodogram from single realization
smoothed = daniell_smooth(power_single, L)
axes[2].plot(freqs, smoothed, color='steelblue', lw=1.2)
axes[2].axhline(2 * sigma2, color='r', lw=2)
axes[2].set_title(f'(c) Daniell smooth (2L+1 = {2*L+1})\nfrom one realization')
axes[2].set_ylim([0, 10])
axes[2].set_xlabel('Frequency')

plt.tight_layout()

Notice that panels (b) and (c) achieve a similar variance reduction, though the Daniell smoother also introduces correlation across neighboring frequencies, which is why panel (c) looks smooth.

Welch’s method: a third way to average

In practice, many fields instead use a third approach: Welch’s method. It chops one long time series into overlapping segments, computes a periodogram for each segment, and averages them. It’s like creating pseudo-independent “realizations” from a single dataset.

This is conceptually the same as (b) above, except the segments aren’t truly independent (they overlap and come from the same series). But it works well in practice.

welch?

By default, Welch’s method uses a Hann window of length $M$, defined as

$$w(n) = 0.5 - 0.5\cos\left(\frac{2\pi n}{M-1}\right), \quad 0 \leq n \leq M-1$$

This window is applied to each time-domain segment before computing its periodogram, in the following sequence:

  1. Chop the signal into overlapping segments of nperseg

  2. Multiply each segment by the Hann window (in the time domain). This tapers the segment smoothly at the edges and gives a better-behaved Fourier transform

  3. Compute the periodogram of the windowed segment

  4. Average the periodograms
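The four steps above can be sketched directly. This is a minimal reimplementation for intuition, not scipy’s actual code (which also handles detrending, other windows, and other scalings); with detrending turned off it tracks scipy.signal.welch closely:

```python
import numpy as np
from scipy.signal import welch
from scipy.signal.windows import hann

def welch_sketch(x, fs, nperseg, noverlap):
    """Steps 1-4: segment, taper with a Hann window, periodogram, average."""
    win = hann(nperseg, sym=False)             # periodic Hann, as scipy uses internally
    step = nperseg - noverlap
    scale = 1.0 / (fs * np.sum(win ** 2))      # 'density' scaling
    specs = []
    for start in range(0, len(x) - nperseg + 1, step):
        seg = x[start:start + nperseg] * win                 # steps 1-2: chop and taper
        specs.append(np.abs(np.fft.rfft(seg)) ** 2 * scale)  # step 3: periodogram
    psd = np.mean(specs, axis=0)                             # step 4: average
    psd[1:-1] *= 2                             # fold two-sided power onto one side
    return np.fft.rfftfreq(nperseg, 1 / fs), psd

x = np.random.default_rng(0).normal(size=2048)
f1, p1 = welch_sketch(x, fs=1, nperseg=256, noverlap=128)
f2, p2 = welch(x, fs=1, nperseg=256, noverlap=128, detrend=False)
print("max abs difference vs scipy:", np.max(np.abs(p1 - p2)))
```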

import scipy.signal
window = scipy.signal.windows.hann(51)
plt.figure(figsize=(5,3))
plt.plot(window)
plt.title("Hann window")
plt.ylabel("Amplitude")
plt.xlabel("Sample")
np.random.seed(42)
n = 2048
wn = np.random.normal(0, np.sqrt(sigma2), n)

fig, axes = plt.subplots(1, 4, figsize=(16, 3.5), sharey=True)

# Raw periodogram
freqs_raw, power_raw = periodogram(wn, fs=1)
axes[0].plot(freqs_raw, power_raw, color='steelblue', lw=0.3, alpha=0.7)
axes[0].axhline(2 * sigma2, color='r', lw=2)
axes[0].set_title(f'Raw periodogram\n(1 segment of {n})', fontsize=10)
axes[0].set_ylabel('Power')
axes[0].set_xlabel('Frequency')
axes[0].set_ylim([0, 12])

# Welch with decreasing segment sizes
seg_sizes = [512, 256, 128]
for ax, nperseg in zip(axes[1:], seg_sizes):
    noverlap = nperseg // 2
    n_seg = (n - noverlap) // (nperseg - noverlap)

    # Calculate welch periodogram, hann window is the default
    freqs_w, power_w = welch(wn, fs=1, nperseg=nperseg, noverlap=noverlap, 
                             window='hann')
    
    ax.plot(freqs_w, power_w, lw=1.2)
    ax.axhline(2 * sigma2, color='r', lw=2)
    ax.set_title(f'Welch (seg = {nperseg})\n(≈ {n_seg} segments averaged)', fontsize=10)
    ax.set_xlabel('Frequency')
    ax.set_ylim([0, 12])

fig.suptitle(f'Welch\'s method: splitting n = {n} into overlapping segments',
             y=1.03, fontsize=13)
plt.tight_layout()

Comparing methods on data with frequency peaks

Let’s see an example of how these estimates compare for a signal with three sinusoidal components. This is very similar to the sinusoid we discussed in the Lecture 12 notebook.

fs = 1000  # sampling rate
t = np.arange(0, 1, 1/fs)

# Three sinusoidal components
f = [6, 10.5, 39.2]  # Frequencies of the sinusoids
x1 = 12*np.cos(2*math.pi*f[0]*t) + 13*np.sin(2*math.pi*f[0]*t)
x2 =  4*np.cos(2*math.pi*f[1]*t) +  5*np.sin(2*math.pi*f[1]*t)
x3 =  6*np.cos(2*math.pi*f[2]*t) +  7*np.sin(2*math.pi*f[2]*t)

# Clean signal
x_clean = x1 + x2 + x3

# Signal + noise
np.random.seed(42)
noise_std = 20.0
x_noisy = x_clean + noise_std * np.random.randn(len(t))

# Theoretical power for each component
for i, (freq, a, b) in enumerate(zip(f, [12,4,6], [13,5,7])):
    print(f'Component {i+1}: f = {freq} Hz, power = a² + b² = {a**2+b**2}')
fig, axes = plt.subplots(1, 2, figsize=(14, 3.5))

axes[0].plot(t, x_clean, lw=0.8, color='C0')
axes[0].set_title('Clean signal (no noise)')
axes[0].set_xlabel('Time (s)')
axes[0].set_ylabel('Amplitude')

axes[1].plot(t, x_noisy, lw=0.8, color='C0')
axes[1].set_title(f'Noisy signal (σ_noise = {noise_std})')
axes[1].set_xlabel('Time (s)')

plt.tight_layout()

Raw periodogram

First, let’s look at the raw periodogram for this signal. We already saw something like this before. Notice that the peaks don’t always line up with our Fourier frequencies, so there is spectral leakage into the nearby frequencies (for example, the 10.5 Hz peak is distributed across 10 Hz and 11 Hz).

freqs_c, power_c = periodogram(x_clean, fs=fs)
freqs_n, power_n = periodogram(x_noisy, fs=fs)

fig, axes = plt.subplots(1, 2, figsize=(14, 4), sharey=True)

for ax, power, title in zip(axes, [power_c, power_n],
                            ['Clean signal', 'Noisy signal']):
    ax.stem(freqs_c, power)
    for freq in f:
        ax.axvline(freq, ls='--', color='red', alpha=0.5, lw=1)
    ax.set_xlabel('Frequency (Hz)')
    ax.set_title(f'Periodogram — {title}')
    ax.set_xlim(0, 50)

axes[0].set_ylabel('Power')

# Add frequency labels
for ax in axes:
    for freq in f:
        ax.text(freq + 0.5, ax.get_ylim()[1] * 0.3, f'{freq} Hz',
                fontsize=8, color='red')

plt.tight_layout()

Smoothing

Now let’s use the noisy data (which is a more realistic case) and the smoothing methods to see what we recover.

fig, axes = plt.subplots(2, 3, figsize=(15, 11))


# ── Row 1: Daniell smoother ──
daniell_Ls = [1, 2, 15]
for col, L in enumerate(daniell_Ls):
    smoothed = daniell_smooth(power_n, L)
    axes[0, col].semilogy(freqs_n, smoothed, lw=1.2, color='C1',
                          label=f'Daniell (2L+1 = {2*L+1})')
    axes[0, col].semilogy(freqs_n, power_n, lw=0.3, color='C0')
    for freq in f:
        axes[0, col].axvline(freq, ls='--', color='red', alpha=0.4, lw=1)
    axes[0, col].set_xlim(0, 50)
    axes[0, col].set_ylim(1e-2, 1e4)
    axes[0, col].legend(fontsize=9, loc='upper right')
    if col == 0:
        axes[0, col].set_ylabel('Daniell smoother\n\nPower (log)')

# ── Row 2: Welch ──
welch_segs = [800, 500, 150]
for col, nperseg in enumerate(welch_segs):
    freqs_w, power_w = welch(x_noisy, fs=fs, nperseg=nperseg, noverlap=nperseg//2)
    n_segs = int(2 * len(x_noisy) / nperseg - 1)
    axes[1, col].semilogy(freqs_w, power_w, lw=1.2, color='C2',
                          label=f'Welch (seg={nperseg}, ~{n_segs} avg)')
    axes[1, col].semilogy(freqs_n, power_n, lw=0.3, color='C0')
    for freq in f:
        axes[1, col].axvline(freq, ls='--', color='red', alpha=0.4, lw=1)
    axes[1, col].set_xlim(0, 50)
    axes[1, col].set_ylim(1e-2, 1e4)
    axes[1, col].set_xlabel('Frequency (Hz)')
    axes[1, col].legend(fontsize=9, loc='upper right')
    if col == 0:
        axes[1, col].set_ylabel('Welch\'s method\n\nPower (log)')

fig.suptitle('Two smoothing methods × three smoothing levels — red dashed lines mark true frequencies',
             fontsize=13, y=1.01)
plt.tight_layout()

How to choose the number of segments?

Welch can only resolve two peaks that are at least $\Delta f = f_s / N_{seg}$ apart, where $N_{seg}$ is the segment length in samples. So to resolve peaks separated by $\Delta f$ Hz, you need segments of at least:

$$N_{seg} \geq \frac{f_s}{\Delta f}$$

The 6 Hz and 10.5 Hz peaks are 4.5 Hz apart, so with a sampling rate of $f_s = 1000$ we’d need:

$$N_{seg} \geq \frac{1000}{4.5} \approx 222$$

In practice this is the absolute floor, so we’d probably want segments at least 2-3 times longer; 400 to 600 samples may work better here.

For the Daniell smoother, if you want to resolve two peaks separated by $\Delta f$, you need the smoothed bandwidth to be less than this:

$$\Delta f_{\text{smooth}} < \Delta f \implies L < \frac{N \, \Delta f}{2 f_s} - \frac{1}{2}$$

For the 6 and 10.5 Hz peaks, we’d have:

$$L < \frac{1000 \cdot 4.5}{2 \cdot 1000} - \frac{1}{2} = 2.25 - 0.5 = 1.75$$

So here $L = 1$ would be fine and not obscure the peaks, but $L = 2$ is already borderline for these frequencies.
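Both rules of thumb are easy to compute; a quick sketch using the numbers from this example:

```python
fs = 1000              # sampling rate (Hz)
N = 1000               # series length in samples (1 second of data)
delta_f = 10.5 - 6.0   # closest peak separation (Hz)

# Welch: minimum segment length that can separate the two peaks
nperseg_floor = fs / delta_f
print(f"Welch segment floor: {nperseg_floor:.0f} samples")

# Daniell: largest half-width L that keeps the peaks distinct
L_bound = N * delta_f / (2 * fs) - 0.5
print(f"Daniell L bound: {L_bound:.2f}")
```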

When is the periodogram not enough?

Can you think of some signals where knowing the periodogram obscures information that you might care about?
