
Nonlinear regression

Recall last time we spoke about Multiple Linear Regression:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + w_i$$

In that model, the relationship between $y$ and the parameters $\beta$ is linear. However, in many real datasets and models, certain parameters are related to the output in a nonlinear fashion. One example is the sunspots dataset, in which the time series data show strong periodicity.

Figure: Smoothed 12-month sunspot numbers, sampled twice per year.

When might this type of problem come up?

For example, we might have:

$$y_t = \beta_0 + \beta_1 \cos(2\pi f t) + \beta_2 \sin(2\pi f t) + \epsilon_t \quad \text{where } \epsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$$

The parameters we will fit here are $\beta_0, \beta_1, \beta_2, \sigma$ and the frequency parameter $f$. If $f$ is already known, then (2) reduces to multiple linear regression, $y = X_f \beta + \epsilon$, with:

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X_f = \begin{pmatrix} 1 & \cos(2\pi f \cdot 1) & \sin(2\pi f \cdot 1) \\ \vdots & \vdots & \vdots \\ 1 & \cos(2\pi f \cdot n) & \sin(2\pi f \cdot n) \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix} \quad \text{and } \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

And we can proceed as before. If $f$ is not known, then this is a nonlinear regression model.
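As a quick sketch of the known-$f$ case, here is the regression $y = X_f \beta + \epsilon$ fit with NumPy least squares; the frequency, coefficients, and noise level below are simulated values chosen for illustration:

```python
import numpy as np

# Known-frequency case: build X_f as in the text and fit beta by OLS.
# All numbers here are assumptions for the demo.
rng = np.random.default_rng(0)
n = 200
t = np.arange(1, n + 1)          # observation times 1, ..., n
f = 0.1                          # known frequency (cycles per sample)

# Simulate y_t = b0 + b1*cos(2*pi*f*t) + b2*sin(2*pi*f*t) + noise
beta_true = np.array([2.0, 1.5, -0.8])
X_f = np.column_stack([np.ones(n),
                       np.cos(2 * np.pi * f * t),
                       np.sin(2 * np.pi * f * t)])
y = X_f @ beta_true + rng.normal(0, 0.5, n)

# With f fixed, this is ordinary multiple linear regression
beta_hat, *_ = np.linalg.lstsq(X_f, y, rcond=None)
print(beta_hat)   # close to beta_true
```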

This particular type of model, $y_t = \beta_0 + \beta_1 \cos(2\pi f t) + \beta_2 \sin(2\pi f t) + \epsilon_t$, is called a sinusoid, which has some special properties (later we'll revisit this when we talk about power spectral analysis).

About Sinusoids

A sinusoid can be given by the following function at time $t$:

$$s(t) := \beta_0 + R \cos(2\pi f t + \phi)$$

where $\beta_0$ is the baseline (mean) level, $R$ is the amplitude, $f$ is the frequency, and $\phi$ is the phase.

You may have noticed that this sinusoid doesn't look exactly like the equation we started with earlier (2). However, we can use the trigonometric identity $\cos(a + b) = \cos a \cos b - \sin a \sin b$ to rewrite the equation. With

$$a = 2\pi f t, \quad b = \phi$$

$$\begin{aligned} s(t) &= \beta_0 + R\left(\cos(2\pi f t)\cos\phi - \sin(2\pi f t)\sin\phi\right) \\ &= \beta_0 + R\cos\phi \cos(2\pi f t) - R\sin\phi \sin(2\pi f t) \end{aligned}$$

Setting $\beta_1 = R\cos\phi$ and $\beta_2 = -R\sin\phi$, we recover the equation we discussed before:

$$s(t) = \beta_0 + \beta_1 \cos(2\pi f t) + \beta_2 \sin(2\pi f t)$$

We can also always rederive $R$ or $\phi$ if we want them:

$$R = \sqrt{\beta_1^2 + \beta_2^2}, \quad \phi = \arctan\left(-\frac{\beta_2}{\beta_1}\right)$$
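In code, `arctan2` is a safer choice than a plain arctangent here, since it resolves the quadrant ambiguity when $\beta_1 < 0$. A small round-trip check, with assumed values of $R$ and $\phi$:

```python
import numpy as np

# Recover amplitude R and phase phi from (beta1, beta2), then check
# the round trip. The values of R and phi are assumptions for the demo.
R, phi = 2.0, 0.7
beta1 = R * np.cos(phi)           # beta1 =  R cos(phi)
beta2 = -R * np.sin(phi)          # beta2 = -R sin(phi)

R_hat = np.hypot(beta1, beta2)    # sqrt(beta1^2 + beta2^2)
phi_hat = np.arctan2(-beta2, beta1)  # quadrant-aware arctan(-beta2/beta1)
print(R_hat, phi_hat)             # approximately 2.0 and 0.7
```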

A note on dealing with sampling

The sinusoidal model above is written in continuous time, but usually we collect data as discrete-time samples. If our sampling rate is $f_s$ (in samples per second, or Hz), then our actual observations occur at times $t_n = \frac{n}{f_s}$ for $n = 0, 1, \dots$, so we are actually fitting:

$$y_n = \beta_0 + R \cos\left(\frac{2\pi f n}{f_s} + \phi\right) + \epsilon_n$$

This sampling rate $f_s$ fundamentally constrains which frequencies you can estimate. This constraint has a special name:

The Nyquist limit

You can only reliably estimate frequencies up to half of the sampling rate:

$$f_\text{max} = \frac{f_s}{2}$$

This $f_\text{max}$ is also called the Nyquist frequency. If the true signal contains a component at frequency $f$ and your sampling rate is too low ($f > f_s/2$), that component doesn't vanish; instead it induces aliasing, in which your estimate of $f$ comes out as $|f - n f_s|$ for some integer $n$. We'll come back to this when we talk about power spectral analysis, but for now, just remember that if you want to estimate a particular sinusoidal component with frequency $f$, that frequency must be no more than one half of the sampling rate.

This also becomes important when we estimate $f$ via least squares: framed this way, we only need to search over $f \in [0, 1/2]$ (in cycles per sample).
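A small numerical illustration of aliasing (the sampling rate and frequencies are arbitrary choices for the demo): a cosine at $f$ and one at $f_s - f$ produce identical samples, so they cannot be told apart from the data.

```python
import numpy as np

# With sampling rate fs, the samples of cos(2*pi*f*t_n) at f = 2 Hz
# and at its alias fs - f = 8 Hz coincide exactly.
fs = 10.0                      # sampling rate (Hz), assumed for the demo
f_low, f_high = 2.0, fs - 2.0  # 2 Hz and its alias at 8 Hz
n = np.arange(50)
t = n / fs                     # sample times t_n = n / fs

s_low = np.cos(2 * np.pi * f_low * t)
s_high = np.cos(2 * np.pi * f_high * t)
print(np.allclose(s_low, s_high))   # True: the 8 Hz component aliases to 2 Hz
```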

See a demo of how the Nyquist Limit works

Least squares estimation of $\beta, f, \sigma$

To estimate the parameters of this regression, we will use the same basic estimation as in prior lectures (least squares):

$$S(\beta_0, \beta_1, \beta_2, f) := \sum_{t=1}^n \left(y_t - \beta_0 - \beta_1 \cos(2\pi f t) - \beta_2 \sin(2\pi f t)\right)^2$$

We minimize over $\beta_0, \beta_1, \beta_2, f$; the noise level $\sigma$ is then estimated from the residuals. In practice, we first estimate $f$: take a set of candidate values of $f$, and for each one compute the goodness of fit $RSS(f)$ of the linear regression model with that $f$ fixed. Then, with $f$ fixed at $\hat{f}$, we calculate the remaining parameters.

  1. Take a grid of possible values of $f$ in the range $[0, 1/2]$.

  2. For each frequency value $f$ in the grid:

    • Create the design matrix $X_f$.

    • Perform the regression of $y$ on $X_f$ and compute the residual sum of squares $RSS(f)$.

  3. Take $\hat{f}$ to be the grid value that minimizes $RSS(f)$ over all values on the grid.

  4. Take $\hat{\beta}$ and $\hat{\sigma}$ using the usual regression estimates of $\beta$ and $\sigma$ from typical linear regression of $y$ on $X_{\hat{f}}$.
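The four steps above can be sketched in NumPy on simulated data; the grid size, sample size, and true parameter values here are all assumptions for the demo:

```python
import numpy as np

# Grid-search least squares for the sinusoidal model, following the
# steps in the text. Frequency is in cycles per sample, so the grid
# covers (0, 1/2].
rng = np.random.default_rng(1)
n = 300
t = np.arange(n)
f_true = 0.12
beta_true = np.array([1.0, 2.0, -1.0])
y = (beta_true[0]
     + beta_true[1] * np.cos(2 * np.pi * f_true * t)
     + beta_true[2] * np.sin(2 * np.pi * f_true * t)
     + rng.normal(0, 0.4, n))

def design(f, t):
    """Step 2a: build the design matrix X_f for a candidate frequency f."""
    return np.column_stack([np.ones_like(t, dtype=float),
                            np.cos(2 * np.pi * f * t),
                            np.sin(2 * np.pi * f * t)])

def rss(f, t, y):
    """Step 2b: regress y on X_f and return the residual sum of squares."""
    X = design(f, t)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Step 1: grid of candidate frequencies in (0, 1/2]
grid = np.linspace(0.001, 0.5, 500)

# Step 3: pick the grid value that minimizes RSS(f)
f_hat = grid[np.argmin([rss(f, t, y) for f in grid])]

# Step 4: usual regression estimates at f_hat
X_hat = design(f_hat, t)
beta_hat, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
resid = y - X_hat @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - 3))   # 3 regression coefficients
print(f_hat, beta_hat, sigma_hat)
```

A finer grid (or a local refinement around $\hat{f}$) sharpens the frequency estimate; the grid spacing should be comparable to or finer than $1/n$, the natural frequency resolution of $n$ samples.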