Reading: Chapter 2 (somewhat), Chapter 4.1 – Shumway and Stoffer
Nonlinear regression
Recall last time we spoke about Multiple Linear Regression:

$$x_t = \beta_0 + \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + w_t$$

In this case, we assumed a linear relationship between the response $x_t$ and the predictors $z_{t1}, \dots, z_{tq}$ that enter our estimates. However, in many real datasets and models, certain parameters are related to our output in a nonlinear fashion. One example is seen in the sunspots dataset, in which the time series data show strong periodicity.

When might this type of problem come up?
Climate data (El Niño/SOI)
Astronomy
Variable stars - brightness fluctuations that can be used to estimate the period, which relates to the intrinsic luminosity of the star. This can then be used to estimate the distance to the star. Edwin Hubble in the 1920s measured the distance to variable stars in Andromeda and showed they were too far away to be inside the Milky Way, which suggested that ours is just one galaxy among many.
Signal processing - speech and audio
Estimating the pitch and formants of a person’s voice is part of ASR (automatic speech recognition)
Economics - but generally less stable
For example, we might have:

$$x_t = \beta_0 + \beta_1 \cos(2\pi\omega t) + \beta_2 \sin(2\pi\omega t) + w_t \tag{2}$$

The parameters we will fit here include $\beta_0, \beta_1, \beta_2$ and the frequency parameter $\omega$. If $\omega$ is already known, then (2) reduces to multiple linear regression $x_t = \beta_0 + \beta_1 z_{t1} + \beta_2 z_{t2} + w_t$ with:

$$z_{t1} = \cos(2\pi\omega t), \qquad z_{t2} = \sin(2\pi\omega t)$$

And we can proceed as before. If $\omega$ is not known, then this is a nonlinear regression model.
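As a quick sketch of the known-$\omega$ case, here is how the reduction to ordinary multiple linear regression might look in NumPy (the frequency, coefficients, and noise level below are made-up illustrative values, not from the lecture):

```python
import numpy as np

# Hypothetical example: fit the sinusoidal model with a KNOWN frequency omega.
# With omega fixed, the cos/sin columns are ordinary regressors.
rng = np.random.default_rng(0)
n = 500
t = np.arange(n)                      # time index
omega = 1 / 50                        # assumed known: one cycle per 50 samples

# Simulated data: x_t = 2.0 + 1.5 cos(2*pi*omega*t) - 0.7 sin(2*pi*omega*t) + noise
x = (2.0 + 1.5 * np.cos(2 * np.pi * omega * t)
         - 0.7 * np.sin(2 * np.pi * omega * t)
         + rng.normal(scale=0.5, size=n))

# Design matrix Z = [1, cos(2*pi*omega*t), sin(2*pi*omega*t)]
Z = np.column_stack([np.ones(n),
                     np.cos(2 * np.pi * omega * t),
                     np.sin(2 * np.pi * omega * t)])

# Ordinary least squares, exactly as in multiple linear regression
beta_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
print(beta_hat)   # should be close to (2.0, 1.5, -0.7)
```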
This particular type of model is called a sinusoid, which has some special properties (later we’ll revisit this when we talk about power spectral analysis).
About Sinusoids
A sinusoid can be given by the following function at time $t$:

$$s(t) = A \cos(2\pi\omega t + \phi)$$

Where:

$A$ is the amplitude of the sinusoid, which represents the height of the oscillation from the center line.

$\omega$ is the frequency of the sinusoid, which represents how many cycles (oscillations) are observed per unit time. If time is measured in seconds, then $\omega$ is given in units of Hertz (Hz).

$T = 1/\omega$ is the period of the oscillation, which is the length of time it takes to complete one full oscillation (one peak up and one peak down).

$\phi$ is the phase of the oscillation. If $\phi = 0$, the sinusoid is at its maximum value at $t = 0$. The value of $\phi$ allows us to account for leftward or rightward shifts of the whole oscillation in time.

$2\pi\omega$ is called the angular frequency (sometimes written with its own symbol). This measures the rate of change of the angle inside the cosine.
You may have noticed here that this sinusoid doesn’t look exactly like the equation we started with earlier (2). However, we can use the trigonometric identity $\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$ to rewrite the equation as:

$$A \cos(2\pi\omega t + \phi) = A\cos(\phi)\cos(2\pi\omega t) - A\sin(\phi)\sin(2\pi\omega t)$$

We can now express $\beta_1 = A\cos(\phi)$ and $\beta_2 = -A\sin(\phi)$, then we have the equation we discussed before:

$$s(t) = \beta_1 \cos(2\pi\omega t) + \beta_2 \sin(2\pi\omega t)$$

We can also always rederive $A$ or $\phi$ if we want them:

$$A = \sqrt{\beta_1^2 + \beta_2^2}, \qquad \phi = \arctan\!\left(-\frac{\beta_2}{\beta_1}\right)$$
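The conversion between the amplitude/phase form and the $\beta_1, \beta_2$ form can be checked numerically. A small sketch (the values of $A$, $\phi$, and $\omega$ below are arbitrary illustrative choices):

```python
import numpy as np

# Check that A cos(2*pi*omega*t + phi) == beta1 cos(2*pi*omega*t) + beta2 sin(2*pi*omega*t)
A, phi, omega = 2.0, 0.6, 1 / 25     # arbitrary example values
beta1 = A * np.cos(phi)              # beta1 = A cos(phi)
beta2 = -A * np.sin(phi)             # beta2 = -A sin(phi)

t = np.linspace(0, 100, 1000)
lhs = A * np.cos(2 * np.pi * omega * t + phi)
rhs = beta1 * np.cos(2 * np.pi * omega * t) + beta2 * np.sin(2 * np.pi * omega * t)
print(np.max(np.abs(lhs - rhs)))     # ~0: the two forms agree everywhere

# Recover A and phi from beta1, beta2 (arctan2 picks the right quadrant):
A_rec = np.hypot(beta1, beta2)       # A = sqrt(beta1^2 + beta2^2)
phi_rec = np.arctan2(-beta2, beta1)  # phi = atan2(-beta2, beta1)
print(A_rec, phi_rec)                # ≈ 2.0, 0.6
```

Using `arctan2` rather than a plain `arctan` of the ratio avoids sign ambiguity when $\beta_1 < 0$.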
A note on dealing with sampling
The sinusoidal model is written above in continuous time, but usually we are collecting data in discrete time samples. If our sampling rate is $f_s$ (in samples per second, or Hz), then our actual observations are occurring at times $t = n/f_s$ for $n = 0, 1, \dots, N-1$, so we are actually fitting:

$$x_n = \beta_0 + \beta_1 \cos(2\pi\omega n / f_s) + \beta_2 \sin(2\pi\omega n / f_s) + w_n$$
This sampling rate fundamentally constrains what frequencies you can estimate. This has a special name:
The Nyquist limit
You can only reliably estimate frequencies up to half of the sampling rate:

$$\omega \le \frac{f_s}{2}$$
This is also called the Nyquist frequency. If the true signal contains a component at frequency $\omega$ and your sampling rate is too low ($f_s < 2\omega$), then that component doesn’t vanish; rather, it induces aliasing, which is where you will get an estimate $\hat{\omega}$ that matches $|\omega - k f_s|$ for some integer $k$. We’ll come back to this when we talk about power spectral analysis, but for now, just remember that if you want to estimate a particular sinusoidal component with frequency $\omega$, that frequency must be no more than one half of the sampling rate.
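Aliasing is easy to see numerically: two cosines whose frequencies differ by a multiple of $f_s$ produce the exact same samples. A small sketch (the sampling rate and frequencies are made-up illustrative values):

```python
import numpy as np

# Aliasing demo: at fs = 10 Hz, a 2 Hz cosine and a 12 Hz cosine (2 + 1*fs)
# take identical values at every sample time, so the data cannot tell them apart.
fs = 10.0                                  # sampling rate (Hz), illustrative
n = np.arange(50)
t = n / fs                                 # sample times
low = np.cos(2 * np.pi * 2.0 * t)          # 2 Hz: below the Nyquist limit (5 Hz)
high = np.cos(2 * np.pi * 12.0 * t)        # 12 Hz = 2 + fs: above the Nyquist limit
print(np.max(np.abs(low - high)))          # ~0: the sampled values coincide
```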
Also, this becomes important when we are trying to estimate $\omega$ via least squares, because if framed in this way we only have to test frequencies in the range $[0, f_s/2]$.
See a demo of how the Nyquist Limit works
Least squares estimation of $\omega$
To estimate the parameters of this regression, we will use the same basic estimation as in prior lectures (least squares):

$$RSS(\beta_0, \beta_1, \beta_2, \omega) = \sum_{t} \left( x_t - \beta_0 - \beta_1 \cos(2\pi\omega t) - \beta_2 \sin(2\pi\omega t) \right)^2$$

We have to minimize over all four parameters $(\beta_0, \beta_1, \beta_2, \omega)$. In practice, we will first estimate $\omega$ by taking a grid of possible values of $\omega$, then calculating the goodness of fit of the resulting linear regression model with each value of $\omega$ held fixed. Then, with $\omega$ fixed at $\hat{\omega}$, we will calculate the remaining parameters.
Take a grid of possible values of $\omega$ in the range $[0, f_s/2]$
For each frequency value $\omega_j$ in the grid:
Create a matrix $Z(\omega_j)$ with columns $1$, $\cos(2\pi\omega_j t)$, and $\sin(2\pi\omega_j t)$
Perform regression of $x$ on $Z(\omega_j)$ and compute the residual sum of squares $RSS(\omega_j)$
Take $\hat{\omega}$ to be the grid value that minimizes $RSS(\omega_j)$ over all values on the grid.
Take $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ using the usual regression estimates from typical linear regression of $x$ on $Z(\hat{\omega})$.
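The steps above can be sketched in NumPy. This is a minimal illustration on synthetic data, not the lecture’s own code; the true frequency, amplitude, noise level, and grid size are all made-up assumptions:

```python
import numpy as np

# Grid search over omega, as described above, on synthetic data.
rng = np.random.default_rng(1)
fs = 1.0                                   # one sample per unit time
n = 400
t = np.arange(n) / fs
true_omega = 0.08                          # cycles per unit time, below fs/2
x = (1.0 + 2.0 * np.cos(2 * np.pi * true_omega * t + 0.4)
         + rng.normal(scale=0.8, size=n))

def rss(omega):
    """Residual sum of squares after regressing x on [1, cos, sin] at this omega."""
    Z = np.column_stack([np.ones(n),
                         np.cos(2 * np.pi * omega * t),
                         np.sin(2 * np.pi * omega * t)])
    resid = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    return np.sum(resid ** 2)

# Step 1-3: grid of frequencies up to the Nyquist limit, RSS at each
grid = np.linspace(1e-3, fs / 2, 2000)
omega_hat = grid[np.argmin([rss(w) for w in grid])]

# Step 4: refit at omega_hat to get the remaining parameters
Z = np.column_stack([np.ones(n),
                     np.cos(2 * np.pi * omega_hat * t),
                     np.sin(2 * np.pi * omega_hat * t)])
beta_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
print(omega_hat, beta_hat)
```

The grid here is fine enough that the fitted amplitude $\sqrt{\hat{\beta}_1^2 + \hat{\beta}_2^2}$ should land near the true value of 2; in practice the grid spacing should be small relative to $1/N$, the frequency resolution of the data.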