In the last lecture, we started discussing nonlinear regression models. As a motivating example, we considered the following dataset (from FRED) on Annual Estimates of the Resident Population of California (units are thousands of persons) from 1900 to 2024.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
capop = pd.read_csv('CAPOP_11Sept2025.csv')
print(capop.head(10))
print(capop.tail(10))
tme = np.arange(1900, 2025)
plt.plot(tme, capop['CAPOP'], label='California Population')
plt.xlabel('Year')
plt.ylabel('Population (in thousands)')
plt.title('California Annual Resident Population (in thousands)')
plt.show()
  observation_date     CAPOP
0 1900-01-01 1490.0
1 1901-01-01 1550.0
2 1902-01-01 1623.0
3 1903-01-01 1702.0
4 1904-01-01 1792.0
5 1905-01-01 1893.0
6 1906-01-01 1976.0
7 1907-01-01 2054.0
8 1908-01-01 2161.0
9 1909-01-01 2282.0
observation_date CAPOP
115 2015-01-01 38904.296
116 2016-01-01 39149.186
117 2017-01-01 39337.785
118 2018-01-01 39437.463
119 2019-01-01 39437.610
120 2020-01-01 39521.958
121 2021-01-01 39142.565
122 2022-01-01 39142.414
123 2023-01-01 39198.693
124 2024-01-01 39431.263

We work with the logarithm of the population data, as this leads to models with better interpretability (changes in log population correspond approximately to percentage growth rates).
y = np.log(capop['CAPOP'])
n = len(y)
plt.plot(tme, y)
plt.xlabel('Year')
plt.ylabel('Log(Population in thousands)')
plt.title('Logarithm of California Annual Resident Population (in thousands)')
plt.show()
We shall fit the following model to this dataset:
$$y_t = \beta_0 + \beta_1 (t-1) + \sum_{c=2}^{n-1} \beta_c (t-c)_+ + \epsilon_t, \qquad t = 1, \dots, n,$$
where $(u)_+ = \max(u, 0)$. We can write this as
$$y = X \beta + \epsilon,$$
where $X$ is constructed as below.
n = len(y)
x = np.arange(1, n+1)
Xfull = np.column_stack([np.ones(n), x-1])
for i in range(n-2):
    c = i+2
    xc = ((x > c).astype(float))*(x-c)
    Xfull = np.column_stack([Xfull, xc])
print(Xfull)
[[ 1. 0. -0. ... -0. -0. -0.]
[ 1. 1. 0. ... -0. -0. -0.]
[ 1. 2. 1. ... -0. -0. -0.]
...
[ 1. 122. 121. ... 1. 0. -0.]
[ 1. 123. 122. ... 2. 1. 0.]
[ 1. 124. 123. ... 3. 2. 1.]]
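Since we will rebuild this design matrix a few more times below, it can be convenient to wrap the construction in a small helper function. The sketch below (the name make_spline_design is illustrative; the later cells repeat the loop inline) produces the same matrix:
def make_spline_design(n):
    # Columns: intercept, linear term (x - 1), and hinge terms (x - c)_+ for c = 2, ..., n-1
    x = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), x - 1])
    for c in range(2, n):
        X = np.column_stack([X, ((x > c).astype(float)) * (x - c)])
    return X
# For example, make_spline_design(len(y)) reproduces the matrix Xfull constructed above.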
If we fit this model without any regularization, we get the following.
mdfull = sm.OLS(y, Xfull).fit()
print(mdfull.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable: CAPOP R-squared: 1.000
Model: OLS Adj. R-squared: nan
Method: Least Squares F-statistic: nan
Date: Thu, 02 Oct 2025 Prob (F-statistic): nan
Time: 16:09:30 Log-Likelihood: 3409.6
No. Observations: 125 AIC: -6569.
Df Residuals: 0 BIC: -6216.
Df Model: 124
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 7.3065 inf 0 nan nan nan
x1 0.0395 inf 0 nan nan nan
x2 0.0065 inf 0 nan nan nan
x3 0.0015 inf 0 nan nan nan
x4 0.0040 inf 0 nan nan nan
x5 0.0033 inf 0 nan nan nan
x6 -0.0119 inf -0 nan nan nan
x7 -0.0042 inf -0 nan nan nan
x8 0.0121 inf 0 nan nan nan
x9 0.0037 inf 0 nan nan nan
x10 -0.0016 inf -0 nan nan nan
x11 -0.0011 inf -0 nan nan nan
x12 -0.0003 inf -0 nan nan nan
x13 0.0007 inf 0 nan nan nan
x14 -0.0094 inf -0 nan nan nan
x15 -0.0179 inf -0 nan nan nan
x16 -0.0042 inf -0 nan nan nan
x17 0.0113 inf 0 nan nan nan
x18 -0.0038 inf -0 nan nan nan
x19 -0.0050 inf -0 nan nan nan
x20 0.0391 inf 0 nan nan nan
x21 0.0032 inf 0 nan nan nan
x22 -0.0153 inf -0 nan nan nan
x23 0.0172 inf 0 nan nan nan
x24 -0.0060 inf -0 nan nan nan
x25 -0.0208 inf -0 nan nan nan
x26 0.0004 inf 0 nan nan nan
x27 0.0021 inf 0 nan nan nan
x28 -0.0057 inf -0 nan nan nan
x29 -0.0032 inf -0 nan nan nan
x30 -0.0024 inf -0 nan nan nan
x31 -0.0124 inf -0 nan nan nan
x32 -0.0076 inf -0 nan nan nan
x33 -0.0003 inf -0 nan nan nan
x34 0.0045 inf 0 nan nan nan
x35 0.0027 inf 0 nan nan nan
x36 0.0077 inf 0 nan nan nan
x37 0.0025 inf 0 nan nan nan
x38 -0.0096 inf -0 nan nan nan
x39 -0.0002 inf -0 nan nan nan
x40 0.0048 inf 0 nan nan nan
x41 0.0164 inf 0 nan nan nan
x42 0.0261 inf 0 nan nan nan
x43 0.0285 inf 0 nan nan nan
x44 -0.0447 inf -0 nan nan nan
x45 -0.0067 inf -0 nan nan nan
x46 -0.0209 inf -0 nan nan nan
x47 0.0054 inf 0 nan nan nan
x48 -0.0048 inf -0 nan nan nan
x49 0.0034 inf 0 nan nan nan
x50 0.0056 inf 0 nan nan nan
x51 0.0095 inf 0 nan nan nan
x52 0.0021 inf 0 nan nan nan
x53 0.0076 inf 0 nan nan nan
x54 -0.0120 inf -0 nan nan nan
x55 -0.0097 inf -0 nan nan nan
x56 0.0133 inf 0 nan nan nan
x57 -0.0038 inf -0 nan nan nan
x58 0.0029 inf 0 nan nan nan
x59 -0.0036 inf -0 nan nan nan
x60 -0.0130 inf -0 nan nan nan
x61 0.0130 inf 0 nan nan nan
x62 -0.0045 inf -0 nan nan nan
x63 5.425e-05 inf 0 nan nan nan
x64 -0.0073 inf -0 nan nan nan
x65 -0.0033 inf -0 nan nan nan
x66 -0.0090 inf -0 nan nan nan
x67 0.0021 inf 0 nan nan nan
x68 -0.0054 inf -0 nan nan nan
x69 0.0049 inf 0 nan nan nan
x70 -0.0031 inf -0 nan nan nan
x71 0.0055 inf 0 nan nan nan
x72 -0.0069 inf -0 nan nan nan
x73 0.0020 inf 0 nan nan nan
x74 0.0008 inf 0 nan nan nan
x75 0.0025 inf 0 nan nan nan
x76 0.0013 inf 0 nan nan nan
x77 0.0005 inf 0 nan nan nan
x78 0.0026 inf 0 nan nan nan
x79 -0.0031 inf -0 nan nan nan
x80 0.0049 inf 0 nan nan nan
x81 -0.0029 inf -0 nan nan nan
x82 0.0016 inf 0 nan nan nan
x83 -0.0002 inf -0 nan nan nan
x84 -0.0026 inf -0 nan nan nan
x85 0.0039 inf 0 nan nan nan
x86 0.0019 inf 0 nan nan nan
x87 -9.855e-05 inf -0 nan nan nan
x88 -0.0002 inf -0 nan nan nan
x89 0.0017 inf 0 nan nan nan
x90 -0.0014 inf -0 nan nan nan
x91 -0.0094 inf -0 nan nan nan
x92 -0.0003 inf -0 nan nan nan
x93 -0.0063 inf -0 nan nan nan
x94 -0.0033 inf -0 nan nan nan
x95 0.0002 inf 0 nan nan nan
x96 0.0035 inf 0 nan nan nan
x97 0.0046 inf 0 nan nan nan
x98 0.0007 inf 0 nan nan nan
x99 -0.0003 inf -0 nan nan nan
x100 0.0111 inf 0 nan nan nan
x101 -0.0108 inf -0 nan nan nan
x102 -0.0030 inf -0 nan nan nan
x103 -0.0004 inf -0 nan nan nan
x104 -0.0018 inf -0 nan nan nan
x105 -0.0020 inf -0 nan nan nan
x106 -0.0017 inf -0 nan nan nan
x107 0.0010 inf 0 nan nan nan
x108 0.0034 inf 0 nan nan nan
x109 -1.6e-05 inf -0 nan nan nan
x110 -5.495e-05 inf -0 nan nan nan
x111 -0.0012 inf -0 nan nan nan
x112 -0.0003 inf -0 nan nan nan
x113 -4.045e-05 inf -0 nan nan nan
x114 0.0005 inf 0 nan nan nan
x115 -0.0005 inf -0 nan nan nan
x116 -0.0019 inf -0 nan nan nan
x117 -0.0015 inf -0 nan nan nan
x118 -0.0023 inf -0 nan nan nan
x119 -0.0025 inf -0 nan nan nan
x120 0.0021 inf 0 nan nan nan
x121 -0.0118 inf -0 nan nan nan
x122 0.0096 inf 0 nan nan nan
x123 0.0014 inf 0 nan nan nan
x124 0.0045 inf 0 nan nan nan
==============================================================================
Omnibus: 2.109 Durbin-Watson: 0.015
Prob(Omnibus): 0.348 Jarque-Bera (JB): 1.923
Skew: -0.203 Prob(JB): 0.382
Kurtosis: 2.548 Cond. No. 1.78e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.78e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
/Users/aditya/mambaforge/envs/stat153fall2025/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: divide by zero encountered in divide
return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/Users/aditya/mambaforge/envs/stat153fall2025/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1795: RuntimeWarning: invalid value encountered in scalar multiply
return 1 - (np.divide(self.nobs - self.k_constant, self.df_resid)
/Users/aditya/mambaforge/envs/stat153fall2025/lib/python3.11/site-packages/statsmodels/regression/linear_model.py:1717: RuntimeWarning: divide by zero encountered in scalar divide
return np.dot(wresid, wresid) / self.df_resid
In this unregularized estimation, the estimate of $\beta_0$ equals $y_1$, the estimate of $\beta_1$ is $y_2 - y_1$, and the estimate of $\beta_c$ is $y_{c+1} - 2 y_c + y_{c-1}$ for $c = 2, \dots, n-1$ (the second differences of the data). In other words, the unregularized fit interpolates the data exactly (note the zero residual degrees of freedom in the summary above). Let us check this.
print(y[0], mdfull.params.iloc[0])
print(y[1] - y[0], mdfull.params.iloc[1])
print(y[2] - y[1] - y[1] + y[0], mdfull.params.iloc[2])
print(y[3] - y[2] - y[2] + y[1], mdfull.params.iloc[3])
print(y[4] - y[3] - y[3] + y[2], mdfull.params.iloc[4])
7.306531398939505 7.3065313989395415
0.03947881097378758 0.03947881097376421
0.006542546627510859 0.00654254662755926
0.0015063840174303067 0.001506384017439581
0.004000542782827132 0.0040005427827803555
Ridge and LASSO regularized estimation
We now compute the ridge and LASSO regularized estimators using the optimization library cvxpy. cvxpy is a library for formulating and solving convex optimization problems, and is widely used in machine learning, statistics, and engineering.
import cvxpy as cp
Here is the function for solving the ridge optimization problem. Given $X$, $y$ and $\lambda > 0$, this code solves the problem:
$$\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \sum_{j \geq s} \beta_j^2.$$
Here $s$ denotes penalty_start in the code (the penalty does not involve $\beta_j$ for $j < s$).
def solve_ridge(X, y, lambda_val, penalty_start=2):
    n, p = X.shape
    # Define variable
    beta = cp.Variable(p)
    # Define objective
    loss = cp.sum_squares(X @ beta - y)
    reg = lambda_val * cp.sum_squares(beta[penalty_start:])
    objective = cp.Minimize(loss + reg)
    # Solve problem
    prob = cp.Problem(objective)
    prob.solve()
    return beta.value
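As a sanity check on the optimization (a sketch, not part of the original analysis): because the penalty is quadratic, this ridge problem has a closed-form solution. With $D$ the diagonal matrix whose first $s$ diagonal entries are $0$ and whose remaining entries are $1$, the minimizer is $\hat{\beta} = (X^T X + \lambda D)^{-1} X^T y$. The helper ridge_closed_form below is illustrative; it should agree with solve_ridge up to solver tolerance.
def ridge_closed_form(X, y, lambda_val, penalty_start=2):
    p = X.shape[1]
    yv = np.asarray(y)
    # Diagonal penalty matrix: 0 for the unpenalized coefficients, 1 for the rest
    D = np.diag(np.concatenate([np.zeros(penalty_start), np.ones(p - penalty_start)]))
    return np.linalg.solve(X.T @ X + lambda_val * D, X.T @ yv)
b_check = ridge_closed_form(Xfull, y, 10000)
print(np.max(np.abs(b_check - solve_ridge(Xfull, y, 10000))))  # should be close to zero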
Below is the code for computing the ridge estimator with a fixed value of $\lambda$. We also plot the fitted values (these are the values $X \hat{\beta}$) corresponding to the ridge estimate.
Play around with different values of $\lambda$ and see how the estimator changes.
b_ridge = solve_ridge(Xfull, y, lambda_val = 10000) #lambda = 10000 seems to work well
#print(b_ridge)
plt.plot(y)
ridge_fitted = np.dot(Xfull, b_ridge)
plt.plot(ridge_fitted, color = 'red')
plt.show()
Here is the function for minimizing the LASSO objective function. Given $X$, $y$ and $\lambda > 0$, this code solves the problem:
$$\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \sum_{j \geq s} |\beta_j|.$$
Here $s$ denotes penalty_start in the code (the penalty does not involve $\beta_j$ for $j < s$).
def solve_lasso(X, y, lambda_val, penalty_start=2):
    n, p = X.shape
    # Define variable
    beta = cp.Variable(p)
    # Define objective
    loss = cp.sum_squares(X @ beta - y)
    reg = lambda_val * cp.norm1(beta[penalty_start:])
    objective = cp.Minimize(loss + reg)
    # Solve problem
    prob = cp.Problem(objective)
    prob.solve()
    return beta.value
Below is the code for computing the LASSO estimator with a fixed value of $\lambda$. We also plot the fitted values (these are the values $X \hat{\beta}$) corresponding to the LASSO estimate.
Play around with different values of $\lambda$ and see how the estimator changes.
b_lasso = solve_lasso(Xfull, y, lambda_val = 25) #10 seems to work well
#print(b_lasso)
plt.plot(y)
lasso_fitted = np.dot(Xfull, b_lasso)
plt.plot(lasso_fitted, color = 'red')
plt.show()
The LASSO estimator is typically sparse. This can be checked with the code below, which displays only the estimated coefficients that exceed a threshold in absolute value.
threshold = 1e-6
significant_idx = np.where(np.abs(b_lasso) > threshold)[0]
print(b_lasso[significant_idx])
# Or to see index-value pairs:
for idx in significant_idx:
    print(f"Index {idx}: {b_lasso[idx]}")
[ 7.40101781e+00  3.90266732e-02 -6.92496253e-04 -1.95497046e-03
 -1.41763312e-02 -3.62292248e-03 -6.37795834e-03]
Index 0: 7.401017811074355
Index 1: 0.03902667316754264
Index 29: -0.0006924962529493638
Index 30: -0.001954970458737117
Index 64: -0.01417633117574186
Index 65: -0.003622922481167155
Index 91: -0.006377958342345468
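Column $j$ of $X$ (for $j \geq 2$) is the hinge $(x - j)_+$, which has a kink at $x = j$, and for this dataset $x = t$ corresponds to the year $1899 + t$. So the nonzero penalized coefficients above can be read as the few years at which the estimated growth rate of log population changes. A small sketch of this mapping:
# Map the nonzero penalized LASSO coefficients to the calendar years of their kinks
for idx in significant_idx:
    if idx >= 2:  # skip the unpenalized intercept and slope
        print(f"Slope change around {1899 + idx}: coefficient {b_lasso[idx]:.5f}")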
Below we plot both the ridge and LASSO fits.
plt.plot(y, color = 'None')  # draw the data invisibly, just to set up the axes
ridge_fitted = np.dot(Xfull, b_ridge)
plt.plot(ridge_fitted, color = 'red')
lasso_fitted = np.dot(Xfull, b_lasso)
plt.plot(lasso_fitted, color = 'black')
plt.show()
The two fitted values seem quite similar.
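We can also quantify how close the two fits are; a quick sketch:
# Maximum absolute difference between the ridge and LASSO fitted values (on the log scale)
print(np.max(np.abs(ridge_fitted - lasso_fitted)))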
Fitting Smooth Trend Functions to Data
These models and the corresponding Ridge and LASSO estimators are useful for fitting trend functions to data.
The following two datasets are from NOAA Climate at a Glance. They contain temperature anomalies for the months of January and June, for each year from 1850 to 2025. Anomalies are in degrees Celsius and are relative to the 1901-2000 average.
temp_jan = pd.read_csv('TempAnomalies_January.csv', skiprows=4)
temp_june = pd.read_csv('TempAnomalies_June.csv', skiprows=3)
print(temp_jan)
y_jan = temp_jan['Anomaly']
y_june = temp_june['Anomaly']
plt.plot(temp_jan['Year'], y_jan, label='January')
plt.plot(temp_jan['Year'], y_june, label='June')
plt.legend()
plt.xlabel('year')
plt.ylabel('Celsius')
plt.title('Temperature anomalies (from 1901-2000 average)')
plt.show()
     Year  Anomaly
0 1850 -0.46
1 1851 -0.17
2 1852 -0.02
3 1853 -0.12
4 1854 -0.28
.. ... ...
171 2021 0.83
172 2022 0.92
173 2023 0.89
174 2024 1.30
175 2025 1.33
[176 rows x 2 columns]

We shall fit a trend function to each of these two datasets. The first step is to create the $X$ matrix.
n = len(y_jan) #both datasets have the same length, so the X matrix will be the same for both
x = np.arange(1, n+1)
Xfull = np.column_stack([np.ones(n), x-1])
for i in range(n-2):
    c = i+2
    xc = ((x > c).astype(float))*(x-c)
    Xfull = np.column_stack([Xfull, xc])
print(Xfull)
[[ 1. 0. -0. ... -0. -0. -0.]
[ 1. 1. 0. ... -0. -0. -0.]
[ 1. 2. 1. ... -0. -0. -0.]
...
[ 1. 173. 172. ... 1. 0. -0.]
[ 1. 174. 173. ... 2. 1. 0.]
[ 1. 175. 174. ... 3. 2. 1.]]
The ridge regression estimate is computed below. Start with some standard choice of $\lambda$ and then increase or decrease it by factors of 10 until you get a fit that is visually nice (smooth while still capturing the patterns in the data).
b_ridge_jan = solve_ridge(Xfull, y_jan, lambda_val = 1000)
ridge_fitted_jan = np.dot(Xfull, b_ridge_jan)
b_ridge_june = solve_ridge(Xfull, y_june, lambda_val = 1000)
ridge_fitted_june = np.dot(Xfull, b_ridge_june)
plt.figure(figsize = (10, 6))
#plt.plot(temp_jan['Year'], y_jan, color = 'lightgray')
plt.plot(temp_jan['Year'], ridge_fitted_jan, color = 'red', label = 'January Ridge')
plt.plot(temp_jan['Year'], ridge_fitted_june, color = 'blue', label = 'June Ridge')
plt.legend()
plt.show()
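To see the effect of $\lambda$ more directly, one can overlay ridge fits for a few values spanning several orders of magnitude (a sketch; the particular values below are only for illustration):
plt.figure(figsize = (10, 6))
plt.plot(temp_jan['Year'], y_jan, color = 'lightgray', label = 'January data')
for lam in [10, 1000, 100000]:
    b = solve_ridge(Xfull, y_jan, lambda_val = lam)
    plt.plot(temp_jan['Year'], np.dot(Xfull, b), label = f'Ridge, lambda = {lam}')
plt.legend()
plt.show()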
We repeat the exercise with LASSO below.
b_lasso_jan = solve_lasso(Xfull, y_jan, lambda_val = 10)
lasso_fitted_jan = np.dot(Xfull, b_lasso_jan)
b_lasso_june = solve_lasso(Xfull, y_june, lambda_val = 10)
lasso_fitted_june = np.dot(Xfull, b_lasso_june)
plt.figure(figsize = (10, 6))
#plt.plot(temp_jan['Year'], y, color = 'lightgray')
plt.plot(temp_jan['Year'], lasso_fitted_jan, color = 'red', label = 'January LASSO')
plt.plot(temp_jan['Year'], lasso_fitted_june, color = 'blue', label = 'June LASSO')
plt.legend()
plt.show()
Cross-validation for picking $\lambda$
def ridge_cv(X, y, lambda_candidates):
    # 5-fold cross-validation over contiguous blocks of the series
    n = len(y)
    k = 5
    fold_size = n // k
    folds = []
    for i in range(k):
        start = i * fold_size
        end = (i + 1) * fold_size if i < k - 1 else n  # last fold includes remainder
        test_indices = np.arange(start, end)
        train_indices = np.concatenate([np.arange(0, start), np.arange(end, n)])
        folds.append((train_indices, test_indices))
    cv_errors = {lamb: 0 for lamb in lambda_candidates}
    for train_index, test_index in folds:
        X_train = X[train_index]
        X_test = X[test_index]
        y_train = y[train_index]
        y_test = y[test_index]
        for lamb in lambda_candidates:
            beta = solve_ridge(X_train, y_train, lambda_val=lamb)
            y_pred = np.dot(X_test, beta)
            squared_errors = (y_test - y_pred) ** 2
            cv_errors[lamb] += np.sum(squared_errors)
    for lamb in lambda_candidates:
        cv_errors[lamb] /= n
    best_lambda = min(cv_errors, key=cv_errors.get)
    return best_lambda, cv_errors
y = y_jan
lambda_candidates = np.array([0.1, 1, 10, 100, 1000, 10000, 100000])
best_lambda, cv_errors = ridge_cv(Xfull, y, lambda_candidates)
print(best_lambda)
for lamb, error in sorted(cv_errors.items()):
print(f"Lambda = {lamb:.2f}, CV Error = {error:.6f}")
1000.0
Lambda = 0.10, CV Error = 11.850955
Lambda = 1.00, CV Error = 2.261767
Lambda = 10.00, CV Error = 0.513626
Lambda = 100.00, CV Error = 0.126082
Lambda = 1000.00, CV Error = 0.044625
Lambda = 10000.00, CV Error = 0.048169
Lambda = 100000.00, CV Error = 0.073446
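It can also help to plot the CV error against $\lambda$ on a logarithmic scale (a sketch using the cv_errors dictionary returned above); the error is large for small $\lambda$ (undersmoothing) and creeps back up for very large $\lambda$ (oversmoothing):
lams = sorted(cv_errors.keys())
plt.plot(lams, [cv_errors[l] for l in lams], marker = 'o')
plt.xscale('log')
plt.xlabel('lambda')
plt.ylabel('5-fold CV error')
plt.title('Ridge cross-validation (January anomalies)')
plt.show()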
def lasso_cv(X, y, lambda_candidates):
    # Same 5-fold scheme as ridge_cv, but using the LASSO solver
    n = len(y)
    k = 5
    fold_size = n // k
    folds = []
    for i in range(k):
        start = i * fold_size
        end = (i + 1) * fold_size if i < k - 1 else n  # last fold includes remainder
        test_indices = np.arange(start, end)
        train_indices = np.concatenate([np.arange(0, start), np.arange(end, n)])
        folds.append((train_indices, test_indices))
    cv_errors = {lamb: 0 for lamb in lambda_candidates}
    for train_index, test_index in folds:
        X_train = X[train_index]
        X_test = X[test_index]
        y_train = y[train_index]
        y_test = y[test_index]
        for lamb in lambda_candidates:
            beta = solve_lasso(X_train, y_train, lambda_val=lamb)
            y_pred = np.dot(X_test, beta)
            squared_errors = (y_test - y_pred) ** 2
            cv_errors[lamb] += np.sum(squared_errors)
    for lamb in lambda_candidates:
        cv_errors[lamb] /= n
    best_lambda = min(cv_errors, key=cv_errors.get)
    return best_lambda, cv_errors
y = y_june
lambda_candidates = np.array([0.1, 1, 10, 100, 1000, 10000, 100000])
best_lambda, cv_errors = lasso_cv(Xfull, y, lambda_candidates)
print(best_lambda)
for lamb, error in sorted(cv_errors.items()):
print(f"Lambda = {lamb:.2f}, CV Error = {error:.6f}")
10.0
Lambda = 0.10, CV Error = 1.907423
Lambda = 1.00, CV Error = 0.072098
Lambda = 10.00, CV Error = 0.038189
Lambda = 100.00, CV Error = 0.078824
Lambda = 1000.00, CV Error = 0.165582
Lambda = 10000.00, CV Error = 0.165582
Lambda = 100000.00, CV Error = 0.165582
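The CV error is identical for the three largest values of $\lambda$: once $\lambda$ is large enough, the $\ell_1$ penalty drives every penalized coefficient to zero, so the fit reduces to the unpenalized straight line and stops changing. A quick check (a sketch):
# For a large lambda, the penalized LASSO coefficients should be essentially zero
b_big = solve_lasso(Xfull, y_june, lambda_val = 1000)
print(np.max(np.abs(b_big[2:])))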
Shrinkage and Sparsity
Let us compare the ridge regularized estimates of $\beta$ with the unregularized estimates (unregularized means $\lambda = 0$).
y = y_jan
b_ridge_0 = solve_ridge(Xfull, y, lambda_val = 0)
b_ridge = solve_ridge(Xfull, y, lambda_val = 1000)
plt.scatter(b_ridge_0[2:], b_ridge[2:])
print(b_ridge_0[1], b_ridge[1])
print(b_ridge_0[0], b_ridge[0])
0.28999999999997206 0.003551957730090027
-0.4599999999999922 -0.18495721415398625

Note that the $y$-axis above has a much tighter range compared to the $x$-axis. This illustrates the shrinkage aspect of ridge regression: it shrinks the unregularized estimates towards zero.
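One way to quantify the shrinkage is to compare the overall size of the penalized coefficients with and without the penalty (a sketch):
# Euclidean norm of the penalized coefficients: unregularized vs ridge
print(np.linalg.norm(b_ridge_0[2:]))   # lambda = 0
print(np.linalg.norm(b_ridge[2:]))     # lambda = 1000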
y = y_jan
b_lasso_0 = solve_lasso(Xfull, y, lambda_val = 0)
b_lasso = solve_lasso(Xfull, y, lambda_val = 10)
plt.scatter(b_lasso_0[2:], b_lasso[2:])
print(b_lasso_0[1], b_lasso[1])
print(b_lasso_0[0], b_lasso[0])
0.290000000008449 -0.00183578311693186
-0.46000000053282925 -0.14899519332692354

This plot clearly shows that the LASSO sets most of the $\beta_j$'s exactly to zero; compare it to the corresponding plot for the ridge estimator. This property is referred to as sparsity.
Ridge estimation results in shrinkage and LASSO estimation results in sparsity.
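To see the contrast numerically, we can count how many penalized coefficients are (essentially) nonzero for each estimator (a sketch, using the same threshold as before):
threshold = 1e-6
print(np.sum(np.abs(b_lasso[2:]) > threshold))   # LASSO: only a handful are nonzero
print(np.sum(np.abs(b_ridge[2:]) > threshold))   # ridge: most are nonzero, just shrunk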
Simulated Data
I am simulating two time series datasets (of the same size $n$) using the code below.
def smoothfun(x):
    ans = np.sin(15*x) + 3*np.exp(-(x ** 2)/2) + 0.5*((x - 0.5) ** 2) + 5 * np.log(x + 0.1) + 7
    return ans
n = 1000
xx = np.linspace(0, 1, n)
truth_1 = np.array([smoothfun(x) for x in xx])
sig_1 = 2
rng = np.random.default_rng(seed = 42)
errorsamples_1 = rng.normal(loc=0, scale = sig_1, size = n)
y_1 = truth_1 + errorsamples_1
xx = np.linspace(0, 1, n)
truth_2 = 0.8*truth_1
sig_2 = 3
errorsamples_2 = rng.normal(loc=0, scale = sig_2, size = n)
y_2 = truth_2 + errorsamples_2
Here are the two datasets plotted.
fig, axes = plt.subplots(2, 1, figsize=(6, 4), sharex=True)
# First panel
axes[0].plot(xx, y_1, label='Data 1', color='blue')
axes[0].legend()
axes[0].set_ylabel("y_1")
# Second panel
axes[1].plot(xx, y_2, label='Data 2', color='lightgray')
axes[1].legend()
axes[1].set_xlabel("xx")
axes[1].set_ylabel("y_2")
plt.tight_layout()
plt.show()

#Plotting both datasets together:
plt.plot(xx, y_1, label = 'Data 1', color = 'blue')
plt.plot(xx, y_2, label = 'Data 2', color = 'lightgray')
plt.legend()
plt.show()
Each of these datasets is generated by taking a smooth function and adding noise to it. The smooth functions that generated the data are shown below.
#True functions which generated the data:
plt.plot(xx, truth_1, label = 'True Trend 1', color = 'blue')
plt.plot(xx, truth_2, label = 'True Trend 2', color = 'red')
plt.legend()
plt.show()

These smooth functions (which generated the data) are clearly different. However, the differences are not easy to see when we look at the raw data directly, because of the noise. We can use our trend estimates (based on ridge and LASSO) to denoise the data so that the two series become comparable.
y = y_1
n = len(y)
x = np.arange(1, n+1)
Xfull = np.column_stack([np.ones(n), x-1])
for i in range(n-2):
    c = i+2
    xc = ((x > c).astype(float))*(x-c)
    Xfull = np.column_stack([Xfull, xc])
print(Xfull)
[[ 1. 0. -0. ... -0. -0. -0.]
[ 1. 1. 0. ... -0. -0. -0.]
[ 1. 2. 1. ... -0. -0. -0.]
...
[ 1. 997. 996. ... 1. 0. -0.]
[ 1. 998. 997. ... 2. 1. 0.]
[ 1. 999. 998. ... 3. 2. 1.]]
lambda_candidates = np.array([1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8])
best_lambda_1, cv_errors = ridge_cv(Xfull, y_1, lambda_candidates)
print(best_lambda_1)
best_lambda_2, cv_errors = ridge_cv(Xfull, y_2, lambda_candidates)
print(best_lambda_2)
1000000.0
1000000.0
b_ridge_1 = solve_ridge(Xfull, y_1, lambda_val = best_lambda_1)
ridge_fitted_1 = np.dot(Xfull, b_ridge_1)
b_ridge_2 = solve_ridge(Xfull, y_2, lambda_val = best_lambda_2)
ridge_fitted_2 = np.dot(Xfull, b_ridge_2)
plt.plot(ridge_fitted_1, color = 'blue', label = 'Ridge Estimate (for data 1)')
plt.plot(ridge_fitted_2, color = 'red', label = "Ridge Estimate (for data 2)")
plt.legend()
plt.show()
lambda_candidates = np.array([1e1, 1e2, 1e3, 1e4])
best_lambda_1, cv_errors = lasso_cv(Xfull, y_1, lambda_candidates)
print(best_lambda_1)
best_lambda_2, cv_errors = lasso_cv(Xfull, y_2, lambda_candidates)
print(best_lambda_2)
/Users/aditya/mambaforge/envs/stat153fall2025/lib/python3.11/site-packages/cvxpy/problems/problem.py:1539: UserWarning: Solution may be inaccurate. Try another solver, adjusting the solver settings, or solve with verbose=True for more information.
warnings.warn(
1000.0
1000.0
b_lasso_1 = solve_lasso(Xfull, y_1, lambda_val = best_lambda_1)
lasso_fitted_1 = np.dot(Xfull, b_lasso_1)
b_lasso_2 = solve_lasso(Xfull, y_2, lambda_val = best_lambda_2)
lasso_fitted_2 = np.dot(Xfull, b_lasso_2)
plt.plot(lasso_fitted_1, color = 'blue', label = 'LASSO Estimate (for data 1)')
plt.plot(lasso_fitted_2, color = 'red', label = "LASSO Estimate (for data 2)")
plt.legend()
plt.show()
plt.plot(ridge_fitted_1, color = 'blue', label = 'Ridge Estimate (for data 1)')
plt.plot(ridge_fitted_2, color = 'red', label = "Ridge Estimate (for data 2)")
plt.plot(lasso_fitted_1, color = 'blue', linestyle = '--', label = 'LASSO Estimate (for data 1)')
plt.plot(lasso_fitted_2, color = 'red', linestyle = '--', label = "LASSO Estimate (for data 2)")
plt.legend()
plt.show()
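Finally, because the data are simulated, we know the true trend functions and can check how well the regularized fits recover them (a sketch):
# Root-mean-squared error of each fitted trend against the true trend that generated the data
for name, fit, truth in [('Ridge, data 1', ridge_fitted_1, truth_1),
                         ('Ridge, data 2', ridge_fitted_2, truth_2),
                         ('LASSO, data 1', lasso_fitted_1, truth_1),
                         ('LASSO, data 2', lasso_fitted_2, truth_2)]:
    print(f"{name}: RMSE against truth = {np.sqrt(np.mean((fit - truth) ** 2)):.4f}")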