Seminar | June 10 | 9-10 a.m. | Zoom
Vignesh Subramanian
Electrical Engineering and Computer Sciences (EECS)
Modern machine learning models have demonstrated tremendous success at tasks such as image classification, object detection, and tracking. Typically, these models are gigantic, with vastly more parameters than training data points. Consequently, they can interpolate noisy training data, i.e., achieve zero training error, something that has traditionally been viewed as overfitting and thought to harm generalization. However, after empirical work demonstrated that interpolating models generalize well in practice, a series of theoretical works followed, seeking to understand why.
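As a concrete illustration of interpolation (a minimal sketch, not taken from the talk; the dimensions, noise level, and single-spike signal are assumptions chosen for the demo), a minimum-norm least-squares fit with far more features than training points drives the training error on noisy labels to exactly zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500                        # p >> n: heavily overparameterized
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[0] = 1.0                       # a single true signal direction (assumed)
y = X @ w_true + 0.5 * rng.standard_normal(n)   # noisy labels

# Minimum-norm interpolator: w = X^T (X X^T)^{-1} y
w_hat = X.T @ np.linalg.solve(X @ X.T, y)

print("train MSE:", np.mean((X @ w_hat - y) ** 2))   # numerically zero
X_test = rng.standard_normal((1000, p))
print("test MSE:", np.mean((X_test @ w_hat - X_test @ w_true) ** 2))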
My work tackles this problem from a signal processing perspective. With the help of a Fourier features toy model, we gain insight into the perils of interpolating noisy training data and use these insights to theoretically analyze the generalization error of linear regression, binary classification, and multiclass classification in a Gaussian features setting.
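The core signal-processing intuition behind the toy model can be seen in a few lines (again a hedged sketch; the sample count and frequencies are arbitrary choices): with n equispaced samples, frequency k and its alias k + n are indistinguishable on the sample grid, which is how an interpolator can falsely attribute noise to spurious high-frequency directions:

```python
import numpy as np

n = 8                                     # number of equispaced samples
t = np.arange(n) / n                      # sample grid on [0, 1)
low  = np.cos(2 * np.pi * 1 * t)          # true low-frequency signal (k = 1)
high = np.cos(2 * np.pi * (1 + n) * t)    # its alias at frequency k + n = 9

print(np.allclose(low, high))             # True: identical on the samples
```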
In this talk, we map overparameterization to the phenomenon of undersampling with the help of a Fourier toy model. Signal bleed (shrinkage of the true signal) and contamination due to false discovery of aliases are the primary challenges when interpolating noisy training data. Essentially, we must favor the true signal directions while ensuring that there are sufficiently many unimportant directions to absorb noise. Using an asymptotic framework in which the number of training points, the number of features, and the amount of feature favoring scale together, we identify parameter regions where overparameterized linear regression generalizes well. Extending our analysis to binary classification shows the existence of a regime where classification works but regression does not: good generalization for binary classification requires less favoring of the true features than regression does. Finally, we derive sufficient conditions for good generalization of multiclass classification in which the number of classes also scales with the number of training points. Here, we highlight the additional challenges that arise in moving from the binary to the multiclass setting and contrast the parameter regions where multiclass classification generalizes well with those where binary classification does.
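To sketch the favoring trade-off numerically (my own construction for illustration, not the talk's exact Gaussian-features model; the scaling factor and dimensions are assumptions), scaling up the true signal direction lets its coefficient survive minimum-norm interpolation, whereas in the unfavored, isotropic case it bleeds away:

```python
import numpy as np

def recovered_signal(favor, n=100, p=2000, noise=0.5, seed=1):
    """Min-norm interpolation with the first feature scaled by `favor`;
    returns the learned coefficient on the (unscaled) true direction."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, p))               # latent isotropic features
    y = Z[:, 0] + noise * rng.standard_normal(n)  # true signal + noise
    scales = np.ones(p)
    scales[0] = favor                             # "favoring" as feature scaling
    X = Z * scales
    w = X.T @ np.linalg.solve(X @ X.T, y)         # min-norm interpolator
    return favor * w[0]                           # coefficient on z_0; truth is 1

print("no favoring:  ", recovered_signal(favor=1.0))  # signal bleeds toward 0
print("with favoring:", recovered_signal(favor=8.0))  # much closer to 1
```

The many unfavored directions still interpolate the noise, but the favored direction retains most of the true coefficient; this is one simple way to read the "favor the true signal while leaving room to absorb noise" trade-off described above.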
vignesh.subramanian@berkeley.edu, 510-570-4136
Shirley Salanio, shirleysalanio@berkeley.edu, 510-643-8347