Homework Assignment # 3
Due: Friday, April 16, 2021, 11:59 p.m. Total marks: 100
Question 1. [50 marks]
In this question, you will implement multivariate linear regression and polynomial regression, and learn their parameters using Stochastic Gradient Descent (SGD). A Python script, script_regress_q1.py, has been given to you to run the regression algorithms on a dataset. You will be running the algorithms on the GraduateAdmissions v1.0 data set, which has n = 500 samples and d = 7 features. The features include GPA, TOEFL scores and a few other criteria. You are asked to train models that predict the admission probability from these features. The features are augmented with a column of ones (to create the bias term) in dataloader.py (not in the data file itself). Baseline algorithms, including mean and random predictions, serve as sanity checks: your models should outperform both random predictions and the mean value of the target in the training set.
(a) [20 marks] Implement a mini-batch stochastic gradient descent approach to obtain the linear regression solution (see Algorithm 3 in Chapter 7 of the notes). The update has the form $w_{t+1} = w_t - \eta_t g_t$, where $g_t$ is the mini-batch gradient on iteration $t$. We will use a slightly better heuristic stepsize choice than in the last assignment, where we use the inverse of an accumulating sum of gradient norms: $\eta_t = (1 + \bar{g}_t)^{-1}$, where
$$\bar{g}_t = \bar{g}_{t-1} + \frac{1}{d+1} \sum_{j=0}^{d} |g_{t,j}|$$
with $g_{t,j}$ the $j$-th entry in the vector (the array) $g_t$, and $\bar{g}_0 = 1$, where $t$ starts at 1. Initialize the weights to zero and set the default number of epochs to 100 (as in the barebones). Add this code to algorithms.py. Report the error obtained by your linear regression model, as output by script_regress_q1.py.
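The update in Part (a) can be sketched roughly as follows. This is a minimal illustration, not the barebones code: the function name `sgd_linear_regression`, the `batch_size` and `seed` parameters, and the loss scaling (half the mean squared error) are assumptions made here, and the accumulator is assumed to add the average absolute gradient entry, (1/(d+1)) Σ_j |g_{t,j}|, on each step.

```python
import numpy as np

def sgd_linear_regression(X, y, epochs=100, batch_size=10, seed=0):
    """Mini-batch SGD for linear regression with an accumulating-norm stepsize."""
    rng = np.random.default_rng(seed)
    n, num_weights = X.shape          # num_weights = d + 1 (bias column included)
    w = np.zeros(num_weights)         # weights initialized to zero
    g_bar = 1.0                       # accumulator starts at g_bar_0 = 1
    for _ in range(epochs):
        order = rng.permutation(n)    # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            g = X[idx].T @ residual / len(idx)   # gradient of 0.5 * mean squared error
            g_bar += np.abs(g).mean()            # assumed: add (1/(d+1)) * sum_j |g_j|
            w -= g / (1.0 + g_bar)               # eta_t = 1 / (1 + g_bar_t)
    return w
```

Because the accumulator only grows, the stepsize shrinks quickly while gradients are large and levels off once they become small.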
(b) [20 marks] Implement polynomial regression with a polynomial of degree p = 2, using the same approach as in Part (a): mini-batch stochastic gradient descent with the given stepsize rule. Add this code to algorithms.py. As a hint, consider calling the LinearRegression algorithm you wrote in Part (a) within PolynomialRegression, to avoid code duplication and to re-use an already debugged algorithm. Report the error obtained by your polynomial regression model, as output by script_regress_q1.py.
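One way to reuse a linear learner for Part (b) is to expand the features first and then train on the expanded matrix. The sketch below assumes column 0 of X is the bias column and that the degree-2 expansion includes cross terms; the helper name `quadratic_features` is hypothetical, and the course notes may define the expansion differently (for example, squares only).

```python
import numpy as np

def quadratic_features(X):
    """Append all degree-2 monomials x_i * x_j (i <= j) of the non-bias columns."""
    raw = X[:, 1:]                          # assumes column 0 is the column of ones
    i, j = np.triu_indices(raw.shape[1])    # all index pairs with i <= j
    return np.hstack([X, raw[:, i] * raw[:, j]])
```

Under these assumptions, d = 7 raw features give 1 + 7 + 28 = 36 columns, and the expanded matrix can be passed unchanged to the linear regression routine.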
Fall 2020 CMPUT 296: Basics of Machine Learning
(c) [10 marks] Rather than a scalar stepsize, we can have a vector of stepsizes: a different stepsize for each dimension of the weight vector. Moving to a vector of stepsizes makes it even more important to have adaptive stepsize approaches, as it is impractical to choose a different stepsize for each dimension manually. The AdaGrad algorithm is similar to the stepsize approach above, but uses a different stepsize for each dimension. We now maintain a vector $\bar{g}_t \in \mathbb{R}^{d+1}$ of accumulating gradient values for each dimension,
$$\bar{g}_t = \bar{g}_{t-1} + g_t^2$$
where $\bar{g}_0 = \vec{0}$, $t$ starts at 1, and the squaring is done element-wise on $g_t$, i.e., for every $j = 0, 1, \ldots, d$,
$$\bar{g}_{t,j} = \bar{g}_{t-1,j} + g_{t,j}^2$$
The vector stepsize $\eta_t$ is composed of entries $\eta_{t,j} = 1/\sqrt{\bar{g}_{t,j}}$. The update multiplies each gradient dimension by its own stepsize, which is an element-wise product between the vectors $\eta_t$ and $g_t$:
$$w_{t+1} = w_t - \eta_t \cdot g_t$$
where $\cdot$ denotes the element-wise product. Add AdaGrad as a stepsize option within LinearRegression. Report the error obtained by your linear regression model trained with AdaGrad, as output by script_regress_q1.py.
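A single AdaGrad step as described above might look like the following sketch. The function name `adagrad_step` is hypothetical, and the small `eps` in the denominator is an addition not present in the assignment text; it only guards against division by zero when a gradient entry has been exactly zero so far.

```python
import numpy as np

def adagrad_step(w, g, g_bar, eps=1e-12):
    """Return the updated weights and accumulator after one AdaGrad step."""
    g_bar = g_bar + g ** 2                 # element-wise accumulation of squared gradients
    eta = 1.0 / (np.sqrt(g_bar) + eps)     # one stepsize per weight; eps is an added guard
    return w - eta * g, g_bar              # element-wise product of eta and g
```

For example, starting from zero weights and a zero accumulator, the gradient (3, -4) gives stepsizes (1/3, 1/4), so the weights move to approximately (-1, 1).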
Question 2. [30 marks]
In this question, you will use the paired t-test to compare the performance of two models. You will compare the above LinearRegression model and PolynomialRegression model, both using AdaGrad. You will run this comparison using script_regress_q2.py. You hypothesize that PolynomialRegression is better than LinearRegression, and so want to run a one-tailed test to see if that is true.
(a) [5 marks] Define the null hypothesis and the alternative hypothesis. Use $\mu_1$ to be the true expected squared error for LinearRegression and $\mu_2$ the true expected squared error for PolynomialRegression.
(b) [10 marks] Before running the paired t-test, you should check that its assumptions are not violated. One way to check the assumptions for the paired t-test is to verify that the errors are (approximately) normally distributed with (approximately) equal variances. To do this, you need to implement the checkforPrerequisites method in script_regress_q2.py. For each model, you can plot a histogram of its errors on the test set. You can do so by passing the two vectors of errors to the plotTwoHistograms function, which visualizes the two error distributions simultaneously. Discuss why it is or is not acceptable to use the paired t-test to draw statistically sound conclusions about these two models.
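Alongside the plots, a numerical summary of the two error vectors can help with the discussion. The sketch below is an illustration only: `summarize_errors` is a hypothetical helper, not part of script_regress_q2.py, and it does not call plotTwoHistograms, whose signature is not given here. It bins both error vectors on shared edges and reports the ratio of sample variances.

```python
import numpy as np

def summarize_errors(errors_a, errors_b, bins=20):
    """Histogram both error vectors on shared bins and compare sample variances."""
    errors_a = np.asarray(errors_a, dtype=float)
    errors_b = np.asarray(errors_b, dtype=float)
    # Shared bin edges make the two histograms directly comparable
    edges = np.histogram_bin_edges(np.concatenate([errors_a, errors_b]), bins=bins)
    counts_a, _ = np.histogram(errors_a, bins=edges)
    counts_b, _ = np.histogram(errors_b, bins=edges)
    var_ratio = errors_a.var(ddof=1) / errors_b.var(ddof=1)   # near 1 if variances are similar
    return edges, counts_a, counts_b, var_ratio
```

A variance ratio far from 1, or counts that are strongly skewed rather than roughly bell-shaped, would be a reason to question the assumptions.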
(c) [15 marks] Regardless of the outcome of Part (b), let's run the paired t-test. (Note: I am not advocating that you check for violated assumptions and then ignore the outcome of that step. The goal of this question is simply to give you experience actually running a statistical significance test. Presumably, in practice, you would pick an appropriate test after verifying assumptions.) To run this test, you need to compute the p-value. To do this, implement the getPValue method, which returns the p-value for the one-tailed paired t-test. Report the p-value. Would you be able to reject the null hypothesis with a significance threshold of 0.05? How about 0.01?
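The computation behind a one-tailed paired t-test can be sketched with the standard library alone. The names `t_upper_tail` and `paired_one_tailed_p` are hypothetical and this is not the provided getPValue method. The sketch assumes the test is on the per-sample differences errors_1 - errors_2, with the upper tail as the rejection region, so a small p-value supports the second model having lower error. The tail probability is obtained by numerically integrating the Student's t density with Simpson's rule, which is an approximation.

```python
import math
import statistics

def t_upper_tail(t_stat, df, steps=10000):
    """Approximate P(T > t_stat) for Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    pdf = lambda x: math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))
    h = abs(t_stat) / steps                     # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(abs(t_stat))
    area += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    area *= h / 3                               # integral of the density over [0, |t_stat|]
    return 0.5 - area if t_stat >= 0 else 0.5 + area

def paired_one_tailed_p(errors_1, errors_2):
    """One-tailed p-value for H1: mean(errors_1 - errors_2) > 0."""
    diffs = [a - b for a, b in zip(errors_1, errors_2)]
    m = len(diffs)
    # statistics.stdev is the sample standard deviation (divides by m - 1)
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(m))
    return t_upper_tail(t_stat, m - 1)
```

The null hypothesis is rejected at a given threshold when the returned value falls below it. If all differences are identical, the sample standard deviation is zero and the statistic is undefined.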
Question 3. [20 marks]
In this question, you will implement multivariate logistic regression and learn its solution using Stochastic Gradient Descent (SGD). We will examine some of the practical aspects of implementing binary classification, including handling a large number of features and samples. An initial Python script, script_classify_q3.py, has been given to you, along with associated Python files. The implementation of the logistic regression class should go in algorithms.py. You will be running on a physics data set with 8 features and 100,000 samples (called susysubset). The features are augmented with a column of ones (to create the bias term) in dataloader.py (not in the data file itself). Your classifier should outperform the random predictions provided by a random classifier.
(a) [15 marks] Implement a mini-batch stochastic gradient descent approach to logistic regression, using AdaGrad. Report the error, using the number of epochs given in the script.
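A rough sketch of mini-batch logistic regression trained with AdaGrad is given below. It is illustrative only: the function names, `batch_size`, `seed`, the default `epochs`, the clipping inside `sigmoid`, and the small `eps` guard are assumptions made here rather than details from the provided scripts, and labels are assumed to be 0 or 1.

```python
import numpy as np

def sigmoid(z):
    # Clipping avoids overflow in exp for very large |z| (an added safeguard)
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def logistic_adagrad(X, y, epochs=20, batch_size=32, seed=0, eps=1e-12):
    """Mini-batch SGD for logistic regression with per-weight AdaGrad stepsizes."""
    rng = np.random.default_rng(seed)
    n, num_weights = X.shape
    w = np.zeros(num_weights)
    g_bar = np.zeros(num_weights)          # accumulated squared gradients
    for _ in range(epochs):
        order = rng.permutation(n)         # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(X[idx] @ w)
            g = X[idx].T @ (p - y[idx]) / len(idx)   # gradient of the mean cross-entropy
            g_bar += g ** 2
            w -= g / (np.sqrt(g_bar) + eps)          # eta_j = 1 / sqrt(g_bar_j)
    return w

def predict(X, w):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return (X @ w >= 0).astype(int)
```

The gradient has the same form as in linear regression with the prediction replaced by the sigmoid output, so much of the linear regression training loop carries over.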
(b) [5 marks] Implement a mini-batch stochastic gradient descent approach to polynomial logistic regression. As a hint, consider calling the LogisticRegression algorithm you wrote in Part (a) within PolynomialLogisticRegression, to avoid code duplication and to re-use an already debugged algorithm. Report the error, using the same parameters and stepsize approach as in Part (a).