Machine Learning Bonus Assignment 1


1. The table below provides a training data set containing 6 observations, 3 predictors, and 1 qualitative response variable.

Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors.

(a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.

(b) What is our prediction with K = 1? Why?

(c) What is our prediction with K = 3? Why?

(d) If the Bayes decision boundary in this problem is highly nonlinear, then would we expect the best value for K to be large or small? Why?
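The mechanics behind parts (a) to (c) can be sketched in a few lines of Python. The observations below are hypothetical placeholders, not the assignment's table, and the helper names `euclidean` and `knn_predict` are illustrative only; substitute the six real observations to answer the question.

```python
import math
from collections import Counter

# Hypothetical placeholder data: ((X1, X2, X3), label).
# These are NOT the observations from the assignment's table.
train = [
    ((1.0, 0.0, 0.0), "A"),
    ((0.0, 2.0, 0.0), "B"),
    ((0.0, 0.0, 3.0), "B"),
    ((2.0, 2.0, 2.0), "A"),
]

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    ranked = sorted(train, key=lambda row: euclidean(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # Ties in the vote are broken by whichever label was counted first.
    return votes.most_common(1)[0][0]

query = (0.0, 0.0, 0.0)  # the test point X1 = X2 = X3 = 0
distances = [euclidean(x, query) for x, _ in train]
pred_k1 = knn_predict(train, query, k=1)
pred_k3 = knn_predict(train, query, k=3)
```

With these placeholder points the single nearest neighbor and the three-neighbor vote disagree, which illustrates how the choice of K can change the prediction.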

2. This question should be answered using the Carseats data set.

(a) Fit a multiple regression model to predict Sales using Population, Urban, and US.

(b) Provide an interpretation of each coefficient in the model. Be careful – some of the variables in the model are qualitative!

(c) Write out the model in equation form, being careful to handle the qualitative variables properly.

(d) For which of the predictors can you reject the null hypothesis $H_0: \beta_j = 0$?

(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

(f) How well do the models in (a) and (e) fit the data?

(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

(h) Is there evidence of outliers or high leverage observations in the model from (e)?

3. Suppose we have features $x \in \mathbb{R}^p$ and a two-class response with class sizes $N_1$ and $N_2$ (so $N = N_1 + N_2$), with the target coded as $-N/N_1$ and $N/N_2$.

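Show that the LDA rule classifies to class 2 if

$$x^T \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) > \frac{1}{2} (\hat{\mu}_2 + \hat{\mu}_1)^T \hat{\Sigma}^{-1} (\hat{\mu}_2 - \hat{\mu}_1) - \log(N_2 / N_1),$$

and class 1 otherwise. Here $\hat{\mu}_1$ and $\hat{\mu}_2$ are the class sample means and $\hat{\Sigma}$ is the pooled within-class covariance estimate.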

4. Use the WineQt data to build a logistic regression model (the response is quality). Try different regularization techniques: none, L1, and L2. Report the accuracy and recall on the training and test data. Does regularization improve model performance?

5. Compare the classification performance of LDA and a support vector machine on the MNIST data. In particular, consider only the 2’s and 3’s. Report both the training and test accuracy.

6. Show for the polynomial kernel function

7. Suppose each of the K classes has an associated target $t_k$, which is a vector of all zeros except for a one in the $k$th position. Show that classifying to the largest element of $\hat{y}$ amounts to choosing the closest target, $\min_k \|t_k - \hat{y}\|$, if the elements of $\hat{y}$ sum to one.
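Before attempting the proof, the claim can be checked numerically. The sketch below is a sanity check only, not a derivation; the fitted vectors are hypothetical values chosen to sum to one, and the function names are illustrative.

```python
import math

def one_hot(k, K):
    """Target vector for class k: all zeros except a one in position k."""
    return [1.0 if j == k else 0.0 for j in range(K)]

def argmax_class(y_hat):
    """Index of the largest element of y_hat."""
    return max(range(len(y_hat)), key=lambda k: y_hat[k])

def closest_target_class(y_hat):
    """Index k whose target vector is nearest to y_hat in Euclidean distance."""
    K = len(y_hat)
    return min(range(K), key=lambda k: math.dist(one_hot(k, K), y_hat))

# Hypothetical fitted vectors, each summing to one.
examples = [
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
    [0.1, 0.2, 0.7],
]
agreement = [argmax_class(y) == closest_target_class(y) for y in examples]
```

Every entry of `agreement` should be true for these inputs, which is what the exercise asks you to establish in general.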

8. Show how to solve the generalized eigenvalue problem $\max_a a^T B a$ subject to $a^T W a = 1$ by transforming it to a standard eigenvalue problem. (Assume $B$ and $W$ are symmetric.)
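One possible route is the substitution b = W^{1/2} a, which turns the constraint into b^T b = 1. The NumPy sketch below follows that route under the extra assumption that W is positive definite (the question only states symmetry); the matrices and the function name `max_generalized_rayleigh` are made up for illustration.

```python
import numpy as np

def max_generalized_rayleigh(B, W):
    """Maximize a^T B a subject to a^T W a = 1.

    Assumes B is symmetric and W is symmetric positive definite.
    """
    # W = U diag(w) U^T, hence W^{-1/2} = U diag(w^{-1/2}) U^T.
    w, U = np.linalg.eigh(W)
    W_inv_sqrt = U @ np.diag(1.0 / np.sqrt(w)) @ U.T
    # With b = W^{1/2} a the problem is: max b^T M b subject to b^T b = 1.
    M = W_inv_sqrt @ B @ W_inv_sqrt
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    b = vecs[:, -1]                 # unit eigenvector of the largest eigenvalue
    a = W_inv_sqrt @ b              # map back to the original variable
    return vals[-1], a

# Hypothetical symmetric matrices; W is positive definite.
B = np.array([[2.0, 1.0], [1.0, 3.0]])
W = np.array([[2.0, 0.5], [0.5, 1.0]])
value, a = max_generalized_rayleigh(B, W)
```

The returned `a` satisfies the constraint, and `value` matches the largest eigenvalue of W^{-1} B, which is the same problem stated without the symmetric transformation.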

9. Show that the ridge regression estimates can be obtained by ordinary least squares regression on an augmented data set. We augment the centered matrix $X$ with $p$ additional rows $\sqrt{\lambda} I$, and augment $y$ with $p$ zeros. By introducing artificial data having response value zero, the fitting procedure is forced to shrink the coefficients toward zero.
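The equivalence can be checked numerically before it is proved. The NumPy sketch below uses randomly generated data and an arbitrary penalty value, both hypothetical; it compares the closed-form ridge solution with ordinary least squares on the data augmented by p rows of sqrt(lambda) times the identity. Centering y as well is an assumption made here so that no intercept is needed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 20, 3, 2.5  # hypothetical sizes and penalty

# Synthetic data, centered so the model needs no intercept.
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)
y = rng.normal(size=n)
y = y - y.mean()

# Closed-form ridge estimate: (X^T X + lam I)^{-1} X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Augment X with p rows sqrt(lam) * I and y with p zeros, then run plain OLS.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
y_aug = np.concatenate([y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

same = np.allclose(beta_ridge, beta_aug)
```

The two coefficient vectors agree to numerical precision, since the residual sum of squares on the augmented data equals the original residual sum of squares plus lambda times the squared norm of the coefficients.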
