
**Submission Instructions**

- For the writing part: please submit only a PDF file named [ComputingID]-hw01.pdf. We recommend using LaTeX; a template is available on the course webpage. If you are not familiar with LaTeX, handwritten answers scanned to PDF are also acceptable.

- For the coding part: by default, we use Python. Your submission should be an IPython notebook file named [ComputingID]-hw01.ipynb.

**Questions (20 points)**

**The Bayes Predictor** (4 points) For a binary classification problem, if we know the data distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$ with $\mathcal{Y} = \{+1, -1\}$, we can define the Bayes predictor as

$$
f_{\mathcal{D}}(\mathbf{x}) = \begin{cases} +1 & \text{if } P[y = +1 \mid \mathbf{x}] > \frac{1}{2} \\ -1 & \text{if } P[y = -1 \mid \mathbf{x}] > \frac{1}{2} \end{cases} \tag{1}
$$

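Note that $P[y = +1 \mid \mathbf{x}] + P[y = -1 \mid \mathbf{x}] = 1$. Please show that this is the optimal predictor. In other words, for any predictor $h$, we have

$$
L_{\mathcal{D}}(f_{\mathcal{D}}) \leq L_{\mathcal{D}}(h) \tag{2}
$$

where $L_{\mathcal{D}}(h) = P_{(\mathbf{x}, y) \sim \mathcal{D}}[h(\mathbf{x}) \neq y]$ is the true error of $h$ under the 0-1 loss.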
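You can build some intuition for inequality (2) by enumerating every deterministic predictor on a tiny discrete distribution. This is a sanity check, not a proof. The sketch below uses a made-up distribution over three inputs (the probabilities in `p_x` and `p_pos` are hypothetical). It checks that no predictor has a lower 0-1 error than the Bayes predictor of equation (1).

```python
from itertools import product

# Hypothetical toy distribution over three inputs: P[x] and P[y = +1 | x].
p_x = {"a": 0.5, "b": 0.3, "c": 0.2}
p_pos = {"a": 0.9, "b": 0.4, "c": 0.7}


def true_error(h):
    """0-1 error of a predictor h (dict: input -> label in {+1, -1})."""
    err = 0.0
    for x, px in p_x.items():
        # h(x) is wrong with probability P[y != h(x) | x].
        p_wrong = 1.0 - p_pos[x] if h[x] == +1 else p_pos[x]
        err += px * p_wrong
    return err


# Bayes predictor from equation (1): +1 iff P[y = +1 | x] > 1/2.
bayes = {x: (+1 if p_pos[x] > 0.5 else -1) for x in p_x}

# All 2^3 deterministic predictors on the three inputs.
all_h = [dict(zip(p_x, labels)) for labels in product([+1, -1], repeat=len(p_x))]
best = min(true_error(h) for h in all_h)

print(bayes, true_error(bayes), best)
```

The check works because the comparison is pointwise. At each $\mathbf{x}$, the Bayes predictor errs with probability $\min(P[y=+1 \mid \mathbf{x}], P[y=-1 \mid \mathbf{x}])$, and no other choice of label can beat that.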

**Selection of Hypothesis Spaces** (8 points) In lecture, we discussed how to identify the decision boundary using a mixture of Gaussian distributions. As an exercise, please replace that distribution with the following mixture of Gaussians:

$$
\mathcal{D} = \frac{1}{2}\underbrace{\mathcal{N}(x; 0, 1)}_{y=-1} + \frac{1}{2}\underbrace{\mathcal{N}\left(x; \tfrac{2}{3}\pi, 0.5\right)}_{y=+1} \tag{3}
$$

Please answer the following questions using this new data distribution.

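(a) (2 points) What is the decision boundary $b_{\text{Bayes}}$ of the Bayes predictor? That is, find $b_{\text{Bayes}}$ such that the Bayes predictor can be written as

$$
f_{\mathcal{D}}(x) = \begin{cases} +1 & x > b_{\text{Bayes}} \\ -1 & x < b_{\text{Bayes}} \end{cases} \tag{4}
$$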
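Below is a numerical sketch for locating $b_{\text{Bayes}}$. It rests on two assumptions:

- The second argument of $\mathcal{N}$ in equation (3) is the variance.
- The $y=+1$ component has mean $\tfrac{2}{3}\pi$.

With equal priors, the Bayes predictor switches label where the two class-conditional densities are equal, so the code bisects on their log ratio. Because the variances differ, that log ratio is quadratic in $x$ and has a second root far in the right tail. The search is therefore restricted to the interval between the two means, which is the region the threshold form (4) describes.

```python
import math

# Assumed reading of equation (3): N(x; mean, variance), prior 1/2 per class.
MU_NEG, VAR_NEG = 0.0, 1.0                  # component labelled y = -1
MU_POS, VAR_POS = 2.0 * math.pi / 3.0, 0.5  # component labelled y = +1 (mean assumed 2*pi/3)


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def log_ratio(x):
    """log p(x | y=+1) - log p(x | y=-1); the equal priors cancel."""
    return log_normal_pdf(x, MU_POS, VAR_POS) - log_normal_pdf(x, MU_NEG, VAR_NEG)


def bisect(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)


# Search between the two means, where the prediction flips from -1 to +1.
b_bayes = bisect(log_ratio, MU_NEG, MU_POS)
print(f"b_Bayes ~= {b_bayes:.4f}")
```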

(b) (1 point) What is the true error of the Bayes predictor, $L_{\mathcal{D}}(f_{\mathcal{D}})$?

(c) (2 points) With the following hypothesis space $\mathcal{H}$ and the data distribution in equation (3), please find the best hypothesis $h^* \in \mathcal{H}$ and report the corresponding decision boundary $b^*$.

$$
\mathcal{H} = \left\{ \frac{i}{400} : i \in [1200] \right\} \tag{5}
$$

(d) (1 point) What is the true error of $h^*$, $L_{\mathcal{D}}(h^*)$?
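For parts (b) to (d), the true error of a threshold classifier has a closed form in terms of the Gaussian CDF, so it can be evaluated directly on every element of $\mathcal{H}$. The sketch below makes three assumptions:

- Each $b \in \mathcal{H}$ stands for $h_b(x) = +1$ if $x > b$ and $-1$ otherwise, matching equation (4).
- The second argument of $\mathcal{N}$ in equation (3) is the variance.
- The $y=+1$ mean is $\tfrac{2}{3}\pi$.

```python
import math

# Assumed reading of equation (3): N(x; mean, variance), prior 1/2 per class.
MU_NEG, VAR_NEG = 0.0, 1.0
MU_POS, VAR_POS = 2.0 * math.pi / 3.0, 0.5


def normal_cdf(x, mu, var):
    """CDF of N(mu, var) at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def true_error(b):
    """L_D(h_b) for h_b(x) = +1 if x > b else -1."""
    false_pos = 1.0 - normal_cdf(b, MU_NEG, VAR_NEG)  # y = -1 but x > b
    false_neg = normal_cdf(b, MU_POS, VAR_POS)        # y = +1 but x <= b
    return 0.5 * false_pos + 0.5 * false_neg


# Hypothesis space (5): thresholds i / 400 for i = 1, ..., 1200.
H = [i / 400 for i in range(1, 1201)]
b_star = min(H, key=true_error)
print(f"b* = {b_star}, L_D(h*) ~= {true_error(b_star):.4f}")
```

Evaluating `true_error` at $b_{\text{Bayes}}$ approximates $L_{\mathcal{D}}(f_{\mathcal{D}})$. The only discrepancy comes from the far right tail, where the unequal variances let the $y=-1$ density dominate again, and that region carries negligible probability.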

(e) (1 point) Following a data generation procedure similar to the one in the demo code, sample 100 data points from *each* component and label them accordingly. Then, using the same hypothesis space $\mathcal{H}$ in equation (5) and these 200 training examples, find the hypothesis $h_S$ that minimizes the empirical error and report the corresponding decision boundary $b_S$.

(f) (1 point) What is the true error of $h_S$, $L_{\mathcal{D}}(h_S)$?
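Here is a sketch of parts (e) and (f). The demo code mentioned in part (e) is not reproduced in this document, so the sampler here is a stand-in. It uses NumPy's `default_rng` with an arbitrary seed. It also makes three readings of the problem:

- The second argument of $\mathcal{N}$ is the variance.
- The $y=+1$ mean is $\tfrac{2}{3}\pi$.
- Each $b \in \mathcal{H}$ is the threshold classifier of equation (4).

Results will vary with the seed.

```python
import math

import numpy as np

# Assumed reading of equation (3): N(x; mean, variance), prior 1/2 per class.
MU_NEG, VAR_NEG = 0.0, 1.0
MU_POS, VAR_POS = 2.0 * math.pi / 3.0, 0.5


def normal_cdf(x, mu, var):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def true_error(b):
    """L_D(h_b) for the threshold classifier h_b(x) = +1 if x > b else -1."""
    return 0.5 * (1.0 - normal_cdf(b, MU_NEG, VAR_NEG)) + 0.5 * normal_cdf(b, MU_POS, VAR_POS)


rng = np.random.default_rng(0)  # arbitrary seed; stand-in for the demo code's sampler
n = 100                         # samples per mixture component
x = np.concatenate([
    rng.normal(MU_NEG, math.sqrt(VAR_NEG), n),  # numpy's scale is the standard deviation
    rng.normal(MU_POS, math.sqrt(VAR_POS), n),
])
y = np.concatenate([-np.ones(n), np.ones(n)])

# Empirical error of every threshold in H at once: rows = hypotheses, columns = examples.
H = np.arange(1, 1201) / 400
preds = np.where(x[None, :] > H[:, None], 1.0, -1.0)
emp_err = (preds != y[None, :]).mean(axis=1)

b_S = H[np.argmin(emp_err)]  # np.argmin keeps the smallest b among ties
print(f"b_S = {b_S}, L_S(h_S) = {emp_err.min():.3f}, L_D(h_S) ~= {true_error(b_S):.4f}")
```

Because `np.argmin` returns the first minimizer, ties in empirical error are broken toward the smallest threshold. Other tie-breaking rules are equally valid ERM choices.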

**Perceptron Algorithm** (3 points) Implement the Perceptron algorithm on a simple example.

The data you need for the implementation is in the file data.txt, which is released together with the assignment. Compared to the pseudocode in our lecture, $T$ has been removed from line 3, because in practice we do not know the actual value of $T$. Instead, we can monitor the predictions on all data points and stop the algorithm once the classifier predicts every example correctly.

1: **Input**: $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)\}$

2: Initialize $\mathbf{w}^{(1)} = (0, \ldots, 0)$

3: **for** $t = 1, 2, \cdots$ **do**

4: $\quad i \leftarrow ((t - 1) \bmod m) + 1$

5: $\quad$ **if** $y_i \langle \mathbf{w}^{(t)}, \mathbf{x}_i \rangle \leq 0$ **then**

6: $\quad\quad \mathbf{w}^{(t+1)} \leftarrow \mathbf{w}^{(t)} + y_i \mathbf{x}_i$

7: $\quad$ **end if**

8: **end for**

9: **Output**: the final $\mathbf{w}^{(t)}$
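Below is a minimal NumPy sketch of the pseudocode. It uses 0-based indices, so `i = t % m` visits the examples in order. It implements the stopping rule from the text by counting consecutive correct predictions: once $m$ steps in a row pass without an update, $\mathbf{w}$ classifies every example correctly. The `max_iter` cap is an addition not present in the pseudocode, since the loop would never end on non-separable data. The toy data set is hypothetical.

```python
import numpy as np


def perceptron(X, y, max_iter=100_000):
    """Cyclic Perceptron: visit examples in order, update on mistakes, and stop
    once m consecutive checks succeed (w then classifies every example correctly).
    max_iter is an added safeguard for non-separable data."""
    m, d = X.shape
    w = np.zeros(d)
    streak = 0  # consecutive examples classified correctly by the current w
    for t in range(max_iter):
        i = t % m                        # 0-based index cycling through S
        if y[i] * np.dot(w, X[i]) <= 0:  # mistake (or point on the boundary)
            w = w + y[i] * X[i]
            streak = 0
        else:
            streak += 1
            if streak == m:              # a full pass with no updates
                return w
    raise RuntimeError("no separating w found within max_iter steps")


# Hypothetical separable toy data; data.txt is not reproduced here.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(w, np.sign(X @ w))
```

To load the real data, you could use `data = np.loadtxt("data.txt")` followed by `X, y = data[:, :-1], data[:, -1]`. This assumes data.txt stores one example per row with the label in the last column, so check the actual file first. The pseudocode has no bias term. If the data require one, appending a constant feature of 1 to each $\mathbf{x}_i$ is a common way to add it.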

**Logistic Regression** (2 points) Given a training set $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m)\}$, the loss function of logistic regression is defined as

$$
L(h_{\mathbf{w}}, S) = \frac{1}{m} \sum_{i=1}^{m} \log\left(1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)\right). \tag{6}
$$

Please show that the gradient of $L(h_{\mathbf{w}}, S)$ with respect to $\mathbf{w}$ is

$$
\frac{dL(h_{\mathbf{w}}, S)}{d\mathbf{w}} = \frac{1}{m} \sum_{i=1}^{m} \frac{\exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)}{1 + \exp(-y_i \langle \mathbf{w}, \mathbf{x}_i \rangle)} \left(-y_i \mathbf{x}_i\right) \tag{7}
$$
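Before you write out the derivation, you can check equation (7) numerically against central finite differences of equation (6). The sketch below does this on arbitrary random data that is not part of the assignment. It rewrites the coefficient $\frac{\exp(-z)}{1+\exp(-z)}$ as $\frac{1}{1+\exp(z)}$, an algebraically identical form that `np.logaddexp` can evaluate without overflow.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Equation (6): mean of log(1 + exp(-y_i <w, x_i>)); rows of X are the x_i."""
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))  # log(1 + exp(-z)) without overflow


def logistic_grad(w, X, y):
    """Equation (7), with exp(-z) / (1 + exp(-z)) rewritten as 1 / (1 + exp(z))."""
    margins = y * (X @ w)
    coef = np.exp(-np.logaddexp(0.0, margins))  # = 1 / (1 + exp(z_i))
    return (coef * -y) @ X / len(y)


def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        e = np.zeros_like(w)
        e[j] = eps
        g[j] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # arbitrary inputs
y = rng.choice([-1.0, 1.0], size=50)  # arbitrary labels
w = rng.normal(size=3)

analytic = logistic_grad(w, X, y)
numeric = numerical_grad(lambda v: logistic_loss(v, X, y), w)
print(np.max(np.abs(analytic - numeric)))
```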

**Linear Regression** (3 points) The loss function of linear regression with $\ell_2$ regularization is defined as

$$
L_{\ell_2}(h_{\mathbf{w}}, S) = \sum_{i=1}^{m} \left(h_{\mathbf{w}}(\mathbf{x}_i) - y_i\right)^2 + \lambda \|\mathbf{w}\|_2^2 \tag{8}
$$

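Please show that, for the linear predictor $h_{\mathbf{w}}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle$, the solution of this problem, when $\mathbf{A} + \lambda \mathbf{I}$ is invertible, is

$$
\mathbf{w} = (\mathbf{A} + \lambda \mathbf{I})^{-1} \mathbf{b} \tag{9}
$$

where $\mathbf{I}$ is the identity matrix, and $\mathbf{A}$ and $\mathbf{b}$ are defined as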

$$
\mathbf{A} = \sum_{i=1}^{m} \mathbf{x}_i \mathbf{x}_i^{\top}, \qquad \mathbf{b} = \sum_{i=1}^{m} y_i \mathbf{x}_i \tag{10}
$$

Note that the $\mathbf{x}_i$ are column vectors.
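Here is a quick numerical check of equations (9) and (10), assuming $h_{\mathbf{w}}(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle$ with no bias term. If the $\mathbf{x}_i^{\top}$ are stacked as the rows of a matrix `X`, then $\mathbf{A}$ is `X.T @ X` and $\mathbf{b}$ is `X.T @ y`. The data and $\lambda$ are arbitrary choices for illustration. The check confirms that the gradient of equation (8) vanishes at the closed-form solution.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Equations (9)-(10) with the x_i^T stacked as rows of X: A = X^T X, b = X^T y."""
    d = X.shape[1]
    A = X.T @ X
    b = X.T @ y
    return np.linalg.solve(A + lam * np.eye(d), b)  # avoids forming the inverse explicitly


def ridge_loss(w, X, y, lam):
    """Equation (8) with h_w(x) = <w, x>."""
    r = X @ w - y
    return r @ r + lam * (w @ w)


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))  # arbitrary data for the check
y = rng.normal(size=30)
lam = 0.5

w_hat = ridge_closed_form(X, y, lam)
# Gradient of (8) is 2(A w - b) + 2 lam w; it should vanish at w_hat.
grad = 2.0 * (X.T @ (X @ w_hat - y)) + 2.0 * lam * w_hat
print(np.linalg.norm(grad))
```

The sketch calls `np.linalg.solve` instead of explicitly forming $(\mathbf{A} + \lambda \mathbf{I})^{-1}$, which is usually the numerically preferable choice.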

Please report your answers in the homework submission, and also submit your code with the file name [ComputingID]-hw02.py or [ComputingID]-hw02.ipynb. If you do not submit code, you will receive a 50% deduction on the points you earn for this problem.
