1 PART I [40%]
1.1 Kernel perceptron (Handwritten Digit Classification)
Introduction: In this exercise you will train a classifier to recognize handwritten digits. The task is quasi-realistic, and you will (perhaps) encounter difficulties in working with a moderately large dataset that you would not encounter on a “toy” problem.
You may already be familiar with the perceptron. This exercise generalizes the perceptron in two ways: first, we generalize the perceptron to use kernel functions so that we may generate a nonlinear separating surface, and second, we generalize the perceptron into a majority network of perceptrons so that instead of separating only two classes we may separate k classes.
Adding a kernel: The kernel allows one to map the data to a higher-dimensional space, as we did with basis functions, so that the class of functions learned is larger than simply linear functions. We will consider a single type of kernel, the polynomial kernel K_d(p, q) = (p · q)^d, which is parameterized by a positive integer d controlling the degree of the polynomial.
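The polynomial kernel above is a one-line computation; a minimal sketch (the function name `poly_kernel` is illustrative, not part of the assignment):

```python
import numpy as np

def poly_kernel(p, q, d):
    """Polynomial kernel K_d(p, q) = (p . q)^d for a positive integer degree d."""
    return np.dot(p, q) ** d

p = np.array([1.0, 2.0])
q = np.array([3.0, 0.5])
print(poly_kernel(p, q, 1))  # 4.0  (d = 1 is the plain inner product)
print(poly_kernel(p, q, 3))  # 64.0 (equivalent to a degree-3 feature map)
```

Note that for d = 1 the kernel perceptron reduces to the ordinary perceptron, which is a useful sanity check when debugging.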
Training and testing the kernel perceptron: The algorithm is online, that is, it operates on a single example (x_t, y_t) at a time. As may be observed from the update equation, a single kernel function K(x_t, ·), scaled by the term α_t (which may be zero), is added for each example. In online training we repeatedly cycle through the training set; each cycle is known as an epoch. When the classifier no longer changes as we cycle through the training set, we say that it has converged. It may be the case for some datasets that the classifier never converges, or that the classifier will generalize better if not trained to convergence; for this exercise I leave it to you to decide how many epochs to train a particular classifier (alternately, you may research and choose a method for converting an online algorithm to a batch algorithm and use that conversion method). The algorithm given in the table correctly describes training for a single pass through the data (the first epoch). The algorithm is still correct for multiple epochs; however, explicit notation is not given. Rather, later epochs (additional passes through the data) are represented by repeating the dataset with the x_i's renumbered. I.e., suppose we have a 40-element training set {(x_1, y_1), (x_2, y_2), ..., (x_40, y_40)}; to model additional epochs, simply extend the data by duplication, hence an m-epoch dataset is

{(x_1, y_1), ..., (x_40, y_40), (x_41, y_41), ..., (x_{m×40}, y_{m×40})},

where x_1 = x_41 = x_81 = ... = x_{(m−1)×40+1}, etc. Testing is performed as follows: once we have trained a classifier w on the training set, we simply apply the trained classifier, using only the prediction step, to each example in the test set. It is a mistake whenever the prediction ŷ_t does not match the desired output y_t; thus the test error is simply the number of mistakes divided by the test set size. Remember that in testing the update step is never performed.
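The training and testing procedure described above can be sketched as follows. This is a hedged illustration, not the assignment's exact algorithm table: it assumes labels in {−1, +1}, the polynomial kernel, and the common mistake-driven update α_t ← α_t + y_t; function names are my own.

```python
import numpy as np

def train_kernel_perceptron(X, y, d, epochs):
    """Online kernel-perceptron training (labels in {-1, +1}).

    Keeps one dual coefficient alpha[t] per training example; alpha[t]
    stays zero unless example t was mispredicted during training.
    """
    n = len(X)
    K = (X @ X.T) ** d              # Gram matrix, K[i, j] = (x_i . x_j)^d
    alpha = np.zeros(n)
    for _ in range(epochs):         # each full pass over the data is one epoch
        for t in range(n):
            # prediction step: sign of the current kernel expansion at x_t
            y_hat = 1 if alpha @ K[:, t] > 0 else -1
            if y_hat != y[t]:       # update step: add a scaled kernel term on a mistake
                alpha[t] += y[t]
    return alpha

def predict(alpha, X_train, X_new, d):
    """Testing uses only the prediction step; no updates are performed."""
    K = (X_train @ X_new.T) ** d    # K[j, t] = K_d(x_j, x_new_t)
    return np.where(alpha @ K > 0, 1, -1)

# Toy usage: XOR-style data, linearly inseparable but separable with d = 2.
X = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
y = np.array([-1, 1, 1, -1])
alpha = train_kernel_perceptron(X, y, d=2, epochs=3)
test_error = np.mean(predict(alpha, X, X, d=2) != y)  # mistakes / test-set size
```

Precomputing the Gram matrix, as here, trades memory for speed; on the full digits dataset you may prefer to compute kernel values on the fly.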
Generalizing to k classes: Design a method (or research a method) to generalize your two-class classifier to k classes. The method should return a vector κ ∈ ℝ^k where κ_i is the “confidence” in label i; then you should predict either with a label that maximizes confidence or alternately with a randomized scheme.
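One common choice (not the only acceptable one) is one-vs-rest: train k binary kernel perceptrons, one per digit, and let κ_i be the raw score of classifier i. A sketch under that assumption, with illustrative names and an assumed (k, n) layout of dual coefficients:

```python
import numpy as np

def ovr_confidence(alphas, X_train, x, d):
    """Confidence vector kappa in R^k for a single input x.

    alphas : (k, n) array of dual coefficients, one row per binary
             one-vs-rest kernel perceptron (a hypothetical layout;
             any k binary scorers would serve).
    """
    k_col = (X_train @ x) ** d   # kernel values K_d(x_j, x) for every training x_j
    return alphas @ k_col        # one margin-style score per class

def ovr_predict(alphas, X_train, x, d):
    """Predict the label that maximizes confidence (argmax over kappa)."""
    return int(np.argmax(ovr_confidence(alphas, X_train, x, d)))
```

Ties and the alternative randomized prediction scheme (e.g., sampling a label with probability proportional to a normalized κ) are left to your design.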