1 PART I [40%]
1.1 Kernel perceptron (Handwritten Digit Classification)
Introduction: In this exercise you will train a classifier to recognize handwritten digits. The task
is quasi-realistic and you will (perhaps) encounter difficulties in working with a moderately large dataset
which you would not encounter on a "toy" problem.
You may already be familiar with the perceptron; this exercise generalizes the perceptron in two ways.
First, we generalize the perceptron to use kernel functions so that we may generate a nonlinear separating
surface, and second, we generalize the perceptron into a majority network of perceptrons so that instead of
separating only two classes we may separate k classes.
Adding a kernel: The kernel allows one to map the data to a higher-dimensional space, as we did with
basis functions, so that the class of functions learned is larger than simply linear functions. We will consider
a single type of kernel, the polynomial K_d(p, q) = (p · q)^d, which is parameterized by a positive integer d
controlling the degree of the polynomial.
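As a concrete illustration, the polynomial kernel is a one-liner (a minimal sketch using NumPy; the function name poly_kernel is my own, not part of the assignment):

```python
import numpy as np

def poly_kernel(p, q, d):
    """Polynomial kernel K_d(p, q) = (p . q)^d for a positive integer degree d."""
    return np.dot(p, q) ** d
```

For example, with p = (1, 2), q = (3, 4) and d = 2, the dot product is 11 and the kernel value is 121.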
Training and testing the kernel perceptron: The algorithm is online, that is, it operates
on a single example (x_t, y_t) at a time. As may be observed from the update equation, a single kernel
function K(x_t, ·) is added for each example, scaled by the term α_t (which may be zero). In online training we
repeatedly cycle through the training set; each cycle is known as an epoch. When the classifier is no
longer changing as we cycle through the training set, we say that it has converged. It may be the case for
some datasets that the classifier never converges, or it may be the case that the classifier will generalize
better if not trained to convergence; for this exercise I leave it to you to decide how many epochs
to train a particular classifier (alternately, you may research and choose a method for converting an online
algorithm to a batch algorithm and use that conversion method). The algorithm given in the table correctly
describes training for a single pass through the data (1st epoch). The algorithm is still correct for multiple
epochs; however, explicit notation is not given. Rather, later epochs (additional passes through the data) are
represented by repeating the dataset with the x_i's renumbered. I.e., suppose we have a 40-element training
set {(x_1, y_1), (x_2, y_2), ..., (x_40, y_40)}; to model additional epochs, simply extend the data by duplication, hence
an m-epoch dataset is

{(x_1, y_1), ..., (x_40, y_40), (x_41, y_41), ..., (x_{m×40}, y_{m×40})},

where x_1 = x_41 = x_81 = ... = x_{(m−1)×40+1}, etc. Testing is performed as follows: once we have trained a
classifier w on the training set, we simply use the trained classifier with only the prediction step for each
example in the test set. It is a mistake whenever the prediction ŷ_t does not match the desired output y_t; thus
the test error is simply the number of mistakes divided by the test set size. Remember that in testing the update
step is never performed.
Generalizing to k classes: Design a method (or research a method) to generalize your two-class classifier
to k classes. The method should return a vector κ ∈ ℝ^k where κ_i is the "confidence" in label i; then you
should predict either with a label that maximizes confidence or alternately with a randomized scheme.
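One common choice (my suggestion, not prescribed by the exercise) is a one-vs-rest scheme: train k binary kernel perceptrons, one per label, where classifier i treats label i as +1 and all other labels as −1, and let κ_i be the real-valued output of classifier i. A minimal sketch, assuming the dual coefficients of each binary classifier are stored as the rows of alpha_matrix:

```python
import numpy as np

def confidences(x, X_train, alpha_matrix, d):
    """One-vs-rest confidences: alpha_matrix has shape (k, n), row i holding
    the dual coefficients of the binary classifier for label i.
    Returns kappa in R^k with kappa[i] = sum_t alpha_matrix[i, t] * K(x_t, x)."""
    k_col = np.array([np.dot(x_t, x) ** d for x_t in X_train])  # kernel column
    return alpha_matrix @ k_col

def predict_label(x, X_train, alpha_matrix, d):
    # predict a label that maximizes confidence
    return int(np.argmax(confidences(x, X_train, alpha_matrix, d)))
```

A randomized alternative would sample the label from, e.g., a softmax over κ rather than taking the argmax.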