Summary: You must submit a 3-page report (using the provided template) and your
implementation code. For submission, zip your report and code into a single file called
The assessment, which is worth 100% of the total grade, is split into two components:
the report and the source code.
The main task is to develop a deep neural network to perform multi-class classification.
Specifically, the dataset for this assignment is the CIFAR-10 image dataset¹, which is also available
from the torchvision library in PyTorch. The dataset is made up of 32×32 RGB images, which
are split into a training set of 50,000 images and a test set of 10,000 images. Each image has
exactly one label from 10 classes: airplane, automobile, bird, cat, deer, dog, frog,
horse, ship, truck. The network is trained to predict the labels using the training set and its
generalisation performance is evaluated using the test set.
TASK: Training a neural network
In this assignment, you need to implement a deep neural network with an input layer, three
hidden layers (convolutional, fully-connected, or a combination of both) with ReLU
non-linear activation, and an output (classification) layer². Feel free to use PyTorch (recommended),
any other deep learning framework with automatic differentiation (JAX, TensorFlow)³,
or plain Python. The neural network should be trained to classify the images from the CIFAR-10
dataset. You can use the built-in modules to load the dataset (e.g. in PyTorch, you can use
torchvision.datasets.CIFAR10) and to build the layers in your model (e.g. in PyTorch you can
use Linear, Dropout, ReLU, Conv2d, BatchNorm2d among others). The training process should
explore the following hyperparameter settings:
• Batch size: Number of examples per training iteration.
• Depth: Try and compare deeper versus shallower models. For example, compare
performance when using two/three/four hidden layers.
• Width: Try using different numbers of hidden nodes and compare the performance. In a
fully connected layer, this corresponds to the hidden layer size. In a convolutional layer,
this corresponds to the number of filters used for convolution.
• Convolutional filter size: Try varying the filter size (also called the kernel size)
in convolutional layers and compare the performance. Try to analyse how the filter size
affects the receptive field of the convolutional layers.
• Dropout: Dropout is an effective strategy to defend against overfitting in the fully
connected layers. Try comparing the performance using different dropout rates.
• Batchnorm: Batch normalisation is typically used in convolutional neural networks to
prevent overfitting and speed up convergence. Compare performance with and without
batch normalisation. Explore jointly with the batch size hyperparameter.
• Max pool: Max pooling is typically used to reduce the spatial dimensions of the
layers (downsampling). Compare its performance against other types of pooling (e.g. average
pooling) and against no pooling.
• Tanh non-linearity: Compare the performance when training with tanh non-linear
activation, with ReLU (this is the main task), and without any non-linearity.
• Optimiser: Try using different optimisers such as SGD, Adam, RMSProp.
• Weights initialisation: Try different weight initialisation strategies, such as He, Xavier,
• Regularisation (weight decay): L2 regularisation can be specified by setting the weight decay
parameter in the optimiser. Try using different regularisation factors and check what effect
this has on the performance.
• Learning rate and learning rate scheduler: The learning rate is the key hyperparameter in model
training; you can gradually decrease the learning rate to further improve your model. Try
using different learning rates and learning rate schedulers to compare the performance.
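For the convolutional filter size item above, the effect on the receptive field can be worked out by hand before running any experiment. The following is a minimal sketch, assuming dilation 1 and a plain stack of convolution and pooling layers; the function name `receptive_field` and the example layer stacks are illustrative and not part of the assignment.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) of the last layer.

    `layers` is a list of (kernel_size, stride) pairs, one per convolution
    or pooling layer, applied in order. Dilation is assumed to be 1.
    """
    rf = 1    # one input pixel sees only itself
    jump = 1  # distance, in input pixels, between adjacent outputs
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Three stride-1 convolutions with different filter sizes.
for k in (3, 5, 7):
    print(k, receptive_field([(k, 1)] * 3))  # 3 -> 7, 5 -> 13, 7 -> 19

# The same 3x3 convolutions, each followed by 2x2 pooling with stride 2.
conv_pool = [(3, 1), (2, 2)] * 3
print(receptive_field(conv_pool))  # 22
```

Under these assumptions, three 3×3 layers cover only a 7×7 patch of the 32×32 input, while adding pooling after each layer widens this to 22×22, which may help explain differences observed between the filter size and pooling experiments.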
You should explore the learning rate and at least three other types of hyperparameters
from those listed above; choose at least 3 different values for each
hyperparameter (where applicable). For simplicity, you could analyse one hyperparameter at a
time (i.e. fixing all others to some reasonable value), rather than performing a grid search.
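The one-at-a-time procedure described above could be organised roughly as follows. This is only a sketch: the choice of hyperparameters, the baseline values and the candidate values are all illustrative assumptions, and `one_at_a_time` is a hypothetical helper name.

```python
# Hypothetical baseline: every value here is an illustrative assumption.
BASELINE = {"lr": 1e-3, "batch_size": 128, "dropout": 0.3, "weight_decay": 5e-4}

# Candidate values per hyperparameter (at least three each).
SEARCH_SPACE = {
    "lr": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 128, 512],
    "dropout": [0.0, 0.3, 0.5],
    "weight_decay": [0.0, 5e-4, 5e-3],
}


def one_at_a_time(baseline, search_space):
    """Yield (name, value, config) where only `name` differs from the baseline."""
    for name, values in search_space.items():
        for value in values:
            config = dict(baseline)  # copy, so the baseline is never mutated
            config[name] = value
            yield name, value, config


runs = list(one_at_a_time(BASELINE, SEARCH_SPACE))
print(len(runs))  # 12 configurations, versus 3**4 = 81 for a full grid
```

Each yielded `config` would then be passed to your own training routine. Note that the baseline configuration appears once per hyperparameter in this sketch, so its result can be trained once and reused.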
You should describe your model selection procedure: 1) state whether you used a single
train-validation split or cross-validation, and how you split the data; and 2) demonstrate your
analysis of model selection based on learning curves: loss curves w.r.t. training epochs/iterations
and accuracy curves on training and validation data. If you use TensorBoard to monitor your
training, you can directly attach screenshots of the training curves in your report.
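As an illustration of the single train-validation split option, the sketch below splits the 50,000 training indices reproducibly. The 90/10 ratio, the seed and the function name `train_val_split` are assumptions made for the example, not requirements of the assignment.

```python
import random


def train_val_split(num_examples, val_fraction=0.1, seed=0):
    """Shuffle indices 0..num_examples-1 and split them into train/val lists."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    num_val = round(num_examples * val_fraction)
    return indices[num_val:], indices[:num_val]


train_idx, val_idx = train_val_split(50_000, val_fraction=0.1)
print(len(train_idx), len(val_idx))  # 45000 5000
```

In PyTorch, index lists like these can be wrapped with `torch.utils.data.Subset` to obtain separate training and validation datasets. Whatever split is used, the test set should stay untouched until the final evaluation.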
To evaluate the performance of the model after hyperparameter selection, you also need
to have an evaluation part (for example a function), which uses (or loads) the trained model
and evaluates its performance on the test set. In your report, please clearly state what
hyperparameters you explored, how you did model selection, and what accuracy
the model achieved on the training, validation and test sets.
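One possible shape for the evaluation part is sketched below, assuming PyTorch. The function name `evaluate`, the stand-in linear model and the random batches are illustrative only; in practice the loader would wrap the CIFAR-10 test set and the model would be your trained network.

```python
import torch
from torch import nn


@torch.no_grad()  # no gradients are needed for evaluation
def evaluate(model, loader, device="cpu"):
    """Return the top-1 accuracy of `model` over all batches in `loader`."""
    model.eval()  # disable dropout and use running batch-norm statistics
    model.to(device)
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        predictions = model(images).argmax(dim=1)
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total


# Stand-in model and random data, only to show how the function is called.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
fake_batches = [
    (torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(2)
]
accuracy = evaluate(model, fake_batches)
print(f"accuracy: {accuracy:.3f}")
```

To evaluate a saved model instead, the weights could first be restored with `model.load_state_dict(torch.load(path))`, where `path` is wherever you chose to save the checkpoint. The same function can be reused on the training and validation loaders to report all three accuracies.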
Let your interest in designing and deploying neural networks be a driving force when
exploring hyperparameter search in this assignment. We value your creativity and critical thinking
when analysing and presenting your findings.