Task 1 (No coding – show your general working):
a) You are building a classifier to determine which walking trail is best suited for a weekend
outing with your friends. You scouted around and gathered data about eleven different
walking trails: the difficulty level (easy, some difficulty or advance), the distance
from Auckland (within, short distance or far), the direction (North, South or West),
whether they can comply with restrictions (none, wheelchair access or flat terrain) and
whether you enjoyed them or not. Using this data, build a decision tree to decide whether
you would enjoy a particular trail or not, showing at each level how you decided which
attribute to expand next.
The data is reported in the following table:
b) What is the training set error of your decision tree (i.e. the fraction of points in the training
set that it misclassified)?
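The error defined above is a simple ratio of misclassified points to all points. A minimal sketch of that calculation follows; the function name `error_rate` and the four example labels are hypothetical and are not taken from the assignment table.

```python
def error_rate(predictions, labels):
    """Fraction of points whose predicted label differs from the true label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


# Hypothetical labels for four trails (NOT the assignment data).
true_labels = ["yes", "no", "yes", "no"]
tree_output = ["yes", "no", "no", "no"]

print(error_rate(tree_output, true_labels))  # 1 of 4 wrong -> 0.25
```

The same ratio applies to any labelled set, so it can be reused for a test set once the tree's predictions are written down.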
c) If you are now given additional data (test set) from several other walking trails, which one
would you go to?
To verify your decision tree's accuracy, you decide to try them all. The results are:
What is the test set error? Is this result ideal? Explain your answer.
Task 2 (coding):
In this task, we will implement a full ML classifier based on decision trees (in Python, using a
Jupyter notebook). We will use the Mushroom Data Set to train and evaluate your
classifier. This dataset comes from the UCI ML repository. (Hint: There are missing values in this
dataset. At this particular time, you may ignore instances that have missing values. Please note
that there are other ways of pre-processing data which we have not seen yet.)
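Ignoring instances with missing values amounts to filtering out incomplete rows before training. The sketch below uses only the standard library and assumes that missing values are written as "?" and that the class label is the first field of each comma-separated row. The three sample rows and the helper name `load_complete_rows` are hypothetical.

```python
import csv
import io

# Hypothetical three-row sample in a comma-separated layout (NOT real data):
# class label first, followed by single-letter attribute codes.
sample = io.StringIO(
    "p,x,s,n\n"
    "e,x,?,y\n"
    "e,b,s,w\n"
)


def load_complete_rows(handle, missing="?"):
    """Return only the rows in which no field equals the missing marker."""
    return [row for row in csv.reader(handle) if row and missing not in row]


rows = load_complete_rows(sample)
print(rows)  # the second row is dropped because it contains "?"
```

In a notebook that already uses Pandas, a similar effect can usually be had by passing `na_values="?"` to `pandas.read_csv` and then calling `dropna()` on the result.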
You can use libraries, e.g., Pandas and NumPy, but you may NOT use any prebuilt decision tree
a. Implement the basic decision tree procedure as discussed in lectures. You will implement a
DecisionTree algorithm with a train procedure. Implement the information gain
criterion as described in our lectures. In your report, use one or two sentences to discuss the
output. You may print out your decision tree (this may be large, so you will need to consider
the best way to show it).
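As a rough illustration of the information gain criterion, the sketch below computes the Shannon entropy of the class labels and the reduction in entropy obtained by splitting on one attribute. It follows the standard textbook definition, which may differ in detail from the lecture version. The function names, the attribute names "odor" and "cap", and the four toy rows are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after the split."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (NOT from the Mushroom Data Set).
rows = [
    {"odor": "foul", "cap": "flat"},
    {"odor": "foul", "cap": "bell"},
    {"odor": "none", "cap": "flat"},
    {"odor": "none", "cap": "bell"},
]
labels = ["p", "p", "e", "e"]

gains = {a: information_gain(rows, labels, a) for a in ("odor", "cap")}
best = max(gains, key=gains.get)
print(gains, best)  # "odor" separates the classes perfectly, "cap" not at all
```

At each node, the attribute with the largest gain is the one to expand next; here that is "odor", with a gain of one bit.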
b. Implement tree depth control as a means of controlling the model’s complexity. The train
procedure you implement takes a parameter stopping_depth. You will use
the stopping_depth parameter to stop further splits of the tree. In your report, use one
or two sentences to discuss the output at stopping levels 2, 3 and 4. You may print out your
c. Implement a test procedure for your DecisionTree algorithm. Describe your test
d. Propose an evaluation strategy, implement it, and describe how you have carried out the
evaluation of your DecisionTree. Please explain your steps, results and performance
measure. You may include a snapshot of your results from your Jupyter notebook.
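One possible evaluation strategy is a simple hold-out split: train on part of the data and measure accuracy on the rest. The sketch below is an illustration under stated assumptions, not a reference solution. The class and the names `train`, `test` and `stopping_depth` follow the wording of this task, while `predict`, the nested-dict tree layout, the 30/10 split and the synthetic data are all hypothetical choices.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def split(rows, labels, attr):
    """Group rows and labels by the value each row takes for attr."""
    parts = {}
    for row, label in zip(rows, labels):
        part = parts.setdefault(row[attr], ([], []))
        part[0].append(row)
        part[1].append(label)
    return parts


def gain(rows, labels, attr):
    parts = split(rows, labels, attr)
    rest = sum(len(ys) / len(labels) * entropy(ys) for _, ys in parts.values())
    return entropy(labels) - rest


class DecisionTree:
    def train(self, rows, labels, attrs, stopping_depth=None):
        self.root = self._grow(rows, labels, list(attrs), stopping_depth, 0)

    def _grow(self, rows, labels, attrs, stopping_depth, depth):
        majority = Counter(labels).most_common(1)[0][0]
        # Stop on a pure node, when no attributes remain, or at the depth limit.
        if len(set(labels)) == 1 or not attrs or depth == stopping_depth:
            return majority
        best = max(attrs, key=lambda a: gain(rows, labels, a))
        rest = [a for a in attrs if a != best]
        children = {
            value: self._grow(r, ys, rest, stopping_depth, depth + 1)
            for value, (r, ys) in split(rows, labels, best).items()
        }
        return {"attr": best, "children": children, "default": majority}

    def predict(self, row):
        node = self.root
        while isinstance(node, dict):
            # Unseen attribute values fall back to the node's majority class.
            node = node["children"].get(row[node["attr"]], node["default"])
        return node

    def test(self, rows, labels):
        """Accuracy: fraction of rows whose prediction matches the label."""
        hits = sum(self.predict(r) == y for r, y in zip(rows, labels))
        return hits / len(labels)


# Synthetic rows (an assumption for illustration): the label depends only on "odor".
data = [{"odor": o, "cap": c} for o in ("foul", "none") for c in ("flat", "bell")] * 10
random.Random(0).shuffle(data)
labels = ["p" if row["odor"] == "foul" else "e" for row in data]
train_rows, test_rows = data[:30], data[30:]
train_labels, test_labels = labels[:30], labels[30:]

tree = DecisionTree()
tree.train(train_rows, train_labels, ["odor", "cap"], stopping_depth=1)
train_acc = tree.test(train_rows, train_labels)
test_acc = tree.test(test_rows, test_labels)
print(train_acc, test_acc)  # 1.0 1.0 on this synthetic data
```

Reporting training and held-out accuracy side by side, and repeating the run for several values of stopping_depth, gives a basic picture of how tree depth affects performance. Cross-validation is a common alternative when the data set is small.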
Task 3 (Reflection – not coding):
a. Discuss what will happen if you decide to change the splitting criterion. Explain what you are
changing it to and how this might change your decision tree.
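One commonly used alternative to information gain is Gini impurity, shown here only as an example of what a changed criterion could look like. The choice of Gini is an assumption and is not prescribed by this task. The function names and the four toy rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_gain(rows, labels, attribute):
    """Reduction in Gini impurity obtained by splitting on the attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted = sum(len(g) / len(labels) * gini(g) for g in groups.values())
    return gini(labels) - weighted


# Hypothetical toy data (NOT from the Mushroom Data Set).
rows = [
    {"odor": "foul", "cap": "flat"},
    {"odor": "foul", "cap": "bell"},
    {"odor": "none", "cap": "flat"},
    {"odor": "none", "cap": "bell"},
]
labels = ["p", "p", "e", "e"]

print(gini_gain(rows, labels, "odor"), gini_gain(rows, labels, "cap"))
```

On data this simple both criteria rank the attributes the same way. On real data they can occasionally pick different attributes at a node, which changes the shape of the tree below it.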
b. Explain whether your evaluation methodology can indicate whether your tree is over- or