# Homework Assignment 1, CSE 151A: Introduction to Machine Learning

Instructions: Please answer the questions below, attach your code to the document, and insert your figures to create a single PDF file. You may search for information online, but you must write the code and find the solutions yourself.
Grade: out of 100 points

## 1 (10 points) Classification vs. Clustering

In this question, you are provided with several scenarios. You need to identify if the given scenario is better formulated as a classification task or a clustering task. You should also provide the reason that supports your choice.
1. Scenario 1: Assume there are 100 graded answer sheets for a homework assignment (scores range from 0 to 100). We would like to split them into several groups where each group has similar scores.
Reason:
2. Scenario 2: Assume there are 100 graded answer sheets for a homework assignment (scores range from 0 to 100). We would like to split them into several groups where each group represents a letter grade (A, B, C, D) following the criteria: A (90-100), B (75-90), C (60-75), D (0-60).
Reason:

## 2 (40 points) Basic Calculus

### 2.1 (20 points) Derivatives with Scalars

### 2.2 (20 points) Derivatives with Vectors

Several particular vector derivatives are useful for this course. For a matrix $A \in \mathbb{R}^{M \times M}$ and column vectors $x \in \mathbb{R}^{M}$ and $a \in \mathbb{R}^{M}$, we have

The above rules adopt a denominator-layout notation. For more rules, you can refer to this Wikipedia page.
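As a reference point for the denominator-layout convention, two standard identities are sketched below. They are well-known results that use the same symbols ($A$, $x$, $a$) as the text; treat them as an assumption about the kind of rule meant here, not as a quotation of the handout's list.

$$
\frac{\partial\, (a^{\top} x)}{\partial x} = a,
\qquad
\frac{\partial\, (x^{\top} A x)}{\partial x} = (A + A^{\top})\, x
$$

In denominator layout the derivative of a scalar with respect to a column vector $x \in \mathbb{R}^{M}$ is itself a column vector in $\mathbb{R}^{M}$, which is why both right-hand sides have the same shape as $x$.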

Please apply the above rules and calculate the following derivatives:

## 3 (20 points) Metrics

In machine learning, we have many metrics to evaluate the performance of our model. For example, in a binary classification task, there is a dataset $S = \{(x_i, y_i)\},\ i = 1, \dots, N$, where each data point $(x, y)$ contains a feature vector $x \in \mathbb{R}^{M}$ and a ground-truth label $y \in \{0, 1\}$. We have obtained a classifier $f : \mathbb{R}^{M} \to \{0, 1\}$ to predict the label $\hat{y}$ of a feature vector $x$:

$$\hat{y} = f(x)$$

Assume $N = 200$ and we have the following confusion matrix to represent the result of classifier $f$ on dataset $S$:

Please follow the lecture notes to compute the metrics below:

1. Please compute the accuracy of the classifier f on dataset S.
2. Please compute the precision of the classifier f on dataset S.
3. Please compute the F1 score of the classifier f on dataset S.
4. You may find that the accuracy of the current model is very high. Does this mean that the performance of this model is always very good? Why?

Hint: You may refer to other metrics you have computed.
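The metrics above can be sketched from the four cells of a confusion matrix. The snippet below uses the standard textbook definitions, which may differ in notation from the lecture notes. The function name `binary_metrics` and the counts passed to it are hypothetical: they sum to N = 200 but are not the assignment's confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. there is at least one
    predicted positive and at least one actual positive.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # fraction of all predictions that are correct
    precision = tp / (tp + fp)            # fraction of predicted positives that are correct
    recall = tp / (tp + fn)               # fraction of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only (they sum to 200).
acc, prec, rec, f1 = binary_metrics(tp=5, fp=5, fn=15, tn=175)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

With these made-up counts the accuracy is 0.9 while the F1 score is about 0.333, the kind of gap that the last question asks you to think about.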

## 4 (10 points) Data Visualization

We will be using the UCI Wine dataset for this problem and Question 5. The description of the dataset can be found at https://archive.ics.uci.edu/ml/datasets/wine. You can load the dataset using the code below (recommended), or you can download the dataset here and load it yourself. You may refer to the Jupyter notebook HW1-Q4-Q5.ipynb for some skeleton code.

1. Show a scatter plot for the first 2 feature dimensions in 2-D space. Some useful instructions are shown below:
• Import several useful packages into Python:

import matplotlib.pyplot as plt

from sklearn import datasets

• Load Wine dataset into Python:

wine = datasets.load_wine()

X = wine.data

Y = wine.target

Report your code and the scatter plot in your Gradescope submission.
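One possible way to put the loading steps together with a scatter plot is sketched below. Colouring the points by class label, the output file name `wine_scatter.png`, and the non-interactive `Agg` backend are choices made for this sketch, not requirements of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
from sklearn import datasets

wine = datasets.load_wine()
X = wine.data    # feature array, one row per data point
Y = wine.target  # integer class label per data point

fig, ax = plt.subplots()
# Draw one group of points per class so that each class gets its own colour.
for label in sorted(set(Y)):
    mask = Y == label
    ax.scatter(X[mask, 0], X[mask, 1], label=wine.target_names[label])

ax.set_xlabel(wine.feature_names[0])  # name of the first feature
ax.set_ylabel(wine.feature_names[1])  # name of the second feature
ax.legend()
fig.savefig("wine_scatter.png")       # hypothetical file name
```

In a Jupyter notebook you can drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving to a file.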

## 5 (20 points) Data Manipulation

We have already had a glimpse of the Wine dataset in Question 4. In this question, we will still use the Wine dataset. In fact, you can see that the shape of array X is (178, 13) by running X.shape, which means it contains 178 data points and 13 features per data point. You may refer to the Jupyter notebook HW1-Q4-Q5.ipynb for some skeleton code. Here, we will calculate some measures of the array X and perform some basic data manipulation:

1. Show the first 2 features of the first 3 data points (i.e. the first 2 columns and first 3 rows) of array X. (You can print the 3 × 2 array.)
2. Calculate the mean and the variance of the 1st feature (the 1st column) of array X.
3. Randomly sample 3 data points (rows) of array X by randomly choosing the row indices. Show the indices and the sampled data points.

Hint: You may use np.random.randint().

4. Add one more feature (one more column) to the array X after the last feature. The value of the added feature is the constant 1 for all data points. Show the first data point (first row) of the new array.

Hint: You may use np.ones() and np.hstack().

• Get a row or a column of the array X:

print(X[0])  # Print the first row of array X.

print(X[:, 0])  # Print the first column of array X. Here, : means all rows and 0 means column 0.

• Get part of the array:

print(X[3:5, 1:3])  # Print the 4th and 5th rows, 2nd and 3rd columns.

print(X[:3, :2])  # Print the first 3 rows, first 2 columns.

• You may refer to a quick tutorial using NumPy here:

http://cs231n.github.io/python-numpy-tutorial/
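The indexing patterns and hints above can be tried on a small array before applying them to the Wine data. In the sketch below, `X_demo` is a hypothetical 6 × 4 stand-in for X, chosen so the results are easy to check by hand. Note that `np.random.randint()` samples with replacement, so the same row index may appear more than once.

```python
import numpy as np

# Hypothetical stand-in for X: 6 data points (rows) with 4 features (columns).
X_demo = np.arange(24, dtype=float).reshape(6, 4)

# First 3 rows and first 2 columns.
first_block = X_demo[:3, :2]

# Mean and variance of the first column.
# np.var uses the population variance (ddof=0) by default.
col_mean = X_demo[:, 0].mean()
col_var = X_demo[:, 0].var()

# Randomly choose 3 row indices in [0, number of rows), then pick those rows.
indices = np.random.randint(0, X_demo.shape[0], size=3)
sampled = X_demo[indices]

# Append a constant-1 column after the last feature.
ones_column = np.ones((X_demo.shape[0], 1))
X_aug = np.hstack([X_demo, ones_column])

print(first_block)
print(col_mean, col_var)
print(indices)
print(sampled)
print(X_aug[0])
```

If your course defines the variance with the sample (n - 1) denominator, pass `ddof=1` to `var()`; which definition is expected here is not stated in the handout.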

Report your code and the results of the data manipulation in your Gradescope submission.