# Python Assignment: Data Analysis Based on Machine Learning

Instruction

CMPSC 448: Machine Learning and AI

Homework 1 (Due 02/14/2021 11:59 PM)

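Problem 1 [20 points] In this problem, you are given two matrices $A, B \in \mathbb{R}^{2 \times 2}$ and a vector $x \in \mathbb{R}^2$,

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad x = \begin{bmatrix} 2 \\ 1 \end{bmatrix},$$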

and asked to answer the following questions about them.

- (a) What is A×B?
- (b) What is x⊤Ax?
- (c) What is x⊤x?
- (d) What is xx⊤?
- (e) What is the projection of x onto the subspace spanned by the columns of A?
- (f) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the function given by $f(z) = z^\top A z$. What is the gradient of $f$ with respect to $z$, i.e. $\nabla_z f(z)$?
- (g) For the function $f$ defined above, what is $\nabla_z^2 f(z)$ (the Hessian of $f$ with respect to the vector $z \in \mathbb{R}^2$)?
- (h) What is the maximizer of $f$ among all vectors with unit Euclidean length, $\|z\|_2 = 1$?
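The operations in parts (a) through (e) map onto a handful of NumPy idioms. The following is a minimal sketch on a hypothetical matrix `M` and vector `v` (assumed values, deliberately not the matrices and vector of this problem). Using least squares for the projection is one possible approach, not necessarily the one the course intends.

```python
import numpy as np

# Hypothetical inputs, chosen for illustration only.
# Both columns of M equal [1, 0], so its column space is the first axis.
M = np.array([[1.0, 1.0],
              [0.0, 0.0]])
v = np.array([1.0, 2.0])

product = M @ M            # matrix-matrix product
inner = v @ v              # v^T v: a scalar
outer = np.outer(v, v)     # v v^T: a 2x2 matrix
quad = v @ M @ v           # quadratic form v^T M v: a scalar

# Orthogonal projection of v onto the column space of M via least squares:
# find c minimizing ||M c - v||_2, then the projection is M c.
# lstsq also handles rank-deficient M (it returns the minimum-norm c).
coeffs, *_ = np.linalg.lstsq(M, v, rcond=None)
projection = M @ coeffs

print(inner, quad)
print(outer)
print(projection)
```

Note that `v @ v` and `np.outer(v, v)` differ only in the order of the transpose, yet one yields a scalar and the other a matrix, which is the distinction parts (c) and (d) are probing.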

Problem 2 [10 points] For this problem, we use the following notation for random variables:

• $X \sim \mathcal{N}(\mu, \sigma^2)$: $X$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$.

• $X \sim \mathrm{Bern}(p)$: $X$ is a $\{0, 1\}$-valued Bernoulli random variable with expectation $p$.

• $\mathbb{E}[X]$: the expected value of the random variable $X$.

- (a) If $X \sim \mathcal{N}(1, 2)$, then what is $\mathbb{E}[X]$? What is $(\mathbb{E}[X])^2 - \mathbb{E}[X^2]$?
- (b) Let $X_1, X_2, \ldots, X_n$ be independent random variables with $X_i \sim \mathrm{Bern}(p)$, $i = 1, 2, \ldots, n$. What is the distribution of $\sum_{i=1}^{n} X_i$?
- (c) Assume the sequence $\{0,0,1,1,0,1,0,1,0,1,1,0,1,1\}$ is drawn independently from $\mathrm{Bern}(p)$ (multiple flips of a biased coin whose probability of heads, $p$, is unknown). What is the maximum likelihood estimator (MLE) of $p$? Please show the detailed steps (and the mathematical derivations you employ).
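A hand derivation for part (c) can be sanity-checked numerically. The sketch below evaluates the Bernoulli log-likelihood on a grid of candidate values for `p` and picks the largest one. The `flips` array is a hypothetical sample, not the sequence from part (c), and the grid resolution is an arbitrary assumption.

```python
import numpy as np

# Hypothetical coin flips (1 = head), not the sequence from part (c).
flips = np.array([1, 0, 0, 1, 1])
heads = flips.sum()
tails = flips.size - heads

# Log-likelihood of i.i.d. Bernoulli(p) data:
#   heads * log(p) + tails * log(1 - p),
# evaluated on a grid strictly inside (0, 1) to avoid log(0).
grid = np.linspace(0.001, 0.999, 999)
log_lik = heads * np.log(grid) + tails * np.log(1.0 - grid)

# Grid point with the highest log-likelihood.
p_numeric = grid[np.argmax(log_lik)]
print(p_numeric)
```

A grid search only approximates the maximizer to the grid spacing, so it complements the required derivation rather than replacing it.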

Problem 3 [5 points] What is the rank of the following matrix and why?

$$\begin{bmatrix} 1 & 2 & 1 \\ 1 & 0 & 3 \\ 1 & 1 & 2 \end{bmatrix}$$

Problem 4 [5 points] Use either numpy.linalg or scipy.linalg to find the eigendecomposition of the following matrix:

$$X = \begin{bmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ -1 & -1 & 1 \end{bmatrix}$$

Problem 5 [5 points] For the function $f(x) = \ln\left(1 + e^{-2x}\right)$, what is its derivative $f'(x) = \frac{df(x)}{dx}$?
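For the eigendecomposition asked for in Problem 4, `numpy.linalg.eig` returns the eigenvalues together with a matrix whose columns are the matching eigenvectors. The sketch below shows the call on a hypothetical 2×2 matrix `M` (an assumed example, not the matrix from Problem 4) and verifies the result by reconstructing `M`.

```python
import numpy as np

# Hypothetical matrix, used only to demonstrate the call.
M = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eigenvalues[i] is paired with the column eigenvectors[:, i].
eigenvalues, eigenvectors = np.linalg.eig(M)

# Verify the decomposition M = V diag(w) V^{-1}.
reconstructed = eigenvectors @ np.diag(eigenvalues) @ np.linalg.inv(eigenvectors)

print(eigenvalues)
print(eigenvectors)
print(np.allclose(reconstructed, M))
```

The order of the returned eigenvalues is not guaranteed to be sorted, and eigenvectors are only determined up to scaling, so comparing against a hand computation is easiest through the reconstruction check.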

Problem 6 [10 points] Let $x \in \mathbb{R}^d$ be a vector in $d$-dimensional space and define the scalar-valued function $f : \mathbb{R}^d \to \mathbb{R}$ by

$$f(x) = \frac{1}{2} x^\top A x + b^\top x,$$

where $A \in \mathbb{R}^{d \times d}$ is a symmetric matrix and $b \in \mathbb{R}^d$ is a fixed vector. Using the definition of the gradient, show that

$$\nabla f(x) = A x + b.$$
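The identity stated in Problem 6 can be checked numerically before attempting the proof. The sketch below compares a central finite-difference gradient against `A @ x + b` for randomly generated inputs. The dimension `d`, the random seed, and the step size `eps` are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # assumed dimension
S = rng.standard_normal((d, d))
A = (S + S.T) / 2.0                    # symmetrize so that A = A^T
b = rng.standard_normal(d)
x = rng.standard_normal(d)

def f(x):
    """Quadratic function f(x) = 0.5 * x^T A x + b^T x."""
    return 0.5 * x @ A @ x + b @ x

# Central differences along each coordinate direction e_i.
eps = 1e-6
numeric_grad = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(d)
])
analytic_grad = A @ x + b

print(np.max(np.abs(numeric_grad - analytic_grad)))
```

Agreement on random inputs is evidence, not a proof. Dropping the symmetrization step is a quick way to see why the symmetry assumption on $A$ matters.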

Problem 7 [5 points]

- (a) What is the maximizer of $g : [-4, 4] \to \mathbb{R}$ given by $g(x) = \frac{1}{2}x^3 - \frac{1}{2}x^2 - 6x + \frac{27}{2}$?
- (b) What is $\int_0^1 g(x)\,dx$ for $g$ defined above?

Exploratory Data Analysis with pandas

Problem 8 [40 points] The goal of this problem is to do basic data analysis on a simple data set using the pandas package in Python (no machine learning for now). As emphasized in the lectures, we need a good understanding of the data before training a machine learning model. In this assignment, you are asked to analyze the UCI Adult data set, a standard machine learning data set that contains demographic information about US residents. The data was extracted from the census bureau database found at: http://www.census.gov/ftp/pub/DES/www/welcome.html. The data set contains 32561 instances and 15 features (please check the notebook for the possible values of each feature) of different types (categorical and continuous).

The data is provided as a CSV file and can be loaded into a pandas DataFrame object as shown: `data = pd.read_csv('adult.data.csv')`
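A first look at a freshly loaded DataFrame usually covers its shape, column types, and first rows. The sketch below does this on a hypothetical three-row stand-in for `adult.data.csv`, read from an in-memory string so that it runs without the data file. The column names follow the features named in this assignment, and the values (including the `<=50K` and `>50K` salary labels) are assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in for adult.data.csv; the values are made up.
csv_text = """age,sex,native-country,hours-per-week,salary
39,Male,United-States,40,<=50K
31,Female,Germany,50,>50K
52,Male,Japan,45,>50K
"""

# With the real file this would be: pd.read_csv('adult.data.csv')
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # numeric vs. object (string) columns
print(data.head())   # first rows of the table
```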

You are asked to answer the following questions about this data set. Please note that you need to use pandas functionality to answer these questions, rather than implementing them in pure Python.

- How many men and women (sex feature) are represented in this data set?
- What is the average age (age feature) of women?
- What is the percentage of German citizens (native-country feature)?
- What are the mean and standard deviation of age for those who earn more than 50K per year (salary feature) and those who earn less than 50K per year?
- Is it true that people who earn more than 50K have at least high school education? (education – Bachelors, Prof-school, Assoc-acdm, Assoc-voc, Masters or Doctorate feature)
- Display age statistics for each race (race feature) and each gender (sex feature).
- What is the maximum number of hours a person works per week (hours-per-week feature)? How many people work such a number of hours, and what is the percentage of those who earn a lot (>50K) among them?
- Compute the average working time (hours-per-week) of those who earn a little and those who earn a lot (salary) for each country (native-country). What are these values for Japan?
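Most of the questions above reduce to a few pandas patterns: `value_counts`, boolean masks, and `groupby` with an aggregation. The sketch below illustrates these patterns on a hypothetical six-row DataFrame `toy`. The rows are made up, the `race` feature is omitted for brevity, and the salary labels `<=50K` and `>50K` are an assumption about how the real file encodes them.

```python
import pandas as pd

# Hypothetical toy rows using a subset of the feature names in this assignment.
toy = pd.DataFrame({
    "age": [25, 40, 35, 50, 30, 45],
    "sex": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "native-country": ["Japan", "Japan", "Germany", "Germany", "Japan", "Germany"],
    "hours-per-week": [40, 50, 30, 60, 40, 60],
    "salary": ["<=50K", ">50K", "<=50K", ">50K", "<=50K", ">50K"],
})

# Counting categories.
sex_counts = toy["sex"].value_counts()

# Boolean mask plus column selection, then an aggregate.
mean_age_women = toy.loc[toy["sex"] == "Female", "age"].mean()

# The mean of a boolean Series is the fraction of True values.
german_share = (toy["native-country"] == "Germany").mean() * 100

# Several aggregates per group.
age_by_salary = toy.groupby("salary")["age"].agg(["mean", "std"])

# Summary statistics per group; pass a list of columns to group by several.
age_stats = toy.groupby("sex")["age"].describe()

# Rows attaining the maximum, and the share of high earners among them.
max_hours = toy["hours-per-week"].max()
at_max = toy[toy["hours-per-week"] == max_hours]
rich_share_at_max = (at_max["salary"] == ">50K").mean() * 100

# Group by two keys; the result is indexed by (country, salary).
hours_by = toy.groupby(["native-country", "salary"])["hours-per-week"].mean()

print(sex_counts)
print(age_by_salary)
print(hours_by.loc["Japan"])
```

Each pattern stays inside pandas, in line with the requirement to avoid pure Python loops.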

To answer these questions, you are provided with a Jupyter notebook that contains the questions. Please complete the notebook with your code to answer them. You are encouraged to install the Anaconda distribution of Python to run the Jupyter notebook, or to use JupyterLab directly, to complete this problem.

Deliverables

This homework comes with a data file adult.data.csv, and a Jupyter notebook. You are asked to:

- Submit a PDF file including the answers to the theory problems (1-7) and the answers to the EDA questions (a snapshot of your code + final answers) via Gradescope.
- Submit the completed Jupyter notebook for the EDA problem via Canvas. Make sure your code runs and include enough detail about it.
