1 Non-negative matrix factorization
Setup
First load the dataset and import scikit-learn’s decomposition module:
import math
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn import decomposition

digits = load_digits()
X = digits["data"] / 255.
Y = digits["target"]
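After loading, each row of X is one flattened 8x8 digit image; this is why the basis vectors in the tasks below can be reshaped back into 2D images (the shapes in the comments are properties of scikit-learn's digits dataset):

print(X.shape)                 # (1797, 64): 1797 flattened 8x8 digit images
print(digits["images"].shape)  # (1797, 8, 8): the same images in 2D form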
1.1 Comparison of scikit-learn’s NMF with SVD (6 Points)
Use the decomposition module to compare non-negative matrix factorization (NMF) with singular
value decomposition (SVD, np.linalg.svd) on the digits dataset where the methods factorize X
(the matrix of flattened digit images) in the following way:
X = Z · H   (NMF)   (1)
X = U · S · Vᵀ   (SVD)   (2)

with X, Z, H non-negative: if X ∈ R^{N×D}_{≥0} and the number of latent components is M, then Z ∈ R^{N×M}_{≥0} and H ∈ R^{M×D}_{≥0}. Run SVD with full rank and then select the 6 rows of Vᵀ corresponding to the largest singular values. Use at least 10 components for NMF. Note that you must use centered data for SVD (but not for NMF, of course) and add the mean back to the basis vectors. Reshape the selected basis vectors from H and Vᵀ into 2D images and plot them. One can interpret these images as a basis for the vector space spanned by the digit dataset. Compare the bases resulting from SVD and NMF and comment on interesting observations.
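One possible way to obtain and plot the two bases is sketched below; the side-by-side layout, the choice of max_iter, and plotting only the first 6 NMF components are illustrative choices, not part of the task statement:

# NMF basis: the rows of H
nmf = decomposition.NMF(n_components=10, max_iter=1000)
Z = nmf.fit_transform(X)
H = nmf.components_

# SVD basis: rows of V^T computed on centered data, mean added back for display
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)  # singular values sorted descending

fig, axes = plt.subplots(2, 6, figsize=(12, 4))
for i in range(6):
    axes[0, i].imshow(H[i].reshape(8, 8), cmap="gray")
    axes[0, i].set_title(f"NMF {i}")
    axes[1, i].imshow((Vt[i] + mean).reshape(8, 8), cmap="gray")
    axes[1, i].set_title(f"SVD {i}")
for ax in axes.ravel():
    ax.axis("off")
plt.show()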
1.2 Implementation (8 Points)
We learned in the lecture that the NMF can be found by alternating multiplicative updates of the form

H_{t+1} = H_t ⊙ (Z_tᵀ · X) / (Z_tᵀ · Z_t · H_t),
Z_{t+1} = Z_t ⊙ (X · H_{t+1}ᵀ) / (Z_t · H_{t+1} · H_{t+1}ᵀ),

where ⊙ denotes the element-wise product.
Numerators and denominators of the fractions are matrix multiplications, whereas the divisions and multiplicative updates must be executed element-wise. Implement a function non_negative(data, num_components) that calculates a non-negative matrix factorization with these updates, where num_components is the desired number of features M after decomposition. Initialize Z_0 and H_0 positively, e.g. by taking the absolute value of standard normal random variables drawn with np.random.randn. Iterate until reasonable convergence, e.g. for t = 1000 steps. Note that you might have to ensure numerical stability by avoiding division by zero. You can achieve this by clipping denominators at a small positive value with np.clip. Run your code on the digits data, plot the resulting basis vectors and compare with the NMF results from scikit-learn (the results should be similar). Can you confirm that the squared loss ‖X − Z_t · H_t‖₂² is non-increasing as a function of t?
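A possible implementation sketch; the function signature beyond the required arguments (num_steps, eps, the returned loss history) is an illustrative choice:

def non_negative(data, num_components, num_steps=1000, eps=1e-9):
    """Non-negative matrix factorization X ≈ Z · H via multiplicative updates."""
    X = np.asarray(data, dtype=float)
    N, D = X.shape
    # positive initialization from absolute standard normal values
    Z = np.abs(np.random.randn(N, num_components))
    H = np.abs(np.random.randn(num_components, D))
    losses = []
    for t in range(num_steps):
        # update H; clip the denominator to avoid division by zero
        H *= (Z.T @ X) / np.clip(Z.T @ Z @ H, eps, None)
        # update Z with the already-updated H
        Z *= (X @ H.T) / np.clip(Z @ H @ H.T, eps, None)
        losses.append(np.sum((X - Z @ H) ** 2))
    return Z, H, losses

Calling Z, H, losses = non_negative(X, 10) and plotting losses lets you check that the squared loss is non-increasing; the rows of H can be reshaped to 8x8 images as before.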
2 Recommender system (12 Points)
Use your code to implement a recommendation system. We will use the movielens-100k dataset with pandas, which you can download as an “external link” on MaMPF.
import pandas as pd  # install pandas via conda

# column headers for the dataset
ratings_cols = ['user id', 'movie id', 'rating', 'timestamp']
movies_cols = ['movie id', 'movie title', 'release date',
               'video release date', 'IMDb URL', 'unknown', 'Action',
               'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime',
               'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
               'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
               'War', 'Western']
users_cols = ['user id', 'age', 'gender', 'occupation', 'zip code']

users = pd.read_csv('ml-100k/u.user', sep='|',
                    names=users_cols, encoding='latin-1')
movies = pd.read_csv('ml-100k/u.item', sep='|',
                     names=movies_cols, encoding='latin-1')
ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                      names=ratings_cols, encoding='latin-1')

# peek at the dataframes, if you like :)
users.head()
movies.head()
ratings.head()

# create a joint ratings dataframe for the matrix
fill_value = 0
rat_df = ratings.pivot(index='user id',
                       columns='movie id', values='rating').fillna(fill_value)
rat_df.head()
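As a rough sketch of how the factorization from Exercise 1.2 might be used to recommend movies, using the non_negative function sketched above; the number of components, the chosen user, and the prediction rule are illustrative assumptions, not part of the exercise text:

# factorize the user-movie rating matrix
R = rat_df.values                       # shape: (num_users, num_movies)
Z, H, losses = non_negative(R, num_components=20)
R_pred = Z @ H                          # predicted scores, also for unrated movies

# recommend the top-5 movies a given user has not rated yet
user_idx = 0
unrated = R[user_idx] == fill_value
top_movie_ids = rat_df.columns[unrated][np.argsort(-R_pred[user_idx][unrated])[:5]]
print(movies.set_index('movie id').loc[top_movie_ids, 'movie title'])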