
Machine Learning Assignment Help | Fundamentals of Machine Learning Exercise 8

This is a Python machine learning exercise assignment from a US course.

1 Non-negative matrix factorization

Setup

First load the dataset and import scikit-learn’s decomposition module:

import math
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn import decomposition

digits = load_digits()
X = digits["data"] / 255.
Y = digits["target"]

1.1 Comparison of scikit-learn’s NMF with SVD (6 Points)

Use the decomposition module to compare non-negative matrix factorization (NMF) with singular
value decomposition (SVD, np.linalg.svd) on the digits dataset where the methods factorize X
(the matrix of flattened digit images) in the following way:

X = Z · H        (NMF)  (1)
X = U · S · Vᵀ   (SVD)  (2)

Here X, Z, H are element-wise non-negative: if X ∈ R≥0^(N×D) and the number of latent components is M, then Z ∈ R≥0^(N×M) and H ∈ R≥0^(M×D). Run SVD with full rank and then select the 6 rows of Vᵀ corresponding to the largest singular values. Use at least 10 components for NMF. Note that you must use centered data for SVD (but not for NMF, of course) and add the mean back to the basis vectors. Reshape the selected basis vectors from H and Vᵀ into 2D images and plot them. One can interpret these images as a basis for the vector space spanned by the digit dataset. Compare the bases resulting from SVD and NMF and comment on interesting observations.
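A possible sketch of this comparison, building on the setup code above (the choice of 10 NMF components, the fixed random_state, and max_iter are illustrative, not prescribed by the exercise):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn import decomposition

digits = load_digits()
X = digits["data"] / 255.

# SVD on centered data; add the mean back to the basis vectors.
mean = X.mean(axis=0)
U, S, VT = np.linalg.svd(X - mean, full_matrices=False)
svd_basis = VT[:6] + mean  # rows for the 6 largest singular values

# NMF with (at least) 10 components; take 6 rows of H for plotting.
nmf = decomposition.NMF(n_components=10, init="random",
                        random_state=0, max_iter=500)
Z = nmf.fit_transform(X)
H = nmf.components_
nmf_basis = H[:6]

# Reshape the flattened 64-pixel vectors into 8x8 images for plt.imshow.
svd_images = svd_basis.reshape(-1, 8, 8)
nmf_images = nmf_basis.reshape(-1, 8, 8)
```

The SVD basis vectors can take negative values (they resemble global, wave-like patterns), whereas the NMF basis vectors are non-negative and tend to look like localized pen strokes.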

1.2 Implementation (8 Points)

We learned in the lecture that the NMF can be found by alternating multiplicative updates of the form

    H ← H ⊙ (Zᵀ X) / (Zᵀ Z H)
    Z ← Z ⊙ (X Hᵀ) / (Z H Hᵀ)

Numerators and denominators of the fractions are matrix multiplications, whereas the divisions and multiplicative updates must be executed element-wise. Implement a function non_negative(data, num_components) that calculates a non-negative matrix factorization with these updates, where num_components is the desired number of features M after decomposition. Initialize Z₀ and H₀ positively, e.g. by taking the absolute value of standard normal random variables (RVs) drawn with np.random.randn. Iterate until reasonable convergence, e.g. for t = 1000 steps. Note that you might have to ensure numerical stability by avoiding division by zero. You can achieve this by clipping denominators at a small positive value with np.clip. Run your code on the digits data, plot the resulting basis vectors and compare with the NMF results from scikit-learn (the results should be similar). Can you confirm that the squared loss ‖X − Zₜ · Hₜ‖₂² is non-increasing as a function of t?

2 Recommender system (12 Points)

Use your code to implement a recommendation system. We will use the movielens-100k dataset
with pandas, which you can download as an “external link” on MaMPF.

import pandas as pd  # install pandas via conda

# column headers for the dataset
ratings_cols = ['user id', 'movie id', 'rating', 'timestamp']
movies_cols = ['movie id', 'movie title', 'release date',
               'video release date', 'IMDb URL', 'unknown', 'Action',
               'Adventure', 'Animation', 'Childrens', 'Comedy', 'Crime',
               'Documentary', 'Drama', 'Fantasy', 'Film-Noir', 'Horror',
               'Musical', 'Mystery', 'Romance', 'Sci-Fi', 'Thriller',
               'War', 'Western']

users_cols = ['user id', 'age', 'gender', 'occupation',
              'zip code']
users = pd.read_csv('ml-100k/u.user', sep='|',
                    names=users_cols, encoding='latin-1')

movies = pd.read_csv('ml-100k/u.item', sep='|',
                     names=movies_cols, encoding='latin-1')

ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                      names=ratings_cols, encoding='latin-1')

# peek at the dataframes, if you like :)
users.head()
movies.head()
ratings.head()

# create a joint ratings dataframe for the matrix
fill_value = 0
rat_df = ratings.pivot(index='user id',
                       columns='movie id', values='rating').fillna(fill_value)
rat_df.head()
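One way to proceed is to factorize the ratings matrix and rank a user's unrated movies by their predicted ratings. The sketch below uses a hypothetical toy matrix in place of rat_df.values and scikit-learn's NMF in place of your non_negative from 1.2 (which would be used the same way); the component count is illustrative:

```python
import numpy as np
from sklearn import decomposition

# Toy ratings matrix (users x movies), 0 = unrated; stands in for rat_df.values.
R = np.array([[5, 4, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

# Factorize; your non_negative(R, num_components) would play the same role.
nmf = decomposition.NMF(n_components=2, init="random",
                        random_state=0, max_iter=500)
Z = nmf.fit_transform(R)
H = nmf.components_
R_hat = Z @ H  # predicted ratings, including previously unrated entries

# Recommend: for a given user, pick the unrated movie with the
# highest predicted rating.
user = 0
unrated = R[user] == 0
best = int(np.argmax(np.where(unrated, R_hat[user], -np.inf)))
```

Filling unrated entries with 0 (the fill_value above) is a simplification: the factorization then also tries to reproduce the zeros, which biases predictions downward, and it is worth commenting on this in your answer.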

