EE-654 Assignment #3
1) Using Hive, complete the following tasks:
– Create a database on Hive called “movies”
– Create four tables for the MovieLens dataset, one for each of the following files:
u.data, u.item, u.user, and u.genre, and populate each table with the data from its file.
– Write the Hive Query Language (HQL) statements that answer the following 10 questions.
These are the same questions you answered in the previous assignment, where you used MapReduce.
a) Print a list of the 10 movies that received the largest number of ratings.
b) Print a list of the 10 movies that received the largest number of ratings, sorted by the number
c) Print a list of the number of ratings received by each genre.
d) Print the oldest movie with a “5” rating.
e) Print a list of the genres of the 10 most-rated movies.
f) Print the title of the movie that was rated the most by students.
g) Print the list of movies that received the highest number of “5” ratings.
h) Print the list of zip codes corresponding to the highest number of users that rated movies.
i) Find the most rated movie by users in the age group 20 to 25.
j) Print the list of movies that were rated within a specific period of time of your choosing.
2) Using the installation of MySQL on the Ubuntu virtual machine:
– Create a database and the four tables, as specified above
– Write the SQL statements for the 10 questions above.
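
As a rough illustration of the shape such a query can take, here is a minimal sketch for question (a). It is not the expected Hive or MySQL answer: it uses Python's built-in sqlite3 module as a stand-in engine so that it runs anywhere, and the table names (u_data, u_item), column names, and sample rows are all assumptions made up for the example. The real files need to be loaded with the delimiters they actually use.

```python
import sqlite3

# In-memory SQLite stands in for Hive/MySQL; nothing here touches the real dataset.
conn = sqlite3.connect(":memory:")

# Assumed (hypothetical) schemas: only the columns needed for question (a).
conn.executescript("""
CREATE TABLE u_item (movie_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE u_data (user_id INTEGER, movie_id INTEGER, rating INTEGER, ts INTEGER);
""")

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO u_item VALUES (?, ?)",
    [(1, "Movie A"), (2, "Movie B"), (3, "Movie C")],
)
conn.executemany(
    "INSERT INTO u_data VALUES (?, ?, ?, ?)",
    [
        (10, 1, 5, 1000), (11, 1, 4, 1001), (12, 1, 3, 1002),  # Movie A: 3 ratings
        (10, 2, 3, 1003), (11, 2, 2, 1004),                    # Movie B: 2 ratings
        (10, 3, 1, 1005),                                      # Movie C: 1 rating
    ],
)


def top_rated(conn, limit=10):
    """Return (title, number of ratings) for the most-rated movies."""
    query = """
        SELECT i.title, COUNT(*) AS num_ratings
        FROM u_data d
        JOIN u_item i ON i.movie_id = d.movie_id
        GROUP BY i.movie_id, i.title
        ORDER BY num_ratings DESC, i.title
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


top = top_rated(conn)
for title, num_ratings in top:
    print(title, num_ratings)
```

The same join, GROUP BY, ORDER BY, and LIMIT pattern should carry over to HQL and MySQL with little change, but the syntax for creating and loading the tables differs between engines and has to be checked against each one.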