CIS 8392 Topics in Big Data Analytics
1/8 Assignment 3
Step 1. Preprocess the imdb data using the following code
library(keras) # provides dataset_imdb(), pad_sequences(), and %<-%

set.seed(123)
n_sample <- 5000
max_features <- 5000
maxlen <- 300

imdb <- dataset_imdb(num_words = max_features)
c(c(x_train, y_train), c(x_test, y_test)) %<-% imdb # Loads the data

x_train <- pad_sequences(x_train, maxlen = maxlen)
x_test <- pad_sequences(x_test, maxlen = maxlen)

sample_indicators <- sample(1:nrow(x_train), n_sample)
x_train <- x_train[sample_indicators, ] # use a subset of reviews for training
y_train <- y_train[sample_indicators] # use a subset of reviews for training
x_test <- x_test[sample_indicators, ] # use a subset of reviews for testing
y_test <- y_test[sample_indicators] # use a subset of reviews for testing
Step 2. Use
to fit the following models:
1. Simple RNN
4. bidirectional LSTM
5. bidirectional GRU
6. 1D convnet
You can decide the parameters for the network structure (e.g., number of layers, etc.) and model training (e.g., epochs, batch_size, etc.).
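As one illustration of Step 2, the sketch below defines and compiles a bidirectional LSTM classifier with the keras R package. The helper name build_bilstm, the embedding size, the number of units, the optimizer, and the training settings are all assumptions chosen for illustration, not values required by the assignment. The fit call is left commented because it relies on the x_train and y_train objects created in Step 1.

```r
library(keras)

# Hypothetical helper: builds a small bidirectional LSTM for binary
# sentiment classification. All hyperparameters are illustrative assumptions.
build_bilstm <- function(max_features = 5000, embedding_dim = 32, units = 32) {
  model <- keras_model_sequential() %>%
    layer_embedding(input_dim = max_features, output_dim = embedding_dim) %>%
    bidirectional(layer_lstm(units = units)) %>%
    layer_dense(units = 1, activation = "sigmoid")

  # compile() configures the model in place.
  model %>% compile(
    optimizer = "rmsprop",
    loss = "binary_crossentropy",
    metrics = c("accuracy")
  )
  model
}

model_bilstm <- build_bilstm()

# Training relies on x_train and y_train from Step 1:
# history_bilstm <- model_bilstm %>% fit(
#   x_train, y_train,
#   epochs = 5,
#   batch_size = 128,
#   validation_split = 0.2
# )
```

The other recurrent variants can likely follow the same pattern by swapping the middle layer, for example layer_simple_rnn() for a simple RNN or bidirectional(layer_gru()) for a bidirectional GRU. A 1D convnet would typically use layer_conv_1d() followed by a pooling layer instead.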
Step 3. Save the following files
Save each of these fitted models to an h5 file
Save the history of each model to an rds file
Save x_test and y_test to rds files
Step 4. Save the R code you used for steps 1 to 3 to an R file
Step 5. Compress all the output files from step 3 into a zip file
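A minimal sketch of the saving in Step 3, using base R only. The stand-in objects, the output folder name, and the file names are assumptions so that the example runs on its own; in the assignment, the real x_test, y_test, and the history object returned by fit() would take their place.

```r
# Stand-in objects (assumptions) so the sketch runs on its own.
x_test <- matrix(sample(1:100, 20, replace = TRUE), nrow = 4)
y_test <- c(1L, 0L, 1L, 0L)
history_rnn <- list(metrics = list(loss = c(0.69, 0.55), acc = c(0.52, 0.74)))

# Hypothetical output folder, given as a relative path.
out_dir <- "output"
dir.create(out_dir, showWarnings = FALSE)

saveRDS(x_test, file.path(out_dir, "x_test.rds"))
saveRDS(y_test, file.path(out_dir, "y_test.rds"))
saveRDS(history_rnn, file.path(out_dir, "history_rnn.rds"))

# A fitted keras model would be written with save_model_hdf5(), for example:
# save_model_hdf5(model_rnn, file.path(out_dir, "model_rnn.h5"))

# Reading a file back should return an identical object.
y_test_restored <- readRDS(file.path(out_dir, "y_test.rds"))
identical(y_test, y_test_restored)
```

For Step 5, the zip() function in the utils package can bundle the saved files from R, although it depends on a zip program being available on the system. Compressing the folder with the operating system's own tools works just as well.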
Step 6. Use R Markdown to achieve the following:
1. Specify author, date, and title in the YAML metadata of your document
2. Read all the output files from Step 3
3. Use x_test and y_test to show the following statistics:
   Number of reviews in the test set
   Number of positive reviews in the test set
   Number of negative reviews in the test set
4. For each model:
   Show model summary
   Plot the training history
   Evaluate the performance of the model using the test set
5. Summarize the performance of different models using a table. Columns include:
   Overall accuracy of the predictions in the test set
   n_tp: Number of true-positive predictions in the test set
   n_tn: Number of true-negative predictions in the test set
   n_fp: Number of false-positive predictions in the test set
   n_fn: Number of false-negative predictions in the test set
6. Discuss what you found from the table
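The statistics in items 3 and 5 can be sketched in base R as follows. The stand-in labels and probabilities, the helper name summarize_model, the 0.5 threshold, and the column name accuracy are assumptions for illustration; in the report, the probabilities would come from calling predict() on each loaded model with x_test.

```r
# Stand-in data (assumption): true labels and predicted probabilities
# as they might come from predict() on one model.
y_test <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob_rnn <- c(0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3)

# Item 3: basic statistics of the test set (1 = positive, 0 = negative).
n_reviews <- length(y_test)
n_positive <- sum(y_test == 1)
n_negative <- sum(y_test == 0)

# Hypothetical helper: one row of the summary table for one model.
summarize_model <- function(name, y_true, prob, threshold = 0.5) {
  y_pred <- as.integer(prob > threshold)
  data.frame(
    model = name,
    accuracy = mean(y_pred == y_true),
    n_tp = sum(y_pred == 1 & y_true == 1),
    n_tn = sum(y_pred == 0 & y_true == 0),
    n_fp = sum(y_pred == 1 & y_true == 0),
    n_fn = sum(y_pred == 0 & y_true == 1)
  )
}

results <- summarize_model("Simple RNN", y_test, prob_rnn)
results
```

Rows for the other models can be stacked with rbind(), and the combined data frame can then be passed to knitr::kable() to format the table in the report.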
Here are some additional notes about writing an R Markdown report. Violating these rules will lead to a lower grade.
Put the data in the same folder as your Rmd file. Whenever we run/knit an R Markdown file, it uses the folder with the Rmd file as the working directory.
Read the data in your Rmd code chunk using a relative path. If you use an absolute path, I will not be able to knit the Rmd file to an html file on my end.
You will lose 5 points if for any reason (input path, error in code, etc.) the Rmd file cannot be knitted to an html file.
Distinguish headings (## heading) and normal text. We should not put all the text in headings.
Do not print excessive data in your RMarkdown report. Use kable to format tables.
Do not put your discussions/explanations in code chunks. Write them as normal text.
Do not use include=FALSE or echo=FALSE in your code chunks. I need to read your code. You may use message=FALSE, warning=FALSE to suppress messages/warnings.
Do not write an excessively long line of code. Break it into multiple lines to improve readability.
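A few of these notes can be illustrated with the contents of a setup code chunk, sketched below. The folder and file names are hypothetical, and the chunk options shown are one possible configuration rather than a required one.

```r
# Contents of a setup code chunk near the top of the Rmd file.
# Keep code visible, but hide messages and warnings in the knitted report.
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

# Relative path, resolved from the folder that holds the Rmd file.
# The "output" folder and the file name are hypothetical.
y_test_path <- file.path("output", "y_test.rds")

# Reading is left commented because the file exists only after Step 3:
# y_test <- readRDS(y_test_path)
```

Setting the options once with opts_chunk$set() avoids repeating them in every chunk header, and building paths with file.path() keeps them relative to the Rmd folder.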
Step 7. Knit the R Markdown file (.Rmd) to an HTML file
Step 8. The R, Rmd, HTML, and zip files must follow the naming rule below:
For example:
Assignment3-Lin.R
Assignment3-Lin.Rmd
Assignment3-Lin.html
Assignment3-Lin.zip
Step 9. Submit the R, Rmd, html, and zip files (individually) to iCollege
Due by the beginning of next class
Extra credit: the student who has the best report (determined by the instructor) will be given 5 extra points towards the final grade
Submissions that are too similar will not be considered for the extra credit
Accuracy of the models plays a significant role for this extra credit
Grading is based on the submitted files on iCollege, and the submission folder will become unavailable after the deadline. Do not wait until the last minute. You will see a “not authorized” error message if you click submit after the deadline. You will receive 0 points if you submit your assignment via email.
Grading is based on the following:
Whether all required files were submitted to iCollege on time, following the naming rule
Whether the Rmd file is syntactically correct and can render the html file
Whether the report has a professional format and style (succinct yet providing adequate and clear discussion)
Whether the report meets the requirements specified in Step 6