首页 » 数据库辅导 » 数据库辅导 | DSC 450: Database Processing for Large-Scale Analytics

数据库辅导 | DSC 450: Database Processing for Large-Scale Analytics

DSC 450: Database Processing for Large-Scale Analytics

Take-home Midterm

Don’t forget to include all python code and SQL code in every question.

Part 1

 (8 points) Please give a “True” or a “False” rating to each statement below.

  1. A schema that is in Second Normal Form (2NF) must also be in Third Normal Form (3NF).
  2. A primary key must be composed of exactly one column.
  3. The union rule allows to combine functional dependencies AàCD and BàCD into a functional dependency ABàCD.
  4. SQL query results are always returned sorted by the primary key.
  5. A table in a relational database is a set of tuples.
  6. A UNIQUE constraint on a column allows NULLs to be inserted.

Part 2

  1. (3 points) Write a SQL column definition for PurchaseAmt (just for one column, don’t worry about the rest of the table) that ensures that the purchase amount must be between 0 and 9999.99 (column definition should enforce both the number precision and > 0 requirement).
  2. (3 points) Write a SQL column definition for Street (just for one column, don’t worry about the rest of the table) that ensures that the street address is up to 10 characters and has to start from letter ‘A’ (hint: CHECK constraint can use LIKE operator)
  3. (14 points) The following set of relations records information about university students, courses and assigned grades. The Student relation contains information about the student, including the name (full name is stored in one column), address and their year of graduation. The Course relation records information about courses: course name (primary key), course department and the number of credits provided by the course.  Finally, the Grade relation records information about the grades given; CName is the foreign key referring to the primary key of the Course relation and StudentID is the foreign key referring to the primary key of the Student relation.  The grades are a numeric value given on a 4-point system.

Student(StudentID, Name, Address, GradYear)

Grade(CName, StudentID, CGrade)

Course(CName, Department, Credits)

For each part below, write a single SQL query.

Q1.  Display the list of student IDs and names for the students who graduated in the first 3 years (e.g., if first graduation = 1898, within 3 years should include 1898, 1899, 1900. This is an example; do not use these numbers in your answer).  You can assume that GradYear is an integer.

Q2.  Display student names and their taken course names for all students with the middle name of ‘Muriel’. You may assume that name is always written as ‘First Middle Last’.  Your query output should be sorted from the highest grade to the lowest grade.

Q3.  For students who are either not enrolled in any courses or are enrolled in only 1 course, list those student’s names and graduation years.

Q4.  For each department, return the average length of a student name (you can use LEN function to get the necessary number, i.e. LEN(Name) will give you length of student name) .

 

Q5. Update all student records, to increase the graduation year by 1 for all students who live in Chicago

 

Q6. Modify the course table to add a Chair column that can contain up to 25 characters (that question requires a DDL rather than a DML SQL statement)

Part 3

  1. (8 points)
  • For the table below, fill in the missing values in W column, consistent with functional dependencies: XYàZA ,  AàW. You can make any necessary assumptions, but be sure to state them.
X Y Z W A
1 1 1000 Dog 2
1 2 2000 Cat 3
1 3 3000 Parrot 4
2 1 1000 3
2 2 1000 4
2 3 5000 5

 

  • Given the schema R and the following functional dependencies:

R(X, Y, Z, W, A)  with  XYàZA ,  AàW

does W determine A? (WàA?) Why or why not?

  • Suppose that you were also given relation S:

S(P, Q, U, M, N)

What functional dependencies (if any) can you assume?

  1. (7 points) Consider a TVShows table that keeps track of different TV shows. The table stores the show name and the year to which the entry refers.  Additionally, each row stores channel name, length of the show, the average cost of an episode, and the filming location’s zip, city and state.  Moreover, each entry contains the name of the lead actor and their salary.

The table is already in First Normal Form, and its primary key is (Show, Year).

The schema for the TVShows table is:

(Show, Year, Channel, Length, Cost, Zip, City, State, Lead, Salary)

You are given the following functional dependencies:

Show à Cost

Zip à City, State

Lead à Salary

Show, Year à Channel, Cost

  • Remove any existing partial dependencies to create a set of linked relational schema (copying functional dependencies does not define a schema, be sure to include primary/foreign keys here) in Second Normal Form.

 

  • Remove any existing transitive dependencies to create a set of linked relational schemas in Third Normal Form.

 

  1. (7 points)

Given the schema R and the following functional dependencies:

R(A, B, C, D, E)  with  ABàC,  CàD

  • Describe how to identify a primary key for relation R (the primary key is a minimal set of columns that determines all columns in the relation such that ? à ABCDE )
  • Decompose relation R into a relational schema in third normal form

Part 4

(50 points) 

Create the schema from Part 2-c in SQLite and populate it with at least 7 students, 4 courses, and 14 enrollments (at least one of the students should not be enrolled in any courses and at least one course should have zero current enrollments).

  1. Write python code to export three tables from SQLite into three individual .txt files

Include a screenshot of (any) one output .txt file

  1. Create a view that pre-joins the three tables, including all of the records from student and course tables
  2. Write python code that will use that view to export all data into a single .txt file (that is a “de-normalized” file with redundancy present)

Include a screenshot of the output .txt file

  1. Add a new record to the de-normalized file (you can manually edit the .txt file) from part c) that violates the following functional dependency:

CName à Department, Credits

(you can do so by creating a new record that repeats the course name but does not match either the department or the number of credits associated with this course)

  1. Write python code that will identify values for which functional dependency was violated (hint: when the functional dependency is valid, there is only one unique Department and only one unique Credits for each CName). Keep in mind that functional dependency is violated only in the .txt file, not in the SQLite database.

Execute the following query in parts f, g, and h below.

Q1: For every department, display the average grade (grade should be numeric) and the number of rows.

  1. Use the view from Part 4-b to write Q1 as a SQL query
  2. Use the single .txt file containing de-normalized data to answer Q1 in python (without using SQLite)

Include a screenshot of the output of your code

  1. Use the three .txt files of individual tables to answer Q1 in python (without using SQLite)

Include a screenshot of the output of your code


程序辅导定制C/C++/JAVA/安卓/PYTHON/留学生/PHP/APP开发/MATLAB


本网站支持 Alipay WeChatPay PayPal等支付方式

E-mail: vipdue@outlook.com  微信号:vipnxx


如果您使用手机请先保存二维码,微信识别。如果用电脑,直接掏出手机果断扫描。

blank

发表评论

您的电子邮箱地址不会被公开。