# HW3 – Errorbars and correlation

To complete this homework, you need to download one CSV file, which contains the monthly totals
of the number of new cases of measles, mumps, and chicken pox, respectively, for New York City
during the years 1931-1971 (a total of 41 years). The data file contains 123 rows and 12
columns. Each column represents a month from Jan to Dec. The first 41 rows are the number of
new measles cases in each year during that period, the next 41 rows are for mumps, and the
remaining 41 rows are for chicken pox. Within each disease, the rows are in chronological order by year.

Complete the Python script skeleton to carry out the following tasks. For your
information, the data has been loaded with the Pandas package and reorganized into a NumPy 3D
array of shape (3, 41, 12), where the first dimension represents the three diseases in the order
mentioned above, the second the 41 years, and the third the 12 months. Several other variables
are also defined for your convenience.
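
The reorganization described above can be sketched as follows. This is only an illustration under stated assumptions: the DataFrame is filled with synthetic random counts instead of the real CSV, and the names `df`, `data`, `diseases`, and `years` are hypothetical, not necessarily those used in the provided skeleton.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: 123 rows (3 diseases x 41 years), 12 month columns.
rng = np.random.default_rng(0)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
df = pd.DataFrame(rng.integers(0, 5000, size=(123, 12)), columns=months)

# Rows are stacked disease by disease, so a row-major reshape splits them into
# (disease, year, month) without reordering anything.
data = df.to_numpy().reshape(3, 41, 12)

diseases = ["measles", "mumps", "chicken pox"]  # hypothetical helper variable
years = np.arange(1931, 1972)                   # 1931 through 1971, 41 values

print(data.shape)
```

With this layout, `data[1]` is the 41 x 12 mumps matrix and `data[2, :, 0]` holds the January chicken pox counts for all 41 years.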

Q1 (10 pts). Calculate the mean number of cases per year for each disease, and estimate
the 95% confidence interval of the mean (Lec4.pptx slide #4). Plot the means with errorbars. (Use marker='d',
linestyle='', capsize=5 to show a figure similar to example Figure 1 on the next page.)
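
One possible way to approach Q1 is sketched below. It assumes that "cases per year" means the annual total for each disease, and that the 95% confidence interval is the normal approximation, mean ± 1.96 × standard error. The lecture slide is not reproduced here, so check its formula before relying on this. The data are synthetic and the variable names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the (3, 41, 12) array of monthly counts.
rng = np.random.default_rng(1)
data = rng.poisson(lam=800, size=(3, 41, 12))
diseases = ["measles", "mumps", "chicken pox"]

annual = data.sum(axis=2)        # (3, 41): total cases per disease per year
means = annual.mean(axis=1)      # (3,): mean annual cases per disease

# Standard error of the mean, using the sample standard deviation (ddof=1).
sem = annual.std(axis=1, ddof=1) / np.sqrt(annual.shape[1])
half_width = 1.96 * sem          # assumed normal-approximation 95% CI

fig, ax = plt.subplots()
ax.errorbar(range(3), means, yerr=half_width,
            marker="d", linestyle="", capsize=5)
ax.set_xticks(range(3))
ax.set_xticklabels(diseases)
ax.set_ylabel("Mean cases per year")
# plt.show()  # uncomment when running interactively
```

With only 41 years, a t-based multiplier (about 2.02 for 40 degrees of freedom) would give a slightly wider interval than 1.96.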

Q2 (10 pts). For each disease, calculate the fraction of cases that occurred in each month of the year
during this period. You will end up with a matrix C of size 3 x 12, where each row is for a
disease, and the value in the i-th row and j-th column, Cij, is the total number of cases of disease
i that occurred in month j (over all 41 years), divided by the total number of cases of disease i. (Hint: use
matrix multiplication instead of for loops if you can.) Plot the three row vectors as three lines in one
graph. (See example figure 2.)
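
The matrix-multiplication hint can be illustrated roughly as follows: multiplying by a vector of ones sums along an axis without an explicit loop. This is a sketch on synthetic data with hypothetical variable names, not necessarily the intended solution.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the (3, 41, 12) array of monthly counts.
rng = np.random.default_rng(2)
data = rng.poisson(lam=800, size=(3, 41, 12)).astype(float)
diseases = ["measles", "mumps", "chicken pox"]

# ones(41) @ data sums over the year axis for every disease at once: (3, 12).
monthly_totals = np.ones(41) @ data
# (3, 12) @ ones(12) sums over months: total cases per disease, shape (3,).
disease_totals = monthly_totals @ np.ones(12)

# C[i, j] = cases of disease i in month j / all cases of disease i.
C = monthly_totals / disease_totals[:, None]

fig, ax = plt.subplots()
for row, name in zip(C, diseases):
    ax.plot(range(1, 13), row, label=name)
ax.set_xlabel("Month")
ax.set_ylabel("Fraction of cases")
ax.legend()
# plt.show()  # uncomment when running interactively
```

Each row of `C` sums to 1, which is a quick sanity check on the result.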

Q3.1 (8 pts). Scatter plot the mean monthly mumps cases for each month of the year
during this period against the mean monthly chicken pox cases. In other words, you are
scatter plotting two vectors, x and y, each of which has 12 values, representing the average
number of mumps or chicken pox cases in each of the 12 months, averaged over 41 years. (See
example figure 3.) Annotating the figure with months is optional (lecture2 slides #27).
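
A minimal sketch of the monthly scatter plot with optional month labels is given below. It assumes mumps goes on the x axis and chicken pox on the y axis, uses synthetic seasonal data, and the variable names are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data with a seasonal pattern shared across diseases.
rng = np.random.default_rng(3)
season = 1 + 0.5 * np.cos(2 * np.pi * (np.arange(12) - 3) / 12)
data = rng.poisson(lam=600 * season, size=(3, 41, 12))
month_labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

x = data[1].mean(axis=0)  # mean mumps cases per calendar month, shape (12,)
y = data[2].mean(axis=0)  # mean chicken pox cases per calendar month, shape (12,)

fig, ax = plt.subplots()
ax.scatter(x, y)
# Label each point with its month, offset slightly so text does not cover the marker.
for label, xi, yi in zip(month_labels, x, y):
    ax.annotate(label, (xi, yi), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Mean monthly mumps cases")
ax.set_ylabel("Mean monthly chicken pox cases")
# plt.show()  # uncomment when running interactively
```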

Q3.2 (7 pts). Calculate the Pearson correlation coefficient as well as the Spearman correlation
coefficient between the mean monthly mumps cases and mean monthly chicken pox cases (the
two vectors x and y you calculated in Q3.1), print them out on screen, and display the values (with a
precision of 0.0001) in the upper left corner of the figure (decide the x and y positions ad hoc from your figure).
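
One way to compute and display both coefficients is sketched here with `scipy.stats.pearsonr` and `scipy.stats.spearmanr`. The vectors are synthetic stand-ins for the Q3.1 results. Placing the text in axes-fraction coordinates via `transform=ax.transAxes` is a design choice of this sketch, used in place of hand-picked data coordinates.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr, spearmanr

# Synthetic stand-ins for the two 12-element vectors from Q3.1.
rng = np.random.default_rng(4)
x = rng.normal(600, 100, size=12)
y = 1.5 * x + rng.normal(0, 50, size=12)

# Both functions return (statistic, p-value); only the statistic is needed here.
r_pearson = pearsonr(x, y)[0]
r_spearman = spearmanr(x, y)[0]

# Four decimal places corresponds to a precision of 0.0001.
label = f"Pearson r = {r_pearson:.4f}\nSpearman r = {r_spearman:.4f}"
print(label)

fig, ax = plt.subplots()
ax.scatter(x, y)
# (0.03, 0.97) in axes coordinates is near the upper left corner.
ax.text(0.03, 0.97, label, transform=ax.transAxes, va="top", ha="left")
# plt.show()  # uncomment when running interactively
```

If data coordinates are preferred, `ax.text(x_pos, y_pos, label)` with values read off the figure works the same way.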

Q4.1 (8 pts). Scatter plot the total number of mumps cases in each year against that of chicken
pox cases. In other words, you are scatter plotting two vectors, x and y, each of which has 41
values, representing the total number of mumps or chicken pox cases in year 1931, 1932, etc.
(See example figure 4.)

Q4.2 (7 pts). Calculate the Pearson correlation coefficient as well as the Spearman correlation
coefficient between the annual mumps cases and annual chicken pox cases (the two vectors x
and y you calculated in Q4.1), print them out on screen, and display the values (with a precision of 0.0001)
in a bottom corner of the figure (decide the x and y positions ad hoc from your figure).
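
For the annual totals, the sketch below uses NumPy only, to show that the Spearman coefficient is the Pearson coefficient computed on ranks (with tied values sharing their average rank). The helper `average_ranks` is hypothetical, the data are synthetic, and the lower right corner is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

def average_ranks(v):
    """Rank values from 1 to n, giving tied values their average rank."""
    v = np.asarray(v)
    order = np.argsort(v, kind="stable")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    # Average the provisional ranks within each group of equal values.
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]

# Synthetic stand-in for the (3, 41, 12) array of monthly counts.
rng = np.random.default_rng(5)
data = rng.poisson(lam=800, size=(3, 41, 12))

x = data[1].sum(axis=1)  # annual mumps totals, shape (41,)
y = data[2].sum(axis=1)  # annual chicken pox totals, shape (41,)

r_pearson = np.corrcoef(x, y)[0, 1]
r_spearman = np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

label = f"Pearson r = {r_pearson:.4f}\nSpearman r = {r_spearman:.4f}"
print(label)

fig, ax = plt.subplots()
ax.scatter(x, y)
# (0.97, 0.03) in axes coordinates is near the lower right corner.
ax.text(0.97, 0.03, label, transform=ax.transAxes, va="bottom", ha="right")
# plt.show()  # uncomment when running interactively
```

In practice `scipy.stats.spearmanr` does the ranking internally; the manual version is shown only to make the definition explicit.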

Q5 (10 pts) Calculate and show the correlation matrix between each of the 12 months for the
number of mumps cases. More formally, you have a matrix M of size 41 x 12, where Mij is the
number of mumps cases in year i and month j. You need to calculate a matrix C of size 12 x 12,
where Cij is the correlation between Mi and Mj. Mi is the i-th column of M. Use plt.imshow(C) to
display the matrix, and plt.colorbar() to show the color map. Changing the months from 0-11 to 1-
12 is optional but can be done with xticks and yticks as usual: xticks(range(12), range(1,13)). (See
example Fig 5.)
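
A minimal sketch of the month-by-month correlation matrix follows. It assumes "correlation" means the Pearson correlation and uses `np.corrcoef` with `rowvar=False`, so that the 12 columns of M are treated as the variables. M here is synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the 41 x 12 mumps matrix M (years x months).
rng = np.random.default_rng(6)
M = rng.poisson(lam=600, size=(41, 12)).astype(float)

# rowvar=False: each column (month) is a variable, each row (year) an observation.
C = np.corrcoef(M, rowvar=False)  # shape (12, 12)

plt.imshow(C)
plt.colorbar()
# Relabel the ticks from 0-11 to 1-12.
tick_labels = [str(m) for m in range(1, 13)]
plt.xticks(range(12), tick_labels)
plt.yticks(range(12), tick_labels)
# plt.show()  # uncomment when running interactively
```

The result should be symmetric with ones on the diagonal, since every month is perfectly correlated with itself.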