Four modules are ALLOWED: matplotlib, Tkinter, pandas, and requests.
The csv module is NOT allowed, and the .read_csv() method from the pandas library is NOT allowed.
For any other module you might think of using, you MUST ASK on Piazza first, and the instructors will advise accordingly; don’t presume you can use a module not listed above without asking your instructors first.
For this assignment, you’ll select at least 2 sources of data (.csv files) that you will read using Python code. The data in these .csv files must be read into objects of at least 2 classes that you have designed and written for this project.
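A minimal sketch of two such classes, assuming two hypothetical datasets: one with population by neighbourhood and year, and one listing parks with their neighbourhood and area in hectares. The class names and fields are illustrative, not required; the shared neighbourhood name is what would later let the two datasets be related.

```python
class NeighbourhoodPopulation:
    """Population of one neighbourhood in one census year (hypothetical dataset)."""

    def __init__(self, name, year, population):
        """Store the fields; raise ValueError if population is negative."""
        if population < 0:
            raise ValueError("population cannot be negative")
        self.name = name
        self.year = year
        self.population = population

    def __repr__(self):
        """Readable form for debugging printouts."""
        return f"NeighbourhoodPopulation({self.name!r}, {self.year}, {self.population})"


class Park:
    """One park from a (hypothetical) parks dataset."""

    def __init__(self, park_name, neighbourhood, hectares):
        """Store the park's name, the neighbourhood it is in, and its area."""
        self.park_name = park_name
        self.neighbourhood = neighbourhood
        self.hectares = hectares

    def __repr__(self):
        """Readable form for debugging printouts."""
        return f"Park({self.park_name!r}, {self.neighbourhood!r}, {self.hectares})"
```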
Data Sources and Data Cleaning
We recommend that you use 2 .csv files available from the Vancouver Open Data project (https://opendata.vancouver.ca/pages/home), but if you have other online sources of data that you would like to use, you can request permission to use them on Piazza (for example, you might find something you are interested in analyzing at the federal level from Canada’s open data portal, https://open.canada.ca/en/open-data). The only thematic or topic requirement for the data is that it should be related to or derived from a Canadian source, or be about a Canadian industry, topic, place, etc. We encourage you to investigate a topic that you are interested in, possibly even one that relates to work you would like to do (profit or non-profit) or an area of study you’ve previously pursued.
Your program should access the .csv files from the web using the requests module, and should do all of its parsing and data cleaning inside your program. Real-world data, even data provided by government sources, can be quite messy and difficult to deal with, so be sure to choose your datasets early and ask for help often if you have problems with cleaning your data. We are happy to help!
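A minimal sketch of downloading a file with `requests` and parsing it by hand, since the `csv` module and `pandas.read_csv()` are off-limits. The function names are hypothetical; the delimiter is a parameter because some portals export with `;` rather than `,` (check your own file); and this simple parser does not handle line breaks inside quoted fields.

```python
import requests


def fetch_csv_text(url):
    """Download a .csv file and return its text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException as error:
        print(f"Could not download {url}: {error}")
        return None
    return response.text


def split_csv_line(line, delimiter=","):
    """Split one line into fields, keeping delimiters that sit inside double quotes."""
    fields = []
    current = []
    in_quotes = False
    i = 0
    while i < len(line):
        char = line[i]
        if char == '"':
            if in_quotes and i + 1 < len(line) and line[i + 1] == '"':
                current.append('"')  # "" inside quotes is an escaped quote
                i += 1
            else:
                in_quotes = not in_quotes
        elif char == delimiter and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
        i += 1
    fields.append("".join(current))
    return fields


def parse_csv_text(text, delimiter=","):
    """Return (header, rows) where each row is a dict keyed by column name.

    Blank lines are ignored, and rows whose field count does not match the
    header are skipped. Line breaks inside quoted fields are NOT supported.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return [], []
    # Some exports start with a byte-order mark; drop it from the header line.
    header_line = lines[0].lstrip("\ufeff")
    header = [name.strip() for name in split_csv_line(header_line, delimiter)]
    rows = []
    for line in lines[1:]:
        fields = split_csv_line(line, delimiter)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return header, rows
```

For example, `parse_csv_text(fetch_csv_text(url))` would give a header list and a list of row dictionaries, once you have checked that the download did not return `None`.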
Data Analysis Requirement
The next step in the project, after you’ve successfully read and cleaned the data from 2 or more sources into 2 or more classes, is to perform some sort of analysis of the data that you would not be able to do with just one of the datasets. Generally, this will require that you relate the two objects in some way, and often you will create a data structure (a list, a dictionary, etc.) that holds the new information you’ve been able to create by combining the two or more data sources.
You should be able to create some sort of printout of this result; it doesn’t matter how it looks, since this is basically just a way to check that your analysis has worked. You will replace this printout with a visualization in the next step.
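A hypothetical analysis, assuming one dataset gives population per neighbourhood and the other lists parks with their neighbourhood and area: neither file alone gives park area per resident, but joining them on the neighbourhood name does. The classes here are minimal stand-ins, and the result is a dictionary keyed by neighbourhood.

```python
class Population:
    """Minimal stand-in: population of one neighbourhood (hypothetical)."""

    def __init__(self, neighbourhood, population):
        self.neighbourhood = neighbourhood
        self.population = population


class Park:
    """Minimal stand-in: one park and its area in hectares (hypothetical)."""

    def __init__(self, neighbourhood, hectares):
        self.neighbourhood = neighbourhood
        self.hectares = hectares


def park_area_per_1000_residents(populations, parks):
    """Combine both datasets: hectares of park per 1,000 residents, by neighbourhood.

    Neighbourhoods with no parks get 0.0; parks in neighbourhoods missing from
    the population data (or with zero population) are left out.
    """
    park_area = {}
    for park in parks:
        park_area[park.neighbourhood] = park_area.get(park.neighbourhood, 0) + park.hectares
    result = {}
    for record in populations:
        if record.population > 0:
            hectares = park_area.get(record.neighbourhood, 0)
            result[record.neighbourhood] = hectares * 1000 / record.population
    return result


def print_results(results):
    """Plain printout used to check the analysis before any graphing."""
    for neighbourhood, value in sorted(results.items()):
        print(f"{neighbourhood}: {value:.2f} ha per 1,000 residents")
```

Calling `print_results()` on the returned dictionary gives the kind of throwaway printout described above; later, the same dictionary can feed the visualization.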
There’s a requirement for Milestone 1 that you write out the overall design of your project in 3-4 paragraphs; see Milestone 1 description for details.
Data Visualization Requirement
The next goal is to use the matplotlib library to create a visualization of your data analysis. We will provide starter code to help you with this step, but you can also expand it or add elements based on other resources and guides to the matplotlib module that we’ll provide.
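A minimal sketch, assuming the analysis produced a dictionary mapping labels to numbers; the hypothetical `build_bar_chart()` and its sample values are illustrations, not the course starter code. It uses only standard `matplotlib.pyplot` calls and returns the figure and axes so the chart can be checked or extended.

```python
import matplotlib.pyplot as plt


def build_bar_chart(results, title="Park area per 1,000 residents"):
    """Draw a bar chart from a {label: value} dictionary and return (fig, ax)."""
    labels = list(results.keys())
    values = list(results.values())
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(labels, values)  # one bar per dictionary entry, in insertion order
    ax.set_title(title)
    ax.set_ylabel("Hectares per 1,000 residents")
    ax.tick_params(axis="x", labelrotation=45)  # keep long names readable
    fig.tight_layout()
    return fig, ax


if __name__ == "__main__":
    sample = {"A": 2.0, "B": 0.0, "C": 3.5}  # hypothetical values
    build_bar_chart(sample)
    plt.show()
```

Because the bars follow the dictionary's order, sorting or filtering the dictionary before calling the function is enough to change what the user sees.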
The Controller Requirement
The final element we’d like to see in your project is interactivity. At minimum, the user should be able to enter something in the terminal, in response to a prompt, that changes how your data is visualized. This could be selecting years or restricting the data in some other way, but it should change the data structure that you’re graphing by adding or removing elements, or by changing their order (sorting). You can also use an interactive, clickable element in a graphical user interface, but this is an additional, optional element, and I strongly recommend that you get a terminal interaction with the user working first.
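A sketch of a terminal interaction that changes the graphed data structure, assuming the data is stored as a `{year: value}` dictionary; the helper names are hypothetical. Keeping `input()` in its own small function leaves the parsing and filtering logic testable without a user at the keyboard.

```python
def parse_year_range(response):
    """Turn input like '2015-2020' into (2015, 2020); raise ValueError if invalid."""
    parts = response.strip().split("-")
    if len(parts) != 2:
        raise ValueError("expected a range like 2015-2020")
    start, end = int(parts[0]), int(parts[1])  # int() ignores surrounding spaces
    if start > end:
        raise ValueError("start year must not be after end year")
    return start, end


def filter_by_years(values_by_year, start, end):
    """Return a new dict with only the years from start to end inclusive, sorted by year."""
    return {
        year: value
        for year, value in sorted(values_by_year.items())
        if start <= year <= end
    }


def prompt_for_year_range(first_year, last_year):
    """Keep asking in the terminal until the user types a valid year range."""
    while True:
        response = input(
            f"Enter a year range between {first_year} and {last_year} "
            f"(e.g. {first_year}-{last_year}): "
        )
        try:
            return parse_year_range(response)
        except ValueError as error:
            print(f"Invalid input: {error}")
```

For example, `filter_by_years(data, *prompt_for_year_range(2010, 2020))` would return a smaller, sorted dictionary that can be passed straight to your plotting code.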
The Test Suite
All of the functions and/or classes in your model should be tested using a test suite. You should test the ‘happy path’ (the expected way of working) in all your functions/methods. If your functions/methods raise errors, you should test that they raise those errors properly. The exception is file errors: these are difficult to test in a test suite using the techniques we’ve learned so far, so you don’t need to test file errors, although you should handle them.
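A sketch of happy-path tests and an error test, assuming the course uses the standard-library `unittest` framework and a hypothetical cleaning function `clean_count()`. In a real project the function would live in your model module and be imported into the test file; here both sit together so the example is self-contained.

```python
import unittest


def clean_count(text):
    """Convert a raw count field such as ' 1,204 ' to an int; raise ValueError if blank."""
    cleaned = text.strip().replace(",", "")
    if cleaned == "":
        raise ValueError("count field is empty")
    return int(cleaned)


class TestCleanCount(unittest.TestCase):
    def test_clean_count_removes_spaces_and_commas(self):
        self.assertEqual(clean_count(" 1,204 "), 1204)

    def test_clean_count_plain_number(self):
        self.assertEqual(clean_count("7"), 7)

    def test_clean_count_blank_raises_value_error(self):
        with self.assertRaises(ValueError):
            clean_count("   ")
```

Saved in a file such as `test_model.py` (a hypothetical name), these tests run with `python -m unittest test_model`. The descriptive test names stand in for docstrings, as allowed below.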
Additional Considerations and Constraints
While your classes are, by definition, object-oriented, the rest of your design can be object-oriented, but it can also be procedural (functions-based).
You cannot use more than 2 global variables (other than unchanging CONSTANTS) in your code.
The 2 global variables will probably need to be a global window and/or figure for any GUI you use for visualization. You cannot use global data structures – that is, you can’t have a global dictionary, list, tuples, etc. – these will be seen as an attempt to circumvent the 2 global variable limit.
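A small sketch of the difference, using a hypothetical `count_by_area()` helper: an unchanging CONSTANT lives at module level, while the dictionary is created inside a function and returned, and `main()` holds the data and passes it along instead of keeping it in a global.

```python
# An unchanging CONSTANT at module level is allowed.
MISSING_LABEL = "Unknown"


def count_by_area(rows):
    """Count rows per 'area' value; the dict is built locally and returned, not global."""
    counts = {}
    for row in rows:
        area = row.get("area") or MISSING_LABEL  # blank or missing area -> CONSTANT label
        counts[area] = counts.get(area, 0) + 1
    return counts


def main():
    """Create data structures locally and pass them to functions as arguments."""
    rows = [{"area": "A"}, {"area": "B"}, {"area": "A"}]  # stand-in for parsed rows
    counts = count_by_area(rows)
    print(counts)


if __name__ == "__main__":
    main()
```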
You cannot import any modules other than the allowed libraries without permission (ask on Piazza). You can definitely use the regular expression library (re), the math library, the requests library, matplotlib, and numpy and/or pandas; however, there are restrictions on what you can do with numpy and/or pandas: you can create a basic data frame or numpy array from your objects, and run basic mathematical transformations on the data in the data frame, but that’s all. These two libraries are large and complicated, and we don’t want to introduce additional complexity into an already-large project. If you have questions about which numpy and pandas methods are allowed, please ask.
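A sketch of what basic use of pandas might look like under these rules, assuming a hypothetical `Park` class: a DataFrame is built directly from your own objects (not from `read_csv()`), and a new column is computed with simple arithmetic. The conversion factor is approximate, and whether any particular method is acceptable is still a question for your instructors.

```python
import pandas as pd

HECTARES_TO_ACRES = 2.47105  # approximate conversion factor


class Park:
    """Minimal stand-in: one park and its area in hectares (hypothetical)."""

    def __init__(self, name, hectares):
        self.name = name
        self.hectares = hectares


def parks_to_dataframe(parks):
    """Build a basic DataFrame with one row per Park object."""
    return pd.DataFrame(
        {
            "name": [park.name for park in parks],
            "hectares": [park.hectares for park in parks],
        }
    )


def add_acres_column(frame):
    """Return a copy of the frame with an 'acres' column computed from 'hectares'."""
    result = frame.copy()
    result["acres"] = result["hectares"] * HECTARES_TO_ACRES
    return result
```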
You can use any methods in the standard Python library without restriction.
Remember that all your functions and/or methods should still show procedural decomposition; that is, each of them should do one thing and one thing only. This will make them much easier to test, and it will also improve their reusability: you may be able to reuse much of the code that cleans or parses one .csv file to clean or parse the other.
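One way to get that reuse, sketched with hypothetical helper names and fields: small single-purpose cleaning functions, plus one generic `build_objects()` that turns parsed row dictionaries into objects for either file when given a row-to-object function.

```python
class Park:
    """Minimal stand-in: one park and its area in hectares (hypothetical)."""

    def __init__(self, name, hectares):
        self.name = name
        self.hectares = hectares


class Neighbourhood:
    """Minimal stand-in: one neighbourhood and its population (hypothetical)."""

    def __init__(self, name, population):
        self.name = name
        self.population = population


def clean_text(raw):
    """Trim whitespace and collapse internal runs of spaces."""
    return " ".join(raw.split())


def clean_float(raw):
    """Convert a numeric field such as '1,204.5' to float; return None if blank or invalid."""
    try:
        return float(raw.strip().replace(",", ""))
    except ValueError:
        return None


def build_objects(rows, make_object):
    """Apply make_object to each row dict, skipping rows it rejects by returning None."""
    objects = []
    for row in rows:
        item = make_object(row)
        if item is not None:
            objects.append(item)
    return objects


def make_park(row):
    """Create a Park from a parks-file row, or None if the area is unusable."""
    hectares = clean_float(row.get("hectares", ""))
    if hectares is None:
        return None
    return Park(clean_text(row.get("name", "")), hectares)


def make_neighbourhood(row):
    """Create a Neighbourhood from a population-file row, or None if unusable."""
    population = clean_float(row.get("population", ""))
    if population is None:
        return None
    return Neighbourhood(clean_text(row.get("name", "")), int(population))
```

The same `clean_text()`, `clean_float()`, and `build_objects()` serve both files; only the two small `make_...` functions know about each dataset's columns.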
All of your functions or methods must be documented. Your classes should also be documented, and the methods within them should be documented. The exception is the test suite: so long as your tests have sufficiently descriptive names, you do not need to write documentation for them.
Your code should, as always, follow the style guide for the course.
Your code that handles files should be written defensively.
You cannot import static images (e.g. .pngs, .gifs) for your graphs: all data visuals/graphs must be generated using Python code inside your project.
You may re-use code that you have written during this semester and submitted as part of a graded assignment. You may only use code that you have written yourself or that was provided to you as starter code. Any code that you use from previous assignments should be revised and rewritten to conform to our current understanding of good code practices, with functional decomposition, error handling as necessary particularly for files, good docstrings, etc. This rewrite requirement includes any starter code that you use that was provided to you.
Your final project includes a required codewalk. This codewalk will be a 30-minute scheduled online meeting with your professor and possibly TAs or additional faculty members, if invited by the professor. You will register for a 30-minute time slot for your codewalk on a Piazza thread that the professor posts for this purpose.
During this 30 minute meeting, you will share your screen and:
Explain the flow of control through your methods and functions: specifically, start with the methods or functions where the .csv files are accessed and read, and then take us through the process by which the data is cleaned and parsed into at least 2 objects.
Describe your chosen analysis: what you are doing to produce knowledge that wasn’t possible with the 2 separate data sources. Explain to us the functions or methods that perform this analysis, explain the data structure that the analysis results are stored in, and explain why you chose that structure.
Show us the code that allows user interaction with your data, and in particular show us and explain what changes to the data are made as the result of the user input and where in the code this happens. If this code is not yet complete, explain your plans for this code.
Finally, show us the graph or visualization that is produced by your code, and explain how your code is generating that visualization. You don’t need to explain matplotlib, but rather focus on the data structure that you’re using to make the visualization, and how that data structure is being represented in the graph.
The above is an extremely detailed version of what’s commonly called a “codewalk” in industry and academia, and this will be the first of many times you’ll be asked to explain what your code is doing. I will not be grading you on your presentation style, conversational fluency, etc.: this codewalk is just about you demonstrating that you can describe what your code is doing and how it is doing it. You often need to perform codewalks on code that is not fully functional (yet): that’s perfectly normal. This codewalk is your chance to explain your code and ideas to us, even if you haven’t gotten it all working yet.
Before the Codewalk
1. Make sure that your camera works, and that you can share your screen and have your camera on at the same time.
2. All codewalks will be conducted remotely.
3. Think through questions that your audience might ask about your code and practice answering them.
How your codewalk will be evaluated – 100 points
The code that imports the data from the .csv files on the web is clearly identified and explained.
The code that cleans the data is clearly explained: the changes made to the raw data are described, along with how those changes are accomplished in the code. (10 points) At least one problem that was encountered in cleaning the data is explained, as well as how that problem was solved in the code. (10 points)
The code that stores the data into objects is clearly explained, including the overall flow of control in the code. (10 points)
The analysis that is performed on the data is clearly explained, the code that performs this analysis is described, and the data structure that the results are saved in is identified and described. (10 points)
It is explained how this analysis was not possible without combining the two or more data sources.
The input from the user that will be taken is explained. If the code is currently functional, that code is identified and walked through; if not, the future code is described. (10 points)
The data visualization that results from the analysis is described, and the impact that the user input will have on the visualization is described. If the code is currently functional, that code is identified and walked through; if not, the future code is described. (10 points)
The student, to the best of their ability, answers the professor’s questions (10 points)
The codewalk is well-paced, clear, and demonstrates the student’s abilities and understanding of the topic (10 points)
Collaboration and code sources
Copying code from anywhere on the internet is strictly forbidden. Copying code from other students’ homeworks or from lab groups that you did not participate in is strictly forbidden. Collaborating with other students on code is not permitted. You may discuss concepts and challenges in this homework with other students when you are both far, far away from your keyboards and not looking at code.
However, you may not look at other students’ code or show other students your code. You may only discuss the concepts and challenges in this project verbally.
Practice sharing your screen while talking about your code.