This task is a real-world data mining problem. You are required to prepare a set of presentation slides that must include: (1) the full name and student number of each student in the group, and the contribution (in percent) of each group member; (2) your proposed data mining approach and methodology; (3) the strengths and weaknesses of your proposed approach; (4) the performance measures used to evaluate your data mining results; and (5) the results and a brief discussion. Below is the recommended structure of your slides:
- Introduction (define the problem and the goal)
- Methods (propose approaches, and discuss their strengths and weaknesses)
- Results (Figures and tables of data analysis)
- Discussion (discovered knowledge from data mining)
Task: Airplane Model Recognition
Airplane model recognition with data mining algorithms is a challenging problem, for several reasons. Firstly, aircraft designs span a hundred years, including many thousands of different models and hundreds of different makes and airlines.
Secondly, aircraft designs vary significantly depending on the size, destination, purpose, propulsion, and many other factors including technology. Thirdly, any given aircraft model can be re-purposed or used by different companies, which causes further variations in appearance. These, depending on the identification task, may be considered as noise or as useful information to be extracted. Finally, aircraft are largely rigid objects, which simplifies certain aspects of their modeling.
Recently, there has been increasing interest in developing deep learning based prediction models for aircraft recognition due to their powerful feature representation capability. Briefly, deep learning models automatically learn feature descriptors (which can be understood as attributes in data mining) from aircraft images and use them to train classifiers that can distinguish between different airplanes. This task is about training classification models for airplane model recognition. FGVC Airplane is a public airplane recognition dataset that is widely used for the development of airplane recognition models. More details of the dataset can be accessed from https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft.
Some example images of this dataset are shown below.
The feature descriptors of the FGVC Airplane dataset, produced with a deep learning model (a DenseNet 201 model trained on the ImageNet dataset), have been provided to you with this instruction as the “airplane-model-recognition.zip” file. By unzipping it, you will find the following two files:
1) “training.csv” with 6667 feature descriptors extracted from images in the training split of the FGVC Airplane dataset. You should use these descriptors for training.
2) “testing.csv” with 3333 feature descriptors extracted from images in the testing split of the FGVC Airplane dataset. You should use these descriptors for testing.
Both files have the following data format: image_name<>class_name<>feature_descriptor. There are 102 image classes. The goal of this task is to train classification models for airplane model recognition using the provided feature descriptors.
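As an illustration of the record format described above, the following is a minimal Python sketch of how such a file might be read. It relies on assumptions that the task description does not confirm: that each record sits on its own line, that there is no header row, and that the values inside feature_descriptor are separated by commas and/or whitespace. The function names (parse_line, load_descriptors, summarize) are hypothetical helpers introduced only for this example, and the sample record in the usage note is made up.

```python
import re
from collections import Counter


def parse_line(line):
    """Split one record of the form image_name<>class_name<>feature_descriptor."""
    image_name, class_name, descriptor = line.rstrip("\n").split("<>", 2)
    # Assumption: descriptor values are separated by commas and/or whitespace.
    features = [float(v) for v in re.split(r"[,\s]+", descriptor.strip()) if v]
    return image_name, class_name, features


def load_descriptors(path):
    """Read a whole file into parallel lists of names, labels, and vectors."""
    names, labels, vectors = [], [], []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            name, label, features = parse_line(line)
            names.append(name)
            labels.append(label)
            vectors.append(features)
    return names, labels, vectors


def summarize(labels, vectors):
    """Collect a few general properties of a loaded split."""
    counts = Counter(labels)
    return {
        "n_samples": len(labels),
        "n_classes": len(counts),
        "n_features": len(vectors[0]) if vectors else 0,
        "smallest_class": min(counts.values()) if counts else 0,
        "largest_class": max(counts.values()) if counts else 0,
    }
```

For example, parse_line("img_0001.jpg<>Boeing 707<>0.5, 1.25 2") would return the image name, the class name, and a list of three floats. Calling summarize on the output of load_descriptors("training.csv") would give the sample count, class count, descriptor length, and class-size range, which are the kind of general properties a dataset description can report.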
- Familiarize yourself with the FGVC Airplane dataset and the provided training and test sets. Present a general description of the dataset and its general properties.
- You are required to implement three classification methods to predict the classes of the airplanes. You must use the provided training and test sets correctly. You also need to tune the hyperparameters of your classification models in a principled way.
- Discuss any data preprocessing or postprocessing and any attribute selection that has been applied.
- You need to provide the performance measures of your classification results.
- Compare the classification models you have implemented and discuss their advantages and disadvantages.
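One possible reading of "tune in a principled way" and "use the training and test sets correctly" is to select hyperparameters by cross-validation on the training set only, and to touch the test set once for the final performance measures. The Python sketch below illustrates that workflow under stated assumptions: the k-nearest-neighbour classifier is only a stand-in chosen for brevity (the task does not prescribe any method), accuracy and macro-averaged F1 are just two candidate performance measures, the fold assignment is a simple interleaved split without shuffling or stratification, and the toy data is made up. All function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_val_accuracy(xs, ys, k, n_folds=5):
    """Mean validation accuracy of k-NN over n_folds folds of the training data."""
    fold_scores = []
    for fold in range(n_folds):
        # Simplification: every n_folds-th sample forms the validation fold.
        val_idx = set(range(fold, len(xs), n_folds))
        tr_x = [x for i, x in enumerate(xs) if i not in val_idx]
        tr_y = [y for i, y in enumerate(ys) if i not in val_idx]
        va_x = [xs[i] for i in sorted(val_idx)]
        va_y = [ys[i] for i in sorted(val_idx)]
        preds = [knn_predict(tr_x, tr_y, q, k) for q in va_x]
        fold_scores.append(accuracy(va_y, preds))
    return sum(fold_scores) / n_folds


def select_k(xs, ys, candidates, n_folds=5):
    """Pick the candidate k with the best cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(xs, ys, k, n_folds))


# Toy stand-in data (hypothetical): two well-separated 2-D clusters.
train_x = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.0],
           [5.0, 5.0], [5.1, 5.2], [5.2, 5.1], [5.0, 5.3], [5.3, 5.0]]
train_y = ["A"] * 5 + ["B"] * 5
test_x = [[0.1, 0.1], [5.1, 5.1]]
test_y = ["A", "B"]

# Tune on the training set only, then evaluate once on the test set.
best_k = select_k(train_x, train_y, candidates=[1, 3])
test_pred = [knn_predict(train_x, train_y, q, best_k) for q in test_x]
print(best_k, accuracy(test_y, test_pred), macro_f1(test_y, test_pred))
```

The pure-Python implementation is meant only to make the workflow explicit. On the full set of provided descriptors it would be slow, so an optimized library implementation with shuffled, stratified folds would likely be preferable in practice. The same tune-then-test pattern applies to each of the three classification methods, which keeps their reported measures comparable.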