In this assignment, students will preprocess data sets for their application in WEKA, applying data summarization, cleaning, and transformation through attribute-based, instance-based, supervised, and unsupervised filtering.

Focus on the same application system. As stated earlier, you will continue to work on this system as you develop your project. Download the WEKA system and get familiar with its functions. Study the related documentation on the WEKA site.

Use the Vote data set, which should be in ARFF format. In addition, create a data set for your application and convert it to CSV format. Your project data set should be large enough: at least 100 instances and 10 attributes after integration into a common repository, which can then be converted to ARFF.
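The CSV-to-ARFF step and the size requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a required method: the function names `csv_to_arff` and `meets_minimum` are hypothetical, the first CSV row is assumed to be the header, attribute names are assumed to contain no spaces or quotes, and empty cells or `?` are treated as missing. WEKA itself can also open a CSV file and save it as ARFF.

```python
import csv
import io


def is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def read_rows(csv_text):
    """Parse CSV text into rows, skipping blank lines."""
    return [row for row in csv.reader(io.StringIO(csv_text)) if row]


def csv_to_arff(csv_text, relation="project_data"):
    """Convert CSV text (header row first) into ARFF text.

    Assumption: a column is numeric if every non-missing value parses
    as a float; otherwise it is declared nominal with its sorted labels.
    """
    rows = read_rows(csv_text)
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [r[col] for r in data if r[col] not in ("", "?")]
        if values and all(is_number(v) for v in values):
            lines.append("@attribute " + name + " numeric")
        else:
            labels = sorted(set(values))
            lines.append("@attribute " + name + " {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for r in data:
        # ARFF marks a missing value with a question mark.
        lines.append(",".join(v if v != "" else "?" for v in r))
    return "\n".join(lines) + "\n"


def meets_minimum(csv_text, min_instances=100, min_attributes=10):
    """Check the assignment's size requirement on the integrated file."""
    rows = read_rows(csv_text)
    return len(rows) - 1 >= min_instances and len(rows[0]) >= min_attributes
```

Running `meets_minimum` on the integrated CSV before conversion is a quick way to confirm the data set has at least 100 instances and 10 attributes.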

On both of these data sets, execute data preprocessing operations that let you draw basic conclusions and better understand the data. These preprocessing operations should also set the stage for further work, especially on your project data set. Refer to the earlier assignment in which you decided which technique is best suited for your application, and preprocess the data with those goals in mind. You must be creative here in choosing relevant operations and in exploring the different functions provided by the WEKA tool. Execute at least five significant operations on each data set, each with a justification. Please note that all of these operations must pertain to preprocessing, so do not execute actual mining techniques such as clustering.
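As an illustration of what two such operations compute, the sketch below summarizes a nominal data set (missing-value counts and label frequencies per attribute) and then replaces each missing value with the most frequent label of its attribute. It is a minimal sketch, not the assignment's required workflow: the function names are hypothetical, `?` is assumed to mark a missing value as in ARFF, and the rows in the usage note are made-up toy data. Mode replacement is comparable to what WEKA's ReplaceMissingValues filter does for nominal attributes.

```python
from collections import Counter

MISSING = "?"  # assumption: missing values are marked as in ARFF


def summarize(rows, names):
    """Per attribute: number of missing values and label frequencies."""
    summary = {}
    for col, name in enumerate(names):
        present = [r[col] for r in rows if r[col] != MISSING]
        summary[name] = {
            "missing": len(rows) - len(present),
            "counts": Counter(present),
        }
    return summary


def replace_missing_with_mode(rows, names):
    """Return new rows with each missing value replaced by the mode.

    An attribute with no observed values at all is left unchanged.
    """
    summary = summarize(rows, names)
    modes = []
    for name in names:
        counts = summary[name]["counts"]
        modes.append(counts.most_common(1)[0][0] if counts else MISSING)
    return [
        [modes[col] if value == MISSING else value
         for col, value in enumerate(row)]
        for row in rows
    ]
```

For example, with toy attributes `["vote_1", "class"]` and rows `[["y", "democrat"], ["?", "democrat"], ["y", "republican"], ["n", "republican"]]`, `summarize` reports one missing value for `vote_1`, and `replace_missing_with_mode` fills it with `y`. The summary is also a natural source for the justification each operation needs: an attribute with many missing values is a reason to clean before mining.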

Prepare slides that show your work, and bring a printout to hand in during class. The slides should include:

  • The WEKA data set you used, with an explanation based on your understanding of it.
  • Execution of relevant data preprocessing operations on this data set along with a justification of why you chose them.
  • Any useful inferences you can draw from this preprocessing.
  • Your project data set with a description of the relevant data. Please note that you must integrate everything into a common file for mining.
  • Execution of relevant preprocessing operations with justification.
  • Basic conclusions and explanation of how this preprocessing will aid further analysis in your system.