Data Mining 101

Use the 10% subset of the KDD Cup 1999 dataset located at… for this exercise.

Create two training sets by selecting samples from this data set, and build a decision tree from each (for example, with J48 in Weka). You can use random sampling or any other selective sampling technique. Compare the resulting decision trees and describe any key differences between them. Comment on why these differences may be occurring by looking at the class distribution in your samples or the size of your training samples.
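
As an illustration of the sampling step, here is a minimal Python sketch that draws one simple random sample and one class-balanced sample, then prints the class distribution of each. It is only one possible approach, not a required method. It assumes each record is a list whose last field is the class label, as in the KDD Cup 1999 CSV file. The function names and the toy rows (including the label "rare_attack.") are made up for the example and stand in for the real data.

```python
import random
from collections import Counter, defaultdict


def class_distribution(rows):
    """Return {label: fraction}, using the last field of each row as the label."""
    counts = Counter(row[-1] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def random_sample(rows, k, seed=0):
    """Simple random sample of k rows, without replacement."""
    return random.Random(seed).sample(rows, k)


def balanced_sample(rows, per_class, seed=0):
    """Selective sampling: take up to per_class rows from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-1]].append(row)
    sample = []
    for members in by_class.values():
        sample.extend(rng.sample(members, min(per_class, len(members))))
    return sample


# Toy stand-in for the real file (hypothetical rows with a skewed class mix).
toy_rows = (
    [[i, "smurf."] for i in range(800)]
    + [[i, "normal."] for i in range(190)]
    + [[i, "rare_attack."] for i in range(10)]
)

set_a = random_sample(toy_rows, 100)   # roughly mirrors the skewed original mix
set_b = balanced_sample(toy_rows, 10)  # equal share for every class

print("Random sample:  ", class_distribution(set_a))
print("Balanced sample:", class_distribution(set_b))
```

A random sample tends to reproduce the skew of the source data, so rare classes may be nearly absent from it, while a balanced sample gives every class equal weight. Differences of this kind are one thing to look at when explaining why the two trees differ.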

You may use other analysis techniques, such as clustering and association rules, to supplement your analysis (although this is not required).

Submit your assignment as a Word document. Please make sure to include snapshots of the decision trees and any other relevant snapshots. You do not need to include snapshots of every intermediate step or analysis.

You can use Weka or any other data mining tool for this assignment.

