Monday, October 27, 2014

Predictions - Effect of the number of unique target classes on accuracy




When we perform classification-type machine learning, the target variable is a categorical (nominal) variable with a set of unique values, or classes. It could be a simple two-class target variable such as "approve application?" with the classes (values) "yes" and "no". Sometimes the classes indicate ranges, such as "Excellent" and "Good" for a target variable like satisfaction score. We might also convert continuous variables, such as test scores (1 - 100), into classes such as grades (A, B, C, etc.).
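As a small illustration of that last conversion, here is a minimal Python sketch that bins a continuous test score into letter-grade classes. The cutoffs (60, 70, 80, 90) and the `to_grade` helper are assumptions made for illustration; the post does not specify any grading scale.

```python
import bisect

# Assumed lower bounds for D, C, B and A; anything below 60 is an F.
CUTOFFS = [60, 70, 80, 90]
GRADES = ["F", "D", "C", "B", "A"]

def to_grade(score: float) -> str:
    """Map a continuous test score (1 - 100) to a letter-grade class."""
    # bisect_right counts how many cutoffs are <= score, which is the grade index
    return GRADES[bisect.bisect_right(CUTOFFS, score)]

print([to_grade(s) for s in [55, 60, 72, 89.5, 100]])
# prints ['F', 'D', 'C', 'B', 'A']
```

Each cutoff is one class boundary, so a five-grade scale has four places where a small error in the score can flip the class.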

This experiment looks at the effect of the number of unique classes in the target variable on prediction accuracy. The hypothesis is that accuracy goes down as the number of classes increases: each additional class adds a boundary, and each boundary is another chance for a predicted sample to end up on the wrong side of it.
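The boundary argument can be checked with a toy simulation. This is a minimal sketch, not part of the original experiment: it assumes a true value spread uniformly over [0, 1), a prediction equal to that value plus Gaussian noise (standard deviation 0.05, an assumed figure), and k equal-width bins. A sample counts as correct when the true value and the prediction fall in the same bin. The function name `binned_accuracy` is hypothetical.

```python
import random

def binned_accuracy(k, noise=0.05, n=20000, seed=0):
    """Share of samples whose noisy prediction lands in the same equal-width bin as the actual value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        actual = rng.random()                     # actual value, uniform on [0, 1)
        pred = actual + rng.gauss(0.0, noise)     # prediction = actual + assumed Gaussian error
        pred = min(max(pred, 0.0), 1.0 - 1e-9)    # clip so the prediction stays in [0, 1)
        hits += (int(actual * k) == int(pred * k))  # same bin -> correct class
    return hits / n

for k in (2, 4, 8, 16):
    print(k, round(binned_accuracy(k), 3))
```

Because the same seed is reused for every k, each run sees the same (actual, predicted) pairs, and when k doubles the finer bins split the coarser ones. A pair that shares a fine bin therefore also shares the coarse bin, so in this setup accuracy can only stay level or fall as boundaries are added.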

For this experiment, I used a data set of blood pressure levels. Each observation contains the patient's demographics and the actual measured systolic blood pressure. The blood pressure value is then binned into multiple classes (blood pressure ranges), and the blood pressure range is predicted for a varying number of bins (classes). The results are tabulated as follows.


The experiment confirms the hypothesis. Accuracy drops sharply as the number of classes in the target variable increases, although the decline tapers off beyond 8 classes.
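The procedure can be sketched end to end in Python. The post does not say which data set or algorithm was used, so everything below is an assumption made for illustration: the patients are synthetic (age and BMI as hypothetical demographic features, with systolic pressure generated from an assumed linear relationship plus noise), the bins are equal-width over the training range, and a hand-rolled k-nearest-neighbours classifier stands in for whatever model the original experiment used. Its output does not reproduce the results above.

```python
import bisect
import heapq
import random
from collections import Counter

def make_patients(n, rng):
    """Hypothetical synthetic patients as (age, bmi, systolic); not the post's data set."""
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        bmi = rng.uniform(18, 40)
        # Assumed relationship between demographics and systolic pressure, plus noise
        systolic = 90 + 0.5 * age + 0.8 * bmi + rng.gauss(0, 8)
        rows.append((age, bmi, systolic))
    return rows

def bin_edges(values, n_bins):
    """Inner edges of n_bins equal-width bins spanning the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def to_class(value, edges):
    """Class index 0..n_bins-1; values outside the range fall into the end bins."""
    return bisect.bisect_right(edges, value)

def features(age, bmi):
    """Roughly rescale features so both contribute to the distance (assumed scaling)."""
    return (age / 10, bmi / 4)

def knn_predict(train, x, k=15):
    """Majority class among the k training rows nearest to x (squared Euclidean distance)."""
    nearest = heapq.nsmallest(
        k, train, key=lambda row: (row[0][0] - x[0]) ** 2 + (row[0][1] - x[1]) ** 2
    )
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def run_experiment(bin_counts=(2, 4, 8, 16), seed=42):
    """Accuracy of predicting the blood pressure range for each number of bins."""
    rng = random.Random(seed)
    patients = make_patients(800, rng)
    train, test = patients[:600], patients[600:]
    results = {}
    for n_bins in bin_counts:
        # Bin edges come from the training data only
        edges = bin_edges([sbp for _, _, sbp in train], n_bins)
        labeled = [(features(age, bmi), to_class(sbp, edges)) for age, bmi, sbp in train]
        hits = sum(
            knn_predict(labeled, features(age, bmi)) == to_class(sbp, edges)
            for age, bmi, sbp in test
        )
        results[n_bins] = hits / len(test)
    return results

for n_bins, acc in run_experiment().items():
    print(f"{n_bins:>2} classes: accuracy {acc:.2f}")
```

k-nearest neighbours was chosen only because it fits in a few lines of standard-library Python; swapping in another classifier leaves the binning and evaluation loop unchanged.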





