Calculating Confidence Interval for Classification accuracy

Prologue: I have been practicing various skills and algorithms as part of my road map to becoming a mature data scientist. As part of this expedition I have decided to document everything I go through. So whatever you read in this column will be either a summary of my understanding or a post explaining the details of an experiment.

I had a really hard time understanding and calculating confidence intervals, a statistical technique for evaluating the accuracy of experimental classification models. The books and wikis I have read so far take the time to explain the derived formula for the confidence interval, but they somehow fail to convey the intuition behind it for non-mathematical practitioners. I am writing this post about confidence intervals to fill that gap in intuition.


Say you and your friend have each developed a classification model for the same classification problem given by your teacher.

Your model Ma has an accuracy of 80% when evaluated on a test set containing 100 records.

Your friend's model Mb has an accuracy of 75% when evaluated on a test set containing 1000 records.

The question is how your teacher will decide which model is better. Is Ma a better model than Mb? Although Ma has the higher accuracy, we can't say for certain that it is the better model, because it was tested on a much smaller test set.

So what will the teacher do now? This is where the confidence interval comes into play. With respect to our problem, a confidence interval is a mathematical technique for answering how much confidence we can place in a model's measured accuracy, given the number of records it was tested on.

Consider the table below, which shows the confidence interval for the accuracy of some model Mx over test sets of different sizes. (Calculating the interval is discussed in the latter part of this post.)

Confidence interval

0.584 - 0.919

0.670 - 0.888

0.701 - 0.807

0.789 - 0.811

Referring to the table, you can say with confidence that the accuracy of model Mx lies in the range of roughly 70% to 80% when the number of records is between 200 and 499. The more records the model is tested on, the narrower the range becomes. That's the big picture of confidence intervals.
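The way the range tightens as the test set grows can be sketched in a few lines of Python. This is only an illustration: it uses the Wilson score interval (one common formula for a proportion), and the accuracy of 0.8, the 95% confidence (z = 1.96), and the test-set sizes are all assumptions of mine, not values taken from the post.

```python
import math

def wilson_interval(acc, n, z=1.96):
    """Wilson score interval for an observed accuracy `acc` on `n` records.

    z=1.96 corresponds to 95% confidence (an assumed default).
    """
    centre = 2 * n * acc + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * acc - 4 * n * acc ** 2)
    denom = 2 * (n + z ** 2)
    return (centre - spread) / denom, (centre + spread) / denom

# Hypothetical test-set sizes; the accuracy of 0.8 is also an assumption.
for n in (20, 50, 5000):
    low, high = wilson_interval(0.8, n)
    print(f"N={n:5d}  interval = {low:.3f} - {high:.3f}")
```

With these assumed inputs, N = 20 gives roughly 0.584 to 0.919, and the interval narrows steadily as N grows.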

Diving deeper into confidence intervals.

Now imagine you have asked a statistician to find the mean height of all students in your state.

As a statistician, he would first take a sample of students from your state and find the sample mean height (x̄). Then, using the sample mean (x̄), he would estimate the mean height (µ) of the population itself (here, population refers to all students in your state). This is much simpler and avoids wasting the time and resources needed to measure every student in the state.

But how can one be sure that the estimated mean height of all students is accurate? The estimate is based on the value of the sample mean, which means that a different sample might give a noticeably different estimate. Thus there is always a margin of error in the calculation. It is for this reason that the confidence interval was devised: it expresses the accuracy of a statistical estimate as a range.
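To see how the sample mean moves from one sample to the next, here is a small simulation. Everything in it is made up for illustration: the population of 100,000 heights, the mean of 165 cm, the spread of 10 cm, and the sample size of 50 are all assumptions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 student heights in cm (made-up numbers).
population = [random.gauss(165, 10) for _ in range(100_000)]
population_mean = statistics.mean(population)

# Draw several independent samples of 50 students and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:  {m:.2f}  (error {m - population_mean:+.2f})")
```

Each sample gives a slightly different estimate of the same population mean, and that spread is exactly what a confidence interval tries to quantify.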


This is not limited to means. We can also estimate the probability of success in a binomial experiment, or the population standard deviation, by calculating the sample proportion or the sample standard deviation.



Standard deviation







Confidence interval formula for binomial experiments

Coming back to our initial problem: which model is the better one? We have to treat the classification model's predictions as a binomial experiment in which every correct prediction counts as a success. The confidence interval CI can then be calculated as

$$CI = p \pm Z_{\alpha/2} \sqrt{\frac{p\,(1 - p)}{n}}$$

where p = proportion of interest (here, the observed accuracy)

n = sample size

α = 1 − desired confidence level

Although the above formula is simple and easy to use, it is only accurate when N is sufficiently large; it loses accuracy when N is small or when the accuracy is close to 0 or 1. We therefore use a more complicated formula to calculate the confidence interval.
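A minimal sketch of the simple formula, together with two small-sample cases where it visibly misbehaves. The function name `wald_interval` and the default z = 1.96 (95% confidence) are my own choices for illustration.

```python
import math

def wald_interval(p, n, z=1.96):
    """Simple normal-approximation interval: p +/- z * sqrt(p * (1 - p) / n)."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# A reasonably sized test set: the interval looks sensible.
print(wald_interval(0.8, 100))

# Only 10 records, all predicted correctly: the interval collapses to a
# single point, which claims far more certainty than 10 records justify.
print(wald_interval(1.0, 10))

# Only 10 records, 9 correct: the upper bound goes above 1, which is
# impossible for an accuracy.
print(wald_interval(0.9, 10))
```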

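Confidence interval

$$\frac{2 \cdot N \cdot acc + Z_{\alpha/2}^{2} \pm Z_{\alpha/2} \sqrt{Z_{\alpha/2}^{2} + 4 \cdot N \cdot acc - 4 \cdot N \cdot acc^{2}}}{2\,(N + Z_{\alpha/2}^{2})}$$

N = sample size

X = number of records correctly predicted

empirical accuracy (acc) = X/N

alpha = 1 - CC

CC = confidence coefficient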


Z α/2 is then read from the standard normal table: for example, looking up an area of 0.4950 (a confidence coefficient of 99%, since 0.5 - 0.01/2 = 0.4950) gives Z α/2 ≈ 2.58.
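Instead of a printed table, the same lookup can be done with Python's standard library. This is a sketch: the helper name `z_alpha_over_2` is hypothetical, and it assumes the printed table lists the area between 0 and z, which is where the figure 0.4950 comes from.

```python
from statistics import NormalDist

def z_alpha_over_2(cc):
    """Return (table area, z) for a confidence coefficient `cc`, e.g. 0.99."""
    alpha = 1 - cc
    # Area between 0 and z, the value looked up in a printed z-table.
    table_area = 0.5 - alpha / 2
    # Inverse CDF of the standard normal at 1 - alpha/2 gives the same z.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return table_area, z

for cc in (0.95, 0.99):
    area, z = z_alpha_over_2(cc)
    print(f"CC={cc}: table area={area:.4f}, z={z:.3f}")
```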

Substituting all the values into the formula therefore gives the confidence interval for the model's reported accuracy.
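Putting it all together for the two models from the start of the post, here is a hedged sketch. It assumes a confidence coefficient of 95% (the post does not fix one for this comparison), and the function name `accuracy_interval` is my own.

```python
import math
from statistics import NormalDist

def accuracy_interval(x, n, cc=0.95):
    """Confidence interval for accuracy, given x correct out of n records.

    cc is the confidence coefficient; 0.95 is an assumed default.
    """
    acc = x / n
    z = NormalDist().inv_cdf(1 - (1 - cc) / 2)  # Z alpha/2
    centre = 2 * n * acc + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * n * acc - 4 * n * acc ** 2)
    denom = 2 * (n + z ** 2)
    return (centre - spread) / denom, (centre + spread) / denom

ma = accuracy_interval(80, 100)     # Ma: 80% on 100 records
mb = accuracy_interval(750, 1000)   # Mb: 75% on 1000 records

print(f"Ma: {ma[0]:.3f} - {ma[1]:.3f}")
print(f"Mb: {mb[0]:.3f} - {mb[1]:.3f}")

# If the intervals overlap, the test results alone do not show that
# one model is better than the other.
overlap = ma[0] <= mb[1] and mb[0] <= ma[1]
print("intervals overlap:", overlap)
```

Under these assumptions the interval for Ma is much wider than the one for Mb, and the two overlap, so the teacher cannot conclude from these test sets alone that Ma is the better model.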


  1. Linchuan XU wrote
    at 8:02 AM - 17th November 2014

    Hi thanks for your intuitive explanation.

    But how do you get these two formulas to calculate the confident interval?
    Is there any theoretical foundation? or any reference?
    Thanks again.

  2. shakthydoss wrote
    at 8:24 AM - 17th November 2014

    Hi Linchuan,

    I don’t remember the reference exactly.
    But you can search in and
