Prologue: I have been practicing various skills and algorithms as part of my roadmap to becoming a mature data scientist. As part of this expedition, I have decided to document everything I work through. So whatever you read under this column will be either a summary of my understanding or a post explaining the details of an experiment.
I had a really hard time understanding and calculating confidence intervals, a mathematical technique for evaluating the accuracy of experimental classification models. The books and wikis I have read so far take the time to explain the derived formula for the confidence interval, but they somehow fail to explain the intuition behind it for non-mathematical practitioners. I am writing this post about confidence intervals to fill that gap of intuition.
Say you and your friend have developed two classification models for the same classification problem given by your teacher.
Your model M_{a} has an accuracy of 80% when evaluated on a test set containing 100 records.
Your friend's model M_{b} has an accuracy of 75% when evaluated on a test set containing 1000 records.
The question is how your teacher will decide which model is better. Is M_{a} a better model than M_{b}? Although M_{a} has the higher accuracy, we can't say for certain that it is the better model, because it was tested on far fewer records.
So what will the teacher do now? This is where the confidence interval comes into play. With respect to our problem, a confidence interval is a mathematical technique that answers how much confidence we can place in the measured accuracy of a model, and how that confidence changes as the number of test records increases.
Consider the table below, which shows the confidence level of some model M_{x} over test sets of different sizes. (The calculation of the confidence level is discussed in the latter part of this post.)
N      Confidence Level
40     0.584 to 0.919
80     0.670 to 0.888
200    0.701 to 0.807
500    0.789 to 0.811
Reading the table, you can be confident that the accuracy of model M_{x} lies in the range of roughly 70% to 80% when the number of test records is between 200 and 499, and the range tightens as more records are used. That's the big picture of the confidence interval.
Diving deeper into confidence intervals.
Now imagine you give a statistician the problem of finding the mean height of all students in your state.
A statistician would first take a sample of students from your state and find the sample mean height (Ẋ). Then, using the sample mean (Ẋ), he would estimate the mean height (µ) of the population itself (here, the population refers to all students in your state). This is a much simpler way to estimate the mean height of all students in your state without wasting time and resources.
But how can one be sure that the estimated mean height of all students is accurate? The estimate is based on the sample mean, which means that if we change the sample, we might get a different estimate for the mean height of all students. Thus there is always a margin of error in the calculation. It is for this reason that the confidence interval is used: it expresses the accuracy of a statistical estimate as a range.
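The point about different samples giving different estimates can be seen in a small simulation. This is only a sketch: the population below is synthetic, and its mean (165 cm), spread (10 cm), size, and the sample size of 50 are made-up values chosen for illustration, not figures from this post.

```python
import random
import statistics

# Hypothetical population: 100,000 synthetic student heights in cm.
# The mean (165) and spread (10) are assumptions made for illustration only.
rng = random.Random(42)
population = [rng.gauss(165, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def sample_mean(sample_size):
    """Mean height of one random sample drawn from the population."""
    return statistics.fmean(rng.sample(population, sample_size))


# Five different samples of 50 students give five different estimates.
estimates = [sample_mean(50) for _ in range(5)]

print(f"population mean: {population_mean:.2f}")
for i, estimate in enumerate(estimates, start=1):
    error = estimate - population_mean
    print(f"sample {i} mean: {estimate:.2f} (error {error:+.2f})")
```

Each run of `sample_mean` lands near the population mean but not exactly on it, and that leftover error is what a confidence interval tries to bound.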
This does not apply only to means. We can also estimate the probability of success in a binomial experiment, or the population standard deviation, by calculating the sample proportion or the sample standard deviation.
              Mean    Proportion    Standard deviation
Population    µ       P             σ
Sample        Ẋ       Ᵽ             Ŝ
Confidence interval formula for binomial experiments
Coming back to our initial problem, i.e. which model is the better model: we have to treat the classification model's predictions as a binomial experiment, where every correct prediction counts as a success. The confidence interval (CI) can then be calculated by
where p = proportion of interest
n = sample size
α = 1 - desired confidence level
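The formula itself is not reproduced in the text above. Assuming it is the usual normal-approximation interval for a proportion (an assumption, not something this post confirms), it can be written with the symbols just listed as:

```latex
CI = p \pm Z_{\alpha/2} \sqrt{\frac{p\,(1 - p)}{n}}
```

Here Z_{α/2} is the standard normal critical value that leaves an area of α/2 in each tail.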
Although the above formula is simple and easy to use, it loses accuracy when N is small. Thus a more complicated formula is used to calculate the confidence interval (Calculating Confidence Interval for Classification Accuracy).
where
N = Sample size
X = number of records correctly predicted
empirical accuracy (acc) = X/N
alpha = 1 - cc
cc = confidence coefficient.
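The formula these symbols belong to is not reproduced in the text. Assuming it is the score-style interval commonly given for classification accuracy (an assumption on my part, chosen because it uses exactly N, acc and Z_{α/2}), it can be written as:

```latex
CI = \frac{2 N \cdot \mathrm{acc} + Z_{\alpha/2}^{2}
      \pm Z_{\alpha/2} \sqrt{Z_{\alpha/2}^{2} + 4 N \cdot \mathrm{acc} - 4 N \cdot \mathrm{acc}^{2}}}
     {2 \left( N + Z_{\alpha/2}^{2} \right)}
```

Taking the minus sign gives the lower bound of the interval and the plus sign gives the upper bound.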
Looking up the area 0.4950 in the normal curve area table then gives Z_{α/2}. This area corresponds to cc = 0.99 (so α/2 = 0.005), for which Z_{α/2} is about 2.58.
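The table lookup can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is hypothetical and introduced only for this example.

```python
from statistics import NormalDist


def z_critical(cc):
    """Two-sided standard normal critical value Z_{alpha/2} for a confidence coefficient cc."""
    alpha = 1 - cc
    # inv_cdf returns the point below which the given share of the normal curve lies.
    return NormalDist().inv_cdf(1 - alpha / 2)


for cc in (0.90, 0.95, 0.99):
    print(f"cc = {cc:.2f} -> Z = {z_critical(cc):.3f}")
```

For cc = 0.99 this gives a value close to 2.576, which agrees with the table entry for an area of 0.4950 (that is, 0.5 - α/2).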
Therefore, substituting all the values into the formula, we end up with the range that depicts the confidence level for the promised accuracy of the model.
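To make the substitution concrete, here is a minimal sketch that applies the interval to the two models from the start of the post. It assumes the score-style formula for classification accuracy and a confidence coefficient of 0.95; neither choice is confirmed by this post, and the function name `accuracy_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def accuracy_interval(correct, total, cc=0.95):
    """Return (lower, upper) bounds for the true accuracy.

    correct is X, total is N, cc is the confidence coefficient.
    The formula is the assumed score-style interval, not one confirmed by the post.
    """
    z = NormalDist().inv_cdf(1 - (1 - cc) / 2)  # Z_{alpha/2}
    acc = correct / total                       # empirical accuracy X/N
    centre = 2 * total * acc + z ** 2
    spread = z * math.sqrt(z ** 2 + 4 * total * acc - 4 * total * acc ** 2)
    denominator = 2 * (total + z ** 2)
    return (centre - spread) / denominator, (centre + spread) / denominator


# The two models from the start of the post, at an assumed cc of 0.95.
interval_a = accuracy_interval(80, 100)    # M_a: 80% accuracy on 100 records
interval_b = accuracy_interval(750, 1000)  # M_b: 75% accuracy on 1000 records

for name, (low, high) in (("M_a", interval_a), ("M_b", interval_b)):
    print(f"{name}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Under these assumptions M_{a} comes out at roughly 0.711 to 0.867 and M_{b} at roughly 0.722 to 0.776. The range for M_{b} is much narrower and sits inside the range for M_{a}, so the 5-point gap in measured accuracy is not enough on its own to say that M_{a} is the better model.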
2 Comments
at 8:02 AM, 17th November 2014
Hi thanks for your intuitive explanation.
But how do you get these two formulas to calculate the confident interval?
Is there any theoretical foundation? or any reference?
Thanks again.
at 8:24 AM, 17th November 2014
Hi Linchuan,
I don’t remember the reference exactly.
But you can search in mathworld.wolfram.com and en.wikipedia.org