Model Evaluation

Model evaluation is an important phase of the data mining process. It comes after the model development phase. I am going to discuss it first, because I think that before you develop a model you need to know what you are actually doing: what the statistics shown on the Weka screen mean when you develop a model. Once you have a clear picture of these statistics, it will be easier to build a model and compare it with another model that uses different parameters to solve the same problem.

When you evaluate a model you need to consider its "reliability" and "validity". Suppose you created a model to solve a specific problem and you apply it to 100 cases, but it succeeds only 50 times. That means your model is 50% reliable, which is not good enough for real-life problems, especially in fields where mistakes are costly (such as medicine, the military, or finance). Validity is also a very important criterion for judging a model. A model is valid if it satisfies the original objective behind it.

Here I am going to cover the following model evaluation techniques:
confusion matrix, bagging and boosting, lift chart, receiver operating characteristic (ROC) curve, leave-one-out, hold-out, k-fold cross-validation, and training data and test data.

Confusion Matrix
When you are new to the confusion matrix, it can be confusing to read. The name, though, refers to what the matrix shows: how often the model confuses one class with another. In Weka, when you develop a model you will most of the time get the result as a confusion matrix (2x2, 3x3, and so on) together with summary statistics (discussed after the confusion matrix). Consider the following confusion matrix.

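                  Predicted Sick   Predicted Healthy
Actual Sick             10                 1
Actual Healthy           2                 7

The matrix is shown here in Weka's layout, with one row per actual class and one column per predicted class. This is the reading under which the formulas below hold.

In the above matrix, the first row shows that 11 people are actually sick: 10 are predicted as sick and 1 is predicted as healthy (wrong prediction).
The second row shows that 9 people are actually healthy: 7 are predicted as healthy and 2 are predicted as sick (wrong prediction).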

The confusion matrix provides the performance criteria for model evaluation. Consider the following criteria, computed from the confusion matrix with Healthy treated as the positive class:

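Accuracy = (10 + 7) / (10 + 1 + 2 + 7) = 17/20 = 0.85

True Positive Rate = 7 / (2 + 7) = 7/9 ≈ 0.778

False Positive Rate = 1 / (10 + 1) = 1/11 ≈ 0.091

True Negative Rate = 10 / (10 + 1) = 10/11 ≈ 0.909

False Negative Rate = 2 / (2 + 7) = 2/9 ≈ 0.222

Precision = 7 / (7 + 1) = 7/8 = 0.875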
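
The criteria above can be sketched in a few lines of Python. This is only an illustration, not Weka output: the function name `confusion_metrics` and the variable names are my own, and the sketch assumes the matrix is read with rows as actual classes and columns as predicted classes, with Healthy as the positive class.

```python
# Counts from the example above. Assumption: rows = actual class,
# columns = predicted class, and Healthy is the positive class.
tp = 7   # actually healthy, predicted healthy
fn = 2   # actually healthy, predicted sick
fp = 1   # actually sick, predicted healthy
tn = 10  # actually sick, predicted sick


def confusion_metrics(tp, fn, fp, tn):
    """Return the six criteria listed above as a dict of floats."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "true_negative_rate": tn / (fp + tn),
        "false_negative_rate": fn / (tp + fn),
        "precision": tp / (tp + fp),
    }


metrics = confusion_metrics(tp, fn, fp, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A quick sanity check on any such calculation: the true positive rate and false negative rate should add up to 1, and so should the false positive rate and true negative rate, because each pair splits the same group of actual cases.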

Bagging and Boosting

Lift Chart

Receiver Operating Characteristic (ROC) Curve

Leave-One-Out

Hold-Out

K-fold Cross-Validation

Training data and test data