Inferential Statistics

Descriptive statistics are very useful for understanding a sample (drawn from a population), but on their own they do not let us draw conclusions about the population. Inferential statistics is used to draw conclusions about the population based on sample statistics. In everyday life, when we hear opinions about a product from different people, we normally judge that product by what most of them think about it. In other words, we conclude something about the product from people's responses. In market research it is usually not possible to analyze the whole population. So market researchers draw a sample that represents the true picture of the population, use it to study their research question, and then draw conclusions about the population parameters based on the sample statistics.

Inferential Statistics can be divided into two categories: estimation and hypothesis testing.

Estimation
Estimation, also known as parameter estimation, helps us estimate population parameters from descriptive statistics or sample findings. For example, from the sample mean and/or standard deviation we can estimate the population mean and/or standard deviation. Before we start estimating, we need to be sure what sort of data we are analyzing, because qualitative data and quantitative data call for different estimation methods.

With qualitative data we are almost always interested in dividing the sample results into two or more categories or groups based on some characteristic. For example, how many people are interested in buying a new product from company X? Based on interested or not interested, we can form two groups. If your manager wants to know what proportion of people are interested in buying the new product, you can estimate it from the sample findings. Because in this case we are looking only at the part of the sample that is interested rather than the whole sample, we describe it with a proportion: the sample proportion is denoted by "p", and the population proportion that we want to estimate is denoted by the symbol pi (π).

So if 75% of the people in the sample are interested in buying the new product, then p = 0.75, and we use this value as our estimate of π.
NOTE: Estimation can be done in two ways. The first is point estimation. In the example above, 75% of people are interested in buying the new product; this single value is known as a point estimate. The second is interval estimation. In some situations we are not sure about the exact value, but we are fairly confident about the range in which it lies. So if we say that the proportion of people interested in buying the new product lies between 70% and 80%, we are making an interval estimate rather than a point estimate.
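
To make the difference between the two kinds of estimate concrete, here is a minimal sketch that turns the 75% point estimate into an interval estimate. It assumes a sample size of 400 (the figure used later in this article), a 95% confidence level, and the usual normal approximation for a proportion; the function name proportion_interval is only illustrative.

```python
import math
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Return (low, high), a normal-approximation interval for a proportion."""
    # Critical z value for a two-sided interval (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


p_hat = 0.75  # point estimate from the example above
n = 400       # assumed sample size
low, high = proportion_interval(p_hat, n)

print(f"Point estimate:    {p_hat:.1%}")
print(f"Interval estimate: {low:.1%} to {high:.1%}")
```

With these assumed inputs the interval runs from roughly 71% to 79%, which is consistent with the 70%-80% range mentioned in the note.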

Up to this point we have analyzed only one qualitative variable. In the real world we may have to analyze two or more qualitative variables at once. For example, your manager may want to see the results broken down by gender. That means you also need to find how many males and females are interested in buying the new product. We now have two qualitative variables: interest (yes, no) and gender (male, female). So we divide the interested group into two proportions based on gender.

We can denote the sample proportion of males by p1 and of females by p2, and the corresponding population proportions by pi1 (π1) and pi2 (π2). As we saw earlier, 75% of people are interested in buying the new product. That means that if the sample size is 400 people, 300 are interested in buying the new product. Suppose that, out of these 300, 100 are males and 200 are females. Then p1 = 33.33% are males and p2 = 66.67% are females. We use p1 and p2 as estimates of pi1 and pi2, so the estimated number of interested males in the population is (population size * 0.75) * 0.3333 and the estimated number of interested females is (population size * 0.75) * 0.6667.
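
The arithmetic above can be sketched in a few lines. The sample counts come from the example; the population size of 10,000 and the helper name projected_counts are hypothetical and used only for illustration.

```python
# Sample counts from the example above.
sample_size = 400
interested = 300
interested_males = 100
interested_females = 200

p = interested / sample_size          # share of the sample that is interested
p1 = interested_males / interested    # share of interested people who are male
p2 = interested_females / interested  # share of interested people who are female


def projected_counts(population_size):
    """Project the sample proportions onto a population of the given size."""
    interested_population = population_size * p
    return interested_population * p1, interested_population * p2


# Hypothetical population size, chosen only to show the calculation.
males, females = projected_counts(10_000)

print(f"p = {p:.2%}, p1 = {p1:.2%}, p2 = {p2:.2%}")
print(f"Estimated interested males:   {males:.0f}")
print(f"Estimated interested females: {females:.0f}")
```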

As discussed earlier, with quantitative data we may want to estimate population parameters such as the mean and standard deviation from the sample findings. By convention, the population mean and standard deviation are written mu (μ) and sigma (σ), while the sample mean and standard deviation are written x-bar (x̄) and s.
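
As a small illustration for quantitative data, the sketch below computes the sample mean and sample standard deviation, which serve as point estimates of the population mean and standard deviation. The data values are made up for this example.

```python
from statistics import mean, stdev

# Hypothetical sample: amount spent (in dollars) by 8 surveyed customers.
sample = [12.0, 15.5, 9.0, 20.0, 14.5, 11.0, 18.0, 16.0]

x_bar = mean(sample)  # sample mean, a point estimate of the population mean
s = stdev(sample)     # sample standard deviation (uses the n - 1 denominator)

print(f"Sample mean: {x_bar:.2f}")
print(f"Sample standard deviation: {s:.2f}")
```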

Hypothesis Testing

A hypothesis is a statement or assumption that we make about something. For example, a market researcher may assume that spending $5000 on advertising in a small geographical area will boost sales there by up to 20% and bring in around $7000 of profit on top of the normal profit from that area. In other words, the researcher is assuming a net profit increase of up to $2000 (additional profit $7000 - advertising cost $5000). This hypothesis is based on results from other areas.

Will this hypothesis hold for the currently selected area? Maybe yes, maybe not. People there may have a different (higher or lower) standard of living compared with the previously selected areas, so it is impossible to say exactly what the outcome will be. Hypothesis testing is used to deal with such problems.

When we perform hypothesis testing we make two hypothesis statements: one is the null hypothesis and the other is the alternative hypothesis. Every hypothesis test follows these five steps.

1. Create or state Null and Alternative Hypothesis.
2. Define Test procedure.
3. Collect Data and calculate statistics.
4. Reject or fail to reject the Null hypothesis.
5. Interpret the result.

There are several statistical tests available for hypothesis testing. Each test has its own purpose and is used in different situations.
Z-test: When testing a mean, the Z-test is used when we know the population standard deviation (for example from historical data) and the sample size is greater than 30. If the sample is smaller but we still know the population standard deviation, we can still use the Z-test.
T-test: The T-test is used when the sample size is less than 30 and we do not know the population standard deviation.

Alpha value or significance level: In addition to choosing a test, we also need to define the significance level (denoted by the alpha sign, α) to perform hypothesis testing. This value is the threshold used to decide whether to reject the Null hypothesis. By convention, most researchers set alpha = 0.05. This value is predefined, meaning that we should fix alpha before performing the hypothesis test.
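
The five steps can be sketched with a one-sample Z-test. All of the numbers below (a historical mean of 100, a known population standard deviation of 15, and a sample of 36 with mean 106) are hypothetical, and the function name one_sample_z_test is only illustrative. The sketch uses a two-sided alternative and alpha = 0.05.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test of H0: population mean == mu0, with known sigma.

    Returns the z statistic, the p-value, and whether H0 is rejected.
    """
    # Step 3: calculate the test statistic from the sample.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a statistic at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 when the p-value is below the predefined alpha.
    return z, p_value, p_value < alpha


# Step 1: H0: mean sales == 100, H1: mean sales != 100 (hypothetical values).
# Step 2: Z-test, because sigma is assumed known and n > 30.
z, p_value, reject = one_sample_z_test(sample_mean=106, mu0=100, sigma=15, n=36)

# Step 5: interpret the result.
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Fixing alpha as a default argument before any data is passed in mirrors the rule that the significance level should be chosen before the test is run.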

Linear Regression & Inferences
Linear regression is useful when we have only two variables: one is the independent variable, known as x, and the other is the dependent variable, known as y. With the help of linear regression we try to predict the value of the dependent variable from the value of the independent variable. Before we perform linear regression we need to determine whether a linear relationship exists between the two variables. If it does, then we can use linear regression; otherwise we need some other technique to predict the value of the dependent variable.

With the help of scatter plots we can determine whether a linear relationship exists between the two variables. A linear relationship exists between two variables if the points fall roughly along a straight line, for example when an increase in the value of x results in a steady increase in the value of y. In the following figures, the first shows no linear relationship between the two variables, while the second shows a linear relationship: revenue increases as we spend more money on advertising.

A linear relationship can be positive or negative. In the following figures, the first is an example of a positive linear relationship and the second is an example of a negative linear relationship.
Once you know that a linear relationship exists between the two variables, the next step is to find the equation (also called the model in regression analysis) from which you can determine the value of the dependent variable. In a linear regression model, the relationship between the dependent (y) and independent (x) variables is described by the following equation.
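
A linear regression model with one independent variable is conventionally written as y = b0 + b1 * x, where b0 is the intercept and b1 is the slope; the names b0 and b1 are an assumption here, since other texts use different symbols. The sketch below estimates both by ordinary least squares on made-up advertising and revenue figures and then predicts y for a new x.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: advertising spend and revenue, in thousands of dollars.
ad_spend = [1, 2, 3, 4, 5]
revenue = [12, 14, 17, 19, 23]

b0, b1 = fit_line(ad_spend, revenue)
predicted = b0 + b1 * 6  # predicted revenue for a spend of 6

print(f"Model: y = {b0:.2f} + {b1:.2f} * x")
print(f"Predicted revenue at x = 6: {predicted:.2f}")
```

A positive b1 corresponds to a positive linear relationship, and a negative b1 to a negative one.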

Correlation