
Descriptive Statistics

Descriptive statistics is a basic form of analysis, but it is useful and important in any analysis process for gaining an initial understanding of the data. Once we have that initial understanding, we can examine the data in more depth. Almost every data analysis software package provides a way to summarize data in descriptive form.
Descriptive statistics mainly covers:

Measure of Central Tendency

Covers the average of the given data (mean), the value that appears most often in the data set (mode), and the center point of the data (median).

Measure of Dispersion

Covers standard deviation, variance, range, and standard error of the mean.

Measure of Distribution

Covers skewness and kurtosis.


Mean:
Represents the average for a particular column.

Median:
Represents the middle value of the data set after sorting. With an even number of values, it is the average of the two middle values.

Mode:
Represents the value that appears most often in the data set (for a particular column).
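
As a quick illustration of the three measures of central tendency, here is a minimal sketch using Python's standard `statistics` module. The `sales` values are hypothetical and are not taken from the Northwind data used later in this tutorial.

```python
import statistics

# Hypothetical values for one column (made up for illustration).
sales = [12, 15, 15, 18, 20, 22, 45]

mean_value = statistics.mean(sales)      # sum of the values divided by their count
median_value = statistics.median(sales)  # middle value after sorting
mode_value = statistics.mode(sales)      # value that appears most often

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Note how the single large value (45) pulls the mean above the median, which is one reason to look at both.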

Standard Error or Standard Error of Mean:
It estimates how much the sample mean would vary from one sample to another if repeated samples were taken from the same population. It is calculated as the standard deviation divided by the square root of the sample size.
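
The standard error of the mean is usually computed as the sample standard deviation divided by the square root of the sample size. The sketch below shows that calculation in Python; the `sample` values are hypothetical.

```python
import math
import statistics

# Hypothetical sample of eight observations (made up for illustration).
sample = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 6.0]

sd = statistics.stdev(sample)       # sample standard deviation (n - 1 in the denominator)
sem = sd / math.sqrt(len(sample))   # standard error of the mean

print(f"standard deviation: {sd:.4f}")
print(f"standard error of the mean: {sem:.4f}")
```

Because the sample size is in the denominator, a larger sample from the same population gives a smaller standard error.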

Variance:
A measure of dispersion based on the mean. It measures how far the individual values in the data are spread out from the mean, and therefore how different they are from one another.
Data with low variance contains similar values (6, 7, 7, 6, 7, 6). Data with high variance contains values that are not similar (4, 233, 3, 644, 8).
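
To make the contrast concrete, the following sketch computes the sample variance of the two example data sets given above. It assumes the sample formula (dividing by n - 1), which is what `statistics.variance` uses; the variable names are only illustrative.

```python
import statistics

low_spread = [6, 7, 7, 6, 7, 6]       # similar values from the text
high_spread = [4, 233, 3, 644, 8]     # dissimilar values from the text

var_low = statistics.variance(low_spread)    # sample variance (n - 1 denominator)
var_high = statistics.variance(high_spread)

print(f"variance of similar values:    {var_low:.2f}")
print(f"variance of dissimilar values: {var_high:.2f}")
```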

Standard Deviation:
Standard deviation is the square root of the variance and shows how far values typically lie from the mean. Rule: in a normal distribution, about 68% of the data fall within one standard deviation of the mean, and about 95% fall within two standard deviations. If the standard deviation is higher than the mean, the values differ greatly from the mean and the mean is less meaningful (it may not represent the true picture of the data).
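
The 68% and 95% figures can be checked empirically. The sketch below simulates normally distributed data and counts the share of values within one and two standard deviations of the mean. The mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions chosen for the demonstration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
# Simulated normal data; mean 100 and standard deviation 15 are arbitrary choices.
values = [rng.gauss(100, 15) for _ in range(50_000)]

mean = statistics.fmean(values)
sd = statistics.stdev(values)

def share_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)

within_one = share_within(1)
within_two = share_within(2)
print(f"within 1 sd: {within_one:.1%}, within 2 sd: {within_two:.1%}")
```

The shares should come out close to 68% and 95%, but only because the simulated data are normal; skewed data will not follow the rule.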

Range
Range is the difference between the largest and smallest value in the dataset.

Minimum:
Represents the minimum value in the data set.

Maximum:
Represents the maximum value in the data set.
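
Minimum, maximum, and range are simple to compute together. A minimal sketch, reusing the dissimilar values from the variance example above as illustrative data:

```python
# Illustrative data (the dissimilar values from the variance example).
data = [4, 233, 3, 644, 8]

minimum = min(data)
maximum = max(data)
value_range = maximum - minimum  # range = largest value minus smallest value

print("min:", minimum, "max:", maximum, "range:", value_range)
```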

Kurtosis (Distribution)
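Kurtosis is a measure of the shape of a distribution. It is commonly described as how strongly the data cluster around the mean, in other words how much of the data lies within one standard deviation. If most of the data are clustered around the mean, we say the distribution is peaked (high kurtosis). Strictly speaking, kurtosis is driven mainly by how heavy the tails are compared with a normal distribution.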

Skewness (Distribution)
Measures how asymmetric the data are compared with a normal distribution, for which skewness is always zero. If the longer tail of the distribution extends to the right, the skewness is positive (the mean is usually greater than the median). If the longer tail extends to the left, the skewness is negative (the mean is usually less than the median).
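
Skewness and kurtosis can be sketched from the central moments of the data. The functions below use the simple moment-based (population) definitions, which is an assumption made for clarity: Excel and SPSS apply sample-size corrections, so their output can differ somewhat on small data sets. The function names and data are hypothetical.

```python
import statistics

def central_moment(values, k):
    """Average of (x - mean)**k over all values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]                 # hypothetical, perfectly symmetric
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]    # hypothetical, long tail to the right

print("symmetric skewness:", skewness(symmetric))
print("right-tailed skewness:", skewness(right_tailed))
print("right-tailed mean vs median:",
      statistics.mean(right_tailed), statistics.median(right_tailed))
print("symmetric excess kurtosis:", excess_kurtosis(symmetric))
```

For the right-tailed data the skewness is positive and the mean exceeds the median, matching the description above.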

Descriptive Statistics with Excel

For this example I am using the Northwind database (Sales by Category) that comes with MS Access.
1. Open the data (an Excel sheet or data from external sources) in Excel.
2. Subtotal the data by CategoryName, as shown below.

3. Click on "1" in the upper-left corner. Your data will look as follows.

4. Select the ProductSales column and press Alt+; (semicolon) to select only the visible cells.
5. Copy the data and paste it into another sheet.

6. Click on the Tools menu and then click on the Data Analysis option (this requires the Analysis ToolPak add-in).
7. In the Data Analysis box, select Descriptive Statistics, as shown below.

8. The Descriptive Statistics box will open. Select the options you want to show and click OK.

9. After clicking OK, you can see the descriptive statistics on a new sheet.
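
For readers who want to reproduce the idea of the steps above outside Excel, here is a rough Python sketch: it subtotals sales by category and then summarizes the subtotals. The rows are hypothetical figures, not actual Northwind values, and the set of statistics is only a subset of what Excel's tool reports.

```python
import statistics
from collections import defaultdict

# Hypothetical (CategoryName, ProductSales) rows; the figures are made up.
rows = [
    ("Beverages", 100.0), ("Beverages", 250.0),
    ("Condiments", 80.0), ("Condiments", 120.0),
    ("Seafood", 300.0), ("Seafood", 50.0),
]

# Step 1: subtotal ProductSales by CategoryName.
subtotals = defaultdict(float)
for category, sales in rows:
    subtotals[category] += sales

# Step 2: descriptive statistics over the category subtotals.
values = list(subtotals.values())
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),   # sample standard deviation
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "count": len(values),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```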


Descriptive Statistics with SPSS

1. Copy the ProductSales column from the Excel sheet and paste it into a new SPSS file, as shown below.
Note: You can also import directly from the Northwind database into an SPSS file through File > Open Database > New Query.

2. Click on Analyze (menu) > Descriptive Statistics > Descriptives.

3. In the Descriptives box, click on Options and select the statistics that you want to add, as shown below.

4. Click OK and the results will be displayed.

Z-Score

A z-score measures how many standard deviations a given observation (a particular data value) lies from the mean. We use the following formula to calculate the z-score:
z = (x - mean) / standard deviation
If the z-score is negative, the data value is less than the mean. If the z-score is positive, the data value is higher than the mean.
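
A z-score is conventionally the observation minus the mean, divided by the standard deviation. The sketch below applies that to a small hypothetical data set. It uses the population standard deviation (`statistics.pstdev`) as an assumption to keep the numbers round; using the sample standard deviation instead would give slightly smaller z-scores.

```python
import statistics

# Hypothetical observations (made up for illustration).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # population standard deviation (assumption)

# z = (x - mean) / standard deviation, computed for every observation
z_scores = [(x - mean) / sd for x in data]

for x, z in zip(data, z_scores):
    position = "below" if z < 0 else "above" if z > 0 else "at"
    print(f"value {x}: z = {z:+.2f} ({position} the mean)")
```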

Crosstab Example with SPSS

In the following example I used data collected during the enrolment process at xxxx university. A number of questions were asked of different students, but we are going to analyze the responses to only two of them:

1. How important is it for you to study in your field?
2. How important do you think this course will be in finding a suitable job in your career?

Each question has the following answer options.

1. Not Important
2. Not Sure
3. Little Important
4. Important
5. Very Important

From the above crosstab result we can see that we got 533 responses. Out of 533, 5 responses are missing or not answered, leaving 528 valid responses for analysis, which is 99.1% of the overall survey. These figures are shown in the first table.

The second table shows the cross-tabulation of the given responses. Out of 528 students, 355 are very much interested in their course, and 197 of those 355 feel that it could be a very important step toward getting a suitable job in their career. There are 11 students who are studying in their chosen field but are not interested in the course at all; of those 11, 4 feel it could still be an important step toward getting a suitable job in the end.
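
The same two tables (a case summary and a cross-tabulation) can be approximated in a few lines of Python. This is only a sketch of the idea: the `responses` list is hypothetical and far smaller than the survey described above, with answers coded 1 to 5 and `None` standing for a missing answer.

```python
from collections import Counter

# Hypothetical (question 1, question 2) answers on the 1-5 scale; None = not answered.
responses = [(5, 5), (5, 4), (5, 5), (4, 5), (1, 4), (None, 3), (3, None), (4, 4)]

# Case summary: a response is valid only if both questions were answered.
valid = [(q1, q2) for q1, q2 in responses if q1 is not None and q2 is not None]
missing = len(responses) - len(valid)
valid_percent = 100 * len(valid) / len(responses)

# Cross-tabulation: count each (question 1, question 2) combination.
cells = Counter(valid)
row_totals = Counter(q1 for q1, _ in valid)

print(f"valid={len(valid)} missing={missing} ({valid_percent:.1f}% valid)")
for q1 in sorted(row_totals):
    row = [cells[(q1, q2)] for q2 in range(1, 6)]  # columns are question 2 answers 1..5
    print(f"Q1={q1}: {row} total={row_totals[q1]}")
```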

Frequency Distribution with SPSS




