Descriptive Statistics

I.TYPES OF DATA

Frequency distribution tables Bar Charts Pie Chart Pareto Diagrams
Screenshot 2024-11-12 at 6.01.38 PM.png Screenshot 2024-11-12 at 6.01.46 PM.png Screenshot 2024-11-12 at 6.02.03 PM.png Screenshot 2024-11-12 at 6.02.12 PM.png
Frequency distribution tables show the category and its corresponding absolute frequency. Bar charts are very common. Each bar represents a category. On the y-axis we have the absolute frequency. Pie charts are used when we want to see the share of an item as a part of the total. Market share is almost always represented with a pie chart. The Pareto diagram is a special type of bar chart where the categories are shown in descending order of frequency, and a separate curve shows the cumulative frequency.
In Excel, we can either hard code the frequencies or count them with a count function.
Total formular =SUM()
Bar charts are also called clustered column charts in Excel. Choose your data, Insert -> Chart -> Clustered column or Bar Chart. Pie charts are created in the following way: Choose your data, Insert -> Chart -> Pie Chart next section

IV.PARETO DIAGRAMS IN EXCEL
Creating Pareto in Excel:

  1. Order the data in your frequency distribution table in descending order.
  2. Create a bar chart.
  3. Add a column in your frequency distribution table that measures the cumulative frequency
  4. Select the plot area of the chart in Excel and Right Click
  5. Choose Select series
  6. Click Add
  7. Series name does not matter. You can out 'Line'
  8. For Series values choose the cells that refer to the cumulative frequency.
  9. Click OK. You should see two side-by-side bars.
  10. Select the plot area of the chart and Right click.
  11. Choose Change Chart Type
  12. Select Combo
  13. Choose the type of representation from the dropdown list. Your initial categories should be Clustered Column. Change the second series, that you called 'Line' to 'Line'

V.NUMERICAL VARIABLES & FREQUENCY DISTRIBUTION TABLE AND
HISTOGRAM

  1. Decide on the number of intervals you would like to use
  2. Find the interval width (using the formula above)
  3. Start your first interval at the lowest value in your dataset
  4. Finish your first interval at the lowest value + interval width (= start_interval_cell + interval_width_cell)
  5. Start your second interval where the first stops (that's a formula as well - make the starting cell of interval 2 = the ending of interval 1)
  6. Continue in this way until you have created the desired number of intervals.
  7. Count the absolute frequencies using the following COUNTIF formula:
    =COUNTIF(dataset_range,">="&interval start) -COUNTIF(dataset_range,">"&interval end)
  8. In order to calculate the relative frequencies, use the following formula: = absolute_frequency_cell / number_of_observations
  9. In order to calculate the cumulative frequencies:
    1. The first cumulative frequency is equal to the relative frequency
    2. Each consecutive cumulative frequency = previous cumulative frequency + respective relative frequency
  1. Choose your data
  2. Insert -> Charts -> Histogram
  3. To change the number of bins (intervals)
    1. Select the x-axis
    2. Click ChartTools -> Format -> Axis Options
    3. You can select the bin width (interval width), number of bins, etc
    4. Histogram with relative frequency Screenshot 2024-11-12 at 6.50.47 PM.png
      VI.CROSS TABLE
  1. Choose your data
  2. Insert -> Charts -> Clustered Column
    Selecting more than one series (groups of data) will automatically prompt Excel to create a side-by-side bar (column) chart.
    VII.SCATTER PLOTS:
  1. Choose the two dataset you want to plot
  2. Insert -> Charts -> Scatter
Chart Description
Screenshot 2024-11-13 at 10.48.29 AM.png Scatter plot shows no pattern because completely vertical 'form' show no associations
Screenshot 2024-11-13 at 10.49.15 AM.png Scatter plot that looks in this way shows a linear pattern, meaning that the observations move together.

VIII.MEAN, MEDIAN, MODE:

Covariance Correlation
Covariance is a measure of the joint variability of two variables.
- A positive covariance means that the two variables move together.
- A covariance of 0 means that the two variables are independent
- A negative covariance means that the two variables move in opposite directions.

Covariance can take on values from - infinity to + infinity. This is a problem as it is very hard to put such numbers into perspective.`
Correlation is a measure of the joint variability of two variables. Unlike covariance, correlation could be thought of as a standardized measure. It takes on values between -1 and 1, thus it is easy for us to interpret the result.
- A correlation of 1, known as perfect positive correlation, means that one variable is perfectly explained by the other.
- A correlation of 0 means that the variables are independent.
- A correlation of -1, known as perfect negative correlation, means that one variable is explaining the other one perfectly, but they move in opposite directions.
Screenshot 2024-11-13 at 11.31.26 AM.png Screenshot 2024-11-13 at 11.37.48 AM.png
In Excel:
Sample covariance: =COVARIANCE.S()
Population covariance: =COVARIANCE.P()
In Excel:
=CORREL()