Goals of Assignment 2
Increase Familiarity with Definitions of Descriptive Statistics
Increase Familiarity with Statistical Methods
Increase Familiarity with Computer Programs
Part 1
Methods: Range, mean, median, mode, kurtosis, skewness, and standard deviation were defined by the author in his own words. Data supplied by Dr. Ryan Weichelt were used to hand calculate a standard deviation for two sets of data: the first from a sample of students' standardized test scores from Eau Claire North High School, the second from a sample of students' standardized test scores from Eau Claire Memorial. Microsoft Excel was then used to calculate the range, mean, median, mode, kurtosis, skewness, and standard deviation for the Eau Claire North and Eau Claire Memorial data samples. These results were then used to answer the following question: "Should Eau Claire North teachers worry about not having the highest test grade?"
Definitions
Range: The range is the difference between the highest and lowest scores (maximum score minus minimum score).
Mean: The mean is the calculated average of all observations, found by adding all observations together then dividing by the total number of observations.
Median: The median is the midpoint of the data set; half of the observations fall above this point and half fall below it. If the total number of observations is odd, the median is simply the middle number in the ordered data set; if the total number of observations is even, it is the average of the two middle observations.
Mode: The mode is the most frequently occurring data point found in a set of observations.
Kurtosis: Kurtosis (Fig. 1) describes the shape of the distribution curve, which can be mesokurtic (normal), leptokurtic (peaked), or platykurtic (flat). A value greater than 1 is considered leptokurtic while a value less than -1 is considered platykurtic. Lepto- and platykurtic curves take these shapes because, as in a normal distribution, about 68% of the observations have to fit within one standard deviation above and below the mean (Fig. 2). Leptokurtic distributions have smaller standard deviations, as more observations lie near the mean, while platykurtic distributions have larger standard deviations.
Figure 1: Kurtosis
(http://grants.hhp.coe.uh.edu/doconnor/PEP6305/KurtosisPict.jpg)
Figure 2: Normal Distribution
(http://img.tfd.com/dorland/distribution_normal.jpg)
Skewness: Skewness (Fig. 3) is a measure of the asymmetry of a distribution curve, caused by a higher- or lower-than-expected number of observations falling in the positive or negative end of the curve. If the value is positive, there is an extended tail on the positive side (because more observations are lower than expected); if it is negative, there is an extended tail on the negative side (because more observations are higher than expected). A value between -1 and 1 is considered normal.
Figure 3: Skewness
(https://www.isobudgets.com/wp-content/uploads/2015/10/skewness.jpg)
Standard Deviation: The standard deviation (Fig. 4) measures the spread of scores about the mean. It is found by subtracting the mean from each observation, squaring each result, summing the squares, dividing that sum by N (population) or n-1 (sample), and then taking the square root. In this assignment n-1 was used because the data are a sample, so the result is a sample standard deviation.
Figure 4: Sample and Population Standard Deviation Equations
(http://dsearls.org/courses/M120Concepts/ClassNotes/Statistics/StandardDeviation2a.gif)
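The definitions above can be sketched in Python. This is a minimal illustration, not the Excel workbook used in the assignment: the scores are invented, and `excel_skew` and `excel_kurt` are hypothetical helper names that follow the sample-adjusted formulas documented for Excel's SKEW and KURT functions.

```python
import statistics

# Hypothetical scores, invented for illustration (not the assignment data).
scores = [2, 4, 4, 5, 7, 9, 11]

score_range = max(scores) - min(scores)   # highest minus lowest
mean_score = statistics.mean(scores)      # sum of observations / count
median_score = statistics.median(scores)  # middle value of the ordered data
mode_score = statistics.mode(scores)      # most frequent value
sample_sd = statistics.stdev(scores)      # sample SD, divides by n - 1


def excel_skew(data):
    """Sample skewness, following the formula documented for Excel's SKEW."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def excel_kurt(data):
    """Sample excess kurtosis, following the formula documented for Excel's KURT."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


print("range:", score_range)
print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
print("sample SD:", round(sample_sd, 2))
print("skewness:", round(excel_skew(scores), 2))
print("kurtosis:", round(excel_kurt(scores), 2))
```

For these invented scores the skewness is positive (a longer tail toward the high scores) and the kurtosis falls between -1 and 1, which the thresholds above would call mesokurtic.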
Hand Calculation
Eau Claire North Test Score Data Hand Calculation
Figure 5: Eau Claire North Data
Eau Claire Memorial Test Score Data Hand Calculation
Figure 6: Eau Claire Memorial Data
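The steps of the hand calculation can be sketched as follows. The scores here are hypothetical (the actual North and Memorial samples appear only in the figures), and `sample_std_by_hand` is an illustrative name, not part of the assignment.

```python
import math
import statistics


def sample_std_by_hand(scores):
    """Sample standard deviation, one hand-calculation step at a time."""
    n = len(scores)
    mean = sum(scores) / n                    # step 1: mean of the observations
    deviations = [x - mean for x in scores]   # step 2: observation minus mean
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    sum_of_squares = sum(squared)             # step 4: add the squares together
    variance = sum_of_squares / (n - 1)       # step 5: divide by n - 1 (sample)
    return math.sqrt(variance)                # step 6: take the square root


# Hypothetical test scores, invented for illustration only.
hypothetical_scores = [150, 160, 170, 180, 190]

by_hand = sample_std_by_hand(hypothetical_scores)
library = statistics.stdev(hypothetical_scores)  # same n - 1 definition
print(round(by_hand, 2), round(library, 2))
```

Comparing the step-by-step result with `statistics.stdev` is a quick way to check a hand calculation, much as Excel was used to check the hand-calculated values here.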
Results
Eau Claire North Test Scores
These data suggest that teachers at Eau Claire North have no reason to worry about being fired over the perception that their students score lower than students at Eau Claire Memorial. The four descriptive statistics that are best for determining this are the range, median, mean, and kurtosis. I chose these for the simple fact that they portray score differences in a manner that is easy to explain to people with little knowledge of descriptive statistics. EC North students have a smaller range (83 compared to 91), which shows us that their test scores are less widely dispersed than the EC Memorial students' scores; i.e. the lowest score attained at North (111) was closer to the highest score attained at North (194) than the lowest score at Memorial (107) was to the highest score at Memorial (198). This may suggest that the North students taking the test were better prepared at all levels of ability than the students at Memorial. The median was higher at North (164.5) than at Memorial (159.5). This tells us that half of the North students in this sample scored higher than 164.5 and half scored lower, compared to Memorial's median of 159.5. The mean test scores in this sample show that North, with a 160.92 average, had a higher mean test score than Memorial, at 158.54. Kurtosis refers to the shape of the distribution; the kurtosis of the sample of test scores at North (-0.56) is between -1 and 1, so the curve is considered normal. The kurtosis of the sample from Memorial (-1.17) is less than -1, so the curve is considered platykurtic. This flatness of the curve is due to a larger standard deviation (27.16) and reflects the wide dispersion of scores about the mean; fewer Memorial scores fall near the Memorial mean (158.54) than North scores fall near the North mean (160.92), i.e. the scores at North were closer to the mean.
The students at North, on average, performed better on this standardized test. These statistics show that the teachers and students at North are doing quite well and that public perceptions are not only unjustified but probably incorrect. This analysis used descriptive statistics on a sample of test scores from each school; we are describing what we see and can make assumptions about these data, but we cannot make any inferences from them.
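The comparison above can be reproduced from the reported summary values alone. The sketch below uses only the figures quoted in this section; the dictionary layout and the `classify_kurtosis` helper are illustrative assumptions, and the -1 and 1 cutoffs are the ones given in the Definitions.

```python
def classify_kurtosis(value):
    """Label a kurtosis value using the -1 / 1 cutoffs from the Definitions."""
    if value > 1:
        return "leptokurtic"
    if value < -1:
        return "platykurtic"
    return "mesokurtic"


# Summary values as reported in the Results text above.
reported = {
    "North": {"low": 111, "high": 194, "median": 164.5,
              "mean": 160.92, "kurtosis": -0.56},
    "Memorial": {"low": 107, "high": 198, "median": 159.5,
                 "mean": 158.54, "kurtosis": -1.17},
}

summary = {}
for school, stats in reported.items():
    summary[school] = {
        "range": stats["high"] - stats["low"],          # highest minus lowest
        "shape": classify_kurtosis(stats["kurtosis"]),  # curve shape label
    }
    print(school, summary[school], stats["median"], stats["mean"])
```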
Part 2
Methods: The shapefile used in Assignment 1 was added to a blank map in ArcGIS, and a Microsoft Excel spreadsheet containing Wisconsin population data by county, provided by Dr. Ryan Weichelt, was joined to the shapefile based on countyGeo_id. Three spatial statistic analyses were performed: a geographic mean center of Wisconsin (Toolbox/Spatial Statistics/Measuring Geographic Distributions/Mean Center), a weighted mean center of population using 2000 Wisconsin county population data (Toolbox/Spatial Statistics/Measuring Geographic Distributions/Mean Center/Weight/2000), and finally a weighted mean center of population using 2015 Wisconsin county population data (Toolbox/Spatial Statistics/Measuring Geographic Distributions/Mean Center/Weight/2015).
Mean Center: The mean center is a spatial measure of central tendency; in the map below it is the red point, which marks the geographic center of the state of Wisconsin, found by averaging X and Y values (longitude and latitude).
Weighted Mean Center: A weighted mean center takes the frequencies in a data set into account; in the map below the mean centers were weighted by population. More populous counties carry a heavier weight, which pulls the mean center toward them.
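Both measures can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the ArcGIS Mean Center tool itself: it assumes each county is reduced to a single (X, Y) centroid, and the four coordinates and populations below are invented.

```python
def mean_center(points):
    """Unweighted mean center: average the X values and the Y values."""
    n = len(points)
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    return x, y


def weighted_mean_center(points, weights):
    """Weighted mean center: each point counts in proportion to its weight."""
    total = sum(weights)
    x = sum(p[0] * w for p, w in zip(points, weights)) / total
    y = sum(p[1] * w for p, w in zip(points, weights)) / total
    return x, y


# Hypothetical county centroids (X = east-west, Y = north-south).
centroids = [(0, 0), (4, 0), (0, 4), (4, 4)]   # SW, SE, NW, NE
# Hypothetical populations; the southeast county is the most populous.
populations = [100, 700, 100, 100]

center = mean_center(centroids)
weighted = weighted_mean_center(centroids, populations)
print("mean center:", center)
print("weighted mean center:", weighted)
```

With the invented weights, the weighted center sits east and south of the unweighted center, which is the same kind of pull toward populous counties described above.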
Results: The green point represents the weighted mean center of population in 2000, and the blue point represents that of 2015. Both points are pulled to the southeast of the geographic center of Wisconsin (the red point), as the southern and eastern regions of Wisconsin have a higher population than the northern and western parts of the state. This is due to high populations in cities such as Milwaukee (southeast corner of the state), Madison (south-central), and Green Bay (east-central). We observe, however, that the weighted mean center of population in 2015 has been pulled slightly to the southwest of the weighted mean center of population observed in 2000. An explanation for this is a slight increase in county populations in the southern and western parts of the state that is greater than the increase in the northern and eastern counties.