Through the use of numerical measures, the motion picture industry can be analyzed more precisely. Descriptive statistics help analysts describe data in terms of location, variability, and the association between two variables. They also support exploratory analysis of a distribution's shape, the relative location of individual values, and the identification of outliers. The data examined here cover four variables for a sample of 100 movies: opening gross income, total gross income, number of theaters, and weeks in the top 60. Together, these variables reveal useful information about the motion picture industry.

Measures of location describe where the values of a data set fall, either around a central point (central tendency) or at particular positions within the ordered data. These measures include the mean, the median, the mode, percentiles, and quartiles. The mean is the arithmetic average of the data set. The median is the middle value when the data are arranged in ascending order; when the number of observations is even, it is the average of the two middle values. The mode is the value that occurs most frequently, and a data set can have more than one mode. Percentiles and quartiles describe the position of values within the ordered data. The pth percentile is a value such that approximately p percent of the observations are less than or equal to it, while the quartiles divide the data into four parts, each containing approximately 25% of the observations; the second quartile is the median (Anderson, Sweeney, & Williams, 2012).
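As an illustration, the following Python sketch computes these measures of location for a small hypothetical sample; the values are invented for illustration and are not the movie data. It uses the standard-library statistics module. The "inclusive" quartile method is an assumed convention: textbooks and statistical software interpolate percentiles in slightly different ways, so reported quartiles can vary a little between tools.

```python
import statistics

# Hypothetical sample values for illustration only (not the movie data set).
sample = [2, 3, 3, 5, 7, 8, 9, 12, 15, 40]

mean = statistics.mean(sample)          # arithmetic average
median = statistics.median(sample)      # middle value of the sorted data
modes = statistics.multimode(sample)    # every most-frequent value (there may be several)

# Quartiles Q1, Q2 (the median), and Q3. The "inclusive" method is one of
# several interpolation conventions; other tools may give slightly different values.
q1, q2, q3 = statistics.quantiles(sample, n=4, method="inclusive")

# The 90th percentile: quantiles(n=100) returns the 1st through 99th percentiles.
p90 = statistics.quantiles(sample, n=100, method="inclusive")[89]

print(f"mean={mean}, median={median}, modes={modes}")
print(f"Q1={q1}, Q2={q2}, Q3={q3}, P90={p90}")
```

In this sample the single large value of 40 pulls the mean (10.4) above the median (7.5).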

Measures of variability describe how dispersed the values in a data set are. Variability is measured with the range, the interquartile range, the variance, the standard deviation, and the coefficient of variation. The range is computed by subtracting the smallest value from the largest value; it is rarely used on its own because it depends only on the two most extreme values and is therefore heavily influenced by outliers. The interquartile range (IQR) is the difference between the third quartile and the first quartile (Q3 - Q1). Because it covers only the middle 50% of the data, it is much less affected by the outliers that can distort the range. The variance uses every observation: it is based on the squared deviation of each value from the mean, and for a sample the sum of squared deviations is divided by n - 1. The standard deviation is the positive square root of the variance, so it expresses dispersion in the same units as the original data. A low standard deviation indicates that values cluster closely around the mean, while a high one indicates wide dispersion. Finally, the coefficient of variation expresses the standard deviation as a percentage of the mean, which allows the relative variability of data sets measured on different scales to be compared (Anderson, Sweeney, & Williams, 2012).
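The following Python sketch applies these measures of variability to a small hypothetical sample (invented values, not the movie data) using the standard-library statistics module. The quartile convention is an assumption, as different tools interpolate quartiles slightly differently. The single large value shows how an outlier inflates the range and standard deviation far more than the interquartile range.

```python
import statistics

# Hypothetical sample values for illustration only (not the movie data set).
sample = [2, 3, 3, 5, 7, 8, 9, 12, 15, 40]

data_range = max(sample) - min(sample)   # largest value minus smallest value

# Interquartile range: Q3 - Q1, the spread of the middle 50% of the data.
# The "inclusive" quartile convention is an assumption; other tools may differ slightly.
q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1

mean = statistics.mean(sample)
variance = statistics.variance(sample)   # sample variance: sum of squared deviations / (n - 1)
std_dev = statistics.stdev(sample)       # positive square root of the sample variance

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = std_dev / mean * 100

print(f"range={data_range}, IQR={iqr}")
print(f"variance={variance:.2f}, std dev={std_dev:.2f}, CV={cv:.1f}%")
```

Here the range (38) is driven almost entirely by the value 40, while the IQR (7.75) reflects only the middle half of the data.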

In statistical analysis, it is also important to examine a data set's distribution shape and the relative location of individual values, and to detect outliers. The shape of a distribution can be visualized with a histogram, and skewness provides a numerical measure of that shape. Skewness can be positive, negative, or zero; a symmetric distribution has a skewness of zero. Skewness indicates whether the values trail off in a longer tail on one side of the mean than on the other. For example, a distribution that is highly skewed to the right has positive skewness, and its mean is typically greater than its median. This pattern suggests that a relatively small number of large values are pulling the mean upward. Another important measure is the relative location of a value within the data set. The z-score is the number of standard deviations that a value lies above or below the mean, computed as the value minus the mean, divided by the standard deviation. Finally, it is important for any analyst to identify any outliers that exist within the data set.
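A minimal Python sketch of skewness and z-scores follows. It assumes the adjusted sample skewness formula, n / ((n - 1)(n - 2)) multiplied by the sum of cubed z-scores, which is the formula behind spreadsheet functions such as Excel's SKEW; other software may apply a slightly different adjustment. The helper names sample_skewness and z_scores are hypothetical, and the sample is invented rather than taken from the movie data.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3).

    One common formula (the one used by spreadsheet functions such as Excel's
    SKEW); other software may use a slightly different adjustment.
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def z_scores(values):
    """z-score of each value: how many sample standard deviations it lies from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

# Hypothetical right-skewed sample (not the movie data set) with one large value.
sample = [2, 3, 3, 5, 7, 8, 9, 12, 15, 40]

skew = sample_skewness(sample)
print(f"skewness={skew:.2f}")
print(f"mean={statistics.mean(sample)}, median={statistics.median(sample)}")
print([round(z, 2) for z in z_scores(sample)])
```

The skewness is positive and the mean exceeds the median, matching the right-skewed pattern described above; the large value of 40 also has the largest z-score.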

Identifying outliers is important because an outlier may be a data-entry error that should be corrected, or an unusual but valid value that warrants closer examination. Chebyshev's theorem and the empirical rule both describe how data are dispersed around the mean. Chebyshev's theorem applies to any data set and states that at least 1 - 1/z^2 of the values lie within z standard deviations of the mean, for any z greater than 1 (for example, at least 75% of the values lie within two standard deviations). The empirical rule applies to bell-shaped distributions: approximately 68% of values lie within one standard deviation of the mean, approximately 95% within two, and almost all within three. As a practical rule, the z-score can therefore be used to identify outliers: a z-score greater than +3 or less than -3 indicates a possible outlier, because such a value lies more than three standard deviations from the mean (Anderson, Sweeney, & Williams, 2012).
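The sketch below shows Chebyshev's lower bound and the z-score rule for flagging possible outliers. The helper names chebyshev_min_share and flag_outliers, the default threshold of 3, and the sample values are illustrative assumptions rather than anything from the movie data.

```python
import statistics

def chebyshev_min_share(k):
    """Chebyshev's theorem: at least 1 - 1/k**2 of the values lie within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem applies only for k > 1")
    return 1 - 1 / k ** 2

def flag_outliers(values, threshold=3.0):
    """Return the values whose z-score is above +threshold or below -threshold."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / s) > threshold]

# Hypothetical sample (not the movie data set): small values plus one extreme value.
sample = [8, 9, 10, 11, 12] * 4 + [100]

for k in (2, 3):
    print(f"Chebyshev: at least {chebyshev_min_share(k):.1%} within {k} std devs")
print("possible outliers:", flag_outliers(sample))
```

Because the rule uses the absolute value of the z-score, it flags unusually small values (below -3) as well as unusually large ones (above +3).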

Finally, statistical analysis is useful for examining the relationship between two variables. A scatter diagram can be used to visualize such a relationship, while covariance and the correlation coefficient measure it numerically. Covariance measures the linear association between two variables, but its magnitude depends on the units of measurement. The correlation coefficient removes this dependence by dividing the covariance by the product of the two variables' standard deviations. The correlation coefficient ranges from -1 to +1; values of +1 and -1 indicate a perfect positive and a perfect negative linear relationship, respectively. A correlation of zero indicates no linear relationship between the variables (Anderson, Sweeney, & Williams, 2012).
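The following sketch computes the sample covariance and the correlation coefficient directly from their definitions, using invented paired values rather than the movie data; the helper names are hypothetical. Multiplying one variable by 1000, as a change of units would, changes the covariance but leaves the correlation unchanged.

```python
import statistics

def sample_covariance(x, y):
    """Sample covariance: sum((x_i - x_mean) * (y_i - y_mean)) / (n - 1)."""
    n = len(x)
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    return sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical paired observations (not the movie data set).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

# Rescaling x (for example, a change of units) changes the covariance but not the correlation.
x_rescaled = [v * 1000 for v in x]

print(sample_covariance(x, y), sample_covariance(x_rescaled, y))
print(round(correlation(x, y), 4), round(correlation(x_rescaled, y), 4))
```

Python 3.10 and later also provide statistics.covariance and statistics.correlation, which compute the same sample quantities.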

Opening gross revenues for the motion picture industry showed a large range of 108.43 because of strongly performing outliers such as Star Wars, Harry Potter and the Goblet of Fire, and War of the Worlds. The z-scores for these three movies exceeded +3, indicating that they lay more than three standard deviations above the mean. Skewness was 3.43, indicating a distribution highly skewed to the right as a result of these outliers. The mean for all 100 movies was 9.37, while the median was 0.93. Because the outliers pull the mean upward, the median of 0.93 is likely a better measure of central tendency for the industry as a whole. The standard deviation was 18.87 and the interquartile range was 12.34. Because the outliers inflate the standard deviation, the interquartile range, which spans only the middle 50% of the data, is likely the more representative measure of variability.

Total gross revenues for the motion picture industry showed a large range of 380.15 because of the same strongly performing outliers: Star Wars, Harry Potter and the Goblet of Fire, and War of the Worlds. The z-scores for these three movies exceeded +3, indicating that they lay more than three standard deviations above the mean. Skewness was 3.28, indicating a distribution highly skewed to the right as a result of these outliers. The mean for all 100 movies was 33.4, while the median was 5.13. Because the outliers pull the mean upward, the median of 5.13 is likely a better measure of central tendency for the industry as a whole. The standard deviation was 63.16 and the interquartile range was 46.95. Because the outliers inflate the standard deviation, the interquartile range is likely the more representative measure of variability, as it is not influenced by the outliers.

The number of theaters had a range of 3905. No movie had a z-score beyond the +3 or -3 outlier threshold, indicating that the number of theaters was more evenly distributed across movies. Skewness was 0.56, considerably lower than for the two gross revenue variables, indicating only a modest right skew. The mean for all 100 movies was 1278, while the median was 387. Even so, because the distribution is skewed to the right, the median is likely the more representative measure of central tendency. The standard deviation was 1378 and the interquartile range was 2529. Because of the skewness, the interquartile range is likely the more appropriate measure of variability, since it is not influenced by extreme values.

Finally, the number of weeks in the top 60 had a range of 26. No movie had a z-score beyond the +3 or -3 outlier threshold, so the weeks-in-top-60 values were more evenly distributed. Skewness was 0.67, indicating only a modest right skew. The mean for all 100 movies was 8.7, while the median was 7.0. Even so, because the distribution is skewed to the right, the median is likely the more representative measure of central tendency. The standard deviation was 6.39 and the interquartile range was 10.0. Because of the skewness, the interquartile range is likely the more appropriate measure of variability, as it is not influenced by the values that approached a z-score of +3.
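As a cross-check, the short sketch below uses only the means and standard deviations reported in the four paragraphs above. For each variable, it computes the value a movie would need to exceed for its z-score to pass +3 (the mean plus three standard deviations). It also computes each variable's coefficient of variation. The values are in the original units of each variable, which this section does not state. For weeks in the top 60, the cutoff of about 27.9 weeks sits just above what a range of 26 allows, which is consistent with values approaching, but not reaching, a z-score of +3.

```python
# Summary statistics reported above for the sample of 100 movies:
# (mean, standard deviation) for each variable, in the units of the original data.
reported = {
    "opening gross": (9.37, 18.87),
    "total gross": (33.4, 63.16),
    "number of theaters": (1278, 1378),
    "weeks in top 60": (8.7, 6.39),
}

# A value x has z = (x - mean) / sd, so z > +3 exactly when x > mean + 3 * sd.
cutoffs = {name: mean + 3 * sd for name, (mean, sd) in reported.items()}

# Coefficient of variation: standard deviation as a percentage of the mean.
cvs = {name: sd / mean * 100 for name, (mean, sd) in reported.items()}

for name in reported:
    print(f"{name}: z > +3 above {cutoffs[name]:.2f}, CV = {cvs[name]:.1f}%")
```

The coefficients of variation (roughly 201%, 189%, 108%, and 73%) make the relative dispersion of the four variables directly comparable despite their different scales.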

Opening gross sales were most strongly correlated with total gross sales, with a correlation coefficient of .964, which approaches a perfect positive correlation of +1. The next highest correlation was between opening gross and number of theaters at .714, followed by a modest correlation between opening gross and weeks in the top 60 at .453. None of the variable pairs examined showed a negative correlation.

As the data illustrate, in all four cases the median measured central tendency more accurately than the mean because of the right skewness in the data. For similar reasons, variability was better measured by the interquartile range than by the standard deviation. This conclusion reflects the presence of outliers, or of values close to the outlier threshold. Descriptive statistics provide a foundation for analysis, and measures of location, variability, shape, and association together offer useful insight into the performance of the industry overall.

References

Anderson, D. R., Sweeney, D. J., & Williams, T. A. (2012). Essentials of statistics for business and economics (Revised 6th ed.). Mason, OH: South-Western Cengage Learning.