The Likert format arises from a scale for measuring attitudes (Kaplan & Saccuzzo, 2001), the first of which was devised in 1932 by Rensis Likert (Edmondson, 2005). Likert developed the scale in 1932 to measure psychological attitudes in a “scientific” way, and in 1934 he expanded upon the scaling techniques developed by Thurstone (Edmondson, 2005). Because Thurstone’s scale requires an elaborate procedure and the use of judges, the Likert scale was easier to administer while yielding equally satisfactory reliability (Edmondson, 2005; Anastasi & Urbina, 2002). In their contemporary form, Likert scales are a non-comparative scaling technique and are unidimensional in nature, that is, they measure only a single trait (Bertram, 2007).
Respondents are asked to indicate their level of agreement with a given statement on an ordinal scale. The most common form is a 5-point scale ranging from “Strongly Disagree” at one end to “Strongly Agree” at the other, with “Neither Agree nor Disagree” in the middle; however, some practitioners advocate 7- and 9-point scales, which add granularity. Sometimes a 4-point (or other even-numbered) scale is used to produce an ipsative (forced-choice) measure in which no indifferent option is available (Kaplan & Saccuzzo, 2001).
One of the biggest advantages of Likert scales is that they do not expect a simple yes or no answer from the respondent but allow for degrees of opinion, and even no opinion at all. Quantitative data can therefore be obtained and analyzed with relative ease (Edmondson, 2005; Anastasi & Urbina, 2002). According to Bertram (2007), Likert scales are also simple to construct, likely to produce a highly reliable scale, and, most importantly, easy for participants to read and complete. However, several issues may affect the reliability of a Likert scale; some of these are discussed in more detail below.
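The response formats described above can be sketched in a few lines of Python. This is only an illustration: the end and middle labels come from the text, while the intermediate labels “Disagree” and “Agree”, the numeric codes 1 to 5, and the function names are assumptions made for the example.

```python
# Hypothetical 5-point agreement scale. The end and middle labels follow the
# text; the two intermediate labels and the codes 1-5 are assumptions.
FIVE_POINT = [
    "Strongly Disagree",
    "Disagree",
    "Neither Agree nor Disagree",
    "Agree",
    "Strongly Agree",
]


def encode(responses, labels=FIVE_POINT):
    """Map response labels to ordinal codes 1..k (only the order is meaningful)."""
    code = {label: rank for rank, label in enumerate(labels, start=1)}
    return [code[r] for r in responses]


def drop_midpoint(labels):
    """Return an even-numbered (forced-choice) variant without the middle label."""
    if len(labels) % 2 == 0:
        raise ValueError("scale has no single middle category")
    mid = len(labels) // 2
    return labels[:mid] + labels[mid + 1:]


answers = ["Agree", "Strongly Agree", "Neither Agree nor Disagree", "Disagree"]
codes = encode(answers)
four_point = drop_midpoint(FIVE_POINT)
print(codes)
print(four_point)
```

The codes record rank only; nothing in this sketch implies that the distance between adjacent categories is equal.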
Issue of Use of Judges
Even though the Likert scale was originally created to reduce the reliance on judges, the procedure for rating items after they are generated still requires an expert to rate each item (Edmondson, 2005).
Issues of Response Obtained
According to Chang (1994), the response condition, the respondent’s attitude, and the attitude reflected by the items together determine the responses on a Likert scale. The heterogeneity of the reference frames that people employ when responding to a Likert scale, such as their experience with research and quantitative methodology, would therefore need to be assessed (Chang, 1994). Other issues may also occur. One is central tendency bias, in which participants avoid extreme response categories. Another is acquiescence bias, in which participants agree with statements as presented in order to “please” the experimenter, base their answers on their feelings toward the surveyor, or answer according to what they feel is expected of them as participants. Participants may also show social desirability bias, portraying themselves in a more socially favorable light rather than answering honestly (Kim, 2010).
Issue of Statistics
Another major issue with the use of the Likert scale is the assumption by researchers that it is an interval scale when it is in fact an ordinal scale (Edmondson, 2005). This assumption leads researchers to compute statistics such as the mean and standard deviation even though these are inappropriate for ordinal data. The legitimacy of assuming an interval scale for Likert-type categories is therefore an important issue: the appropriate descriptive and inferential statistics differ for ordinal and interval variables, and a researcher who uses the wrong statistical technique increases the chance of reaching the wrong conclusion about the significance of the research (Jamieson, 2004). Inferential statistics from a Likert scale are also affected by the presumption that there is an underlying (latent or natural) continuous variable whose value characterizes the respondents’ attitudes and opinions. The measurement scale would be an interval scale if it were possible to measure the latent variable directly (Clason & Dormody, 1989).
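A small sketch may make the ordinal-versus-interval concern concrete. The two response sets below are invented for illustration, coded 1 (“Strongly Disagree”) to 5 (“Strongly Agree”), and the function name ordinal_summary is hypothetical. Both sets have the same mean, yet summaries that use only category order and counts show that they describe very different groups.

```python
from collections import Counter
from statistics import mean, median_high, median_low, multimode

# Invented responses, coded 1 (Strongly Disagree) to 5 (Strongly Agree).
neutral_group = [3, 3, 3, 3, 3, 3]
polarized_group = [1, 1, 1, 5, 5, 5]


def ordinal_summary(codes):
    """Summaries that rely only on the order and counts of categories."""
    return {
        "frequencies": dict(sorted(Counter(codes).items())),
        "median_low": median_low(codes),    # lower middle value, always an observed category
        "median_high": median_high(codes),  # upper middle value, always an observed category
        "modes": multimode(codes),          # every most-frequent category
    }


# The means coincide even though the groups answered very differently.
print(mean(neutral_group), mean(polarized_group))
print(ordinal_summary(neutral_group))
print(ordinal_summary(polarized_group))
```

Frequencies, medians, and modes are used here as examples of ordinal-level descriptives; which statistics are appropriate for a given study remains a judgment for the researcher.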
However, according to Goldstein and Hersen (1984), the level of scaling obtained from the Likert procedure is rather difficult to determine.
Issue of Decision Making for the Appropriate Likert Scale Format
When constructing items for a Likert scale, a researcher must decide how many scale categories to use, whether the scale should be balanced or unbalanced, whether to use an odd or even number of categories, and whether the choice should be forced or non-forced.
Number of response categories. Kim (2010) found that the optimal number of response categories for a Likert scale has not been definitively determined. Too few categories make it difficult to investigate respondents’ attitudes precisely and to analyze the data: the scale becomes a coarse instrument, and much of the discriminative power that raters are capable of is lost.
On the other hand, too many categories might yield more precise investigation of respondents’ attitudes, but they may induce fatigue and unreliable responses. One could also grade a scale so finely that it exceeds the rater’s limited powers of discrimination (Kim, 2010).
Midpoint category. Deciding whether a scale should have a mid-point category can also be challenging. A mid-point is useful in cases of apathy or refusal to respond, and respondents might use it when they have no idea about the question. A mid-point can also help respondents express their attitudes precisely. In contrast, those who prefer a scale without a mid-point argue that a mid-point creates the possibility of information loss, and that respondents tend to respond more precisely, after careful consideration, when the mid-point is eliminated (Kim, 2010). Inclusion of a mid-point is also related to other issues such as social desirability, cultural characteristics, response tendency, and respondents’ interpretation of meaning (Kim, 2010).
According to a study by Kim (2010), as the number of categories increased, responses tended to lean toward the right (agreement), and respondents tended not to select the mid-point. The conceptual construct of a scale with many categories also tended to differ from that of scales with fewer categories, and its construct validity was slightly poorer. More interestingly, reliability increased as the number of categories increased. When a scale included a mid-point, responses leaned toward the right (agreement) less. However, unlike the number of categories, which influenced most of the statistical results, mid-point inclusion did not seem to influence results such as construct validity or reliability (Kim, 2010). Thus, it is hard to determine which scale format is better among those varying in number of categories and mid-point inclusion.
Each scale format has merits and defects in various statistical properties such as descriptive statistics, validity, and reliability. Researchers therefore have to rely on the empirical setting or the objectives of the survey (Kim, 2010). Kim (2010) suggests that if researchers prefer more definite rather than neutral responses, excluding the mid-point or, if it is included, using many response categories would be preferable. If the aim of the research is to measure the overall conceptual construct with greater validity and specificity, a smaller number of categories would be preferable. However, if a scale that accounts for more variance is desirable, or where internal consistency is needed, more categories would be recommended. This is because the number of categories is a more crucial element of instrument design than the inclusion of a mid-point (Kim, 2010).
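Internal consistency is often summarized with Cronbach’s alpha. The text does not say which index Kim (2010) used, so the choice of alpha here is an assumption, and the data and the function name cronbach_alpha are invented for illustration. The sketch uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer every item")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)  # sample variances
    total_var = variance([sum(r) for r in rows])  # variance of total scores
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Invented data: each respondent answers all three items identically.
consistent = [[1, 1, 1], [3, 3, 3], [5, 5, 5], [2, 2, 2]]
# Invented data: two items that agree only partly.
mixed = [[1, 2], [2, 1], [3, 4], [4, 3]]

print(round(cronbach_alpha(consistent), 3))
print(round(cronbach_alpha(mixed), 3))
```

Note that this calculation treats the item codes as numbers, so it is itself subject to the ordinal-versus-interval debate discussed earlier.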
In conclusion, Likert scales are easy to administer, quick, and economical. They are easily adapted to most attitude measurement situations and provide direct and reliable assessment of attitudes when the scales are well constructed. In fact, evidence shows that children prefer the Likert scale to the numeric and simple Visual Analogue Scale and find it the easiest to complete. It is therefore recommended for use in questionnaires for children (Laerhoven, Zaag-Loonen, & Derkx, 2007). However, as the issues above make clear, the results of a Likert scale can easily be affected by social desirability on the part of respondents or by inappropriately constructed items.
Also, inferences drawn can be misleading, since intervals between points on the scale do not represent equal changes in attitude for all individuals: the difference between “strongly agree” and “agree” may be small for one individual and large for another. Internal consistency of the scale may be difficult to achieve, so unidimensionality should be ensured by aiming items at a single person, group, event, or method. Items that are reliable and valid are also difficult to construct, and expert ratings from judges may often be required, which can involve a long and elaborate procedure. According to Clason and Dormody (1989), statistical procedures that meaningfully answer the research questions, maintain the richness of the data, and are not subject to scaling debates should be the methods of choice in analyzing Likert-type items.
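One family of procedures that uses only the ordering of responses is rank-based comparison. The sketch below computes the Mann-Whitney U statistic for two groups of item responses. It is offered as one possible example, not as a method named in the sources cited here; the data and function names are invented, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: each pair with a > b counts 1, each tie 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def share_ranked_higher(group_a, group_b):
    """Share of all (a, b) pairs in which a ranks above b, with ties split."""
    return mann_whitney_u(group_a, group_b) / (len(group_a) * len(group_b))


# Invented responses from two groups to one item, coded 1-5.
class_a = [4, 5, 4, 3, 5]
class_b = [2, 3, 3, 1, 4]

print(mann_whitney_u(class_a, class_b))
print(share_ranked_higher(class_a, class_b))
```

Because only comparisons between responses enter the calculation, the result does not depend on assuming equal distances between adjacent categories.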
References
Anastasi, A., & Urbina, S. (2002). Measuring interests and attitudes. Psychological Testing (7th ed.). Delhi: Pearson Education.
Bertram, D. (2007). Likert Scales. Journal of Educational Measurement, 11, 45-49.
Chang, L. (1994). Psychometric Evaluation of 4-Point and 6-Point Likert-Type Scales in Relation to Reliability and Validity. Journal of Applied Psychological Measurement, 18(3).
Clason, D.L., & Dormody, T.J. (1989). Analyzing Data Measured by Individual Likert-Type Items. Journal of Agricultural Education, 35(4).
Edmondson, D.R. (2005). Likert Scales: A History. The Research Methods Knowledge Base. Cincinnati, OH: Atomic Dog Publishing.
Goldstein, G., & Hersen, M. (1984). Handbook of Psychological Assessment. New York: Pergamon Press.
Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38, 1212-1218.
Kaplan, R.M., & Saccuzzo, D.P. (2001). Writing and evaluating test items. Psychological Testing: Principles, Applications and Issues (5th ed.). Singapore: Wadsworth.
Kim, S. (2010). The Influence of Likert Scale Format on Response Result, Validity, and Reliability of Scale: Using Scales Measuring Economic Shopping Orientation. Journal of the Korean Society of Clothing and Textiles, 34(6), 913-927.
Laerhoven, H., Zaag-Loonen, H.J., & Derkx, B.H.F. (2007). A comparison of Likert scale and visual analogue scales as response options in children’s questionnaires. Acta Paediatrica, 93(6), 830-835.