Academic Exchange Quarterly Fall 2004: Volume 8, Issue 3
Collaborative Testing and Test Performance
William Breedlove, Ph.D., is Associate Professor of Sociology and Director of the Center for Effective Teaching and Learning. His research interests include the scholarship of teaching and learning and comparative sociology.
Tracy Burkett, Ph.D., is Assistant Professor of Sociology. Her research interests include social network analysis and political sociology.
Idee Winfield, Ph.D., is Associate Professor of Sociology and Associate Dean of the
Does collaboration on a test, independent of prior collaborative learning, affect test performance? In almost all the descriptions, demonstrations, and studies of collaborative testing and academic achievement, the effect of collaborative learning on achievement is conflated with the effect of collaborative testing alone. The results of this study show that collaborative testing alone has a significant positive association with test performance that varies by the level of cognitive processing reflected in the test question.
Collaborative learning techniques have been widely used at the primary and secondary levels of education for some time now. The effects of collaborative learning are numerous and generalizable across different populations and settings. Perhaps in reaction to this growing body of evidence and because more emphasis is being placed on quality of teaching, increasing numbers of higher education faculty have come to experiment with and adopt collaborative learning. A review of the evidence for collaborative learning reveals that in almost all the descriptions, demonstrations, and studies of collaboration, the effect of collaborative learning on performance is conflated with the possible effect of collaborative testing alone on student achievement. That is, most studies present cases where students either engaged in both collaborative learning and collaborative testing or, more commonly, engaged in collaborative learning but were evaluated individually. As a result, there currently is little evidence or knowledge of the effect of collaboration on test performance independent of prior collaborative learning experience or training. This study seeks to answer the question of whether collaborative testing alone affects academic achievement, that is, when students do not also engage in collaborative learning.
Evidence of collaborative testing effects is sparse. By contrast, a large body of research has documented beneficial effects of collaborative learning across diverse populations and disciplines. Studies have found positive effects among elementary school children (Billington, 1994; Fuchs, Fuchs, Karns, Hamlett, Katzaroff, & Dutka, 1998), developmental students (Ley, Hodges & Young, 1995), and college students (Giraud & Enders, 2000; Gokhale, 1995; Grzelkowski, 1987; Guest & Murphy, 2000; Hanshaw, 1982; Helmericks, 1993; Morgan, 2003; Muir & Tracy, 1999; Nowak, Miller, & Washburn, 1996; Rau & Heyl, 1990; Rinehart, 1999; Russo & Warren, 1999; Sernau, 1995; Zimbardo, Butler, & Wolfe, 2003). Significant effects have been found across disciplines including business (Nowak et al, 1996), education (Morgan, 2003; Muir & Tracy, 1999), English composition (Russo & Warren, 1999), industrial technology (Gokhale, 1995), psychology (Guest & Murphy, 2000; Ley et al, 1995; Zimbardo et al, 2003), science (Hanshaw, 1982), sociology (Grzelkowski, 1987; Helmericks, 1993; Rau & Heyl, 1990; Rinehart, 1999; Sernau, 1995), and statistics (Giraud & Enders, 2000).
These studies and others find that collaborative learning leads to gains in both academic and nonacademic areas of student development. Among the many academic gains are improved performance on exams, greater retention of information, greater transfer of knowledge, and increased complexity of thought (Gamson, 1994; Johnson, Johnson, & Stanne, 2000). Additionally, collaborative learning fosters cooperation and connections with others (Muir & Tracy, 1999; Rau & Heyl, 1990), develops skills critical to workplace success such as team building and teamwork skills (Nowak et al, 1996; Russo & Warren, 1999), humanizes the learning experience (Grzelkowski, 1987), eliminates cheating (Grzelkowski, 1987; Ley et al, 1995), and is associated with higher levels of student satisfaction and motivation to learn (Chickering & Gamson, 1991; Fuchs et al, 1998; Giraud & Enders, 2000; Sernau, 1995; Slavin, 1980). Finally, collaboration lowers test anxiety which may, in turn, improve test performance since high levels of test anxiety have been found to negatively affect recall of learned information (Grzelkowski, 1987; Hanshaw, 1982; Helmericks, 1993; Ley et al, 1995; Muir & Tracy, 1999; Russo & Warren, 1999).
How might collaborative testing, independent of collaborative learning, affect test performance? One means is through reduced test anxiety and stress, both of which are associated with lower test performance. Post-collaborative test surveys show that students believe their levels of anxiety and stress were reduced by being allowed to work together (Morgan, 2003; Zimbardo et al, 2003). Another is that through working together, students can build upon each other’s knowledge with positive performance outcomes (Damon & Phelps, 1989). That building process can lead to understanding beyond what each individual could have accomplished working alone and should, therefore, positively affect test performance. Further, anticipation of collaborative testing may enhance test performance through a positive effect on motivation to learn. Slavin (1980), for example, finds that collaboration creates a more favorable attitude toward learning in general and this should result in better test preparation. Students may also be motivated by fears of looking ill-prepared, of failing their partners, or of being seen as a “social loafer” (Morgan, 2003).
Collaborative testing, minus collaborative learning, has further implications for performance on different kinds of cognitive tasks. Some cognitive tasks require the student to recall information. A test question may ask the student to choose from a list, fill in a blank, identify, match, or label. These types of questions emphasize recall, or the ability to bring to mind the appropriate material. Bloom (1956) classifies this level of thinking as “knowledge” and it represents the lowest level of cognitive processing in his taxonomy. Alternatively, a test question may ask the student to apply, draw conclusions, make inferences, or form generalizations. These types of questions emphasize applying generalizations, interpretation, and understanding patterns and relationships among parts. This level of thinking is “application” and “explanation” and represents a higher level of cognitive processing.
Collaborative learning facilitates the development of higher order thinking skills and should therefore affect students’ performance on questions that emphasize applying generalizations, interpretations, and making inferences. Collaborative testing without prior learning should not affect performance on these kinds of questions. On the other hand, if “two brains are better than one”, students collaborating on a test without prior collaborative learning may perform better on less complex questions that emphasize recall than students working alone on similar questions.
Teachers of introductory courses, and of many higher level courses too, are generally concerned that students learn the basic concepts and guiding theoretical perspectives of their discipline. Although those concepts and theories are both abstractions, it is probably true that theories represent a higher level of abstraction or generalization than do most concepts. To the extent that is true, correctly answering questions about concepts and theories involves different levels of cognitive processing. The difference between questions about concepts and questions about theories is not equivalent to the difference between levels of knowledge outlined by Bloom (1956), but the difference in complexity or abstraction does lead to plausible hypotheses about the effect of collaborative testing on test performance. We anticipate no effect of collaborative testing on correctly answering questions about theories. Students should gain from collaborative testing when questions are about concepts.
H1: Collaborative testing is positively associated with test performance.
H2: Collaborative testing is positively associated with performance on concept and “knowledge” questions.
H3: Collaborative testing is not associated with performance on theory, application, and explanation type questions.
Data and Methods
Subjects were drawn from seven sections of an introductory sociology course taught at a medium-sized Southern liberal arts college: four sections from Fall 2001 and three sections from Spring 2003. At the beginning of the semester, students were told that they had the opportunity to participate in a study of learning. One section in each semester was chosen as a control group. In the experimental group, students agreeing to participate would be allowed to take an exam with a randomly assigned same-sex partner. Collaborating pairs, spread out as much as space would allow, could quietly discuss questions, answers, and explanations with their partners. The few students who chose to work alone were given the option of relocating to a quiet room. Each student would have their own exam and would submit their own set of exam answers. Those answers could differ from their partner’s answers. Same-sex partnering was used to address any potential power imbalances or increased anxiety that might arise from cross-sex pairings. In the Spring 2003 semester, one student without a partner was added to an existing pair. All participation was voluntary and informed consent was given. To examine whether collaboration, rather than a possible differential rate of learning or differential improvement in test taking, mattered, we varied the number of partnered exams across semesters. The Fall 2001 semester took the second exam with a partner while the Spring 2003 semester took both the first and second exams with a partner. In the results described below, collaboration is coded = 1 and non-collaboration is coded = 0.
The course was divided into three approximately equal segments. Each instructor covered the same general content for each segment with some allowance for differences in instructor interest. Because we allowed for variation in interest, different sets of test questions were developed and those that represented content not covered by all instructors were eliminated. This produced the common sets of fifteen multiple choice questions for the two tests that we examine here. These common questions were added to other questions that made up each instructor’s complete test. Test performance is measured as the percentage correct of the fifteen common questions.
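The performance measure described above can be illustrated with a minimal Python sketch. The function name, the answer key, and the student responses are all hypothetical and serve only to show the arithmetic; they are not taken from the study's instruments.

```python
def percent_correct(responses, answer_key):
    """Return the percentage of common questions answered correctly."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    n_correct = sum(given == key for given, key in zip(responses, answer_key))
    return 100.0 * n_correct / len(answer_key)


# Hypothetical answer key and one student's answers for fifteen common items.
answer_key = list("abcdabcdabcdabc")
student = list("abcdabcdabcdbcd")  # the last three answers are wrong
score = percent_correct(student, answer_key)  # 12 of 15 correct, i.e. 80.0
```

Scoring only the common items keeps the measure comparable across instructors whose complete tests differed.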
To explore the association between collaborative testing and differences in cognitive tasks, questions were classified according to whether they asked students about concepts or theories. For each test, seven of the fifteen questions were classified as concept questions and seven were classified as theory questions. Examples of each type follow.
Martha has a taste for fine clothing that she cannot afford on her small salary, so she resorts to embezzling funds from her employer. This type of deviance is described by Robert K. Merton as:
the aftermath of the September 11th attack on the
b. symbolic interactionist
d. social solidarity
Other variables that may have affected test performance were controlled for as follows. More senior students should perform better due to their greater experience with test preparation and testing. Further, poorly performing students are likely to have been selected out leaving mostly good students among the junior and senior ranks. Female students may perform better under collaborative conditions given a preference for working together versus competitively. As reflected in SAT scores and other indicators, minority students often come to higher education less prepared than other students. Minority students are also more likely to face distracting financial circumstances and perhaps to feel more isolated and less supported than majority students. For these reasons, minority students’ test performance was expected to be lower.
Analysis and Discussion
Associations among variables are estimated by zero-order and partial Pearson’s correlation coefficients. In the Fall 2001 semester, on test 1 where all students worked alone, no significant association was found between experimental (n=91) or control (n=40) group membership and overall test performance (r=.136, p > .05). The same is true for answering concept questions (r=.166, p > .05) and answering theory questions (r=.071, p > .05). At this point, there appears to be no significant difference in test performance between the experimental and control groups. On test 2, collaboration was introduced and a positive zero-order association between group membership and correctly answering concept questions (r=.245, p < .01) was detected, but no significant association was found for the theory questions (r=.025, p > .05) or overall test performance (r=.138, p > .05). Collaborating students scored significantly higher than non-collaborators on the concept questions. Controlling for the effects of class rank, gender, and minority status does not change the substantive results. The partial correlation coefficient for the association between collaboration and correctly answering concept questions is very modestly reduced but still significant (r=.239, p < .01). All other associations remain not significant.
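The zero-order correlations reported above pair a 0/1 collaboration indicator with a test score. The following Python sketch shows how such a coefficient and its significance could be computed. It rests on several assumptions that are not in the original: the function names are hypothetical, the sample size for test 2 is taken to be the combined Fall 2001 group sizes (91 + 40 = 131), and the p-value uses a normal approximation to the t distribution rather than whatever procedure the statistical software used in the study applied.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def approx_two_sided_p(t):
    """Two-sided p-value using a normal approximation (assumes large n)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Toy data only: collaboration coded 1, non-collaboration coded 0.
collaborated = [0, 0, 1, 1]
concept_score = [1, 2, 3, 4]
r_toy = pearson_r(collaborated, concept_score)

# Reported concept-question correlation on test 2, with n assumed to be 131.
t_concept = t_from_r(0.245, 131)
p_concept = approx_two_sided_p(t_concept)
```

Under these assumptions the reported r=.245 gives a t statistic of roughly 2.87, which agrees with the p < .01 stated in the text, while r=.136 gives a t of roughly 1.56, which agrees with p > .05.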
The Spring 2003 semester had students in the experimental group collaborate on both test 1 and test 2. Again, a significant association between experimental (n=67) or control (n=14) group membership and performance on concept questions was detected, but now for both test 1 (r=.344, p < .01) and test 2 (r=.423, p < .01). That the association is significant for both tests in Spring 2003, but only for test 2 in the Fall 2001 semester, suggests that the earlier effect was due to collaboration rather than a possible differential rate of learning or differential improvement in test taking between test 1 and test 2. Consistent with expectations, there again is no significant association for performance on the theory questions on test 1 (r=.071, p > .05) or test 2 (r=-.035, p > .05). The strength of the association between collaboration and performance on concept questions is sufficient to boost the association between collaboration and overall test performance to statistical significance (r=.254, p < .05). The partial correlations show that introducing the control variables does not change the substantive results.
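A partial correlation of the kind reported here removes the linear influence of the control variables from both collaboration and test performance before correlating them. The sketch below uses the standard recursive formula for partial correlations; it is an illustration under stated assumptions, not the authors' procedure. The function names are hypothetical and the data are a toy example constructed so that the zero-order association is produced entirely by the control variable.

```python
import math


def pearson_r(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, controls):
    """Correlation of x and y, holding every sequence in controls constant.

    Applies the first-order formula recursively, removing one control
    variable at each level.
    """
    if not controls:
        return pearson_r(x, y)
    *rest, z = controls
    r_xy = partial_r(x, y, rest)
    r_xz = partial_r(x, z, rest)
    r_yz = partial_r(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y are related only because both depend on z.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]
y = [2, 0, -2, 0]
zero_order = pearson_r(x, y)    # 0.5
adjusted = partial_r(x, y, [z])  # approximately 0 once z is controlled
```

In the study, by contrast, the coefficient for concept questions barely moved when class rank, gender, and minority status were controlled (.245 to .239), which is why the substantive results are described as unchanged. A call such as `partial_r(collaborated, concept_score, [class_rank, female, minority])`, with hypothetical variable names, would correspond to that analysis.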
Prior research reports a wide range of beneficial effects of collaborative learning across disciplines and diverse populations. Very few of these studies, however, carried collaboration into the process of evaluating student learning. Consequently, although much is known about how collaboration can affect learning, little systematic evidence exists for the association between collaboration on tests and test performance.
This study found that collaborative testing is significantly and positively associated with performance on concept questions but not on theory questions. To the extent that theory questions represent a higher level of abstraction and answering theory questions requires higher cognitive processing, it appears that collaborative testing, independent of collaborative learning, may not facilitate higher levels of cognitive processing. Future research might examine the effect of collaborative learning versus collaborative learning combined with collaborative testing on performance on questions requiring more complex reasoning. The significant association between collaborative testing and performance on concept questions shows that collaborative testing may be beneficial even if not combined with prior collaborative learning. Additional research should explore whether the gains on concept questions are retained or whether students are simply borrowing knowledge to answer a question without really learning why the answer is correct. As some of the subjects stated, “two brains are better than one”. In what way are they better? Are they better simply because information missing in one can be borrowed from another without understanding, or are two brains better because they learn from each other and reinforce knowledge? Classroom teachers may find collaborative testing useful for learning concepts, but they should examine whether that learning is retained, perhaps by retesting students individually at a later date.
References
Billington, R. (1994). Effects of collaborative test taking on retention in eight third-grade mathematics classes. The Elementary School Journal, 95, 23-32.
Bloom, B. (1956). Taxonomy of educational objectives: The classification of educational goals: Handbook I, cognitive domain.
Chickering, A. & Gamson, Z. (1991). Applying the seven principles for good practice in undergraduate education.
Damon, W. & Phelps, E. (1989). Critical distinctions among three methods of peer education. International Journal of Educational Research, 13, 9-19.
Fuchs, L., Fuchs, D., Karns, K., Hamlett, C., Katzaroff, C., & Dutka, S. (1998). Comparisons among individual and cooperative performance assessments and other measures of mathematics competence. The Elementary School Journal, 99, 23-52.
Gamson, Z. (1994). Collaborative learning comes of age. Change, 26, 44-50.
Giraud, G. & Enders, C. (2000). The effects of repeated cooperative testing in an introductory statistics course. Paper presented at the Annual Meeting of the American Educational Research Association.
Gokhale, A. (1995). Collaborative learning enhances critical thinking. Journal of Technology Education, 7, 1-2.
Grzelkowski, K. (1987). A journey toward humanistic testing. Teaching Sociology, 15, 27-32.
Guest, K. & Murphy, D. (2000). In support of memory retention: A cooperative oral final exam. Education, 121, 350-354.
Hanshaw, L. (1982). Test anxiety, self-concept, and the test performance of students paired with the same students working alone. Science Education, 66, 15-24.
Helmericks, S. (1993). Collaborative testing in social statistics: Toward gemeinstat. Teaching Sociology, 21, 287-297.
Johnson, D., Johnson, R. & Stanne, M. (2000). Cooperative learning methods: A meta-analysis. Retrieved September 4, 2002, from http://www.clcrc.com/pages/cl-methods.html
Ley, K., Hodges, R., & Young, D. (1995). Partner testing. Research and Teaching in Developmental Education, 12, 23-30.
Morgan, B. (2003). Cooperative learning in higher education: Undergraduate student reflections on group examinations for group grades. College Student Journal, 37, 40-50.
Muir, S. & Tracy, D. (1999). Collaborative essay testing. College Teaching, 47, 33-36.
Nowak, L., Miller, S., & Washburn, J. (1996). Team testing increases performance. Journal of Education for Business, 71, 253-256.
Rau, W. & Heyl, B. (1990). Humanizing the college classroom: Collaborative learning and social organization among students. Teaching Sociology, 18, 141-155.
Rinehart, J. (1999). Turning theory into theorizing: Collaborative learning in a sociology theory course. Teaching Sociology, 27, 216-232.
Russo, A., & Warren, S. (1999). Collaborative test taking. College Teaching, 47, 18-20.
Sernau, S. (1995). Using a collaborative problem-solving approach in teaching social stratification. Teaching Sociology, 23, 364-373.
Slavin, R. (1980). Cooperative learning. Review of Educational Research, 50, 315-342.