Findings of Panels Regarding Proper Placement of Standards/Objectives for Assessment
Tables 1 and 2 identify those standards/objectives recommended for local assessment. Table 3 identifies some standards/objectives deemed suitable for large-scale assessment, with some reservations.
Standards/Objectives Suggested for Local Testing—Grade 4
Standard/Objective Reason for Identification
C.4.1 Orally communicate information, opinions, and ideas effectively to different audiences for a variety of purposes.
Speaking does not lend itself to large-scale testing.
C.4.3 Participate effectively in discussion.
Group discussions are not possible in a large-scale testing situation.
E.4.1 Use computers to acquire, organize, analyze, and communicate information.
Would require more computers at one time than most schools have available.
E.4.3 Create products appropriate to audiences and purposes.
Would take too long for large-scale testing.
E.4.5 Analyze and edit media work as appropriate to audience and purpose.
Not practical or economical for large-scale testing.
C.4.6 Communicate the results of their investigations in ways their audiences will understand by using charts, graphs, drawings, written descriptions, and various other means, to display their answers.
Too time consuming for large-scale state testing.
C.4.7 Support their conclusions with logical arguments.
Difficult because of time constraints.
C.4.1 Identify and explain the individual’s responsibilities to family, peers, and the community, including the need for civility and respect for diversity.
Has a strong personal element. Responses would be hard to evaluate on a large-scale test at state level.
C.4.6 Locate, organize, and use relevant information to understand an issue in the classroom or school, while taking into account the viewpoints and interests of different groups and individuals.
Research elements of this objective cause a major time constraint.
E.4.1 Explain the influence of prior knowledge, motivation, capabilities, personal interests, and other factors on individual learning.
The individual character of this objective makes it problematic for large-scale testing.
E.4.7 Explain the influence of factors such as family, neighborhood, personal interests, language, likes and dislikes, and accomplishments on individual identity and development.
Again, the individual nature of this objective makes it a poor choice for large-scale testing.
E.4.8 Describe and distinguish among the values and beliefs of different groups and institutions.
Value-oriented issues are probably best assessed locally.
E.4.14 Describe how differences in cultures may lead to understanding or misunderstanding.
Cultural differences around the state make this a better choice for local testing.
Standards/Objectives Suggested for Local Testing—Grade 8
Standard/Objective Reason for Identification
C.8.1 Orally communicate information, opinions, and ideas effectively to different audiences for a variety of purposes.
Oral communication does not lend itself to large-scale testing.
C.8.3 Participate effectively in discussion.
Discussions will not work for large-scale tests.
E.8.1 Use computers to acquire, organize, analyze, and communicate information.
Schools would have difficulty providing sufficient computers at one time.
E.8.3 Create products appropriate to audiences and purposes.
Creating products causes a major time constraint.
E.8.5 Analyze and edit media work as appropriate to audience and purpose.
Again, a major time constraint, coupled with a probable insufficiency of equipment.
To create impactful oral and written presentations, it is essential to effectively utilize technology and adhere to the conventions of mathematical discourse, such as incorporating symbols, definitions, and labeled drawings. Clear organization of ideas and procedures is crucial, along with a strong grasp of the intended purpose and audience. By integrating these elements, presentations can convey mathematical concepts more clearly and engage the audience effectively.
Oral presentations are not appropriate for large-scale testing.
In the context of real-world scenarios, it is essential to work with data by formulating targeted questions that facilitate effective data collection and analysis. This process involves designing and conducting a statistical investigation, utilizing technology to create informative displays, generate summary statistics, and develop comprehensive presentations that convey the findings clearly.
Collecting and manipulating data and designing investigations all impose major time constraints.
C.8.3 Design and safely conduct investigations that provide reliable quantitative or qualitative data, as appropriate, to answer their questions.
Conducting investigations not practical in terms of time. Also difficult to standardize.
C.8.8 Use computer software and other technologies to organize, process, and present their data.
Availability of sufficient computers would be a problem.
C.8.10 Discuss the importance of their results and implication of their work with peers, teachers, and other adults.
Too individualized for large-scale testing.
Propose a design or redesign of an applied science model or machine that can significantly impact the community or the world at large. This innovative design should detail its functionality and effectiveness, while also addressing potential side effects that may arise from its implementation.
Time required to complete these activities would make them inappropriate for large-scale testing.
Investigating a specific local issue reveals how scientific or technological solutions have been implemented, alongside alternative options that were considered. This analysis includes the rationale behind the chosen solutions, any new challenges that arose from these decisions, and the overall satisfaction of the community with the outcomes.
Local nature of this objective makes it inappropriate for large-scale, state testing.
H Science in Social and Personal Perspectives
H.8.2 Present a scientific solution to a problem involving the earth and space, life and environmental, or physical sciences and participate in a consensus-building discussion to arrive at a group decision.
Consensus building discussions do not lend themselves to large-scale assessment.
A.8.4 Conduct a historical study to analyze the use of the local environment in a Wisconsin community and to explain the effect of this use on the environment.
Time required to complete this objective makes it inappropriate for state tests.
C.8.7 Locate, organize, and use relevant information to understand an issue of public concern, take a position, and advocate the position in a debate.
Research elements create a major time constraint.
E.8.1 Give examples to explain and illustrate the influence of prior knowledge, motivation, capabilities, personal interests, and other factors on individual learning.
Individual character of this objective makes it more appropriate for local assessment.
E.8.2 Give examples to explain and illustrate how factors such as family, gender, and socioeconomic status contribute to individual identity and development.
Again, the individual character of this objective makes it more appropriate for local assessment.
E.8.13 Select examples of artistic expressions from several different cultures for the purpose of comparing and contrasting the beliefs expressed.
Major time constraint for this objective.
Standards/Objectives Assessable, Grades 4 & 8, With Reservations
C.8.2 Listen to and comprehend oral communications.
Would require use of audio tapes. Could cause equipment problems.
E.8.2 Make informed judgments about media and products.
Might require equipment not available to all districts. Standardization problem.
E.8.4 Demonstrate a working knowledge of media production and distribution.
Possibly against the spirit of the objective, which is probably production of products.
F.8.1 Conduct research and inquiry on self-selected or assigned topics, issues, or problems and use an appropriate form to communicate their findings.
Would be an indirect measure of this standard. A direct assessment would probably be more consistent with the intent of the standard.
Alignment Between the Model Academic Standards and TerraNova Tests
The results are displayed in tables, beginning with Table 4, which outlines the number of items per subject for each level and test format. Tables 5 to 8 provide a basic assessment of the alignment between standards and tests according to the initial four criteria of Dr. Webb’s alignment process. Additionally, Tables 9 to 16 present detailed results for each subject and grade level, accompanied by analysis and discussion.
Number of Test Items for Each Academic Content Area
English/Language Arts Number of Items
Social Studies Number of Items
Summary of Attainment of Acceptable Alignment Level on Four Content Focus Criteria, Wisconsin Knowledge and Concepts Examination, Grades 4 and 8, English Language Arts
              Grade 4                     Grade 8
  Level/Form  CC     DOK    ROK    BOR    CC     DOK    ROK    BOR
A Reading/Lit
  13/17 C     Yes    No     Yes    Yes    Yes    Weak   Yes    Yes
  14/18 C     Yes    No     Yes    Yes    Yes    No     Yes    Yes
  14/18 D     Yes    No     Yes    Weak   Yes    No     Yes    Yes
  15/19 C     Yes    Weak   Yes    Yes    Yes    Weak   Yes    Yes
B Writing
  13/17 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
C Oral Lang
  13/17 C     No     Yes*   No     Yes*   No     Yes*   No     No
  14/18 C     No     Yes*   No     No     No     Yes*   No     No
  14/18 D     No     No     No     No     No     Yes*   No     No
  15/19 C     No     No     No     No     No     Yes*   No     No
D Language
  13/17 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
E Research/Inquiry
  13/17 C     No     No     No     No     No     Yes*   No     No
  14/18 C     No     No     No     No     No     Yes*   No     No
  14/18 D     No     No     No     No     No     No     No     No
  15/19 C     No     No     No     No     No     No     No     No
F Media/Tech
  13/17 C     No     No     No     No     No     No     No     No
  14/18 C     No     No     No     No     No     No     No     No
  14/18 D     No     No     No     No     No     No     No     No
  15/19 C     No     No     No     No     No     No     No     No
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
CC Categorical Concurrence
DOK Depth-of-Knowledge Consistency
ROK Range-of-Knowledge Correspondence
BOR Balance of Representation
Summary of Attainment of Acceptable Alignment Level on Four Content Focus Criteria, Wisconsin Knowledge and Concepts Examination, Grades 4 and 8, Mathematics
              Grade 4                     Grade 8
  Level/Form  CC **  DOK    ROK    BOR    CC     DOK    ROK    BOR
A Math Processes
  13/17 C     Yes    Yes    Yes    Yes    Yes    Weak   Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
  15/19 C     Yes    Weak   Yes    Yes    Yes    Yes    No     Yes
B Number Oper & Rel
  13/17 C     Yes    Weak   Yes    Weak   Yes    No     Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    No     Yes    Weak
  14/18 D     Yes    Yes    Yes    Weak   Yes    No     Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    No     Yes    Yes
C Geometry
  13/17 C     No     Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 D     No     Yes    Yes    Yes    Yes    Yes    Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
D Measurement
  13/17 C     No     Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    No     Yes    Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
E Statistics & Probability
  13/17 C     No     No     No     Weak   Yes    No     Yes    Yes
  14/18 C     Yes    Weak   Yes    Yes    Yes    No     Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    No     Weak   Yes
  15/19 C     Yes    No     Yes    Weak   Yes    No     Weak   Yes
F Algebraic Relationships
  13/17 C     No     Yes    Weak   Weak   No     Weak   Yes    Yes
  14/18 C     No     Yes    Weak   Yes    No     Yes    Yes    Yes
  14/18 D     No     Yes    Weak   Yes    Yes    Yes    Yes    Yes
  15/19 C     Yes    Yes    Yes    Weak   Yes    Yes    Yes    Yes
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
CC Categorical Concurrence
DOK Depth-of-Knowledge Consistency
ROK Range-of-Knowledge Correspondence
BOR Balance of Representation
Summary of Attainment of Acceptable Alignment Level on Four Content Focus Criteria, Wisconsin Knowledge and Concepts Examination, Science, Grades 4 and 8
              Grade 4                     Grade 8
  Level/Form  CC **  DOK    ROK    BOR    CC     DOK    ROK    BOR
A Science Connections
  13/17 C     No     No     No     Weak*  No     Yes*   No     No
  14/18 C     No     Yes*   No     Yes*   No     Weak*  No     Yes*
  14/18 D     No     Yes*   No     Yes*   No     Yes    No     Yes
  15/19 C     No     Yes*   No     Yes*   No     No     No     Weak*
B Nature of Science
  13/17 C     No     Yes*   No     No     No     Yes*   No     No
  14/18 C     No     Yes*   No     No     No     Yes*   No     Yes*
  14/18 D     No     Yes*   No     No     No     Yes*   No     Yes*
  15/19 C     No     No     No     No     No     No     No     No
C Science Inquiry
  13/17 C     Yes    No     Weak   Yes    Yes    No     No     Yes
  14/18 C     Yes    Weak   Yes    Yes    Yes    Weak   No     Yes
  14/18 D     Yes    Yes    Weak   Yes    Yes    Weak   Weak   Yes
  15/19 C     Yes    No     Weak   Yes    Yes    Yes    Weak   Yes
D Physical Science
  13/17 C     Yes    Yes    Weak   Yes    Yes    Yes    Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Weak   Yes    Yes
E Earth/Space Science
  13/17 C     Yes    Yes    Weak   Yes    Yes    Yes    Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 D     Yes    Yes    Weak   Yes    No     Yes    Weak   Yes
  15/19 C     No     Yes    Yes    Yes    Yes    Yes    Yes    Yes
F Life and Environ Science
  13/17 C     Yes    No     Yes    Yes    Yes    Yes    Weak   Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
  15/19 C     Yes    Weak   Yes    Yes    Yes    Yes    Yes    Yes
G Science Applications
  13/17 C     No     No     No     Yes*   No     Weak   No     Yes
  14/18 C     No     Yes*   No     No     No     No     No     Yes*
  14/18 D     No     No     No     Yes*   No     Weak   Weak   Yes
  15/19 C     No     Yes*   Weak   Yes    No     No     No     Yes*
H Science in Social and Personal Perspectives
  13/17 C     No     No     No     No     No     Yes*   No     Yes
  14/18 C     No     Weak   Weak*  Yes*   No     Yes    Weak   Yes
  14/18 D     No     Yes*   No     Weak*  No     Yes*   No     No
  15/19 C     No     Yes*   Yes*   No     No     Yes*   Weak   Yes
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
CC Categorical Concurrence
DOK Depth-of-Knowledge Consistency
ROK Range-of-Knowledge Correspondence
BOR Balance of Representation
Summary of Attainment of Acceptable Alignment Level on Four Content Focus Criteria, Wisconsin Knowledge and Concepts Examination, Social Studies, Grades 4 and 8
              Grade 4                     Grade 8
  Level/Form  CC **  DOK    ROK    BOR    CC     DOK    ROK    BOR
A Geography
  13/17 C     Yes    Yes    Yes    Yes    Yes    No     Weak   Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Weak   Weak   Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Weak   No     Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
B History
  13/17 C     Yes    Weak   Yes    Yes    Yes    No     Yes    Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    No     Yes    Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    No     Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Weak   Yes    Yes
C Political Science and Citizenship
  13/17 C     Yes    Yes    Yes    Yes    Yes    Yes    No     Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Weak   Weak   Yes
  14/18 D     No     Yes    Weak   Yes    Yes    Yes    Yes    Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Yes    Yes
D Economics
  13/17 C     Yes    Yes    Yes    Yes    Yes    Yes    No     Yes
  14/18 C     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
  14/18 D     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
  15/19 C     Yes    Yes    Yes    Yes    Yes    Yes    Weak   Yes
E Behavioral Sciences
  13/17 C     No     No     No     Yes*   No     No     No     Yes*
  14/18 C     No     Yes*   No     Yes*   No     No     No     Yes
  14/18 D     No     Yes*   No     No     No     Weak   No     Yes
  15/19 C     No     Yes*   No     No     No     Weak   No     Yes
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
CC Categorical Concurrence
DOK Depth-of-Knowledge Consistency
ROK Range-of-Knowledge Correspondence
BOR Balance of Representation
The Individual Charts and Analysis
The charts presented detail the evaluation of standards across various disciplines and grade levels. Column 1 lists the abbreviated names of the standards along with the test levels and forms. Column 2 shows the average number of items recognized by panel members as relevant to each standard. Column 3 indicates the average percentage of items assessed at a depth-of-knowledge level that meets or exceeds the corresponding objective. Column 4 reflects the percentage of objectives within each standard that are addressed by at least one item. Column 5 illustrates the distribution of items across different objectives within a standard. The alignment of each standard with TerraNova is categorized as “Yes,” “No,” or “Weak,” based on the established criteria for the study. Each chart is followed by a discussion of the results it presents.
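The categorization described above can be sketched in a few lines of Python. This is an illustration only, not the study's procedure. The "Yes" cut-offs are the criteria printed beneath each chart (6 items, 50, 50, 70), read here as 6 items and proportions of 0.50, 0.50, and 0.70. The "Weak" bands (0.40 to 0.50 for depth and range of knowledge, 0.60 to 0.70 for balance) are an assumption inferred from the values that carry a "Weak" label in the charts; the report does not state them. The function names are hypothetical, and the asterisk footnote is not modeled.

```python
# Hedged sketch: rate one standard on one test form against the four criteria.
# "Yes" thresholds come from the criteria line under each chart; the "Weak"
# lower bounds are ASSUMED (inferred from chart values), not stated in the report.

def rate(value, yes_at, weak_at=None):
    """Return 'Yes', 'Weak', or 'No' for a single criterion."""
    if value >= yes_at:
        return "Yes"
    if weak_at is not None and value >= weak_at:
        return "Weak"
    return "No"

def classify_standard(avg_items, dok, rok, bor):
    """avg_items: average number of items; dok, rok, bor: proportions in [0, 1]."""
    return {
        "CC": rate(avg_items, 6),          # Categorical Concurrence
        "DOK": rate(dok, 0.50, 0.40),      # Depth-of-Knowledge Consistency
        "ROK": rate(rok, 0.50, 0.40),      # Range-of-Knowledge Correspondence
        "BOR": rate(bor, 0.70, 0.60),      # Balance of Representation
    }

# Grade 4 mathematics, Statistics and Probability, Level 14 Form C,
# with the chart values read as proportions.
print(classify_standard(6.38, 0.46, 0.57, 0.76))
```

Under these assumptions the sketch reproduces the ratings shown in the charts for the rows tried here; it has not been checked against every row.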
(No of Items: Form C, Level 13 = 57, 14 = 69, 15h Form D, Level 14 = 66)
  Level/Form  CC    Avg # Items   Avg at or Above (DOK)   Avg Obj Hit (ROK)   Avg Index Value (BOR)
Reading/Lit
  14 C        Yes   50.25         .30   No                .88   Yes           .73   Yes
Writing
  14 C        Yes   39.0          .64   Yes               .96   Yes           .80   Yes
Oral Lang
  14 C        No    .63           .50   Yes*              .13   No            .23   No
Language
  14 C        Yes   15.25         .52   Yes               .81   Yes           .80   Yes
Research/Inquiry
  14 C        No    .00           .00   No                .00   No            .00   No
  14 D        No    .13           .00   No                .03   No            .13   No
Media/Technology
  14 C        No    .00           .00   No                .00   No            .00   No
  14 D        No    .00           .00   No                .00   No            .00   No
Criteria for Categorical Concurrence = 6, Depth of Knowledge = .50, Range of Knowledge = .50, Balance of Representation = .70
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
The English language arts tests feature nearly twice as many items as the other subjects, primarily because TerraNova, when used outside Wisconsin, distinguishes between reading and English language arts scores, necessitating more items. Additionally, most TerraNova items do not address Standards C, E, and F, likely due to the challenges of evaluating these objectives in large-scale assessments. Notably, Level 13, Form C, contains approximately five fewer items than the other tests.
Categorical Concurrence: Alignment for Standards A, B, and D meets this criterion on all test forms. Alignment for the other three standards fails on all forms. If the latter three standards are assessed locally, the criterion is satisfied. Two test forms only partially align with Standard C, lacking sufficient items for a meaningful analysis of the other criteria. All forms fail to meet the requirements for Standards E and F, as there are no items addressing these standards.
Depth-of-Knowledge Consistency: Alignment for Standards B and D meets this criterion on all test forms. However, Standard A exhibits alignment issues, with three forms failing to meet the criterion and the remaining form showing weak alignment. To enhance the rigor of this standard, it may be beneficial to eliminate some of the less challenging items, particularly those that necessitate minimal or no inference.
Range of Knowledge and Balance of Representation: Standards A, B, and D meet both criteria across all test forms. However, the remaining three standards do not fulfill these criteria, as previously mentioned. If these three standards are evaluated locally, this issue can be resolved.
(No. of Items: Form C, Level 17 = 68, 18 = 69, 19 = 68; Form D, Level 18 = 70)
  Level/Form  CC    Avg # Items   Avg at or Above (DOK)   Avg Obj Hit (ROK)   Avg Index Value (BOR)
Reading/Lit
  18 C        Yes   49.25         .39   No                1.00  Yes           .77   Yes
Writing
  18 C        Yes   37.50         .85   Yes               .96   Yes           .76   Yes
Oral Lang
  18 C        No    .63           .51   Yes*              .08   No            .09   No
Language
  18 C        Yes   16.13         .92   Yes               .81   Yes           .80   Yes
Research/Inquiry
  18 C        No    .13           1.00  Yes*              .03   No            .13   No
  18 D        No    .38           .25   No                .05   No            .25   No
Media/Technology
  18 C        No    .50           .00   No                .44   Weak*         .50   No
  18 D        No    .00           .00   No                .00   No            .00   No
Criteria for Categorical Concurrence = 6, Depth of Knowledge = .50, Range of Knowledge = .50, Balance of Representation = .70
The grade 8 results show discrepancies primarily in false positives concerning depth-of-knowledge and range-of-knowledge criteria. Similar to grade 4, the alignment is acceptable for three of the six standards but falls short for the remaining three, which remain largely unaddressed. Implementing local assessments for these three standards should effectively resolve the issues identified.
( No of Items: Form C, Level 138, 14C, 15C Form D, Level 14C)
  Level/Form  CC    Avg # Items   Avg at or Above (DOK)   Avg Obj Hit (ROK)   Avg Index Value (BOR)
Mathematical Processes
  14 C        Yes   13.38         .52   Yes               .83   Yes           .76   Yes
  14 D        Yes   12.71         .63   Yes               .74   Yes           .77   Yes
Number Oper & Rel
  14 C        Yes   22.0          .51   Yes               .64   Yes           .72   Yes
  14 D        Yes   21.43         .50   Yes               .68   Yes           .67   Weak
Geometry
  14 C        Yes   6.63          .77   Yes               .74   Yes           .88   Yes
Measurement
  14 C        Yes   8.50          .82   Yes               .73   Yes           .79   Yes
Statistics and Probability
  14 C        Yes   6.38          .46   Weak              .57   Yes           .76   Yes
  14 D        Yes   8.0           .67   Yes               .71   Yes           .74   Yes
Algebraic Relationships
  14 C        No    5.13          .93   Yes               .48   Weak          .78   Yes
  14 D        No    4.71          .77   Yes               .43   Weak          .71   Yes
Criteria for Categorical Concurrence = 6, Depth of Knowledge = .50, Range of Knowledge = .50, Balance of Representation = .70
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
To be noted up front: 1) Level 13, Form C is approximately five items shorter than the other tests, which may contribute to its weaker alignment with the standards. 2) Standard B has a notably larger number of corresponding items than do the other standards on all forms of the test. 3) Alignment is weakest for Standard F.
Categorical Concurrence: The alignment between 4th grade mathematics standards and the various test levels is generally strong, though additional items are required to fully meet categorical concurrence. Specifically, one more item is needed for Standards C and D, and two for Standard E on Level 13, Form C. Similarly, Level 14, Form D requires an extra item for Standard C and two for Standard F, while Level 14, Form C needs one additional item for Standard F. To maintain test efficiency, reducing the number of items related to Standard B could help prevent unnecessarily lengthening the assessments.
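The item counts quoted above follow from the categorical-concurrence criterion of 6 items per standard. A minimal sketch of that arithmetic, assuming the shortfall is simply the criterion minus the average item count rounded up (the report does not spell out its rounding rule, and the function name is hypothetical):

```python
import math

CC_MINIMUM = 6  # categorical-concurrence criterion printed under each chart

def items_needed(avg_items, minimum=CC_MINIMUM):
    """Whole items to add so the average count reaches the minimum (assumed rule)."""
    return max(0, math.ceil(minimum - avg_items))

# Standard F (Algebraic Relationships), grade 4, averages taken from the chart.
print(items_needed(5.13))  # Level 14, Form C
print(items_needed(4.71))  # Level 14, Form D
```

With the chart averages of 5.13 and 4.71 this gives one and two items, matching the counts the paragraph reports for Standard F on Level 14, Forms C and D.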
Depth-of-Knowledge Consistency: Alignment is generally good, with the exception of Standard E, which does not fully meet the criterion. Adding a rigorous item for Standard E to Level 13, Form C could improve alignment on this criterion while also addressing the categorical concurrence requirement.
Range of Knowledge: Alignment meets this criterion for all forms of the test except for Standard E on Level 13, Form C, and it is weak for Standard F on three forms. Including items that address the unmeasured objectives of Standard E on Level 13, Form C could resolve that discrepancy. Alignment for Standard F on the three forms could be improved by adding items that specifically measure objective F.4.4.
Balance of Representation: Alignment meets this criterion for all forms of the test, though weakly on some forms, particularly on Standard F.
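The report does not restate how the balance index is calculated. The sketch below uses the index as it is generally given in Webb's alignment methodology: one minus half the sum, over the objectives hit, of the absolute difference between 1/O and I_k/H, where O is the number of objectives hit, I_k the number of items matched to objective k, and H the total number of items matched to the standard. Treat the formula as an assumption about this study. The function name and the item counts are hypothetical.

```python
def balance_index(items_per_objective):
    """Balance index for one standard, given item counts per objective.

    Objectives with zero items are ignored, since the index is defined over
    objectives that were hit. Returns 0.0 when nothing was hit (a choice made
    for this sketch only).
    """
    hits = [n for n in items_per_objective if n > 0]
    if not hits:
        return 0.0
    o = len(hits)   # number of objectives hit
    h = sum(hits)   # total items matched to the standard
    return 1 - sum(abs(1 / o - n / h) for n in hits) / 2

# Hypothetical counts, not taken from the study.
even = balance_index([2, 2, 2])   # items spread evenly across three objectives
skewed = balance_index([9, 1])    # items concentrated on one objective
print(round(even, 2), round(skewed, 2))
```

An even spread scores at the top of the scale and a concentrated one scores lower, which is the behavior the criterion value of 70 is meant to screen for.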
(No. of Items: Form C, Level 17 = 42, 18 = 41, 19 = 35; Form D, Level 18 = 41)
  Level/Form  CC    Avg # Items   Avg at or Above (DOK)   Avg Obj Hit (ROK)   Avg Index Value (BOR)
Mathematical Processes
  18 C        Yes   11.13         .53   Yes               .48   Weak          .71   Yes
  18 D        Yes   10.0          .55   Yes               .49   Weak          .73   Yes
Number Oper & Rel
  18 C        Yes   22.13         .23   No                .74   Yes           .66   Weak
  18 D        Yes   18.38         .33   No                .66   Yes           .73   Yes
Geometry
  18 C        Yes   6.25          .54   Yes               .56   Yes           .78   Yes
Measurement
  18 C        No    5.50          .75   Yes               .66   Yes           .85   Yes
Statistics and Probability
  18 C        Yes   6.88          .23   No                .52   Yes           .78   Yes
  18 D        Yes   8.25          .35   No                .47   Weak          .77   Yes
Algebraic Relationships
  18 C        No    5.13          .76   Yes               .52   Yes           .85   Yes
  18 D        Yes   7.25          .65   Yes               .63   Yes           .87   Yes
Criteria for Categorical Concurrence = 6, Depth of Knowledge = .50, Range of Knowledge = .50, Balance of Representation = .70
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
Our findings indicate that overall alignment is satisfactory, with the exception of depth-of-knowledge consistency in Standards B and E. Additionally, at grade 4, Standard B is notably represented by a significant number of items, with the only exception being Level 14, Form C.
Categorical Concurrence: Although Level 18, Form C fails to meet this criterion for Standard C, and Levels 17 and 18 of Form C fail to meet it for Standard F, only one additional item is needed for compliance. Additionally, it is feasible to decrease the number of items related to Standard B by three without compromising alignment on this criterion.
Depth-of-Knowledge Consistency: Alignment for Standards B and E is inadequate across all test forms. To meet this criterion, it is essential to add or replace certain items. Additionally, removing some less rigorous items related to Standard B may enhance overall compliance with the standard.
Range of Knowledge and Balance of Representation: Alignment meets both criteria across all test forms, with the exception of Level 19, Form C for Standard A, where range of knowledge is lacking. Additionally, alignment is notably weak on two other forms for this standard on range-of-knowledge correspondence.
(No. of Items: Form C, Level 13 = 29, 14 = 27, 15 = 35; Form D, Level 14 = 37)
  Level/Form  CC    Avg # Items   Avg at or Above (DOK)   Avg Obj Hit (ROK)   Avg Index Value (BOR)
Science Connections
  14 C        No    3.33          .86   Yes*              .28   No            .78   Yes*
  14 D        No    1.5           .80   Yes*              .22   No            .79   Yes*
Nature of Science
  14 C        No    1.00          1.00  Yes*              .04   No            .17   No
  14 D        No    .67           .90   Yes*              .13   No            .50   No
Science Inquiry
  14 C        Yes   9.33          .40   Weak              .57   Yes           .82   Yes
  14 D        Yes   8.17          .50   Yes               .42   Weak          .74   Yes
Physical Science
  14 C        Yes   9.00          .75   Yes               .65   Yes           .86   Yes
  14 D        Yes   9.83          .78   Yes               .67   Yes           .80   Yes
Earth and Space Science
  14 C        Yes   10.33         .97   Yes               .53   Yes           .81   Yes
  14 D        Yes   8.0           .98   Yes               .42   Weak          .81   Yes
Life and Environmental Science
  14 C        Yes   15.50         .60   Yes               .83   Yes           .84   Yes
  14 D        Yes   13.67         .61   Yes               .87   Yes           .74   Yes
  15 C        Yes   14.43         .47   Weak              .89   Yes           .74   Yes
Science Applications
  14 C        No    .83           .88   Yes*              .12   No            .47   No
  14 D        No    2.00          .33   No                .26   No            .76   Yes*
Science in Social and Personal Perspectives
  14 C        No    2.67          .46   Weak              .47   Weak          .73   Yes*
  14 D        No    1.67          .80   Yes*              .37   No            .61   Weak*
  15 C        No    2.00          .55   Yes*              .50   Yes*          .33   No
Criteria for Categorical Concurrence = 6, Depth of Knowledge = .50, Range of Knowledge = .50, Balance of Representation = .70
* Indicates that insufficient items exist to make the “Yes” or “Weak” meaningful
To be noted up front: 1) Alignment for Standards C through F is generally good
Source-of-Challenge Items
The tables present all comments from raters regarding potential sources of challenge issues, with the remarks taken verbatim from their sheets. Comments that were noted by multiple raters as possible problems are highlighted for clarity. Additionally, different shading is used to distinguish between adjoining sets of identified items.
English Language Arts, Grade 8 (None Identified at Grade 4)
Form Level Item # Rater Comment
C 17 52-55 31 Possible bias in choosing soccer as a topic; i.e., prior knowledge may vary by gender and SES
C 18 69 31 Possible SES bias with familiarity of musical instruments
C 19 15 31 Reading level of LeGuin biog Pilse (sic) continues challenging diction and syntax
C 19 19-24 31 Reading level of passage from Dispossessed may be challenging, especially the obscure inverted names
Form Level Item # Rater Issue
C 13 8 22 How is this seen as estimation?
C 14 10 23 The distractor “classic movie” is too close to “new movie”
C 14 14 23 More than one answer possible B, C, D
C 14 23 18 It is not clear that box is in display case.
C 14 23 21 Display case does not necessarily equate to the rectangle
C 14 23 22 Display case does not equal rectangle
D 14 5 18 It is possible that only a few students are playing because he only says “some students”
D 14 12 21 1 st compute-not sure if Bs or just B 2 nd B3 order whole number (sic)
D 14 24 22 Perceptual problems What does this measure really? Bias!
C 15 5 17 What does “these” refer to (5 or 6 students)?
C 15 5 21 What does “they” refer to?
C 15 5 23 Cannot tell what “they” refers to; the sixth student could make a costume too
C 15 19 19 Picture not clear, more shadow
C 15 22 23 3 choices possible inches, meters, yards
Item #23 on Level 14, Form C merits a closer look, having been identified by three raters.
Item #5 on Level 15, Form C was identified by five raters. It seems to have a reference problem.
Form Level Item # Rater Issue
C 17 12 23 Key is not necessarily noticeable State in item to (?)
C 17 16 23 Correct choice is only one with an arrow Could answer correctly without knowing what a line segment is
C 17 22 21 Seems to more closely assess the grade 4 objective
C 17 23 23 Have to assume the raft is a rectangle
C 17 33 17 If they measure the shapes, the sides aren’t
C 17 38 23 Not clear if students are to draw all the bars (2X3=6) or only one bar for the parent and one for the student
C 18 14 23 Choice “C” is nearly correct for position of triangle Too close a distinction.
C 18 28 23 Distance B is too close to answer C
Counting error would result in the wrong answer, not perimeter boundary.
C 18 30 22 Writing inequality would have (sic)
D 18 7 23 “Make” could => profit Some students may take into account expenses
D 18 24 22 Abbreviation (# with line over) distracting notation How common?
D 18 25 23 There is more than one correct answer
Item #23 on Level 17, Form C was identified by three raters, who noted that the student is required to make an assumption. This item warrants further examination.
Form Level Item # Rater Issue
C 13 10 6 Knowing symbols for safety purposes
C 13 17 6 We don’t have mountains in Wisconsin
C 13 23 2 Possible challenge based on what and how electricity is used in the home
C 14 8 2 Negative question makes it complicated
C 14 15 1 Students must process every single word
C 14 23 1 Students will have to be careful to pick up on “not” and “unless”
C 14 27 1 This might be difficult for a child who never skated or whose parents can’t afford the equipment
D 14 2 6 Some inner city kids may not know what a deer is
D 14 11 6 Pine trees do not have the typical flower but they do go through a flowering process
D 14 16 4 Students would use all choices
D 14 16 7 Relative skate board park/cement dad and son working/age of boy
D 14 19 7 2 nd glass should show salt and water
D 14 20 1 Might be hard to tell the difference between the pen and the feather
D 14 20 7 Usual size plastic bottle and weather (sic) or not it is full of a liquid
D 14 23 4 Too difficult for fourth grade
D 14 36 1 Kids might think the grog’s life cycle is tadpoles-baby-frog-frog Lion looks (six)
D 14 36 4 Children will have difficulty with this question The images will cause students difficulty
C 15 8 7 Relative balance/top – a person sitting in the chair
C 15 20 7 Snow on mountain top? Unclear could be bare above tree line.
C 15 26 1 Kids who are not familiar with bikes may not answer correctly
C 15 31 6 No health nutrition standard used
Only item #36 on Level 14, Form D was identified by three raters. The problem seems to be with the images. This item should be reviewed.
Form Level Item # Rater Issue
C 17 12 7 “Oak Forest” name comes from primary plant species dominant
C 17 15 7 Testing logic, not necessarily knowledge
C 17 30 1 Pictures of airplane and Golden Gate Bridge might cause an emotional response
C 17 32 7 Multiple answers dependent on rationale
Sneezing includes blood getting to muscle cells
C 18 2 1 The word “shortly” might be missed by some students
C 18 13 1 Kids might not know “propane”
Item #7 on Level 19, Form C was flagged by four raters. They indicate a problem with the image. This should be reviewed.
Form Level Item # Rater Issue
C 13 1 12 Could be identified as a reg sub Eq
C 13 5 9 Some children do not learn this as truth
C 14 10 12 Student could interpret “quickly” in terms of sooner
D 14 11 15 Economists say there are no needs, only wants
D 14 18 8 MW not great plains Bad
D 14 18 12 Too many regions Are Great Lakes states the plains? Are the plains in the dairy belt?
D 14 23 13 Not entirely accurate land bridge dwellers?
C 15 6 13 No identification of amount made
C 15 7 9 Not all options on question
C 15 11 9 Wis = great plains or great lakes?
C 15 14 8 City Council doesn’t have to meet at City
C 15 15 10 Item too hard 4th graders might have a problem with the phrase “guiding common growth.”
C 15 15 12 Lacks clarity “guiding community growth”
C 15 18 9 Choices unclear, given labels on map
C 15 18 10 Item cannot be answered, based on map
Choices are confusing (illegible) recall
C 15 26 8 Canada is sub-arctic and WI is the northeast
C 15 27 12 Walrus tusk? Requires too much unrevealed background.
Four raters identified two problematic items in the assessment: Item #5 on Level 13, Form C, was deemed historically inaccurate, while Item #18 on Level 14, Form D, exhibited issues with the identification of geographical regions. A review of these two items is recommended.
Reliability Among Reviewers
Reviewers consistently rated the depth-of-knowledge levels of test items across various forms, with an analysis conducted on both a grade 4 and a grade 8 test form in four content areas. The average intraclass correlation measures, which compare ratings from six to nine reviewers, were .85 or higher, indicating strong reliability, except for one instance. Specifically, the reliability for the six reviewers coding science Level 13 Form C was notably lower at .69. To address this, a subsequent analysis of a second grade 4 science test form, Level 15 Form C, yielded an intraclass correlation of .85, confirming that the lower reliability for Level 13 Form C was an anomaly.
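The report does not say which form of the intraclass correlation was computed. As an illustration only, the sketch below calculates an average-measures, consistency-type coefficient (often labelled ICC(3,k)) from a two-way layout of items by reviewers. The choice of that form, the function name, and the sample ratings are all assumptions; none of the numbers come from the study.

```python
def icc_average_consistency(ratings):
    """Average-measures consistency ICC for a complete items-by-reviewers table.

    ratings: list of rows, one per test item, each row holding one
    depth-of-knowledge rating per reviewer.
    """
    n = len(ratings)       # number of items
    k = len(ratings[0])    # number of reviewers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items, reviewers, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("items do not differ, so the coefficient is undefined")
    return (ms_rows - ms_error) / ms_rows

# Hypothetical ratings: three items, two reviewers.
sample = [[1, 2], [2, 1], [3, 3]]
print(round(icc_average_consistency(sample), 2))
```

Because this form measures consistency rather than absolute agreement, a reviewer who rates every item one level higher than a colleague does not lower the coefficient.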
[Table: Reliability of Depth-of-Knowledge Levels Ratings of Test Items by Reviewers for Four Content Areas. Columns included lower and upper bounds; the table body was not recoverable.]
Implications and Conclusions
English Language Arts
The English Language Arts test contains significantly more items than the other three subtests. In addition, although TerraNova provides combined scores in reading and language arts outside of Wisconsin, it largely overlooks Standards C (Oral Language), E (Research and Inquiry), and F (Media and Technology). Both points are essential for interpreting the re-alignment results for this subtest.
The language arts panel members suggested that the three standards not covered by TerraNova should be assessed locally, while noting that items could be developed to evaluate Standard E, Research and Inquiry. They acknowledged that this assessment would be indirect because of the time constraints associated with direct measurement. Since these three standards are intended for local evaluation, the large number of items in the English language arts subtest is allocated among just the three remaining standards, which likely contributes to the observed misalignment.
TerraNova aligns with nearly all established criteria, demonstrating strong compliance. However, it falls short in the depth of knowledge for Standard A in Reading/Literature at the 4th-grade level, and this weakness persists at the 8th-grade level.
Local concerns may arise due to the significant portion of the language arts section allocated for local assessment. Test development companies, such as CTB McGraw-Hill, offer oral language assessments that could be utilized at the local level. Implementing these existing assessments would eliminate the need for developing a new oral language evaluation locally.
The findings of this study suggest that the depth-of-knowledge weaknesses identified in the reading sections of the English Language Arts test could be addressed by removing some of the less rigorous items, thereby enhancing the overall quality of the assessment.
No single item was identified as a source-of-challenge problem by more than one rater.
Perhaps the most relevant observation about the Mathematics subtest is the predominance of items measuring the Number Operations and Relationships Standard (B) at both grades.
At grade 4, an average of 21.6 items addresses the mathematics standards; at grade 8, the average is 23.3 items. The alignment between the tests and the standards does not fully meet the coverage criteria for four of the six standards at grade 4. Fortunately, only a few additional items are needed to satisfy these criteria. By eliminating some items that assess Number Operations and Relationships, the test length can be maintained without significant changes. Additionally, removing less rigorous items may improve compliance with the depth-of-knowledge requirements in this area.
At grade 8 in Standards A, B, and E, it may be necessary to revise items to improve rigor, or add items of sufficient rigor to those categories.
In grade 4, two items were flagged by multiple raters as potential sources of challenge: Level 14, Form C, Item #23 (noted by three raters) and Level 15, Form C, Item #5 (noted by five raters). In grade 8, one item, Level 17, Form C, Item #23, was similarly identified by three raters.
The significant number of standards and objectives in the Science curriculum may contribute to the failure to meet alignment criteria in four out of eight standards. While alignment is generally satisfactory in the remaining standards, strategies such as extending the test with additional items and evaluating poorly aligned standards at the local level could enhance overall alignment.
Source-of-challenge concerns were noted by multiple raters at both grade levels. At grade 4, three raters flagged Item #36 on Level 14, Form D; at grade 8, four raters raised issues with Item #7 on Level 19, Form C.
The alignment study on Social Studies reveals two significant factors. The first is the extensive number of objectives across the five standards, which makes it difficult to meet the knowledge criterion within a limited testing framework. The second is the lack of assessment items related to the Behavioral Science standard. Because this standard encompasses cultural differences, beliefs, and attitudes, local-level assessment may be beneficial. Panel sentiment agrees with this approach and also points to a potential need to incorporate additional items across all forms of the test.
Two items with potential source-of-challenge problems were identified at grade 4, none at grade 8. The two were Item #5, Level 13, Form C, and Item #18, Level 14, Form D.
Cook, G., et al. (2002). Wisconsin’s re-alignment study: Preliminary findings. Madison: Wisconsin Department of Public Instruction.
Dold, S., et al. (1998). Wisconsin Knowledge and Concepts Examinations: An alignment study at grade 8. Madison: Wisconsin Department of Public Instruction.
Dold, S., et al. (1998). Wisconsin Knowledge and Concepts Examinations: An alignment study at grade 4. Madison: Wisconsin Department of Public Instruction.
Governor’s Council on Model Academic Standards. (1998). Wisconsin’s Model Academic Standards. Madison: Wisconsin Department of Public Instruction.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420-428.
U.S. Congress. (1965). Elementary and Secondary Education Act. Washington, DC:
U.S. Department of Education. (1994). Goals 2000: Educate America Act. Washington,
U.S. Department of Education. (1994). Improving America’s Schools Act of 1993: The reauthorization of the Elementary and Secondary Education Act and other amendments. Washington, DC: Author.
Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education (Research Monograph No. 6). Madison: University of Wisconsin, National Institute for Science Education.
Webb, N. L. (1999). Alignment of science and mathematics standards and assessments in four states (Research Monograph No. 18). Madison: University of Wisconsin, National Institute for Science Education.
Webb, N. L. (2001). Reviewer background information and instructions: Mathematics standards and assessment alignment analysis–CCSSO/TILSA alignment study. Washington, DC: Council of Chief State School Officers.
Webb, N. L. (2001). Reviewer background information and instructions: Science standards and assessment alignment analysis–CCSSO/TILSA alignment study. Washington, DC: Council of Chief State School Officers.
Wisconsin Department of Public Instruction. (2000). Wisconsin High School Graduation Test: Educator’s guide 2000. Madison: Author.