Education Research

Stephen Downes

Knowledge, Learning, Community

Nov 16, 2006

Responding to this article.

Without launching into a big long discussion (again, this is old ground for people familiar with empirical research), there are many reasons to question the conclusions offered by such studies.

No single study should ever be accepted as proving one or another hypothesis, no matter how large the sample size. The essence of empirical science is that the phenomena it describes must be replicable, which means they must actually be replicated. It is not enough to say that the odds of getting a different result are very small. The replication, by different scientists, in different circumstances, needs to be undertaken.
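
To illustrate (with made-up numbers of my own, nothing drawn from any actual study): suppose an intervention's true effect varies from site to site. A single study run at a favourable site can produce a convincing result that other sites simply fail to reproduce, no matter how large the sample.

    import random

    random.seed(1)

    def study(true_effect, n=1000):
        # Mean outcome difference between treated and control groups.
        treated = [random.gauss(true_effect, 1) for _ in range(n)]
        control = [random.gauss(0.0, 1) for _ in range(n)]
        return sum(treated) / n - sum(control) / n

    # Hypothetical site-specific effects: strong at one site,
    # negligible or slightly negative elsewhere.
    site_effects = [0.5, 0.0, 0.05, -0.1]
    for i, effect in enumerate(site_effects, 1):
        print(f"Site {i}: observed difference = {study(effect):+.2f}")

The first site's result looks decisive on its own; only by actually running the study again elsewhere do the other outcomes surface.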

That is why I am a bit surprised that you would not at least attempt to demonstrate the replicability of the theories tested in the RITE evaluations. You could certainly, for example, point to the Direct Instruction & Open Court Fact Sheet, which documents numerous studies supporting Direct Instruction. Of course, it also cites a number of studies supporting Open Court.

So what do you do when there are many studies, all apparently equally rigorous, but which support different theories? One tactic is to start calling other theorists names - that's what I see happening in the Engelmann paper, for example. Or this author, who writes, "if we could only get the morons (ed perfessers, district pinheads) out of education…"

But people who are serious about learning won't resort to this. They focus on what makes studies more likely to be accurate or misleading, and even more importantly, they understand the limits of such studies.

For example, the first question when we look at comparisons is, "better than what?" For example: "'There's evidence here that Direct Instruction is definitely helping some of the students,' said Ms. Mac Iver, an associate research scientist at Johns Hopkins University in Baltimore. 'The issue we still need to ferret out is whether they're doing significantly better than students getting other types of instruction.'"

That's a significant problem. There is no shortage of studies proving the effectiveness of this or that educational methodology. How can this be?

The problem is, as I noted in my earlier comment, that the studies are insufficient to prove that one form of instruction is uniquely the best. Even controlling for variables, the design of such studies is not able to encompass the many factors involved in their success or failure.

Let's take the RITE studies as an example.

In the RITE studies, schools chose the model they wanted to support, then received $750 to implement the model. This means that schools using Direct Instruction at least nominally supported Direct Instruction, and had some funds to support the implementation. Does Direct Instruction work in schools that don't have additional funds and where the staff are not motivated?

In reports of implementations of Direct Instruction, the results are uneven: in some schools (especially those with impoverished kids) the results are very good, but in other schools they are not so good. Even if overall Division scores have improved, can we conclude that RITE should be used in all schools, or only in those with impoverished students?

In the RITE program, the test used to evaluate the students was the SAT-9. This test measures a certain competency in language learning. Does the study show us that this is an appropriate standard? Can the same improvements be detected in other tests? Is there a corresponding improvement in student college entrance exam essays?

Achievements in language learning can be measured in many ways. The measurement described here, "better than the 50th percentile [and] below the 25th percentile", is an odd sort of measurement to use in what ought to be an objective evaluation, since these are relative measures, and not indicative of any concrete accomplishment. Why would the examiners not simply report improvements in actual scores on tests?
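
To see why this matters (a toy example of my own, not figures from the study): percentile rank is purely relative, so if every student's raw score improves, nobody's percentile standing need change at all, and a percentile-based measure records no gain.

    scores_before = [40, 55, 60, 70, 85]
    scores_after = [s + 10 for s in scores_before]  # everyone gains 10 points

    def percentile_rank(score, cohort):
        # Percentage of the cohort scoring strictly below this student.
        return 100 * sum(s < score for s in cohort) / len(cohort)

    for before, after in zip(scores_before, scores_after):
        print(percentile_rank(before, scores_before),
              percentile_rank(after, scores_after))

Every student's rank is identical before and after, even though every student demonstrably learned more. That is what it means to say a relative measure is not indicative of any concrete accomplishment.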

Direct Instruction requires adherence to a very specific and orchestrated type of instruction. Is it possible that this sort of instruction works better in certain U.S. communities than in others? Does the Texas test tell us it would work in an environment like a Canadian school, where students are much less likely to take orders?

The test results cited only covered a three-year period. Can we know, from this test, whether language learning continues to improve at this pace? Does it show that Direct Instruction should (or should not) be used at higher grade levels? Does the RITE evaluation tell us whether success at lower levels using Direct Instruction translates to success at higher levels, or whether it translates to poorer results at higher levels?

Does the RITE evaluation reveal to us whether there are any negative effects from Direct Instruction? For example, both mathematics and grammar require formalism, and Direct Instruction de-emphasizes formalism. Does the RITE evaluation tell us whether there are therefore any effects on math scores, either right away or in the long term?

Does the RITE evaluation show us that Direct Instruction works for every single student? Does it actually harm some students? Should it be used anyway? If not, what should be used? Will the use of this alternative impact the deployment of RITE?

How much does Direct Instruction cost? I saw a figure somewhere saying it takes $60K to implement fully. Is Direct Instruction the most effective way to spend $60K in a school? Are there other interventions, such as the provision of free hot lunches for students, that have been shown in numerous studies to far outweigh any pedagogical intervention? Does the RITE evaluation tell us about the relative merits of such interventions?

In some cases, students are not subjected to a school program at all (the clearest example being the Montessori program). Does the RITE evaluation show better performance on overall school achievement than the Montessori program?

I could continue with my questions, but I think the implications are clear:

- one single test, such as the RITE evaluation described here, leaves far too many questions unanswered

- we could conduct more tests, but what we find, empirically, is that different tests, under different conditions, produce different results

- there is no set of independent, empirically measurable criteria to tell us which test to use. The tests assume the conclusion they are trying to prove - they do this in the way terms are defined and in the way variables are measured

- tests attempt to control for variables, but in environments of mutually dependent variables, controlling one variable actually changes the effect of all the other variables (see the sketch after this list)

- in particular, the tests define a certain domain of acceptable alternatives - anything outside this domain cannot be contemplated. But better alternatives may exist outside the domain.
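
On the point about mutually dependent variables, here is a toy model (my construction, with invented numbers) in which the benefit of a teaching method depends on teacher buy-in. A study that "controls" for buy-in by running only in motivated schools measures a very different effect than one run in schools with little buy-in.

    def outcome(method_quality, buy_in):
        # Interaction term: the method only pays off where staff support it.
        return 50 + 10 * method_quality * buy_in

    # Study A holds buy-in fixed at a high level.
    print(outcome(1, buy_in=1.0) - outcome(0, buy_in=1.0))  # apparent effect: 10.0
    # Study B holds it fixed at a low level.
    print(outcome(1, buy_in=0.1) - outcome(0, buy_in=0.1))  # apparent effect: 1.0

Both studies are internally rigorous; they simply fixed the interacting variable at different values, and so report different "effects" of the same method.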

Now I don't like this any more than you do. I would love to be able to recommend phonics (I was actually schooled using phonics) or Direct Instruction or whole language or whatever to schools and teachers, if I knew it would work. But the more I look at this field, the more I understand that this knowledge is not forthcoming, not because we can't find out, but because there is no such fact of the matter.

Asking about the best way to teach language is like asking for the best letter of the alphabet. Trying to measure for the best way to teach language is like trying to find the warmest or the coldest letter.

Now, I don't expect that any of this has convinced you. It is difficult to abandon the idea of simplicity and certainty in science. But complex phenomena are real, just like the weather, and learning is one of them, which means that any simple cause-and-effect theory will be, by that fact, wrong.


