“Laws” & reproducibility in language assessment

In response to an email on LTEST-L, the Language Testing Research and Practice mailing list, asking (1) whether language testing research actually aims at establishing laws, in the same way as the natural sciences, and (2) whether studies have shown language testing research to be reproducible:


I think one take in the philosophy of science these days is that the laws of the natural sciences generate predictions with a higher degree of accuracy and cover a broader range of phenomena. Any “laws” discovered so far in the social sciences, like the linguistic laws you mentioned (Zipf’s Law, Heaps’ Law) or laws in economics, are likely to have narrower application and to be more probabilistic.


I think it’s often accepted that the object of study (i.e. the set of phenomena itself) gets inherently more complex as you move from the physical sciences, to the life sciences, to psychology, and then to the social sciences. With each step, you’re adding another dimension of complexity. So any laws are likely to become more complex, have more restricted application, and be more prone to error in their predictions. Also, and maybe most importantly, any such laws will become more difficult to discover. As the biologist E. O. Wilson writes, the objects of study in the social sciences are just more complex than those in physics or chemistry, since they involve all the objects from the physical sciences as well as those from biology, psychology, and economics/political science/linguistics/etc. So there are social science laws out there, but we’re just in the very early stages of establishing them.


(One of the complicating elements in the social sciences is also human reflexivity – even if you establish what seems like a pretty good law, once it’s communicated and possibly used for policy-making, people may then deliberately change their behaviour.)


On reproducibility, I think in language assessment we’d at least hope that most findings are reproducible. The core claim of validity for an educational assessment, for example, is that the assessment can be used again and produce similar results, i.e. test-takers with higher ability will get higher scores, even when it is used in different populations. So the community seems to value reproducibility as much as, e.g., psychology has come to value it.

Whether language assessment research has actually demonstrated reproducibility to the same extent that psychology has been trying to do in recent years (as in the OSF’s ‘Reproducibility’ projects) is another question. I’m not sure how encouraging language assessment journals are of replication studies. I tried searching “language assessment”/“language testing” and “replicat*” in Scopus, and there do seem to be some recent replication studies in Language Teaching (e.g. Johnson & Nicodemus, 2016).

It’s been mentioned recently that the journal Language Testing is discussing accepting the Registered Reports format, so I wouldn’t be surprised if language assessment as a field also turns out to be relatively advanced in its attitudes toward replication studies and other efforts to assure reproducibility.
