Abstract:
There is an ongoing debate in the NLP community about whether modern language models
contain linguistic knowledge that can be recovered through so-called probes. This work examines
whether linguistic knowledge is a necessary condition for the good performance of
modern language models, a claim we call the rediscovery hypothesis.
First, we show that language models that are significantly compressed
but perform well on their pretraining objectives retain good scores when probed for
linguistic structures. This result supports the rediscovery hypothesis and leads to
an information-theoretic framework that relates language modeling objectives to
linguistic information. This framework also provides a metric to measure the impact of
linguistic information on the word prediction task. We reinforce our analytical results
with experiments on both synthetic and real NLP tasks in English.