The claim that transformer networks do not have inductive bias is not quite right.

Inductive bias is anything that constrains ("sandboxes") the model, and that includes network topology. The moment we design a topology, we have sown the seeds of its limitations: there are parts of the world the model will learn poorly, and some it will never "understand" at all. A convolutional network, for instance, is built around the assumption that nearby pixels matter more than distant ones.

Transformer models are designed by us, so they must have inductive bias. When we set a token (context-length) limit on a transformer, we decide in advance what the model gets to see, and we thereby introduce inductive bias. When we stamp our "wisdom" into the architecture through layer normalization, residual connections, and fully connected layers, we introduce inductive bias.
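
To make that concrete, here is a minimal sketch of a transformer block in PyTorch-style Python. It is illustrative only: the class name and the sizes (d_model, n_heads, max_len) are my assumptions, not taken from any particular model, but the comments mark where each design choice quietly encodes a bias.

```python
# Minimal sketch: where design choices bake inductive bias into a
# transformer block. All names and sizes here are illustrative.
import torch
import torch.nn as nn

class TinyTransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, max_len=128):
        super().__init__()
        # Bias 1: a hard context window -- tokens beyond max_len simply
        # cannot influence the model.
        self.max_len = max_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Bias 2: layer normalization asserts that per-token feature
        # statistics ought to be standardized.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Bias 3: a position-wise MLP treats each token independently
        # after attention has done the mixing.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        x = x[:, : self.max_len]  # enforce the token limit
        # Bias 4: residual connections presume each block should learn a
        # perturbation of the identity map, not an arbitrary function.
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        x = self.norm2(x + self.mlp(x))
        return x

# Usage: a batch of 2 sequences, 200 tokens each, silently truncated to 128.
block = TinyTransformerBlock()
out = block(torch.randn(2, 200, 64))
print(out.shape)  # torch.Size([2, 128, 64])
```

None of these choices is neutral; each one rules parts of function space in or out before training ever starts.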

But inductive bias is just another name for domain knowledge. The usefulness of a model can only be measured in a context (pardon the pun). That is why training a transformer, in the end, amounts to training a discriminative model for some particular task.

To paraphrase George Box: all models have inductive bias. Some have less than others.
