# Christmas Trees and Invariance in Machine Learning

### Rotation invariant Christmas trees are trending this year.

Machine learning experts often talk about “invariance.”

“Invariance” is a term borrowed from math. In math, you say a property of a mathematical object is “invariant” if it remains unchanged after operations or transformations of a certain type are applied to the object.
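To make the math definition concrete, here is a minimal sketch with a toy example of my own (it is not from any library or from the StackExchange post): the length of a 2D point is invariant to rotation about the origin, even though its coordinates are not. The helper names `rotate` and `length` are just illustrative.

```python
import math

def rotate(point, angle_rad):
    """Rotate a 2D point about the origin by angle_rad radians."""
    x, y = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y)

def length(point):
    """Euclidean distance of the point from the origin."""
    return math.hypot(*point)

p = (3.0, 4.0)
q = rotate(p, math.radians(90))

# The coordinates change under rotation, but the length does not.
coords_changed = not math.isclose(p[0], q[0])
length_unchanged = math.isclose(length(p), length(q))
print(coords_changed, length_unchanged)
```

Here the "property" is the length and the "transformation of a certain type" is rotation. Pick a different transformation, such as scaling, and the same property stops being invariant.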

In machine learning, this language is used to explain why a particular learning algorithm works so well. For example, “It works because it is invariant to ….”

Here’s a useful illustration of invariance in computer vision that I found on StackExchange.

In this example, you have a computer vision algorithm trying to identify objects in images. You want the learning algorithm to know that operations on the object in the image, such as translating it, rotating it, changing its size, or changing the illumination, don't change the object's identity.
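Here is a toy sketch of what that means for a feature computed from an image. Everything in it is an assumption for illustration: the "image" is a tiny grid of numbers, `brighten` is a crude stand-in for an illumination change, and `sorted_pixels` is a deliberately naive feature, not something a real vision system would use.

```python
def rotate90(image):
    """Rotate a 2D grid of pixel values 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def brighten(image, amount):
    """Crude stand-in for an illumination change: add a constant to every pixel."""
    return [[pixel + amount for pixel in row] for row in image]

def raw_pixels(image):
    """Pixels in reading order. Moving the object around changes this."""
    return [pixel for row in image for pixel in row]

def sorted_pixels(image):
    """Pixel values with their positions thrown away."""
    return sorted(raw_pixels(image))

# A very small, very sad Christmas tree.
tree = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]
rotated = rotate90(tree)
brighter = brighten(tree, 1)

print(raw_pixels(rotated) == raw_pixels(tree))         # False: not rotation invariant
print(sorted_pixels(rotated) == sorted_pixels(tree))   # True: rotation invariant
print(sorted_pixels(brighter) == sorted_pixels(tree))  # False: not illumination invariant
```

Note that being invariant to one transformation says nothing about the others. Each invariance has to be earned separately.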

The problem is that the algorithm can learn many different statistical patterns from data; some patterns help identify the object, and some do not. We describe the patterns that are *not* useful in terms of invariance.

Am I the only one who feels like this is a weird way of getting machines to learn? I feel odd trying to represent something in terms of what it isn’t rather than what it is.

Imagine learning what a Christmas tree is like this.

What’s a Christmas tree?

This is a Christmas tree.

Also, note that Christmas trees are invariant to being indoors or outdoors.

And they are size invariant.

But not proximity invariant. That’s pine needles.

And not ornament invariant. That’s just a pine tree.

Wait, except when indoors. And there’s presents. That’s still a Christmas tree.

And I guess they are rotation invariant?

And color invariant…

Except for these. These are just messed up…

This shows how little the machine really knows about Christmas trees: it only uses spurious correlations it has seen in the past to draw conclusions! That's the problem of covariate shift, or out-of-distribution data. Can causal inference help machines learn the "true" concept that is invariant?
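A minimal sketch of how a spurious correlation bites, using made-up data and a made-up learner (both are assumptions for illustration, not a real dataset or method). The "learner" simply picks whichever single feature best matches the labels it saw in training.

```python
FEATURES = ("indoors", "has_ornaments")

# Hypothetical training data: every Christmas tree we saw happened to be indoors,
# and one of them (the one with only presents) had no ornaments.
train = [
    ({"indoors": 1, "has_ornaments": 1}, 1),
    ({"indoors": 1, "has_ornaments": 1}, 1),
    ({"indoors": 1, "has_ornaments": 0}, 1),
    ({"indoors": 0, "has_ornaments": 0}, 0),
    ({"indoors": 0, "has_ornaments": 0}, 0),
    ({"indoors": 0, "has_ornaments": 0}, 0),
]

# Hypothetical shifted test data: decorated trees outdoors, a houseplant indoors.
test = [
    ({"indoors": 0, "has_ornaments": 1}, 1),
    ({"indoors": 0, "has_ornaments": 1}, 1),
    ({"indoors": 1, "has_ornaments": 0}, 0),
    ({"indoors": 0, "has_ornaments": 0}, 0),
]

def accuracy(feature, data):
    """Fraction of examples where the feature value equals the label."""
    return sum(x[feature] == y for x, y in data) / len(data)

def fit(data):
    """Pick the single feature that best matches the labels in the data."""
    return max(FEATURES, key=lambda feature: accuracy(feature, data))

chosen = fit(train)
print(chosen, accuracy(chosen, train), accuracy(chosen, test))
# → indoors 1.0 0.25
```

The learner latches onto "indoors" because it was a perfect predictor in the past, then falls apart once the trees move outside. The ornament feature would have scored 4 out of 4 on the shifted data, but nothing in the training set told the learner to prefer it.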