What is Masked Autoencoder?

Computer Vision

Masked Autoencoder

A masked autoencoder is a self-supervised learning method that masks random patches of an image and trains the model to reconstruct them. It has proven highly effective for pre-training vision models.

Understanding Masked Autoencoder

A masked autoencoder is a self-supervised learning architecture that learns meaningful data representations by masking a large portion of the input and training the model to reconstruct the missing parts. Popularized by the MAE approach for vision transformers, this technique randomly masks patches of an image and tasks the encoder-decoder network with predicting the original pixel values. The method draws inspiration from masked language modeling in natural language processing, where models like BERT predict hidden tokens. Masked autoencoders have proven remarkably effective for unsupervised pre-training, producing features that transfer well to downstream tasks like image classification, instance segmentation, and object tracking. By forcing the model to learn robust internal representations, masked autoencoders reduce the dependence on large labeled datasets and connect to broader ideas around imputation and self-supervised feature learning in modern deep learning.

Is AI recommending your brand?

Find out if ChatGPT, Perplexity, and Gemini mention you when people search your industry.

Check your brand — $9

Masked Language Model

Back to full glossary

Masked Autoencoder

Understanding Masked Autoencoder

Is AI recommending your brand?

Related Computer Vision Terms

Bounding Box

Computer Vision

Face Recognition

Image Captioning

Image Classification

Image Segmentation

Instance Segmentation

Neural Radiance Field