I'm trying to understand vision transformers, and specifically, I want to know more about tokens. What exactly are they in the context of vision transformers?
5 answers
Valentina
Tue Nov 19 2024
The typical vision transformer process begins by breaking down a fixed-size input image.
BitcoinBaroness
Tue Nov 19 2024
This decomposition involves dividing the image into a grid of small, non-overlapping patches (for example, 16×16 pixels each).
KatanaBlade
Tue Nov 19 2024
Each of these small patches then undergoes a transformation.
noah_stokes_photographer
Tue Nov 19 2024
Specifically, each patch is flattened and projected into a feature vector by a learned linear layer.
HanjiArtist
Tue Nov 19 2024
This feature vector is what is referred to as a token. The transformer then processes the resulting sequence of patch tokens, much as a language model processes a sequence of word tokens.
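To make the image → patches → linear projection → tokens pipeline concrete, here is a minimal NumPy sketch. It assumes a 224×224 RGB image, 16×16 patches, and a 768-dimensional embedding; these are illustrative values, and the function names `patchify` and `embed_patches` are hypothetical. The projection weights are random here, whereas in a trained model they are learned. Anything that happens after tokenization is left out.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the image.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size  # patch grid size
    # (H, W, C) -> (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh*gw, p*p*C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear layer: map each flattened patch to one D-dimensional token."""
    return patches @ weight + bias


# Illustrative sizes (assumptions, not requirements).
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patch_size = 16
embed_dim = 768

patches = patchify(image, patch_size)  # 14 x 14 grid -> 196 patches of 16*16*3 = 768 values
W = rng.normal(0.0, 0.02, size=(patches.shape[1], embed_dim))  # random stand-in for learned weights
b = np.zeros(embed_dim)
tokens = embed_patches(patches, W, b)  # one token (row) per patch

print(tokens.shape)  # (196, 768)
```

Each row of `tokens` is the feature vector for one patch, i.e. one token. Many implementations compute the same thing with a single strided convolution whose kernel size and stride both equal the patch size, which is mathematically equivalent to patchifying followed by a shared linear layer.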