[2103.10697v2] ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external datasets or distillation from pre-trained convolutional networks. In this paper, we ask the following question: is it possible to combine the strengths of these two architectures while avoiding their respective limitations? To this end, we introduce gated positional self-attention (GPSA), a form of positional self-attention which can be equipped with a "soft" convolutional inductive bias. We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to posi
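
The gating idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the gate is a sigmoid of one learnable scalar per head and that the head's final attention is a convex mix of a content-based softmax and a position-based softmax. The function names, the 1-D patch layout and the squared-distance locality prior (`local_positional_scores`) are hypothetical choices made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function; maps the gate logit into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def local_positional_scores(query_index, num_patches, locality_strength):
    """Hypothetical 1-D locality prior: scores decay with squared distance."""
    return [-locality_strength * (query_index - j) ** 2 for j in range(num_patches)]


def gated_attention_row(content_scores, positional_scores, gate_logit):
    """Attention weights of one query patch over all patches.

    Assumption: gate = sigmoid(gate_logit) weights the positional term and
    (1 - gate) weights the content term, so the row still sums to 1.
    """
    gate = sigmoid(gate_logit)
    content = softmax(content_scores)
    positional = softmax(positional_scores)
    return [(1.0 - gate) * c + gate * p for c, p in zip(content, positional)]


if __name__ == "__main__":
    # Content scores favour a distant patch (index 3); the query is patch 0.
    content_scores = [0.1, 0.0, 0.2, 2.0, 0.0]
    positional = local_positional_scores(query_index=0, num_patches=5, locality_strength=1.0)
    # A high gate logit keeps the head local; a low one lets it follow content.
    for gate_logit in (4.0, 0.0, -4.0):
        row = gated_attention_row(content_scores, positional, gate_logit)
        print(gate_logit, [round(a, 3) for a in row])
```

With a large positive gate logit the head behaves like a local, convolution-like filter centred on the query patch. As the logit decreases, the same head shifts its weight toward whichever patches the content scores favour, which is the sense in which the head can "escape locality".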

1 mention: @Maxwell_110
Keywords: Transformer

Referring Tweets

@Maxwell_110
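@Maxwell_110 ConViT 📝 t.co/JmDwGpATWM Proposes GPSA, which gives ViT the inductive bias that is a strength of CNNs. GPSA achieves a soft inductive bias by "using a gate to control the balance between positional self-attention initialised as a convolutional layer and ordinary attention". GitHub ➡︎ t.co/jbFJTtEgQ5 t.co/rZKhCWvueu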

Related Entries

[1703.10025] Flow-Guided Feature Aggregation for Video Object Detection
0 users, 1 mention 2021/12/23 22:37
[1710.03958] Detect to Track and Track to Detect
2 users, 1 mention 2021/12/28 22:37
[2201.10271v1] Convolutional Xformers for Vision
0 users, 1 mention 2022/02/03 22:37
GitHub - tidymodels/multilevelmod: Parsnip wrappers for mixed-level and hierarchical models
0 users, 1 mention 2022/05/12 22:38
GitHub - OpenNLPLab/AVSBench: Official implementation of the ECCV2022 paper: Audio-Visual Segmentati...
0 users, 1 mention 2022/09/11 22:37
GitHub - fugue-project/fugue: A unified interface for distributed computing. Fugue executes SQL, Pyt...
0 users, 1 mention 2022/09/13 22:37

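About ML-News

Information on machine learning technology moves fast and spans many fields, so keeping up is hard. Even if you build a Twitter list for machine learning, the feed is usually dominated by other topics, which makes efficient information gathering difficult.

ML-News is a news site dedicated to machine learning that uses social media as its source. It lets you efficiently collect the latest information on machine learning papers, blogs, libraries, competitions, presentation slides, study groups, and more.

It also covers fields that apply machine learning, such as natural language processing, image recognition, and information retrieval, as well as the data infrastructure and MLOps topics that machine learning requires.
We are looking for GitHub sponsors to keep the site running stably.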

Announcements

  • 2021/12/31: Redesigned the site
  • 2021/04/08: Added the Japanese and Kaggle categories