03 |
An image is worth 16x16 words: Transformers for image recognition at scale |
pdf |
#Pre-Print |
2021 |
04 |
End-to-end object detection with transformers |
pdf |
#ECCV |
2020 |
05 |
Deformable DETR: Deformable Transformers for End-to-End Object Detection |
pdf |
#Pre-Print |
2021 |
06 |
Dynamic DETR: End-to-End Object Detection With Dynamic Attention |
pdf |
#ICCV |
2021 |
07 |
UP-DETR: Unsupervised Pre-Training for Object Detection With Transformers |
pdf |
#CVPR |
2021 |
08 |
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection |
pdf |
#Pre-Print |
2022 |
09 |
DINOv2: Learning Robust Visual Features without Supervision |
pdf |
#Pre-Print |
2024 |
10 |
Efficient DETR: Improving End-to-End Object Detector with Dense Prior |
|
|
2021 |
11 |
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR |
|
|
2022 |
12 |
Sparse DETR: Efficient End-to-End Object Detection with Learnable Sparsity |
|
|
2022 |
13 |
Co-DETR: DETRs with Collaborative Hybrid Assignments Training |
|
|
2023 |
14 |
DETRs Beat YOLOs on Real-time Object Detection |
pdf |
#CVPR |
2024 |
15 |
PVT2 |
|
|
|
16 |
Twins |
|
|
|
17 |
Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows |
pdf |
#ICCV |
2021 |
18 |
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers |
pdf |
#NeurIPS |
2021 |
19 |
GC ViT: Global Context Vision Transformers |
pdf |
#PMLR |
2023 |
20 |
DynamicViT |
|
|
|
21 |
Focal Self-attention for Local-Global Interactions in Vision Transformers |
pdf |
#NeurIPS |
2021 |
22 |
CSWin Transformer |
|
|
|
23 |
MaxViT |
|
|
|
24 |
MinViT |
|
|
|
25 |
InternImage |
|
|
|
26 |
UFO (Unified Feature Optimization) Transformer |
|
|
|
27 |
LaVin-DiT: Large Vision Diffusion Transformer |
pdf |
#Pre-Print |
2024 |