awesome-vision-and-language
Vision and Language Resources
A curated list of resources and datasets for vision-and-language research (still under construction).
500 stars
12 watching
40 forks
last commit: 18 days ago
Linked from 1 awesome list
awesome, awesome-list, multimodal-learning, vision-and-language
Awesome Vision-and-Language: / Survey | |||
1506.06833 | |||
1705.09406 | |||
1810.04020 | |||
1907.09358 | |||
Scene-Graph-Survey | |||
1904.09317 | |||
ACCESS 2019 | |||
1911.03977 | |||
1912.11872 | |||
2010.09522 | |||
Awesome Vision-and-Language: / Dataset | |||
1505.00468 | |||
visualqa | |||
1604.03968 | |||
ai-visual-storytelling-seq2seq | |||
VIST | |||
1602.07332 | |||
visual_genome_python_driver | 357 | about 1 year ago | |
visualgenome | |||
1612.06890 | |||
1705.08421 | |||
AVA | |||
1711.11543 | |||
embodiedqa | |||
1711.07280 | |||
bringmeaspoon | |||
1902.09506 | |||
visualreasoning | |||
1811.10830 | |||
r2c | 466 | over 3 years ago | |
VCR | |||
1904.03493 | |||
2010.00763 | |||
Bongard-LOGO | 51 | over 2 years ago | |
2205.13803 | |||
Bongard-HOI | 64 | about 2 years ago | |
Awesome Vision-and-Language: / Image Captioning | |||
1411.4389 | |||
1412.2306 | |||
1411.4555 | |||
show_and_tell.tensorflow | 290 | about 8 years ago | |
1502.03044 | |||
show-attend-and-tell | 907 | over 6 years ago | |
1411.4952 | |||
visual-concepts | 151 | over 6 years ago | |
1603.03925 | |||
semantic-attention | 51 | about 8 years ago | |
1612.01887 | |||
AdaptiveAttention | 334 | almost 7 years ago | |
1612.00563 | |||
1611.06607 | |||
1704.03899 | |||
1611.08002 | |||
Semantic_Compositional_Nets | 70 | over 6 years ago | |
CVPR 2017 | |||
stylenet | 63 | almost 4 years ago | |
EMNLP 2018 | |||
image-paragraph-captioning | 90 | about 5 years ago | |
1803.09845 | |||
NeuralBabyTalk | 524 | over 5 years ago | |
1707.07998 | |||
1807.03871 | |||
1805.08191 | |||
1811.10787 | |||
unsupervised_captioning | 215 | over 1 year ago | |
1906.02365 | |||
CAVP | 47 | over 5 years ago | |
1903.05942 | |||
1903.12020 | |||
1904.01475 | |||
1812.02378 | |||
SGAE | 220 | over 2 years ago | |
CVPR 2019 | |||
1901.02527 | |||
1908.06954 | |||
2004.03708 | |||
2003.00387 | |||
asg2cap | 200 | almost 2 years ago | |
2007.11731 | |||
Sub-GC | 96 | 3 months ago | |
2009.12313 | |||
2102.04990 | |||
Awesome Vision-and-Language: / Image Retrieval | |||
1511.07067 | |||
VisualWord2Vec | 19 | over 5 years ago | |
1812.07119 | |||
tirg | 298 | over 3 years ago | |
2105.13868 | |||
IAIS | 30 | over 1 year ago | |
2203.15867 | |||
ImageCoDe | 39 | 9 months ago | |
2407.15239 | |||
2311.17136 | |||
UniIR | 110 | about 2 months ago | |
2407.12346 | |||
Q-Pert | 1 | 2 months ago | |
Awesome Vision-and-Language: / Scene Text Recognition | |||
1908.09231 | |||
1904.01906 | |||
clovaai | 3,755 | 9 months ago | |
Awesome Vision-and-Language: / Scene Graph | |||
7298990 | |||
1602.07332 | |||
visual_genome_python_driver | 357 | about 1 year ago | |
visualgenome | |||
1701.02426 | |||
scene-graph-TF-release | 425 | over 5 years ago | |
1707.09700 | |||
MSDN | 227 | about 5 years ago | |
1711.06640 | |||
neural-motifs | 525 | over 5 years ago | |
1802.02598 | |||
1811.06410 | |||
1804.01622 | |||
sg2im | 1,300 | 4 months ago | |
1808.00191 | |||
graph-rcnn.pytorch | 733 | over 4 years ago | |
1904.00560 | |||
1909.05379 | |||
scene_generation | 187 | about 1 year ago | |
1811.10696 | |||
sceneGraph_Mem | 4 | over 5 years ago | |
1903.02728 | |||
ContrastiveLosses4VRD | 200 | over 4 years ago | |
1903.03326 | |||
KERN | 120 | over 2 years ago | |
1812.01880 | |||
VCTree | 121 | 3 months ago | |
1812.02347 | |||
1904.11622 | |||
limited-label | 54 | about 1 year ago | |
2002.11949 | |||
Scene-Graph-Benchmark | 1,075 | 27 days ago | |
2003.12962 | |||
GPS-Net | 63 | over 4 years ago | |
2006.09623 | |||
2007.08760 | |||
het-eccv20 | 16 | over 4 years ago | |
Awesome Vision-and-Language: / text2image | |||
1605.05396 | |||
icml2016 | 913 | about 6 years ago | |
1612.03242 | |||
StackGAN | 1,860 | over 4 years ago | |
1711.10485 | |||
AttnGAN | 1,339 | 4 months ago | |
1802.09178 | |||
HDGan | 150 | about 6 years ago | |
1812.02784 | |||
StoryGAN | 233 | over 2 years ago | |
1903.05854 | |||
1904.01310 | |||
1904.01480 | |||
1811.09845 | |||
GeNeVA | 37 | over 1 year ago | |
1909.05379 | |||
scene_generation | 187 | about 1 year ago | |
Awesome Vision-and-Language: / Video Captioning | |||
1411.4389 | |||
1510.07712 | |||
1701.03126 | |||
1611.08002 | |||
CVPR_2017 | |||
1804.00100 | |||
1812.05634 | |||
adv-inf | 34 | over 5 years ago | |
1904.03870 | |||
DenseVideoCaptioning | 148 | over 5 years ago | |
1906.04375 | |||
2011.07735 | |||
iPerceive | |||
Awesome Vision-and-Language: / Video Question Answering | |||
1512.02902 | |||
MovieQA | 80 | almost 8 years ago | |
1809.01696 | |||
TVQA | 172 | about 2 years ago | |
2007.08751 | |||
ROLL-VideoQA | 19 | about 4 years ago | |
2011.07735 | |||
iPerceive | |||
Awesome Vision-and-Language: / Video Understanding | |||
1811.08383 | |||
temporal-shift-module | 2,068 | 4 months ago | |
1910.11009 | |||
Awesome Vision-and-Language: / Vision and Language Navigation | |||
1711.11543 | |||
embodiedqa | |||
1711.07280 | |||
bringmeaspoon | |||
fda_pdf | |||
fda_code | 13 | 11 months ago | |
mam_paper | |||
Awesome Vision-and-Language: / Vision-and-Language Pretraining | |||
1908.07490 | |||
lxmert | 935 | about 2 years ago | |
1904.01766 | |||
vilbert | 474 | almost 2 years ago | |
1907.07804 | |||
OmniNet | 512 | about 4 years ago | |
1908.06066 | |||
Unicoder | 88 | 12 months ago | |
1909.11059 | |||
VLP | 412 | almost 3 years ago | |
1911.11237 | |||
Oscar | 1,038 | about 1 year ago | |
2006.09882 | |||
swav | 2,005 | over 1 year ago | |
2004.06165 | |||
Oscar | 1,038 | about 1 year ago | |
2006.16934 | |||
ERNIE | 6,318 | 3 months ago | |
2101.00529 | |||
VinVL | 350 | over 1 year ago | |
2006.06666 | |||
virtex | 557 | 11 months ago | |
2103.00020 | |||
2103.05247 | |||
universal-computation | 245 | almost 3 years ago | |
2102.05918 | |||
2103.01988 | |||
2102.10772 | |||
2102.12092 | |||
2103.06561 | |||
2305.08675 | |||
Awesome Vision-and-Language: / Visual Dialog | |||
1611.08669 | |||
visdial | 228 | almost 6 years ago | |
visualdialog | |||
1803.11186 | |||
2303.05983 | |||
ATVC | 7 | over 1 year ago | |
Awesome Vision-and-Language: / Visual Grounding | |||
1611.09978 | |||
cmn | 67 | about 6 years ago | |
1908.07553 | |||
1812.03299 | |||
1908.06354 | |||
1908.07129 | |||
zsgnet | 69 | over 4 years ago | |
2203.16518 | |||
CoFormer | 43 | over 1 year ago | |
Awesome Vision-and-Language: / Visual Question Answering | |||
1505.00468 | |||
visualqa | |||
1606.00061 | |||
HieCoAttenVQA | 349 | about 6 years ago | |
1606.01847 | |||
vqa-mcb | 222 | over 8 years ago | |
1511.02274 | |||
imageqa-san | 107 | almost 8 years ago | |
1511.05234 | |||
AAAA | 25 | about 4 years ago | |
1603.01417 | |||
dmn-plus | 64 | over 6 years ago | |
1606.01455 | |||
nips-mrn-vqa | 39 | almost 8 years ago | |
1609.05600 | |||
1612.00837 | |||
1704.05526 | |||
1803.08896 | |||
PSLQA | 56 | almost 6 years ago | |
1707.07998 | |||
1708.02711 | |||
vqa-winner | 164 | almost 6 years ago | |
1810.02358 | |||
VQA-Transfer-ExternalData | 20 | over 5 years ago | |
1902.09506 | |||
visualreasoning | |||
1904.08920 | |||
ICCV2019 | |||
1907.12133 | |||
scene-graphs-vqa | |||
2204.11167 | |||
RelViT | 64 | about 2 years ago | |
2208.01813 | |||
TAG | 21 | almost 2 years ago | |
Awesome Vision-and-Language: / Visual Reasoning | |||
1612.06890 | |||
1705.03633 | |||
1902.09506 | |||
visualreasoning | |||
1812.01855 | |||
1811.10830 | |||
r2c | 466 | over 3 years ago | |
VCR | |||
1909.08164 | |||
1909.02701 | |||
VSRN | 294 | almost 5 years ago | |
2010.00763 | |||
Bongard-LOGO | 51 | over 2 years ago | |
2205.13803 | |||
Bongard-HOI | 64 | about 2 years ago | |
2204.11167 | |||
RelViT | 64 | about 2 years ago | |
2307.15199 | |||
PromptStyler | |||
Awesome Vision-and-Language: / Visual Relationship Detection | |||
1608.00187 | |||
Visual-Relationship-Detection | 214 | about 4 years ago | |
1702.07191 | |||
1702.08319 | |||
drnet | 202 | about 3 years ago | |
1703.03054 | |||
DeepVariationRL | 63 | almost 6 years ago | |
1704.03114 | |||
drnet | 202 | about 3 years ago | |
1611.06641 | |||
pl-clc | 39 | over 7 years ago | |
1707.09423 | |||
1803.10362 | |||
ReferringRelationships | 260 | almost 2 years ago | |
1807.04979 | |||
ZoomNet | |||
1808.00171 | |||
vrd | 94 | about 6 years ago | |
1910.12324 | |||
Awesome Vision-and-Language: / Visual Storytelling | |||
1604.03968 | |||
visual_genome_python_driver | 357 | about 1 year ago | |
VIST | |||
1804.09160 | |||
AREL | 137 | almost 4 years ago | |
2002.00774 | |||
AAAI 2020 | |||