awesome-vision-and-language
Vision and Language Resources
A curated list of resources and datasets for research on vision-and-language tasks (still under construction; stay tuned!).
last commit: about 1 year ago
Topics: awesome, awesome-list, multimodal-learning, vision-and-language
Awesome Vision-and-Language: / Survey | |||
| 1506.06833 | |||
| 1705.09406 | |||
| 1810.04020 | |||
| 1907.09358 | |||
| Scene-Graph-Survey | |||
| 1904.09317 | |||
| ACCESS 2019 | |||
| 1911.03977 | |||
| 1912.11872 | |||
| 2010.09522 | |||
Awesome Vision-and-Language: / Dataset | |||
| 1505.00468 | |||
| visualqa | |||
| 1604.03968 | |||
| ai-visual-storytelling-seq2seq | |||
| VIST | |||
| 1602.07332 | |||
| visual_genome_python_driver | 357 | about 2 years ago | |
| visualgenome | |||
| 1612.06890 | |||
| 1705.08421 | |||
| AVA | |||
| 1711.11543 | |||
| embodiedqa | |||
| 1711.07280 | |||
| bringmeaspoon | |||
| 1902.09506 | |||
| visualreasoning | |||
| 1811.10830 | |||
| r2c | 466 | over 4 years ago | |
| VCR | |||
| 1904.03493 | |||
| 2010.00763 | |||
| Bongard-LOGO | 51 | over 3 years ago | |
| 2205.13803 | |||
| Bongard-HOI | 64 | about 3 years ago | |
Awesome Vision-and-Language: / Image Captioning | |||
| 1411.4389 | |||
| 1412.2306 | |||
| 1411.4555 | |||
| show_and_tell.tensorflow | 291 | about 9 years ago | |
| 1502.03044 | |||
| show-attend-and-tell | 907 | over 7 years ago | |
| 1411.4952 | |||
| visual-concepts | 150 | over 7 years ago | |
| 1603.03925 | |||
| semantic-attention | 51 | about 9 years ago | |
| 1612.01887 | |||
| AdaptiveAttention | 335 | almost 8 years ago | |
| 1612.00563 | |||
| 1611.06607 | |||
| 1704.03899 | |||
| 1611.08002 | |||
| Semantic_Compositional_Nets | 70 | over 7 years ago | |
| CVPR 2017 | |||
| stylenet | 63 | almost 5 years ago | |
| EMNLP 2018 | |||
| image-paragraph-captioning | 90 | about 6 years ago | |
| 1803.09845 | |||
| NeuralBabyTalk | 525 | over 6 years ago | |
| 1707.07998 | |||
| 1807.03871 | |||
| 1805.08191 | |||
| 1811.10787 | |||
| unsupervised_captioning | 215 | over 2 years ago | |
| 1906.02365 | |||
| CAVP | 46 | over 6 years ago | |
| 1903.05942 | |||
| 1903.12020 | |||
| 1904.01475 | |||
| 1812.02378 | |||
| SGAE | 221 | over 3 years ago | |
| CVPR 2019 | |||
| 1901.02527 | |||
| 1908.06954 | |||
| 2004.03708 | |||
| 2003.00387 | |||
| asg2cap | 200 | almost 3 years ago | |
| 2007.11731 | |||
| Sub-GC | 96 | about 1 year ago | |
| 2009.12313 | |||
| 2102.04990 | |||
Awesome Vision-and-Language: / Image Retrieval | |||
| 1511.07067 | |||
| VisualWord2Vec | 19 | over 6 years ago | |
| 1812.07119 | |||
| tirg | 300 | over 4 years ago | |
| 2105.13868 | |||
| IAIS | 30 | over 2 years ago | |
| 2203.15867 | |||
| ImageCoDe | 39 | over 1 year ago | |
| 2407.15239 | |||
| 2311.17136 | |||
| UniIR | 114 | about 1 year ago | |
| 2407.12346 | |||
| Q-Pert | 1 | about 1 year ago | |
Awesome Vision-and-Language: / Scene Text Recognition | |||
| 1908.09231 | |||
| 1904.01906 | |||
| clovaai | 3,769 | over 1 year ago | |
Awesome Vision-and-Language: / Scene Graph | |||
| 7298990 | |||
| 1602.07332 | |||
| visual_genome_python_driver | 357 | about 2 years ago | |
| visualgenome | |||
| 1701.02426 | |||
| scene-graph-TF-release | 425 | over 6 years ago | |
| 1707.09700 | |||
| MSDN | 227 | almost 6 years ago | |
| 1711.06640 | |||
| neural-motifs | 526 | over 6 years ago | |
| 1802.02598 | |||
| 1811.06410 | |||
| 1804.01622 | |||
| sg2im | 1,302 | over 1 year ago | |
| 1808.00191 | |||
| graph-rcnn.pytorch | 732 | over 5 years ago | |
| 1904.00560 | |||
| 1909.05379 | |||
| scene_generation | 188 | about 2 years ago | |
| 1811.10696 | |||
| sceneGraph_Mem | 4 | over 6 years ago | |
| 1903.02728 | |||
| ContrastiveLosses4VRD | 199 | over 5 years ago | |
| 1903.03326 | |||
| KERN | 121 | about 3 years ago | |
| 1812.01880 | |||
| VCTree | 121 | over 1 year ago | |
| 1812.02347 | |||
| 1904.11622 | |||
| limited-label | 54 | about 2 years ago | |
| 2002.11949 | |||
| Scene-Graph-Benchmark | 1,085 | about 1 year ago | |
| 2003.12962 | |||
| GPS-Net | 64 | over 5 years ago | |
| 2006.09623 | |||
| 2007.08760 | |||
| het-eccv20 | 16 | over 5 years ago | |
Awesome Vision-and-Language: / text2image | |||
| 1605.05396 | |||
| icml2016 | 912 | about 7 years ago | |
| 1612.03242 | |||
| StackGAN | 1,863 | over 5 years ago | |
| 1711.10485 | |||
| AttnGAN | 1,343 | over 1 year ago | |
| 1802.09178 | |||
| HDGan | 150 | about 7 years ago | |
| 1812.02784 | |||
| StoryGAN | 233 | over 3 years ago | |
| 1903.05854 | |||
| 1904.01310 | |||
| 1904.01480 | |||
| 1811.09845 | |||
| GeNeVA | 37 | over 2 years ago | |
| 1909.05379 | |||
| scene_generation | 188 | about 2 years ago | |
Awesome Vision-and-Language: / Video Captioning | |||
| 1411.4389 | |||
| 1510.07712 | |||
| 1701.03126 | |||
| 1611.08002 | |||
| CVPR_2017 | |||
| 1804.00100 | |||
| 1812.05634 | |||
| adv-inf | 34 | over 6 years ago | |
| 1904.03870 | |||
| DenseVideoCaptioning | 149 | over 6 years ago | |
| 1906.04375 | |||
| 2011.07735 | |||
| iPerceive | |||
Awesome Vision-and-Language: / Video Question Answering | |||
| 1512.02902 | |||
| MovieQA | 80 | almost 9 years ago | |
| 1809.01696 | |||
| TVQA | 172 | about 3 years ago | |
| 2007.08751 | |||
| ROLL-VideoQA | 19 | about 5 years ago | |
| 2011.07735 | |||
| iPerceive | |||
Awesome Vision-and-Language: / Video Understanding | |||
| 1811.08383 | |||
| temporal-shift-module | 2,078 | over 1 year ago | |
| 1910.11009 | |||
Awesome Vision-and-Language: / Vision and Language Navigation | |||
| 1711.11543 | |||
| embodiedqa | |||
| 1711.07280 | |||
| bringmeaspoon | |||
| fda_pdf | |||
| fda_code | 13 | almost 2 years ago | |
| mam_paper | |||
Awesome Vision-and-Language: / Vision-and-Language Pretraining | |||
| 1908.07490 | |||
| lxmert | 938 | about 3 years ago | |
| 1904.01766 | |||
| vilbert | 473 | almost 3 years ago | |
| 1907.07804 | |||
| OmniNet | 515 | about 5 years ago | |
| 1908.06066 | |||
| Unicoder | 89 | almost 2 years ago | |
| 1909.11059 | |||
| VLP | 416 | almost 4 years ago | |
| 1911.11237 | |||
| Oscar | 1,039 | about 2 years ago | |
| 2006.09882 | |||
| swav | 2,014 | over 2 years ago | |
| 2004.06165 | |||
| Oscar | 1,039 | about 2 years ago | |
| 2006.16934 | |||
| ERNIE | 6,331 | about 1 year ago | |
| 2101.00529 | |||
| VinVL | 350 | over 2 years ago | |
| 2006.06666 | |||
| virtex | 556 | almost 2 years ago | |
| 2103.00020 | |||
| 2103.05247 | |||
| universal-computation | 245 | almost 4 years ago | |
| 2102.05918 | |||
| 2103.01988 | |||
| 2102.10772 | |||
| 2102.12092 | |||
| 2103.06561 | |||
| 2305.08675 | |||
Awesome Vision-and-Language: / Visual Dialog | |||
| 1611.08669 | |||
| visdial | 228 | almost 7 years ago | |
| visualdialog | |||
| 1803.11186 | |||
| 2303.05983 | |||
| ATVC | 7 | over 2 years ago | |
Awesome Vision-and-Language: / Visual Grounding | |||
| 1611.09978 | |||
| cmn | 67 | about 7 years ago | |
| 1908.07553 | |||
| 1812.03299 | |||
| 1908.06354 | |||
| 1908.07129 | |||
| zsgnet | 69 | over 5 years ago | |
| 2203.16518 | |||
| CoFormer | 45 | over 2 years ago | |
Awesome Vision-and-Language: / Visual Question Answering | |||
| 1505.00468 | |||
| visualqa | |||
| 1606.00061 | |||
| HieCoAttenVQA | 349 | about 7 years ago | |
| 1606.01847 | |||
| vqa-mcb | 222 | about 9 years ago | |
| 1511.02274 | |||
| imageqa-san | 108 | almost 9 years ago | |
| 1511.05234 | |||
| AAAA | 25 | about 5 years ago | |
| 1603.01417 | |||
| dmn-plus | 64 | over 7 years ago | |
| 1606.01455 | |||
| nips-mrn-vqa | 39 | almost 9 years ago | |
| 1609.05600 | |||
| 1612.00837 | |||
| 1704.05526 | |||
| 1803.08896 | |||
| PSLQA | 56 | almost 7 years ago | |
| 1707.07998 | |||
| 1708.02711 | |||
| vqa-winner | 164 | almost 7 years ago | |
| 1810.02358 | |||
| VQA-Transfer-ExternalData | 20 | over 6 years ago | |
| 1902.09506 | |||
| visualreasoning | |||
| 1904.08920 | |||
| ICCV2019 | |||
| 1907.12133 | |||
| scene-graphs-vqa | |||
| 2204.11167 | |||
| RelViT | 64 | about 3 years ago | |
| 2208.01813 | |||
| TAG | 21 | almost 3 years ago | |
Awesome Vision-and-Language: / Visual Reasoning | |||
| 1612.06890 | |||
| 1705.03633 | |||
| 1902.09506 | |||
| visualreasoning | |||
| 1812.01855 | |||
| 1811.10830 | |||
| r2c | 466 | over 4 years ago | |
| VCR | |||
| 1909.08164 | |||
| 1909.02701 | |||
| VSRN | 294 | almost 6 years ago | |
| 2010.00763 | |||
| Bongard-LOGO | 51 | over 3 years ago | |
| 2205.13803 | |||
| Bongard-HOI | 64 | about 3 years ago | |
| 2204.11167 | |||
| RelViT | 64 | about 3 years ago | |
| 2307.15199 | |||
| PromptStyler | |||
Awesome Vision-and-Language: / Visual Relationship Detection | |||
| 1608.00187 | |||
| Visual-Relationship-Detection | 214 | almost 5 years ago | |
| 1702.07191 | |||
| 1702.08319 | |||
| drnet | 202 | about 4 years ago | |
| 1703.03054 | |||
| DeepVariationRL | 63 | almost 7 years ago | |
| 1704.03114 | |||
| drnet | 202 | about 4 years ago | |
| 1611.06641 | |||
| pl-clc | 39 | over 8 years ago | |
| 1707.09423 | |||
| 1803.10362 | |||
| ReferringRelationships | 260 | almost 3 years ago | |
| 1807.04979 | |||
| ZoomNet | |||
| 1808.00171 | |||
| vrd | 94 | about 7 years ago | |
| 1910.12324 | |||
Awesome Vision-and-Language: / Visual Storytelling | |||
| 1604.03968 | |||
| visual_genome_python_driver | 357 | about 2 years ago | |
| VIST | |||
| 1804.09160 | |||
| AREL | 136 | almost 5 years ago | |
| 2002.00774 | |||
| AAAI 2020 | |||
More related projects:
- jcjohnson/neural-style
- dmitryulyanov/fast-neural-doodle
- fmassa/object-detection.torch
- princeton-vl/pose-hg-demo
- eladhoffer/tripletnet
- manuelruder/artistic-videos
- xunhuang1995/adain-style
- cvondrick/torch-starter
- jcjohnson/torch-rnn
- sshuair/torchsat
- hszhao/semseg
- yuval-alaluf/hyperstyle
- csailvision/places365
- wkentaro/pytorch-fcn