Top pick
Contrastive vision–language pretraining reference implementation: map images and text to a shared embedding space.
Tags: multimodal, vision, nlp, embeddings, taaft-repositories
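The one-line summary above compresses the whole training recipe: an image encoder and a text encoder are optimized jointly so that matching image-text pairs land close together in a single embedding space. Below is a minimal PyTorch sketch of that symmetric contrastive (InfoNCE) objective; the function name, embedding shapes, and temperature value are illustrative assumptions, not details taken from the repository.

```python
# Minimal sketch of a CLIP-style contrastive objective. Hypothetical
# shapes and stand-in encoder outputs; not the reference implementation.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim) outputs of the two encoders.
    Matching pairs sit on the diagonal of the similarity matrix.
    """
    # Project both modalities onto the unit sphere so the dot product
    # is cosine similarity in the shared embedding space.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0))

    # Cross-entropy in both directions: image->text and text->image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random tensors standing in for real encoder outputs.
if __name__ == "__main__":
    img = torch.randn(8, 512)   # stand-in for an image encoder output
    txt = torch.randn(8, 512)   # stand-in for a text encoder output
    print(contrastive_loss(img, txt).item())
```

In practice the temperature is often a learned parameter rather than a constant; it is fixed here only to keep the sketch small.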
Filter by platform, license, maturity, maintenance cadence, and editorial tags such as privacy-focused or self-hosted. Search matches names, summaries, tags, and use cases.
2 tools match your filters
The top pick above: a contrastive vision–language pretraining reference implementation that maps images and text to a shared embedding space (see the retrieval sketch after this list).
DeepSeek Janus series: unified multimodal understanding and generation models with MIT-licensed research code.
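For any tool that produces a shared image-text embedding space, retrieval at inference time reduces to cosine-similarity ranking. The sketch below is a hypothetical illustration: the embeddings are random stand-ins, and rank_images is not a function from either listed project.

```python
# Hypothetical retrieval sketch over a shared image-text embedding space.
# Names and shapes are illustrative, not from either listed repository.
import torch
import torch.nn.functional as F

def rank_images(text_query_emb: torch.Tensor,
                image_embs: torch.Tensor,
                top_k: int = 3) -> torch.Tensor:
    """Return indices of the top_k images most similar to the query text."""
    q = F.normalize(text_query_emb, dim=-1)     # (dim,)
    gallery = F.normalize(image_embs, dim=-1)   # (n_images, dim)
    scores = gallery @ q                        # cosine similarity per image
    return scores.topk(top_k).indices

if __name__ == "__main__":
    torch.manual_seed(0)
    query = torch.randn(512)          # stand-in for an encoded caption
    gallery = torch.randn(100, 512)   # stand-in for 100 encoded images
    print(rank_images(query, gallery))
```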