raon-vision-encoder
Copyright 2024-2026 Raon Vision Team

This product includes software derived from the following projects:

===============================================================================
OpenCLIP
https://github.com/mlfoundations/open_clip
Licensed under the MIT License (see LICENSES/MIT-OpenCLIP.txt)
Copyright (c) 2012-2021 Gabriel Ilharco, Mitchell Wortsman, Nicholas Carlini,
Rohan Taori, Achal Dave, Vaishaal Shankar, John Miller, Hongseok Namkoong,
Hannaneh Hajishirzi, Ali Farhadi, Ludwig Schmidt

Used in: model/ and train/ packages (LocCa, CLIP, loss, factory, transformer,
data pipeline, training loop, etc.)

===============================================================================
OpenAI CLIP
https://github.com/openai/CLIP
Licensed under the MIT License (see LICENSES/MIT-OpenAI-CLIP.txt)
Copyright (c) 2021 OpenAI

Used in: model/tokenizer.py, model/bpe_simple_vocab_16e6.txt.gz

===============================================================================
Meta Platforms, Inc. (MAE / MoCo v3)
Licensed under the MIT License via OpenCLIP
Copyright (c) Meta Platforms, Inc. and affiliates

Used in: model/pos_embed.py (sincos position embedding utilities)

===============================================================================
timm (pytorch-image-models)
https://github.com/huggingface/pytorch-image-models
Licensed under the Apache License 2.0
Copyright (c) Ross Wightman

Used in: model/transform.py (ResizeKeepRatio)

===============================================================================
References

The following papers informed the design and implementation of features in
this software. Code was independently implemented unless noted above.
- CoCa: Yu et al., "CoCa: Contrastive Captioners are Image-Text Foundation Models", 2022
- SigLIP: Zhai et al., "Sigmoid Loss for Language Image Pre-Training", 2023
- SigLIP 2: Tschannen et al., "SigLIP 2: Multilingual Vision-Language Encoders", 2025
- DINO: Caron et al., "Emerging Properties in Self-Supervised Vision Transformers", 2021
- DINOv2: Oquab et al., "DINOv2: Learning Robust Visual Features without Supervision", 2024
- SILC: Naeem et al., "SILC: Improving Vision Language Pretraining with Self-Distillation", 2023
- TIPS: Huang et al., "TIPS: Text-Image Pretraining with Spatial Awareness", 2024
- KoLeo: Sablayrolles et al., "Spreading vectors for similarity search", ICLR 2019
- Gram Anchoring: Simeoni et al., "DINOv3", 2025 (independently implemented)
- NaFlex: from SigLIP 2 / PaLI (independently implemented in PyTorch)