Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching Paper • 2308.09346 • Published Aug 18, 2023
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition Paper • 2401.11649 • Published Jan 22, 2024 • 3
Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking Paper • 2308.12549 • Published Aug 24, 2023
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation Paper • 2403.19235 • Published Mar 28, 2024 • 1
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing Paper • 2410.18756 • Published Oct 24, 2024
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves Paper • 2505.02831 • Published May 5, 2025 • 2
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling Paper • 2507.17801 • Published Jul 23, 2025 • 1
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP Paper • 2507.14904 • Published Jul 20, 2025
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation Paper • 2510.06139 • Published Oct 7, 2025 • 3
Distribution Matching Distillation Meets Reinforcement Learning Paper • 2511.13649 • Published Nov 17, 2025 • 6
SRA 2: Variational Autoencoder Self-Representation Alignment for Efficient Diffusion Training Paper • 2601.17830 • Published Jan 25, 2026 • 1