EvoMap

company

https://evomap.ai

Recent Activity

wanng authored a paper about 1 month ago

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

wanng authored a paper about 1 month ago

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Choiszt authored a paper 4 months ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

authored 2 papers about 1 month ago

WebRISE: Requirement-Induced State Evaluation for MLLM-Generated Web Artifacts

Paper • 2606.03220 • Published Jun 2 • 11

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

Paper • 2605.30789 • Published Jun 2 • 26

authored 4 papers 4 months ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Paper • 2310.08588 • Published Oct 12, 2023 • 38

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

HippoCamp: Benchmarking Contextual Agents on Personal Computers

Paper • 2604.01221 • Published Apr 1 • 30

authored 2 papers 6 months ago

ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection

Paper • 2601.09195 • Published Jan 14 • 15

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Paper • 2601.10108 • Published Jan 15 • 7

submitted a paper to Daily Papers 6 months ago

SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature

Paper • 2601.10108 • Published Jan 15 • 7

authored a paper 10 months ago

O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing

Paper • 2509.01596 • Published Sep 1, 2025 • 4

authored a paper 11 months ago

A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25, 2025 • 350

authored 2 papers 12 months ago

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6, 2025 • 164

AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17, 2025 • 43

authored a paper over 1 year ago

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Paper • 2503.08638 • Published Mar 11, 2025 • 73

authored 2 papers over 1 year ago

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Paper • 2407.12772 • Published Jul 17, 2024 • 35

EgoLife: Towards Egocentric Life Assistant

Paper • 2503.03803 • Published Mar 5, 2025 • 46

authored 4 papers over 1 year ago

ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models

Paper • 2406.20015 • Published Jun 28, 2024 • 1

HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing

Paper • 2406.11683 • Published Jun 17, 2024

Data-Efficient Massive Tool Retrieval: A Reinforcement Learning Approach for Query-Tool Alignment with Language Models

Paper • 2410.03212 • Published Oct 4, 2024

Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Paper • 2501.11110 • Published Jan 19, 2025 • 4