RL LLM AGENT

https://www.sanjibanchoudhury.com/

models 13

rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-leap-iter1

Text Generation • 3B • Updated Feb 12, 2025 • 11

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1

Updated Jan 20, 2025 • 8

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50

Updated Jan 16, 2025 • 3

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k

Updated Jan 16, 2025 • 11

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0

Updated Jan 14, 2025 • 5

rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft

Updated Jan 13, 2025 • 5

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0

Updated Jan 13, 2025 • 4

rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0

Updated Jan 13, 2025 • 4

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2

Updated Jan 11, 2025 • 11

rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1

Text Generation • 3B • Updated Jan 10, 2025 • 6

datasets 0

None public yet