Models (13)
rl-llm-agent/Llama-3.2-3B-Instruct-sft-alfworld-leap-iter1
Text Generation • 3B
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter1
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-exploration-aflworld-iter0-checkpoint-50
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iter2-70k
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-shaped-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-value-alfworld-8b-sft
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iqlearn-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-reward-alfworld-iqlearn-iter0
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter2
rl-llm-agent/Llama-3.2-3B-Instruct-online-dpo-alfworld-iter1
Text Generation • 3B • 1