| --- |
| title: HR Onboarding & Offboarding Environment |
| emoji: π’ |
| colorFrom: green |
| colorTo: blue |
| sdk: docker |
| pinned: false |
| app_port: 7860 |
| base_path: /playground |
| tags: |
| - openenv |
| --- |
| |
| # HR Onboarding & Offboarding Environment |
|
|
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ravi03071991/rl_hack/blob/master/train_hr_agent.ipynb) |
|
|
| An OpenEnv-compatible RL environment that simulates enterprise HR onboarding and offboarding workflows. The agent coordinates work across **6 enterprise apps** (Workday, ServiceNow, Okta, Email, Slack, and Calendar), using 25 tools to complete multi-step tasks in a realistic HR system (200+ employees, 8 departments, RBAC, approval chains). |
|
|
| Built for the [OpenEnv Hackathon SF](https://cerebralvalley.ai/e/openenv-hackathon-sf/details), **Statement 3.1: Professional Tasks** (Scaler AI Labs partner theme: Multi-App RL Environment for Enterprise Workflows). |
|
|
| ### Key Results |
|
|
| > **GRPO training on Llama 3.2-1B-Instruct improves mean task score by 67% (0.37 → 0.62).** |
| > Complex multi-step task scores **more than double** (0.26 → 0.68). Gains generalize to held-out test tasks. |
|
|
| | Metric | Baseline | Trained | Improvement | |
| |---|---------|---------|-------------| |
| | Mean Score | 0.370 | 0.617 | **+67%** | |
| | Complex Tasks | 0.26 | 0.68 | **+162%** | |
| | Pass Rate (train) | 15.4% | 19.2% | +3.8 pp | |
|
|
| ## Quick Start |
|
|
| ```python |
| from rl_hack import HROnboardingAction, HROnboardingEnv |
| |
| # Connect to the environment |
| with HROnboardingEnv(base_url="http://localhost:7860") as env: |
|     result = env.reset() |
|     print(result.observation)  # Task instruction + available tools |
| |
|     # Agent calls tools to complete the task |
|     result = env.step(HROnboardingAction( |
|         tool_name="hr_create_employee", |
|         arguments={"name": "Priya Sharma", "department": "Engineering", "level": "L2", "role": "Software Engineer"}, |
|     )) |
|     print(result.observation)  # Tool result |
|     print(result.reward)       # Rubric-based reward |
| ``` |
|
|
| ## Tools / Actions (25 MCP Tools) |
|
|
| The agent interacts with the environment by calling these tools. Each tool modifies the world state and returns a result. |
|
|
| ### HR System (5 tools) |
|
|
| | # | Tool | Description | Key Parameters | |
| |---|------|-------------|----------------| |
| | 1 | `hr_create_employee` | Create a new employee record | `name`, `department`, `level`, `role`, `manager_id`, `is_contractor` | |
| | 2 | `hr_read_employee` | Look up employee by ID or email | `emp_id` or `email` | |
| | 3 | `hr_update_employee` | Update employee fields (status, department, etc.) | `emp_id`, `updates` (dict) | |
| | 4 | `hr_search_employees` | Search/filter employees by criteria | `department`, `level`, `status`, `location`, `role` | |
| | 5 | `hr_get_org_chart` | Get reporting hierarchy for a department | `department` | |
|
|
| ### Onboarding / Offboarding (6 tools) |
|
|
| | # | Tool | Description | Key Parameters | |
| |---|------|-------------|----------------| |
| | 6 | `onboarding_create_request` | Initiate onboarding for a new hire | `employee_id` | |
| | 7 | `onboarding_get_status` | Check onboarding progress | `request_id` or `employee_id` | |
| | 8 | `onboarding_complete_step` | Mark an onboarding step as done | `request_id`, `step` | |
| | 9 | `offboarding_create_request` | Initiate offboarding for departing employee | `employee_id`, `reason`, `exit_date` | |
| | 10 | `offboarding_get_status` | Check offboarding progress | `request_id` or `employee_id` | |
| | 11 | `offboarding_complete_step` | Mark an offboarding step as done | `request_id`, `step` | |
|
|
| ### IT Provisioning (5 tools) |
|
|
| | # | Tool | Description | Key Parameters | |
| |---|------|-------------|----------------| |
| | 12 | `it_assign_asset` | Assign laptop/monitor/phone to employee | `asset_id`, `employee_id` | |
| | 13 | `it_get_available_assets` | List unassigned assets by type | `asset_type` (laptop, monitor, phone, headset) | |
| | 14 | `it_create_account` | Create email/Slack/VPN/GitHub accounts | `employee_id`, `account_types` | |
| | 15 | `it_revoke_access` | Revoke all IT access (for offboarding) | `employee_id` | |
| | 16 | `it_get_software_licenses` | Check license seat availability | `software_name` | |
|
|
| ### Access Control (4 tools) |
|
|
| | # | Tool | Description | Key Parameters | |
| |---|------|-------------|----------------| |
| | 17 | `access_assign_role` | Assign RBAC role (checks level/dept restrictions) | `employee_id`, `role_id` | |
| | 18 | `access_create_badge` | Create physical access badge | `employee_id`, `access_zones` | |
| | 19 | `access_revoke_role` | Revoke a specific access role | `employee_id`, `role_id` | |
| | 20 | `access_get_security_groups` | List all security groups and resources | _(none)_ | |
|
|
| ### Communication (3 tools) |
|
|
| | # | Tool | Description | Key Parameters | |
| |---|------|-------------|----------------| |
| | 21 | `email_send` | Send email (welcome, farewell, notifications) | `from_address`, `to_address`, `subject`, `body` | |
| | 22 | `slack_send_message` | Post in Slack channel or DM | `channel`, `sender`, `text` | |
| | 23 | `meeting_schedule` | Schedule orientation, 1-on-1, exit interview | `title`, `attendees`, `datetime`, `meeting_type` | |
|
|
| ### Policy & Approval (2 tools) |
|
|
| | # | Tool | Description | Key Parameters | |
| |---|------|-------------|----------------| |
| | 24 | `policy_lookup` | Look up company policies by topic/department | `topic`, `department`, `policy_id` | |
| | 25 | `approval_request` | Submit approval (manager/IT/security/legal) | `request_id`, `approver_id`, `approval_type` | |
|
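Since every tool takes a name plus a dictionary of arguments and returns a result, the call path can be pictured as a lookup in a registry. The sketch below is illustrative only: `ToolRegistry`, its `register` and `call` methods, and the toy `hr_read_employee` handler are assumptions made for this example and are not taken from `server/tools.py`. Only the tool name and the `emp_id` parameter come from the tables above.

```python
from typing import Any, Callable, Dict

ToolFn = Callable[..., Dict[str, Any]]


class ToolRegistry:
    """Hypothetical registry mapping tool names to handler functions."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolFn] = {}

    def register(self, name: str) -> Callable[[ToolFn], ToolFn]:
        def decorator(fn: ToolFn) -> ToolFn:
            self._tools[name] = fn
            return fn
        return decorator

    def call(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        # Unknown tools and mismatched arguments become error results rather
        # than exceptions, so the agent always receives an observation.
        fn = self._tools.get(tool_name)
        if fn is None:
            return {"success": False, "error": f"unknown tool: {tool_name}"}
        try:
            return fn(**arguments)
        except TypeError as exc:
            return {"success": False, "error": f"bad arguments: {exc}"}


registry = ToolRegistry()

# Toy world state for the example; the record below is made up.
EMPLOYEES = {"emp_0001": {"emp_id": "emp_0001", "name": "Example Person"}}


@registry.register("hr_read_employee")
def hr_read_employee(emp_id: str) -> Dict[str, Any]:
    employee = EMPLOYEES.get(emp_id)
    if employee is None:
        return {"success": False, "error": "employee not found"}
    return {"success": True, "employee": employee}


result = registry.call("hr_read_employee", {"emp_id": "emp_0001"})
```

Returning structured errors instead of raising keeps failed calls visible to the agent, which matters for the edge-case tasks where a failure is the expected outcome.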
|
| ## Tasks (77 tasks across 4 categories) |
|
|
| Each episode presents one task. The agent must call the right tools in the right order. |
|
|
| ### Task Categories |
|
|
| | Category | Count | Example | |
| |----------|-------|---------| |
| | **Lookup** (simple) | 11 | "List all employees in the Engineering department" | |
| | **Onboarding** | 32 | "Fully onboard John Lee as L3 Team Lead in Data Science: create record, assign laptop, provision accounts, set up access, send welcome email, schedule orientation" | |
| | **Offboarding** | 24 | "Offboard departing director: revoke all access, reclaim assets, reassign reports, send farewell, schedule exit interview" | |
| | **Cross-workflow** | 10 | "Employee transferring from Engineering to Product: offboard from old dept, onboard to new" | |
|
|
| ### Difficulty Levels |
|
|
| | Difficulty | Count | Tools per task | Description | |
| |------------|-------|---------------|-------------| |
| | Simple | 19 | 1-2 | Single lookups or status checks | |
| | Medium | 21 | 2-4 | Create + initiate workflows | |
| | Complex | 25 | 5-10 | Full end-to-end workflows with approvals | |
| | Edge case | 12 | 2-5 | Business rule violations, policy constraints | |
|
|
| ### Edge Cases (designed to test policy compliance) |
|
|
| - Department at **headcount limit**: create employee should fail |
| - Software license **seats full** (Netsuite, LinkedIn Sales Navigator) |
| - Manager **on leave**: must find skip-level manager for approvals |
| - **Contractor** onboarding: different rules (no VPN, limited access, legal approval required) |
| - **Termination** vs resignation: different offboarding steps, no farewell email |
| - **Offer rescinded**: offboard someone mid-onboarding |
| - **Level mismatch**: L1 employee can't get L4+ access roles |
| - **Department restriction**: Marketing employee can't get Engineering GitHub role |
|
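A few of the edge cases above reduce to simple guards on world state. The following is a minimal sketch under stated assumptions: the `Department` and `License` dataclasses, the `no_seats_available` error string, and the `"termination"` reason value are all hypothetical names chosen for illustration. The `headcount_limit` string mirrors the `result_contains:headcount_limit` rubric check, but how the environment actually produces it is not shown in this README.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Department:
    name: str
    headcount: int
    headcount_limit: int


@dataclass
class License:
    software_name: str
    total_seats: int
    used_seats: int


def check_can_hire(dept: Department) -> Dict[str, object]:
    """Reject a new hire when the department is already at its limit."""
    if dept.headcount >= dept.headcount_limit:
        return {"success": False, "error": "headcount_limit"}
    return {"success": True}


def check_seat_available(lic: License) -> Dict[str, object]:
    """Reject a license grant when every seat is taken."""
    if lic.used_seats >= lic.total_seats:
        return {"success": False, "error": "no_seats_available"}
    return {"success": True}


def farewell_allowed(reason: str) -> bool:
    """Terminations skip the farewell email; other exit reasons do not."""
    return reason != "termination"


full_dept = Department(name="Engineering", headcount=40, headcount_limit=40)
hire_result = check_can_hire(full_dept)
```

In tasks like these the agent is rewarded for noticing the failure and reacting to it, not for forcing the operation through.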
|
| ## World State (500+ entities) |
|
|
| | Entity | Count | Description | |
| |--------|-------|-------------| |
| | Employees | 200 | Full org hierarchy across 8 departments (L1-L6) | |
| | Departments | 8 | Engineering, Product, Marketing, Sales, Finance, HR, Data Science, Security | |
| | IT Assets | 100 | Laptops (50), monitors (25), phones (15), headsets (10) | |
| | Access Roles | 20 | RBAC roles with level/department restrictions | |
| | Software Licenses | 15 | Jira, GitHub, AWS, Slack, Salesforce, etc. (2 intentionally full) | |
| | Policies | 15 | Onboarding, offboarding, badge access, contractor, termination, etc. | |
| | Security Groups | 15 | engineering_team, vpn_users, server_room_access, etc. | |
| | Message Templates | 12 | Welcome/farewell emails, Slack messages, notifications | |
|
|
| ### RBAC Rules |
|
|
| - **L1** Associate → **L2** Senior → **L3** Team Lead → **L4** Manager → **L5** Director → **L6** VP |
| - L3+ can approve onboarding |
| - L4+ required for security approvals and server room badge access |
| - Contractors require legal approval |
| - Access roles have minimum level requirements and department restrictions |
|
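The rules above amount to comparisons on an ordered level scale plus an optional department filter. Here is a minimal sketch of that logic; `AccessRole`, `can_assign`, the role IDs, and the reason strings are hypothetical and only illustrate the rules as stated, not the checks implemented in `server/world.py`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

LEVELS = ["L1", "L2", "L3", "L4", "L5", "L6"]  # Associate ... VP


def level_rank(level: str) -> int:
    return LEVELS.index(level) + 1


def at_least(level: str, minimum: str) -> bool:
    return level_rank(level) >= level_rank(minimum)


def can_approve_onboarding(level: str) -> bool:
    return at_least(level, "L3")


def can_approve_security(level: str) -> bool:
    return at_least(level, "L4")


@dataclass(frozen=True)
class AccessRole:
    role_id: str
    min_level: str
    # None means the role is open to every department.
    allowed_departments: Optional[FrozenSet[str]] = None


def can_assign(role: AccessRole, level: str, department: str) -> Tuple[bool, str]:
    """Apply the minimum-level rule first, then the department restriction."""
    if not at_least(level, role.min_level):
        return False, "level_too_low"
    if role.allowed_departments is not None and department not in role.allowed_departments:
        return False, "department_restricted"
    return True, "ok"


github_role = AccessRole("role_github_eng", "L1", frozenset({"Engineering"}))
admin_role = AccessRole("role_admin", "L4")

marketing_attempt = can_assign(github_role, "L2", "Marketing")
junior_attempt = can_assign(admin_role, "L1", "Engineering")
```

The two attempts at the end correspond to the department-restriction and level-mismatch edge cases listed earlier.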
|
| ## Reward / Rubric |
|
|
| Each task has a rubric with verifiable criteria. Reward = proportion of criteria satisfied. |
|
|
| ### Rubric Check Types |
|
|
| | Check | Example | What it verifies | |
| |-------|---------|-----------------| |
| | `tool_used` | `tool_used:hr_create_employee` | Tool was called at least once | |
| | `tool_not_used` | `tool_not_used:slack_send_message` | Tool was NOT called (e.g. no farewell for terminations) | |
| | `tool_used_any` | `tool_used_any:email_send,slack_send_message` | At least one of the tools was used | |
| | `param_value` | `param_value:hr_create_employee.name=Priya Sharma` | Tool called with specific parameter value | |
| | `param_contains` | `param_contains:policy_lookup.topic=onboard` | Parameter contains substring | |
| | `tool_order` | `tool_order:hr_create_employee<onboarding_create_request` | Tool A called before Tool B | |
| | `tool_count` | `tool_count:onboarding_complete_step>=3` | Tool called at least N times | |
| | `result_contains` | `result_contains:headcount_limit` | Any tool result contains substring | |
|
|
| ### Example Rubric (medium task) |
|
|
| Task: "Onboard Priya Sharma to Engineering as L2 Software Engineer" |
|
|
| | Criterion | Check | |
| |-----------|-------| |
| | Created employee record | `tool_used:hr_create_employee` | |
| | Correct name | `param_value:hr_create_employee.name=Priya Sharma` | |
| | Correct department | `param_value:hr_create_employee.department=Engineering` | |
| | Correct level | `param_value:hr_create_employee.level=L2` | |
| | Correct role | `param_value:hr_create_employee.role=Software Engineer` | |
| | Initiated onboarding | `tool_used:onboarding_create_request` | |
| | Correct sequencing | `tool_order:hr_create_employee<onboarding_create_request` | |
|
|
| **Score**: 7/7 = 1.0 (pass) or partial (e.g. 5/7 = 0.71) |
|
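To show how the check strings could map to a score, the sketch below evaluates a rubric against a recorded trace of tool calls. It is an illustration under assumptions rather than the code in `server/rubrics.py`: the trace shape (`tool`, `arguments`, `result` keys), case-sensitive matching, and reading `tool_order` as "first call to A precedes first call to B" are all choices made for this example.

```python
import json
from typing import Any, Dict, List

Call = Dict[str, Any]  # assumed shape: {"tool": str, "arguments": dict, "result": Any}


def check(criterion: str, trace: List[Call]) -> bool:
    kind, _, spec = criterion.partition(":")
    names = [c["tool"] for c in trace]
    if kind == "tool_used":
        return spec in names
    if kind == "tool_not_used":
        return spec not in names
    if kind == "tool_used_any":
        return any(t in names for t in spec.split(","))
    if kind in ("param_value", "param_contains"):
        target, _, expected = spec.partition("=")
        tool, _, param = target.partition(".")
        values = [str(c["arguments"].get(param, "")) for c in trace if c["tool"] == tool]
        if kind == "param_value":
            return expected in values
        return any(expected in v for v in values)
    if kind == "tool_order":
        first, _, second = spec.partition("<")
        return (first in names and second in names
                and names.index(first) < names.index(second))
    if kind == "tool_count":
        tool, _, minimum = spec.partition(">=")
        return names.count(tool) >= int(minimum)
    if kind == "result_contains":
        return any(spec in json.dumps(c.get("result")) for c in trace)
    raise ValueError(f"unknown check type: {kind}")


def score(rubric: List[str], trace: List[Call]) -> float:
    """Reward = fraction of rubric criteria satisfied."""
    return sum(check(c, trace) for c in rubric) / len(rubric)


RUBRIC = [
    "tool_used:hr_create_employee",
    "param_value:hr_create_employee.name=Priya Sharma",
    "param_value:hr_create_employee.department=Engineering",
    "param_value:hr_create_employee.level=L2",
    "param_value:hr_create_employee.role=Software Engineer",
    "tool_used:onboarding_create_request",
    "tool_order:hr_create_employee<onboarding_create_request",
]

trace = [
    {"tool": "hr_create_employee",
     "arguments": {"name": "Priya Sharma", "department": "Engineering",
                   "level": "L3", "role": "Engineer"},
     "result": {"success": True}},
    {"tool": "onboarding_create_request",
     "arguments": {"employee_id": "emp_0201"},
     "result": {"success": True}},
]

partial = score(RUBRIC, trace)  # wrong level and role: 5 of 7 criteria pass
```

With the wrong level and role, five of the seven criteria hold, which matches the partial-credit case of 5/7 mentioned above.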
|
| ## Environment API |
|
|
| ### OpenEnv Interface (MCPEnvironment) |
|
|
| ``` |
| reset() → Observation # Pick task, reset world state, return instruction |
| step() → Observation # Agent calls a tool, get result + reward |
| state → State # Current step count, episode ID |
| ``` |
|
|
| ### Episode Flow |
|
|
| ``` |
| 1. env.reset() |
|    → Task: "Fully onboard John Lee as L3 Team Lead..." |
| |
| 2. Agent calls: hr_create_employee(name="John Lee", department="Data Science", level="L3", ...) |
|    → env.step() → {"success": true, "emp_id": "emp_0201"} |
| |
| 3. Agent calls: onboarding_create_request(employee_id="emp_0201") |
|    → env.step() → {"success": true, "request_id": "onb_0001", "steps": {...}} |
| |
| 4. Agent calls: it_get_available_assets(asset_type="laptop") |
|    → env.step() → {"success": true, "assets": [...]} |
| |
| 5. Agent calls: it_assign_asset(asset_id="asset_003", employee_id="emp_0201") |
|    → env.step() → {"success": true} |
| |
| ... more tool calls ... |
| |
| N. Episode ends (max 15 steps or agent signals done) |
|    → Reward: 8/10 criteria satisfied = 0.8 |
| ``` |
|
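The flow above can be sketched as a bounded loop around `reset()` and `step()`. To keep the example runnable without a server, it uses a stand-in environment: `StubEnv`, `run_episode`, and `scripted_policy` are hypothetical names, and the stub simply echoes each call instead of mutating any world state. Only the 15-step cap and the tool names come from this README.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

MAX_STEPS = 15  # step cap from the episode flow above

Action = Tuple[str, Dict[str, Any]]
History = List[Dict[str, Any]]
Policy = Callable[[str, History], Optional[Action]]


class StubEnv:
    """Hypothetical stand-in for the real environment; it only echoes calls."""

    def reset(self) -> str:
        self.calls: List[Action] = []
        return "Fully onboard John Lee as L3 Team Lead in Data Science"

    def step(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((tool_name, arguments))
        return {"success": True, "tool": tool_name}


def run_episode(env: StubEnv, policy: Policy) -> History:
    instruction = env.reset()
    history: History = []
    for _ in range(MAX_STEPS):
        action = policy(instruction, history)
        if action is None:  # the agent signals that it is done
            break
        tool_name, arguments = action
        result = env.step(tool_name, arguments)
        history.append({"tool": tool_name, "arguments": arguments, "result": result})
    return history


def scripted_policy(instruction: str, history: History) -> Optional[Action]:
    """Replay a fixed plan, one call per step, then stop."""
    plan: List[Action] = [
        ("hr_create_employee",
         {"name": "John Lee", "department": "Data Science", "level": "L3"}),
        ("it_get_available_assets", {"asset_type": "laptop"}),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


history = run_episode(StubEnv(), scripted_policy)
```

Swapping `scripted_policy` for a function that queries a model gives the same loop the evaluation scripts describe, with the history serving as the trace a rubric would score.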
|
| ## Project Structure |
|
|
| ``` |
| rl_hack/ |
| ├── README.md                          # This file |
| ├── openenv.yaml                       # OpenEnv manifest |
| ├── pyproject.toml                     # Project metadata |
| ├── __init__.py                        # Module exports |
| ├── client.py                          # HROnboardingEnv client |
| ├── models.py                          # Action/Observation Pydantic models |
| ├── test_with_llm.py                   # Test single task with GPT agent |
| ├── test_all_tasks.py                  # Evaluate all 77 tasks |
| ├── train_hr_agent.ipynb               # GRPO training notebook (Unsloth) |
| ├── .env                               # API keys (gitignored) |
| ├── outputs/                           # Evaluation results |
| └── server/ |
|     ├── __init__.py |
|     ├── app.py                         # FastAPI application |
|     ├── hr_onboarding_environment.py   # Core environment (Environment subclass) |
|     ├── world.py                       # World state (entities, RBAC, mutations) |
|     ├── tools.py                       # Tool registry (25 tools) |
|     ├── tasks.py                       # Task definitions + generation (77 tasks) |
|     ├── rubrics.py                     # Rubric evaluator (reward computation) |
|     ├── data/ |
|     │   ├── employees.json             # 200 employee records |
|     │   ├── departments.json           # 8 departments with policies |
|     │   ├── policies.json              # 15 business rule documents |
|     │   ├── it_assets.json             # 100 IT assets |
|     │   ├── access_roles.json          # 20 RBAC roles |
|     │   └── templates.json             # 12 message templates |
|     ├── Dockerfile                     # Container image |
|     └── requirements.txt               # Server dependencies |
| ``` |
|
|
| ## Testing with an LLM Agent |
|
|
| You can test the environment locally using GPT (or any OpenAI-compatible model) as the agent. |
|
|
| ### Setup |
|
|
| 1. Create a `.env` file in the repo root: |
| ``` |
| OPENAI_API_KEY="sk-proj-..." |
| ``` |
|
|
| 2. Install dependencies: |
| ```bash |
| uv pip install -e ".[eval]" |
| ``` |
|
|
| ### Run |
|
|
| ```bash |
| cd rl_hack |
| |
| # Test on default task (simple lookup) |
| uv run python -m test_with_llm |
| |
| # Test a specific task by index (0-76) |
| uv run python -m test_with_llm 14 # medium onboarding task |
| uv run python -m test_with_llm 24 # complex full onboarding |
| uv run python -m test_with_llm 55 # edge case (headcount limit) |
| |
| # Run full evaluation across all 77 tasks |
| uv run python test_all_tasks.py |
| ``` |
|
|
| The script will: |
| - Reset the environment and pick a task |
| - Use GPT-4o-mini to generate tool calls |
| - Execute each tool call against the environment |
| - Print the rubric evaluation with pass/fail per criterion |
|
|
| ### Example Output |
|
|
| ``` |
| Task ID: task_0015 |
| Difficulty: medium |
| Instruction: Onboard new hire Priya Sharma to Engineering as L2 Software Engineer... |
| |
| --- Step 1/15 --- |
| LLM: {"tool": "hr_create_employee", "params": {"name": "Priya Sharma", ...}} |
| Tool: hr_create_employee |
| Result: {"success": true, "employee": {"emp_id": "emp_0201", ...}} |
| |
| --- Step 2/15 --- |
| LLM: {"tool": "onboarding_create_request", "params": {"employee_id": "emp_0201"}} |
| Tool: onboarding_create_request |
| Result: {"success": true, ...} |
| |
| FINAL EVALUATION |
| Score: 100% (7/7 criteria) |
| Passed: True |
| [PASS] created_employee |
| [PASS] correct_name |
| [PASS] correct_dept |
| [PASS] initiated_onboarding |
| [PASS] sequencing |
| ``` |
|
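The `LLM:` lines in the output above show the model emitting a JSON object with `tool` and `params` keys. A tolerant parser for that shape might look like the sketch below. `parse_tool_call` is a hypothetical helper written for this example, and the choice to tolerate surrounding prose by slicing from the first `{` to the last `}` is an assumption, not a description of `test_with_llm.py`.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_tool_call(text: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Return (tool_name, arguments), or None if the text is not a valid call."""
    # Models often wrap JSON in prose, so slice out the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    tool = payload.get("tool")
    params = payload.get("params", {})
    if not isinstance(tool, str) or not isinstance(params, dict):
        return None
    return tool, params


call = parse_tool_call(
    'Next step: {"tool": "onboarding_create_request", '
    '"params": {"employee_id": "emp_0201"}}'
)
```

Returning `None` for malformed output gives the caller a single place to decide whether to retry, skip the step, or penalize the response.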
|
| ### Task Index Reference |
|
|
| | Index | Difficulty | Category | Description | |
| |-------|-----------|----------|-------------| |
| | 0-13 | Simple | Lookup/Onboarding | Single lookups, status checks | |
| | 14-23 | Medium | Onboarding | Create employee + initiate workflow | |
| | 24-34 | Complex | Onboarding | Full end-to-end with IT, access, comms | |
| | 35-46 | Medium | Offboarding | Initiate offboarding + revoke access | |
| | 47-54 | Complex | Offboarding | Full offboarding with asset reclaim | |
| | 55-66 | Edge case | Various | Headcount limits, license caps, RBAC | |
| | 67-76 | Complex | Cross-workflow | Transfers, rehires, manager departures | |
|
|
| ## Installation |
|
|
| ```bash |
| # Clone the repo |
| git clone https://github.com/ravi03071991/rl_hack.git |
| cd rl_hack |
| |
| # Install core dependencies |
| uv pip install -e . |
| |
| # Install with evaluation support (adds openai) |
| uv pip install -e ".[eval]" |
| |
| # Install with training support (adds unsloth, trl, torch, etc.) |
| uv pip install -e ".[train]" |
| |
| # Install everything |
| uv pip install -e ".[eval,train,dev]" |
| ``` |
|
|
| ## Building & Running |
|
|
| ```bash |
| # Run locally (as OpenEnv HTTP server with playground UI) |
| uvicorn server.app:app --reload --host 0.0.0.0 --port 7860 |
| |
| # Build Docker image |
| docker build -t hr-onboarding-env:latest -f server/Dockerfile . |
| |
| # Deploy to HF Spaces |
| openenv push |
| ``` |
|
|
| ## Training & Results |
|
|
| We use Unsloth + GRPO to train an LLM agent on this environment. See [`train_hr_agent.ipynb`](train_hr_agent.ipynb) for the full training notebook and the [W&B run](https://wandb.ai/ravi03071991/hr-agent-training/runs/bgent3o3?nw=nwuserravi03071991) for training metrics. |
|
|
| ### Setup |
|
|
| - **Model**: Llama 3.2-1B-Instruct (4-bit quantized, LoRA rank 8) |
| - **Algorithm**: GRPO (Group Relative Policy Optimization) |
| - **Reward functions**: Valid JSON + rubric score + efficiency |
| - **Training**: 300 steps, 6 generations per prompt, lr=5e-5 with cosine schedule |
| - **Data split**: 70/30 stratified train/test (52 train, 25 test tasks) |
|
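As an illustration of the data split, the sketch below stratifies by difficulty and floors each group's train count. With the difficulty counts listed earlier (19, 21, 25, 12), flooring 70% per group happens to yield 52 train and 25 test tasks. Whether the notebook splits exactly this way is an assumption; `stratified_split`, the seed, and the synthetic task IDs are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Task = Dict[str, str]


def stratified_split(
    tasks: List[Task], train_pct: int = 70, seed: int = 0
) -> Tuple[List[Task], List[Task]]:
    """Split tasks so each difficulty contributes about train_pct% to train."""
    by_difficulty: Dict[str, List[Task]] = defaultdict(list)
    for task in tasks:
        by_difficulty[task["difficulty"]].append(task)

    rng = random.Random(seed)
    train: List[Task] = []
    test: List[Task] = []
    for difficulty in sorted(by_difficulty):
        group = list(by_difficulty[difficulty])
        rng.shuffle(group)
        n_train = len(group) * train_pct // 100  # floor, in integer arithmetic
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test


# Synthetic task list using the difficulty counts from this README.
COUNTS = {"simple": 19, "medium": 21, "complex": 25, "edge_case": 12}
tasks = [
    {"task_id": f"{difficulty}_{i}", "difficulty": difficulty}
    for difficulty, n in COUNTS.items()
    for i in range(n)
]

train, test = stratified_split(tasks)
```

Stratifying keeps the small edge-case group represented on both sides of the split instead of leaving it to chance.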
|
| ### Results |
|
|
| GRPO training raises the mean task score from 0.370 to 0.617, while pass-rate gains are smaller: |
|
|
| | Metric | Base Model | Trained | Change | |
| |--------|-----------|---------|--------| |
| | **Train pass rate** | 15.4% | 19.2% | +3.8 pp | |
| | **Train mean score** | 0.370 | 0.617 | **+0.247 (+67%)** | |
| | **Test pass rate** | 12.0% | 16.0% | +4.0 pp | |
| | **Test mean score** | 0.370 | 0.617 | **+0.247 (+67%)** | |
|
|
| #### Improvement by difficulty |
|
|
| | Difficulty | Baseline | Trained | Change | |
| |------------|----------|---------|--------| |
| | Simple | 0.23 | 0.50 | +0.27 | |
| | Medium | 0.72 | 0.86 | +0.14 | |
| | **Complex** | **0.26** | **0.68** | **+0.42** | |
| | Edge case | 0.22 | 0.25 | +0.03 | |
|
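The tables above mix two kinds of change: relative improvement for scores and percentage-point differences for pass rates. The short sketch below recomputes both from the reported figures; the helper names are illustrative only.

```python
def relative_change(before: float, after: float) -> float:
    """Relative improvement, e.g. 0.5 means +50%."""
    return (after - before) / before


def pp_change(before_pct: float, after_pct: float) -> float:
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


mean_gain = relative_change(0.370, 0.617)    # about 0.668, reported as +67%
complex_gain = relative_change(0.26, 0.68)   # about 1.615, reported as +162%
train_pass_gain = pp_change(15.4, 19.2)      # about 3.8 percentage points
test_pass_gain = pp_change(12.0, 16.0)       # 4.0 percentage points
```

Keeping the two units apart avoids reading a 3.8-point pass-rate gain as a 3.8% relative change.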
|
| The biggest gains are on **complex multi-step tasks**, where scores more than doubled. The improvement also **appears on held-out test tasks**, which suggests the model learned transferable HR workflow skills. |
|
|
| ### Reward Curve |
|
|
|  |
|
|
| The moving average reward trends upward from ~2-3 early in training to ~4-5 by the end, showing consistent learning. |
|
|
| ### Quick start (Colab) |
|
|
| 1. Click the Colab badge at the top to open `train_hr_agent.ipynb` in Google Colab |
| 2. Select a GPU runtime |
| 3. Run all cells: the notebook installs dependencies, trains, and evaluates automatically |
|
|
| ## Live Demo |
|
|
| Try the environment on Hugging Face Spaces: https://huggingface.co/spaces/devxpy/rl_hack |
| |