AlphaBypass.3 🧠

"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."

What is this?

AlphaBypass is a PPO-based reinforcement learning agent trained to automatically discover effective VLESS+REALITY proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, the Russian federal agency that supervises communications and enforces internet censorship.

Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work.

This is a research project studying automated circumvention of network censorship through adversarial machine learning. Any resemblance to practical use is purely coincidental :)

Model Details

Property	Value
Architecture	MLP, 3×512 hidden layers with LayerNorm
Parameters	~787K
Algorithm	PPO (Proximal Policy Optimization)
Action space	Mixed discrete + continuous
Observation space	75-dimensional vector
Training episodes (one proxy configuration tried per episode)	~1,100
Target protocol	VLESS + REALITY (xray-core)
Success rate	93%
Avg reward	+0.81 (scale: −1.0 to +1.0)
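
As a rough sanity check on the parameter count, the trunk described in the table can be tallied by hand. The sketch below assumes plain fully connected layers with biases and one LayerNorm (scale and shift) after each of the three 512-wide hidden layers. The output heads and any value network are not described in this card, so they are left out.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(width: int) -> int:
    """One scale and one shift parameter per feature."""
    return 2 * width


def trunk_params(obs_dim: int = 75, hidden: int = 512, depth: int = 3) -> int:
    """Parameter count of an MLP trunk: depth x (Linear + LayerNorm).

    Assumption: the layout is obs_dim -> hidden -> hidden -> hidden,
    which is one reading of "3x512 hidden layers with LayerNorm".
    """
    total = 0
    n_in = obs_dim
    for _ in range(depth):
        total += linear_params(n_in, hidden) + layernorm_params(hidden)
        n_in = hidden
    return total


print(trunk_params())  # 567296
```

Under these assumptions the trunk accounts for roughly 567K of the ~787K parameters. The remainder would sit in whatever action and value heads the policy uses.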

Reward Function

def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0

    r  = 0.50 * connection_quality(metrics)  # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio      # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
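
The two helpers called above, connection_quality and log_speed_score, are not shown in this card. The following is a minimal, self-contained sketch of what they might look like, so the reward can be run end to end. Everything except compute_reward itself is an assumption: the Metrics field names (other than connected and stability_ratio), the 500 ms and 5 s normalization constants, and the exact log scaling are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    # Hypothetical container. Only `connected` and `stability_ratio`
    # appear in the card; the other fields are illustrative.
    connected: bool
    stability_ratio: float = 0.0   # fraction of probes that succeeded
    ping_ms: float = 0.0
    loss_ratio: float = 0.0        # 0.0 = no packet loss, 1.0 = total loss
    connect_time_s: float = 0.0
    speed_mbps: float = 0.0


def _clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def connection_quality(m: Metrics) -> float:
    """Assumed form: equal-weight mean of three scores in [0, 1]."""
    ping = _clamp01(1.0 - m.ping_ms / 500.0)        # assumed 500 ms ceiling
    loss = _clamp01(1.0 - m.loss_ratio)
    setup = _clamp01(1.0 - m.connect_time_s / 5.0)  # assumed 5 s ceiling
    return (ping + loss + setup) / 3.0


def log_speed_score(m: Metrics, baseline_mbps: float) -> float:
    """Assumed form: log-scaled speed relative to a positive baseline."""
    return _clamp01(math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(metrics: Metrics, baseline_mbps: float = 32.0) -> float:
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r


blocked = Metrics(connected=False)
ideal = Metrics(connected=True, stability_ratio=1.0, speed_mbps=32.0)
print(compute_reward(blocked), round(compute_reward(ideal), 6))
```

With helpers bounded to [0, 1] like these, a connected configuration scores between 0 and 1 and a blocked one scores exactly -1, which matches the -1.0 to +1.0 scale quoted in the table.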

Usage

Requires xray-core.

Load and query the model

import numpy as np
import torch

from agent import PolicyNetwork
from environment import decode_action

# Load the trained policy on CPU.
# weights_only=False unpickles arbitrary Python objects,
# so only load checkpoints you trust.
policy = PolicyNetwork()
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

# Placeholder observation: one all-zero 75-dimensional vector.
obs = torch.zeros(1, 75)
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# Greedy action: argmax of each discrete head; mu is used directly, without sampling.
discrete   = np.array([l.argmax().item() for l in logits])
continuous = mu.squeeze().numpy()
config     = decode_action(discrete, continuous)

print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
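
To make the mixed discrete + continuous action space more concrete, here is a hypothetical sketch of a decoder in the spirit of decode_action. It is not the project's implementation: the option lists, the order of the discrete heads, the assumption that the continuous output lies in [-1, 1], and the port mapping are all invented for illustration. Only the config field names come from the usage example above.

```python
from dataclasses import dataclass
from typing import Sequence

# Illustrative option lists (assumptions, not the trained model's real choices).
TRANSPORTS = ["tcp", "grpc", "ws"]
FINGERPRINTS = ["chrome", "firefox", "safari"]
DEST_DOMAINS = ["example.com", "example.org"]
FRAGMENT_STRATEGIES = ["none", "tlshello"]

PORT_MIN, PORT_MAX = 1024, 65535  # assumed port range


@dataclass
class ProxyConfig:
    transport_type: str
    fingerprint: str
    dest_domain: str
    fragment_strategy: str
    proxy_port: int


def decode_action_sketch(discrete: Sequence[int],
                         continuous: Sequence[float]) -> ProxyConfig:
    """Map one index per discrete head plus one continuous value to a config."""
    t, f, d, s = (int(i) for i in discrete[:4])
    # Clip the continuous output to [-1, 1], then rescale to [0, 1].
    unit = (max(-1.0, min(1.0, float(continuous[0]))) + 1.0) / 2.0
    port = PORT_MIN + round(unit * (PORT_MAX - PORT_MIN))
    return ProxyConfig(
        transport_type=TRANSPORTS[t],
        fingerprint=FINGERPRINTS[f],
        dest_domain=DEST_DOMAINS[d],
        fragment_strategy=FRAGMENT_STRATEGIES[s],
        proxy_port=port,
    )


cfg = decode_action_sketch([1, 2, 0, 1], [-1.0])
print(f"{cfg.transport_type}:{cfg.proxy_port} -> {cfg.dest_domain}")
```

The point of the split is that categorical choices (transport, fingerprint, domain) get one softmax head each, while numeric knobs are read from the continuous output and rescaled into a valid range.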

Server config example

{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
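
The YOUR-UUID-HERE and YOUR-SHORT-ID placeholders need real values. A minimal sketch using only the Python standard library follows. The function names are hypothetical, and the limit of 8 bytes (16 hex characters) for the short ID is an assumption based on how REALITY short IDs are commonly documented. The private/public key pair is not generated here; it should come from xray-core's own key tooling (the `xray x25519` subcommand, to the best of my knowledge).

```python
import secrets
import uuid


def new_client_id() -> str:
    """Random version-4 UUID for the VLESS client "id" field."""
    return str(uuid.uuid4())


def new_short_id(n_bytes: int = 8) -> str:
    """Random hex string for "shortIds" / "shortId".

    Assumption: REALITY accepts up to 8 bytes (16 hex characters).
    """
    if not 0 <= n_bytes <= 8:
        raise ValueError("n_bytes must be between 0 and 8")
    return secrets.token_hex(n_bytes)


print("id:      ", new_client_id())
print("shortId: ", new_short_id())
```

The same UUID and short ID must then be used on both the server and the client side.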

Client config example

{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
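
The server and client configs only work as a pair: transport, gRPC service name, port, UUID, SNI domain, and short ID all have to agree. The sketch below is a hypothetical helper (not part of xray-core or this project) that checks those fields on two already-parsed configs shaped like the examples above. It cannot verify that the public key matches the private key.

```python
def check_pair(server: dict, client: dict) -> list[str]:
    """Return a list of mismatches between a server and a client config."""
    problems = []
    s_in = server["inbounds"][0]
    c_out = client["outbounds"][0]
    s_stream, c_stream = s_in["streamSettings"], c_out["streamSettings"]
    s_real, c_real = s_stream["realitySettings"], c_stream["realitySettings"]
    vnext = c_out["settings"]["vnext"][0]

    if s_stream["network"] != c_stream["network"]:
        problems.append("network differs")
    if s_stream.get("grpcSettings") != c_stream.get("grpcSettings"):
        problems.append("grpcSettings differ")
    if vnext["port"] != s_in["port"]:
        problems.append("port differs")
    server_ids = {c["id"] for c in s_in["settings"]["clients"]}
    if not any(u["id"] in server_ids for u in vnext["users"]):
        problems.append("client id unknown to server")
    if c_real["serverName"] not in s_real["serverNames"]:
        problems.append("serverName not in server serverNames")
    if c_real["shortId"] not in s_real["shortIds"]:
        problems.append("shortId not in server shortIds")
    return problems


# Trimmed-down versions of the example configs, with made-up values.
server = {"inbounds": [{
    "port": 443,
    "settings": {"clients": [{"id": "11111111-2222-4333-8444-555555555555"}]},
    "streamSettings": {
        "network": "grpc",
        "grpcSettings": {"serviceName": "svc"},
        "realitySettings": {"serverNames": ["example.com"],
                            "shortIds": ["0123abcd"]},
    },
}]}
client = {"outbounds": [{
    "settings": {"vnext": [{
        "port": 443,
        "users": [{"id": "11111111-2222-4333-8444-555555555555"}],
    }]},
    "streamSettings": {
        "network": "grpc",
        "grpcSettings": {"serviceName": "svc"},
        "realitySettings": {"serverName": "example.com",
                            "shortId": "0123abcd"},
    },
}]}

print(check_pair(server, client))  # []
```

In practice the two dicts would come from `json.load` on the real config files. An empty list means the checked fields agree.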

Limitations

DPI behavior varies by provider and region - results may differ.
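REALITY is fundamentally difficult to block without collateral damage. Some of the measured success may reflect the protocol's strength rather than the agent's cleverness.
The agent keeps no memory between deployments, so it is unaware of any DPI rule updates made after training, even overnight ones.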
787K parameters is intentional. The problem doesn't need GPT-6.

Citation

@misc{alphabypass2026,
  title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
  year  = {2026},
  url   = {https://huggingface.co/NickupAI/alphabypass3}
}

License

MIT. Use responsibly. Especially if you live somewhere VPNs are considered a thought crime.

"It's not about hiding. It's about the right to reach the open internet."
