AlphaBypass.3 🧠
"The first RL agent trained to understand what a national firewall finds suspicious - and what it doesn't."
What is this?
AlphaBypass is a PPO-based reinforcement learning agent trained to automatically discover effective VLESS+REALITY proxy configurations that evade the Deep Packet Inspection (DPI) systems operated by Roskomnadzor, Russia's federal communications regulator and internet censorship agency.
Instead of manually tuning parameters, a neural network figures it out by trial and error - against a real, live DPI system. It learns what combinations of transport, fingerprint, domain, and other parameters actually work.
This is a research project that uses adversarial machine learning to study automated network censorship. Any resemblance to practical use is purely coincidental :)
Model Details
| Property | Value |
|---|---|
| Architecture | MLP, 3×512 hidden layers with LayerNorm |
| Parameters | ~787K |
| Algorithm | PPO (Proximal Policy Optimization) |
| Action space | Mixed discrete + continuous |
| Observation space | 75-dimensional vector |
| Training episodes (one proxy configuration tried per episode) | ~1,100 |
| Target protocol | VLESS + REALITY (xray-core) |
| Success rate | 93% |
| Avg reward | +0.81 (scale: −1.0 to +1.0) |
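
As a rough sanity check on the table above, the shared trunk's parameter count can be worked out by hand. The sketch below assumes each of the three hidden layers is a biased linear layer followed by a LayerNorm with learnable scale and shift; the source does not spell out the layer layout or the output heads, so the split between trunk and heads is only an estimate.

```python
def linear_params(n_in, n_out):
    # Weight matrix plus bias vector.
    return n_in * n_out + n_out


def layernorm_params(width):
    # Learnable scale and shift, one of each per feature (assumed affine).
    return 2 * width


obs_dim, hidden, n_layers = 75, 512, 3

trunk = 0
n_in = obs_dim
for _ in range(n_layers):
    trunk += linear_params(n_in, hidden) + layernorm_params(hidden)
    n_in = hidden

# Whatever is left of the reported ~787K would sit in the output heads
# (discrete logits, continuous outputs, value estimate).
remaining = 787_000 - trunk

print(trunk)      # 567296
print(remaining)  # 219704
```

Under these assumptions the trunk accounts for roughly 567K parameters, which leaves about 220K for the policy and value heads.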
Reward Function
def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)   # ping, loss, connect time
    r += 0.35 * metrics.stability_ratio      # probe success rate
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
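
The snippet above calls `connection_quality` and `log_speed_score` without defining them. The following is a minimal, self-contained sketch of how they could look so that the reward stays within the stated −1.0 to +1.0 range. The `Metrics` fields other than `connected` and `stability_ratio`, the normalisation constants, and the equal weighting inside `connection_quality` are all assumptions made for illustration, not the project's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical probe result; only `connected` and `stability_ratio`
    are named in the original reward function."""
    connected: bool
    ping_ms: float = 0.0
    loss_ratio: float = 0.0       # fraction of packets lost, 0..1
    connect_time_s: float = 0.0
    stability_ratio: float = 0.0  # fraction of probes that succeeded, 0..1
    speed_mbps: float = 0.0


def clamp01(x):
    return max(0.0, min(1.0, x))


def connection_quality(m, max_ping_ms=500.0, max_connect_s=5.0):
    # Assumed: equal-weight average of three scores, each 1.0 at best.
    ping_score = 1.0 - clamp01(m.ping_ms / max_ping_ms)
    loss_score = 1.0 - clamp01(m.loss_ratio)
    connect_score = 1.0 - clamp01(m.connect_time_s / max_connect_s)
    return (ping_score + loss_score + connect_score) / 3.0


def log_speed_score(m, baseline_mbps):
    # Assumed: log-scaled throughput that saturates at 1.0 at the baseline.
    return clamp01(math.log1p(m.speed_mbps) / math.log1p(baseline_mbps))


def compute_reward(metrics, baseline_mbps=32.0):
    if not metrics.connected:
        return -1.0
    r = 0.50 * connection_quality(metrics)
    r += 0.35 * metrics.stability_ratio
    r += 0.15 * log_speed_score(metrics, baseline_mbps)
    return r
```

With every sub-score clamped to the range 0 to 1, a connected probe scores between 0.0 and 1.0, and only a failed connection produces −1.0.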
Usage
Requires xray-core.
Load and query the model
import torch
import numpy as np
from agent import PolicyNetwork
from environment import decode_action

# Rebuild the policy and restore the trained weights.
policy = PolicyNetwork()
# weights_only=False unpickles arbitrary objects, so only load checkpoints you trust.
ck = torch.load("best.pt", map_location="cpu", weights_only=False)
policy.load_state_dict(ck["policy_state"])
policy.eval()

# Placeholder all-zero observation: a batch of 1 with 75 features.
obs = torch.zeros(1, 75)
with torch.no_grad():
    logits, mu, _, _ = policy(obs)

# Greedy action: argmax of each discrete head plus the continuous output mu.
discrete = np.array([head.argmax().item() for head in logits])
continuous = mu.squeeze().numpy()
config = decode_action(discrete, continuous)
print(f"{config.transport_type}:{config.proxy_port} → {config.dest_domain}")
print(f"fingerprint={config.fingerprint} frag={config.fragment_strategy}")
Server config example
{
  "inbounds": [{
    "port": 443,
    "protocol": "vless",
    "settings": {
      "clients": [{"id": "YOUR-UUID-HERE", "flow": ""}],
      "decryption": "none"
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "dest": "YOUR-SNI-DOMAIN:443",
        "serverNames": ["YOUR-SNI-DOMAIN"],
        "privateKey": "YOUR-PRIVATE-KEY",
        "shortIds": ["YOUR-SHORT-ID"]
      }
    }
  }],
  "outbounds": [{"tag": "direct", "protocol": "freedom"}]
}
Client config example
{
  "inbounds": [{
    "port": 10808,
    "protocol": "socks",
    "settings": {"auth": "noauth", "udp": true}
  }],
  "outbounds": [{
    "protocol": "vless",
    "settings": {
      "vnext": [{
        "address": "YOUR-SERVER-IP",
        "port": 443,
        "users": [{"id": "YOUR-UUID-HERE", "encryption": "none"}]
      }]
    },
    "streamSettings": {
      "network": "grpc",
      "security": "reality",
      "grpcSettings": {"serviceName": "YOUR-SERVICE-NAME"},
      "realitySettings": {
        "fingerprint": "safari",
        "serverName": "YOUR-SNI-DOMAIN",
        "publicKey": "YOUR-PUBLIC-KEY",
        "shortId": "YOUR-SHORT-ID"
      }
    }
  }]
}
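
Several values in the client config have to match the server config exactly: the UUID, the short ID, the SNI domain, and the gRPC service name. Generating the client file from one set of inputs helps avoid copy-paste mismatches. The helper below is a hypothetical sketch (the function name `build_client_config` and its parameters are not part of this repository) that reproduces the structure of the client example above.

```python
import json


def build_client_config(server_ip, uuid, public_key, short_id,
                        sni_domain, service_name,
                        fingerprint="safari", server_port=443,
                        socks_port=10808):
    """Return an xray-core client config dict mirroring the example above."""
    return {
        "inbounds": [{
            "port": socks_port,
            "protocol": "socks",
            "settings": {"auth": "noauth", "udp": True},
        }],
        "outbounds": [{
            "protocol": "vless",
            "settings": {
                "vnext": [{
                    "address": server_ip,
                    "port": server_port,
                    "users": [{"id": uuid, "encryption": "none"}],
                }]
            },
            "streamSettings": {
                "network": "grpc",
                "security": "reality",
                "grpcSettings": {"serviceName": service_name},
                "realitySettings": {
                    "fingerprint": fingerprint,
                    "serverName": sni_domain,
                    "publicKey": public_key,
                    "shortId": short_id,
                },
            },
        }],
    }


if __name__ == "__main__":
    cfg = build_client_config(
        server_ip="YOUR-SERVER-IP",
        uuid="YOUR-UUID-HERE",
        public_key="YOUR-PUBLIC-KEY",
        short_id="YOUR-SHORT-ID",
        sni_domain="YOUR-SNI-DOMAIN",
        service_name="YOUR-SERVICE-NAME",
    )
    print(json.dumps(cfg, indent=2))
```

The sketch hard-codes the gRPC transport from the example; a configuration decoded from the model with a different `transport_type` would need a different `streamSettings` block.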
Limitations
- DPI behavior varies by provider and region - results may differ.
- REALITY is fundamentally difficult to block without collateral damage. Some success may be protocol strength, not agent cleverness.
- The agent keeps no memory between deployments, so it is unaware of DPI updates rolled out overnight.
- 787K parameters is intentional. The problem doesn't need GPT-6.
Citation
@misc{alphabypass2026,
title = {AlphaBypass: Reinforcement Learning for Automated DPI Evasion},
year = {2026},
url = {https://huggingface.co/NickupAI/alphabypass3}
}
License
MIT. Use responsibly. Especially if you live somewhere VPNs are considered a thought crime.
"It's not about hiding. It's about the right to reach the open internet."
