Safetensors
qwen3_5
wbhVince829 commited on
Commit
2c4e721
·
1 Parent(s): ef8264e

upload v2

Browse files
Files changed (1) hide show
  1. README.md +64 -93
README.md CHANGED
@@ -1,27 +1,26 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
- # TelecomGPT-R1: The Best Open-Source Telecom Large Language Model
5
 
6
- > A 27B open model that ranks **#1 on the GSMA Open Telco Leaderboard** among all open-source models by a 27-point margin**, and **beats GPT-5 on 6 of 7 benchmarks**.
7
 
8
  ---
9
 
10
  ## 1 — A New State of the Art for Telecom LLMs
11
 
12
- **TelecomGPT-R1 is the strongest publicly available large language model (LLM) for telecommunications.**
13
 
14
- On the public **[GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard)** the complete benchmark suite that aggregates seven public telecom benchmarks across knowledge QA, protocol understanding, fault analysis, and modeling & computation, as shown in Figure 1, TelecomGPT-R1 is:
 
 
 
15
 
16
- - Ranked #1 Open-Source Globally TelecomGPT-R1 secures the #1 overall spot out of all 86 evaluated models on the leaderboard, beating any open-source models by a large 27-point margin.
17
 
18
- - Outperforming General Domain Giants – In head-to-head match-ups against GPT-5, our 27B open policy wins 6 out of 7 benchmarks.
19
 
20
- - Cracking the Hardest Axis – On **TeleLogs**, the leaderboard's most notoriously difficult axis — multi-step root-cause analysis over dense RAN engineering features and drive-test measurements, TelecomGPT-R1 lifts the score to **97%**, a **+55-point** leap over the strongest open-source baseline (DeepSeek-V3, 685B) and ahead of every frontier closed-source generalist.
21
-
22
-
23
- ![radar_chart_v0](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/-ZkxlB0p1XHmJCEDS6MKb.png)
24
- **Figure 1 | TelecomGPT-R1 vs frontier closed-source models on the GSMA Open Telco Leaderboard.** *Each spoke is one benchmark (plus the overall average), normalized by its per-axis leaderboard best so that `1.0` = best score on that benchmark. Our 27B open-source policy reaches `1.0` on four of eight axes (3GPP-TSG, TeleLogs, TeleTables, Average) and stays at or above `0.89` on every other axis — visibly tracing the outer edge of the radar where no other model can match it on all axes simultaneously.*
25
 
26
 
27
  ---
@@ -29,15 +28,35 @@ On the public **[GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA
29
 
30
  ## 2 — Toward Universal Telecom Reasoning
31
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32
 
33
- The telecommunications sector does not communicate in a single data language. As shown in Fig 2, true telecom intelligence demands working across highly diverse tasks and data modalities: interpreting the legalistic text of 3GPP standards, navigating through the highly structured layout of configuration tables, designing the algorithmic logic of MATLAB code, and debugging from the messy strings of raw hardware network logs.
34
 
 
 
 
 
 
 
35
 
36
- Until now, general-purpose AI giants have stumbled when confronted with these highly diverse domain-specific data landscapes, despite powerful native reasoning abilities. Meanwhile, most existing telecom domain LLMs has been to focus on narrow tasks such as log classification or domain knowledge question answering, leaving them unable to perform complex real-life tasks such as root-cause diagnostics.
37
 
38
- TelecomGPT-R1 represents a definitive leap forward. Built on top of an open-source 27B parameter base, it establishes a new state of the art as the industry's first true universal reasoning model capable of fluent, polymathic intelligence across diverse telecom task and data types **knowledge QA, protocol understanding, fault analysis, and modeling & computation** under a single unified policy.
39
- ![four_axes_radia](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/IYs4rpe9Ij1e6KJf5qKy5.png)
40
- **Figure 2 | The four kinds of reasoning a telecom engineer juggles.** *Each scope shows one axis of telecom work — knowledge QA (15.3%), protocol understanding (22.7%), fault analysis (18.5%), modeling & computation (43.5%) — and the share of the 158,915-example TelecomGPT-R1 training corpus that targets it. The cross-axis distribution explains why we train one unified policy rather than four specialists: a real workflow mixes all four in the same session.*
 
41
 
42
  <!-- A telecom engineer's day cuts across four very different kinds of thinking — and a useful AI has to fluidly switch between them:
43
 
@@ -50,104 +69,65 @@ TelecomGPT-R1 represents a definitive leap forward. Built on top of an open-sour
50
 
51
  ---
52
 
53
- ## 3 — How We Did It: The Recipe at a Glance
54
-
55
- To train an unified model capable of navigating through this diverse data landscape, we had to rethink both data curation and post-training choices. The resulting recipe rests on two foundational design decisions:
56
-
57
- 1) Instead of training separate models or disconnected datasets for standards QA, logs, tables, math, and code, we curate all sources into a single unified telecom reasoning corpus and train one policy over the whole space. This matters because telecom concepts do not stay inside one format. A scheduling rule may appear as prose in a standard, as a row in a configuration table, as a constraint in an equation, as a pattern in logs, or as logic inside code. TelecomGPT-R1 is trained on a 158,915-example unified corpus constructed through an eight-step pipeline. Each example is converted into the same chat format, tagged by reasoning axis and source type, verified with task-specific checks, and prepared for both supervised fine-tuning (SFT) and reinforcement learning (RL).
58
-
59
- 2) We post-train with SFT followed by a **three-pillar RL recipe** that combines Dynamic sAmpling Policy Optimization (DAPO) for stable training of diverse tasks and data types, a difficulty-mined multi-stage curriculum learning, and dense reward signals from self-rubric on highly complex, derivation-heavy tasks.
60
-
61
- ![recipe_4stage_v0.png-2026-05-19-00-20-52-309](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/v0pnV58Y3uu3hqPQ6ZQeu.png)
62
- **Figure 3 | The TelecomGPT-R1 three-stage post-training recipe.** *Stage ① curates heterogeneous telecom sources through an eight-step pipeline into one axis-indexed 158,915-example corpus. Stage ② installs cross-axis long-CoT reasoning on [Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) via LoRA-SFT. Stage ③ is combined of **DAPO** which stabilizes the gradient, a **difficulty-mined curriculum** advances the prompt distribution from easy to hard, and a **self-rubric reward** — rubrics generated by LLM or projected from the expert reference, then scored as a decomposed sum of per-rubric binary indicators — densifies the sparse 0/1 outcome signal, yielding the final TelecomGPT-R1 27B policy.*
63
-
64
- ---
65
 
66
- ## 4 Stage 1: Training Data Curation
67
 
68
- TelecomGPT-R1 begins with corpus construction. We collect heterogeneous telecom sources standards, technical documents, tables, logs, code, math problems, and modeling examples and convert them into a unified training format.
69
 
70
- The pipeline has eight steps:
71
 
72
- ### The eight pipeline steps at a glance
73
 
74
- | Step | What it does | Why it matters |
75
- |---|---|---|
76
- | **S1** — Source-grounded extraction | Modality-specific extractors (AST for code, VLM PDF parsing for textbooks, working-group label projection for specs, row-window slicing for tables, formula masking for math papers, engineering-feature aggregation for raw logs) | Converts heterogeneous telecom data into a common schema. |
77
- | **S2** — Long-CoT generation | Three trace generators chosen by reasoning type: multiple teacher LLMs (with self-validation) for QA, executable-Python-grounded CoT for derivations, deterministic rule-replay CoT for diagnosis | Right tool for each reasoning type — not one teacher for everything. |
78
- | **S3** — Multi-pass verification | Axis-matched verifiers (exact match / unit-tolerant numeric closeness / rule-replay accuracy / on-policy re-answering) | Filters incorrect or ungrounded reasoning before training. |
79
- | **S4** — Augmentation | Variable resampling 5×–20×; prefix/suffix decomposition into intermediate-target + final-target pairs | Expands diversified data coverage while preserving structured reasoning. |
80
- | **S5** — Leakage prevention | Cross-benchmark dedup vs. all public eval splits; SHA-256-archived test sets | Ensures leaderboard gains reflect learned capability rather than benchmark contamination. |
81
- | **S6** — Difficulty stratification | Estimate example difficulty using model pass rates and verifier outcomes. | Provides the difficulty signal later used by the RL curriculum. |
82
- | **S7** — Format unification | One `{system, user, assistant}` chat schema; fixed answer-format vocabulary; `meta.axis` and `meta.source_track` tags on every row | Makes the corpus trainable, searchable, reweightable, and ablatable as one whole. |
83
- | **S8** — Reasoning-style mixing | Mix in a small amount of general long-reasoning data from student before SFT. | Preserves self-correction and reflective reasoning patterns.|
84
-
85
-
86
- ![data_radar_v0](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/82qavUtMTYeOs3fEGcIWm.png)
87
- **Figure 4 | Composition of the 158,915-example unified telecom training corpus.** *Five source families on the outer ring fold into the four reasoning axes of Figure 2; the middle ring breaks each family into its sub-corpora, and the inner radial bars encode per-corpus row counts on a log scale.*
88
 
89
  ---
90
 
91
- ## 5Stage 2: Post Training Algorithms
92
-
93
- We first perform supervised fine-tuning (LoRA adapters on a Qwen3.5 27B). This stage teaches the base model to speak the language of telecom reasoning: how to interpret standards, follow protocol constraints, reason over tables, analyze logs, solve derivations, and produce structured answers. This stage is critical because vertical-domain RL cannot create missing knowledge from nothing. A strong base model may know how to reason in general, but if it lacks the relevant telecom facts and conventions, RL can amplify fluent wrong reasoning.
94
-
95
- Reinforcement learning is appealing for telecom because many tasks have clear correctness signals: a table question has a right option, a derivation has a right result, and a code problem either satisfies the expected behavior or not. But naïve RL fails quickly in this setting.
96
-
97
- The first issue is sparse feedback. Telecom problems are structured but unforgiving: one wrong unit, condition, protocol branch, or table row can make the final answer wrong, even if most of the reasoning is useful. A pure final-answer reward turns these cases into zeros and gives the model little guidance about what went right.
98
-
99
- The second issue is uneven difficulty and learning progress across domains. In unified training, knowledge QA may improve early, while table reasoning, log analysis, math derivation, and code understanding often need much longer. If all domains are trained uniformly, the training will be long and inefficient.
100
-
101
- The third issue is shortcut learning. On benchmark-style tasks, a model can sometimes guess from answer priors, exploit formatting artifacts, or produce plausible explanations without using the right telecom evidence. For a domain model, this is unacceptable: we want grounded reasoning, not better guessing.
102
 
103
- TelecomGPT-R1 addresses these problems with three RL ingredients.
104
 
105
- 1. DAPO-style stabilization
106
 
107
- We use DAPO-style optimization to make GRPO training more stable on long telecom reasoning traces. After SFT, the model already knows telecom terminology, answer formats, and response style. RL should improve the reasoning policy without destroying that structure. DAPO helps here through training mechanics: it reduces wasted updates from uninformative rollout groups, improves token-level credit assignment for long responses, and keeps policy updates from becoming too aggressive. In practice, this helps prevent format drift, repetitive reasoning, and over-optimization to lucky final answers.
108
 
109
- 2. Difficulty-aware training
110
 
111
- Different telecom skills become learnable at different times. DAPO-style dynamic sampling focuses updates on prompt groups with meaningful reward variation: not groups where every rollout is already correct, and not groups where every rollout fails identically. Combined with difficulty-mined curriculum, this lets slower domains — tables, logs, math, and code keep receiving useful training signal instead of being drowned out by easier QA examples. In this sense, DAPO is not only a stabilizer. It also helps manage asynchronous capability progression across heterogeneous telecom reasoning skills.
112
 
113
- 3. Self-rubric reward
114
 
115
- Final-answer rewards are sparse. Self-rubric reward makes the signal denser and more trustworthy. For different task groups, we accumulate prior experiences, derive with LLM self-analysis or expert reference to define criteria over logic, evidence use, format, key facts, and final correctness. This gives partial credit when the model follows the right reasoning path but misses a local detail. It also reduces guessing and reward hacking: a response that jumps to the right option without using the right standard, table row, log evidence, or equation no longer receives the same credit as a grounded solution.
116
 
117
- ---
118
-
119
- ## 6 — Five Things We Learned
120
 
121
- | # | Lesson | Why it matters |
122
- |:---:|---|---|
123
- | **1** | **Domain knowledge is the biggest bottleneck.** | A strong general reasoner produces well-formed chains operating on **wrong telecom facts**. RL cannot manufacture knowledge that was never in the model. Invest in SFT data curation *first*. |
124
- | **2** | **Self-rubric reward is what makes the model universal.** | Without rubric-decomposed credit, a 27B base produces zero correct rollouts on derivation-heavy axes for hundreds of training steps, and RL gets no gradient. |
125
- | **3** | **Verifier rigor matters as much as reward weights.** | A general verifier (e.g. math verifier directly applied on Telecom Math reasoning) silently rewards lucky digit matches and penalizes correct reasoning in the wrong format. Unit normalization, tolerance bands, symbolic equivalence, and code execution were all as important as choosing the reward weights themselves. |
126
- | **4** | **Difficulty-mined curriculum prevents axis collapse.** | Easy axes (knowledge QA) saturate within hundreds of RL steps; hard axes (math, code, complex logs) keep improving. Without curriculum, easy axes stall the rest. |
127
- | **5** | **Mixing general-domain CoT preserves reasoning style.** | Student-specific reasoning and self-reflective style words are thin or different in CoTs distilled from teacher models. A small mix helps preserving self-correction and reflective reasoning patterns throughout SFT. |
128
 
129
- ---
130
 
 
131
 
 
132
 
133
- ## 7 Toward Telecom's Cognitive Architecture
134
 
135
- *TelecomGPT-R1 is the cognitive core. The next steps stack vertically on top of it perception, action, world model each docked into the brain, not bolted around it.*
136
 
137
- 1. **The cognitive core is reasoning, not retrieval.** Telecom decisions stitch evidence across specs, tables, logs, equations, and code no RAG pipeline can compose this cross-modal evidence into a single chain. TelecomGPT-R1 makes long-CoT reasoning the central primitive over which every other telecom capability can be layered. Everything that follows is plugged into this core, not bolted around it.
138
 
139
- 2. **Senses: docking RF-GPT to read the spectrum.** Today the core reads telecom — specs, tables, logs, code. The substance of telecom, however, is waveforms. [RF-GPT](https://arxiv.org/abs/2602.14833) [[Zou et al., 2026](#cite-rfgpt)], our group's recent foundation model, encodes IQ samples as RF tokens that a decoder-only LLM can natively consume; fusing it with TelecomGPT-R1 yields a single reasoning chain that crosses the protocol–physical boundary — *spectrum capture → standard clause → log evidence → configuration fix*.
140
 
141
- 3. **Hands: agents that act on the network, not just talk about it.** A reasoning core that only produces text is still a passive observer. Wrapped as a tool-using agent invoking simulators, SDR rigs, srsRAN runtimes, OSS/BSS APIs, and digital-twin replays TelecomGPT-R1 becomes an operator: a NOC co-pilot that pulls live KPIs, simulates a config change, and drafts the change ticket end-to-end. The interface is no longer "ask the model"; it is "deploy the model."
142
 
143
- 4. **The long bet: a network that runs its own world model.** Close the loop brain, senses, hands onto a continuously updated *cognitive twin* that the policy can simulate against and reason about counterfactually. Predicting a failure hours before it happens is not a smarter monitor; it is a network that **thinks about itself**. This is the step from *"5G with AI features"* to a telecom architecture that is **categorically** intelligent — and it is the only item on this list we cannot promise; we can only commit to building toward it openly. The same brain–senses–hands–world template should generalize to any heterogeneous infrastructure vertical that reasons over standards, telemetry, and physical signals — but telecom is where it has to be proven first.
144
 
145
  ---
146
 
147
  ### Resources
148
 
149
- - **Paper.** [arXiv link coming soon!]
150
- - **Model weights.** [HuggingFace link coming soon!]
151
  - **Unified benchmark.** [GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard)
152
 
153
  ### Citation
@@ -160,6 +140,7 @@ Final-answer rewards are sparse. Self-rubric reward makes the signal denser and
160
  booktitle = {[Venue coming soon!]},
161
  year = {2026}
162
  }
 
163
  @article{zou2025telecomgpt,
164
  title ={Telecomgpt: A framework to build telecom-specific large language models},
165
  author ={Zou, Hang and Zhao, Qiyang and Tian, Yu and Bariah, Lina and Bader, Faouzi and Lestable, Thierry and Debbah, M\'{e}rouane},
@@ -167,19 +148,9 @@ Final-answer rewards are sparse. Self-rubric reward makes the signal denser and
167
  year ={2025},
168
  publisher ={IEEE}
169
  }
170
- @article{zou2026rfgpt,
171
- title = {RF-GPT: Teaching AI to See the Wireless World},
172
- author = {Zou, Hang and Tian, Yu and Wang, Bohao and Bariah, Lina
173
- and Lasaulce, Samson and Huang, Chongwen and Debbah, M\'{e}rouane},
174
- journal = {arXiv preprint arXiv:2602.14833},
175
- year = {2026},
176
- url = {https://arxiv.org/abs/2602.14833}
177
- }
178
 
179
  ```
180
 
181
  ### Acknowledgements
182
 
183
  This work was supported by the Digital Future Institute of Khalifa University; the College of Information Science and Electronic Engineering, Zhejiang University; the College of Computer Science and Technology, Zhejiang University; and the Research Computing team of Khalifa University.
184
-
185
-
 
1
  ---
2
  license: apache-2.0
3
  ---
4
+ # TelecomGPT-R1: The Best Telecom-Specific Large Language Model
5
 
6
+ > A 27B open model that ranks **#1 on the GSMA Open Telco Leaderboard** across **all 86 evaluated models** (open or closed, general-purpose or operator-specialized), with an average score of **89.0%**, ahead of every other model on the board.
7
 
8
  ---
9
 
10
  ## 1 — A New State of the Art for Telecom LLMs
11
 
12
+ **TelecomGPT-R1 (27B) ranks #1 on the [GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard) at 89.0% average, leading every open-source and closed-source entrant across both general-purpose and operator-specialized categories.** The leaderboard aggregates 7 benchmarks spanning 4 evaluation axes (telecom knowledge QA, 3GPP protocol comprehension, fault and log diagnosis, and RF/network modeling), as reported in Figure 1.
13
 
14
+ - **Among open-source models**, TelecomGPT-R1 leads DeepSeek-V3-0324 (685B) by **+29.7**, LLaMA-3.3-70B by **+34.3**, and Qwen2.5-72B by **+35.0**, while operating at roughly **25× fewer active parameters than the next-best open entrant**.
15
+ - **Among closed-source models**, TelecomGPT-R1 leads both the general-purpose frontier tier and the operator-specialized tier, as detailed in the two bullets below.
16
+ - **Among general-purpose frontier models**, TelecomGPT-R1 leads Gemini-3.1-Pro by **+13.4**, Claude-Opus-4.6 by **+15.7**, and GPT-5 by **+17.1**. These systems sit at the **trillion-parameter-class frontier** (active-parameter counts are not publicly disclosed but are widely reported as orders of magnitude larger than 27B), making the margin a parameter-efficiency result as much as an accuracy result.
17
+ - **Among operator-specialized telecom models**, TelecomGPT-R1 leads AT&T OTel-LLM-8.3B-QnA by **+3.0** (and OTel-LLM is narrow-task trained) and SoftBank LTM by **+15.4** — the **first model, open or closed, to outscore an operator-internal telecom baseline** on the GSMA Open Telco Leaderboard.
18
 
19
+ **In one line: a 27B open specialist beats both trillion-parameter-class generalists and operator-locked verticals on the same public benchmark suite.**
20
 
 
21
 
22
+ ![Figure 1. TelecomGPT-R1 vs frontier closed-source models on the GSMA Open Telco Leaderboard](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/1jpJq-UoSFK1GYhwjLA9y.png)
23
+ **Figure 1 | TelecomGPT-R1 vs frontier closed-source models on the GSMA Open Telco Leaderboard.** *Each spoke is one benchmark (plus the overall average), normalized by its per-axis leaderboard best so that `1.0` = best score on that benchmark. Our 27B open-source policy reaches `1.0` on **five of eight axes** (3GPP-TSG, srsRANBench, TeleLogs, TeleTables, Average) and stays at or above `0.95` on every other axis, visibly tracing the outer edge of the radar where no other model, open or closed, matches it on all axes simultaneously.*
 
 
 
24
 
25
 
26
  ---
 
28
 
29
  ## 2 — Toward Universal Telecom Reasoning
30
 
31
+ ### 2.1 — Why telecom needs specialized reasoning models
32
+
33
+ The telecommunications sector does not communicate in a single data language. A practical telecom workflow has to read 3GPP specification clauses written in stilted normative prose, parse RAN logs and PCAPs at the byte level, interpret KPI dashboards as time-series, walk fault trees across multi-vendor subsystems, and close RF/network derivations symbolically. Moreover, many such questions route through specification text, structured telemetry, and physical-layer math in a single chain.
34
+
35
+ Therefore, these tasks demand **complex multi-step reasoning across heterogeneous modalities**, which cannot be reduced to surface retrieval, MCQ classification, or single-axis fact lookup.
36
+
37
+ ### 2.2 — Why existing general-purpose LLMs are not enough
38
+
39
+ Yet until now, general-purpose AI giants have stumbled when confronted with these highly diverse domain-specific data landscapes, despite powerful native reasoning abilities. A strong general reasoner produces well-formed chains operating on wrong telecom facts. RL cannot manufacture knowledge that was never in the model.
40
+
41
+ Therefore, the path forward is to construct dense **telecom-specific domain knowledge** that anchors general reasoning ability onto concrete telecom tasks.
42
+
43
+ ### 2.3 — Why open-source matters compared with closed proprietary models
44
 
45
+ Building a real telecom LLM requires substantial compute, carefully curated multi-modal telecom data, and engineering investment beyond what most academic groups can muster. A handful of operators with the resources to absorb that cost have made attempts (such as AT&T's OTel-LLM-8.3B-QnA and SoftBank's LTM), yet their models remain inaccessible to anyone outside the issuing organization. Most publicly released "telecom AI" stops at narrow extractive baselines (log classifiers, MCQ taggers, RAG retrieval) rather than full-stack reasoning systems.
46
 
47
+ Therefore, the industry needs an **open-source telecom reasoner** that can be:
48
+ - Self-hosted behind an operator's firewall.
49
+ - Run directly on operator-confidential data: RAN logs, PCAP captures, KPI dashboards, customer traffic.
50
+ - Fine-tuned on each operator's proprietary subsystem data.
51
+ - Audited line-by-line for 3GPP / GSMA / O-RAN compliance.
52
+ - Transferred across carriers and equipment vendors without renegotiating an API contract.
53
 
54
+ ### 2.4 What TelecomGPT-R1 improves
55
 
56
+ TelecomGPT-R1 represents a definitive leap forward: a **27B open-weights base** trained to perform **universal reasoning across knowledge QA, 3GPP protocol comprehension, fault/log diagnosis, and RF/network modeling under a single unified policy**. Rather than stitching together specialized heads per task, one model handles the full four-axis surface evaluated by the GSMA Open Telco Leaderboard (producing the leaderboard result reported in §1), while remaining small enough to **self-host, fine-tune, and audit inside an operator environment**.
57
+
58
+ ![Diverse reasoning tasks and data modalities a telecom engineer may encounter in day-to-day work.](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/M6SlDTzpx4W6wvAGE0eTp.png)
59
+ **Figure 2 | The four kinds of reasoning a telecom engineer juggles.** *Each scope shows one axis of telecom work (knowledge QA 15.3%, protocol understanding 22.7%, fault analysis 18.5%, modeling & computation 43.5%) and the share of the 158,915-example TelecomGPT-R1 training corpus that targets it. The cross-axis distribution explains why we train one unified policy rather than four specialists: a real workflow mixes all four in the same session.*
60
 
61
  <!-- A telecom engineer's day cuts across four very different kinds of thinking — and a useful AI has to fluidly switch between them:
62
 
 
69
 
70
  ---
71
 
72
+ ## 3 — How We Built TelecomGPT-R1
 
 
 
 
 
 
 
 
 
 
 
73
 
74
+ The challenges in §2 (heterogeneous modalities, missing telecom domain knowledge in general LLMs, and the scarcity of open vertical reasoners) required an end-to-end recipe rather than a single training trick. TelecomGPT-R1 is built on two design pillars.
75
 
76
+ **A single unified telecom-reasoning corpus, not a stack of per-task datasets.** Telecom concepts do not stay in one format: a scheduling rule can appear as prose in a standard, a row in a configuration table, a constraint in an equation, a pattern in a log, or logic inside code. We curate all five source families into one 158,915-example corpus indexed by reasoning axis and train one policy over the whole space, so that cross-modal reasoning is learned jointly rather than glued together at inference time.
77
 
78
+ **A multi-stage post-training procedure that grounds general reasoning in telecom facts.** Supervised fine-tuning installs the telecom "language" (how to read standards, follow protocol constraints, walk a log, close a derivation) that subsequent reinforcement learning then sharpens. Without this grounding step, RL amplifies *fluent wrong reasoning*: well-formed chains that happen to operate on hallucinated 3GPP clauses, mis-read log features, or unit-dropped derivations. The RL stage targets the three failure modes that naïve outcome-reward training suffers on heterogeneous telecom data (sparse final-answer signal, uneven learning progress across axes, and reward gaming via shortcut answers), with the full algorithmic details described in the accompanying paper.
79
 
80
+ The combined effect is what §1 reports: a single 27B open policy that reaches **89.0% average on the GSMA Open Telco Leaderboard**, leading every open-source, frontier-closed, and operator-internal entrant.
81
 
82
+ ![TelecomGPT-R1 end-to-end recipe](https://cdn-uploads.huggingface.co/production/uploads/6882f57510e86d9f80580702/WkwUMMWWFpS1EAJls6jhO.png)
83
+ **Figure 3 | The simplified end-to-end TelecomGPT-R1 recipe.** *Heterogeneous telecom sources → a fine-grained dataset processing pipeline → one unified, axis-indexed corpus of 158,915 examples → supervised fine-tuning of [Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) → experience-pool-differentiated GRPO, yielding the final TelecomGPT-R1 27B policy.*
 
 
 
 
 
 
 
 
 
 
 
 
84
 
85
  ---
86
 
87
+ ## 4KU/DFI's Open Telecom-AI Program
 
 
 
 
 
 
 
 
 
 
88
 
89
+ *TelecomGPT-R1 is the latest milestone in KU/DFI's open telecom-AI program: a focused effort to build auditable, reproducible, and domain-grounded foundation models for the telecom industry. The program started from telecom-language modeling, expanded into RF perception and network world modeling, and now moves toward standards-grounded reasoning for real telecom workflows.*
90
 
91
+ ### Why KU/DFI
92
 
93
+ KU/DFI is positioned to lead open telecom AI because it combines three assets that are rarely found together: world-class wireless research leadership, a dedicated applied-AI institute, and direct engagement with telecom operators, vendors, and standards ecosystems.
94
 
95
+ The program is led by **Prof. Mérouane Debbah**, a leading figure in modern wireless communications whose work spans 4G small cells, 5G Massive MIMO, 6G intelligent surfaces, semantic communications, distributed AI, and foundation models for networks. This gives the program a critical advantage: KU/DFI is not adapting generic AI to telecom from the outside; it is building telecom AI from inside the discipline.
96
 
97
+ The **Digital Future Institute (DFI)** gives this long-running research trajectory an institutional home. Formally launched in January 2026, DFI was created as Khalifa University's applied AI and ICT institute to turn domain-specific foundation models, benchmarks, validation pilots, and deployable AI systems into real operational infrastructure.
98
 
99
+ In less than six months, that mandate has already become visible: KU/DFI has moved from prior telecom-AI research foundations to a coordinated open program spanning telecom-language modeling, RF understanding, network-world modeling, and standards-grounded reasoning. This speed is the central point: DFI did not start from zero; it concentrated years of wireless-AI expertise into an execution platform for open telecom AI.
100
 
101
+ ### What the program has already built
102
 
103
+ - **[Large Generative AI Models for Telecom](https://ieeexplore.ieee.org/abstract/document/10384630?casa_token=jVKA7rjl-TEAAAAA:3INS4yhKTzcYr6sY3Qm4rIaiFxRXQDsFwvB7H3YK7owbKa91StR9QDpO_HNSNGGPxbTFhMUzdJQ)** [Bariah et al., 2023]. Established the original vision that large generative models could become a foundation for self-evolving wireless networks, instead of remaining task-specific optimization tools.
 
 
104
 
105
+ - **[Understanding Telecom Language Through Large Language Models](https://ieeexplore.ieee.org/abstract/document/10437725?casa_token=D-EWLMAo7EMAAAAA:ELTpS6PTAla3oTbjYdt-D6LE68JiPk7YcAW7SwdeobdVqTRWAgFoEfn614NXotYwAwHpAGcF2fw)** [Bariah et al., 2023]. Demonstrated that LLMs can learn telecom standards language, using 3GPP technical documents as an early test case for telecom-domain adaptation.
 
 
 
 
 
 
106
 
107
+ - **[TelecomGPT](https://ieeexplore.ieee.org/abstract/document/11097898)** [Zou et al., 2025]. Built the first major telecom-specific LLM framework from the group, covering telecom standards, RAN logs, mathematical modeling, code tasks, and domain evaluation.
108
 
109
+ - **[Seeing Radio](https://arxiv.org/abs/2601.13157)** [Zou et al., 2026]. Opened the RF-perception direction by showing that wireless signals can be converted into interpretable visual representations for multimodal AI models.
110
 
111
+ - **[RF-GPT](https://arxiv.org/abs/2602.14833)** [Zou et al., 2026]. Delivered the program's first open RF foundation model, enabling LLM-style reasoning over RF spectrograms and wireless-spectrum scenes.
112
 
113
+ - **[Telecom World Models](https://arxiv.org/abs/2604.06882)** [Zou et al., 2026]. Proposed a world-model architecture that unifies digital twins, foundation models, uncertainty-aware prediction, and action-conditioned planning for 6G networks.
114
 
115
+ - **[RF-Analyzer](https://arxiv.org/abs/2605.04676)** [Bara et al., 2026]. Built an SDR-to-AI evaluation platform to test whether VLMs trained on synthetic RF spectrograms can generalize to real over-the-air wireless environments.
116
 
117
+ - **TelecomGPT-R1** [this work, 2026]. Extends the program from telecom knowledge and RF perception to standards-grounded reasoning, producing an open telecom reasoning model for verifiable decision support.
118
 
119
+ ### The open-program thesis
120
 
121
+ The core logic is simple: telecom AI cannot be led by closed models alone. Operators, vendors, regulators, and standards bodies need systems that can be inspected, benchmarked, reproduced, adapted, and deployed under real telecom constraints.
122
 
123
+ KU/DFI's role is to build that open commons. The program now spans the key layers of the future telecom-AI stack: telecom language, RF perception, network-world modeling, and reasoning. **TelecomGPT-R1 is therefore a starting point, not an endpoint: the beginning of an open, full-stack telecom-AI foundation that the wider industry can audit, improve, and build upon.**
124
 
125
  ---
126
 
127
  ### Resources
128
 
129
+ - **Paper.** [Coming soon!]
130
+ - **Model weights.** [KU-DFI/TelecomGPT-R1](https://huggingface.co/KU-DFI/TelecomGPT-R1/tree/main)
131
  - **Unified benchmark.** [GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard)
132
 
133
  ### Citation
 
140
  booktitle = {[Venue coming soon!]},
141
  year = {2026}
142
  }
143
+
144
  @article{zou2025telecomgpt,
145
  title ={Telecomgpt: A framework to build telecom-specific large language models},
146
  author ={Zou, Hang and Zhao, Qiyang and Tian, Yu and Bariah, Lina and Bader, Faouzi and Lestable, Thierry and Debbah, M\'{e}rouane},
 
148
  year ={2025},
149
  publisher ={IEEE}
150
  }
 
 
 
 
 
 
 
 
151
 
152
  ```
153
 
154
  ### Acknowledgements
155
 
156
  This work was supported by the Digital Future Institute of Khalifa University; the College of Information Science and Electronic Engineering, Zhejiang University; the College of Computer Science and Technology, Zhejiang University; and the Research Computing team of Khalifa University.