Commit ·
2c4e721
1
Parent(s): ef8264e
upload v2
Browse files
README.md
CHANGED
|
@@ -1,27 +1,26 @@
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
-
# TelecomGPT-R1: The Best
|
| 5 |
|
| 6 |
-
> A 27B open model that ranks **#1 on the GSMA Open Telco Leaderboard**
|
| 7 |
|
| 8 |
---
|
| 9 |
|
| 10 |
## 1 — A New State of the Art for Telecom LLMs
|
| 11 |
|
| 12 |
-
**TelecomGPT-R1
|
| 13 |
|
| 14 |
-
|
|
|
|
|
|
|
|
|
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
- Outperforming General Domain Giants – In head-to-head match-ups against GPT-5, our 27B open policy wins 6 out of 7 benchmarks.
|
| 19 |
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-

|
| 24 |
-
**Figure 1 | TelecomGPT-R1 vs frontier closed-source models on the GSMA Open Telco Leaderboard.** *Each spoke is one benchmark (plus the overall average), normalized by its per-axis leaderboard best so that `1.0` = best score on that benchmark. Our 27B open-source policy reaches `1.0` on four of eight axes (3GPP-TSG, TeleLogs, TeleTables, Average) and stays at or above `0.89` on every other axis — visibly tracing the outer edge of the radar where no other model can match it on all axes simultaneously.*
|
| 25 |
|
| 26 |
|
| 27 |
---
|
|
@@ -29,15 +28,35 @@ On the public **[GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA
|
|
| 29 |
|
| 30 |
## 2 — Toward Universal Telecom Reasoning
|
| 31 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 32 |
|
| 33 |
-
|
| 34 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 35 |
|
| 36 |
-
|
| 37 |
|
| 38 |
-
TelecomGPT-R1 represents a definitive leap forward
|
| 39 |
-
|
| 40 |
-
|
|
|
|
| 41 |
|
| 42 |
<!-- A telecom engineer's day cuts across four very different kinds of thinking — and a useful AI has to fluidly switch between them:
|
| 43 |
|
|
@@ -50,104 +69,65 @@ TelecomGPT-R1 represents a definitive leap forward. Built on top of an open-sour
|
|
| 50 |
|
| 51 |
---
|
| 52 |
|
| 53 |
-
## 3 — How We
|
| 54 |
-
|
| 55 |
-
To train an unified model capable of navigating through this diverse data landscape, we had to rethink both data curation and post-training choices. The resulting recipe rests on two foundational design decisions:
|
| 56 |
-
|
| 57 |
-
1) Instead of training separate models or disconnected datasets for standards QA, logs, tables, math, and code, we curate all sources into a single unified telecom reasoning corpus and train one policy over the whole space. This matters because telecom concepts do not stay inside one format. A scheduling rule may appear as prose in a standard, as a row in a configuration table, as a constraint in an equation, as a pattern in logs, or as logic inside code. TelecomGPT-R1 is trained on a 158,915-example unified corpus constructed through an eight-step pipeline. Each example is converted into the same chat format, tagged by reasoning axis and source type, verified with task-specific checks, and prepared for both supervised fine-tuning (SFT) and reinforcement learning (RL).
|
| 58 |
-
|
| 59 |
-
2) We post-train with SFT followed by a **three-pillar RL recipe** that combines Dynamic sAmpling Policy Optimization (DAPO) for stable training of diverse tasks and data types, a difficulty-mined multi-stage curriculum learning, and dense reward signals from self-rubric on highly complex, derivation-heavy tasks.
|
| 60 |
-
|
| 61 |
-

|
| 62 |
-
**Figure 3 | The TelecomGPT-R1 three-stage post-training recipe.** *Stage ① curates heterogeneous telecom sources through an eight-step pipeline into one axis-indexed 158,915-example corpus. Stage ② installs cross-axis long-CoT reasoning on [Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) via LoRA-SFT. Stage ③ is combined of **DAPO** which stabilizes the gradient, a **difficulty-mined curriculum** advances the prompt distribution from easy to hard, and a **self-rubric reward** — rubrics generated by LLM or projected from the expert reference, then scored as a decomposed sum of per-rubric binary indicators — densifies the sparse 0/1 outcome signal, yielding the final TelecomGPT-R1 27B policy.*
|
| 63 |
-
|
| 64 |
-
---
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
-
|
| 69 |
|
| 70 |
-
|
| 71 |
|
| 72 |
-
|
| 73 |
|
| 74 |
-
|
| 75 |
-
|---
|
| 76 |
-
| **S1** — Source-grounded extraction | Modality-specific extractors (AST for code, VLM PDF parsing for textbooks, working-group label projection for specs, row-window slicing for tables, formula masking for math papers, engineering-feature aggregation for raw logs) | Converts heterogeneous telecom data into a common schema. |
|
| 77 |
-
| **S2** — Long-CoT generation | Three trace generators chosen by reasoning type: multiple teacher LLMs (with self-validation) for QA, executable-Python-grounded CoT for derivations, deterministic rule-replay CoT for diagnosis | Right tool for each reasoning type — not one teacher for everything. |
|
| 78 |
-
| **S3** — Multi-pass verification | Axis-matched verifiers (exact match / unit-tolerant numeric closeness / rule-replay accuracy / on-policy re-answering) | Filters incorrect or ungrounded reasoning before training. |
|
| 79 |
-
| **S4** — Augmentation | Variable resampling 5×–20×; prefix/suffix decomposition into intermediate-target + final-target pairs | Expands diversified data coverage while preserving structured reasoning. |
|
| 80 |
-
| **S5** — Leakage prevention | Cross-benchmark dedup vs. all public eval splits; SHA-256-archived test sets | Ensures leaderboard gains reflect learned capability rather than benchmark contamination. |
|
| 81 |
-
| **S6** — Difficulty stratification | Estimate example difficulty using model pass rates and verifier outcomes. | Provides the difficulty signal later used by the RL curriculum. |
|
| 82 |
-
| **S7** — Format unification | One `{system, user, assistant}` chat schema; fixed answer-format vocabulary; `meta.axis` and `meta.source_track` tags on every row | Makes the corpus trainable, searchable, reweightable, and ablatable as one whole. |
|
| 83 |
-
| **S8** — Reasoning-style mixing | Mix in a small amount of general long-reasoning data from student before SFT. | Preserves self-correction and reflective reasoning patterns.|
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-

|
| 87 |
-
**Figure 4 | Composition of the 158,915-example unified telecom training corpus.** *Five source families on the outer ring fold into the four reasoning axes of Figure 2; the middle ring breaks each family into its sub-corpora, and the inner radial bars encode per-corpus row counts on a log scale.*
|
| 88 |
|
| 89 |
---
|
| 90 |
|
| 91 |
-
##
|
| 92 |
-
|
| 93 |
-
We first perform supervised fine-tuning (LoRA adapters on a Qwen3.5 27B). This stage teaches the base model to speak the language of telecom reasoning: how to interpret standards, follow protocol constraints, reason over tables, analyze logs, solve derivations, and produce structured answers. This stage is critical because vertical-domain RL cannot create missing knowledge from nothing. A strong base model may know how to reason in general, but if it lacks the relevant telecom facts and conventions, RL can amplify fluent wrong reasoning.
|
| 94 |
-
|
| 95 |
-
Reinforcement learning is appealing for telecom because many tasks have clear correctness signals: a table question has a right option, a derivation has a right result, and a code problem either satisfies the expected behavior or not. But naïve RL fails quickly in this setting.
|
| 96 |
-
|
| 97 |
-
The first issue is sparse feedback. Telecom problems are structured but unforgiving: one wrong unit, condition, protocol branch, or table row can make the final answer wrong, even if most of the reasoning is useful. A pure final-answer reward turns these cases into zeros and gives the model little guidance about what went right.
|
| 98 |
-
|
| 99 |
-
The second issue is uneven difficulty and learning progress across domains. In unified training, knowledge QA may improve early, while table reasoning, log analysis, math derivation, and code understanding often need much longer. If all domains are trained uniformly, the training will be long and inefficient.
|
| 100 |
-
|
| 101 |
-
The third issue is shortcut learning. On benchmark-style tasks, a model can sometimes guess from answer priors, exploit formatting artifacts, or produce plausible explanations without using the right telecom evidence. For a domain model, this is unacceptable: we want grounded reasoning, not better guessing.
|
| 102 |
|
| 103 |
-
TelecomGPT-R1
|
| 104 |
|
| 105 |
-
|
| 106 |
|
| 107 |
-
|
| 108 |
|
| 109 |
-
|
| 110 |
|
| 111 |
-
|
| 112 |
|
| 113 |
-
|
| 114 |
|
| 115 |
-
|
| 116 |
|
| 117 |
-
---
|
| 118 |
-
|
| 119 |
-
## 6 — Five Things We Learned
|
| 120 |
|
| 121 |
-
|
| 122 |
-
|:---:|---|---|
|
| 123 |
-
| **1** | **Domain knowledge is the biggest bottleneck.** | A strong general reasoner produces well-formed chains operating on **wrong telecom facts**. RL cannot manufacture knowledge that was never in the model. Invest in SFT data curation *first*. |
|
| 124 |
-
| **2** | **Self-rubric reward is what makes the model universal.** | Without rubric-decomposed credit, a 27B base produces zero correct rollouts on derivation-heavy axes for hundreds of training steps, and RL gets no gradient. |
|
| 125 |
-
| **3** | **Verifier rigor matters as much as reward weights.** | A general verifier (e.g. math verifier directly applied on Telecom Math reasoning) silently rewards lucky digit matches and penalizes correct reasoning in the wrong format. Unit normalization, tolerance bands, symbolic equivalence, and code execution were all as important as choosing the reward weights themselves. |
|
| 126 |
-
| **4** | **Difficulty-mined curriculum prevents axis collapse.** | Easy axes (knowledge QA) saturate within hundreds of RL steps; hard axes (math, code, complex logs) keep improving. Without curriculum, easy axes stall the rest. |
|
| 127 |
-
| **5** | **Mixing general-domain CoT preserves reasoning style.** | Student-specific reasoning and self-reflective style words are thin or different in CoTs distilled from teacher models. A small mix helps preserving self-correction and reflective reasoning patterns throughout SFT. |
|
| 128 |
|
| 129 |
-
--
|
| 130 |
|
|
|
|
| 131 |
|
|
|
|
| 132 |
|
| 133 |
-
|
| 134 |
|
| 135 |
-
*
|
| 136 |
|
| 137 |
-
|
| 138 |
|
| 139 |
-
|
| 140 |
|
| 141 |
-
|
| 142 |
|
| 143 |
-
|
| 144 |
|
| 145 |
---
|
| 146 |
|
| 147 |
### Resources
|
| 148 |
|
| 149 |
-
- **Paper.** [
|
| 150 |
-
- **Model weights.** [
|
| 151 |
- **Unified benchmark.** [GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard)
|
| 152 |
|
| 153 |
### Citation
|
|
@@ -160,6 +140,7 @@ Final-answer rewards are sparse. Self-rubric reward makes the signal denser and
|
|
| 160 |
booktitle = {[Venue coming soon!]},
|
| 161 |
year = {2026}
|
| 162 |
}
|
|
|
|
| 163 |
@article{zou2025telecomgpt,
|
| 164 |
title ={Telecomgpt: A framework to build telecom-specific large language models},
|
| 165 |
author ={Zou, Hang and Zhao, Qiyang and Tian, Yu and Bariah, Lina and Bader, Faouzi and Lestable, Thierry and Debbah, M\'{e}rouane},
|
|
@@ -167,19 +148,9 @@ Final-answer rewards are sparse. Self-rubric reward makes the signal denser and
|
|
| 167 |
year ={2025},
|
| 168 |
publisher ={IEEE}
|
| 169 |
}
|
| 170 |
-
@article{zou2026rfgpt,
|
| 171 |
-
title = {RF-GPT: Teaching AI to See the Wireless World},
|
| 172 |
-
author = {Zou, Hang and Tian, Yu and Wang, Bohao and Bariah, Lina
|
| 173 |
-
and Lasaulce, Samson and Huang, Chongwen and Debbah, M\'{e}rouane},
|
| 174 |
-
journal = {arXiv preprint arXiv:2602.14833},
|
| 175 |
-
year = {2026},
|
| 176 |
-
url = {https://arxiv.org/abs/2602.14833}
|
| 177 |
-
}
|
| 178 |
|
| 179 |
```
|
| 180 |
|
| 181 |
### Acknowledgements
|
| 182 |
|
| 183 |
This work was supported by the Digital Future Institute of Khalifa University; the College of Information Science and Electronic Engineering, Zhejiang University; the College of Computer Science and Technology, Zhejiang University; and the Research Computing team of Khalifa University.
|
| 184 |
-
|
| 185 |
-
|
|
|
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
---
|
| 4 |
+
# TelecomGPT-R1: The Best Telecom-Specific Large Language Model
|
| 5 |
|
| 6 |
+
> A 27B open model that ranks **#1 on the GSMA Open Telco Leaderboard** across **all 86 evaluated models** (open or closed, general-purpose or operator-specialized), with an average score of **89.0%**, ahead of every other model on the board.
|
| 7 |
|
| 8 |
---
|
| 9 |
|
| 10 |
## 1 — A New State of the Art for Telecom LLMs
|
| 11 |
|
| 12 |
+
**TelecomGPT-R1 (27B) ranks #1 on the [GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard) at 89.0% average, leading every open-source and closed-source entrant across both general-purpose and operator-specialized categories.** The leaderboard aggregates 7 benchmarks spanning 4 evaluation axes (telecom knowledge QA, 3GPP protocol comprehension, fault and log diagnosis, and RF/network modeling), as reported in Figure 1.
|
| 13 |
|
| 14 |
+
- **Among open-source models**, TelecomGPT-R1 leads DeepSeek-V3-0324 (685B) by **+29.7**, LLaMA-3.3-70B by **+34.3**, and Qwen2.5-72B by **+35.0**, while operating at roughly **25× fewer active parameters than the next-best open entrant**.
|
| 15 |
+
- **Among closed-source models**, TelecomGPT-R1 leads both the general-purpose frontier tier and the operator-specialized tier, as detailed in the two bullets below.
|
| 16 |
+
- **Among general-purpose frontier models**, TelecomGPT-R1 leads Gemini-3.1-Pro by **+13.4**, Claude-Opus-4.6 by **+15.7**, and GPT-5 by **+17.1**. These systems sit at the **trillion-parameter-class frontier** (active-parameter counts are not publicly disclosed but are widely reported as orders of magnitude larger than 27B), making the margin a parameter-efficiency result as much as an accuracy result.
|
| 17 |
+
- **Among operator-specialized telecom models**, TelecomGPT-R1 leads AT&T OTel-LLM-8.3B-QnA by **+3.0** (and OTel-LLM is narrow-task trained) and SoftBank LTM by **+15.4** — the **first model, open or closed, to outscore an operator-internal telecom baseline** on the GSMA Open Telco Leaderboard.
|
| 18 |
|
| 19 |
+
**In one line: a 27B open specialist beats both trillion-parameter-class generalists and operator-locked verticals on the same public benchmark suite.**
|
| 20 |
|
|
|
|
| 21 |
|
| 22 |
+

|
| 23 |
+
**Figure 1 | TelecomGPT-R1 vs frontier closed-source models on the GSMA Open Telco Leaderboard.** *Each spoke is one benchmark (plus the overall average), normalized by its per-axis leaderboard best so that `1.0` = best score on that benchmark. Our 27B open-source policy reaches `1.0` on **five of eight axes** (3GPP-TSG, srsRANBench, TeleLogs, TeleTables, Average) and stays at or above `0.95` on every other axis, visibly tracing the outer edge of the radar where no other model, open or closed, matches it on all axes simultaneously.*
|
|
|
|
|
|
|
|
|
|
| 24 |
|
| 25 |
|
| 26 |
---
|
|
|
|
| 28 |
|
| 29 |
## 2 — Toward Universal Telecom Reasoning
|
| 30 |
|
| 31 |
+
### 2.1 — Why telecom needs specialized reasoning models
|
| 32 |
+
|
| 33 |
+
The telecommunications sector does not communicate in a single data language. A practical telecom workflow has to read 3GPP specification clauses written in stilted normative prose, parse RAN logs and PCAPs at the byte level, interpret KPI dashboards as time-series, walk fault trees across multi-vendor subsystems, and close RF/network derivations symbolically. Moreover, many such questions route through specification text, structured telemetry, and physical-layer math in a single chain.
|
| 34 |
+
|
| 35 |
+
Therefore, these tasks demand **complex multi-step reasoning across heterogeneous modalities**, which cannot be reduced to surface retrieval, MCQ classification, or single-axis fact lookup.
|
| 36 |
+
|
| 37 |
+
### 2.2 — Why existing general-purpose LLMs are not enough
|
| 38 |
+
|
| 39 |
+
Yet until now, general-purpose AI giants have stumbled when confronted with these highly diverse domain-specific data landscapes, despite powerful native reasoning abilities. A strong general reasoner produces well-formed chains operating on wrong telecom facts. RL cannot manufacture knowledge that was never in the model.
|
| 40 |
+
|
| 41 |
+
Therefore, the path forward is to construct dense **telecom-specific domain knowledge** that anchors general reasoning ability onto concrete telecom tasks.
|
| 42 |
+
|
| 43 |
+
### 2.3 — Why open-source matters compared with closed proprietary models
|
| 44 |
|
| 45 |
+
Building a real telecom LLM requires substantial compute, carefully curated multi-modal telecom data, and engineering investment beyond what most academic groups can muster. A handful of operators with the resources to absorb that cost have made attempts (such as AT&T's OTel-LLM-8.3B-QnA and SoftBank's LTM), yet their models remain inaccessible to anyone outside the issuing organization. Most publicly released "telecom AI" stops at narrow extractive baselines (log classifiers, MCQ taggers, RAG retrieval) rather than full-stack reasoning systems.
|
| 46 |
|
| 47 |
+
Therefore, the industry needs an **open-source telecom reasoner** that can be:
|
| 48 |
+
- Self-hosted behind an operator's firewall.
|
| 49 |
+
- Run directly on operator-confidential data: RAN logs, PCAP captures, KPI dashboards, customer traffic.
|
| 50 |
+
- Fine-tuned on each operator's proprietary subsystem data.
|
| 51 |
+
- Audited line-by-line for 3GPP / GSMA / O-RAN compliance.
|
| 52 |
+
- Transferred across carriers and equipment vendors without renegotiating an API contract.
|
| 53 |
|
| 54 |
+
### 2.4 — What TelecomGPT-R1 improves
|
| 55 |
|
| 56 |
+
TelecomGPT-R1 represents a definitive leap forward: a **27B open-weights base** trained to perform **universal reasoning across knowledge QA, 3GPP protocol comprehension, fault/log diagnosis, and RF/network modeling under a single unified policy**. Rather than stitching together specialized heads per task, one model handles the full four-axis surface evaluated by the GSMA Open Telco Leaderboard (producing the leaderboard result reported in §1), while remaining small enough to **self-host, fine-tune, and audit inside an operator environment**.
|
| 57 |
+
|
| 58 |
+

|
| 59 |
+
**Figure 2 | The four kinds of reasoning a telecom engineer juggles.** *Each scope shows one axis of telecom work (knowledge QA 15.3%, protocol understanding 22.7%, fault analysis 18.5%, modeling & computation 43.5%) and the share of the 158,915-example TelecomGPT-R1 training corpus that targets it. The cross-axis distribution explains why we train one unified policy rather than four specialists: a real workflow mixes all four in the same session.*
|
| 60 |
|
| 61 |
<!-- A telecom engineer's day cuts across four very different kinds of thinking — and a useful AI has to fluidly switch between them:
|
| 62 |
|
|
|
|
| 69 |
|
| 70 |
---
|
| 71 |
|
| 72 |
+
## 3 — How We Built TelecomGPT-R1
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 73 |
|
| 74 |
+
The challenges in §2 (heterogeneous modalities, missing telecom domain knowledge in general LLMs, and the scarcity of open vertical reasoners) required an end-to-end recipe rather than a single training trick. TelecomGPT-R1 is built on two design pillars.
|
| 75 |
|
| 76 |
+
**A single unified telecom-reasoning corpus, not a stack of per-task datasets.** Telecom concepts do not stay in one format: a scheduling rule can appear as prose in a standard, a row in a configuration table, a constraint in an equation, a pattern in a log, or logic inside code. We curate all five source families into one 158,915-example corpus indexed by reasoning axis and train one policy over the whole space, so that cross-modal reasoning is learned jointly rather than glued together at inference time.
|
| 77 |
|
| 78 |
+
**A multi-stage post-training procedure that grounds general reasoning in telecom facts.** Supervised fine-tuning installs the telecom "language" (how to read standards, follow protocol constraints, walk a log, close a derivation) that subsequent reinforcement learning then sharpens. Without this grounding step, RL amplifies *fluent wrong reasoning*: well-formed chains that happen to operate on hallucinated 3GPP clauses, mis-read log features, or unit-dropped derivations. The RL stage targets the three failure modes that naïve outcome-reward training suffers on heterogeneous telecom data (sparse final-answer signal, uneven learning progress across axes, and reward gaming via shortcut answers), with the full algorithmic details described in the accompanying paper.
|
| 79 |
|
| 80 |
+
The combined effect is what §1 reports: a single 27B open policy that reaches **89.0% average on the GSMA Open Telco Leaderboard**, leading every open-source, frontier-closed, and operator-internal entrant.
|
| 81 |
|
| 82 |
+

|
| 83 |
+
**Figure 3 | The simplified end-to-end TelecomGPT-R1 recipe.** *Heterogeneous telecom sources → a fine-grained dataset processing pipeline → one unified, axis-indexed corpus of 158,915 examples → supervised fine-tuning of [Qwen3.5-27B](https://huggingface.co/Qwen/Qwen3.5-27B) → experience-pool-differentiated GRPO, yielding the final TelecomGPT-R1 27B policy.*
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 84 |
|
| 85 |
---
|
| 86 |
|
| 87 |
+
## 4 — KU/DFI's Open Telecom-AI Program
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 88 |
|
| 89 |
+
*TelecomGPT-R1 is the latest milestone in KU/DFI's open telecom-AI program: a focused effort to build auditable, reproducible, and domain-grounded foundation models for the telecom industry. The program started from telecom-language modeling, expanded into RF perception and network world modeling, and now moves toward standards-grounded reasoning for real telecom workflows.*
|
| 90 |
|
| 91 |
+
### Why KU/DFI
|
| 92 |
|
| 93 |
+
KU/DFI is positioned to lead open telecom AI because it combines three assets that are rarely found together: world-class wireless research leadership, a dedicated applied-AI institute, and direct engagement with telecom operators, vendors, and standards ecosystems.
|
| 94 |
|
| 95 |
+
The program is led by **Prof. Mérouane Debbah**, a leading figure in modern wireless communications whose work spans 4G small cells, 5G Massive MIMO, 6G intelligent surfaces, semantic communications, distributed AI, and foundation models for networks. This gives the program a critical advantage: KU/DFI is not adapting generic AI to telecom from the outside; it is building telecom AI from inside the discipline.
|
| 96 |
|
| 97 |
+
The **Digital Future Institute (DFI)** gives this long-running research trajectory an institutional home. Formally launched in January 2026, DFI was created as Khalifa University's applied AI and ICT institute to turn domain-specific foundation models, benchmarks, validation pilots, and deployable AI systems into real operational infrastructure.
|
| 98 |
|
| 99 |
+
In less than six months, that mandate has already become visible: KU/DFI has moved from prior telecom-AI research foundations to a coordinated open program spanning telecom-language modeling, RF understanding, network-world modeling, and standards-grounded reasoning. This speed is the central point: DFI did not start from zero; it concentrated years of wireless-AI expertise into an execution platform for open telecom AI.
|
| 100 |
|
| 101 |
+
### What the program has already built
|
| 102 |
|
| 103 |
+
- **[Large Generative AI Models for Telecom](https://ieeexplore.ieee.org/abstract/document/10384630?casa_token=jVKA7rjl-TEAAAAA:3INS4yhKTzcYr6sY3Qm4rIaiFxRXQDsFwvB7H3YK7owbKa91StR9QDpO_HNSNGGPxbTFhMUzdJQ)** [Bariah et al., 2023]. Established the original vision that large generative models could become a foundation for self-evolving wireless networks, instead of remaining task-specific optimization tools.
|
|
|
|
|
|
|
| 104 |
|
| 105 |
+
- **[Understanding Telecom Language Through Large Language Models](https://ieeexplore.ieee.org/abstract/document/10437725?casa_token=D-EWLMAo7EMAAAAA:ELTpS6PTAla3oTbjYdt-D6LE68JiPk7YcAW7SwdeobdVqTRWAgFoEfn614NXotYwAwHpAGcF2fw)** [Bariah et al., 2023]. Demonstrated that LLMs can learn telecom standards language, using 3GPP technical documents as an early test case for telecom-domain adaptation.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 106 |
|
| 107 |
+
- **[TelecomGPT](https://ieeexplore.ieee.org/abstract/document/11097898)** [Zou et al., 2025]. Built the first major telecom-specific LLM framework from the group, covering telecom standards, RAN logs, mathematical modeling, code tasks, and domain evaluation.
|
| 108 |
|
| 109 |
+
- **[Seeing Radio](https://arxiv.org/abs/2601.13157)** [Zou et al., 2026]. Opened the RF-perception direction by showing that wireless signals can be converted into interpretable visual representations for multimodal AI models.
|
| 110 |
|
| 111 |
+
- **[RF-GPT](https://arxiv.org/abs/2602.14833)** [Zou et al., 2026]. Delivered the program's first open RF foundation model, enabling LLM-style reasoning over RF spectrograms and wireless-spectrum scenes.
|
| 112 |
|
| 113 |
+
- **[Telecom World Models](https://arxiv.org/abs/2604.06882)** [Zou et al., 2026]. Proposed a world-model architecture that unifies digital twins, foundation models, uncertainty-aware prediction, and action-conditioned planning for 6G networks.
|
| 114 |
|
| 115 |
+
- **[RF-Analyzer](https://arxiv.org/abs/2605.04676)** [Bara et al., 2026]. Built an SDR-to-AI evaluation platform to test whether VLMs trained on synthetic RF spectrograms can generalize to real over-the-air wireless environments.
|
| 116 |
|
| 117 |
+
- **TelecomGPT-R1** [this work, 2026]. Extends the program from telecom knowledge and RF perception to standards-grounded reasoning, producing an open telecom reasoning model for verifiable decision support.
|
| 118 |
|
| 119 |
+
### The open-program thesis
|
| 120 |
|
| 121 |
+
The core logic is simple: telecom AI cannot be led by closed models alone. Operators, vendors, regulators, and standards bodies need systems that can be inspected, benchmarked, reproduced, adapted, and deployed under real telecom constraints.
|
| 122 |
|
| 123 |
+
KU/DFI's role is to build that open commons. The program now spans the key layers of the future telecom-AI stack: telecom language, RF perception, network-world modeling, and reasoning. **TelecomGPT-R1 is therefore a starting point, not an endpoint: the beginning of an open, full-stack telecom-AI foundation that the wider industry can audit, improve, and build upon.**
|
| 124 |
|
| 125 |
---
|
| 126 |
|
| 127 |
### Resources
|
| 128 |
|
| 129 |
+
- **Paper.** [Coming soon!]
|
| 130 |
+
- **Model weights.** [KU-DFI/TelecomGPT-R1](https://huggingface.co/KU-DFI/TelecomGPT-R1/tree/main)
|
| 131 |
- **Unified benchmark.** [GSMA Open Telco Leaderboard](https://huggingface.co/spaces/GSMA/open-telco-leaderboard)
|
| 132 |
|
| 133 |
### Citation
|
|
|
|
| 140 |
booktitle = {[Venue coming soon!]},
|
| 141 |
year = {2026}
|
| 142 |
}
|
| 143 |
+
|
| 144 |
@article{zou2025telecomgpt,
|
| 145 |
title ={Telecomgpt: A framework to build telecom-specific large language models},
|
| 146 |
author ={Zou, Hang and Zhao, Qiyang and Tian, Yu and Bariah, Lina and Bader, Faouzi and Lestable, Thierry and Debbah, M\'{e}rouane},
|
|
|
|
| 148 |
year ={2025},
|
| 149 |
publisher ={IEEE}
|
| 150 |
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 151 |
|
| 152 |
```
|
| 153 |
|
| 154 |
### Acknowledgements
|
| 155 |
|
| 156 |
This work was supported by the Digital Future Institute of Khalifa University; the College of Information Science and Electronic Engineering, Zhejiang University; the College of Computer Science and Technology, Zhejiang University; and the Research Computing team of Khalifa University.
|
|
|
|
|
|