---
license: apache-2.0
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---
# GCLM: Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently, *without* the quadratic cost of self-attention, while remaining compatible with standard autoregressive language modeling.
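
To make the long-range path concrete, below is a minimal sketch of how a causal global convolution can be computed with the FFT in O(L log L) time. This is illustrative PyTorch, not GCLM's actual code; the function name, tensor shapes, and per-channel kernel layout are assumptions.

```python
import torch

def causal_fft_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal global convolution of x (B, L, D) with a per-channel
    kernel k (L, D), computed in O(L log L) via the FFT.
    Illustrative sketch, not GCLM's actual implementation."""
    B, L, D = x.shape
    n = 2 * L  # zero-pad so the circular convolution becomes linear
    X = torch.fft.rfft(x, n=n, dim=1)   # (B, n//2 + 1, D)
    K = torch.fft.rfft(k, n=n, dim=0)   # (n//2 + 1, D)
    y = torch.fft.irfft(X * K.unsqueeze(0), n=n, dim=1)
    return y[:, :L]  # position t only mixes inputs at positions <= t
```

Because the kernel can span the whole sequence, every output position sees the entire prefix, which is how a global receptive field is obtained without attention.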
> GCLM is a transformer alternative, not a transformer replacement.

---
## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks (sketched below):
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head
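
As a rough illustration of how these pieces might compose, here is one block under an assumed pre-norm residual layout. Layer names and hyperparameters are hypothetical, and `causal_fft_conv` is the helper sketched in the Model Summary above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCLMBlock(nn.Module):
    """One hypothetical block: causal local conv, optional FFT global
    conv, then an MLP, each behind a pre-norm residual connection."""

    def __init__(self, d_model: int, local_kernel: int = 7,
                 use_global: bool = False, max_len: int = 8192):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Depthwise conv over time; left-only padding keeps it causal.
        self.local = nn.Conv1d(d_model, d_model, local_kernel,
                               groups=d_model)
        self.point = nn.Conv1d(d_model, d_model, 1)
        self.local_kernel = local_kernel
        self.use_global = use_global
        if use_global:
            # Learned per-channel kernel for the FFT convolution path.
            self.global_kernel = nn.Parameter(
                0.02 * torch.randn(max_len, d_model))
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, L, D)
        h = self.norm1(x).transpose(1, 2)                 # (B, D, L)
        h = F.pad(h, (self.local_kernel - 1, 0))          # causal pad
        h = self.point(self.local(h)).transpose(1, 2)     # (B, L, D)
        if self.use_global:
            # causal_fft_conv is the FFT helper sketched earlier.
            h = h + causal_fft_conv(h, self.global_kernel[: x.shape[1]])
        x = x + h
        return x + self.mlp(self.norm2(x))
```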
**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested at sequence lengths of 8k+ tokens)
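
One practical consequence of having no KV cache: autoregressive decoding re-runs the full prefix at every step instead of reading cached per-token state. A minimal greedy-decoding sketch, assuming the model maps `(batch, length)` token ids to `(batch, length, vocab)` logits:

```python
import torch

@torch.no_grad()
def generate(model, ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    """Greedy decoding without a KV cache: each step feeds the whole
    prefix back through the convolutional stack. `model` is assumed
    to return logits of shape (B, L, vocab)."""
    for _ in range(max_new_tokens):
        logits = model(ids)                                   # (B, L, V)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # (B, 1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```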
---

## Training Data

The model was trained on:
- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

---
|
| | ## Intended Use |
| |
|
| | **Primary use cases:** |
| | - Research into transformer alternatives |
| | - Long-context modeling experiments |
| | - Architectural ablation studies |
| | - Educational exploration of non-attention sequence models |
| |
|
| | **Not intended for:** |
| | - Safety-critical applications |
| | - Medical, legal, or financial advice |
| | - Deployment as a production chatbot without additional alignment work |
| |
|
| | --- |
| |
|
| | ## Limitations |
| |
|
| | - This model is **research-grade**, not instruction-tuned |
| | - Outputs may be: |
| | - Incoherent |
| | - Factually incorrect |
| | - Biased or unsafe |
| | - Performance characteristics differ significantly from transformer LMs |
| | - No reinforcement learning or alignment tuning applied |
| |
|
| | --- |
| |
|
| | ## Ethical Considerations |
| |
|
| | GCLM was trained on publicly available web data and may reflect societal biases present in that data. |
| |
|
| | Users are responsible for: |
| | - Applying appropriate filtering |
| | - Avoiding harmful or misleading use cases |
| | - Evaluating outputs critically |
| |
|
| | --- |
| |
|
| | ## License |
| |
|
| | This model is released under the **Apache License 2.0**. |
| |
|
| | You are free to: |
| | - Use |
| | - Modify |
| | - Distribute |
| | - Use commercially |
| |
|
| | Attribution and license preservation are required. |
| | Patent rights are explicitly granted under this license. |
| |
|
| | --- |
| |
## Citation

If you use GCLM in your research, please cite or reference the project.

---

## Important

Model weights will not be uploaded to this repository until training is complete.