---
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---
<img src="ChatGCLM.png">
|
|
# GCLM — Global Convolutional Language Model


## Model Summary


**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.


Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling


This design explores whether **global receptive fields** can be achieved efficiently *without* quadratic attention, while remaining compatible with standard autoregressive language modeling.
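The FFT-based global convolution can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation (`causal_fft_conv` and its argument shapes are hypothetical): zero-padding the FFT to twice the sequence length turns the circular convolution into a linear one, so position *t* only sees inputs at or before *t*, at O(L log L) cost instead of the O(L²) of attention.

```python
import torch

def causal_fft_conv(x, k):
    """Causal global convolution via FFT (illustrative sketch).

    x: (batch, seq_len, dim) input sequence
    k: (seq_len, dim) per-channel filter spanning the full sequence

    Padding the transform length to 2*L avoids circular wrap-around,
    which is what makes the convolution causal: output t depends only
    on inputs at positions <= t.
    """
    L = x.shape[1]
    n = 2 * L
    X = torch.fft.rfft(x, n=n, dim=1)           # (B, n//2+1, D)
    K = torch.fft.rfft(k, n=n, dim=0)           # (n//2+1, D)
    y = torch.fft.irfft(X * K.unsqueeze(0), n=n, dim=1)
    return y[:, :L]                             # keep the causal part
```

Numerically this matches the direct causal convolution `y[t] = sum_{s<=t} k[s] * x[t-s]`, while scaling as O(L log L).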
|
|
> GCLM is a transformer alternative — not a transformer replacement.


---
|
|
## Architecture Overview


- Token + learned positional embeddings
- Stacked convolutional blocks:
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head


**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested at sequence lengths of 8k tokens and beyond)
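A single convolutional block of the kind described above can be sketched as follows. Layer names, kernel size, and MLP ratio are illustrative assumptions, not the trained model's actual configuration; the point is the structure: a causal local depthwise + pointwise convolution and a feedforward MLP, each wrapped in residual + LayerNorm, with no attention and no KV cache.

```python
import torch
import torch.nn as nn

class GCLMBlock(nn.Module):
    """Illustrative block: causal local depthwise conv, pointwise
    mixing, and an MLP, each with a pre-LayerNorm residual path.
    """
    def __init__(self, dim, kernel_size=4, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Depthwise conv (groups=dim); left-padding keeps it causal.
        self.local = nn.Conv1d(dim, dim, kernel_size, groups=dim)
        self.point = nn.Conv1d(dim, dim, 1)
        self.pad = kernel_size - 1
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim),
            nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim),
        )

    def forward(self, x):                          # x: (B, L, D)
        h = self.norm1(x).transpose(1, 2)          # (B, D, L)
        h = nn.functional.pad(h, (self.pad, 0))    # causal left pad
        h = self.point(torch.relu(self.local(h)))  # local + pointwise
        x = x + h.transpose(1, 2)                  # residual
        return x + self.mlp(self.norm2(x))         # per-position MLP
```

Because every operation is either per-position (LayerNorm, MLP) or a left-padded convolution, the block is causal by construction: changing a token can never affect outputs at earlier positions.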
|
|
---


## Training Data


The model was trained on:
- **Skylion007/openwebtext**


This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.
|
|
---


## Intended Use


**Primary use cases:**
- Research into transformer alternatives
- Long-context modeling experiments
- Architectural ablation studies
- Educational exploration of non-attention sequence models


**Not intended for:**
- Safety-critical applications
- Medical, legal, or financial advice
- Deployment as a production chatbot without additional alignment work
|
|
---


## Limitations


- This model is **research-grade**, not instruction-tuned
- Outputs may be:
  - Incoherent
  - Factually incorrect
  - Biased or unsafe
- Performance characteristics differ significantly from transformer LMs
- No reinforcement learning or alignment tuning applied
|
|
---


## Ethical Considerations


GCLM was trained on publicly available web data and may reflect societal biases present in that data.


Users are responsible for:
- Applying appropriate filtering
- Avoiding harmful or misleading use cases
- Evaluating outputs critically
|
|
---


## License


This model is released under the **Apache License 2.0**.


You are free to:
- Use
- Modify
- Distribute
- Use commercially


Attribution and license preservation are required.
Patent rights are explicitly granted under this license.
|
|
---


## Citation


If you use GCLM in your research, please cite or reference the project.
|
|
|
|
## Important


Model weights will be uploaded to this repository once training is complete.