---
license: apache-2.0
datasets:
- Skylion007/openwebtext
language:
- en
pipeline_tag: text-generation
tags:
- research
- convolutional
- fft
- transformer-alternative
- causal-lm
---
# GCLM: Global Convolutional Language Model

## Model Summary

**GCLM (Global Convolutional Language Model)** is an experimental causal language model that replaces traditional self-attention with a hybrid **local + global convolutional architecture**.

Instead of attention heads, GCLM uses:
- **Local depthwise convolutions** for short-range context
- **FFT-based global convolutions** for long-range sequence modeling

This design explores whether **global receptive fields** can be achieved efficiently, *without* the quadratic cost of self-attention, while remaining compatible with standard autoregressive language modeling.
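
To make the long-range path concrete, below is a minimal sketch of how a causal global convolution can be computed with the FFT in O(L log L) time. This is illustrative PyTorch, not GCLM's actual code; the function name, tensor shapes, and per-channel kernel layout are assumptions.

```python
import torch

def causal_fft_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal global convolution of x (B, L, D) with a per-channel
    kernel k (L, D), computed in O(L log L) via the FFT.
    Illustrative sketch, not GCLM's actual implementation."""
    B, L, D = x.shape
    n = 2 * L  # zero-pad so the circular convolution becomes linear
    X = torch.fft.rfft(x, n=n, dim=1)   # (B, n//2 + 1, D)
    K = torch.fft.rfft(k, n=n, dim=0)   # (n//2 + 1, D)
    y = torch.fft.irfft(X * K.unsqueeze(0), n=n, dim=1)
    return y[:, :L]  # position t only mixes inputs at positions <= t
```

Because the kernel can span the whole sequence, every output position sees the entire prefix, which is how a global receptive field is obtained without attention.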
> GCLM is a transformer alternative, not a transformer replacement.

---
## Architecture Overview

- Token + learned positional embeddings
- Stacked convolutional blocks (sketched below):
  - Local depthwise + pointwise convolution
  - Optional global FFT convolution every *N* layers
  - Feedforward MLP
- Residual connections + LayerNorm
- Causal language modeling head
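
As a rough illustration of how these pieces might compose, here is one block under an assumed pre-norm residual layout. Layer names and hyperparameters are hypothetical, and `causal_fft_conv` is the helper sketched in the Model Summary above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCLMBlock(nn.Module):
    """One hypothetical block: causal local conv, optional FFT global
    conv, then an MLP, each behind a pre-norm residual connection."""

    def __init__(self, d_model: int, local_kernel: int = 7,
                 use_global: bool = False, max_len: int = 8192):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        # Depthwise conv over time; left-only padding keeps it causal.
        self.local = nn.Conv1d(d_model, d_model, local_kernel,
                               groups=d_model)
        self.point = nn.Conv1d(d_model, d_model, 1)
        self.local_kernel = local_kernel
        self.use_global = use_global
        if use_global:
            # Learned per-channel kernel for the FFT convolution path.
            self.global_kernel = nn.Parameter(
                0.02 * torch.randn(max_len, d_model))
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # (B, L, D)
        h = self.norm1(x).transpose(1, 2)                 # (B, D, L)
        h = F.pad(h, (self.local_kernel - 1, 0))          # causal pad
        h = self.point(self.local(h)).transpose(1, 2)     # (B, L, D)
        if self.use_global:
            # causal_fft_conv is the FFT helper sketched earlier.
            h = h + causal_fft_conv(h, self.global_kernel[: x.shape[1]])
        x = x + h
        return x + self.mlp(self.norm2(x))
```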
**Key properties:**
- No attention mechanism
- No KV cache
- Linear memory scaling with sequence length
- Long-context friendly (tested at sequence lengths of 8k+ tokens)
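
One practical consequence of having no KV cache: autoregressive decoding re-runs the full prefix at every step instead of reading cached per-token state. A minimal greedy-decoding sketch, assuming the model maps `(batch, length)` token ids to `(batch, length, vocab)` logits:

```python
import torch

@torch.no_grad()
def generate(model, ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    """Greedy decoding without a KV cache: each step feeds the whole
    prefix back through the convolutional stack. `model` is assumed
    to return logits of shape (B, L, vocab)."""
    for _ in range(max_new_tokens):
        logits = model(ids)                                   # (B, L, V)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # (B, 1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids
```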
---

## Training Data

The model was trained on:
- **Skylion007/openwebtext**

This dataset contains raw, unfiltered internet text and may include biased, incorrect, or unsafe content.

---
|
| | ## Intended Use |
| |
|
| | **Primary use cases:** |
| | - Research into transformer alternatives |
| | - Long-context modeling experiments |
| | - Architectural ablation studies |
| | - Educational exploration of non-attention sequence models |
| |
|
| | **Not intended for:** |
| | - Safety-critical applications |
| | - Medical, legal, or financial advice |
| | - Deployment as a production chatbot without additional alignment work |
| |
|
| | --- |
| |
|
| | ## Limitations |
| |
|
| | - This model is **research-grade**, not instruction-tuned |
| | - Outputs may be: |
| | - Incoherent |
| | - Factually incorrect |
| | - Biased or unsafe |
| | - Performance characteristics differ significantly from transformer LMs |
| | - No reinforcement learning or alignment tuning applied |
| |
|
| | --- |
| |
|
| | ## Ethical Considerations |
| |
|
| | GCLM was trained on publicly available web data and may reflect societal biases present in that data. |
| |
|
| | Users are responsible for: |
| | - Applying appropriate filtering |
| | - Avoiding harmful or misleading use cases |
| | - Evaluating outputs critically |
| |
|
| | --- |
| |
|
| | ## License |
| |
|
| | This model is released under the **Apache License 2.0**. |
| |
|
| | You are free to: |
| | - Use |
| | - Modify |
| | - Distribute |
| | - Use commercially |
| |
|
| | Attribution and license preservation are required. |
| | Patent rights are explicitly granted under this license. |
| |
|
| | --- |
| |
## Citation

If you use GCLM in your research, please cite or reference the project.

---

## Important

Model weights will not be uploaded to this repository until training is complete.