---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- transformers
- llama
- long-context
- 256k-context
- reasoning
- instruction-following
- causal-lm
- text-generation-inference
- gqa
- rope-scaling
- bfloat16
- safetensors
- withinusai
- Aspire_1.1B
datasets:
- open-thoughts/OpenThoughts-114k
- WizardLMTeam/WizardLM_evol_instruct_70k
---
🌌 Aspire_1.1B
Long-Context Frontier Language Model
“Built to think across distance.”
🌌 Overview
Aspire_1.1B is a 1.1-billion-parameter frontier language model engineered for extreme long-context reasoning, instruction following, and efficient, scalable inference.
Built for workflows that need long-lived context, Aspire_1.1B supports a native 256K (262,144-token) context window while keeping reasoning coherent and memory use efficient through:
* Grouped Query Attention (GQA)
* dynamically scaled RoPE embeddings
* optimized transformer routing
* TPU-native bfloat16 training
Unlike conventional small-scale models constrained by short context windows, Aspire_1.1B is designed for:
* long-form reasoning
* extended conversational continuity
* large document understanding
* retrieval-heavy workflows
* persistent agent memory systems
* scalable frontier experimentation
The architecture balances:
* efficiency
* reasoning capability
* long-context retention
* deployment practicality
⚡ Model Highlights
| Attribute | Value |
| --- | --- |
| Parameters | ~1.12B |
| Architecture | Llama-based Causal LM |
| Context Window | 262,144 Tokens (256K) |
| Precision | bfloat16 |
| Hidden Size | 2048 |
| Layers | 22 |
| Attention Heads | 16 |
| KV Heads | 4 (GQA) |
| Vocabulary | 32K Custom BPE |
| Optimization | Adafactor |
| Training Hardware | Google Cloud TPUs |
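As a rough sanity check, the figures in the table above can be combined into a parameter count. The sketch below assumes a standard Llama layer layout with no biases and untied input/output embeddings; the MLP intermediate size (5632) and the exact vocabulary size (32,000) are not stated in this card and are assumptions chosen for illustration.

```python
def llama_param_count(hidden, layers, heads, kv_heads, vocab,
                      intermediate, tied_embeddings=False):
    """Count the weights of a Llama-style decoder (no bias terms)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * hidden * intermediate                   # gate, up, and down projections
    norms = 2 * hidden                                # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    return layers * per_layer + embed + lm_head + hidden  # + final RMSNorm

# hidden, layers, heads, kv_heads come from the table above.
# vocab=32_000 and intermediate=5632 are ASSUMPTIONS, not values from this card.
total = llama_param_count(hidden=2048, layers=22, heads=16, kv_heads=4,
                          vocab=32_000, intermediate=5632)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the count lands at roughly 1.12B, consistent with the Parameters row. A different intermediate size or vocabulary size would shift the total accordingly.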
🧠 Architecture
Aspire_1.1B is built around a highly optimized transformer stack designed for efficient long-context scaling.
Core architectural features include:
* Grouped Query Attention (GQA)
* high-base Rotary Positional Embeddings (RoPE)
* TPU-optimized training pathways
* efficient KV-cache scaling
* long-sequence extrapolation support
The architecture is optimized for:
* inference efficiency
* stable long-context attention
* reduced memory overhead
* scalable deployment workflows
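To make the "efficient KV-cache scaling" point concrete, here is a minimal sketch of how the cache grows with sequence length and how the number of KV heads changes it. It assumes a head dimension of 128 (hidden size 2048 divided by 16 attention heads, the usual Llama convention) and a bfloat16 cache; neither is confirmed by this card.

```python
def kv_cache_bytes(seq_len, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of the key/value cache for one sequence."""
    # Two tensors (K and V) per layer, each of shape seq_len x kv_heads x head_dim.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value

HEAD_DIM = 2048 // 16   # ASSUMPTION: head_dim = hidden_size / attention_heads
CONTEXT = 262_144       # full 256K window

gqa = kv_cache_bytes(CONTEXT, layers=22, kv_heads=4, head_dim=HEAD_DIM)
mha = kv_cache_bytes(CONTEXT, layers=22, kv_heads=16, head_dim=HEAD_DIM)

print(f"GQA (4 KV heads):  {gqa / 2**30:.1f} GiB")
print(f"MHA (16 KV heads): {mha / 2**30:.1f} GiB")
```

With 4 KV heads instead of 16, the cache at full context is a quarter of the size it would be with standard multi-head attention, which is the main memory argument for GQA at this sequence length.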
🌌 Long-Context Design
256K Context Window
Aspire_1.1B supports:
* 262,144 token context processing
* persistent conversational memory
* large-document reasoning
* long-form analytical workflows
* retrieval-augmented generation systems
To maintain coherence across extremely long sequences, the model utilizes:
* dynamically scaled RoPE embeddings
* Grouped Query Attention
* optimized attention routing
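The role of the RoPE base in long-context behavior can be illustrated with a small calculation. This card does not state the base value or the scaling configuration, so the bases, the trained length, and the scaling factor below are hypothetical; `dynamic_ntk_base` follows one common formulation of dynamic NTK scaling and is not necessarily what Aspire_1.1B uses.

```python
import math

def rope_inv_freq(head_dim, base):
    """Inverse frequencies for RoPE: base^(-2i/d) for each dimension pair."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

def longest_wavelength(head_dim, base):
    """Tokens needed for the slowest-rotating pair to complete one full turn."""
    return 2 * math.pi / min(rope_inv_freq(head_dim, base))

def dynamic_ntk_base(base, head_dim, seq_len, trained_len, factor):
    """Dynamic NTK-style rescaling: enlarge the base only past trained_len."""
    if seq_len <= trained_len:
        return base
    scale = factor * seq_len / trained_len - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))

HEAD_DIM = 128  # ASSUMPTION: 2048 hidden size / 16 heads
for base in (10_000, 1_000_000):  # hypothetical base values
    print(f"base={base:>9,}: longest wavelength ~ "
          f"{longest_wavelength(HEAD_DIM, base):,.0f} tokens")

# Hypothetical scaling setup: trained at 4,096 tokens, factor 2.0.
print(dynamic_ntk_base(10_000, HEAD_DIM, seq_len=8_192, trained_len=4_096, factor=2.0))
```

With a base of 10,000, the slowest-rotating pair completes a full turn well before 262,144 tokens, so distant positions start to alias. A larger base stretches those wavelengths past the context window, which is the motivation behind both high-base and dynamically scaled RoPE.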
🔬 Training Details
Hardware
| Component | Configuration |
| --- | --- |
| Accelerator | Google Cloud TPUs (Kaggle TPU Environment) |
| Precision | bfloat16 |
| Optimization | Adafactor |
| Framework | Hugging Face Transformers + XLA |
The model was trained using TPU-native workflows optimized for:
* efficient large-scale sequence processing
* stable long-context convergence
* reduced memory fragmentation
* uninterrupted checkpoint recovery
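One reason Adafactor suits memory-limited accelerators is that it stores a factored estimate of the second moment instead of full per-parameter statistics. The sketch below compares optimizer state for a single 2048 x 2048 projection (the hidden size listed above). It assumes Adafactor runs without momentum; whether the actual training run used momentum is not stated in this card.

```python
def adam_state_values(rows, cols):
    """Adam keeps first and second moments, each the same shape as the weight."""
    return 2 * rows * cols

def adafactor_state_values(rows, cols):
    """Adafactor (no momentum) keeps one row vector and one column vector
    whose outer product approximates the second-moment matrix."""
    return rows + cols

rows, cols = 2048, 2048  # one square hidden-size projection
adam = adam_state_values(rows, cols)
adafactor = adafactor_state_values(rows, cols)

print(f"Adam state values:      {adam:,}")
print(f"Adafactor state values: {adafactor:,}")
print(f"Reduction factor:       {adam // adafactor:,}x")
```

The saving applies to matrix-shaped weights; vectors such as norm weights are not factored, but they account for a very small share of the model.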
📚 Training Datasets
Aspire_1.1B was pretrained on a curated combination of reasoning and instruction-following datasets.
🧠 OpenThoughts-114k
A dense reasoning dataset focused on:
* chain-of-thought reasoning
* logical deduction
* structured inference
* analytical problem solving
Dataset: open-thoughts/OpenThoughts-114k
⚡ WizardLM Evol Instruct 70K
An evolved instruction-following dataset designed to improve:
* prompt adherence
* formatting consistency
* complex instruction execution
* conversational alignment
Dataset: WizardLMTeam/WizardLM_evol_instruct_70k
💻 Usage
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_id = "GODsStrongestSoldier/Aspire_1.1B"

# Load the tokenizer and the bfloat16 weights; device_map="auto" requires `accelerate`.
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
Text Generation Example
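prompt = """
Explain the concept of RoPE (Rotary Positional Embeddings)
and how it benefits 256K context windows.
Answer:
"""

# Tokenize the prompt and move the tensors to the model's device.
inputs = tokenizer(
    prompt,
    return_tensors="pt"
).to(model.device)

# temperature and top_p only take effect when sampling is enabled.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)

# Decode the full sequence (prompt plus completion) back to text.
response = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True
)
print(response)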
🔄 Checkpointing & Recovery
Aspire_1.1B was trained using a robust checkpointing system that continuously saved training state directly to the Hugging Face Hub.
This workflow enabled:
* uninterrupted TPU training continuation
* session recovery across Kaggle runtime limits
* persistent optimizer state management
* scalable long-duration pretraining workflows
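A resume-after-restart workflow like the one described above can be sketched as follows. This is not the author's training script: it assumes checkpoints are saved as `checkpoint-<step>` folders (the Hugging Face Trainer naming convention), and `push_checkpoint` is a hypothetical helper built on `HfApi.upload_folder` from `huggingface_hub`.

```python
import os
import re

CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")

def latest_checkpoint(output_dir):
    """Return the path of the highest-step `checkpoint-<step>` folder, or None."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = CHECKPOINT_PATTERN.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path

def push_checkpoint(checkpoint_dir, repo_id):
    """Hypothetical helper: upload one checkpoint folder to the Hub.

    Requires `huggingface_hub` and a write token; imported lazily so that
    `latest_checkpoint` has no third-party dependencies.
    """
    from huggingface_hub import HfApi
    folder_name = os.path.basename(checkpoint_dir)
    HfApi().upload_folder(
        folder_path=checkpoint_dir,
        repo_id=repo_id,
        path_in_repo=folder_name,
        commit_message=f"Add {folder_name}",
    )
```

At the start of a new Kaggle session, a script could download the most recent checkpoint folder from the Hub, call `latest_checkpoint` on the local output directory, and resume from the returned path, which is how training can continue across runtime limits.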
⚙️ Intended Use Cases
| Domain | Purpose |
| --- | --- |
| Long-Context Chat | Persistent conversational memory |
| Document Analysis | Large-scale text understanding |
| Frontier Research | Long-sequence experimentation |
| Instruction Following | Complex prompt execution |
| Retrieval Systems | RAG & memory augmentation |
| Agentic Workflows | Persistent reasoning systems |
⚠️ Limitations
Aspire_1.1B is an experimental open language model.
Human verification is recommended for:
* medical information
* legal advice
* financial decisions
* safety-critical applications
🌵 Origin
Developed through independent frontier AI experimentation using:
* Kaggle TPU infrastructure
* Hugging Face Transformers
* open reasoning datasets
* long-context architecture research
Focused on:
* efficient frontier models
* scalable context systems
* accessible open AI research
* persistent reasoning architectures
👑 Final Motto
“Long context is memory.
Memory is continuity.
Continuity is intelligence.”