| license: apache-2.0 | |
| language: | |
| - en | |
| pipeline_tag: text-generation | |
| library_name: transformers | |
| tags: | |
| - transformers | |
| - llama | |
| - long-context | |
| - 256k-context | |
| - reasoning | |
| - instruction-following | |
| - causal-lm | |
| - text-generation-inference | |
| - gqa | |
| - rope-scaling | |
| - bfloat16 | |
| - safetensors | |
| - withinusai | |
| - Aspire_1.1B | |
| datasets: | |
| - open-thoughts/OpenThoughts-114k | |
| - WizardLMTeam/WizardLM_evol_instruct_70k | |
| 🌌 Aspire_1.1B | |
| Long-Context Frontier Language Model | |
| “Built to think across distance.” | |
| ⸻ | |
| 🌌 Overview | |
| Aspire_1.1B is a 1.1-billion-parameter (~1.12B) language model engineered for extreme long-context reasoning, instruction following, and efficient inference. | |
| Developed for persistent cognition workflows, Aspire_1.1B supports a native 256K-token (262,144) context window while maintaining reasoning coherence and efficient memory utilization through: | |
| * Grouped Query Attention (GQA) | |
| * dynamically scaled RoPE embeddings | |
| * optimized transformer routing | |
| * TPU-native bfloat16 training | |
| Unlike conventional small-scale models constrained by short context windows, Aspire_1.1B is designed for: | |
| * long-form reasoning | |
| * extended conversational continuity | |
| * large document understanding | |
| * retrieval-heavy workflows | |
| * persistent agent memory systems | |
| * scalable frontier experimentation | |
| The architecture balances: | |
| * efficiency | |
| * reasoning capability | |
| * long-context retention | |
| * deployment practicality | |
| ⸻ | |
| ⚡ Model Highlights | |
| Attribute | Value |
| Parameters | ~1.12B |
| Architecture | Llama-based Causal LM |
| Context Window | 262,144 Tokens (256K) |
| Precision | bfloat16 |
| Hidden Size | 2048 |
| Layers | 22 |
| Attention Heads | 16 |
| KV Heads | 4 (GQA) |
| Vocabulary | 32K Custom BPE |
| Optimizer | Adafactor |
| Training Hardware | Google Cloud TPUs |
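As a rough consistency check on the figures listed here, the parameter count can be estimated from the stated dimensions. The sketch below assumes a standard Llama layout without biases, untied input and output embeddings, a vocabulary of exactly 32,000 entries, and an MLP intermediate size of 5,632. None of those last three values is stated in this card, so treat the result as an estimate under those assumptions; the function name is illustrative.

```python
def llama_param_count(hidden, layers, heads, kv_heads, vocab, intermediate,
                      tied_embeddings=False):
    """Estimate the parameter count of a Llama-style decoder (no bias terms)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim  # GQA: K and V project to fewer heads than Q

    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q, O + K, V
    mlp = 3 * hidden * intermediate                        # gate, up, down
    norms = 2 * hidden                                     # two RMSNorms per layer
    per_layer = attention + mlp + norms

    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


# Values from the card: hidden 2048, 22 layers, 16 heads, 4 KV heads.
# ASSUMPTIONS (not in the card): vocab=32_000, intermediate=5_632, untied embeddings.
total = llama_param_count(2048, 22, 16, 4, vocab=32_000, intermediate=5_632)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Under these assumptions the estimate comes to 1,123,117,056 parameters, which is in line with the ~1.12B figure.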
| ⸻ | |
| 🧠 Architecture | |
| Aspire_1.1B is built around a highly optimized transformer stack designed for efficient long-context scaling. | |
| Core architectural features include: | |
| * Grouped Query Attention (GQA) | |
| * high-base Rotary Positional Embeddings (RoPE) | |
| * TPU-optimized training pathways | |
| * efficient KV-cache scaling | |
| * long-sequence extrapolation support | |
| The architecture is optimized for: | |
| * inference efficiency | |
| * stable long-context attention | |
| * reduced memory overhead | |
| * scalable deployment workflows | |
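To make the memory effect of GQA concrete, the following sketch estimates the KV-cache size from the figures in the Model Highlights section (22 layers, 16 attention heads, 4 KV heads, hidden size 2048, bfloat16). It assumes a batch size of 1, an unquantized cache, and a head dimension equal to hidden size divided by attention heads; the function name is illustrative and not part of any library.

```python
def kv_cache_bytes(seq_len, layers=22, kv_heads=4, hidden=2048, heads=16,
                   bytes_per_value=2, batch=1):
    """Bytes needed to cache keys and values for `seq_len` tokens."""
    head_dim = hidden // heads  # 2048 / 16 = 128
    # Factor 2 accounts for storing both keys and values in every layer.
    return 2 * batch * layers * kv_heads * head_dim * bytes_per_value * seq_len


GIB = 2 ** 30
full_window = 262_144

gqa_bytes = kv_cache_bytes(full_window)               # 4 KV heads (GQA)
mha_bytes = kv_cache_bytes(full_window, kv_heads=16)  # one KV head per query head

# With 16 query heads and 4 KV heads, each KV head serves a group of 4 query heads.
group_size = 16 // 4
kv_head_for_query = [q // group_size for q in range(16)]

print(f"GQA cache at 256K tokens: {gqa_bytes / GIB:.0f} GiB")
print(f"MHA cache at 256K tokens: {mha_bytes / GIB:.0f} GiB")
print("query head -> KV head:", kv_head_for_query)
```

With these assumptions, a full 262,144-token cache is about 11 GiB with 4 KV heads versus about 44 GiB with 16, excluding model weights and activations.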
| ⸻ | |
| 🌌 Long-Context Design | |
| 256K Context Window | |
| Aspire_1.1B supports: | |
| * 262,144 token context processing | |
| * persistent conversational memory | |
| * large-document reasoning | |
| * long-form analytical workflows | |
| * retrieval-augmented generation systems | |
| The model utilizes: | |
| * dynamically scaled RoPE embeddings | |
| * Grouped Query Attention | |
| * optimized attention routing | |
| to maintain coherence across extremely long sequences. | |
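The card does not say which RoPE scaling variant or base is used, so the following is only a sketch of one common approach, dynamic NTK-style scaling, in which the rotary base grows once the sequence exceeds the length seen in training. The base of 10,000, the trained length of 4,096, and the factor of 2.0 are hypothetical illustration values, not Aspire_1.1B's configuration; the head dimension of 128 follows from hidden size 2048 and 16 heads.

```python
import math


def dynamic_ntk_base(base, head_dim, seq_len, trained_len, factor):
    """Return the rotary base, enlarged only when seq_len exceeds trained_len."""
    if seq_len <= trained_len:
        return base
    scale = (factor * seq_len / trained_len) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


def inverse_frequencies(base, head_dim):
    """One rotation frequency per pair of head dimensions."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rotate_pair(x0, x1, position, inv_freq):
    """Rotate one (x0, x1) pair of a query or key vector by position * inv_freq."""
    angle = position * inv_freq
    cos, sin = math.cos(angle), math.sin(angle)
    return x0 * cos - x1 * sin, x0 * sin + x1 * cos


# HYPOTHETICAL values for illustration only (not taken from the card).
BASE, TRAINED_LEN, FACTOR, HEAD_DIM = 10_000.0, 4_096, 2.0, 128

short_base = dynamic_ntk_base(BASE, HEAD_DIM, 2_048, TRAINED_LEN, FACTOR)
long_base = dynamic_ntk_base(BASE, HEAD_DIM, 262_144, TRAINED_LEN, FACTOR)
slowest_short = inverse_frequencies(short_base, HEAD_DIM)[-1]
slowest_long = inverse_frequencies(long_base, HEAD_DIM)[-1]
print(short_base, long_base, slowest_short > slowest_long)
```

A larger base lowers the slowest rotation frequencies, so distant positions stay distinguishable instead of wrapping around.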
| ⸻ | |
| 🔬 Training Details | |
| Hardware | |
| Component | Configuration |
| Accelerator | Google Cloud TPUs (Kaggle TPU Environment) |
| Precision | bfloat16 |
| Optimizer | Adafactor |
| Framework | Hugging Face Transformers + XLA |
| The model was trained using TPU-native workflows optimized for: | |
| * efficient large-scale sequence processing | |
| * stable long-context convergence | |
| * reduced memory fragmentation | |
| * uninterrupted checkpoint recovery | |
| ⸻ | |
| 📚 Training Datasets | |
| Aspire_1.1B was pretrained on a curated combination of reasoning and instruction-following datasets. | |
| ⸻ | |
| 🧠 OpenThoughts-114k | |
| A dense reasoning dataset focused on: | |
| * chain-of-thought reasoning | |
| * logical deduction | |
| * structured inference | |
| * analytical problem solving | |
| Dataset: open-thoughts/OpenThoughts-114k | |
| ⸻ | |
| ⚡ WizardLM Evol Instruct 70K | |
| An evolved instruction-following dataset designed to improve: | |
| * prompt adherence | |
| * formatting consistency | |
| * complex instruction execution | |
| * conversational alignment | |
| Dataset: WizardLMTeam/WizardLM_evol_instruct_70k | |
| ⸻ | |
| 💻 Usage | |
| Loading the Model | |
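| # Load the tokenizer and the model weights in bfloat16. | |
| from transformers import AutoTokenizer, AutoModelForCausalLM | |
| import torch | |
| # Repo ID as given in this card; adjust it if the weights are hosted under another namespace. | |
| repo_id = "GODsStrongestSoldier/Aspire_1.1B" | |
| tokenizer = AutoTokenizer.from_pretrained(repo_id) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     repo_id, | |
|     torch_dtype=torch.bfloat16, | |
|     device_map="auto",  # requires the accelerate package | |
| ) | |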
| ⸻ | |
| Text Generation Example | |
| prompt = """ | |
| Explain the concept of RoPE (Rotary Positional Embeddings) | |
| and how it benefits 256K context windows. | |
| Answer: | |
| """ | |
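| # Tokenize the prompt and move the tensors to the model's device. | |
| inputs = tokenizer( | |
|     prompt, | |
|     return_tensors="pt" | |
| ).to(model.device) | |
| # do_sample=True is needed for temperature and top_p to take effect. | |
| outputs = model.generate( | |
|     **inputs, | |
|     max_new_tokens=512, | |
|     do_sample=True, | |
|     temperature=0.7, | |
|     top_p=0.9 | |
| ) | |
| # Decode the full sequence (prompt plus completion) back to text. | |
| response = tokenizer.decode( | |
|     outputs[0], | |
|     skip_special_tokens=True | |
| ) | |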
| print(response) | |
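For very long inputs, the prompt and the generated tokens together have to fit in the 262,144-token window. The helper below is a minimal sketch of one way to enforce that budget on a list of token IDs before calling `generate`. The function name and the choice to keep the most recent tokens are assumptions for illustration, not behavior documented by this card.

```python
CONTEXT_WINDOW = 262_144  # context size stated in this card


def fit_to_context(token_ids, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Trim token_ids so prompt length + max_new_tokens fits in the window.

    Keeps the most recent tokens (drops from the front), which suits chat
    histories; other workflows may prefer to drop from the middle or the end.
    """
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    if len(token_ids) <= budget:
        return list(token_ids)
    return list(token_ids[-budget:])


# Toy example with a tiny window so the effect is visible.
trimmed = fit_to_context(list(range(10)), max_new_tokens=4, context_window=8)
print(trimmed)
```

In practice the IDs would come from the tokenizer, for example `tokenizer(prompt)["input_ids"]`, and the trimmed list would be converted back to a tensor before generation.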
| ⸻ | |
| 🔄 Checkpointing & Recovery | |
| Aspire_1.1B was trained using a robust checkpointing system that continuously saved training state directly to the Hugging Face Hub. | |
| This workflow enabled: | |
| * uninterrupted TPU training continuation | |
| * session recovery across Kaggle runtime limits | |
| * persistent optimizer state management | |
| * scalable long-duration pretraining workflows | |
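The card does not show the checkpointing code itself. As a sketch of the resume side of such a workflow, the snippet below picks the most recent checkpoint from a list of directory names and derives the step to continue from. The `checkpoint-<step>` naming and both function names are assumptions for illustration; syncing the directory to and from the Hugging Face Hub is left out.

```python
import re
from pathlib import Path

# ASSUMED naming scheme: one directory per save, named "checkpoint-<step>".
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(names):
    """Return (step, name) for the highest-numbered checkpoint, or None."""
    best = None
    for name in names:
        match = CHECKPOINT_PATTERN.match(name)
        if match:
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, name)
    return best


def resume_step(output_dir):
    """Step to resume from: 0 when no checkpoint directory exists yet."""
    root = Path(output_dir)
    if not root.is_dir():
        return 0
    found = latest_checkpoint(p.name for p in root.iterdir() if p.is_dir())
    return found[0] if found else 0


# After a runtime reset, a fresh session would look up where to continue.
print(latest_checkpoint(["checkpoint-500", "runs", "checkpoint-1500", "checkpoint-1000"]))
```

Comparing steps numerically rather than sorting names as strings avoids picking `checkpoint-500` over `checkpoint-1500`.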
| ⸻ | |
| ⚙️ Intended Use Cases | |
| Domain | Purpose |
| Long-Context Chat | Persistent conversational memory |
| Document Analysis | Large-scale text understanding |
| Frontier Research | Long-sequence experimentation |
| Instruction Following | Complex prompt execution |
| Retrieval Systems | RAG & memory augmentation |
| Agentic Workflows | Persistent reasoning systems |
| ⸻ | |
| ⚠️ Limitations | |
| Aspire_1.1B is an experimental open language model. | |
| Human verification is recommended for: | |
| * medical information | |
| * legal advice | |
| * financial decisions | |
| * safety-critical applications | |
| ⸻ | |
| 🌵 Origin | |
| Developed through independent frontier AI experimentation using: | |
| * Kaggle TPU infrastructure | |
| * Hugging Face Transformers | |
| * open reasoning datasets | |
| * long-context architecture research | |
| Focused on: | |
| * efficient frontier models | |
| * scalable context systems | |
| * accessible open AI research | |
| * persistent reasoning architectures | |
| ⸻ | |
| 👑 Final Motto | |
| “Long context is memory. | |
| Memory is continuity. | |
| Continuity is intelligence.” |