---
license: apache-2.0
tags:
- geometric-deep-learning
- vae
- patch-analysis
- gate-vectors
- text-to-geometry
- rosetta-stone
- multimodal
- experimental
- custom_code
datasets:
- AbstractPhil/synthetic-characters
---

# GeoVocab Patch Maker

**A geometric vocabulary extractor that reads structural properties from latent patches, and the analyzer that proved text carries the same geometric structure as images.**

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

## What It Does

Takes a batch of `(8, 16, 16)` tensors (originally voxel grids, but shown to work on adapted FLUX VAE latents and text-derived latent patches) and produces per-patch geometric descriptors:

```python
import torch
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
patches = torch.randn(16, 8, 16, 16).cuda()  # any (N, 8, 16, 16) source
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  interpretable geometric properties
# patch_features: (N, 64, 256) learned representations
```

### Gate Vector Anatomy (17 dimensions)

| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs. closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

Dimensions 0–10 are **local** (intrinsic to each patch, no cross-patch information). Dimensions 11–16 are **structural** (relational, computed after attention has seen the neighborhood context).

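A small helper can slice a gate vector into the named groups from the table. This is a sketch, not part of `geometric_model.py`: `GATE_SLICES`, `split_gates`, and `dominant_dimensionality` are hypothetical names, and it assumes the gate values are already activated (softmax/sigmoid) as the Type column suggests. It uses NumPy; a torch tensor can be converted with `.cpu().numpy()` first.

```python
import numpy as np

# Slice layout copied from the gate-vector table (end index exclusive).
# Hypothetical helper names; not part of geometric_model.py.
GATE_SLICES = {
    "dimensionality": slice(0, 4),      # 0D point, 1D line, 2D surface, 3D volume
    "curvature": slice(4, 7),           # rigid, curved, combined
    "boundary": slice(7, 8),            # partial fill
    "axis_active": slice(8, 11),        # per-axis spatial extent
    "topology": slice(11, 13),          # open, closed
    "neighbor_density": slice(13, 14),  # normalized neighbor count
    "surface_role": slice(14, 17),      # isolated, boundary, interior
}
LOCAL = slice(0, 11)        # intrinsic to each patch
STRUCTURAL = slice(11, 17)  # computed after attention sees neighbors
DIM_NAMES = np.array(["0D point", "1D line", "2D surface", "3D volume"])


def split_gates(gates: np.ndarray) -> dict:
    """Split a (..., 17) gate array into its named property groups."""
    if gates.shape[-1] != 17:
        raise ValueError(f"expected 17 gate dims, got {gates.shape[-1]}")
    return {name: gates[..., s] for name, s in GATE_SLICES.items()}


def dominant_dimensionality(gates: np.ndarray) -> np.ndarray:
    """Most likely dimensionality label per patch (argmax over dims 0-3)."""
    return DIM_NAMES[np.argmax(gates[..., GATE_SLICES["dimensionality"]], axis=-1)]


# Usage with random stand-in values shaped like extract_features output:
gates = np.random.default_rng(0).random((2, 64, 17))
parts = split_gates(gates)                # e.g. parts["surface_role"] is (2, 64, 3)
labels = dominant_dimensionality(gates)   # (2, 64) array of label strings
```

Keeping the layout in one dictionary means the local/structural split (dims 0–10 vs. 11–16) and each property's width are stated once and reused everywhere.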

## Architecture

```
(8, 16, 16) input
   ↓
PatchEmbedding3D → (B, 64, 64)                 # 64 patches of 32 voxels each
   ↓
Stage 0: Local Encoder + Gate Heads            # dims, curvature, boundary, axes
   ↓
proj([embedding, local_gates]) → (B, 64, 128)
   ↓
Stage 1: Bootstrap Transformer ×2              # standard attention with local context
   ↓
Stage 1.5: Structural Gate Heads               # topology, neighbors, surface role
   ↓
Stage 2: Geometric Transformer ×2              # gated attention modulated by all 17 gates
   ↓
Stage 3: Classification Heads                  # 27-class shape recognition
```

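The patch embedding step turns an `(8, 16, 16)` grid into 64 tokens of 32 voxels each. The card does not state the patch shape, so the NumPy sketch below assumes non-overlapping `(2, 4, 4)` patches (a 4×4×4 patch grid) followed by a plain linear projection to 64 dims. `patchify` and the random weights are illustrative stand-ins, not the actual `PatchEmbedding3D`.

```python
import numpy as np

# ASSUMPTION: (2, 4, 4) patches. Any 32-voxel patch shape that tiles
# (8, 16, 16) into 64 patches would match the card; this is one such choice.
PATCH = (2, 4, 4)


def patchify(grid: np.ndarray, patch=PATCH) -> np.ndarray:
    """Split a (D, H, W) grid into non-overlapping patches -> (num_patches, voxels_per_patch)."""
    d, h, w = grid.shape
    pd, ph, pw = patch
    assert d % pd == 0 and h % ph == 0 and w % pw == 0, "patch must tile the grid"
    x = grid.reshape(d // pd, pd, h // ph, ph, w // pw, pw)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (nD, nH, nW, pd, ph, pw): patch index first
    return x.reshape(-1, pd * ph * pw)


rng = np.random.default_rng(0)
grid = rng.random((8, 16, 16))
tokens = patchify(grid)                  # (64, 32)

# Hypothetical linear embedding standing in for PatchEmbedding3D: 32 -> 64 dims.
W = rng.normal(0.0, 0.1, size=(32, 64))
b = np.zeros(64)
embedded = tokens @ W + b                # (64, 64); stack samples to get (B, 64, 64)
```

The transpose puts the three patch-grid axes first, so each row of `tokens` holds the voxels of one contiguous 3D block.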

The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.

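The three ingredients of gate-modulated attention (gate-aware Q/K, multiplicatively gated V, and a per-head gate-compatibility term) can be sketched in NumPy for a single sample. The exact formulation in `geometric_model.py` is not documented here, so the following are assumptions: V is gated by `sigmoid(gates @ Wg)`, the compatibility score is a per-head bilinear form `g_i^T M_h g_j` added to the scaled dot-product logits, and 4 heads are used. All parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(d_model=128, n_gates=17, n_heads=4, seed=0):
    """Random stand-in weights; names, shapes, and head count are hypothetical."""
    rng = np.random.default_rng(seed)
    s = 0.05
    return {
        "Wq": rng.normal(0, s, (d_model + n_gates, d_model)),  # Q from [hidden, gates]
        "Wk": rng.normal(0, s, (d_model + n_gates, d_model)),  # K from [hidden, gates]
        "Wv": rng.normal(0, s, (d_model, d_model)),
        "Wg": rng.normal(0, s, (n_gates, d_model)),            # gates -> V multiplier
        "M": rng.normal(0, s, (n_heads, n_gates, n_gates)),    # per-head gate bilinear form
        "Wo": rng.normal(0, s, (d_model, d_model)),
    }


def gated_attention(h, g, p):
    """h: (N, D) hidden states, g: (N, 17) gates -> ((N, D) output, (H, N, N) weights)."""
    n, d = h.shape
    n_heads = p["M"].shape[0]
    dh = d // n_heads

    def heads(x):  # (N, D) -> (H, N, dh)
        return x.reshape(n, n_heads, dh).transpose(1, 0, 2)

    z = np.concatenate([h, g], axis=-1)                 # (N, D + 17)
    q, k = heads(z @ p["Wq"]), heads(z @ p["Wk"])
    v = heads((h @ p["Wv"]) * sigmoid(g @ p["Wg"]))     # multiplicatively gated V
    compat = np.einsum("ia,hab,jb->hij", g, p["M"], g)  # gate-interaction bias per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh) + compat
    attn = softmax(scores, axis=-1)                     # (H, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)   # merge heads
    return out @ p["Wo"], attn


rng = np.random.default_rng(1)
hidden = rng.normal(size=(64, 128))   # 64 patches, hidden size 128
gates = rng.random((64, 17))          # stand-in gate vectors in [0, 1)
params = init_params()
out, attn = gated_attention(hidden, gates, params)
```

Because the gates enter both the logits (through Q/K and the bilinear bias) and the values, changing a patch's gate vector changes both where attention goes and what gets passed along.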

## The Rosetta Stone Discovery

This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which proved that text descriptions produce **2.5–3.5× stronger geometric differentiation** than actual images when projected through a lightweight VAE into this model's patch space.

| Source | `patch_feat` discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |

Three architecturally different text encoders converge to within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.

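The arithmetic behind the comparison can be checked directly from the table: each text encoder's `patch_feat` score divided by the FLUX image baseline, and each encoder's deviation from the three-encoder mean. This only re-derives the numbers shown above; it does not reproduce the experiments.

```python
# Values copied from the table above.
image_baseline = 0.020
text_scores = {
    "flan-t5-small": 0.053,
    "bert-base-uncased": 0.053,
    "bert-beatrix-2048": 0.050,
}

# Ratio of text-derived to image-derived discriminability.
ratios = {name: s / image_baseline for name, s in text_scores.items()}

# Relative deviation (%) of each encoder from the mean of the three text encoders.
mean_text = sum(text_scores.values()) / len(text_scores)
deviation_pct = {name: 100 * (s - mean_text) / mean_text for name, s in text_scores.items()}

for name in text_scores:
    print(f"{name:18s} ratio {ratios[name]:.2f}x  deviation {deviation_pct[name]:+.1f}%")
```

On this column the ratios come out at 2.65× (both T5 and BERT-base) and 2.50× (bert-beatrix-2048), and every encoder sits within 4% of the shared mean of 0.052.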

## Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Gate supervision is two-tier, with ground truth computed from voxel analysis:

- **Local gates**: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- **Structural gates**: topology from 3D-convolution neighbor counting, surface role from neighbor-density thresholds

Training ran for 200 epochs and reached 93.8% recall on shape classification, with explicit geometric property prediction as auxiliary objectives.

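The ground-truth rules listed above can be approximated on a binary occupancy grid. This is a sketch under stated assumptions, not the training code: neighbor counting uses a 6-connected (face) neighborhood computed with shifted sums (equivalent to a 3D convolution with a cross-shaped kernel and zero padding), the surface-role thresholds (0 neighbors = isolated, 1–5 = boundary, 6 = interior) are hypothetical, and dimensionality is the number of axes along which occupied voxels extend. Curvature and topology targets are omitted because their thresholds are not given.

```python
import numpy as np

# ASSUMPTION: 6-connected (face) neighbors; the card does not give the kernel.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def neighbor_counts(occ: np.ndarray) -> np.ndarray:
    """Occupied face-neighbors per voxel, zero-padded at the grid edges.
    Same result as a 3D convolution with a cross kernel; empty voxels get 0."""
    occ = occ.astype(np.int32)
    d, h, w = occ.shape
    padded = np.pad(occ, 1)
    counts = np.zeros_like(occ)
    for dz, dy, dx in OFFSETS:
        counts += padded[1 + dz:1 + dz + d, 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    return counts * occ


def surface_role(occ: np.ndarray) -> np.ndarray:
    """Hypothetical thresholds: 0 isolated, 1 boundary, 2 interior; -1 marks empty voxels."""
    n = neighbor_counts(occ)
    role = np.where(n == 0, 0, np.where(n < 6, 1, 2))
    return np.where(occ.astype(bool), role, -1)


def axis_extent_targets(occ: np.ndarray):
    """axis_active (which axes the occupied voxels span) and dimensionality (how many).
    Caveat: a diagonal line spans several axes, so this simple rule calls it 2D or 3D."""
    idx = np.argwhere(occ)
    if len(idx) == 0:
        return np.zeros(3, dtype=bool), 0  # empty grid treated as 0D here (assumption)
    active = (idx.max(axis=0) - idx.min(axis=0)) > 0
    return active, int(active.sum())


# Usage: a solid 3x3x3 cube inside an (8, 16, 16) grid.
cube = np.zeros((8, 16, 16), dtype=bool)
cube[2:5, 2:5, 2:5] = True
roles = surface_role(cube)           # roles[3, 3, 3] == 2, roles[2, 2, 2] == 1
active, dim = axis_extent_targets(cube)  # dim == 3 (volume)
```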

## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |

## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load pretrained weights
model = load_from_hub()

# From any (8, 16, 16) source, batched as (B, 8, 16, 16)
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or the full output dict (no gradients needed for inference)
with torch.no_grad():
    out = model(patches)
out["local_dim_logits"]    # (B, 64, 4)   dimensionality
out["local_curv_logits"]   # (B, 64, 3)   curvature
out["struct_topo_logits"]  # (B, 64, 2)   topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27)  shape classification
```

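Because each training sample superimposes 2–4 shapes, the per-patch `patch_shape_logits` may need to be pooled into a sample-level answer. One illustrative readout (an assumption, not the model's documented aggregation) is to average per-patch class probabilities over the 64 patches and take the top-k classes. The sketch works on NumPy arrays, e.g. `out["patch_shape_logits"].cpu().numpy()`; `top_shapes` is a hypothetical helper.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def top_shapes(patch_shape_logits: np.ndarray, k: int = 4):
    """(B, 64, 27) per-patch logits -> (class ids, pooled scores), each (B, k), best first.
    ASSUMPTION: mean pooling of probabilities + top-k is illustrative only."""
    probs = softmax(patch_shape_logits, axis=-1)   # per-patch class probabilities
    pooled = probs.mean(axis=1)                    # (B, 27) averaged over patches
    ids = np.argsort(-pooled, axis=-1)[:, :k]      # highest pooled scores first
    return ids, np.take_along_axis(pooled, ids, axis=-1)


# Toy check: 40 patches vote for class 5, 24 patches for class 12.
logits = np.zeros((1, 64, 27))
logits[0, :40, 5] = 10.0
logits[0, 40:, 12] = 10.0
ids, scores = top_shapes(logits, k=2)   # ids[0] -> [5, 12]
```

Averaging probabilities rather than logits keeps each patch's vote bounded, so a few very confident patches cannot dominate the pooled score on their own.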

## Related

- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto): the Rosetta Stone experiments (text→geometry VAEs)
- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters): 49k FLUX-generated character dataset
- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape): original training repo with checkpoints

## Citation

Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities: symbolic association through geometric language.