Recent Activity
reacted to ManniX-ITA's post about 2 hours ago
v1.1.0 was Claude + Ollama chat. Eight releases later the stack is a grounded research pipeline plus a local-first memory layer; the token crunch is operational now, not a quality wall.
claude-hooks v1.8.3: highlights since v1.1.0.
/consultants v2: agentic council, matured.
tool_executor: PLAN→REPORT lane runs read_file / grep / glob over the codebase before the researcher speaks; claims grounded in tool output, not vibes.
coder: sandboxed write_file role with per-language model routing (50KB/file, 1MB/lane caps).
CitationLinter: three-layer verifier at the researcher boundary; every `path:line` claim checked against an mtime-cached code_graph. Catches fabricated filenames before they launder through critics + synthesizer.
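To make the `path:line` check concrete, here is a minimal sketch of a single verification layer. It is not the CitationLinter code: the regex, the function name `check_citations`, and the direct filesystem lookup (instead of the mtime-cached code_graph) are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical pattern: a relative path with an extension, a colon, a line number.
CLAIM_RE = re.compile(r"(?P<path>[\w./-]+\.\w+):(?P<line>\d+)")


def check_citations(text: str, root: Path) -> list[tuple[str, int, str]]:
    """Return (path, line, reason) for each `path:line` claim that cannot be verified."""
    problems = []
    for match in CLAIM_RE.finditer(text):
        rel, line_no = match.group("path"), int(match.group("line"))
        target = root / rel
        if not target.is_file():
            # The cited file does not exist under the project root.
            problems.append((rel, line_no, "no such file"))
            continue
        with target.open(encoding="utf-8", errors="replace") as fh:
            n_lines = sum(1 for _ in fh)
        if not 1 <= line_no <= n_lines:
            # The file exists but is shorter than the cited line number.
            problems.append((rel, line_no, "line out of range"))
    return problems
```

A real linter would presumably also confirm that the cited line says what the claim says; this sketch only checks that the location exists.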
M14 cross-session memory (default on).
LangGraph BaseStore wired across four namespaces: research / tool_results / project / user. Per-namespace TTL: research=30d, tool_results=24h, project+user=forever. Hourly Caliber-style distillation reaper summarizes expiring research into the durable project namespace BEFORE deletion: episodic → semantic, like human consolidation. Originals only dropped after a successful summary write.
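The summarize-before-delete ordering can be sketched as follows. This is a toy model only: the store is a plain dict rather than a LangGraph BaseStore, and `reap_once`, the `distilled/` key prefix, and the `summarize` callback are hypothetical names. Only the TTL values come from the post.

```python
import time
from typing import Callable, Optional

DAY = 86400.0
# TTLs quoted in the post; None means "keep forever".
TTL_SECONDS: dict[str, Optional[float]] = {
    "research": 30 * DAY,
    "tool_results": 1 * DAY,
    "project": None,
    "user": None,
}


def reap_once(store: dict, summarize: Callable[[str], str],
              now: Optional[float] = None) -> list:
    """One pass over a toy store mapping (namespace, key) -> (written_at, text)."""
    now = time.time() if now is None else now
    dropped = []
    for (ns, key), (written_at, text) in list(store.items()):
        ttl = TTL_SECONDS.get(ns)
        if ttl is None or now - written_at < ttl:
            continue  # durable namespace, or not yet expired
        if ns == "research":
            try:
                # Write the summary first; only then is the original eligible for deletion.
                store[("project", f"distilled/{key}")] = (now, summarize(text))
            except Exception:
                continue  # summary failed: keep the original for the next pass
        del store[(ns, key)]
        dropped.append((ns, key))
    return dropped
```

If the summarizer raises (for example, the local model is down), the expired entry simply survives until the next pass, which is the failure behavior the post describes.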
sqlite_vec: full pgvector parity (v1.7).
Hybrid recall via RRF over vector cosine + BM25 (FTS5). KG surface: kg_create_entities / kg_add_observations / kg_create_relations / kg_search_nodes. Bundled sqlite-vec-mcp launcher went 3→8 tools so Cursor / Codex / OpenWebUI / Claude Desktop share the same .db. Lazy schema migration carries v1.6.x dbs in place, non-destructive.
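For readers unfamiliar with RRF (reciprocal rank fusion), a minimal sketch of the fusion step follows. It assumes the two retrievers have already produced ranked lists of document ids; `rrf_fuse` is a hypothetical name, and `k=60` is the commonly used default, not necessarily the value claude-hooks uses.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


vector_hits = ["a", "b", "c"]  # e.g. ordered by cosine similarity
bm25_hits = ["b", "d", "a"]    # e.g. ordered by FTS5 BM25 score
fused = rrf_fuse([vector_hits, bm25_hits])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, the cosine and BM25 scores never need to be normalized onto a common scale.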
llamafile chat + embed (v1.4 + v1.5).
HyDE / reflect / consolidate / get-advice / consultants route to a daemon-supervised local llamafile via the `llamafile://<label>` model prefix. Multi-instance LRU, per-label idle reap, sticky CPU fallback. Stack runs offline now.
Linux / macOS / Windows. PostgreSQL OR SQLite. Local OR cloud LLMs.
github.com/mann1x/claude-hooks

reacted to Doradus-AI's post about 3 hours ago
Tonight we validated a small upstream vLLM fix that brings GLM-5.1-REAP-478B back into our consumer-Blackwell rotation pool.
Sleep/wake on 4× RTX PRO 6000 (SM_120) had a CuMemAllocator race that retired GLM in April: cuMemUnmap runs synchronously from the host the moment a pool-backed tensor's refcount hits zero, but kernels can still be in flight against that storage. The result was accumulating CUDA_ERROR_ILLEGAL_ADDRESS faults until the engine became unrecoverable.
vllm-project/vllm#43020 is a one-line torch.cuda.synchronize() at the top of _python_free_callback. Steady-state inference is unaffected (only cumem frees pay the cost).
We caught the unpatched bug live during validation:
```
CUDA Error: invalid argument at /build/vllm/csrc/cumem_allocator.cpp:146
```
That's the exact failure class #43020 fixes. With it bind-mounted in: Q3.6-27B sleep/wake cycle clean (25.8 GiB VRAM released on /sleep level=1, engine alive, post-wake chat coherent), GLM 30-request stress test 30/30 PASS, 0 CUDA errors. Back into rotation.
Side win: we're also submitting a generic Triton autotune shmem-budget helper upstream that replaces hand-rolled `check_shared_mem() ? [64,128] : [32,64]` bucket switches with per-config precision via Triton's existing `prune_configs_by={"early_config_prune": ...}` hook. Zero change to the H100/H200 fast path. Submitted: vllm-project/vllm#43047
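A rough sketch of what per-config pruning against a shared-memory budget can look like. This is not the helper from #43047: the GEMM-style byte estimate, the `Config` stand-in (which mimics only the `kwargs` and `num_stages` attributes of `triton.Config`), and the names `estimate_shmem_bytes` and `make_shmem_pruner` are assumptions. The 101 KiB budget is the SM_120 figure quoted in this post.

```python
from dataclasses import dataclass

SHMEM_BUDGET_BYTES = 101 * 1024  # per-block opt-in shared memory quoted for SM_120


@dataclass
class Config:
    """Stand-in exposing the `kwargs` and `num_stages` attributes of triton.Config."""
    kwargs: dict
    num_stages: int = 2
    num_warps: int = 4


def estimate_shmem_bytes(cfg, elem_bytes: int = 2) -> int:
    """Assumed estimate: one A tile plus one B tile buffered per pipeline stage."""
    bm, bn, bk = cfg.kwargs["BLOCK_M"], cfg.kwargs["BLOCK_N"], cfg.kwargs["BLOCK_K"]
    return (bm * bk + bk * bn) * elem_bytes * cfg.num_stages


def make_shmem_pruner(budget_bytes: int = SHMEM_BUDGET_BYTES):
    """Build a callback shaped like early_config_prune(configs, named_args, **kwargs)."""
    def prune(configs, named_args, **kwargs):
        kept = [c for c in configs if estimate_shmem_bytes(c) <= budget_bytes]
        # Never return an empty list: fall back to the smallest-footprint config.
        return kept or [min(configs, key=estimate_shmem_bytes)]
    return prune
```

With Triton installed, a callback of this shape would be passed as `prune_configs_by={"early_config_prune": make_shmem_pruner()}` to `triton.autotune`. Configs that already fit a larger budget are left untouched, which matches the "zero change to the fast path" behavior described above.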
Full writeup with byte math + stress-test logs + the bind-mount overlay pattern: https://doradusresearch.ai/blog/sleep-mode-on-blackwell-part-2/
Hardware: 4× NVIDIA RTX PRO 6000 Blackwell Workstation Edition (SM_120, 95 GiB per GPU, 101 KiB per-block opt-in shmem).
Image stack documented in the writeup!

reacted to kanaria007's post about 3 hours ago
Article highlight: Honest Benchmarking for Governed Intelligence Platforms (art-60-241, v0.1)
TL;DR:
This article argues that benchmark results should be published as bounded observations, not inflated into platform claims.
A governed benchmark should not quietly turn "we measured this result under these conditions" into "therefore this platform is more governed, safer, or more production-ready." Honest benchmarking separates reproducibility, comparability, and disclosability, and keeps benchmark outcomes distinct from stronger governance or platform-readiness claims.
Read:
https://huggingface.co/datasets/kanaria007/agi-structural-intelligence-protocols/blob/main/article/60-supplements/art-60-241-honest-benchmarking-for-governed-intelligence-platforms.md
Why it matters:
• prevents benchmark scores from being laundered into governance-readiness claims
• distinguishes reproducible results from truly comparable rankings
• makes public benchmark language respect disclosure floors and evidence class
• gives a clean way to publish strong numbers without overclaiming what they mean
What's inside:
• the separation between reproducibility, comparability, and disclosability
• the rule that a benchmark result is not the same thing as a platform claim
• a benchmark disclosure profile that sets the publication floor
• a governed benchmark pack that binds runtime, toolchain, policy surface, evidence class, and results
• a comparability declaration and benchmark publication report that state what public reading is actually supportable
Key idea:
Do not say:
"we ranked higher, therefore we are better governed."
Say:
"this governed benchmark pack produced these results under this disclosed runtime, toolchain, policy, and evidence surface; this comparability declaration defines what we are and are not fairly comparable to; and this publication report states exactly what public reading is supportable without inflating benchmark observations into stronger platform claims."