postprocess at model res, defer resize+write to CPU (saves ~35s GPU) f4a7288 Nekochu committed on May 2
safetensors loading, Phase 0 4x faster (uint8), total time in status 33e616b Nekochu committed on May 2
quality: lower clean_matte threshold 0.25→0.02, always keep largest component 1363975 Nekochu committed on May 2
cleanup: stale comments, dead import, redundant makedirs, fix batch size in UI a2a7a3e Nekochu committed on May 2
simplify: merge write functions, fix missing Processed output, bulk transfer 9d23c67 Nekochu committed on May 2
remove dead code: AOTI export, inductor/triton cache, shared_results, deferred write 2a4471f Nekochu committed on May 2
fix: reduce-overhead instead of max-autotune (118s→~30s), dedicated export endpoint c53eb28 Nekochu committed on May 2
fix README: accurate torch.compile description, no triton/AOTI claim cdef1d9 Nekochu committed on May 2
add ZeroGPU GPU inference (FP16, flash-attn, batch=32@1024/16@2048) 0b6961f Nekochu committed on Mar 25
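
Note on commit 1363975 (clean_matte threshold 0.25→0.02, always keep largest component): the log does not show the repository's implementation, so the following is only a minimal sketch of what such a cleanup step might look like. The function name keep_largest_component, the 4-connectivity, and the pure NumPy/BFS labelling are assumptions made for illustration, not the project's code.

```python
from collections import deque

import numpy as np


def keep_largest_component(alpha: np.ndarray, threshold: float = 0.02) -> np.ndarray:
    """Hypothetical cleanup: zero every pixel outside the largest
    4-connected region whose alpha exceeds `threshold`."""
    mask = alpha > threshold
    labels = np.zeros(alpha.shape, dtype=np.int32)
    h, w = alpha.shape
    best_label, best_size, next_label = 0, 0, 0

    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue  # pixel already belongs to a labelled region
        next_label += 1
        labels[y, x] = next_label
        queue = deque([(y, x)])
        size = 0
        # Breadth-first flood fill over the 4-neighbourhood.
        while queue:
            cy, cx = queue.popleft()
            size += 1
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = next_label
                    queue.append((ny, nx))
        if size > best_size:
            best_label, best_size = next_label, size

    if best_label == 0:
        # Nothing above the threshold: return an empty matte.
        return np.zeros_like(alpha)
    # Keep original alpha values inside the largest region only.
    return np.where(labels == best_label, alpha, 0).astype(alpha.dtype)
```

In this sketch, a faint but large region (alpha around 0.1) is discarded at threshold 0.25 and kept at 0.02, which is one plausible reason for lowering the value; the commit message itself does not state the motivation.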