IQuest-Coder-V1 inference support and benchmarks
We’re DeployPad, a team focused on high-performance inference for open-weight models.
We’ve added official support for the IQuest-Coder-V1 family. Our goal is to support the IQuestLab community by making these models easier and more cost-efficient to run, without changing how you interact with them.
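Concretely, "without changing how you interact with it" means your existing client code keeps working. As a minimal sketch, assuming the deployment exposes an OpenAI-compatible endpoint (the base URL, API key, and model ID below are placeholders, not real values):

```python
# Minimal sketch: calling a deployed IQuest-Coder-V1 endpoint through an
# OpenAI-compatible client. base_url, api_key, and the model ID are
# placeholders; substitute the values from your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-DEPLOYMENT.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                             # hypothetical key
)

response = client.chat.completions.create(
    model="IQuest-Coder-V1",  # hypothetical model ID; check your deployment
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```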
Current performance
Throughput: ~50–80 tokens/sec (see the measurement sketch below)
Batch size: 32
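For context on how a figure like that can be measured, here is an illustrative sketch (ours, not the harness from the benchmarks repo linked below): fire a batch of 32 concurrent requests, sum the completion tokens the server reports, and divide by wall-clock time. Endpoint, key, and model ID are placeholders as above.

```python
# Rough aggregate tokens/sec measurement at batch size 32.
import time
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-DEPLOYMENT.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",                             # hypothetical key
)

def one_request(_):
    r = client.chat.completions.create(
        model="IQuest-Coder-V1",  # hypothetical model ID
        messages=[{"role": "user", "content": "Explain quicksort briefly."}],
        max_tokens=128,
    )
    return r.usage.completion_tokens  # tokens generated for this request

BATCH = 32  # matches the batch size quoted above
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=BATCH) as pool:
    total_tokens = sum(pool.map(one_request, range(BATCH)))
elapsed = time.perf_counter() - start
print(f"{total_tokens} tokens in {elapsed:.1f}s "
      f"-> {total_tokens / elapsed:.1f} tok/s aggregate")
```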
We are also planning to add lower-cost GPU options, including the RTX Pro 6000 and L40S, both running at FP8 precision, to further reduce deployment cost while maintaining performance.
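To be clear, FP8 serving is planned rather than shipped. As one illustration of what such a setup can look like (not a description of our stack), open-source engines such as vLLM expose an FP8 quantization option; the model path below is a hypothetical placeholder:

```python
# Illustrative only: FP8 serving via vLLM's quantization option.
from vllm import LLM, SamplingParams

llm = LLM(
    model="IQuestLab/IQuest-Coder-V1",  # hypothetical repo ID
    quantization="fp8",                 # request FP8 quantization in vLLM
)
params = SamplingParams(max_tokens=128, temperature=0.0)
outputs = llm.generate(["def fib(n):"], params)
print(outputs[0].outputs[0].text)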
Benchmarks and methodology are publicly available here:
https://github.com/geoddllc/large-llm-inference-benchmarks
If you want to try it yourself, you can deploy via the console:
https://console.geodd.io/
(top up and run; no platform fees beyond compute)
Feedback, questions, or requests from the community are welcome. Feel free to leave a comment below.