---
license: apache-2.0
language:
  - en
tags:
  - code
  - assistant
  - ollama
  - openai-compatible
  - streaming
  - voice
pipeline_tag: text-generation
inference: false
---

# GeniusPro Coder v1

GeniusPro Coder v1 is a coding-focused assistant model built for intelligent code generation, code explanation, and general-purpose AI assistance.

## Highlights

- Code generation, debugging, and explanation across multiple languages
- Natural conversational ability for non-code tasks
- OpenAI-compatible API (drop-in replacement for existing tooling)
- Streaming support for real-time token delivery
- Voice mode with concise, spoken-friendly responses
- Runs locally on consumer hardware via Ollama

## Intended Use

GeniusPro Coder v1 is designed for:

- Code assistance: generating, reviewing, debugging, and explaining code
- Chat: general-purpose question answering and conversation
- Voice interaction: concise, natural-language responses optimized for text-to-speech

It powers the GeniusPro platform, which includes a web-based chat dashboard and a real-time voice assistant.

## Supported Parameters

| Parameter | Description |
|---|---|
| `temperature` | Controls randomness (0.0 = deterministic, 1.0 = creative) |
| `top_p` | Nucleus sampling threshold |
| `max_tokens` | Maximum number of tokens to generate |
| `stop` | Stop sequences |
| `stream` | Enable streaming responses (server-sent events) |
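Taken together, these parameters map onto a standard OpenAI-style chat-completions request body. A minimal sketch of assembling that payload (the prompt and parameter values are illustrative, not recommendations):

```python
import json

def chat_request_body(prompt, temperature=0.2, top_p=0.9,
                      max_tokens=512, stop=None, stream=False):
    """Assemble a chat-completions payload using the supported parameters."""
    body = {
        "model": "geniuspro-coder-v1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # 0.0 = deterministic, 1.0 = creative
        "top_p": top_p,              # nucleus sampling threshold
        "max_tokens": max_tokens,    # cap on generated tokens
        "stream": stream,            # True => server-sent events
    }
    if stop:
        body["stop"] = stop          # list of stop sequences
    return json.dumps(body)

payload = chat_request_body("Explain list comprehensions", stop=["\n\n"])
```

The resulting JSON string is what you would send as the body of a `POST /v1/chat/completions` request.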

## Available Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completions (streaming and non-streaming) |
| `/v1/voice` | WebSocket | Real-time voice interaction |
| `/health` | GET | Health check (no auth required) |
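When `stream` is enabled, `/v1/chat/completions` delivers tokens as server-sent events: each chunk arrives as a `data: {...}` line and the stream ends with `data: [DONE]`. Assuming the standard OpenAI streaming chunk shape (a `delta` object per choice), a small parser sketch:

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one SSE line, or None for non-content lines."""
    line = line.strip()
    if not line.startswith("data: "):
        return None              # blank keep-alive lines, comments
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None              # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    return delta.get("content")  # may be absent for role-only deltas

# Example: reassemble a streamed reply from raw SSE lines
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(t for t in (parse_sse_line(l) for l in lines) if t)
# text == "Hello, world"
```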

## Running Locally with Ollama

```bash
# Pull the model
ollama pull geniuspro-coder-v1

# Run interactively
ollama run geniuspro-coder-v1

# Serve via API
ollama serve
```

Once the server is running, the model is available at `http://localhost:11434` with the same OpenAI-compatible API format.
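As a sketch of calling the local server, the snippet below builds a non-streaming chat-completion request against Ollama's OpenAI-compatible endpoint using only the standard library (the request is only actually sent when `ollama serve` is running, so that part is left as a comment):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint on the default port
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "geniuspro-coder-v1") -> urllib.request.Request:
    """Build a POST request carrying a minimal chat-completions payload."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send the request and read the reply:
# with urllib.request.urlopen(build_request("Write a Python hello world")) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```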

## Infrastructure

GeniusPro Coder v1 runs on dedicated hardware for low-latency inference:

- GPU: NVIDIA RTX 5090 (32 GB VRAM)
- Runtime: Ollama for model serving
- Gateway: FastAPI reverse proxy with authentication, rate limiting, and usage tracking
- Deployment: Ubuntu Server behind Nginx and a Cloudflare Tunnel

## Limitations

- Optimized for English; other languages may work but are not officially supported.
- Code generation quality varies by language, and is strongest in Python, JavaScript/TypeScript, and common web technologies.
- Not suitable for safety-critical applications without human review.
- Context window and output length are bounded by the underlying architecture.

## License

This model is released under the Apache 2.0 license.