---
license: apache-2.0
language:
  - en
tags:
  - code
  - assistant
  - ollama
  - openai-compatible
  - streaming
  - voice
pipeline_tag: text-generation
inference: false
---

# GeniusPro Coder v1

GeniusPro Coder v1 is a coding-focused assistant model built for intelligent code generation, code explanation, and general-purpose AI assistance.

## Highlights

- Code generation, debugging, and explanation across multiple languages
- Natural conversational ability for non-code tasks
- OpenAI-compatible API (drop-in replacement for existing tooling)
- Streaming support for real-time token delivery
- Voice mode with concise, spoken-friendly responses
- Runs locally on consumer hardware via Ollama

## Intended Use

GeniusPro Coder v1 is designed for:

- Code assistance: generating, reviewing, debugging, and explaining code
- Chat: general-purpose question answering and conversation
- Voice interaction: concise, natural-language responses optimized for text-to-speech

It powers the GeniusPro platform, which includes a web-based chat dashboard and a real-time voice assistant.

## Supported Parameters

| Parameter | Description |
|---|---|
| `temperature` | Controls randomness (0.0 = deterministic, 1.0 = creative) |
| `top_p` | Nucleus sampling threshold |
| `max_tokens` | Maximum number of tokens to generate |
| `stop` | Stop sequences |
| `stream` | Enable streaming responses (server-sent events) |
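Taken together, these parameters map onto a standard OpenAI-style chat-completions request body. A minimal sketch of assembling that payload (the prompt and parameter values are illustrative, not recommendations):

```python
import json

def chat_request_body(prompt, temperature=0.2, top_p=0.9,
                      max_tokens=512, stop=None, stream=False):
    """Assemble a chat-completions payload using the supported parameters."""
    body = {
        "model": "geniuspro-coder-v1",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # 0.0 = deterministic, 1.0 = creative
        "top_p": top_p,              # nucleus sampling threshold
        "max_tokens": max_tokens,    # cap on generated tokens
        "stream": stream,            # True => server-sent events
    }
    if stop:
        body["stop"] = stop          # list of stop sequences
    return json.dumps(body)

payload = chat_request_body("Explain list comprehensions", stop=["\n\n"])
```

The resulting JSON string is what you would send as the body of a `POST /v1/chat/completions` request.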

## Available Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/v1/models` | GET | List available models |
| `/v1/chat/completions` | POST | Chat completions (streaming and non-streaming) |
| `/v1/voice` | WebSocket | Real-time voice interaction |
| `/health` | GET | Health check (no auth required) |
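When `stream` is enabled, `/v1/chat/completions` delivers tokens as server-sent events: each chunk arrives as a `data: {...}` line and the stream ends with `data: [DONE]`. Assuming the standard OpenAI streaming chunk shape (a `delta` object per choice), a small parser sketch:

```python
import json

def parse_sse_line(line: str):
    """Extract the delta text from one SSE line, or None for non-content lines."""
    line = line.strip()
    if not line.startswith("data: "):
        return None              # blank keep-alive lines, comments
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None              # end-of-stream sentinel
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    return delta.get("content")  # may be absent for role-only deltas

# Example: reassemble a streamed reply from raw SSE lines
lines = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(t for t in (parse_sse_line(l) for l in lines) if t)
# text == "Hello, world"
```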

## Running Locally with Ollama

```bash
# Pull the model
ollama pull geniuspro-coder-v1

# Run interactively
ollama run geniuspro-coder-v1

# Serve via API
ollama serve
```

Once the server is running, the model is available at `http://localhost:11434` with the same OpenAI-compatible API format.
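As a sketch of calling the local server, the snippet below builds a non-streaming chat-completion request against Ollama's OpenAI-compatible endpoint using only the standard library (the request is only actually sent when `ollama serve` is running, so that part is left as a comment):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint on the default port
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "geniuspro-coder-v1") -> urllib.request.Request:
    """Build a POST request carrying a minimal chat-completions payload."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With the server running, send the request and read the reply:
# with urllib.request.urlopen(build_request("Write a Python hello world")) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```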

## Infrastructure

GeniusPro Coder v1 runs on dedicated hardware for low-latency inference:

- GPU: NVIDIA RTX 5090 (32 GB VRAM)
- Runtime: Ollama for model serving
- Gateway: FastAPI reverse proxy with authentication, rate limiting, and usage tracking
- Deployment: Ubuntu Server behind Nginx and a Cloudflare Tunnel

## Limitations

- Optimized for English; other languages may work but are not officially supported.
- Code generation quality varies by language, and is strongest in Python, JavaScript/TypeScript, and common web technologies.
- Not suitable for safety-critical applications without human review.
- Context window and output length are bounded by the underlying architecture.

## License

This model is released under the Apache 2.0 license.