carbon-tokenization

tfrere (HF Staff) and Cursor committed 22 days ago

Commit 3afbbdf

1 Parent(s): e9c2c73

fix(agent): default back to gpt-oss-120b, pin Llama 3.3 to Together

Groq (HF Router's :fastest pick for Llama 3.3 70B) validates tool calls
very strictly and was rejecting the editor's 18-tool registry with
"Failed to call a function. Please adjust your prompt." Fireworks has
deprecated the Llama 3.3 checkpoint; Nscale doesn't support the `tools`
parameter at all. Together serves the same model with full
tool-calling support, so pin Llama 3.3 to `:together`.

Restore openai/gpt-oss-120b as the default - it's HF's trending
tool-calling model and the reasoning-strip fix already covers the
Cerebras round-trip case.

Co-authored-by: Cursor <cursoragent@cursor.com>
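
The `:<provider>` suffix convention described in the commit message can be illustrated with a small sketch. The helper names `parseModelId` and `pinProvider` are hypothetical and do not appear in the commit; the sketch assumes the provider is whatever follows the last colon in the model id.

```typescript
// Hypothetical helpers for illustration only; not part of this commit.
interface ParsedModelId {
  model: string;           // e.g. "meta-llama/Llama-3.3-70B-Instruct"
  provider: string | null; // e.g. "together", or null when the id is unsuffixed
}

// Split "org/model:provider" into its two halves.
// Assumption: the provider suffix is everything after the last ":".
function parseModelId(id: string): ParsedModelId {
  const idx = id.lastIndexOf(":");
  if (idx === -1) {
    return { model: id, provider: null };
  }
  const provider = id.slice(idx + 1);
  return { model: id.slice(0, idx), provider: provider === "" ? null : provider };
}

// Replace (or add) the provider suffix on a model id.
function pinProvider(id: string, provider: string): string {
  return `${parseModelId(id).model}:${provider}`;
}

const pinned = pinProvider("meta-llama/Llama-3.3-70B-Instruct", "together");
console.log(pinned);
console.log(parseModelId("openai/gpt-oss-120b"));
```

An unsuffixed id such as `openai/gpt-oss-120b` parses to a `null` provider, which matches the commit's statement that unsuffixed ids fall back to the router's default `:fastest` policy.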

Files changed (4)

backend/.env.example +7 -2
backend/src/agent/chat.ts +13 -1
backend/src/agent/stream-handler.ts +7 -1
docs/SPECIFICATION.md +2 -2

backend/.env.example CHANGED

@@ -87,8 +87,13 @@ OAUTH_CLIENT_SECRET=
 # Override the default model id used by the chat agent. The list of
 # supported models is in backend/src/agent/chat.ts (AVAILABLE_MODELS), but
 # any model exposed by HF Inference Providers with tool-calling support
-# works. Defaults to "meta-llama/Llama-3.3-70B-Instruct".
-# HF_INFERENCE_MODEL=meta-llama/Llama-3.3-70B-Instruct
 # -----------------------------------------------------------------------------
 # Publishing

 # Override the default model id used by the chat agent. The list of
 # supported models is in backend/src/agent/chat.ts (AVAILABLE_MODELS), but
 # any model exposed by HF Inference Providers with tool-calling support
+# works. Defaults to "openai/gpt-oss-120b".
+# You can also pin a specific provider by appending ":<provider>" to the
+# model id, e.g. "meta-llama/Llama-3.3-70B-Instruct:together" - useful
+# to avoid providers (such as Groq or Nscale) that reject the editor's
+# wide tool registry with "Failed to call a function" or
+# "tools parameter not supported".
+# HF_INFERENCE_MODEL=openai/gpt-oss-120b
 # -----------------------------------------------------------------------------
 # Publishing
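
As a rough illustration of how the `HF_INFERENCE_MODEL` override documented above might be resolved, here is a minimal sketch. `resolveModel` is a hypothetical helper, not code from this commit; it takes the environment as a plain object so the example has no Node dependency, and it assumes a blank value should be treated as unset.

```typescript
// Mirrors the new DEFAULT_MODEL value introduced in this commit.
const DEFAULT_MODEL = "openai/gpt-oss-120b";

// Hypothetical helper: pick the override if present, else the default.
// Assumption: whitespace-only values count as "not set".
function resolveModel(env: Record<string, string | undefined>): string {
  const override = env["HF_INFERENCE_MODEL"]?.trim();
  return override ? override : DEFAULT_MODEL;
}

// A provider-pinned id passes through unchanged.
console.log(resolveModel({ HF_INFERENCE_MODEL: "meta-llama/Llama-3.3-70B-Instruct:together" }));
// With nothing set, the default applies.
console.log(resolveModel({}));
```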

backend/src/agent/chat.ts CHANGED

@@ -8,6 +8,18 @@ import type { Request, Response } from "express";
  * Face Inference Providers (`https://router.huggingface.co/v1`) and
  * support function/tool calling - the agent loop won't work without it.
  *
  * Discover more conversational models here:
  *   https://huggingface.co/models?inference_provider=all&other=conversational
  *
@@ -16,9 +28,9 @@ import type { Request, Response } from "express";
  * own rates, see the docs for the source of truth.
  */
 export const AVAILABLE_MODELS = [
-  { id: "meta-llama/Llama-3.3-70B-Instruct", label: "Llama 3.3 70B", context: "128K", cost: "$" },
   { id: "openai/gpt-oss-120b", label: "GPT-OSS 120B", context: "131K", cost: "$$" },
   { id: "openai/gpt-oss-20b", label: "GPT-OSS 20B", context: "131K", cost: "$" },
   { id: "Qwen/Qwen3-Coder-480B-A35B-Instruct", label: "Qwen3 Coder 480B", context: "262K", cost: "$$" },
   { id: "deepseek-ai/DeepSeek-V3.1", label: "DeepSeek V3.1", context: "128K", cost: "$$" },
 ];

  * Face Inference Providers (`https://router.huggingface.co/v1`) and
  * support function/tool calling - the agent loop won't work without it.
  *
+ * Note about provider suffixes (`:provider`):
+ * HF Router defaults to the `:fastest` provider for a given model.
+ * That's usually fine, but a few providers don't fit the editor's
+ * workload:
+ *  - Groq enforces strict tool-call validation and tends to reject
+ *    our 18-tool registry with `Failed to call a function`.
+ *  - Nscale + a few others reject the `tools` parameter outright.
+ *  - Fireworks has deprecated several Llama 3.x checkpoints.
+ * We pin `Llama-3.3-70B` to Together, which serves the model with
+ * full tool-calling support. Unsuffixed ids use the default :fastest
+ * policy.
+ *
  * Discover more conversational models here:
  *   https://huggingface.co/models?inference_provider=all&other=conversational
  *
  * own rates, see the docs for the source of truth.
  */
 export const AVAILABLE_MODELS = [
   { id: "openai/gpt-oss-120b", label: "GPT-OSS 120B", context: "131K", cost: "$$" },
   { id: "openai/gpt-oss-20b", label: "GPT-OSS 20B", context: "131K", cost: "$" },
+  { id: "meta-llama/Llama-3.3-70B-Instruct:together", label: "Llama 3.3 70B", context: "128K", cost: "$" },
   { id: "Qwen/Qwen3-Coder-480B-A35B-Instruct", label: "Qwen3 Coder 480B", context: "262K", cost: "$$" },
   { id: "deepseek-ai/DeepSeek-V3.1", label: "DeepSeek V3.1", context: "128K", cost: "$$" },
 ];

backend/src/agent/stream-handler.ts CHANGED

@@ -4,7 +4,13 @@ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
 import type { Request, Response } from "express";
 import { extractToken } from "../auth.js";
-export const DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct";
 /**
  * Hugging Face Inference Providers exposes an OpenAI-compatible chat

 import type { Request, Response } from "express";
 import { extractToken } from "../auth.js";
+/**
+ * `openai/gpt-oss-120b` is HF's trending tool-calling model with a
+ * strong reputation on the editor's 18-tool agent loop. The `:fastest`
+ * policy routes it to Cerebras, which we've validated end-to-end
+ * (multi-turn, with the reasoning-strip below).
+ */
+export const DEFAULT_MODEL = "openai/gpt-oss-120b";
 /**
  * Hugging Face Inference Providers exposes an OpenAI-compatible chat

docs/SPECIFICATION.md CHANGED

@@ -127,7 +127,7 @@ flowchart LR
 ### 4.6 AI Agent
-- Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `meta-llama/Llama-3.3-70B-Instruct`
 - Auth: per-request bearer token resolved from the editor's OAuth cookie when available, falling back to the server-side `HF_TOKEN`. On a HF Space with `inference-api` scope, no extra secret is needed - the logged-in user pays for their own inference under their HF quota.
 - Streaming via Vercel AI SDK `streamText` over `@ai-sdk/openai-compatible`
 - Reasoning parts from prior assistant turns are stripped before re-sending the history: providers like Cerebras reject `reasoning_content` on round-trip, and the model doesn't need to see its own past reasoning to continue the conversation.
@@ -283,7 +283,7 @@ The publisher reads these same CSS files server-side and injects them inline int
 | `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes. Add `manage-repos` for dataset persistence and `inference-api` to power AI features with the user's token |
 | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
 | `HF_TOKEN` | For AI chat in local dev | Fallback Hub token for HF API + Inference Providers. Needs the "Make calls to Inference Providers" permission |
-| `HF_INFERENCE_MODEL` | No (default `meta-llama/Llama-3.3-70B-Instruct`) | Default chat-completion model id served by HF Inference Providers |
 | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
 ### 6.4 Local Development

 ### 4.6 AI Agent
+- Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `openai/gpt-oss-120b`. Model ids may be suffixed with `:<provider>` (e.g. `meta-llama/Llama-3.3-70B-Instruct:together`) to bypass providers that enforce overly strict tool-call validation (notably Groq) or that don't support the `tools` parameter (Nscale, etc.).
 - Auth: per-request bearer token resolved from the editor's OAuth cookie when available, falling back to the server-side `HF_TOKEN`. On a HF Space with `inference-api` scope, no extra secret is needed - the logged-in user pays for their own inference under their HF quota.
 - Streaming via Vercel AI SDK `streamText` over `@ai-sdk/openai-compatible`
 - Reasoning parts from prior assistant turns are stripped before re-sending the history: providers like Cerebras reject `reasoning_content` on round-trip, and the model doesn't need to see its own past reasoning to continue the conversation.
 | `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes. Add `manage-repos` for dataset persistence and `inference-api` to power AI features with the user's token |
 | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
 | `HF_TOKEN` | For AI chat in local dev | Fallback Hub token for HF API + Inference Providers. Needs the "Make calls to Inference Providers" permission |
+| `HF_INFERENCE_MODEL` | No (default `openai/gpt-oss-120b`) | Default chat-completion model id served by HF Inference Providers. May be suffixed with `:<provider>` to pin a specific routing |
 | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |
 ### 6.4 Local Development
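
The reasoning-strip step described in section 4.6 of the specification diff can be sketched as follows. The message and part shapes here are simplified, hypothetical types rather than the Vercel AI SDK's own, and `stripReasoning` is an illustrative name, not the function in `stream-handler.ts`.

```typescript
// Simplified, hypothetical message shapes for illustration only.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool-call"; toolName: string; input: unknown };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | Part[];
}

// Return a copy of the history with reasoning parts removed from
// assistant turns. Other roles and plain-string content pass through.
function stripReasoning(history: ChatMessage[]): ChatMessage[] {
  return history.map((msg) => {
    if (msg.role !== "assistant" || typeof msg.content === "string") {
      return msg;
    }
    return {
      ...msg,
      content: msg.content.filter((part) => part.type !== "reasoning"),
    };
  });
}

const history: ChatMessage[] = [
  { role: "user", content: "Rename the title." },
  {
    role: "assistant",
    content: [
      { type: "reasoning", text: "The user wants a new title." },
      { type: "text", text: "Done." },
    ],
  },
];

console.log(JSON.stringify(stripReasoning(history)));
```

The function builds new message objects instead of mutating the input, so the full history, reasoning included, stays available to the caller for display or logging.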