tfrere HF Staff Cursor committed on
Commit
3afbbdf
·
1 Parent(s): e9c2c73

fix(agent): default back to gpt-oss-120b, pin Llama 3.3 to Together

Groq (HF Router's :fastest pick for Llama 3.3 70B) validates tool calls
very strictly and was rejecting the editor's 18-tool registry with
"Failed to call a function. Please adjust your prompt." Fireworks has
deprecated the Llama 3.3 checkpoint; Nscale doesn't support the `tools`
parameter at all. Together serves the same model with full
tool-calling support, so pin Llama 3.3 to `:together`.

Restore openai/gpt-oss-120b as the default - it's HF's trending
tool-calling model and the reasoning-strip fix already covers the
Cerebras round-trip case.

Co-authored-by: Cursor <cursoragent@cursor.com>

backend/.env.example CHANGED
@@ -87,8 +87,13 @@ OAUTH_CLIENT_SECRET=
  # Override the default model id used by the chat agent. The list of
  # supported models is in backend/src/agent/chat.ts (AVAILABLE_MODELS), but
  # any model exposed by HF Inference Providers with tool-calling support
- # works. Defaults to "meta-llama/Llama-3.3-70B-Instruct".
- # HF_INFERENCE_MODEL=meta-llama/Llama-3.3-70B-Instruct
+ # works. Defaults to "openai/gpt-oss-120b".
+ # You can also pin a specific provider by appending ":<provider>" to the
+ # model id, e.g. "meta-llama/Llama-3.3-70B-Instruct:together" - useful
+ # to avoid providers (such as Groq or Nscale) that reject the editor's
+ # wide tool registry with "Failed to call a function" or
+ # "tools parameter not supported".
+ # HF_INFERENCE_MODEL=openai/gpt-oss-120b

  # -----------------------------------------------------------------------------
  # Publishing
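The commit does not show how `HF_INFERENCE_MODEL` is read at runtime. As a minimal sketch of the override-with-fallback behaviour described in the comment above, assuming a hypothetical `resolveModel` helper that receives the environment as a plain object (neither the helper nor its signature is from the repository):

```typescript
// Default taken from the commit; everything else here is illustrative.
const DEFAULT_MODEL = "openai/gpt-oss-120b";

// Hypothetical helper: returns the HF_INFERENCE_MODEL override when it is
// set to a non-blank value, otherwise the default model id. A provider
// suffix such as ":together" is passed through untouched.
function resolveModel(env: Record<string, string | undefined>): string {
  const override = env["HF_INFERENCE_MODEL"]?.trim();
  return override ? override : DEFAULT_MODEL;
}
```

In a Node backend this would presumably be called as `resolveModel(process.env)`. Taking the environment as a parameter keeps the sketch testable without touching global state.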
backend/src/agent/chat.ts CHANGED
@@ -8,6 +8,18 @@ import type { Request, Response } from "express";
  * Face Inference Providers (`https://router.huggingface.co/v1`) and
  * support function/tool calling - the agent loop won't work without it.
  *
+ * Note about provider suffixes (`:provider`):
+ * HF Router defaults to the `:fastest` provider for a given model.
+ * That's usually fine, but a few providers don't fit the editor's
+ * workload:
+ * - Groq enforces strict tool-call validation and tends to reject
+ * our 18-tool registry with `Failed to call a function`.
+ * - Nscale + a few others reject the `tools` parameter outright.
+ * - Fireworks has deprecated several Llama 3.x checkpoints.
+ * We pin `Llama-3.3-70B` to Together, which serves the model with
+ * full tool-calling support. Unsuffixed ids use the default :fastest
+ * policy.
+ *
  * Discover more conversational models here:
  * https://huggingface.co/models?inference_provider=all&other=conversational
  *
@@ -16,9 +28,9 @@ import type { Request, Response } from "express";
  * own rates, see the docs for the source of truth.
  */
  export const AVAILABLE_MODELS = [
- { id: "meta-llama/Llama-3.3-70B-Instruct", label: "Llama 3.3 70B", context: "128K", cost: "$" },
  { id: "openai/gpt-oss-120b", label: "GPT-OSS 120B", context: "131K", cost: "$$" },
  { id: "openai/gpt-oss-20b", label: "GPT-OSS 20B", context: "131K", cost: "$" },
+ { id: "meta-llama/Llama-3.3-70B-Instruct:together", label: "Llama 3.3 70B", context: "128K", cost: "$" },
  { id: "Qwen/Qwen3-Coder-480B-A35B-Instruct", label: "Qwen3 Coder 480B", context: "262K", cost: "$$" },
  { id: "deepseek-ai/DeepSeek-V3.1", label: "DeepSeek V3.1", context: "128K", cost: "$$" },
  ];
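To make the `:<provider>` suffix convention concrete, here is a small sketch of how a suffixed id could be split and re-pinned. `parseModelId` and `pinProvider` are hypothetical names introduced for illustration. The commit itself simply hard-codes the suffixed string, and the HF Router interprets the suffix server-side.

```typescript
// Illustrative only: these helpers are not part of the commit.
interface ParsedModelId {
  model: string;           // e.g. "meta-llama/Llama-3.3-70B-Instruct"
  provider: string | null; // e.g. "together", or null when unsuffixed
}

// Split "<org>/<model>:<provider>" on the last colon. Ids with no colon,
// or with a colon in first or last position, are returned unchanged with
// provider = null.
function parseModelId(id: string): ParsedModelId {
  const idx = id.lastIndexOf(":");
  if (idx <= 0 || idx === id.length - 1) {
    return { model: id, provider: null };
  }
  return { model: id.slice(0, idx), provider: id.slice(idx + 1) };
}

// Replace (or add) the provider suffix on a model id.
function pinProvider(id: string, provider: string): string {
  return `${parseModelId(id).model}:${provider}`;
}
```

Under these assumptions, `pinProvider("meta-llama/Llama-3.3-70B-Instruct", "together")` yields the id that the commit adds to `AVAILABLE_MODELS`, while an unsuffixed id such as `openai/gpt-oss-120b` parses with `provider === null` and is left to the router's default policy.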
backend/src/agent/stream-handler.ts CHANGED
@@ -4,7 +4,13 @@ import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
  import type { Request, Response } from "express";
  import { extractToken } from "../auth.js";

- export const DEFAULT_MODEL = "meta-llama/Llama-3.3-70B-Instruct";
+ /**
+ * `openai/gpt-oss-120b` is HF's trending tool-calling model with a
+ * strong reputation on the editor's 18-tool agent loop. The `:fastest`
+ * policy routes it to Cerebras, which we've validated end-to-end
+ * (multi-turn, with the reasoning-strip below).
+ */
+ export const DEFAULT_MODEL = "openai/gpt-oss-120b";

  /**
  * Hugging Face Inference Providers exposes an OpenAI-compatible chat
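The "reasoning-strip" mentioned in the comment above is not part of this diff. As a rough sketch of the idea (dropping reasoning parts from earlier assistant turns before the history is re-sent), assuming a simplified message shape that only approximates what the AI SDK uses; the `Part`, `ChatMessage`, and `stripReasoning` names are hypothetical:

```typescript
// Simplified, assumed message shape; the real handler may differ.
type Part =
  | { type: "text"; text: string }
  | { type: "reasoning"; text: string }
  | { type: "tool-call"; toolName: string; input: unknown };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string | Part[];
}

// Return a copy of the history in which assistant messages no longer carry
// reasoning parts. Other messages are returned as-is, and the input array
// and its messages are not mutated.
function stripReasoning(history: ChatMessage[]): ChatMessage[] {
  return history.map((msg): ChatMessage => {
    if (msg.role !== "assistant" || typeof msg.content === "string") {
      return msg;
    }
    return {
      ...msg,
      content: msg.content.filter((part) => part.type !== "reasoning"),
    };
  });
}
```

Filtering on the way out, rather than when a response is stored, would leave the full transcript available for display while keeping the re-sent payload free of content that some providers reject on round-trip.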
docs/SPECIFICATION.md CHANGED
@@ -127,7 +127,7 @@ flowchart LR

  ### 4.6 AI Agent

- - Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `meta-llama/Llama-3.3-70B-Instruct`
+ - Provider: Hugging Face Inference Providers (`https://router.huggingface.co/v1`), default model `openai/gpt-oss-120b`. Model ids may be suffixed with `:<provider>` (e.g. `meta-llama/Llama-3.3-70B-Instruct:together`) to bypass providers that enforce overly strict tool-call validation (notably Groq) or that don't support the `tools` parameter (Nscale, etc.).
  - Auth: per-request bearer token resolved from the editor's OAuth cookie when available, falling back to the server-side `HF_TOKEN`. On a HF Space with `inference-api` scope, no extra secret is needed - the logged-in user pays for their own inference under their HF quota.
  - Streaming via Vercel AI SDK `streamText` over `@ai-sdk/openai-compatible`
  - Reasoning parts from prior assistant turns are stripped before re-sending the history: providers like Cerebras reject `reasoning_content` on round-trip, and the model doesn't need to see its own past reasoning to continue the conversation.
@@ -283,7 +283,7 @@ The publisher reads these same CSS files server-side and injects them inline int
  | `OAUTH_SCOPES` | No (default `openid profile`) | OAuth scopes. Add `manage-repos` for dataset persistence and `inference-api` to power AI features with the user's token |
  | `HF_DATASET_ID` | No | Override dataset name (default: `{SPACE_ID}-data`) |
  | `HF_TOKEN` | For AI chat in local dev | Fallback Hub token for HF API + Inference Providers. Needs the "Make calls to Inference Providers" permission |
- | `HF_INFERENCE_MODEL` | No (default `meta-llama/Llama-3.3-70B-Instruct`) | Default chat-completion model id served by HF Inference Providers |
+ | `HF_INFERENCE_MODEL` | No (default `openai/gpt-oss-120b`) | Default chat-completion model id served by HF Inference Providers. May be suffixed with `:<provider>` to pin a specific routing |
  | `ENABLE_PDF` | No (default true) | Toggle PDF/thumbnail generation |

  ### 6.4 Local Development