AdithyaSK (HF Staff) and Claude Opus 4.7 (1M context) committed
Commit a301de7 · 1 Parent(s): 3949fb1

Port Harbor Visualiser from Gradio to FastAPI + Hugging Face theme

- FastAPI backend (Docker Space) replacing the Gradio app
- Hugging Face themed SPA: dark slate + yellow/orange, 🤗 logo
- Browse Harbor-tagged HF datasets live (other=harbor), no stale cache
- Large datasets list via shallow Hub tree listing; per-task lazy fetch
(2k-task datasets list in ~2s instead of bulk-downloading the repo)
- Task master-detail view: collapsible task side-panel, in-place switching
- Per-task copy-able `harbor run` command
- Deep-link + dataset-card badge; real Space URL via $SPACE_HOST

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
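The deep-link bullet can be illustrated with a minimal sketch. It makes two assumptions. First, the viewer still accepts the `?dataset=` query parameter, the form shown in the README hunk header and the old Gradio docstring. Second, `$SPACE_HOST` holds a bare host such as `owner-name.hf.space`, as the `/api/config` docstring describes. `deep_link` and its localhost fallback are hypothetical and are not the SPA's real link builder.

```python
from __future__ import annotations

import os
from urllib.parse import quote, urlencode


def deep_link(dataset_uri: str, space_host: str | None = None) -> str:
    """Build a viewer URL that pre-fills the dataset via `?dataset=`.

    Hypothetical helper: `space_host` defaults to $SPACE_HOST. When neither
    is set, it falls back to the local uvicorn address used in the README.
    """
    host = space_host if space_host is not None else os.environ.get("SPACE_HOST")
    base = f"https://{host}" if host else "http://127.0.0.1:7860"
    # Leave '/', ':' and '@' unescaped so URIs like harbor://org/name@tag stay readable.
    query = urlencode({"dataset": dataset_uri}, quote_via=quote, safe="/:@")
    return f"{base}/?{query}"
```

For example, `deep_link("owner/name", "owner-name.hf.space")` returns `https://owner-name.hf.space/?dataset=owner/name`. Characters such as spaces are still percent-encoded.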

Files changed (10)
  1. Dockerfile +20 -0
  2. README.md +14 -14
  3. app.py +124 -547
  4. requirements.txt +2 -2
  5. static/app.js +515 -0
  6. static/index.html +35 -0
  7. static/style.css +285 -0
  8. viewer/__init__.py +2 -1
  9. viewer/hub.py +121 -0
  10. viewer/load.py +55 -4
Dockerfile ADDED
@@ -0,0 +1,20 @@
+ # Hugging Face Docker Space — FastAPI Harbor Visualiser.
+ FROM python:3.11-slim
+
+ # git: needed for gh:// dataset clones. (harbor CLI installs via pip for harbor://.)
+ RUN apt-get update && apt-get install -y --no-install-recommends git \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH" \
+     HARBOR_VIEWER_CACHE=/tmp/.harbor-viewer-cache
+ WORKDIR /app
+
+ COPY --chown=user requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY --chown=user . .
+
+ # HF routes public traffic to app_port (7860, set in README.md frontmatter).
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,21 +1,20 @@
  ---
- title: Harbor Visualiser
- emoji: 🔭
- colorFrom: gray
- colorTo: gray
- sdk: gradio
- sdk_version: "6.14.0"
- app_file: app.py
+ title: Hugging Face Harbor Visualiser
+ emoji: 🤗
+ colorFrom: yellow
+ colorTo: orange
+ sdk: docker
+ app_port: 7860
  pinned: false
  license: apache-2.0
- short_description: Browse Harbor task specs from HF, GitHub, or local
+ short_description: Browse Harbor task specs from HF Hub, GitHub, or local
  ---

- # Harbor Visualiser
+ # 🤗 Hugging Face Harbor Visualiser

- A tiny Gradio Space for browsing [Harbor](https://www.harborframework.com/) task spec directories — the dataset format used by Harbor for agent evaluation + RL environments.
+ A FastAPI Space for browsing [Harbor](https://www.harborframework.com/) task spec directories — the dataset format used by Harbor for agent evaluation + RL environments.

- Drop in a Hugging Face dataset id, a GitHub repo, or a local Harbor dataset directory; the viewer renders every task's metadata, instruction, oracle patch, test script, and Dockerfile side-by-side.
+ Drop in a Hugging Face dataset id, a GitHub repo, or a local Harbor dataset directory; the viewer renders every task's metadata, instruction, oracle patch, test script, and Dockerfile side-by-side. Large datasets (2k+ tasks) list and open instantly — task ids come from a shallow Hub listing and only the opened task's files are fetched, so nothing is bulk-downloaded.

  ## Use it

@@ -41,7 +40,7 @@ https://huggingface.co/spaces/AdithyaSK/harbor-visualiser?dataset=<owner>/<datas

  ```bash
  pip install -r requirements.txt
- python app.py
+ uvicorn app:app --port 7860
  # → http://127.0.0.1:7860
  ```

@@ -85,7 +84,8 @@ Either of these:

  ## Stack

- - [Gradio 5](https://www.gradio.app/) — UI
- - [huggingface_hub](https://github.com/huggingface/huggingface_hub) HF dataset download
+ - [FastAPI](https://fastapi.tiangolo.com/) + [uvicorn](https://www.uvicorn.org/) — server
+ - Vanilla-JS single-page UI (hash-routed) with a Hugging Face theme
+ - [huggingface_hub](https://github.com/huggingface/huggingface_hub) — Hub listing + per-task download
  - `git` (system binary) — GitHub clone
  - Python stdlib `tomllib` — task.toml parsing
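The updated README says that task ids come from a shallow Hub listing and that only the opened task's files are fetched. Below is a minimal sketch of that idea. It works on a flat list of repo paths, for example the output of `HfApi.list_repo_files(repo_id, repo_type="dataset")`, the call the `/api/hub/count` docstring mentions. It assumes the two layouts the old loader searched, `<id>/task.toml` and `tasks/<id>/task.toml`. `task_ids_from_listing`, `task_allow_patterns` and `select_task_files` are hypothetical names. They are not the contents of `viewer/hub.py`, which this diff does not show.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def task_ids_from_listing(paths: list[str]) -> list[str]:
    """Derive task ids from flat repo paths, without downloading anything.

    Assumed layouts: <id>/task.toml and tasks/<id>/task.toml.
    """
    ids: set[str] = set()
    for path in paths:
        parts = path.split("/")
        if len(parts) == 2 and parts[1] == "task.toml":
            ids.add(parts[0])  # <id>/task.toml
        elif len(parts) == 3 and parts[0] == "tasks" and parts[2] == "task.toml":
            ids.add(parts[1])  # tasks/<id>/task.toml
    return sorted(ids)


def task_allow_patterns(task_id: str, paths: list[str]) -> list[str]:
    """Glob patterns that select one task's directory and nothing else."""
    if f"tasks/{task_id}/task.toml" in paths:
        return [f"tasks/{task_id}/*"]
    return [f"{task_id}/*"]


def select_task_files(task_id: str, paths: list[str]) -> list[str]:
    """Apply the patterns locally; fnmatch's '*' also matches '/', so nested files are kept."""
    patterns = task_allow_patterns(task_id, paths)
    return [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]
```

The patterns could then be passed to `huggingface_hub.snapshot_download(repo_id, repo_type="dataset", allow_patterns=...)`, which downloads only the matching files. Whether `fetch_hf_task` works this way is an assumption.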
app.py CHANGED
@@ -1,17 +1,20 @@
- """Harbor Visualiser — a Gradio Space for browsing Harbor task specs.
+ """Harbor Visualiser — FastAPI backend + Harbor Hub UI.
+
+ Serves a single-page "Harbor Hub" themed UI (static/) plus a JSON API that
+ reuses the existing loader/parser:
+
+ GET / → the SPA (static/index.html)
+ GET /api/hub/datasets → live list of Harbor-tagged HF datasets
+ GET /api/hub/count?id= → task count for one Hub dataset (memoised)
+ GET /api/dataset?uri= → fetch a dataset, return its task ids + meta
+ GET /api/task?uri=&task= → one task's parsed spec (files + metadata)
+ GET /healthz

  Run locally:
  pip install -r requirements.txt
- python app.py
-
- Or deploy to a Hugging Face Space — the `README.md` frontmatter pins
- `sdk: gradio` and `app_file: app.py`, so the Space picks this up directly.
+ uvicorn app:app --reload --port 7860 # → http://127.0.0.1:7860

- URL prefill:
- https://<space>/?dataset=owner/name
- https://<space>/?dataset=harbor://org/name@tag
- https://<space>/?dataset=gh://owner/repo
- https://<space>/?d=owner/name (short alias)
+ On a Hugging Face Docker Space it runs via the Dockerfile (uvicorn :7860).
  """

  from __future__ import annotations
@@ -19,567 +22,141 @@ from __future__ import annotations
  import logging
  from pathlib import Path

- import gradio as gr
+ from fastapi import FastAPI, HTTPException, Query
+ from fastapi.responses import FileResponse, JSONResponse
+ from fastapi.staticfiles import StaticFiles

- from viewer import (
-     HarborTask,
-     fetch_dataset,
-     list_tasks,
-     load_task,
-     parse_dataset_uri,
- )
+ from viewer import fetch_dataset, fetch_hf_task, list_tasks, load_task, parse_dataset_uri
+ from viewer.hub import count_tasks, list_harbor_datasets, list_hf_tasks

  logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s %(message)s")
  logger = logging.getLogger("harbor-visualiser")

+ HERE = Path(__file__).resolve().parent
+ STATIC = HERE / "static"

- # ---------------------------------------------------------------------------
- # File-tree definition + helpers
- # ---------------------------------------------------------------------------
-
- # Virtual entry id for the metadata overview (not a real file).
- _OVERVIEW = "__overview__"
-
- # Folder pseudo-ids — selecting one routes to its first present child.
- _FOLDER_PREFIX = "__folder__"
-
-
- # Map suffix → Gradio Code language token. Everything else falls back to "shell".
- _EXTENSION_LANGUAGE: dict[str, str] = {
-     ".toml": "yaml",  # Gradio Prism has no TOML; YAML is the closest fit
-     ".diff": "python",  # closest available; +/- lines render fine
-     ".patch": "python",
-     ".sh": "shell",
-     ".bash": "shell",
-     ".py": "python",
-     ".json": "json",
-     ".yaml": "yaml",
-     ".yml": "yaml",
-     ".md": "markdown",
-     ".markdown": "markdown",
-     ".txt": "shell",
-     ".csv": "shell",
-     ".tsv": "shell",
-     ".ini": "yaml",
-     ".cfg": "yaml",
-     ".conf": "shell",
-     ".html": "html",
-     ".css": "css",
-     ".js": "javascript",
-     ".ts": "typescript",
- }
-
-
- def _file_language(filename: str) -> str:
-     """Pick a Gradio Code `language=` token for a given filename."""
-     if filename.endswith("Dockerfile"):
-         return "dockerfile"
-     suffix = Path(filename).suffix.lower()
-     return _EXTENSION_LANGUAGE.get(suffix, "shell")
-
-
- def _read_task_file(task: HarborTask, file_id: str) -> str | None:
-     """Fetch the content for a file_id; None when the file isn't present.
-
-     task.toml and instruction.md have special handling (task.toml uses the
-     pre-captured raw text; instruction.md falls back to the inline `task.instruction`
-     field from task.toml when no instruction.md file is on disk). Everything else
-     is a direct lookup against the dict populated by walking the task dir.
-     """
-     if file_id == "task.toml":
-         return task.task_toml_raw or None
-     if file_id == "instruction.md":
-         return task.files.get("instruction.md") or task.instruction_inline
-     return task.files.get(file_id)
-
-
- def _build_file_tree(task: HarborTask) -> tuple[list[tuple[str, str]], dict[str, str]]:
-     """Render a file-explorer-style choice list for a task.
-
-     Walks every file discovered under the task dir (via `task.files`) and groups
-     them by their first path segment. Order: Overview → top-level files
-     (task.toml, instruction.md, anything else) → folders alphabetically with
-     their children alphabetically. No hardcoded allowlist — what's on disk is
-     what gets shown.
-
-     Returns:
-         choices: list of (label, value) tuples for `gr.Radio`. Labels use unicode
-             tree glyphs (📂, ├─, └─) so the radio reads like a file tree.
-         folder_redirects: maps each folder pseudo-id to its first present child
-             file_id so clicking a folder header opens its first file.
-     """
-     choices: list[tuple[str, str]] = [("ⓘ Overview", _OVERVIEW)]
-     redirects: dict[str, str] = {}
-
-     # Bucket every discovered file by top-level dir ("" = at task root)
-     top_level: list[str] = []
-     by_folder: dict[str, list[str]] = {}
-     for path in sorted(task.files):
-         if "/" in path:
-             folder = path.split("/", 1)[0]
-             by_folder.setdefault(folder, []).append(path)
-         else:
-             top_level.append(path)
-
-     # Top-level files: task.toml first, then instruction.md (with inline
-     # fallback), then anything else alphabetically. Order is presentational.
-     if (task.task_toml_raw or "").strip():
-         choices.append(("📄 task.toml", "task.toml"))
-     if (task.files.get("instruction.md") or task.instruction_inline):
-         choices.append(("📄 instruction.md", "instruction.md"))
-     for path in sorted(top_level):
-         if path in ("task.toml", "instruction.md"):
-             continue  # already added
-         choices.append((f"📄 {path}", path))
-
-     # Folders alphabetically (environment / solution / tests / ...) with
-     # children alphabetical within each.
-     for folder in sorted(by_folder):
-         children = sorted(by_folder[folder])
-         if not children:
-             continue
-         folder_id = f"{_FOLDER_PREFIX}{folder}"
-         choices.append((f"📂 {folder}/", folder_id))
-         redirects[folder_id] = children[0]  # folder header → first child
-         for i, full_id in enumerate(children):
-             # Show the path *inside* the folder (handles nested subdirs too)
-             basename = full_id[len(folder) + 1:]  # strip "<folder>/"
-             glyph = "└─" if i == len(children) - 1 else "├─"
-             choices.append((f" {glyph} {basename}", full_id))
-
-     return choices, redirects
+ app = FastAPI(title="Harbor Visualiser", docs_url="/api/docs")


  # ---------------------------------------------------------------------------
- # Backend handlers
+ # API
  # ---------------------------------------------------------------------------

+ @app.get("/api/hub/datasets")
+ def api_hub_datasets(
+     q: str | None = Query(None, description="substring filter on dataset id"),
+     sort: str = Query("downloads"),
+     limit: int = Query(500, ge=1, le=2000),
+ ) -> JSONResponse:
+     """Live list of Harbor-tagged datasets on the HF Hub (no stale cache)."""
+     try:
+         ds = list_harbor_datasets(query=q, sort=sort, limit=limit)
+     except Exception as exc:  # noqa: BLE001
+         raise HTTPException(502, f"HF Hub listing failed: {exc}") from exc
+     return JSONResponse({"datasets": [d.as_dict() for d in ds], "count": len(ds)})
+

- def load_dataset_action(uri: str):
-     """Top-level "Load" button handler.
+ @app.get("/api/hub/count")
+ def api_hub_count(id: str = Query(..., description="dataset id, e.g. owner/name")) -> JSONResponse:
+     """Task count for a single Hub dataset (one cheap list_repo_files call)."""
+     return JSONResponse({"id": id, "tasks": count_tasks(id)})

-     Outputs (in order):
-         status_md, source_state, root_state, all_tasks_state, folder_redirects_state,
-         task_search_value, task_radio_update,
-         file_radio_update, markdown_update, code_update
-     """
-     if not uri or not uri.strip():
-         return _empty_state("Enter a dataset URI to begin.")

+ @app.get("/api/dataset")
+ def api_dataset(
+     uri: str = Query(..., description="owner/name | hf:// | gh:// | harbor:// | local path"),
+     refresh: int = Query(0, description="1 = force re-fetch (bypass cache)"),
+ ) -> JSONResponse:
+     """Fetch a dataset and return its task ids + source metadata."""
      try:
          source = parse_dataset_uri(uri)
      except ValueError as exc:
-         return _empty_state(f"❌ {exc}")
-
+         raise HTTPException(400, str(exc)) from exc
      try:
-         root = fetch_dataset(source)
-     except Exception as exc:
-         logger.exception("fetch failed")
-         return _empty_state(f"❌ fetch failed: {exc}")
-
-     tasks = list_tasks(root)
-     if not tasks:
-         return _empty_state(
-             f" No `task.toml` files found in `{source.display}`. "
-             f"Looked under `{root}` for `<id>/task.toml` and `tasks/<id>/task.toml`."
-         )
-
-     first = tasks[0]
-     task = load_task(root, first)
-     file_choices, redirects = _build_file_tree(task)
-     md_html, code_update = _render_file(task, _OVERVIEW)
-     status = (
-         f"✅ Loaded **{source.display}** — {len(tasks)} task"
-         f"{'s' if len(tasks) != 1 else ''} found."
-     )
-     return (
-         status,
-         source.display,
-         str(root),
-         tasks,
-         redirects,
-         "",  # clear search box
-         gr.update(choices=tasks, value=first, label=f"Tasks ({len(tasks)})"),
-         gr.update(choices=file_choices, value=_OVERVIEW, label="Files"),
-         md_html,
-         code_update,
-     )
-
-
- def select_task_action(task_id: str, root: str):
-     """Switch task → repopulate file tree, render the Overview."""
-     if not task_id or not root:
-         return (
-             {},
-             gr.update(choices=[], value=None, label="Files"),
-             "Pick a task from the list.",
-             gr.update(value="", visible=False),
-         )
+         if source.kind == "hf":
+             # List task ids via the Hub API — no download. Critical for large
+             # datasets (2k+ tasks) which would otherwise snapshot the whole repo.
+             tasks = list_hf_tasks(source.ident, source.revision)
+         else:
+             root = fetch_dataset(source, force=bool(refresh))
+             tasks = list_tasks(root)
+     except Exception as exc:  # noqa: BLE001
+         raise HTTPException(502, f"fetch failed: {exc}") from exc
+     return JSONResponse({
+         "uri": uri,
+         "display": source.display,
+         "kind": source.kind,
+         "ident": source.ident,
+         "revision": source.revision,
+         "tasks": tasks,
+         "count": len(tasks),
+     })
+
+
+ @app.get("/api/task")
+ def api_task(
+     uri: str = Query(...),
+     task: str = Query(..., description="task id (directory name)"),
+     refresh: int = Query(0),
+ ) -> JSONResponse:
+     """Return one task's full parsed spec — metadata + every file."""
      try:
-         task = load_task(Path(root), task_id)
-     except Exception as exc:
-         logger.exception("load_task failed")
-         return (
-             {},
-             gr.update(choices=[], value=None, label="Files"),
-             f"❌ {exc}",
-             gr.update(value="", visible=False),
-         )
-     file_choices, redirects = _build_file_tree(task)
-     md_html, code_update = _render_file(task, _OVERVIEW)
-     return (
-         redirects,
-         gr.update(choices=file_choices, value=_OVERVIEW, label="Files"),
-         md_html,
-         code_update,
-     )
-
-
- def select_file_action(file_id: str, root: str, task_id: str, folder_redirects: dict):
-     """Switch file inside a task → render its content.
-
-     Folder pseudo-ids are routed to their first child via `folder_redirects`.
-     The file_tree radio's value is also updated so the user sees which file
-     was opened.
-     """
-     if not file_id or not root or not task_id:
-         return ("Pick a task first.", gr.update(value="", visible=False), gr.update())
-
-     # Folder header click → redirect to first child + update radio selection
-     redirect = (folder_redirects or {}).get(file_id)
-     if redirect is not None:
-         file_id = redirect
-         radio_update = gr.update(value=file_id)
-     else:
-         radio_update = gr.update()  # no-op for normal file clicks
-
+         source = parse_dataset_uri(uri)
+     except ValueError as exc:
+         raise HTTPException(400, str(exc)) from exc
      try:
-         task = load_task(Path(root), task_id)
-     except Exception as exc:
-         return (f"❌ {exc}", gr.update(value="", visible=False), gr.update())
-
-     md_html, code_update = _render_file(task, file_id)
-     return (md_html, code_update, radio_update)
-
-
- def filter_tasks_action(query: str, all_tasks: list[str], current_root: str):
-     """Search-box onchange re-derive the radio choices.
-
-     Doesn't touch the file/content panels; selection is preserved when the
-     currently-selected task still matches the filter.
-     """
-     if not all_tasks:
-         return gr.update(choices=[], value=None)
-     q = (query or "").strip().lower()
-     if not q:
-         return gr.update(choices=all_tasks, label=f"Tasks ({len(all_tasks)})")
-     filtered = [t for t in all_tasks if q in t.lower()]
-     label = f"Tasks ({len(filtered)} / {len(all_tasks)})"
-     return gr.update(choices=filtered, label=label)
-
-
- def init_from_url(request: gr.Request):
-     """Read `?dataset=` (or `?d=`) on page load and prefill the input."""
-     if request is None:
-         return ""
-     qs = dict(request.query_params or {})
-     return (qs.get("dataset") or qs.get("d") or "").strip()
-
-
- # ---------------------------------------------------------------------------
- # Rendering helpers
- # ---------------------------------------------------------------------------
-
-
- def _empty_state(status: str):
-     """Reset the UI when no dataset is loaded."""
-     return (
-         status,
-         "",  # source_state
-         "",  # root_state
-         [],  # all_tasks_state
-         {},  # folder_redirects_state
-         "",  # task search clear
-         gr.update(choices=[], value=None, label="Tasks"),
-         gr.update(choices=[], value=None, label="Files"),
-         "Pick a task from the list once a dataset is loaded.",
-         gr.update(value="", visible=False),
-     )
-
-
- def _render_file(task: HarborTask, file_id: str):
-     """Return (markdown_html, code_update) for the right-panel content area.
-
-     Exactly ONE of the two panels is visible at a time:
-       - Overview + .md files → markdown panel
-       - everything else → code panel with language=auto
-     """
-     if file_id == _OVERVIEW:
-         return (_overview_markdown(task), gr.update(value="", visible=False))
-
-     content = _read_task_file(task, file_id)
-     if content is None:
-         return (
-             f"_(no `{file_id}` in this task)_",
-             gr.update(value="", visible=False),
-         )
-
-     if file_id.endswith(".md"):
-         # Render Markdown for instruction.md (no code box)
-         return (content, gr.update(value="", visible=False))
-
-     lang = _file_language(file_id)
-     return (
-         "",  # markdown empty
-         gr.update(value=content, language=lang, visible=True, label=file_id),
-     )
-
-
- def _overview_markdown(task: HarborTask) -> str:
-     """Render the task's metadata as a clean markdown table."""
-     rows: list[tuple[str, str]] = []
-     rows.append(("task id", f"`{task.id}`"))
-     if task.name:
-         rows.append(("name", f"`{task.name}`"))
-     if task.version:
-         rows.append(("spec version", task.version))
-     if task.description:
-         rows.append(("description", task.description))
-     if task.difficulty:
-         rows.append(("difficulty", task.difficulty))
-     if task.category:
-         rows.append(("category", task.category))
-     if task.keywords:
-         rows.append(("keywords", ", ".join(f"`{k}`" for k in task.keywords)))
-     if task.agent_timeout_sec is not None:
-         rows.append(("agent timeout", f"{task.agent_timeout_sec}s"))
-     if task.verifier_timeout_sec is not None:
-         rows.append(("verifier timeout", f"{task.verifier_timeout_sec}s"))
-
-     md = ["| Field | Value |", "|---|---|"]
-     for k, v in rows:
-         md.append(f"| **{k}** | {v} |")
-
-     if task.repo2env:
-         md.append("\n### `[metadata.repo2env]` extension (Repo2RLEnv)\n")
-         md.append("| Field | Value |")
-         md.append("|---|---|")
-         for k, v in sorted(task.repo2env.items()):
-             if isinstance(v, dict):
-                 md.append(f"| **{k}** | _(nested — see below)_ |")
-                 for kk, vv in sorted(v.items()):
-                     md.append(f"| &nbsp;&nbsp;`{kk}` | `{_short(vv)}` |")
-             else:
-                 md.append(f"| **{k}** | `{_short(v)}` |")
-
-     return "\n".join(md)
-
-
- def _short(v) -> str:
-     """Truncate long values for the metadata table cells."""
-     if isinstance(v, list):
-         return ", ".join(str(x) for x in v)
-     s = str(v)
-     return s if len(s) < 110 else s[:107] + "…"
+         if source.kind == "hf":
+             # Pull just this one task's files, not the entire dataset.
+             root = fetch_hf_task(source, task, force=bool(refresh))
+         else:
+             root = fetch_dataset(source, force=bool(refresh))
+         t = load_task(root, task)
+     except FileNotFoundError as exc:
+         raise HTTPException(404, str(exc)) from exc
+     except Exception as exc:  # noqa: BLE001
+         raise HTTPException(502, f"load failed: {exc}") from exc
+     return JSONResponse({
+         "id": t.id,
+         "name": t.name,
+         "org": t.org,
+         "version": t.version,
+         "description": t.description,
+         "instruction_inline": t.instruction_inline,
+         "difficulty": t.difficulty,
+         "category": t.category,
+         "keywords": t.keywords,
+         "agent_timeout_sec": t.agent_timeout_sec,
+         "verifier_timeout_sec": t.verifier_timeout_sec,
+         "repo2env": t.repo2env,
+         "task_toml_raw": t.task_toml_raw,
+         "files": t.files,
+     })
+
+
+ @app.get("/api/config")
+ def api_config() -> JSONResponse:
+     """Runtime config for the UI. On a Hugging Face Space, $SPACE_HOST is the
+     public app host (e.g. owner-name.hf.space) — we surface it so the deep-link
+     / badge examples show the real Space URL instead of localhost."""
+     import os
+     return JSONResponse({
+         "space_host": os.environ.get("SPACE_HOST") or None,
+         "space_id": os.environ.get("SPACE_ID") or None,
+     })
+
+
+ @app.get("/healthz")
+ def healthz() -> dict:
+     return {"ok": True}


  # ---------------------------------------------------------------------------
- # UI
+ # UI (static SPA)
  # ---------------------------------------------------------------------------

-
- _INTRO_MD = """# Harbor Visualiser
-
- Browse [Harbor](https://www.harborframework.com/) task spec datasets — Hugging Face, GitHub, Harbor registry, or local."""
-
-
- _FOOTER_MD = """<sub>Built with [Gradio](https://www.gradio.app/) · Harbor framework [docs](https://www.harborframework.com/)</sub>"""
-
-
- # A small set of popular / known-working datasets surfaced as one-click examples.
- _EXAMPLES: list[tuple[str, str]] = [
-     ("cookbook/test (Harbor)", "harbor://cookbook/test"),
-     ("SWE-Atlas QnA (Harbor)", "harbor://scale-ai/swe-atlas-qna"),
-     ("SWE-Bench Pro (Harbor)", "harbor://cais/swebenchpro"),
-     ("Click PRs (HF / Repo2RLEnv)", "AdithyaSK/click-r2e-v082post1"),
-     ("Click PRs (GitHub demo)", "https://github.com/adithya-s-k/harbor-tasks-demo"),
- ]
-
-
- # Minimal monochrome aesthetic — file-explorer feel for both task list and file tree.
- _CUSTOM_CSS = """
- .gradio-container { font-family: ui-sans-serif, system-ui, -apple-system, sans-serif; }
- h1, h2, h3 { font-weight: 600; }
-
- button.primary { background: #111 !important; color: white !important; border: 1px solid #111 !important; }
- button.primary:hover { background: #333 !important; }
-
- /* Task list — scrollable + monospace + ellipsis on long IDs */
- #task-list .wrap { max-height: 65vh; overflow-y: auto; padding-right: 4px; }
- #task-list label,
- #task-list label > span {
-     font-family: ui-monospace, Menlo, Consolas, monospace;
-     font-size: 11.5px;
-     font-weight: 500;
-     white-space: nowrap;
-     overflow: hidden;
-     text-overflow: ellipsis;
-     display: block;
-     max-width: 100%;
- }
-
- /* File tree — also monospace, slightly larger, preserve indent whitespace */
- #file-tree .wrap { max-height: 55vh; overflow-y: auto; }
- #file-tree label,
- #file-tree label > span {
-     font-family: ui-monospace, Menlo, Consolas, monospace;
-     font-size: 12.5px;
-     white-space: pre;
- }
-
- #task-search input { font-family: ui-monospace, Menlo, Consolas, monospace; font-size: 12px; }
-
- footer { display: none !important; }
- """
-
-
- def build_ui() -> gr.Blocks:
-     with gr.Blocks(title="Harbor Visualiser") as demo:
-         gr.Markdown(_INTRO_MD)
-
-         with gr.Row():
-             uri_input = gr.Textbox(
-                 label="Dataset",
-                 placeholder="owner/name | gh://owner/repo | harbor://org/name | https://github.com/owner/repo",
-                 lines=1,
-                 scale=8,
-             )
-             load_btn = gr.Button("Load", variant="primary", scale=1, min_width=80)
-
-         # Quick-access popular examples
-         with gr.Row():
-             example_btns: list[gr.Button] = []
-             for label, _ in _EXAMPLES:
-                 example_btns.append(gr.Button(label, size="sm", variant="secondary"))
-
-         status = gr.Markdown("Enter a dataset URI to begin.")
-
-         # Hidden state for the dispatch handlers
-         source_state = gr.State("")
-         root_state = gr.State("")
-         all_tasks_state = gr.State([])  # full unfiltered list for the search box
-         folder_redirects_state = gr.State({})  # folder pseudo-id → first child file_id
-
-         # ─── 3-column file-explorer layout ────────────────────────────────
-         with gr.Row():
-             # Column 1 — Tasks (scrollable + searchable)
-             with gr.Column(scale=2, min_width=240):
-                 task_search = gr.Textbox(
-                     label="Filter tasks",
-                     placeholder="type to filter…",
-                     lines=1,
-                     elem_id="task-search",
-                 )
-                 task_list = gr.Radio(
-                     choices=[],
-                     label="Tasks",
-                     value=None,
-                     interactive=True,
-                     elem_id="task-list",
-                 )
-             # Column 2 — File tree for the selected task
-             with gr.Column(scale=2, min_width=200):
-                 file_tree = gr.Radio(
-                     choices=[],
-                     label="Files",
-                     value=None,
-                     interactive=True,
-                     elem_id="file-tree",
-                 )
-             # Column 3 — content viewer (markdown OR code, mutually exclusive)
-             with gr.Column(scale=6):
-                 content_md = gr.Markdown("Pick a task from the list once a dataset is loaded.")
-                 content_code = gr.Code(
-                     value="",
-                     language="shell",
-                     label="",
-                     interactive=False,
-                     visible=False,
-                 )
-
-         gr.Markdown(_FOOTER_MD)
-
-         # --- Event wiring ---------------------------------------------------
-
-         load_outputs = [
-             status,
-             source_state,
-             root_state,
-             all_tasks_state,
-             folder_redirects_state,
-             task_search,
-             task_list,
-             file_tree,
-             content_md,
-             content_code,
-         ]
-
-         load_btn.click(
-             fn=load_dataset_action,
-             inputs=[uri_input],
-             outputs=load_outputs,
-         )
-         uri_input.submit(
-             fn=load_dataset_action,
-             inputs=[uri_input],
-             outputs=load_outputs,
-         )
-
-         task_list.change(
-             fn=select_task_action,
-             inputs=[task_list, root_state],
-             outputs=[folder_redirects_state, file_tree, content_md, content_code],
-         )
-
-         file_tree.change(
-             fn=select_file_action,
-             inputs=[file_tree, root_state, task_list, folder_redirects_state],
-             outputs=[content_md, content_code, file_tree],
-         )
-
-         task_search.change(
-             fn=filter_tasks_action,
-             inputs=[task_search, all_tasks_state, root_state],
-             outputs=[task_list],
-         )
-
-         # Example buttons → set input + auto-load
-         for btn, (_, uri_value) in zip(example_btns, _EXAMPLES, strict=True):
-             btn.click(fn=lambda u=uri_value: u, outputs=uri_input).then(
-                 fn=load_dataset_action,
-                 inputs=[uri_input],
-                 outputs=load_outputs,
-             )
-
-         # On page load: read ?dataset= → prefill → auto-load if non-empty
-         demo.load(fn=init_from_url, inputs=None, outputs=uri_input).then(
-             fn=lambda u: load_dataset_action(u) if u else _empty_state("Enter a dataset URI to begin."),
-             inputs=[uri_input],
-             outputs=load_outputs,
-         )
-
-     return demo
+ @app.get("/")
+ def index() -> FileResponse:
+     return FileResponse(STATIC / "index.html", media_type="text/html")


- if __name__ == "__main__":
-     theme = gr.themes.Monochrome(
-         radius_size=gr.themes.sizes.radius_sm,
-         spacing_size=gr.themes.sizes.spacing_md,
-         text_size=gr.themes.sizes.text_md,
-     )
-     demo = build_ui()
-     demo.queue(default_concurrency_limit=4).launch(
-         server_name="0.0.0.0",
-         theme=theme,
-         css=_CUSTOM_CSS,
-     )
+ app.mount("/static", StaticFiles(directory=str(STATIC)), name="static")
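The `/api/config` endpoint above returns `space_host` and `space_id`, each `null` when the app is not running on a Space. Below is a minimal client-side sketch, in plain JavaScript to match `static/app.js`, that turns that payload into a concrete deep link and dataset-card badge. `publicOrigin`, `deepLink` and `badgeMarkdown` are hypothetical helper names, not part of this commit. The shields.io badge URL and the `?dataset=` parameter are copied from the SPA code in this diff.

```javascript
// Hypothetical helpers (not part of this commit) showing how a client can use
// the /api/config payload, shaped { space_host, space_id }, to build the same
// deep link and badge Markdown that renderHome() displays.
const BADGE_URL =
  'https://img.shields.io/badge/%F0%9F%A4%97%20Harbor%20Visualiser-View%20Tasks-ffd21e';

// On a Space, space_host is the public host (e.g. "owner-name.hf.space");
// locally it is null, so fall back to the page's own origin.
function publicOrigin(cfg, fallbackOrigin) {
  return cfg && cfg.space_host ? `https://${cfg.space_host}` : fallbackOrigin;
}

// ?dataset=<owner>/<name> is what prefill() in static/app.js reads on page load.
// Each path segment is escaped, but the "/" separator is kept readable.
function deepLink(origin, datasetId) {
  const id = datasetId.split('/').map(encodeURIComponent).join('/');
  return `${origin}/?dataset=${id}`;
}

function badgeMarkdown(origin, datasetId) {
  return `[![Open in Harbor Visualiser](${BADGE_URL})](${deepLink(origin, datasetId)})`;
}

// Browser usage:
//   const cfg = await (await fetch('/api/config')).json();
//   console.log(badgeMarkdown(publicOrigin(cfg, location.origin), 'owner/name'));
```

Escaping each segment separately keeps `owner/name` readable in the URL while still protecting characters such as spaces or `&`. `URLSearchParams.get('dataset')` in `prefill()` decodes them back.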
requirements.txt CHANGED
@@ -1,4 +1,4 @@
1
- gradio>=6.0.0
 
2
  huggingface_hub>=0.27.0
3
  harbor>=0.6.0
4
-
 
1
+ fastapi>=0.115
2
+ uvicorn[standard]>=0.30
3
  huggingface_hub>=0.27.0
4
  harbor>=0.6.0
 
static/app.js ADDED
@@ -0,0 +1,515 @@
1
+ /* Harbor Hub — SPA frontend. Vanilla JS, hash-routed, talks to the FastAPI API. */
2
+ 'use strict';
3
+
4
+ const APP = document.getElementById('app');
5
+
6
+ /* ── tiny helpers ─────────────────────────────────── */
7
+ const esc = (s) => String(s == null ? '' : s).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
8
+ const fmtNum = (n) => (n == null || n < 0) ? '—' : n.toLocaleString();
9
+ const enc = encodeURIComponent;
10
+ const qs = (o) => Object.entries(o).filter(([, v]) => v != null && v !== '').map(([k, v]) => `${k}=${enc(v)}`).join('&');
11
+
12
+ async function api(path) {
13
+ const r = await fetch(path);
14
+ if (!r.ok) {
15
+ let msg = `${r.status}`;
16
+ try { msg = (await r.json()).detail || msg; } catch {}
17
+ throw new Error(msg);
18
+ }
19
+ return r.json();
20
+ }
21
+
22
+ const ICON = {
23
+ copy: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="9" y="9" width="11" height="11" rx="2"/><path d="M5 15V5a2 2 0 0 1 2-2h10"/></svg>',
24
+ check: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5"><path d="M20 6L9 17l-5-5"/></svg>',
25
+ search: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="11" cy="11" r="7"/><path d="M21 21l-4-4"/></svg>',
26
+ file: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M14 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V8z"/><path d="M14 2v6h6"/></svg>',
27
+ dir: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M3 7a2 2 0 0 1 2-2h4l2 3h8a2 2 0 0 1 2 2v7a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2z"/></svg>',
28
+ info: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="12" cy="12" r="9"/><path d="M12 16v-4M12 8h.01"/></svg>',
29
+ back: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M15 18l-6-6 6-6"/></svg>',
30
+ next: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M9 18l6-6-6-6"/></svg>',
31
+ term: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M4 17l6-6-6-6M12 19h8"/></svg>',
32
+ panel: '<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="3" y="4" width="18" height="16" rx="2"/><path d="M9 4v16"/></svg>',
33
+ };
34
+
35
+ function copyButton(text, cls = 'copy') {
36
+ const b = document.createElement('button');
37
+ b.className = cls; b.innerHTML = ICON.copy; b.title = 'Copy';
38
+ b.onclick = (e) => {
39
+ e.stopPropagation(); e.preventDefault();
40
+ navigator.clipboard.writeText(text).then(() => {
41
+ b.innerHTML = ICON.check; b.classList.add('copied');
42
+ setTimeout(() => { b.innerHTML = ICON.copy; b.classList.remove('copied'); }, 1100);
43
+ });
44
+ };
45
+ return b;
46
+ }
47
+
48
+ /* ── theme ────────────────────────────────────────── */
49
+ function applyTheme(mode) {
50
+ const sys = window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
51
+ document.documentElement.setAttribute('data-theme', mode === 'system' ? sys : mode);
52
+ document.querySelectorAll('#theme-toggle button').forEach(b =>
53
+ b.classList.toggle('active', b.dataset.mode === mode));
54
+ }
55
+ (function initTheme() {
56
+ let mode = localStorage.getItem('hh-theme') || 'dark';
57
+ applyTheme(mode);
58
+ document.getElementById('theme-toggle').addEventListener('click', (e) => {
59
+ const b = e.target.closest('button'); if (!b) return;
60
+ mode = b.dataset.mode; localStorage.setItem('hh-theme', mode); applyTheme(mode);
61
+ });
62
+ window.matchMedia('(prefers-color-scheme: dark)').addEventListener('change', () => {
63
+ if ((localStorage.getItem('hh-theme') || 'dark') === 'system') applyTheme('system');
64
+ });
65
+ })();
66
+
67
+ /* ── data row with lazy task count ────────────────── */
68
+ function datasetRow(id, count) {
69
+ const row = document.createElement('div');
70
+ row.className = 'row';
71
+ row.onclick = () => { location.hash = `dataset?uri=${enc(id)}`; };
72
+ const name = document.createElement('span'); name.className = 'name'; name.textContent = id;
73
+ row.appendChild(name);
74
+ row.appendChild(copyButton(id));
75
+ const t = document.createElement('span'); t.className = 'tasks';
76
+ if (count == null) { t.innerHTML = '<span class="spin">···</span>'; t.dataset.lazy = id; }
77
+ else t.textContent = fmtNum(count);
78
+ row.appendChild(t);
79
+ return row;
80
+ }
81
+
82
+ // Fill in lazy counts for visible rows, throttled.
83
+ async function fillCounts(container) {
84
+ const pending = [...container.querySelectorAll('.tasks[data-lazy]')];
85
+ let i = 0;
86
+ const worker = async () => {
87
+ while (i < pending.length) {
88
+ const cell = pending[i++]; const id = cell.dataset.lazy; delete cell.dataset.lazy;
89
+ try { const r = await api(`/api/hub/count?id=${enc(id)}`); cell.textContent = fmtNum(r.tasks); }
90
+ catch { cell.textContent = '—'; }
91
+ }
92
+ };
93
+ await Promise.all([worker(), worker(), worker(), worker()]); // 4 in parallel
94
+ }
95
+
96
+ /* ── routes ───────────────────────────────────────── */
97
+ function setActiveNav(name) {
98
+ document.querySelectorAll('.nav .links a').forEach(a => a.classList.toggle('active', a.dataset.nav === name));
99
+ }
100
+
101
+ async function renderHome() {
102
+ setActiveNav('home');
103
+ // Resolve the public base URL: on a HF Space this is the real .hf.space host,
104
+ // so deep-link / badge examples don't show localhost.
105
+ let origin = location.origin;
106
+ try { const cfg = await api('/api/config'); if (cfg.space_host) origin = `https://${cfg.space_host}`; } catch {}
107
+ const badgeUrl = 'https://img.shields.io/badge/%F0%9F%A4%97%20Harbor%20Visualiser-View%20Tasks-ffd21e';
108
+ const deepLink = `${origin}/?dataset=YOUR_DATASET_ID`;
109
+ const badgeMd = `[![Open in Harbor Visualiser](${badgeUrl})](${deepLink})`;
110
+ APP.innerHTML = `
111
+ <div class="hero">
112
+ <div class="mark">🤗</div>
113
+ <h1><span class="hf">Hugging Face</span> Harbor Visualiser</h1>
114
+ <p>Visualise <a href="https://www.harborframework.com" target="_blank" rel="noopener" class="hl">Harbor&nbsp;↗</a> task-spec datasets <strong style="color:var(--text)">straight from the Hugging&nbsp;Face&nbsp;Hub</strong> — metadata, instructions, oracle patches, tests &amp; Dockerfiles. Also works with GitHub repos and local paths. No bulk download, always the latest.</p>
115
+ </div>
116
+ <div class="search" id="load-box">
117
+ ${ICON.search}
118
+ <input id="load-input" placeholder="Load any dataset — owner/name · hf:// · gh://owner/repo · harbor://org/name · /local/path" />
119
+ <span class="kbd">↵</span>
120
+ </div>
121
+ <div style="display:flex;align-items:center;justify-content:space-between;margin:26px 0 12px">
122
+ <h2 style="margin:0">Harbor datasets on the Hub</h2>
123
+ <span class="faint" id="hub-status">loading…</span>
124
+ </div>
125
+ <div class="card" id="hub-table">
126
+ <div class="thead"><span>Dataset</span><span class="col-tasks">Tasks</span></div>
127
+ <div class="loading"><span class="spinner"></span>fetching from huggingface.co/datasets?other=harbor</div>
128
+ </div>
129
+ <div class="center"><a class="btn" href="#/datasets">View all datasets →</a></div>
130
+
131
+ <div class="howto">
132
+ <h2>Link your dataset to the visualiser</h2>
133
+ <div class="steps">
134
+ <div class="step">
135
+ <h3>Deep-link any dataset</h3>
136
+ <p>Append <code>?dataset=&lt;owner&gt;/&lt;name&gt;</code> to open straight into a dataset's tasks — handy from a dataset card or docs.</p>
137
+ <div class="snippet"><code id="snip-link">${esc(origin)}/?dataset=&lt;owner&gt;/&lt;name&gt;</code><span id="copy-link"></span></div>
138
+ </div>
139
+ <div class="step">
140
+ <h3>Add a badge to your dataset card</h3>
141
+ <p>Paste this Markdown into your dataset README so a 🤗 badge always links here:</p>
142
+ <span class="badge-preview"><span class="l">🤗 Harbor Visualiser</span><span class="r">View Tasks</span></span>
143
+ <div class="snippet"><code id="snip-badge">${esc(badgeMd)}</code><span id="copy-badge"></span></div>
144
+ </div>
145
+ </div>
146
+ </div>
147
+
148
+ <div class="footer">
149
+ A read-only visualiser for <a href="https://www.harborframework.com" target="_blank" rel="noopener" class="hl">Harbor</a>
150
+ task-spec datasets — the format used by Harbor for agent evaluation &amp; RL environments.
151
+ Runs on Hugging Face Spaces · not affiliated with the Harbor project.
152
+ </div>
153
+ `;
154
+ document.getElementById('copy-link').appendChild(copyButton(`${origin}/?dataset=<owner>/<name>`));
155
+ document.getElementById('copy-badge').appendChild(copyButton(badgeMd));
156
+ const input = document.getElementById('load-input');
157
+ input.addEventListener('keydown', (e) => {
158
+ if (e.key === 'Enter' && input.value.trim()) location.hash = `dataset?uri=${enc(input.value.trim())}`;
159
+ });
160
+
161
+ try {
162
+ const { datasets } = await api('/api/hub/datasets?sort=downloads&limit=12');
163
+ const card = document.getElementById('hub-table');
164
+ card.innerHTML = '<div class="thead"><span>Dataset</span><span class="col-tasks">Tasks</span></div>';
165
+ datasets.slice(0, 8).forEach(d => card.appendChild(datasetRow(d.id, null)));
166
+ document.getElementById('hub-status').textContent = `${datasets.length}+ datasets`;
167
+ fillCounts(card);
168
+ } catch (e) {
169
+ document.getElementById('hub-table').innerHTML = `<div class="errbox">Couldn't reach the Hub: ${esc(e.message)}</div>`;
170
+ document.getElementById('hub-status').textContent = '';
171
+ }
172
+ }
173
+
174
+ let _hubCache = null;
175
+ async function renderDatasets(params) {
176
+ setActiveNav('datasets');
177
+ const sort = params.get('sort') || 'downloads';
178
+ APP.innerHTML = `
179
+ <div class="page">
180
+ <h1>Datasets</h1>
181
+ <p class="muted" style="margin:-8px 0 20px;font-size:13.5px">Search across every <strong style="color:var(--text)">Harbor-tagged dataset on the Hugging&nbsp;Face&nbsp;Hub</strong> — the live <code style="background:var(--panel-2);padding:1px 6px;border-radius:4px">other=harbor</code> filter.</p>
182
+ <div class="search">
183
+ ${ICON.search}
184
+ <input id="ds-search" placeholder="Search Harbor datasets on the Hub…" autofocus />
185
+ <select id="ds-sort">
186
+ <option value="downloads">Most downloads</option>
187
+ <option value="likes">Most likes</option>
188
+ <option value="lastModified">Recently updated</option>
189
+ </select>
190
+ <span class="kbd">⌘K</span>
191
+ </div>
192
+ <div class="card" id="ds-table"><div class="loading"><span class="spinner"></span>loading…</div></div>
193
+ <div class="hint" style="margin-top:18px">
194
+ <span class="ic">${ICON.info}</span>
195
+ <span><strong style="color:var(--text)">Want your dataset to show up here?</strong> Add the <code>harbor</code> tag to your dataset card's metadata (<code>tags: [harbor]</code> in the README front-matter) and it'll appear in this list automatically.</span>
196
+ </div>
197
+ </div>`;
198
+ const tbl = document.getElementById('ds-table');
199
+ const search = document.getElementById('ds-search');
200
+ const sortSel = document.getElementById('ds-sort'); sortSel.value = sort;
201
+
202
+ async function load() {
203
+ tbl.innerHTML = '<div class="loading"><span class="spinner"></span>loading…</div>';
204
+ try {
205
+ const { datasets } = await api(`/api/hub/datasets?${qs({ sort: sortSel.value, limit: 1000 })}`);
206
+ _hubCache = datasets; draw(datasets);
207
+ } catch (e) { tbl.innerHTML = `<div class="errbox">${esc(e.message)}</div>`; }
208
+ }
209
+ function draw(list) {
210
+ tbl.innerHTML = '<div class="thead"><span>Dataset</span><span class="col-tasks">Tasks</span></div>';
211
+ if (!list.length) { tbl.innerHTML += '<div class="empty">no matching datasets</div>'; return; }
212
+ list.slice(0, 300).forEach(d => tbl.appendChild(datasetRow(d.id, null)));
213
+ if (list.length > 300) tbl.insertAdjacentHTML('beforeend', `<div class="empty">showing 300 of ${list.length} — refine your search</div>`); // not innerHTML +=, which re-parses the rows and drops their click handlers
214
+ fillCounts(tbl);
215
+ }
216
+ let t;
217
+ search.addEventListener('input', () => {
218
+ clearTimeout(t);
219
+ t = setTimeout(() => {
220
+ const q = search.value.trim().toLowerCase();
221
+ if (_hubCache) draw(q ? _hubCache.filter(d => d.id.toLowerCase().includes(q)) : _hubCache); // the list may still be loading
222
+ }, 120);
223
+ });
224
+ sortSel.addEventListener('change', load);
225
+ await load();
226
+ }
227
+
228
+ async function renderDataset(params) {
229
+ setActiveNav(null);
230
+ const uri = params.get('uri');
231
+ APP.innerHTML = `
232
+ <div class="page">
233
+ <div class="crumb"><a href="#/datasets">Datasets</a><span class="sep">/</span><span>${esc(uri)}</span></div>
234
+ <div class="loading"><span class="spinner"></span>fetching <b>${esc(uri)}</b> …
235
+ <span class="sub">Loading the Harbor spec — this can take a few seconds to a minute for large datasets (the more tasks, the longer the listing).</span>
236
+ </div>
237
+ </div>`;
238
+ let data;
239
+ try { data = await api(`/api/dataset?uri=${enc(uri)}`); }
240
+ catch (e) { APP.querySelector('.page').innerHTML = `<div class="crumb"><a href="#/datasets">Datasets</a></div><div class="errbox">Failed to load <b>${esc(uri)}</b>: ${esc(e.message)}</div>`; return; }
241
+
242
+ const page = APP.querySelector('.page');
243
+ page.innerHTML = `
244
+ <div class="crumb"><a href="#/datasets">Datasets</a><span class="sep">/</span><span>${esc(data.display)}</span>
245
+ <span class="pill">${data.count} tasks</span>
246
+ <button class="btn" id="refresh" style="margin-left:auto;padding:5px 11px;font-size:12px">↻ refresh</button>
247
+ </div>
248
+ <div class="search"><span style="color:var(--faint)">${ICON.search}</span>
249
+ <input id="task-search" placeholder="Search ${data.count} tasks…" autofocus /></div>
250
+ <div class="card tasklist" id="tasks"></div>`;
251
+ const tasksCard = document.getElementById('tasks');
252
+ const tsearch = document.getElementById('task-search');
253
+ function draw(list) {
254
+ tasksCard.innerHTML = '<div class="thead"><span>Task</span></div>';
255
+ if (!list.length) { tasksCard.innerHTML += '<div class="empty">no matching tasks</div>'; return; }
256
+ list.slice(0, 500).forEach(tid => {
257
+ const row = document.createElement('div'); row.className = 'row';
258
+ row.onclick = () => { location.hash = `task?${qs({ uri, task: tid })}`; };
259
+ row.innerHTML = `<span class="name">${esc(tid)}</span>`;
260
+ row.appendChild(copyButton(tid));
261
+ tasksCard.appendChild(row);
262
+ });
263
+ if (list.length > 500) tasksCard.insertAdjacentHTML('beforeend', `<div class="empty">showing 500 of ${list.length} — refine your search</div>`); // keeps the rows' click handlers intact
264
+ }
265
+ draw(data.tasks);
266
+ let t;
267
+ tsearch.addEventListener('input', () => {
268
+ clearTimeout(t);
269
+ t = setTimeout(() => {
270
+ const q = tsearch.value.trim().toLowerCase();
271
+ draw(q ? data.tasks.filter(x => x.toLowerCase().includes(q)) : data.tasks);
272
+ }, 100);
273
+ });
274
+ document.getElementById('refresh').onclick = async () => {
275
+ page.querySelector('.crumb').insertAdjacentHTML('beforeend', ' <span class="faint">refreshing…</span>');
276
+ try { await api(`/api/dataset?${qs({ uri, refresh: 1 })}`); }
277
+ catch (e) { alert('refresh failed: ' + e.message); return; }
278
+ location.reload(); // re-render from the refreshed listing (also updates the task-count pill)
279
+ };
280
+ }
281
+
282
+ /* ── task viewer (file tree + content) ────────────── */
283
+ const LANG = { toml: 'ini', diff: 'diff', patch: 'diff', sh: 'bash', bash: 'bash', py: 'python', json: 'json', yaml: 'yaml', yml: 'yaml', md: 'markdown', js: 'javascript', ts: 'typescript', html: 'xml', css: 'css' };
284
+ function langFor(path) {
285
+ if (path.endsWith('Dockerfile')) return 'dockerfile';
286
+ const ext = path.split('.').pop().toLowerCase();
287
+ return LANG[ext] || 'plaintext';
288
+ }
289
+
290
+ function harborCmd(kind, ident, taskId) {
291
+ if (kind === 'gh') return `harbor run --task-git-url https://github.com/${ident}.git -i ${taskId} -a oracle`;
292
+ if (kind === 'local') return `harbor run -p ${ident} -i ${taskId} -a oracle`;
293
+ // hf: pull from the Hub, then run the single task with the oracle agent
294
+ const dir = ident.split('/').pop();
295
+ return `huggingface-cli download ${ident} --repo-type dataset --local-dir ${dir} && harbor run -p ${dir} -i ${taskId} -a oracle`;
296
+ }
297
+
298
+ let _taskSiblings = { uri: null, tasks: [], ident: null, kind: null };
299
+ async function renderTask(params) {
300
+ setActiveNav(null);
301
+ const uri = params.get('uri');
302
+ let task = params.get('task');
303
+ let initialFile = params.get('f');
304
+
305
+ APP.innerHTML = `<div class="page"><div class="loading"><span class="spinner"></span>loading task…
306
+ <span class="sub">Fetching this task's files from the Hub — usually a second or two.</span>
307
+ </div></div>`;
308
+
309
+ // Sibling task list (for the side panel) + canonical ident/kind (run command).
310
+ // Cached per-uri so flipping between tasks doesn't refetch the list.
311
+ if (_taskSiblings.uri !== uri) {
312
+ try {
313
+ const ds = await api(`/api/dataset?uri=${enc(uri)}`);
314
+ _taskSiblings = { uri, tasks: ds.tasks || [], ident: ds.ident, kind: ds.kind };
315
+ } catch { _taskSiblings = { uri, tasks: [], ident: uri, kind: 'hf' }; }
316
+ }
317
+ const siblings = _taskSiblings.tasks;
318
+ const ident = _taskSiblings.ident || uri;
319
+ const kind = _taskSiblings.kind || 'hf';
320
+
321
+ const page = APP.querySelector('.page');
322
+ const collapsed = localStorage.getItem('hh-tasks-collapsed') === '1';
323
+ page.innerHTML = `
324
+ <div class="crumb">
325
+ <button class="nav-btn ghost" id="toggle-tasks" title="Toggle task list">${ICON.panel}</button>
326
+ <a href="#dataset?uri=${enc(uri)}">${esc(ident)}</a>
327
+ <span class="sep">/</span><span id="crumb-task">${esc(task)}</span>
328
+ <span id="crumb-diff"></span>
329
+ <span class="pos" id="crumb-pos" style="margin-left:auto"></span>
330
+ </div>
331
+ <div class="runbar">
332
+ <span class="lbl">${ICON.term}</span>
333
+ <code id="run-cmd"></code>
334
+ <span id="run-copy"></span>
335
+ </div>
336
+ <div class="taskview${collapsed ? ' collapsed' : ''}" id="taskview">
337
+ <div class="tasks-panel" id="tasks-panel">
338
+ <div class="tp-head">Tasks <span class="faint">${siblings.length}</span></div>
339
+ <div class="tp-search">${ICON.search}<input id="tp-search" placeholder="Filter tasks…" /></div>
340
+ <div class="tp-list" id="tp-list"></div>
341
+ </div>
342
+ <div class="tree" id="tree"></div>
343
+ <div class="content" id="content"></div>
344
+ </div>`;
345
+
346
+ const taskview = document.getElementById('taskview');
347
+ const tpList = document.getElementById('tp-list');
348
+ const tree = document.getElementById('tree');
349
+ const content = document.getElementById('content');
350
+ const runbar = page.querySelector('.runbar');
351
+ const runCode = document.getElementById('run-cmd');
352
+ const runCopyHolder = document.getElementById('run-copy');
353
+
354
+ document.getElementById('toggle-tasks').onclick = () => {
355
+ taskview.classList.toggle('collapsed');
356
+ localStorage.setItem('hh-tasks-collapsed', taskview.classList.contains('collapsed') ? '1' : '0');
357
+ };
358
+
359
+ // ── tasks side panel ──
360
+ function drawPanel(filter = '') {
361
+ tpList.innerHTML = '';
362
+ const q = filter.trim().toLowerCase();
363
+ const list = q ? siblings.filter(s => s.toLowerCase().includes(q)) : siblings;
364
+ list.slice(0, 1000).forEach(tid => {
365
+ const r = document.createElement('div');
366
+ r.className = 'tp-item' + (tid === task ? ' active' : '');
367
+ r.textContent = tid; r.title = tid; r.dataset.tid = tid;
368
+ r.onclick = () => { if (tid !== task) loadDetail(tid, null); };
369
+ tpList.appendChild(r);
370
+ });
371
+ if (list.length > 1000) {
372
+ const m = document.createElement('div'); m.className = 'empty'; m.textContent = `showing 1000 of ${list.length} — filter to narrow`;
373
+ tpList.appendChild(m);
374
+ }
375
+ }
376
+ drawPanel();
377
+ const tps = document.getElementById('tp-search');
378
+ let ft;
379
+ tps.addEventListener('input', () => { clearTimeout(ft); ft = setTimeout(() => drawPanel(tps.value), 100); });
380
+
381
+ function syncPanelActive(tid) {
382
+ tpList.querySelectorAll('.tp-item').forEach(n => n.classList.toggle('active', n.dataset.tid === tid));
383
+ const a = tpList.querySelector('.tp-item.active'); if (a) a.scrollIntoView({ block: 'nearest' });
384
+ }
385
+
386
+ // ── load one task's detail into the tree + content (no full re-render) ──
387
+ async function loadDetail(tid, wantFile) {
388
+ task = tid;
389
+ syncPanelActive(tid);
390
+ document.getElementById('crumb-task').textContent = tid;
391
+ const i = siblings.indexOf(tid);
392
+ document.getElementById('crumb-pos').textContent = i >= 0 ? `${i + 1} / ${siblings.length}` : '';
393
+ history.replaceState(null, '', '#' + `task?${qs({ uri, task: tid })}`);
394
+
395
+ const cmd = harborCmd(kind, ident, tid);
396
+ runCode.textContent = cmd;
397
+ runCopyHolder.innerHTML = '';
398
+ const rc = copyButton(cmd);
399
+ rc.addEventListener('click', () => { runbar.classList.add('copied'); setTimeout(() => runbar.classList.remove('copied'), 1100); });
400
+ runCopyHolder.appendChild(rc);
401
+
402
+ tree.innerHTML = '';
403
+ content.innerHTML = `<div class="loading"><span class="spinner"></span>loading task…</div>`;
404
+ let t;
405
+ try { t = await api(`/api/task?${qs({ uri, task: tid })}`); }
406
+ catch (e) { if (task === tid) content.innerHTML = `<div class="errbox">${esc(e.message)}</div>`; return; } // ignore errors from superseded fetches
407
+ if (task !== tid) return; // a newer click superseded this fetch
408
+
409
+ document.getElementById('crumb-diff').innerHTML = t.difficulty ? `<span class="pill">${esc(t.difficulty)}</span>` : '';
410
+ buildDetail(t, wantFile);
411
+ }
412
+
413
+ function buildDetail(t, wantFile) {
414
+ const files = t.files || {};
415
+ const paths = Object.keys(files).sort();
416
+ tree.innerHTML = `<div class="thead2">${esc(t.id)}</div>`;
417
+
418
+ function node(label, indent, type, onClick, active) {
419
+ const n = document.createElement('div');
420
+ n.className = 'tnode' + (type === 'dir' ? ' dir' : '') + (active ? ' active' : '');
421
+ n.style.paddingLeft = (14 + indent * 16) + 'px';
422
+ n.innerHTML = (type === 'dir' ? ICON.dir : type === 'info' ? ICON.info : ICON.file) + `<span>${esc(label)}</span>`;
423
+ if (onClick) n.onclick = onClick;
424
+ return n;
425
+ }
426
+ const nodes = {};
427
+ function setHashFile(f) { return `task?${qs({ uri, task: t.id, f })}`; }
428
+ function select(id) {
429
+ Object.values(nodes).forEach(n => n.classList.remove('active'));
430
+ if (nodes[id]) nodes[id].classList.add('active');
431
+ if (id === '__overview__') showOverview(); else showFile(id);
432
+ }
433
+ const ov = node('Overview', 0, 'info', () => { history.replaceState(null, '', '#' + setHashFile('__overview__')); select('__overview__'); });
434
+ nodes['__overview__'] = ov; tree.appendChild(ov);
435
+ const groups = {}; const top = [];
436
+ paths.forEach(p => { if (p.includes('/')) { const f = p.split('/')[0]; (groups[f] = groups[f] || []).push(p); } else top.push(p); });
437
+ top.forEach(p => { const n = node(p, 0, 'file', () => { history.replaceState(null, '', '#' + setHashFile(p)); select(p); }); nodes[p] = n; tree.appendChild(n); });
438
+ Object.keys(groups).sort().forEach(folder => {
439
+ tree.appendChild(node(folder + '/', 0, 'dir'));
440
+ groups[folder].sort().forEach(p => { const n = node(p.split('/').slice(1).join('/'), 1, 'file', () => { history.replaceState(null, '', '#' + setHashFile(p)); select(p); }); nodes[p] = n; tree.appendChild(n); });
441
+ });
442
+
443
+ function showOverview() {
444
+ const rows = [];
445
+ const add = (k, v) => { if (v != null && v !== '' && !(Array.isArray(v) && !v.length)) rows.push([k, v]); };
446
+ add('Task id', t.id); add('Name', t.name); add('Org', t.org); add('Version', t.version);
447
+ add('Difficulty', t.difficulty); add('Category', t.category);
448
+ add('Agent timeout', t.agent_timeout_sec != null ? t.agent_timeout_sec + 's' : null);
449
+ add('Verifier timeout', t.verifier_timeout_sec != null ? t.verifier_timeout_sec + 's' : null);
450
+ let html = `<div class="fhead"><span class="path">${ICON.info} Overview</span></div>`;
451
+ if (t.description) html += `<div class="md">${marked.parse(t.description)}</div>`;
452
+ html += '<table class="kv">';
453
+ rows.forEach(([k, v]) => html += `<tr><td>${esc(k)}</td><td>${esc(v)}</td></tr>`);
454
+ if (t.keywords && t.keywords.length) html += `<tr><td>Keywords</td><td>${t.keywords.map(k => `<span class="kw">${esc(k)}</span>`).join('')}</td></tr>`;
455
+ if (t.repo2env) html += `<tr><td>repo2env</td><td><pre style="margin:0;padding:0;background:none">${esc(JSON.stringify(t.repo2env, null, 2))}</pre></td></tr>`;
456
+ html += '</table>';
457
+ const instr = files['instruction.md'] || t.instruction_inline;
458
+ if (instr) html += `<div class="fhead"><span class="path">${ICON.file} instruction.md</span></div><div class="md">${marked.parse(instr)}</div>`;
459
+ content.innerHTML = html;
460
+ }
461
+ function showFile(path) {
462
+ const body = files[path] != null ? files[path] : (path === 'task.toml' ? t.task_toml_raw : '');
463
+ const fhead = document.createElement('div'); fhead.className = 'fhead';
464
+ fhead.innerHTML = `<span class="path">${ICON.file} ${esc(path)}</span>`;
465
+ fhead.appendChild(copyButton(body));
466
+ content.innerHTML = '';
467
+ content.appendChild(fhead);
468
+ if (path.endsWith('.md')) {
469
+ const d = document.createElement('div'); d.className = 'md'; d.innerHTML = marked.parse(body); content.appendChild(d);
470
+ } else {
471
+ const pre = document.createElement('pre'); const code = document.createElement('code');
472
+ code.className = 'language-' + langFor(path); code.textContent = body;
473
+ pre.appendChild(code); content.appendChild(pre);
474
+ try { hljs.highlightElement(code); } catch {}
475
+ }
476
+ content.scrollTop = 0;
477
+ }
478
+
479
+ select(wantFile && (nodes[wantFile] || wantFile === '__overview__') ? wantFile : '__overview__');
480
+ }
481
+
482
+ await loadDetail(task, initialFile);
483
+ }
484
+
485
+ /* ── router ───────────────────────────────────────── */
486
+ function router() {
487
+ const raw = location.hash.slice(1) || '/';
488
+ const [route, query] = raw.split('?');
489
+ const params = new URLSearchParams(query || '');
490
+ window.scrollTo(0, 0);
491
+ if (route === '/' || route === '' || route === 'home') return renderHome();
492
+ if (route === '/datasets' || route === 'datasets') return renderDatasets(params);
493
+ if (route === 'dataset') return renderDataset(params);
494
+ if (route === 'task') return renderTask(params);
495
+ renderHome();
496
+ }
497
+
498
+ // ⌘K focuses search on datasets page (and jumps there otherwise)
499
+ document.addEventListener('keydown', (e) => {
500
+ if ((e.metaKey || e.ctrlKey) && e.key === 'k') {
501
+ e.preventDefault();
502
+ const s = document.getElementById('ds-search') || document.getElementById('load-input');
503
+ if (s) s.focus(); else location.hash = '/datasets';
504
+ }
505
+ });
506
+
507
+ // ?dataset= / ?d= prefill (legacy Gradio-style deep link) → dataset view
508
+ (function prefill() {
509
+ const p = new URLSearchParams(location.search);
510
+ const d = p.get('dataset') || p.get('d');
511
+ if (d && !location.hash) { location.hash = `dataset?uri=${enc(d)}`; }
512
+ })();
513
+
514
+ window.addEventListener('hashchange', router);
515
+ router();
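`fillCounts` limits the per-row `/api/hub/count` requests by starting four async workers that share one index, so at most four requests are in flight while a long dataset list fills in. Below is a sketch of the same pattern as a reusable helper, assuming nothing beyond standard Promises. `mapWithConcurrency` is a hypothetical name, and the commented usage reuses `api`, `enc` and `fmtNum` from `static/app.js`.

```javascript
// Hypothetical helper (not part of this commit): the worker-pool pattern from
// fillCounts(), factored out. Calls fn(item, index) for every item with at most
// `limit` calls in flight, and returns per-item results in input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // shared cursor: each worker claims an index synchronously, before awaiting

  async function worker() {
    while (next < items.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await fn(items[i], i) };
      } catch (error) {
        // As in fillCounts(), one failure only affects its own row; the pool keeps going.
        results[i] = { ok: false, error };
      }
    }
  }

  const workers = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workers }, () => worker()));
  return results;
}

// Browser usage mirroring fillCounts() (api, enc and fmtNum come from static/app.js):
//   const cells = [...card.querySelectorAll('.tasks[data-lazy]')];
//   const res = await mapWithConcurrency(cells, 4, (c) => api(`/api/hub/count?id=${enc(c.dataset.lazy)}`));
//   res.forEach((r, i) => { cells[i].textContent = r.ok ? fmtNum(r.value.tasks) : fmtNum(null); });
```

Each worker claims its index synchronously before its first `await`, so no two workers ever fetch the same item and no lock is needed. Recording `{ ok, value }` or `{ ok, error }` per item, instead of letting `Promise.all` reject the whole batch, matches how `fillCounts` shows a placeholder for one failed count while the rest keep loading.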
static/index.html ADDED
@@ -0,0 +1,35 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en" data-theme="dark">
3
+ <head>
4
+ <meta charset="utf-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1">
6
+ <title>Hugging Face Harbor Visualiser — browse Harbor task-spec datasets</title>
7
+ <link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>🤗</text></svg>">
8
+ <link rel="preconnect" href="https://fonts.googleapis.com">
9
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
10
+ <link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;500;600;700&display=swap" rel="stylesheet">
11
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/github-dark.min.css">
12
+ <link rel="stylesheet" href="/static/style.css">
13
+ </head>
14
+ <body>
15
+ <nav class="nav">
16
+ <a class="brand" href="#/"><span class="logo">🤗</span> Harbor&nbsp;Visualiser</a>
17
+ <div class="links">
18
+ <a href="#/" data-nav="home">Home</a>
19
+ <a href="#/datasets" data-nav="datasets">Datasets</a>
20
+ <a href="https://www.harborframework.com" target="_blank" rel="noopener" class="ext">Harbor&nbsp;↗</a>
21
+ </div>
22
+ <div class="spacer"></div>
23
+ <div class="theme-toggle" id="theme-toggle">
24
+ <button data-mode="light" title="Light"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><circle cx="12" cy="12" r="4"/><path d="M12 2v2M12 20v2M4.9 4.9l1.4 1.4M17.7 17.7l1.4 1.4M2 12h2M20 12h2M4.9 19.1l1.4-1.4M17.7 6.3l1.4-1.4"/></svg></button>
25
+ <button data-mode="system" title="System"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><rect x="2" y="3" width="20" height="14" rx="2"/><path d="M8 21h8M12 17v4"/></svg></button>
26
+ <button data-mode="dark" title="Dark"><svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M21 12.8A9 9 0 1 1 11.2 3a7 7 0 0 0 9.8 9.8z"/></svg></button>
27
+ </div>
28
+ </nav>
29
+ <main id="app" class="wrap"></main>
30
+
31
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
32
+ <script src="https://cdnjs.cloudflare.com/ajax/libs/marked/12.0.0/marked.min.js"></script>
33
+ <script src="/static/app.js"></script>
34
+ </body>
35
+ </html>
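The three `data-mode` buttons in the nav feed `applyTheme()` in `static/app.js`, which resolves the chosen mode and writes it to `data-theme` on `<html>`. `style.css` then switches palettes on `:root[data-theme="dark"]`. Below is a hedged sketch that isolates that rule as a pure, testable function. `normaliseMode` and `resolveTheme` are hypothetical names, and the fallback for unknown stored values is an addition of this sketch rather than behaviour of the original.

```javascript
// Hypothetical pure version (not part of this commit) of the rule applyTheme()
// in static/app.js applies. The nav buttons carry data-mode="light|system|dark";
// style.css switches palettes on :root[data-theme="dark"].
const MODES = ['light', 'system', 'dark'];
const DEFAULT_MODE = 'dark'; // initTheme() falls back to 'dark' when nothing is stored

// Unlike applyTheme(), reject unknown values (e.g. a hand-edited localStorage entry).
function normaliseMode(stored) {
  return MODES.includes(stored) ? stored : DEFAULT_MODE;
}

// 'system' follows the OS preference; 'light' and 'dark' are used as-is.
function resolveTheme(mode, prefersDark) {
  const m = normaliseMode(mode);
  if (m === 'system') return prefersDark ? 'dark' : 'light';
  return m;
}

// Browser usage:
//   const mode = normaliseMode(localStorage.getItem('hh-theme'));
//   const dark = window.matchMedia('(prefers-color-scheme: dark)').matches;
//   document.documentElement.setAttribute('data-theme', resolveTheme(mode, dark));
```

Keeping the resolution pure means the `prefers-color-scheme` change listener only has to call it again with the new `matches` value.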
static/style.css ADDED
@@ -0,0 +1,285 @@
+ /* Harbor Visualiser: Hugging Face themed dark/light styles. */
+ :root {
+   --bg: #ffffff; --panel: #f9fafb; --panel-2: #f1f3f5;
+   --border: #e5e7eb; --border-strong: #d4d7dd;
+   --text: #1b1b1f; --muted: #5b6270; --faint: #99a0ad;
+   --accent: #e88b00; --accent-soft: rgba(255,157,0,.12);
+   --hf-yellow: #ffd21e; --hf-orange: #ff9d00;
+   --ok: #16a34a; --warn: #d97706; --err: #dc2626;
+   --hover: #f1f3f5;
+   --mono: 'JetBrains Mono', ui-monospace, 'SF Mono', SFMono-Regular, Menlo, Consolas, monospace;
+   --radius: 10px; --nav-h: 56px; --maxw: 1180px;
+ }
+ :root[data-theme="dark"] {
+   --bg: #0b0d12; --panel: #11141b; --panel-2: #1a1e27;
+   --border: #232834; --border-strong: #323847;
+   --text: #f3f4f6; --muted: #9aa1ad; --faint: #646b78;
+   --accent: #ffae45; --accent-soft: rgba(255,174,69,.13);
+   --hf-yellow: #ffd21e; --hf-orange: #ff9d00;
+   --ok: #4ade80; --warn: #fbbf24; --err: #f87171;
+   --hover: #181c24;
+ }
+ * { box-sizing: border-box; }
+ html, body { margin: 0; padding: 0; overflow-x: hidden; max-width: 100%; }
+ body {
+   background: var(--bg); color: var(--text);
+   font-family: var(--mono); font-size: 14px; line-height: 1.55;
+   -webkit-font-smoothing: antialiased;
+ }
+ a { color: inherit; text-decoration: none; }
+ button { font-family: inherit; cursor: pointer; }
+ code, pre { font-family: var(--mono); }
+ ::selection { background: var(--accent-soft); }
+
+ /* ── nav ───────────────────────────────────────────── */
+ .nav {
+   position: sticky; top: 0; z-index: 50;
+   height: var(--nav-h); display: flex; align-items: center; gap: 22px;
+   padding: 0 22px; background: color-mix(in srgb, var(--bg) 86%, transparent);
+   backdrop-filter: blur(10px); border-bottom: 1px solid var(--border);
+ }
+ .nav .brand { display: flex; align-items: center; gap: 9px; font-weight: 700; font-size: 15px; letter-spacing: -.2px; }
+ .nav .brand:hover { color: var(--text); }
+ .nav .brand .logo { font-size: 20px; line-height: 1; filter: saturate(1.15); }
+ .nav .brand .tag { font-size: 9px; font-weight: 700; letter-spacing: .6px; text-transform: uppercase;
+   color: #1b1b1f; background: var(--hf-yellow); padding: 2px 6px; border-radius: 5px; }
+ .nav .links { display: flex; gap: 18px; }
+ .nav .links a { color: var(--muted); font-size: 13px; }
+ .nav .links a:hover, .nav .links a.active { color: var(--text); }
+ .nav .spacer { flex: 1; }
+ .theme-toggle { display: flex; border: 1px solid var(--border); border-radius: 8px; overflow: hidden; }
+ .theme-toggle button {
+   background: transparent; border: 0; color: var(--faint);
+   padding: 6px 9px; display: grid; place-items: center; line-height: 0;
+ }
+ .theme-toggle button:hover { color: var(--text); background: var(--hover); }
+ .theme-toggle button.active { color: var(--text); background: var(--panel-2); }
+ .theme-toggle svg { width: 15px; height: 15px; }
+
+ /* ── layout ────────────────────────────────────────── */
+ .wrap { max-width: var(--maxw); margin: 0 auto; padding: 0 22px; }
+ .page { padding: 34px 0 80px; }
+ h1 { font-size: 30px; font-weight: 700; letter-spacing: -.5px; margin: 0 0 18px; }
+ h2 { font-size: 18px; font-weight: 600; margin: 0 0 12px; }
+ .muted { color: var(--muted); }
+ .faint { color: var(--faint); }
+
+ /* thin Hugging Face accent strip at the very top */
+ body::before { content: ""; display: block; height: 3px;
+   background: linear-gradient(90deg, var(--hf-yellow), var(--hf-orange)); }
+
+ /* hero */
+ .hero { text-align: center; padding: 60px 0 30px; }
+ .hero .mark { font-size: 56px; line-height: 1; margin-bottom: 10px; }
+ .hero h1 { font-size: 46px; margin: 0 0 14px; letter-spacing: -1.4px; }
+ .hero h1 .hf { background: linear-gradient(90deg, var(--hf-orange), var(--hf-yellow));
+   -webkit-background-clip: text; background-clip: text; -webkit-text-fill-color: transparent; }
+ .hero p { color: var(--muted); font-size: 14.5px; margin: 0 auto; max-width: 640px; line-height: 1.6; }
+
+ /* how-to / embed instructions */
+ .howto { margin: 46px 0 0; }
+ .howto h2 { font-size: 15px; margin: 0 0 14px; }
+ .howto .steps { display: grid; gap: 14px; grid-template-columns: 1fr 1fr; }
+ .howto .step { min-width: 0; border: 1px solid var(--border); border-radius: var(--radius); background: var(--panel); padding: 16px 18px; }
+ .howto .step h3 { font-size: 13px; margin: 0 0 4px; font-weight: 600; }
+ .howto .step p { color: var(--muted); font-size: 12.5px; margin: 0 0 12px; line-height: 1.55; }
+ .snippet { display: flex; align-items: center; gap: 10px; background: var(--panel-2);
+   border: 1px solid var(--border); border-radius: 8px; padding: 9px 12px; max-width: 100%; overflow: hidden; }
+ .snippet code { font-size: 12px; color: var(--text); overflow-x: auto; white-space: nowrap; flex: 1; min-width: 0; }
+ .snippet code::-webkit-scrollbar { height: 5px; }
+ .snippet code::-webkit-scrollbar-thumb { background: var(--border-strong); border-radius: 3px; }
+ .snippet .copy { color: var(--faint); line-height: 0; flex: none; }
+ .snippet .copy:hover { color: var(--accent); }
+ .snippet .copy svg { width: 14px; height: 14px; }
+ .howto .badge-preview { display: inline-flex; align-items: stretch; font-size: 11px; font-weight: 700;
+   border-radius: 5px; overflow: hidden; margin-bottom: 11px; }
+ .howto .badge-preview .l { background: #555; color: #fff; padding: 3px 8px; }
+ .howto .badge-preview .r { background: var(--hf-yellow); color: #1b1b1f; padding: 3px 8px; }
+ @media (max-width: 700px) { .howto .steps { grid-template-columns: 1fr; } }
+ /* accent link + nav external link */
+ .hl { color: var(--accent); }
+ .hl:hover { text-decoration: underline; }
+ .nav .links a.ext { color: var(--faint); }
+ .nav .links a.ext:hover { color: var(--accent); }
+ /* footer attribution */
+ .footer { max-width: 620px; margin: 56px auto 0; text-align: center; color: var(--faint);
+   font-size: 12px; line-height: 1.7; border-top: 1px solid var(--border); padding-top: 22px; }
+
+ /* ── table card ────────────────────────────────────── */
+ .card { border: 1px solid var(--border); border-radius: var(--radius); overflow: hidden; background: var(--panel); }
+ .thead { display: flex; padding: 11px 18px; border-bottom: 1px solid var(--border);
+   font-size: 11px; letter-spacing: .8px; text-transform: uppercase; color: var(--faint); }
+ .thead .col-tasks { margin-left: auto; }
+ .row {
+   display: flex; align-items: center; gap: 8px; padding: 13px 18px;
+   border-bottom: 1px solid var(--border); cursor: pointer; transition: background .08s;
+ }
+ .row:last-child { border-bottom: 0; }
+ .row:hover { background: var(--hover); }
+ .row .name { font-size: 13.5px; white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }
+ .row .copy { color: var(--faint); opacity: 0; transition: opacity .1s; line-height: 0; }
+ .row:hover .copy { opacity: 1; }
+ .row .copy:hover { color: var(--text); }
+ .row .copy svg { width: 13px; height: 13px; }
+ .row .tasks { margin-left: auto; color: var(--muted); font-variant-numeric: tabular-nums; font-size: 13px; padding-left: 14px; }
+ .row .tasks .spin { color: var(--faint); }
+
+ /* search */
+ .search { display: flex; align-items: center; gap: 10px; border: 1px solid var(--border);
+   border-radius: var(--radius); padding: 11px 16px; background: var(--panel); margin-bottom: 18px; }
+ .search:focus-within { border-color: var(--border-strong); }
+ .search svg { width: 16px; height: 16px; color: var(--faint); flex: none; }
+ .search input { flex: 1; background: transparent; border: 0; outline: 0; color: var(--text); font-family: var(--mono); font-size: 14px; }
+ .search input::placeholder { color: var(--faint); }
+ .search .kbd { color: var(--faint); font-size: 11px; border: 1px solid var(--border); border-radius: 5px; padding: 2px 6px; }
+ .search select { background: var(--panel-2); color: var(--muted); border: 1px solid var(--border); border-radius: 6px; padding: 5px 8px; font-family: var(--mono); font-size: 12px; }
+
+ /* buttons */
+ .btn { display: inline-flex; align-items: center; gap: 7px; background: var(--panel-2);
+   border: 1px solid var(--border); border-radius: 8px; color: var(--text);
+   padding: 9px 16px; font-size: 13px; transition: background .1s, border-color .1s; }
+ .btn:hover { background: var(--hover); border-color: var(--border-strong); }
+ .center { text-align: center; margin-top: 26px; }
+
+ /* pills / badges */
+ .pill { display: inline-flex; align-items: center; gap: 5px; font-size: 11px; padding: 2px 8px;
+   border-radius: 999px; border: 1px solid var(--border); color: var(--muted); background: var(--panel-2); }
+ .pill.ok { color: var(--ok); border-color: color-mix(in srgb, var(--ok) 35%, var(--border)); }
+
+ /* breadcrumb */
+ .crumb { display: flex; align-items: center; gap: 8px; color: var(--muted); font-size: 13px; margin-bottom: 16px; flex-wrap: wrap; }
+ .crumb a:hover { color: var(--text); }
+ .crumb .sep { color: var(--faint); }
+
+ /* ── dataset / task viewer (split) ─────────────────── */
+ .viewer { display: grid; grid-template-columns: 300px 1fr; gap: 0; border: 1px solid var(--border);
+   border-radius: var(--radius); overflow: hidden; min-height: 70vh; }
+ .tree { border-right: 1px solid var(--border); background: var(--panel); overflow: auto; max-height: 80vh; }
+ .tree .thead2 { padding: 11px 16px; font-size: 11px; letter-spacing: .8px; text-transform: uppercase;
+   color: var(--faint); border-bottom: 1px solid var(--border); position: sticky; top: 0; background: var(--panel); }
+ .tnode { display: flex; align-items: center; gap: 7px; padding: 6px 14px; cursor: pointer; font-size: 13px;
+   color: var(--muted); white-space: nowrap; overflow: hidden; text-overflow: ellipsis; }
+ .tnode:hover { background: var(--hover); color: var(--text); }
+ .tnode.active { background: var(--accent-soft); color: var(--text); }
+ .tnode.dir { color: var(--text); }
+ .tnode .ind { display: inline-block; }
+ .tnode svg { width: 14px; height: 14px; flex: none; color: var(--faint); }
+ .tnode.active svg { color: var(--accent); }
+
+ /* task master-detail: [tasks panel | file tree | content] */
+ .taskview { display: grid; grid-template-columns: 250px 230px 1fr; border: 1px solid var(--border);
+   border-radius: var(--radius); overflow: hidden; height: calc(100vh - 200px); min-height: 460px;
+   transition: grid-template-columns .18s ease; }
+ .taskview.collapsed { grid-template-columns: 0 230px 1fr; }
+ .taskview.collapsed .tasks-panel { opacity: 0; pointer-events: none; }
+ .tasks-panel { display: flex; flex-direction: column; min-width: 0; border-right: 1px solid var(--border);
+   background: var(--panel-2); overflow: hidden; transition: opacity .15s ease; }
+ .tasks-panel .tp-head { padding: 11px 14px; font-size: 11px; letter-spacing: .8px; text-transform: uppercase;
+   color: var(--faint); border-bottom: 1px solid var(--border); flex: none; }
+ .tasks-panel .tp-search { display: flex; align-items: center; gap: 7px; padding: 8px 12px; border-bottom: 1px solid var(--border); flex: none; }
+ .tasks-panel .tp-search svg { width: 14px; height: 14px; color: var(--faint); flex: none; }
+ .tasks-panel .tp-search input { flex: 1; min-width: 0; background: transparent; border: 0; outline: 0; color: var(--text); font-family: var(--mono); font-size: 12.5px; }
+ .tasks-panel .tp-search input::placeholder { color: var(--faint); }
+ .tp-list { overflow: auto; flex: 1; }
+ .tp-item { padding: 8px 14px; font-size: 12.5px; color: var(--muted); cursor: pointer;
+   white-space: nowrap; overflow: hidden; text-overflow: ellipsis; border-left: 2px solid transparent; }
+ .tp-item:hover { background: var(--hover); color: var(--text); }
+ .tp-item.active { background: var(--accent-soft); color: var(--text); border-left-color: var(--hf-orange); }
+ .tp-list .empty { padding: 14px; font-size: 11.5px; color: var(--faint); }
+
+ .content { overflow: auto; max-height: 80vh; background: var(--bg); }
+ .taskview .tree, .taskview .content { max-height: none; }
+ .content .fhead { display: flex; align-items: center; gap: 10px; padding: 10px 16px;
+   border-bottom: 1px solid var(--border); position: sticky; top: 0; background: var(--bg); z-index: 2; }
+ .content .fhead .path { display: inline-flex; align-items: center; gap: 7px; font-size: 13px; color: var(--muted); min-width: 0; }
+ .content .fhead svg { width: 15px; height: 15px; flex: none; color: var(--faint); }
+ .content .fhead .copy { margin-left: auto; }
+ .content pre { margin: 0; padding: 16px; overflow: auto; font-size: 12.5px; line-height: 1.6; }
+ .content pre code { background: transparent !important; padding: 0 !important; }
+ .content .md { padding: 18px 22px; }
+ .content .md h1 { font-size: 22px; } .content .md h2 { font-size: 17px; } .content .md pre { background: var(--panel-2); border-radius: 8px; }
+ .content .md code { background: var(--panel-2); padding: 1px 5px; border-radius: 4px; font-size: 12.5px; }
+
+ /* overview */
+ .kv { width: 100%; border-collapse: collapse; }
+ .kv td { padding: 9px 16px; border-bottom: 1px solid var(--border); vertical-align: top; font-size: 13px; }
+ .kv td:first-child { color: var(--muted); width: 180px; white-space: nowrap; }
+ .kw { display: inline-block; font-size: 11px; padding: 2px 8px; border: 1px solid var(--border); border-radius: 999px; margin: 0 4px 4px 0; color: var(--muted); }
+
+ /* task list (within dataset) — scrolls internally, not the whole page */
+ .tasklist { max-height: calc(100vh - 240px); overflow-y: auto; }
+ .tasklist .thead { position: sticky; top: 0; background: var(--panel); z-index: 1; }
+ .tasklist .row .name { font-size: 13px; }
+ .tasklist .row .tasks { font-size: 12px; }
+
+ /* states */
+ .loading, .empty, .errbox { padding: 50px 20px; text-align: center; color: var(--muted); }
+ .errbox { color: var(--err); }
+ .spinner { display: inline-block; width: 16px; height: 16px; border: 2px solid var(--border); border-top-color: var(--accent);
+   border-radius: 50%; animation: spin .7s linear infinite; vertical-align: -3px; margin-right: 8px; }
+ @keyframes spin { to { transform: rotate(360deg); } }
+ .copied { color: var(--ok) !important; }
+
+ /* code section (publish-style) */
+ .codeblock { border: 1px solid var(--border); border-radius: var(--radius); background: var(--panel); margin: 14px 0; overflow: hidden; }
+ .codeblock .chead { display: flex; padding: 9px 14px; border-bottom: 1px solid var(--border); font-size: 11px; letter-spacing: .6px; text-transform: uppercase; color: var(--faint); }
+ .codeblock .chead .copy { margin-left: auto; }
+ .codeblock pre { margin: 0; padding: 14px; font-size: 12.5px; overflow: auto; }
+
+ /* run-this-task command — styled like a terminal block */
+ .runbar { display: flex; align-items: center; gap: 10px; margin: 0 0 14px;
+   background: #0d1117; border: 1px solid var(--border-strong); border-radius: var(--radius);
+   padding: 11px 14px; box-shadow: inset 0 0 0 1px rgba(255,255,255,.02); }
+ :root[data-theme="light"] .runbar { background: #1b1f27; }
+ .runbar .lbl { display: inline-flex; align-items: center; flex: none; color: #4ade80; line-height: 0; }
+ .runbar .lbl svg { width: 15px; height: 15px; }
+ .runbar code { font-family: var(--mono); font-size: 12.5px; color: #d7dce4; white-space: nowrap;
+   overflow-x: auto; flex: 1; min-width: 0; }
+ .runbar code::before { content: "$ "; color: #6b7280; }
+ .runbar code::-webkit-scrollbar { height: 5px; }
+ .runbar code::-webkit-scrollbar-thumb { background: #2b313b; border-radius: 3px; }
+ .runbar .copy { flex: none; color: #6b7280; line-height: 0; }
+ .runbar .copy:hover { color: var(--hf-yellow); }
+ .runbar .copy svg { width: 15px; height: 15px; }
+ .runbar.copied .copy { color: #4ade80; }
+
+ /* generic icon button (crumb toggle etc.) */
+ .nav-btn { display: grid; place-items: center; width: 30px; height: 30px; flex: none;
+   border: 1px solid var(--border); border-radius: 7px; background: var(--panel-2); color: var(--muted); }
+ .nav-btn:hover:not(:disabled) { color: var(--text); border-color: var(--border-strong); }
+ .nav-btn:disabled { opacity: .35; cursor: default; }
+ .nav-btn svg { width: 15px; height: 15px; }
+ .nav-btn.ghost { background: transparent; }
+ .crumb .pos { color: var(--faint); font-size: 11px; font-variant-numeric: tabular-nums; flex: none; }
+
+ /* inline hint / note (loading warnings, tag instruction) */
+ .hint { display: flex; align-items: flex-start; gap: 9px; color: var(--muted); font-size: 12.5px;
+   line-height: 1.55; background: var(--panel); border: 1px solid var(--border);
+   border-radius: var(--radius); padding: 13px 15px; }
+ .hint .ic { color: var(--hf-orange); flex: none; line-height: 0; margin-top: 1px; }
+ .hint code { background: var(--panel-2); padding: 1px 6px; border-radius: 4px; color: var(--text); font-size: 12px; }
+ .loading .sub { display: block; margin-top: 8px; font-size: 12px; color: var(--faint); }
+
+ /* responsive */
+ @media (max-width: 760px) {
+   .nav { gap: 12px; padding: 0 14px; }
+   .nav .brand { font-size: 13px; }
+   .wrap { padding: 0 14px; }
+   .hero h1 { font-size: 32px; }
+   .hero .mark { font-size: 46px; }
+   .viewer { grid-template-columns: 1fr; }
+   .tree { max-height: 200px; border-right: 0; border-bottom: 1px solid var(--border); }
+   .content { max-height: none; }
+   .nav .links { gap: 12px; }
+   .nav .links a:not(.ext) { display: none; }
+   .crumb { gap: 6px; }
+   .kv td:first-child { width: auto; }
+
+   /* task view stacks: tasks panel (toggleable) over tree over content */
+   .taskview { grid-template-columns: 1fr; height: auto; }
+   .taskview.collapsed { grid-template-columns: 1fr; }
+   .tasks-panel { max-height: 220px; border-right: 0; border-bottom: 1px solid var(--border); }
+   .taskview.collapsed .tasks-panel { display: none; }
+   .taskview .tree { max-height: 200px; border-right: 0; border-bottom: 1px solid var(--border); }
+   .taskview .content { max-height: 70vh; }
+ }
viewer/__init__.py CHANGED
@@ -1,12 +1,13 @@
  """Harbor Visualiser — load + parse Harbor task spec datasets."""

- from viewer.load import DatasetSource, fetch_dataset, parse_dataset_uri
+ from viewer.load import DatasetSource, fetch_dataset, fetch_hf_task, parse_dataset_uri
  from viewer.parse import HarborTask, list_tasks, load_task

  __all__ = [
      "DatasetSource",
      "HarborTask",
      "fetch_dataset",
+     "fetch_hf_task",
      "list_tasks",
      "load_task",
      "parse_dataset_uri",
viewer/hub.py ADDED
@@ -0,0 +1,121 @@
+ """Discover Harbor task-spec datasets on the Hugging Face Hub.
+
+ Harbor datasets are tagged `harbor` on the Hub (the same filter as
+ https://huggingface.co/datasets?other=harbor). This module lists them (fast,
+ no per-dataset round-trips) and computes per-dataset task ids/counts on demand
+ (shallow `list_repo_tree` calls, memoised with a short TTL).
+
+ The dataset listing is always live against the Hub, so the UI reflects newly
+ published datasets; task-id listings are cached for at most `_TASKS_TTL` s.
+ """
+
+ from __future__ import annotations
+
+ import logging
+ import os
+ import time
+ from dataclasses import dataclass
+
+ logger = logging.getLogger(__name__)
+
+ _HARBOR_TAG = "harbor"
+
+
+ @dataclass(slots=True)
+ class HubDataset:
+     id: str
+     downloads: int = 0
+     likes: int = 0
+     updated: str | None = None
+     private: bool = False
+
+     def as_dict(self) -> dict:
+         return {
+             "id": self.id,
+             "downloads": self.downloads,
+             "likes": self.likes,
+             "updated": self.updated,
+             "private": self.private,
+         }
+
+
+ def _token() -> str | None:
+     return os.environ.get("HF_TOKEN") or None
+
+
+ def list_harbor_datasets(query: str | None = None, sort: str = "downloads",
+                          limit: int = 500) -> list[HubDataset]:
+     """List datasets tagged `harbor` on the Hub. Always live (no caching).
+
+     `sort` ∈ {downloads, likes, lastModified, trendingScore}. `query` filters by
+     substring on the dataset id (server-side search)."""
+     from huggingface_hub import HfApi
+
+     api = HfApi(token=_token())
+     # `filter=` matches the `other:harbor` tag used by the Hub UI.
+     kwargs: dict = {"filter": _HARBOR_TAG, "limit": limit}
+     if sort in ("downloads", "likes", "lastModified", "trendingScore"):
+         kwargs["sort"] = sort
+     if query:
+         kwargs["search"] = query
+     out: list[HubDataset] = []
+     for d in api.list_datasets(**kwargs):
+         lm = getattr(d, "last_modified", None)
+         out.append(HubDataset(
+             id=d.id,
+             downloads=int(getattr(d, "downloads", 0) or 0),
+             likes=int(getattr(d, "likes", 0) or 0),
+             updated=lm.isoformat() if lm else None,
+             private=bool(getattr(d, "private", False)),
+         ))
+     return out
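The query-building step of `list_harbor_datasets()` can be isolated as a pure function and checked without network access. The sketch below is a hypothetical helper (`build_list_kwargs` is not part of this commit) that mirrors the code above: `filter` and `limit` are always sent, `sort` only when it is one of the four accepted keys, and `search` only for a non-empty query. A value such as `trending` is not in the accepted set, so it is silently dropped and the Hub's default ordering applies.

```python
from __future__ import annotations

# Hypothetical helper (not in this commit) mirroring how list_harbor_datasets()
# assembles the keyword arguments for HfApi.list_datasets().
_HARBOR_TAG = "harbor"
_ACCEPTED_SORTS = ("downloads", "likes", "lastModified", "trendingScore")


def build_list_kwargs(query: str | None = None, sort: str = "downloads",
                      limit: int = 500) -> dict:
    kwargs: dict = {"filter": _HARBOR_TAG, "limit": limit}
    if sort in _ACCEPTED_SORTS:  # unknown keys are dropped, not rejected
        kwargs["sort"] = sort
    if query:  # None or "" means no server-side search
        kwargs["search"] = query
    return kwargs


print(build_list_kwargs("swe", sort="trending"))
# {'filter': 'harbor', 'limit': 500, 'search': 'swe'}
```

Dropping an unknown sort key rather than raising keeps the endpoint forgiving of stale client values, at the cost of hiding typos.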
+
+
+ # task-id memo: {(id, rev): (ids, ts)} — derived from a shallow tree listing,
+ # never a download. Short TTL so freshly-pushed tasks still surface.
+ _TASKS_CACHE: dict[tuple[str, str], tuple[list[str], float]] = {}
+ _TASKS_TTL = 120.0  # seconds
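`_TASKS_CACHE` and `_TASKS_TTL` implement a small time-to-live memo: an entry is reused while it is younger than the TTL and recomputed afterwards. The sketch below packages the same rule as a class with an injectable clock so expiry can be tested deterministically. `TTLMemo`, `get_or_compute`, the `("org/ds", "head")` key and `list_tasks_from_hub` are hypothetical names for illustration, not part of this commit.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable
from typing import Any


class TTLMemo:
    """Hypothetical TTL memo: key -> (value, stored_at), same rule as _TASKS_CACHE."""

    def __init__(self, ttl: float = 120.0, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control "now"
        self._data: dict[Hashable, tuple[Any, float]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and (now - hit[1]) < self.ttl:
            return hit[0]  # still fresh: skip the expensive call
        value = compute()  # miss or expired: recompute and re-stamp
        self._data[key] = (value, now)
        return value


# Usage: key on (dataset_id, revision or "head"), as list_hf_tasks() does.
fake_now = [0.0]
memo = TTLMemo(ttl=120.0, clock=lambda: fake_now[0])
calls: list[float] = []


def list_tasks_from_hub() -> list[str]:
    calls.append(fake_now[0])  # record each (simulated) Hub round-trip
    return ["task-a", "task-b"]


memo.get_or_compute(("org/ds", "head"), list_tasks_from_hub)  # t=0: miss
fake_now[0] = 60.0
memo.get_or_compute(("org/ds", "head"), list_tasks_from_hub)  # t=60: hit
fake_now[0] = 200.0
memo.get_or_compute(("org/ds", "head"), list_tasks_from_hub)  # t=200: expired
print(calls)  # [0.0, 200.0]
```

Like the module-level dict, this memo only overwrites stale entries on the next miss and never evicts them, which is likely acceptable while the number of distinct dataset ids stays small.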
+
+
+ def _is_dir(entry) -> bool:
+     return entry.__class__.__name__ == "RepoFolder"
+
+
+ def list_hf_tasks(dataset_id: str, revision: str | None = None, *, ttl: float = _TASKS_TTL) -> list[str]:
+     """Task ids in a Hub dataset, WITHOUT downloading it.
+
+     Uses *shallow* tree listings, so even 2k-task datasets resolve in one or two
+     (paginated) listing calls instead of walking every file: if a top-level `tasks/`
+     folder exists we list its immediate children (Repo2RLEnv's nested layout);
+     otherwise the top-level folders are treated as flat task dirs. This avoids
+     the hang on huge datasets whose whole repo used to be enumerated/downloaded."""
+     key = (dataset_id, revision or "head")
+     now = time.time()
+     hit = _TASKS_CACHE.get(key)
+     if hit and (now - hit[1]) < ttl:
+         return hit[0]
+
+     from huggingface_hub import HfApi
+
+     api = HfApi(token=_token())
+     root = list(api.list_repo_tree(dataset_id, repo_type="dataset", revision=revision, recursive=False))
+     names = {e.path: e for e in root}
+
+     if "tasks" in names and _is_dir(names["tasks"]):
+         sub = api.list_repo_tree(dataset_id, "tasks", repo_type="dataset", revision=revision, recursive=False)
+         ids = sorted(e.path.split("/")[-1] for e in sub if _is_dir(e))
+     else:
+         # flat layout: top-level folders are the tasks (skip dotfiles/README/etc.)
+         ids = sorted(e.path for e in root if _is_dir(e) and not e.path.startswith("."))
+
+     _TASKS_CACHE[key] = (ids, now)
+     return ids
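The layout rule in `list_hf_tasks()` only needs each tree entry's path and whether it is a folder, so it can be exercised offline. In this sketch a hypothetical `Entry` dataclass stands in for the Hub's tree entries (the original recognises folders by the `RepoFolder` class name), and `task_ids_from_listing` is likewise an illustrative name. A repo with a top-level `tasks/` folder takes its task ids from that folder's immediate subfolders; otherwise every non-dot top-level folder is a task.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a Hub tree entry (file or folder)."""
    path: str
    is_dir: bool


def task_ids_from_listing(root: list[Entry],
                          tasks_children: list[Entry] | None = None) -> list[str]:
    """Task ids from shallow listings of the repo root and, if present, `tasks/`."""
    names = {e.path: e for e in root}
    if "tasks" in names and names["tasks"].is_dir:
        # Nested layout: each immediate subfolder of tasks/ is one task.
        return sorted(e.path.split("/")[-1] for e in (tasks_children or []) if e.is_dir)
    # Flat layout: top-level folders are tasks; files and dot-folders are skipped.
    return sorted(e.path for e in root if e.is_dir and not e.path.startswith("."))


nested_root = [Entry("README.md", False), Entry("tasks", True)]
tasks_dir = [Entry("tasks/b", True), Entry("tasks/a", True), Entry("tasks/notes.txt", False)]
print(task_ids_from_listing(nested_root, tasks_dir))  # ['a', 'b']

flat_root = [Entry(".github", True), Entry("README.md", False),
             Entry("task-2", True), Entry("task-1", True)]
print(task_ids_from_listing(flat_root))  # ['task-1', 'task-2']
```

As in the original, the second listing (of `tasks/`) is only needed when the root actually contains that folder, so a flat dataset costs a single shallow listing.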
+
+
+ def count_tasks(dataset_id: str) -> int:
+     """Number of Harbor tasks in a Hub dataset (shallow listing, memoised; -1 on error)."""
+     try:
+         return len(list_hf_tasks(dataset_id))
+     except Exception as exc:  # noqa: BLE001
+         logger.warning("count_tasks(%s) failed: %s", dataset_id, exc)
+         return -1
viewer/load.py CHANGED
@@ -139,16 +139,21 @@ def _fetch_hf(source: DatasetSource, force: bool) -> Path:
      from huggingface_hub import snapshot_download

      target = CACHE_ROOT / source.cache_key
-     if not force and target.exists() and any(target.iterdir()):
-         logger.info("hf cache hit: %s", target)
+     # Pinned revisions (tag/commit) are immutable → caching is always safe.
+     # Unpinned ("head") datasets MUST re-sync every load so we never show stale
+     # data — snapshot_download is etag-aware, so re-syncing only pulls files
+     # that actually changed (cheap). This is the fix for "doesn't show latest".
+     pinned = source.revision is not None
+     if not force and pinned and target.exists() and any(target.iterdir()):
+         logger.info("hf cache hit (pinned %s): %s", source.revision, target)
          return target

-     if target.exists():
-         shutil.rmtree(target)
      target.mkdir(parents=True, exist_ok=True)
      # Public datasets work without a token; private ones rely on $HF_TOKEN
      # being set in the Space's secrets.
      token = os.environ.get("HF_TOKEN") or None
+     logger.info("hf %s: %s@%s", "fetch" if pinned else "re-sync",
+                 source.ident, source.revision or "head")
      snapshot_download(
          repo_id=source.ident,
          repo_type="dataset",
@@ -159,6 +164,52 @@ def _fetch_hf(source: DatasetSource, force: bool) -> Path:
      return target


+ def fetch_hf_task(source: DatasetSource, task_id: str, *, force: bool = False) -> Path:
+     """Download ONLY one task's files from an HF dataset (not the whole repo).
+
+     Snapshot-downloading a 2k-task dataset just to open one task is slow; even
+     `snapshot_download(allow_patterns=...)` still walks the entire repo tree
+     first. Instead we list just this task's subtree (a recursive listing scoped
+     to the task path) and `hf_hub_download` each file: a handful of small files,
+     no full-repo walk. Files accumulate under one per-dataset cache dir, so
+     revisiting a task is cheap. `force` is currently unused. Returns a root that
+     `load_task(root, task_id)` resolves for either flat or nested layout.
+     """
+     from huggingface_hub import HfApi, hf_hub_download
+
+     target = CACHE_ROOT / f"{source.cache_key}__bytask"
+     target.mkdir(parents=True, exist_ok=True)
+     token = os.environ.get("HF_TOKEN") or None
+     api = HfApi(token=token)
+     logger.info("hf per-task fetch: %s :: %s", source.ident, task_id)
+
+     # Resolve the task's directory in the repo: nested (`tasks/<id>`) first, then flat.
+     # `list_repo_tree` is a generator, so the 404 for a non-existent prefix only
+     # fires while iterating — force it inside the try (via list()) so we fall
+     # through to the other layout instead of bubbling the error up.
+     files: list[str] = []
+     for prefix in (f"tasks/{task_id}", task_id):
+         try:
+             entries = list(api.list_repo_tree(
+                 source.ident, prefix, repo_type="dataset",
+                 revision=source.revision, recursive=True,
+             ))
+         except Exception:  # noqa: BLE001 — path doesn't exist in this layout
+             continue
+         files = [e.path for e in entries if getattr(e, "size", None) is not None]
+         if files:
+             break
+     if not files:
+         raise FileNotFoundError(f"task {task_id!r} not found in {source.ident}")
+
+     for f in files:
+         hf_hub_download(
+             repo_id=source.ident, repo_type="dataset", revision=source.revision,
+             filename=f, local_dir=str(target), token=token,
+         )
+     return target
+
+
  def _fetch_harbor(source: DatasetSource, force: bool) -> Path:
      """Shell out to `harbor datasets download` to fetch a Harbor-registry dataset.
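Both decisions added to `load.py` can be checked offline. The sketch below restates them as hypothetical, network-free helpers (none of these names are part of this commit). `should_reuse_cache` is the `_fetch_hf()` rule that reuses a local copy only when not forced, when a revision is pinned and when the cache directory already has content. `resolve_task_files` is the `fetch_hf_task()` lookup that tries `tasks/<id>` before `<id>` and treats a listing error as "not in this layout". An injected `list_tree(prefix)` callable yielding `(path, is_file)` pairs stands in for `api.list_repo_tree(..., recursive=True)`; the original keeps entries that carry a `size`, i.e. files. The in-memory repo and its file names are made up for the example.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable, Iterator
from pathlib import Path


def has_local_copy(target: Path) -> bool:
    """True if the cache dir exists and already contains at least one entry."""
    return target.exists() and any(target.iterdir())


def should_reuse_cache(revision: str | None, force: bool, local_copy: bool) -> bool:
    """Restates `if not force and pinned and target.exists() and any(...)`."""
    pinned = revision is not None  # tag/commit assumed immutable
    return not force and pinned and local_copy


def resolve_task_files(task_id: str,
                       list_tree: Callable[[str], Iterable[tuple[str, bool]]]) -> list[str]:
    """File paths of one task: nested `tasks/<id>` layout first, then flat `<id>`."""
    for prefix in (f"tasks/{task_id}", task_id):
        try:
            # Materialise inside the try: a lazy listing raises only when iterated.
            entries = list(list_tree(prefix))
        except Exception:  # prefix missing in this layout: try the next one
            continue
        files = [path for path, is_file in entries if is_file]
        if files:
            return files
    raise FileNotFoundError(f"task {task_id!r} not found")


# Hypothetical in-memory repo (path -> is_file) with one nested and one flat task.
REPO = {
    "tasks/t1/README.md": True,
    "tasks/t1/tests": False,
    "tasks/t1/tests/run.sh": True,
    "t2/README.md": True,
}


def fake_list_tree(prefix: str) -> Iterator[tuple[str, bool]]:
    """Generator stand-in for a recursive tree listing; raises lazily on a miss."""
    matches = sorted(p for p in REPO if p.startswith(prefix + "/"))
    if not matches:
        raise LookupError(f"404: {prefix}")
    for path in matches:
        yield path, REPO[path]


print(should_reuse_cache(None, force=False, local_copy=True))    # False: head always re-syncs
print(should_reuse_cache("v1.0", force=False, local_copy=True))  # True
print(resolve_task_files("t1", fake_list_tree))  # ['tasks/t1/README.md', 'tasks/t1/tests/run.sh']
print(resolve_task_files("t2", fake_list_tree))  # ['t2/README.md']
```

Note that the cache rule treats any non-`None` revision as immutable; if `source.revision` can also hold a branch name, a cached branch would not be re-synced under this rule.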