# Notebook Collaboration: How We Work on Kaggle Notebooks Together

*Audience: anyone collaborating with Broulaye on Sahel-Voice-Lab*

*Last updated: 2026-04-20*

## Why we're changing how we work

Up to now, we've mostly been editing notebooks inside the Kaggle web UI, downloading them occasionally, and pushing to git. That's painful because:

- The Kaggle copy and the git copy drift apart, so it's never clear which one is "right."
- Cell outputs and execution counts change every run, so git diffs are huge and unreadable.
- If both of us edit the same notebook at the same time, one of us accidentally overwrites the other.

The new workflow fixes this by making **git the single source of truth** and using Kaggle purely as the place where notebooks *run*. We edit locally, commit to git, and push the notebook up to Kaggle with one command. The Kaggle web UI becomes read-only for our shared notebooks: we still go there to watch runs and read logs, but we don't type code into it anymore.

## What you need to install (one time, on your own machine)

```bash
pip install kaggle nbstripout
```

- `kaggle`: the official Kaggle command-line tool. Lets you push, pull, run, and monitor Kaggle notebooks from your terminal.
- `nbstripout`: strips cell outputs and execution counts from notebooks before they hit git, so diffs stay about *code*, not noise.

## Set up your Kaggle API credentials (one time)

1. Go to [kaggle.com](https://www.kaggle.com), click your avatar → **Settings** → **API** → **Create New API Token**. A file called `kaggle.json` downloads.
2. Move it to the right place and lock down permissions:

   ```bash
   mkdir -p ~/.kaggle
   mv ~/Downloads/kaggle.json ~/.kaggle/kaggle.json
   chmod 600 ~/.kaggle/kaggle.json
   ```

3. Confirm it works:

   ```bash
   kaggle kernels list --mine
   ```

   You should see your existing kernels. If you get an auth error, check the file location and permissions.

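
The location and permission checks from step 3 can also be scripted. The following is a minimal sketch, not part of the Kaggle CLI: `check_kaggle_creds` is a hypothetical helper name, and the sketch assumes either GNU or BSD `stat` is available on your machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Kaggle CLI): verify that the API token
# file exists and is readable and writable only by its owner (mode 600).
check_kaggle_creds() {
  local file="${1:-$HOME/.kaggle/kaggle.json}"
  local mode

  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi

  # GNU stat uses -c '%a'; BSD/macOS stat uses -f '%Lp'. Try GNU first.
  mode=$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file")

  if [ "$mode" != "600" ]; then
    echo "wrong permissions on $file: $mode (expected 600)" >&2
    return 1
  fi

  echo "ok: $file"
}
```

Calling `check_kaggle_creds` with no argument checks the default `~/.kaggle/kaggle.json`; passing a path checks that file instead. It only inspects the file on disk, so `kaggle kernels list --mine` remains the real test that the token is accepted.
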

**Never commit `kaggle.json` to git.** It's already in `.gitignore` in this repo, but if you work in another repo, add it yourself.

## Repository layout for notebooks

Each Kaggle notebook ("kernel" in Kaggle's API language) needs its own folder with a `kernel-metadata.json` file next to the `.ipynb`. Our structure:

```
notebooks/
  kaggle_master_trainer/
    kernel-metadata.json
    kaggle_master_trainer.ipynb
  train_fula_tts/
    kernel-metadata.json
    train_fula_tts.ipynb
  bootstrap_repos.ipynb    # local helper, not a Kaggle kernel
  train_colab.ipynb        # runs on Google Colab, different flow
```

A `kernel-metadata.json` looks roughly like this (example for the master trainer):

```json
{
  "id": "ous-sow/sahel-kaggle-master-trainer",
  "title": "Sahel-Voice-Lab Master Trainer",
  "code_file": "kaggle_master_trainer.ipynb",
  "language": "python",
  "kernel_type": "notebook",
  "is_private": true,
  "enable_gpu": true,
  "enable_internet": true,
  "dataset_sources": ["google/fleurs", "robotsmali/jeli-asr"],
  "kernel_sources": [],
  "competition_sources": []
}
```

The `id` field (`owner/slug`) is **permanent**. Once we've agreed on a slug for a shared kernel, never change it: it is our shared pointer to the kernel living on Kaggle.

## Enable the nbstripout filter in the repo (one time per clone)

From the repo root, the first time you clone:

```bash
nbstripout --install --attributes .gitattributes
```

This registers a git filter that runs on every `.ipynb` when it is staged, stripping outputs and execution counts before they reach a commit. Commit the `.gitattributes` file so the attribute rules are shared with everyone. The filter itself is registered in each clone's local git config, which is why each of us runs this command once per clone.

**First-time caveat:** if the repo previously had notebooks with outputs committed, your first diff after enabling this will look like everything is being "deleted." That's correct and one-time; it's just stripping the old outputs.

## The daily workflow

```bash
# 1. Pull the latest version from git
git pull

# 2. Edit the notebook locally (VS Code, JupyterLab, whatever you prefer).
#    Running cells is fine; nbstripout handles the cleanup on commit.

# 3. Commit your changes
git add notebooks/kaggle_master_trainer/kaggle_master_trainer.ipynb
git commit -m "experiment: lower LR for ASR adapter"
git push

# 4. Push the notebook up to Kaggle to actually run it
cd notebooks/kaggle_master_trainer
kaggle kernels push

# 5. Watch the run
kaggle kernels status ous-sow/sahel-kaggle-master-trainer

# 6. When it's done, pull outputs if you need them
kaggle kernels output ous-sow/sahel-kaggle-master-trainer -p ./runs/$(date +%F)/
```

Results go into `runs/` (which is gitignored). **They do not go back into the `.ipynb` in git**; that's what nbstripout is protecting us from.

## Team rules (please read these, they matter)

1. **Never edit shared notebooks in the Kaggle web UI.** Use the web UI to watch runs, read logs, and download output files. If you want to experiment, do it locally. If you absolutely must try something quick in the web UI, treat it as a scratch copy and do not manually merge it back.
2. **One runner at a time per kernel.** `kaggle kernels push` *replaces* the notebook on Kaggle's side. If you push while the other person's run is queued or mid-execution, you'll queue behind them or disrupt them. Coordinate over chat, or, better, give yourself a personal kernel slug (e.g. `ous-sow/sahel-trainer-dev-`) for experimentation, and only push to the shared kernel (`ous-sow/sahel-kaggle-master-trainer`) when a change is ready to run cleanly.
3. **Git is the source of truth, always.** Every Kaggle run begins with a `kaggle kernels push` from the current git state. Nothing on Kaggle is authoritative. If something on Kaggle looks different from git, git wins: pull from git, re-push, run again.

## Troubleshooting

**`kaggle kernels push` says "message: Kernel already exists."** Expected: it's just telling you the kernel already exists on Kaggle and will be updated. Not an error.


**Huge diff with no real code changes.** `nbstripout` isn't active in your clone. Run `nbstripout --install --attributes .gitattributes` from the repo root and re-stage the file.

**Auth errors from `kaggle` CLI.** Check that `~/.kaggle/kaggle.json` exists, is yours (not someone else's), and has mode 600.

**Merge conflict on a `kernel-metadata.json`.** Rare but possible if two people edit metadata simultaneously. The file is small JSON, so resolve it by hand, keeping the shared `id` untouched.

**The notebook ran fine on Kaggle but saved outputs landed in git anyway.** You committed before the `nbstripout` filter was active. Either re-stage (`git add`), which triggers the filter, or run `nbstripout` on the notebook manually before `git add`.

**You accidentally edited on the Kaggle web UI.** Go to Kaggle → your kernel → "..." → Download notebook. Overwrite the local `.ipynb` with the downloaded file. Commit. Re-push. Don't panic; just restore git as the source of truth.

## What this workflow does not solve

- **Two people editing the same cell at the same time.** Normal git merge conflicts will still happen if both of us touch the same notebook cell simultaneously. Mitigation: work on different notebooks when possible, or pair-edit over a voice call. If this becomes frequent, we can add `jupytext` later, which pairs each `.ipynb` with a `.py` mirror that merges like regular Python.
- **Debugging a crashing Kaggle run.** The MCP/CLI pushes and watches, but fixing the crash is still back-and-forth between your local editor and the Kaggle logs. The workflow just removes the "which version is right" confusion from that loop.
- **Kaggle's GPU quota.** You still get about 30 free GPU hours per week. Plan accordingly.

## TL;DR

Edit locally, commit to git, `kaggle kernels push` to run, `kaggle kernels output` to retrieve. Never edit on the Kaggle web UI for shared kernels. Git is the source of truth. `nbstripout` keeps diffs clean.

If anything here doesn't make sense, ping Broulaye before improvising.
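
One of the rules above lends itself to a mechanical check: the `id` in `kernel-metadata.json` must never change. Here is a minimal sketch of a guard you could run before `kaggle kernels push`. `kernel_id` and `check_kernel_id` are hypothetical helper names, and the `sed` pattern assumes the `"id"` key and its value sit on one line, as in the example metadata; it is not a general JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: confirm kernel-metadata.json still points at the agreed slug.

# Print the value of the first "id" field found in the given metadata file.
kernel_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only if the metadata id equals the expected owner/slug.
check_kernel_id() {
  local file="$1" expected="$2" actual
  actual=$(kernel_id "$file")
  if [ "$actual" != "$expected" ]; then
    echo "id mismatch in $file: got '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

From inside a kernel folder, something like `check_kernel_id kernel-metadata.json ous-sow/sahel-kaggle-master-trainer && kaggle kernels push` would refuse to push if the slug had drifted, for example after a hand-resolved merge conflict.
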