| # Insect Label Parser: Setup Instructions |
|
|
| This tool reads raw entomology collection label text and extracts structured |
| data (country, state, locality, date, collector, elevation, etc.) as JSON. |
| It runs entirely on your computer; no internet connection is required after |
| the one-time setup. |
|
|
| --- |
|
|
| ## Step 1: Which file do I need? |
|
|
| Copy one of these files from `output/gguf/` to your computer: |
|
|
| | File | Size | Use when | |
| |------|------|----------| |
| | `ento-label-parser-q4_k_m.gguf` | 3.2 GB | Your computer has **8 GB RAM** (most laptops) | |
| | `ento-label-parser-q5_k_m.gguf` | 3.4 GB | Your computer has **16 GB RAM or more** (slightly better quality) | |
|
|
| Not sure how much RAM you have? |
| - **Mac:** Apple menu → About This Mac → look for "Memory" |
| - **Windows:** Settings → System → About → look for "Installed RAM" |
|
|
| > **The Q4 file works well for this task.** Label parsing is a simple |
| > extraction job, so the quality difference between Q4 and Q5 is very small. |
|
|
| --- |
|
|
| ## Option A: LM Studio (recommended for most users; no terminal needed) |
|
|
| LM Studio is a free desktop app with a chat interface, similar to ChatGPT |
| but running fully on your own machine. |
|
|
| ### Install |
|
|
| 1. Go to **lmstudio.ai** and download the version for your operating system |
| (Mac, Windows, or Linux) |
| 2. Install and open it |
|
|
| ### Load the model |
|
|
| 1. In LM Studio, click **My Models** in the left sidebar |
| 2. Click **"Load model from file"** (or drag the `.gguf` file into the window) |
| 3. Navigate to the `.gguf` file you copied in Step 1 (for example, `ento-label-parser-q4_k_m.gguf`) |
| 4. Wait for the model to load (progress bar at the bottom) |
|
|
| ### Configure the system prompt |
|
|
| This step tells the model what it is supposed to do. |
|
|
| 1. Click the **Chat** icon in the left sidebar |
| 2. Find the **System Prompt** box (usually at the top of the right panel) |
| 3. Paste this text exactly: |
|
|
| ``` |
| Parse this insect collection label and return a JSON object with the extracted fields. Only include fields that are present in the label. |
| ``` |
|
|
| 4. Set **Temperature** to `0` in the model settings panel (this makes |
| output deterministic: the same label always gives the same result) |
|
|
| ### Parse a label |
|
|
| Paste the raw label text into the chat box and press Enter. The model will |
| return a JSON object. Example: |
|
|
| **Input:** |
| ``` |
| U.S.A., Texas: Austin, Travis Co., 15.iv.2021, J. Doe, sweeping |
| ``` |
|
|
| **Output:** |
| ```json |
| { |
| "country": "USA", |
| "state": "Texas", |
| "county": "Travis", |
| "verbatim_locality": "Austin", |
| "verbatim_date": "15.iv.2021", |
| "start_date_year": "2021", |
| "start_date_month": "4", |
| "start_date_day": "15", |
| "verbatim_collectors": "J. Doe", |
| "verbatim_method": "sweeping" |
| } |
| ``` |
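
As a sanity check on results like the one above, the numeric date fields can be compared against `verbatim_date`. The sketch below assumes the day.month.year layout with a Roman-numeral month seen in the example; real labels use many other date formats, which this check simply skips. The helper names `parse_roman_date` and `date_fields_match` are hypothetical and not part of the tool.

```python
import re

# Roman-numeral months as written on many insect labels (i = January ... xii = December).
ROMAN_MONTHS = {
    "i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5, "vi": 6,
    "vii": 7, "viii": 8, "ix": 9, "x": 10, "xi": 11, "xii": 12,
}

# Matches dates such as "15.iv.2021": day, Roman month, four-digit year.
DATE_PATTERN = re.compile(r"^(\d{1,2})\.([ivx]+)\.(\d{4})$", re.IGNORECASE)


def parse_roman_date(verbatim):
    """Return (year, month, day) for dates like '15.iv.2021', or None for other formats."""
    match = DATE_PATTERN.match(verbatim.strip())
    if match is None:
        return None
    day, roman, year = match.groups()
    month = ROMAN_MONTHS.get(roman.lower())
    if month is None:
        return None
    return int(year), month, int(day)


def date_fields_match(record):
    """Compare the verbatim date with the start_date_* fields of one output record.

    Returns True or False when the verbatim date is in the assumed format,
    and None when the format is not recognised (nothing to compare).
    """
    parsed = parse_roman_date(record.get("verbatim_date", ""))
    if parsed is None:
        return None
    keys = ("start_date_year", "start_date_month", "start_date_day")
    try:
        reported = tuple(int(record[key]) for key in keys)
    except (KeyError, ValueError):
        return False
    return parsed == reported
```

A result of `False` flags a record worth checking by hand; `None` only means the date was written in a format this sketch does not cover.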
|
|
| --- |
|
|
| ## Option B: Ollama (for users comfortable with a terminal) |
|
|
| Ollama is a lightweight tool that runs models from the command line and also |
| exposes a local API for scripting. |
|
|
| ### Requirement: Ollama version 0.20.7 or newer |
|
|
| Older versions do not support this model's architecture. Check your version: |
|
|
| ``` |
| ollama --version |
| ``` |
|
|
| If it shows a version older than 0.20.7, update from **ollama.com**. |
|
|
| ### Install |
|
|
| Go to **ollama.com**, download, and install for your operating system. |
|
|
| ### Register the model |
|
|
| Open a terminal, navigate to the project folder, and run: |
|
|
| ```bash |
| ollama create ento-label-parser -f Modelfile |
| ``` |
|
|
| You only need to do this once. |
|
|
| ### Parse a label |
|
|
| ```bash |
| ollama run ento-label-parser "U.S.A., Texas: Austin, 15.iv.2021, J. Doe" |
| ``` |
|
|
| Or pipe a text file: |
|
|
| ```bash |
| cat my_label.txt | ollama run ento-label-parser |
| ``` |
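
Because Ollama also serves a local HTTP API, labels can be parsed from a script instead of the terminal. The sketch below is one possible approach, not part of the tool: it assumes Ollama is running at its default address `http://localhost:11434`, that the model was registered as `ento-label-parser`, and that the Modelfile already supplies the system prompt. `build_request` and `parse_label` are hypothetical helper names. Only the Python standard library is used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address
MODEL_NAME = "ento-label-parser"  # the name given to `ollama create`


def build_request(label_text):
    """Build the JSON body for one non-streaming generation request."""
    return {
        "model": MODEL_NAME,
        "prompt": label_text,
        "stream": False,  # ask for a single JSON reply instead of a token stream
        "options": {"temperature": 0},  # keep output as repeatable as possible
    }


def parse_label(label_text, timeout=120):
    """Send one label to the local Ollama server and return the model's raw text reply."""
    body = json.dumps(build_request(label_text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # In non-streaming mode the generated text is in the "response" field.
    return reply["response"]
```

With Ollama running, `parse_label("U.S.A., Texas: Austin, 15.iv.2021, J. Doe")` should return the model's reply as a string, which can then be passed to `json.loads`. The generous timeout reflects how slow CPU-only machines can be.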
|
|
| --- |
|
|
| ## Troubleshooting |
|
|
| **The model is very slow.** |
| This is normal on a laptop without a dedicated GPU. The Q4 file typically |
| takes 5 to 30 seconds per label on a CPU. If you have an NVIDIA or AMD GPU |
| with 4+ GB of video memory, Ollama and LM Studio will use it automatically |
| and be much faster. |
|
|
| **LM Studio says "not enough memory."** |
| Try the Q4 file if you were using Q5. If Q4 also fails, your computer may |
| have less than 8 GB of RAM available; try closing other applications first. |
|
|
| **Ollama says "unknown model architecture: gemma4".** |
| Your Ollama version is too old. Update it from **ollama.com**. |
|
|
| **The output is not valid JSON.** |
| Occasionally the model will include a short thinking passage before the |
| JSON. Copy just the `{ ... }` portion of the output. If this happens |
| frequently, make sure Temperature is set to `0`. |
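
When scripting, copying the `{ ... }` portion by hand can be automated. The following is a minimal sketch, assuming the reply contains at most some free text around a single JSON object; `extract_json_object` is a hypothetical helper name, not something the tool provides.

```python
import json


def extract_json_object(output_text):
    """Return the first JSON object found in the model output, or None if there is none."""
    decoder = json.JSONDecoder()
    start = output_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            value, _end = decoder.raw_decode(output_text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # This "{" did not start valid JSON; try the next one.
        start = output_text.find("{", start + 1)
    return None
```

For example, passing the full model reply (thinking passage included) returns just the parsed fields as a Python dictionary, or `None` if no valid JSON object is present.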
|
|