---
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- joshuachou/SkinCAP
- HemanthKumarK/SKINgpt
language:
- en
tags:
- biology
- skin
- skin disease
- cancer
- medical
---
# Model Card for PaliGemma Dermatology Model

## Model Details

### Model Description

This model is a fine-tuned version of PaliGemma-3B for dermatology image-and-text tasks. It is designed to assist in identifying skin conditions by combining image analysis with natural language processing.

- **Developed by:** Bruce_Wayne
- **Model type:** Vision-language model
- **Finetuned from model:** https://huggingface.co/google/paligemma-3b-pt-224
- **LoRA adapters used:** Yes
- **Intended use:** Medical image analysis, specifically for dermatology

Feedback on how the model performs is welcome: https://forms.gle/cBA6apSevTyiEbp46

Thank you!

## Uses

### Direct Use

The model can be used directly to analyze dermatology images and suggest potential skin conditions.

## Bias, Risks, and Limitations

- **Skin Tone Bias:** The training data may not adequately represent all skin tones, which could lead to biased results.
- **Geographic Bias:** Performance may vary with the prevalence of certain conditions in different geographic regions.

## How to Get Started with the Model

```python
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image

# Load the model and processor; use the GPU when one is available
model_id = "brucewayne0459/paligemma_derm"
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).to(device)
model.eval()

# Load a sample image and text input
input_text = "Identify the skin condition?"
input_image_path = "path/to/image.jpg"  # Replace with your actual image path
input_image = Image.open(input_image_path).convert("RGB")

# Process the input and move it to the same device as the model
inputs = processor(text=input_text, images=input_image, return_tensors="pt", padding="longest").to(device)

# Set the maximum length for generation
max_new_tokens = 50

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

# Decode the output
decoded_output = processor.decode(outputs[0], skip_special_tokens=True)
print("Model Output:", decoded_output)
```

## Training Details

### Training Data

The model was fine-tuned on dermatological images paired with disease names.

### Training Procedure

The model was fine-tuned with LoRA (Low-Rank Adaptation), which trains a small set of adapter weights instead of the full model. Mixed precision (bfloat16) was used to speed up training and reduce memory usage.

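To illustrate why LoRA makes training cheaper: for a single weight matrix of shape d x k, a rank-r adapter trains r * (d + k) parameters instead of d * k. The sketch below counts both. The layer size and rank are hypothetical values chosen for illustration, since this card does not state the LoRA configuration.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # updating the whole matrix
    lora = r * (d + k)  # low-rank factors B (d x r) and A (r x k)
    return full, lora


# Hypothetical values for illustration only (not taken from this model card).
full, lora = lora_param_counts(d=2048, k=2048, r=8)
print(f"full: {full}, LoRA: {lora}, fraction trained: {lora / full:.2%}")
```

With these assumed sizes the adapter trains well under 1% of the parameters in that matrix, which is where the memory savings come from.
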
#### Training Hyperparameters

- **Training regime:** Mixed precision (bfloat16)
- **Epochs:** 10
- **Learning rate:** 2e-5
- **Batch size:** 6
- **Gradient accumulation steps:** 4

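The batch size and gradient accumulation steps combine into an effective batch size per optimizer update. A minimal sketch of that arithmetic, assuming the listed batch size is per device and that training ran on the single GPU reported elsewhere in this card:

```python
batch_size = 6        # batch size from this card (assumed to be per device)
grad_accum_steps = 4  # gradient accumulation steps from this card
num_gpus = 1          # assumption: the single L4 GPU listed in this card

# Gradients from several small batches are summed before each optimizer step,
# so one update sees this many examples in total.
effective_batch_size = batch_size * grad_accum_steps * num_gpus
print(effective_batch_size)  # 24
```
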
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a held-out validation set of dermatological images and disease names, separate from the training data.

#### Metrics

- **Validation Loss:** Tracked throughout training to monitor model performance.
- **Accuracy:** The primary metric for assessing model predictions.

### Results

The model reached a final validation loss of approximately 0.2214, which suggests reasonable performance in predicting skin conditions on the dataset used.

## Environmental Impact

- **Hardware Type:** 1 x L4 GPU
- **Hours used:** ~22 hours
- **Cloud Provider:** Lightning AI
- **Compute Region:** USA
- **Carbon Emitted:** 0.9 kg CO2eq

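As a rough cross-check of these figures, emissions can be estimated as power draw x hours x grid carbon intensity. The sketch below back-computes the carbon intensity implied by the reported 0.9 kg. The 72 W power draw is an assumption (the L4's rated board power, at full utilization for the whole run), not a value from this card.

```python
hours_used = 22.0        # from this card (approximate)
carbon_emitted_kg = 0.9  # from this card
gpu_power_watts = 72.0   # ASSUMPTION: NVIDIA L4 rated board power, full utilization

# Energy consumed by the GPU alone over the run, in kilowatt-hours.
energy_kwh = gpu_power_watts / 1000.0 * hours_used

# Carbon intensity (kg CO2eq per kWh) that would reproduce the reported figure.
implied_intensity = carbon_emitted_kg / energy_kwh

print(f"energy: {energy_kwh:.3f} kWh")
print(f"implied carbon intensity: {implied_intensity:.3f} kg CO2eq/kWh")
```

The estimate ignores CPU, memory, and datacenter overhead, so it is only an order-of-magnitude check.
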
## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Vision-language model based on PaliGemma-3B
- **Objective:** To classify and diagnose dermatological conditions from images and text

### Compute Infrastructure

#### Hardware

- **GPU:** 1 x L4 GPU

## Model Card Authors

Bruce_Wayne