	---
	license: apache-2.0
	language:
	- en
	tags:
	- image-quality-assessment
	- document-quality
	- mplug-owl2
	- vision-language
	- document-analysis
	- color-quality
	- IQA
	pipeline_tag: image-to-text
	library_name: transformers
	---

	# DeQA-Doc-Color: Document Image Color Quality Assessment

	DeQA-Doc-Color is a vision-language model specialized in assessing the color quality of document images. It evaluates color fidelity, saturation, white balance, and color-related artifacts in scanned or photographed documents.

	## Model Family

	This model is part of the DeQA-Doc family, which includes three specialized models:

	| Model | Description | HuggingFace |
	|-------|-------------|-------------|
	| DeQA-Doc-Overall | Overall document quality | [mapo80/DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) |
	| DeQA-Doc-Color | Color quality assessment (this model) | [mapo80/DeQA-Doc-Color](https://huggingface.co/mapo80/DeQA-Doc-Color) |
	| DeQA-Doc-Sharpness | Sharpness/clarity assessment | [mapo80/DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) |

	## Quick Start

	```python
	import torch
	from transformers import AutoModelForCausalLM
	from PIL import Image

	# Load the model
	model = AutoModelForCausalLM.from_pretrained(
	    "mapo80/DeQA-Doc-Color",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Score an image
	image = Image.open("document.jpg").convert("RGB")
	score = model.score([image])
	print(f"Color Quality Score: {score.item():.2f} / 5.0")
	```

	## What Does Color Quality Measure?

	The color quality score evaluates:

	- Color Fidelity: How accurately colors are reproduced
	- White Balance: Neutral whites without color casts (such as yellow or blue tints)
	- Saturation: Appropriate color intensity (not washed out or oversaturated)
	- Color Artifacts: Absence of color bleeding, banding, or chromatic aberration
	- Uniformity: Consistent color reproduction across the document

	## Score Interpretation

	| Score Range | Quality Level | Typical Characteristics |
	|-------------|---------------|-------------------------|
	| 4.5 - 5.0 | Excellent | Perfect color reproduction |
	| 3.5 - 4.5 | Good | Minor color shifts, slight tinting |
	| 2.5 - 3.5 | Fair | Noticeable color cast, uneven colors |
	| 1.5 - 2.5 | Poor | Strong color distortion, washed out |
	| 1.0 - 1.5 | Bad | Severe color problems, unusable |

	## Batch Processing

	```python
	# Reuses `model` and `Image` from the Quick Start example
	images = [
	    Image.open("doc1.jpg").convert("RGB"),
	    Image.open("doc2.jpg").convert("RGB"),
	    Image.open("doc3.jpg").convert("RGB"),
	]

	# One call scores the whole list; iterate to read each score
	scores = model.score(images)
	for i, score in enumerate(scores):
	    print(f"Document {i+1} Color Score: {score.item():.2f} / 5.0")
	```
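
For larger collections, passing every image to a single `score` call may exhaust GPU memory. The following is a minimal sketch of fixed-size batching, not part of the model's API: `chunked` and `score_in_batches` are hypothetical helpers, the default batch size of 4 is an arbitrary starting point, and `score_fn` is assumed to behave like `model.score` (one score per input item).

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def score_in_batches(items: Sequence, score_fn: Callable, batch_size: int = 4) -> List[float]:
    """Score `items` batch by batch and return plain floats in input order."""
    scores: List[float] = []
    for batch in chunked(items, batch_size):
        # float() accepts Python numbers and, by assumption, the per-image
        # values produced when iterating over the output of model.score
        scores.extend(float(s) for s in score_fn(list(batch)))
    return scores


# Hypothetical usage with the objects from the examples in this README:
# all_scores = score_in_batches(images, model.score, batch_size=4)
```

A suitable batch size depends on available VRAM and image count, so it is worth tuning per deployment.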

	## Use Cases

	- Scanner Calibration: Detect when scanners need color calibration
	- Photo Document QA: Flag photos with poor lighting or white balance
	- Color-Critical Documents: Verify color accuracy for maps, charts, branded materials
	- Archive Preservation: Identify documents with color degradation
	- Print Quality Control: Verify color reproduction in printed documents
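
As an illustration of the scanner calibration use case, the sketch below tracks a rolling average of recent color scores for one scanner and flags a sustained drop. `ScannerColorMonitor` is a hypothetical helper; the window size and the 3.5 threshold (the lower bound of "Good" in the Score Interpretation table) are assumptions to adjust for a given workflow.

```python
from collections import deque
from statistics import mean


class ScannerColorMonitor:
    """Track recent color scores for one scanner and flag a sustained drop."""

    def __init__(self, window: int = 50, threshold: float = 3.5):
        # Both defaults are illustrative, not values recommended by the model authors
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def add(self, score: float) -> None:
        """Record the color score of the most recently scanned page."""
        self.scores.append(float(score))

    def needs_calibration(self) -> bool:
        """Return True when the rolling mean over a full window is below the threshold."""
        # Wait for a full window so a single bad page does not trigger an alert
        if len(self.scores) < self.scores.maxlen:
            return False
        return mean(self.scores) < self.threshold


# Hypothetical usage:
# monitor = ScannerColorMonitor()
# monitor.add(model.score([image]).item())
# if monitor.needs_calibration(): ...
```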

	## Example: Detect Color Issues

	```python
	import torch
	from transformers import AutoModelForCausalLM
	from PIL import Image

	model = AutoModelForCausalLM.from_pretrained(
	    "mapo80/DeQA-Doc-Color",
	    trust_remote_code=True,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	def diagnose_color_quality(image_path):
	    img = Image.open(image_path).convert("RGB")
	    score = model.score([img]).item()

	    if score >= 4.5:
	        diagnosis = "Excellent color quality"
	    elif score >= 3.5:
	        diagnosis = "Good - minor color issues"
	    elif score >= 2.5:
	        diagnosis = "Fair - consider color correction"
	    elif score >= 1.5:
	        diagnosis = "Poor - needs color correction or rescan"
	    else:
	        diagnosis = "Bad - severe color problems, rescan required"

	    return score, diagnosis

	score, diagnosis = diagnose_color_quality("scanned_document.jpg")
	print(f"Score: {score:.2f}/5.0 - {diagnosis}")
	```

	## Multi-Dimensional Quality Assessment

	Combine this model with the other DeQA-Doc models for a comprehensive assessment:

	```python
	import torch
	from transformers import AutoModelForCausalLM
	from PIL import Image

	# Load all three models
	models = {
	    "overall": AutoModelForCausalLM.from_pretrained(
	        "mapo80/DeQA-Doc-Overall", trust_remote_code=True,
	        torch_dtype=torch.float16, device_map="auto"
	    ),
	    "color": AutoModelForCausalLM.from_pretrained(
	        "mapo80/DeQA-Doc-Color", trust_remote_code=True,
	        torch_dtype=torch.float16, device_map="auto"
	    ),
	    "sharpness": AutoModelForCausalLM.from_pretrained(
	        "mapo80/DeQA-Doc-Sharpness", trust_remote_code=True,
	        torch_dtype=torch.float16, device_map="auto"
	    ),
	}

	def full_quality_report(image_path):
	    img = Image.open(image_path).convert("RGB")

	    scores = {}
	    for name, model in models.items():
	        scores[name] = model.score([img]).item()

	    return scores

	report = full_quality_report("document.jpg")
	print(f"Overall: {report['overall']:.2f}/5.0")
	print(f"Color: {report['color']:.2f}/5.0")
	print(f"Sharpness: {report['sharpness']:.2f}/5.0")
	```
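
The example above keeps all three models loaded at once. At roughly 16 GB each in float16 (per the Technical Details table), that is about 48 GB of weights, which may not fit on a single GPU. A possible alternative, sketched below under stated assumptions, loads one model at a time and releases it before loading the next. `score_sequentially` and its `load_fn` parameter are hypothetical: `load_fn` is assumed to wrap the `from_pretrained` call shown above and return an object with a `score` method.

```python
import gc
from typing import Callable, Dict, List, Sequence


def score_sequentially(
    images: Sequence,
    model_ids: Dict[str, str],
    load_fn: Callable,
) -> Dict[str, List[float]]:
    """Load each model in turn, score all images, then drop the model."""
    report: Dict[str, List[float]] = {}
    for name, model_id in model_ids.items():
        model = load_fn(model_id)
        try:
            report[name] = [float(s) for s in model.score(list(images))]
        finally:
            # Drop the reference and collect garbage before the next load.
            # With PyTorch on CUDA, calling torch.cuda.empty_cache() here
            # may additionally return cached GPU memory.
            del model
            gc.collect()
    return report


# Hypothetical usage:
# ids = {"overall": "mapo80/DeQA-Doc-Overall",
#        "color": "mapo80/DeQA-Doc-Color",
#        "sharpness": "mapo80/DeQA-Doc-Sharpness"}
# report = score_sequentially([img], ids, load_fn=my_loader)
```

This trades memory for load time, so it suits offline batch jobs better than interactive use.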

	## Model Architecture

	- Base Model: mPLUG-Owl2 (LLaMA2-7B + ViT-L Vision Encoder)
	- Vision Encoder: CLIP ViT-L/14 (1024 visual tokens via Visual Abstractor)
	- Language Model: LLaMA2-7B
	- Training: Full fine-tuning on document color quality datasets
	- Input Resolution: Images are resized to 448x448 (with aspect ratio preservation)

	## Technical Details

	| Property | Value |
	|----------|-------|
	| Model Size | ~16 GB (float16) |
	| Parameters | ~7.2B |
	| Input | RGB images (any resolution) |
	| Output | Color quality score (1.0 - 5.0) |
	| Inference | ~2-3 seconds per image on A100 |
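
For capacity planning, the per-image latency above can be turned into a rough wall-clock estimate. The sketch below assumes strictly sequential scoring on one A100 with no batching gains or I/O overhead; `estimate_hours` is a hypothetical helper. For 10,000 images it gives roughly 5.6 to 8.3 hours.

```python
from typing import Tuple


def estimate_hours(
    n_images: int,
    seconds_per_image: Tuple[float, float] = (2.0, 3.0),
) -> Tuple[float, float]:
    """Return (low, high) estimated hours to score `n_images` one after another."""
    low, high = seconds_per_image
    return (n_images * low / 3600, n_images * high / 3600)


low_h, high_h = estimate_hours(10_000)
print(f"Estimated time: {low_h:.1f} to {high_h:.1f} hours")
```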

	## Hardware Requirements

	| Setup | VRAM Required | Recommended |
	|-------|---------------|-------------|
	| Full precision (fp32) | ~32 GB | A100, H100 |
	| Half precision (fp16) | ~16 GB | A100, A40, RTX 4090 |
	| With CPU offload | ~8 GB GPU + RAM | RTX 3090, RTX 4080 |
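
One way to approximate the CPU offload row is to cap GPU usage with the `max_memory` argument that `from_pretrained` accepts alongside `device_map="auto"`. The sketch below only builds the keyword arguments. `offload_kwargs` is a hypothetical helper, the 32 GiB CPU budget is an assumption, and whether this model's remote code works well with offloading has not been verified here.

```python
def offload_kwargs(gpu_gib: int = 8, cpu_gib: int = 32, gpu_index: int = 0) -> dict:
    """Build from_pretrained keyword arguments that cap GPU memory and spill the rest to CPU RAM."""
    return {
        "device_map": "auto",
        # Keys are a GPU index or "cpu"; values are size strings such as "8GiB"
        "max_memory": {gpu_index: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"},
    }


# Hypothetical usage:
# model = AutoModelForCausalLM.from_pretrained(
#     "mapo80/DeQA-Doc-Color",
#     trust_remote_code=True,
#     torch_dtype=torch.float16,
#     **offload_kwargs(gpu_gib=8),
# )
```

Layers placed on the CPU run considerably slower, so expect inference times above the A100 figure quoted in Technical Details.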

	## Installation

	```bash
	pip install torch transformers accelerate pillow sentencepiece protobuf
	```

	Note: Use `transformers>=4.36.0` for best compatibility.
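
To confirm the installed version at runtime, a check along the following lines may help. It is a standard-library sketch with hypothetical helper names; it compares only the numeric release segments, so a pre-release such as `4.36.0.dev0` is treated the same as `4.36.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Return the leading numeric part of a version, e.g. '4.36.0.dev0' -> (4, 36, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = "4.36.0") -> bool:
    """Compare numeric release segments, padding the shorter one with zeros."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_transformers(minimum: str = "4.36.0") -> bool:
    """Return True if transformers is installed and meets the minimum version."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```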

	## Limitations

	- Optimized for document images (may not generalize to natural photos)
	- Color assessment is relative to training data distribution
	- Black & white documents may receive lower scores (use the Overall model instead)
	- Requires GPU with sufficient VRAM for efficient inference

	## Credits & Attribution

	This model is based on the DeQA-Doc project by Junjie Gao et al., which won the Championship in the VQualA 2025 DIQA (Document Image Quality Assessment) Challenge.

	Original Repository: [https://github.com/Junjie-Gao19/DeQA-Doc](https://github.com/Junjie-Gao19/DeQA-Doc)

	All credit for the research, training methodology, and model architecture goes to the original authors.

	## Citation

	If you use this model in your research, please cite the original paper:

	```bibtex
	@inproceedings{deqadoc,
	  title={{DeQA-Doc}: Adapting {DeQA-Score} to Document Image Quality Assessment},
	  author={Gao, Junjie and Liu, Runze and Peng, Yingzhe and Yang, Shujian and Zhang, Jin and Yang, Kai and You, Zhiyuan},
	  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop},
	  year={2025},
	}
	```

	ArXiv: [https://arxiv.org/abs/2507.12796](https://arxiv.org/abs/2507.12796)

	## License

	Apache 2.0

	## Related Models

	- [DeQA-Doc-Overall](https://huggingface.co/mapo80/DeQA-Doc-Overall) - Overall quality assessment
	- [DeQA-Doc-Sharpness](https://huggingface.co/mapo80/DeQA-Doc-Sharpness) - Sharpness assessment