Minecraftify: Turning Real Images into Minecraft Worlds with FLUX.2-Klein

Community Article
Published June 15, 2026

Introduction

What would your photos look like if they were built entirely out of Minecraft blocks?

That question led to the creation of Minecraftify, a Hugging Face Space that transforms ordinary images into faithful Minecraft-style recreations while preserving the original scene's structure, layout, and composition.

Unlike simple style-transfer approaches, Minecraftify attempts to maintain the identity of the scene while converting textures, materials, objects, and geometry into something that feels like it belongs inside vanilla Minecraft.

The project is powered by FLUX.2-Klein-4B, used as an image-to-image model, together with a custom LoRA trained specifically for Minecraft scene conversion.


Try It Yourself

Space: Minecraftify

Upload an image or use your webcam and instantly see your world transformed into Minecraft.

Features include:

  • Image upload
  • Webcam capture
  • Live processing mode
  • Adjustable inference settings
  • Side-by-side comparison
  • Downloadable results

Try it here: https://huggingface.co/spaces/build-small-hackathon/Minecraftify

The Goal

Many image stylization models dramatically alter a scene, changing camera angles, objects, or compositions.

Minecraftify was designed with a different objective:

  • Preserve the original composition
  • Preserve camera perspective
  • Keep recognizable objects
  • Maintain scene structure
  • Convert surfaces and geometry into Minecraft-style blocks

The result should feel like the original image was rebuilt inside Minecraft rather than completely reimagined.


Example Transformation

Input:

  • Real-world photograph
  • Natural lighting
  • Real textures and materials

Output:

  • Minecraft blocks
  • Voxel-style geometry
  • Minecraft-inspired materials
  • Similar composition and scene layout

The ideal output remains immediately recognizable while clearly belonging to the Minecraft aesthetic.


Building the Dataset

For a fine-tuning project like this one, the training data is often the most important ingredient.

For Minecraftify, I created a paired image dataset consisting of approximately 400 image pairs.

Each sample contains:

  • Original image
  • Minecraft-style edited image
  • Caption describing the transformation

The Minecraft versions were generated using Qwen-Edit-25-12, which made it practical to assemble a paired dataset suitable for image-to-image training.

Dataset structure:

source_image
edited_image
prompt_used

Where:

  • source_image = original image
  • edited_image = Minecraft-style target
  • prompt_used = caption used during generation
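
As a rough illustration of that layout, the sketch below models one sample as a small Python record and checks a list of samples for empty fields. The three field names come from the dataset; the `PairedSample` class, the `validate` helper, and the example file names are hypothetical and only meant to show the shape of the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedSample:
    """One training pair; field names mirror the dataset columns."""
    source_image: str   # path to the original photo (hypothetical file name)
    edited_image: str   # path to the Minecraft-style target
    prompt_used: str    # caption used when the target was generated


def validate(samples: list[PairedSample]) -> list[int]:
    """Return the indices of samples that have an empty field."""
    bad = []
    for i, s in enumerate(samples):
        if not (s.source_image and s.edited_image and s.prompt_used.strip()):
            bad.append(i)
    return bad


samples = [
    PairedSample("photos/0001.jpg", "minecraft/0001.jpg",
                 "Convert this scene into Minecraft blocks"),
    PairedSample("photos/0002.jpg", "minecraft/0002.jpg", "   "),  # blank caption
]
print(validate(samples))  # → [1]
```

A check like this is cheap to run before training and catches pairs whose caption or target went missing during generation.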

Why FLUX.2-Klein?

A major design goal was keeping the project lightweight.

Instead of training or serving a very large model, Minecraftify uses:

FLUX.2-Klein-4B

This smaller FLUX model provides:

  • Fast inference
  • Lower VRAM requirements
  • Strong image editing capabilities
  • Excellent compatibility with LoRA fine-tuning

The final deployment combines:

FLUX.2-Klein-4B
        +
Minecraft LoRA
        =
Minecraftify

Keeping the base model small and putting the Minecraft-specific behavior in a LoRA lets the application stay efficient while still producing high-quality edits.
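
A minimal sketch of how the two pieces might be combined with Diffusers is shown below. It is not the Space's actual code. The repository IDs are the ones listed under Resources; everything else is an assumption: that the installed Diffusers release supports FLUX.2-Klein, that the LoRA repository is in a format `load_lora_weights` can read directly, and that bf16 on a CUDA device is the desired inference setup. The `load_minecraftify` helper is a hypothetical name.

```python
BASE_MODEL = "black-forest-labs/FLUX.2-klein-4B"
LORA_REPO = "AnimeOverlord/flux2-klein-4b-mc-v2"


def load_minecraftify(device: str = "cuda"):
    """Load the base model and attach the Minecraft LoRA (assumed workflow)."""
    # Imported lazily so this module can be imported without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    # DiffusionPipeline picks the concrete pipeline class from the repo's config.
    pipe = DiffusionPipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    # Assumption: the LoRA weights load without extra arguments such as weight_name.
    pipe.load_lora_weights(LORA_REPO)
    return pipe.to(device)
```

Because the adapter is loaded on top of an unmodified base model, swapping in a newer LoRA version would only require changing `LORA_REPO`.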


Training the Minecraft LoRA

The LoRA was trained using the Hugging Face Diffusers DreamBooth LoRA workflow, adapted for image-to-image training.

Key training settings:

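Parameter               Value
Batch Size              1
Gradient Accumulation   4
Learning Rate           1
Precision               bf16
Rank                    64
Training Steps          1200
Optimizer               Prodigy
Warmup Steps            200

The learning rate of 1 is not a typo: Prodigy estimates the step size adaptively, and a nominal learning rate of 1 is its recommended setting.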

Additional optimizations:

  • Latent caching
  • Gradient checkpointing
  • Aspect ratio buckets
  • 8-bit optimizer support

These settings allowed training on a relatively small dataset while maintaining scene fidelity.
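
To put those numbers in perspective, the short calculation below works out the effective batch size and roughly how many passes over the dataset the run makes. It assumes that "Training Steps" counts optimizer updates (as the Diffusers example scripts do) and uses the approximate dataset size of 400 pairs.

```python
batch_size = 1
grad_accum = 4
train_steps = 1200
dataset_size = 400  # approximate, per the dataset section

# Samples contributing to each optimizer update.
effective_batch = batch_size * grad_accum
# Assumption: each training step is one optimizer update.
samples_seen = effective_batch * train_steps
approx_epochs = samples_seen / dataset_size

print(effective_batch, samples_seen, approx_epochs)  # → 4 4800 12.0
```

Under those assumptions, each image pair is seen about 12 times over the course of training.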


Architecture

The processing pipeline is intentionally simple:

Input Image
      ↓
FLUX.2-Klein Image-to-Image
      ↓
Minecraft LoRA Adapter
      ↓
Minecraftified Output

For webcam mode:

Camera Frame
      ↓
Latest Frame Buffer
      ↓
Model Inference
      ↓
Minecraft Output

Only the most recent frame is processed to keep latency manageable.
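
One way to get this behavior is a single-slot buffer in which each new frame overwrites the previous one. The sketch below is an assumption about how such a buffer could look, not the Space's implementation; `LatestFrameBuffer` and its `dropped` counter are hypothetical names, and strings stand in for real camera frames.

```python
import threading
from typing import Any, Optional


class LatestFrameBuffer:
    """Single-slot buffer: a new frame overwrites any frame not yet consumed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self.dropped = 0  # frames overwritten before inference picked them up

    def put(self, frame: Any) -> None:
        """Called by the camera side for every captured frame."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self) -> Optional[Any]:
        """Return the newest frame and clear the slot, or None if it is empty."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


buf = LatestFrameBuffer()
for name in ("frame-1", "frame-2", "frame-3"):  # camera outpaces the model
    buf.put(name)

print(buf.take(), buf.dropped, buf.take())  # → frame-3 2 None
```

The inference loop calls `take()` whenever the model is free, so a slow model simply skips frames instead of building up a backlog.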


Running on Hugging Face Spaces

Minecraftify is deployed as a Gradio Space.

The Space includes:

  • Persistent model caching
  • Live webcam support
  • Adjustable generation controls
  • Side-by-side preview interface

To avoid repeatedly downloading models after restarts, persistent storage is used:

/data/models
/data/.huggingface

This significantly improves startup times and reduces bandwidth usage.
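
A minimal sketch of wiring this up is shown below. The two paths come from the list above; mapping `/data/.huggingface` to the `HF_HOME` environment variable and returning `/data/models` for use as a download directory are assumptions about how the Space uses them, and `configure_cache` is a hypothetical helper.

```python
import os
from pathlib import Path


def configure_cache(root: str = "/data") -> Path:
    """Point the Hugging Face cache at persistent storage; return the models directory."""
    root_path = Path(root)
    # HF_HOME is the environment variable huggingface_hub reads for its cache root.
    # Set it before importing diffusers or transformers so they pick it up.
    os.environ["HF_HOME"] = str(root_path / ".huggingface")
    return root_path / "models"


models_dir = configure_cache()
print(os.environ["HF_HOME"], models_dir)
```

The returned directory could then be passed as `cache_dir` when loading weights, so that files survive a restart of the Space.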


Recommended Settings

For the best balance between speed and quality:

Setting          Value
Steps            3
Guidance Scale   3.0
Seed             Fixed
Input            Well-lit scenes

These settings were chosen specifically for interactive usage within Hugging Face Spaces.
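
For readers calling the model from code, the recommended values could be captured in a small settings object like the one below. This is a sketch, not the Space's code: `InteractiveSettings` is a hypothetical name, the seed value 0 is arbitrary, and the keyword names follow the convention most Diffusers pipelines use (`num_inference_steps`, `guidance_scale`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractiveSettings:
    """Recommended defaults from the table above."""
    steps: int = 3
    guidance_scale: float = 3.0
    seed: int = 0  # any fixed value works; 0 is an arbitrary choice for this sketch

    def to_pipeline_kwargs(self) -> dict:
        """Translate to the keyword names Diffusers pipelines commonly accept."""
        if self.steps < 1:
            raise ValueError("steps must be at least 1")
        return {
            "num_inference_steps": self.steps,
            "guidance_scale": self.guidance_scale,
        }


settings = InteractiveSettings()
print(settings.to_pipeline_kwargs())
# → {'num_inference_steps': 3, 'guidance_scale': 3.0}

# A fixed seed would typically become torch.Generator().manual_seed(settings.seed),
# passed as generator=... at call time (omitted here to keep the sketch dependency-free).
```

Keeping the seed fixed makes results repeatable, which is useful when comparing the effect of changing only the step count or guidance scale.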


What Worked Well

A few lessons emerged during development:

Small Models Can Go a Long Way

The 4B-parameter FLUX.2-Klein model proved surprisingly capable when paired with a targeted LoRA.

Dataset Quality Matters More Than Quantity

Even with roughly 400 examples, careful pairing and consistent transformations produced useful results.

Scene Preservation Is Hard

One of the biggest challenges was encouraging the model to change visual style without changing the scene itself.

Prompt design and image-to-image conditioning played a major role in achieving this balance.


Future Improvements

Potential future directions include:

  • Better Minecraft character generation
  • Support for different Minecraft texture packs
  • Real-time video processing
  • Larger and more diverse training datasets
  • Multiple Minecraft style presets
  • Improved block-level consistency

Resources

Space

Minecraftify (https://huggingface.co/spaces/build-small-hackathon/Minecraftify)

Base Model

black-forest-labs/FLUX.2-klein-4B

LoRA

AnimeOverlord/flux2-klein-4b-mc-v2

Dataset

Custom paired Minecraft image dataset (~400 samples)

Training Framework

  • Hugging Face Diffusers
  • Accelerate
  • PEFT
  • PyTorch

Closing Thoughts

Minecraftify began as a hackathon-style experiment with a simple question: could a lightweight image editing model convincingly rebuild the world using Minecraft blocks?

Despite a short development window, the project shows that a specialized visual transformation task can be handled by a compact 4B-parameter model, LoRA fine-tuning, and a carefully curated image-to-image dataset.

The current version should be viewed as a strong first iteration rather than a finished product. Due to time constraints, many ideas remain unexplored, from the improvements listed under Future Improvements to more advanced scene preservation techniques.

What excites me most is the project's potential. The underlying approach has already shown promising results with limited training data and development time, suggesting there is significant room for improvement as the dataset, training process, and inference pipeline continue to evolve.

Minecraftify is ultimately an exploration of how far small, specialized models can be pushed for creative visual tasks. This release is just the beginning, and future iterations will continue to expand its capabilities and bring generated scenes even closer to the experience of stepping into a real Minecraft world.
