| | --- |
| | title: Bottttt |
| | emoji: π |
| | colorFrom: gray |
| | colorTo: indigo |
| | sdk: gradio |
| | sdk_version: 5.36.2 |
| | app_file: app.py |
| | pinned: false |
| | license: apache-2.0 |
| | --- |
| | |
| | Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |
| | Great β letβs prepare your **RAG app** for **deployment on Hugging Face Spaces** with: |
| |
|
| | * β
Gradio as UI |
| | * β
LLaMA3-Instruct via Groq API |
| | * β
Sentence Transformers |
| | * β
ChromaDB with persistence |
| | * β
PDF upload + student Q\&A |
| |
|
| | --- |
| |
|
| | ## β
STEP 1: Project Structure |
| |
|
| | Create this directory structure for your Hugging Face Space: |
| |
|
| | ``` |
| | rag-student-assistant/ |
| | βββ app.py |
| | βββ requirements.txt |
| | βββ .env (optional, but donβt upload publicly) |
| | ``` |
| |
|
| | --- |
| |
|
| | ## β
STEP 2: `app.py` (Full Code) |
| |
|
| | ```python |
| | import os |
| | import gradio as gr |
| | import fitz # PyMuPDF |
| | from sentence_transformers import SentenceTransformer |
| | import chromadb |
| | from chromadb.utils import embedding_functions |
| | import openai |
| | |
| | # Load GROQ API Key |
| | openai.api_key = os.getenv("GROQ_API_KEY") |
| | openai.api_base = "https://api.groq.com/openai/v1" |
| | |
| | # Load embedding model |
| | embedder = SentenceTransformer("all-MiniLM-L6-v2") |
| | |
| | # Set up ChromaDB with persistence |
| | persist_path = "./chroma_db" |
| | db = chromadb.Client(chromadb.config.Settings(persist_directory=persist_path)) |
| | collection = db.get_or_create_collection("papers") |
| | |
| | # Extract text from uploaded PDF |
| | def extract_text_from_pdf(file): |
| | text = "" |
| | doc = fitz.open(stream=file.read(), filetype="pdf") |
| | for page in doc: |
| | text += page.get_text() |
| | return text |
| | |
| | # Chunk and store in vector DB |
| | def chunk_and_store(text): |
| | chunks = [text[i:i+500] for i in range(0, len(text), 500)] |
| | embeddings = embedder.encode(chunks).tolist() |
| | |
| | for i, chunk in enumerate(chunks): |
| | collection.add(documents=[chunk], ids=[f"id_{len(collection.get()['ids']) + i}"], embeddings=[embeddings[i]]) |
| | db.persist() |
| | |
| | # Retrieve relevant chunks and send to LLaMA3 via Groq |
| | def retrieve_and_ask(query): |
| | if len(collection.get()["documents"]) == 0: |
| | return "Please upload a paper first." |
| | |
| | query_embedding = embedder.encode([query]).tolist()[0] |
| | results = collection.query(query_embeddings=[query_embedding], n_results=3) |
| | context = "\n".join(results["documents"][0]) |
| | |
| | system_prompt = "You are an academic assistant helping students understand research papers." |
| | user_prompt = f"Based on the following context:\n{context}\n\nAnswer the question:\n{query}" |
| | |
| | try: |
| | response = openai.ChatCompletion.create( |
| | model="llama3-70b-8192", |
| | messages=[ |
| | {"role": "system", "content": system_prompt}, |
| | {"role": "user", "content": user_prompt} |
| | ] |
| | ) |
| | return response['choices'][0]['message']['content'] |
| | except Exception as e: |
| | return f"Error: {str(e)}" |
| | |
| | # Gradio UI |
| | def handle_upload(file): |
| | if file is None: |
| | return "Upload a valid PDF file." |
| | text = extract_text_from_pdf(file) |
| | chunk_and_store(text) |
| | return "β
Paper uploaded and processed." |
| | |
| | def handle_query(query): |
| | return retrieve_and_ask(query) |
| | |
| | with gr.Blocks() as demo: |
| | gr.Markdown("### π RAG Academic Assistant\nUpload a paper and ask questions.") |
| | |
| | with gr.Row(): |
| | file = gr.File(label="Upload PDF", type="binary") |
| | upload_btn = gr.Button("Process") |
| | upload_output = gr.Textbox() |
| | |
| | with gr.Row(): |
| | query = gr.Textbox(label="Ask a question") |
| | response = gr.Textbox(label="Answer") |
| | ask_btn = gr.Button("Ask") |
| | |
| | upload_btn.click(handle_upload, inputs=[file], outputs=[upload_output]) |
| | ask_btn.click(handle_query, inputs=[query], outputs=[response]) |
| | |
| | demo.launch() |
| | ``` |
| |
|
| | --- |
| |
|
| | ## β
STEP 3: `requirements.txt` |
| |
|
| | ```txt |
| | gradio |
| | chromadb |
| | sentence-transformers |
| | PyMuPDF |
| | openai |
| | ``` |
| |
|
| | > Hugging Face Spaces will auto-install these on build. |
| |
|
| | --- |
| |
|
| | ## β
STEP 4: GROQ API Key |
| |
|
| | ### π Option 1: Use Hugging Face "Secrets" |
| |
|
| | * Go to your Space β **Settings > Secrets** |
| | * Add a new secret: |
| |
|
| | * **Name:** `GROQ_API_KEY` |
| | * **Value:** `your-api-key-here` |
| |
|
| | No need to change code. It will use `os.getenv("GROQ_API_KEY")`. |
| |
|
| | --- |
| |
|
| | ## β
STEP 5: Deploy on Hugging Face |
| |
|
| | 1. Go to [Hugging Face Spaces](https://huggingface.co/spaces) |
| | 2. Click **Create New Space** |
| | 3. Choose: |
| |
|
| | * **Gradio** |
| | * **Public or Private** |
| | 4. Upload: |
| |
|
| | * `app.py` |
| | * `requirements.txt` |
| | 5. Add GROQ API key under **Settings > Secrets** |
| |
|
| | --- |
| |
|
| | ## β
Youβre Done! |
| |
|
| | After deployment: |
| |
|
| | * Students can upload PDF papers |
| | * Ask natural language questions |
| | * Get Groq/LLaMA3-generated answers from your vector database |
| |
|
| | --- |
| |
|
| | Would you like me to: |
| |
|
| | * π Zip the files for direct upload? |
| | * π§ͺ Add test examples? |
| | * π Add UI branding for universities or students? |
| |
|
| | Let me know what extras you want! |
| |
|