Shen-Pandi committed
Commit 2676d41 Β· verified Β· 1 Parent(s): cca5b74

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +43 -62
README.md CHANGED
@@ -1,8 +1,9 @@
  ---
  language:
  - en
- license: llama3.1
- base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  tags:
  - data-management
  - data-migration
@@ -39,11 +40,11 @@ model-index:

  # πŸš€ Agentic Data 1

- ### The First Open-Source LLM Purpose-Built for Data Operations

  **SQL Migration β€’ Schema Analysis β€’ Data Quality β€’ ETL Design β€’ Performance Tuning**

- [![License](https://img.shields.io/badge/License-Llama_3.1-blue.svg)](https://llama.meta.com/llama3/license/)
  [![Model Size](https://img.shields.io/badge/Parameters-8B-green.svg)]()
  [![Training](https://img.shields.io/badge/Training-SFT_+_GRPO-orange.svg)]()
  [![HuggingFace](https://img.shields.io/badge/πŸ€—-DataManagement--AI-yellow.svg)](https://huggingface.co/DataManagement-AI)
@@ -56,7 +57,7 @@ model-index:

  ## 🎯 What is Agentic Data 1?

- Agentic Data 1 is the **first open-source language model specifically designed for data management and migration tasks**. While general-purpose LLMs like GPT-4 or Claude treat data operations as just another coding task, Agentic Data 1 understands the unique challenges of enterprise data ecosystems β€” from legacy Oracle databases to modern cloud data warehouses.

  Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage training pipeline (Supervised Fine-Tuning + GRPO Reinforcement Learning), it delivers **specialist-grade performance** at a fraction of the cost of frontier models.

@@ -69,7 +70,7 @@ Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage
  | Data quality rules | Surface-level checks | **Comprehensive quality framework** (duplicates, PII, referential integrity) |
  | ETL pipeline design | Abstract descriptions | **Practical, implementable pipelines** with error handling and rollback |
  | Query performance tuning | Basic index suggestions | **Multi-strategy optimization** (partitioning, materialized views, query rewriting) |
- | Cost to operate | $3-30 per million tokens | **Near-zero** (self-hosted inference) |

  ---
 
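The "comprehensive quality framework" row in the table above refers to checks such as duplicate detection and referential integrity. A minimal stdlib-only sketch of those two checks (the tables and helper names here are hypothetical illustrations, not part of the model card):

```python
# Minimal sketch of two data-quality checks the comparison table mentions:
# duplicate primary keys and referential-integrity (orphaned foreign key) violations.
# The example tables below are made up for illustration.
from collections import Counter

employees = [
    {"id": 1, "dept_id": 10},
    {"id": 2, "dept_id": 20},
    {"id": 2, "dept_id": 30},  # duplicate primary key
    {"id": 3, "dept_id": 99},  # foreign key with no matching department
]
departments = [{"id": 10}, {"id": 20}, {"id": 30}]

def duplicate_keys(rows, key):
    # Key values that appear more than once violate uniqueness.
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def orphaned_refs(rows, fk, parent_rows, pk):
    # Foreign-key values with no matching parent primary key.
    parents = {row[pk] for row in parent_rows}
    return sorted({row[fk] for row in rows} - parents)

print(duplicate_keys(employees, "id"))                         # [2]
print(orphaned_refs(employees, "dept_id", departments, "id"))  # [99]
```

The same two set-based checks translate directly to SQL (`GROUP BY ... HAVING COUNT(*) > 1` and an anti-join) on real tables.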
@@ -258,81 +259,61 @@ Design event-driven data pipelines with:

  ---

- ## ⚑ Quick Start

- ### Basic Usage

- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained(
-     "DataManagement-AI/Agentic-Data-1",
-     device_map="auto",
-     torch_dtype="auto",
- )
- tokenizer = AutoTokenizer.from_pretrained("DataManagement-AI/Agentic-Data-1")
-
- prompt = "Convert this Oracle SQL to PostgreSQL: SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
-
- messages = [{"role": "user", "content": prompt}]
- input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7)
- print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
- ```
-
- ### 4-Bit Quantized (Recommended for Production)

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
- import torch

- bnb_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_quant_type="nf4",
-     bnb_4bit_compute_dtype=torch.bfloat16,
  )

- model = AutoModelForCausalLM.from_pretrained(
-     "DataManagement-AI/Agentic-Data-1",
-     quantization_config=bnb_config,
-     device_map="auto",
  )
- tokenizer = AutoTokenizer.from_pretrained("DataManagement-AI/Agentic-Data-1")
  ```

- ### With vLLM (High-Throughput API Server)

- ```bash
- pip install vllm
- vllm serve DataManagement-AI/Agentic-Data-1 --dtype auto --max-model-len 4096
- ```

- ```python
- from openai import OpenAI

- client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
- response = client.chat.completions.create(
-     model="DataManagement-AI/Agentic-Data-1",
-     messages=[{"role": "user", "content": "Convert Oracle NVL to PostgreSQL equivalent"}],
- )
- ```

  ---

  ## πŸ’° Cost Comparison

- Running your own Agentic Data 1 vs using commercial LLM APIs:

- | Model | Input $/M tokens | Output $/M tokens | Monthly Cost (100 active users) |
  |---|---|---|---|
- | GPT-4 Turbo | $10.00 | $30.00 | **$11,500** |
- | Claude Sonnet 3.5 | $3.00 | $15.00 | **$1,015** |
- | Claude Haiku | $0.25 | $1.25 | **$440** |
- | **Agentic Data 1** (self-hosted) | **~$0.003** | **~$0.003** | **$330** (GPU only) |

- > **99.7% cost reduction** vs GPT-4 Turbo. **67% reduction** vs Claude Haiku. With better domain performance.

  ---
 
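The sample prompt in the removed Quick Start asks for an Oracle-to-PostgreSQL rewrite. The expected answer can be sanity-checked without loading the model; a hedged sketch handling only the two constructs that appear in that prompt (`NVL` maps to `COALESCE`, a sole `ROWNUM <= n` predicate maps to `LIMIT n`):

```python
import re

def oracle_to_postgres(sql: str) -> str:
    # NVL(a, b) and COALESCE(a, b) are equivalent for two arguments.
    sql = re.sub(r"\bNVL\s*\(", "COALESCE(", sql, flags=re.IGNORECASE)
    # "WHERE ROWNUM <= n" caps the row count; in PostgreSQL that is "LIMIT n".
    # Only safe when ROWNUM is the sole predicate, as in the sample prompt.
    sql = re.sub(r"\s+WHERE\s+ROWNUM\s*<=\s*(\d+)\s*;?", r" LIMIT \1;", sql,
                 flags=re.IGNORECASE)
    return sql

print(oracle_to_postgres(
    "SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
))
# SELECT COALESCE(salary, 0) FROM employees LIMIT 10;
```

A real migration needs far more than regex (ROWNUM in combined predicates, ordering semantics), which is the gap the model itself is pitched at; this sketch only fixes what a correct answer to the sample prompt should contain.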
@@ -366,8 +347,8 @@ Agentic Data 1 powers the AI backbone of the [DataManagement.AI](https://dataman
  | **Base Model** | DeepSeek-R1-Distill-Llama-8B |
  | **Training Method** | SFT + GRPO (500 steps, NVIDIA H100) |
  | **Precision** | BFloat16 |
- | **License** | Llama 3.1 Community License |
- | **Model Size** | ~16 GB (FP16) / ~4 GB (4-bit quantized) |

  ---
 
  ---
  language:
  - en
+ license: other
+ license_name: datamanagement-ai-commercial
+ license_link: https://www.datamanagement.ai/contact-us
  tags:
  - data-management
  - data-migration
 

  # πŸš€ Agentic Data 1

+ ### The First Specialized Language Model Purpose-Built for Data Operations

  **SQL Migration β€’ Schema Analysis β€’ Data Quality β€’ ETL Design β€’ Performance Tuning**

+ [![License](https://img.shields.io/badge/License-Commercial-blue.svg)](https://www.datamanagement.ai/contact-us)
  [![Model Size](https://img.shields.io/badge/Parameters-8B-green.svg)]()
  [![Training](https://img.shields.io/badge/Training-SFT_+_GRPO-orange.svg)]()
  [![HuggingFace](https://img.shields.io/badge/πŸ€—-DataManagement--AI-yellow.svg)](https://huggingface.co/DataManagement-AI)
 

  ## 🎯 What is Agentic Data 1?

+ Agentic Data 1 is the **first specialized language model designed exclusively for data management and migration tasks**. While general-purpose LLMs like GPT-4 or Claude treat data operations as just another coding task, Agentic Data 1 understands the unique challenges of enterprise data ecosystems β€” from legacy Oracle databases to modern cloud data warehouses.

  Built on DeepSeek-R1-Distill-Llama-8B and enhanced through a rigorous two-stage training pipeline (Supervised Fine-Tuning + GRPO Reinforcement Learning), it delivers **specialist-grade performance** at a fraction of the cost of frontier models.

 
  | Data quality rules | Surface-level checks | **Comprehensive quality framework** (duplicates, PII, referential integrity) |
  | ETL pipeline design | Abstract descriptions | **Practical, implementable pipelines** with error handling and rollback |
  | Query performance tuning | Basic index suggestions | **Multi-strategy optimization** (partitioning, materialized views, query rewriting) |
+ | Cost to operate | $3-30 per million tokens | **Up to 90% lower** via DataManagement.AI API |

  ---
 

  ---

+ ## ⚑ Get Access

+ Agentic Data 1 is available through the **DataManagement.AI platform** and as a **dedicated API** for enterprise teams.

+ ### API Access

  ```python
+ from openai import OpenAI

+ # Use the Agentic Data 1 API (OpenAI-compatible)
+ client = OpenAI(
+     base_url="https://api.datamanagement.ai/v1",
+     api_key="your-api-key",
  )

+ response = client.chat.completions.create(
+     model="agentic-data-1",
+     messages=[{
+         "role": "user",
+         "content": "Convert this Oracle SQL to PostgreSQL: SELECT NVL(salary, 0) FROM employees WHERE ROWNUM <= 10;"
+     }],
  )
+ print(response.choices[0].message.content)
  ```

+ ### Deployment Options

+ | Option | Description | Best For |
+ |---|---|---|
+ | **Platform** | Use within DataManagement.AI workflows | Teams using our full platform |
+ | **API** | OpenAI-compatible REST API | Developers integrating into existing apps |
+ | **Dedicated** | Private instance on your infrastructure | Enterprise with data residency requirements |

+ <div align="center">

+ ### πŸ“¬ Ready to Get Started?
+
+ [**Request API Access**](https://www.datamanagement.ai/contact-us) β€’ [**Start Free Trial**](https://dmaife.datamanagement.ai/signup) β€’ [**Schedule a Demo**](https://www.datamanagement.ai/contact-us)
+
+ </div>

  ---

  ## πŸ’° Cost Comparison

+ Agentic Data 1 delivers **specialist-grade performance at a fraction of the cost** of general-purpose frontier models:

+ | Model | Input $/M tokens | Output $/M tokens | Data Domain Accuracy |
  |---|---|---|---|
+ | GPT-4 Turbo | $10.00 | $30.00 | General purpose |
+ | Claude Sonnet 3.5 | $3.00 | $15.00 | General purpose |
+ | Claude Haiku | $0.25 | $1.25 | General purpose |
+ | **Agentic Data 1** | **$0.50** | **$2.50** | **Domain-specialized** |

+ > **83% cheaper than Claude Sonnet** with **better performance on data tasks**. Purpose-built beats general-purpose.

  ---
 
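The "83% cheaper than Claude Sonnet" figure in the new Cost Comparison follows directly from the table's per-million-token prices; a quick check of the arithmetic:

```python
# Per-million-token prices taken from the new Cost Comparison table.
sonnet_in, sonnet_out = 3.00, 15.00    # Claude Sonnet 3.5
agentic_in, agentic_out = 0.50, 2.50   # Agentic Data 1

# The saving is identical on both sides: 1 - 0.50/3.00 = 1 - 2.50/15.00.
input_saving = 1 - agentic_in / sonnet_in
output_saving = 1 - agentic_out / sonnet_out

print(f"{input_saving:.0%}")   # 83%
print(f"{output_saving:.0%}")  # 83%
```

So the rounded 83% holds for both input and output pricing regardless of the input/output token mix.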
  | **Base Model** | DeepSeek-R1-Distill-Llama-8B |
  | **Training Method** | SFT + GRPO (500 steps, NVIDIA H100) |
  | **Precision** | BFloat16 |
+ | **License** | DataManagement-AI Commercial License |
+ | **Access** | API / Platform / Dedicated Deployment |

  ---