<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>OptiTransfer Data</title>
<style>
* { box-sizing: border-box; margin: 0; padding: 0; }
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
background: #f8fafc;
color: #1e293b;
padding: 2rem 1rem;
}
.container { max-width: 860px; margin: 0 auto; }
.hero {
text-align: center;
padding: 2.5rem 1rem 2rem;
}
.hero h1 { font-size: 2rem; font-weight: 700; color: #0f172a; margin-bottom: 0.5rem; }
.hero p { font-size: 1.05rem; color: #475569; max-width: 640px; margin: 0 auto; line-height: 1.7; }
.badge {
display: inline-block;
background: #f1f5f9;
color: #334155;
border: 1px solid #e2e8f0;
border-radius: 999px;
padding: 0.25rem 0.85rem;
font-size: 0.8rem;
font-weight: 600;
margin: 0.75rem 0.25rem 0;
}
hr { border: none; border-top: 1px solid #e2e8f0; margin: 2rem 0; }
h2 { font-size: 1.2rem; font-weight: 700; color: #0f172a; margin-bottom: 1rem; letter-spacing: -0.01em; }
.section { margin-bottom: 2rem; }
.features {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(190px, 1fr));
gap: 1rem;
margin-top: 0.5rem;
}
.feature-card {
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 10px;
padding: 1.1rem;
}
.feature-card strong { display: block; font-size: 0.9rem; margin-bottom: 0.3rem; color: #0f172a; }
.feature-card p { font-size: 0.82rem; color: #64748b; line-height: 1.5; }
table { width: 100%; border-collapse: collapse; font-size: 0.9rem; }
th { background: #f1f5f9; text-align: left; padding: 0.6rem 0.75rem; font-weight: 600; font-size: 0.78rem; color: #475569; text-transform: uppercase; letter-spacing: 0.04em; }
td { padding: 0.65rem 0.75rem; border-top: 1px solid #f1f5f9; vertical-align: top; }
tr:hover td { background: #fafbfc; }
a { color: #2563eb; text-decoration: none; }
a:hover { text-decoration: underline; }
.tag {
display: inline-block;
background: #f1f5f9;
color: #475569;
border-radius: 4px;
padding: 0.15rem 0.5rem;
font-size: 0.73rem;
margin: 0.15rem 0.15rem 0.15rem 0;
white-space: nowrap;
}
.tag-group { margin-top: 0.5rem; line-height: 1.9; }
.quality-list { list-style: none; }
.quality-list li { padding: 0.35rem 0; font-size: 0.88rem; color: #374151; padding-left: 1.2rem; position: relative; }
.quality-list li::before { content: "\2713"; position: absolute; left: 0; color: #16a34a; font-weight: 700; }
.pricing-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 1rem;
margin-top: 0.5rem;
}
.price-card {
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 10px;
padding: 1.25rem;
}
.price-card.featured {
border-color: #2563eb;
box-shadow: 0 0 0 1px rgba(37,99,235,0.1);
}
.price-card h3 { font-size: 0.95rem; font-weight: 700; margin-bottom: 0.5rem; color: #0f172a; }
.price-card .price { font-size: 1.2rem; font-weight: 800; color: #2563eb; margin-bottom: 0.5rem; }
.price-card p { font-size: 0.82rem; color: #64748b; line-height: 1.5; }
.payment { display: flex; flex-wrap: wrap; gap: 0.5rem; margin-top: 0.75rem; }
.payment-item {
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 8px;
padding: 0.4rem 0.85rem;
font-size: 0.84rem;
color: #374151;
}
.contact-bar {
background: #0f172a;
color: #e2e8f0;
border-radius: 10px;
padding: 1.5rem;
text-align: center;
margin-top: 2rem;
}
.contact-bar strong { font-size: 1rem; }
.contact-bar a { color: #93c5fd; }
.contact-bar p { font-size: 0.88rem; margin-top: 0.35rem; }
.dataset-detail { margin-top: 0.75rem; }
.dataset-detail p { font-size: 0.84rem; color: #64748b; line-height: 1.5; margin-bottom: 0.4rem; }
</style>
</head>
<body>
<div class="container">
<div class="hero">
<h1>OptiTransfer Data</h1>
<p>Premium web corpora for LLM pre-training, fine-tuning, RAG, and multilingual NLP. Swiss-registered. EU AI Act compliant. Quality-scored, PII-redacted, SHA-256-verified.</p>
<div>
<span class="badge">Swiss-Registered</span>
<span class="badge">EU AI Act Compliant</span>
<span class="badge">SHA-256 Verified</span>
<span class="badge">PII Redacted</span>
</div>
</div>
<hr />
<div class="section">
<h2>Capabilities</h2>
<div class="features">
<div class="feature-card">
<strong>LLM Training</strong>
<p>Sovereign national web corpora at scale for pre-training and supervised fine-tuning</p>
</div>
<div class="feature-card">
<strong>RAG Pipelines</strong>
<p>Pre-chunked, embedding-ready corpora with quality scores per chunk</p>
</div>
<div class="feature-card">
<strong>Regulatory NLP</strong>
<p>Domain-classified, jurisdiction-specific government and institutional data</p>
</div>
<div class="feature-card">
<strong>Research</strong>
<p>Reproducible datasets with full metadata, provenance tracking, and QA reports</p>
</div>
</div>
</div>
<div class="section">
<h2>Available Datasets</h2>
<table>
<thead>
<tr>
<th>Dataset</th>
<th>Records</th>
<th>Formats</th>
<th>Access</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<strong><a href="https://huggingface.co/datasets/OptiTransferData/swiss-web-premium-ch" target="_blank" rel="noopener noreferrer">*.ch Swiss Web Premium (A+)</a></strong>
</td>
<td>110,491</td>
<td>Parquet, JSONL, Language Splits, RAG Chunks</td>
<td>
<a href="https://huggingface.co/datasets/OptiTransferData/swiss-web-premium-ch" target="_blank" rel="noopener noreferrer">Sample</a> |
<a href="https://huggingface.co/datasets/OptiTransferData/swiss-web-premium-ch-full" target="_blank" rel="noopener noreferrer">Full</a>
</td>
</tr>
</tbody>
</table>
<div class="dataset-detail">
<p>Flagship Swiss web corpus from the .ch ccTLD. 112.4M tokens across 78 fields. Multilingual coverage: German (61.2%), French (19.0%), English (10.5%), Italian (4.7%), and 25 additional languages. Nine-component quality model, full provenance chain, and independent QA report.</p>
<div class="tag-group">
<span class="tag">LLM Pre-Training</span>
<span class="tag">Supervised Fine-Tuning (SFT)</span>
<span class="tag">Retrieval-Augmented Generation</span>
<span class="tag">Multilingual NLP</span>
<span class="tag">German Language Models</span>
<span class="tag">French Language Models</span>
<span class="tag">Swiss Market AI</span>
<span class="tag">EU AI Act Compliance</span>
<span class="tag">Domain-Specific Training</span>
<span class="tag">Web Corpus Research</span>
<span class="tag">Text Classification</span>
<span class="tag">Summarisation</span>
<span class="tag">Question Answering</span>
<span class="tag">Translation</span>
</div>
</div>
<p style="font-size:0.82rem; color:#64748b; margin-top:1rem;">A free gated sample is available for each dataset. Request access to evaluate the data before purchasing.</p>
</div>
<div class="section">
<h2>Quality Standards</h2>
<ul class="quality-list">
<li>Independent QA audits with documented accuracy metrics</li>
<li>SHA-256 integrity verification on all production files</li>
<li>Quality scoring per record (0 to 100 scale, nine components)</li>
<li>Domain classification and language detection</li>
<li>EU AI Act compliance with full data provenance and licensing transparency</li>
<li>Content-level and URL-level deduplication</li>
<li>PII detection and redaction (email, phone, IBAN, AHV, credit card)</li>
<li>Croissant metadata for ML interoperability</li>
</ul>
</div>
<div class="section">
<h2>Licensing and Pricing</h2>
<div class="pricing-grid">
<div class="price-card">
<h3>Sample</h3>
<div class="price">Free</div>
<p>Gated access. Evaluate data quality, schema, and documentation before committing.</p>
</div>
<div class="price-card featured">
<h3>Full Dataset</h3>
<div class="price">Commercial</div>
<p>Complete production data with commercial licence. All formats included.</p>
</div>
<div class="price-card">
<h3>Enterprise</h3>
<div class="price">Custom</div>
<p>Dedicated support, SLA, bespoke corpora, volume pricing.</p>
</div>
</div>
<p style="margin-top:1rem; font-size:0.88rem; color:#374151;">Contact us for a quote: <a href="mailto:data@optitransfer.ch">data@optitransfer.ch</a></p>
<div class="payment">
<div class="payment-item">Bank Transfer (SEPA/SWIFT)</div>
<div class="payment-item">TWINT (Swiss)</div>
<div class="payment-item">Crypto (BTC / ETH / SOL)</div>
</div>
</div>
<div class="contact-bar">
<strong>OptiTransfer Data</strong>
<p>Swiss-registered | <a href="https://optitransfer.ch">optitransfer.ch</a> | <a href="mailto:data@optitransfer.ch">data@optitransfer.ch</a></p>
</div>
</div>
</body>
</html>