LQ-FSE-base: Korean Financial Sentence Extractor

๊ธˆ์œต ๋ฆฌํฌํŠธ, ๊ธˆ์œต ๊ด€๋ จ ๋‰ด์Šค์—์„œ ๋Œ€ํ‘œ๋ฌธ์žฅ์„ ์ถ”์ถœํ•˜๊ณ  ์—ญํ• (outlook, event, financial, risk)์„ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.

Model Description

  • Base Model: klue/roberta-base
  • Architecture: Sentence Encoder (RoBERTa) + Inter-sentence Transformer (2 layers) + Dual Classifiers
  • Task: Extractive Summarization + Role Classification (Multi-task)
  • Language: Korean
  • Domain: Financial reports (securities research reports), financial news
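
The architecture above (sentence encoder, 2-layer inter-sentence Transformer, dual classifiers) can be sketched as follows. This is a rough illustration, not the actual LQ-FSE-base implementation: the hidden size 768 and 8 attention heads are assumptions based on klue/roberta-base, and the head shapes are guesses.

```python
import torch
import torch.nn as nn

class InterSentenceScorer(nn.Module):
    """Hedged sketch of the extract-and-classify head: a small
    Transformer contextualizes per-sentence embeddings across the
    document, then two linear heads produce an extraction score
    and role logits per sentence."""
    def __init__(self, hidden=768, num_roles=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.inter = nn.TransformerEncoder(layer, num_layers=2)
        self.extract_head = nn.Linear(hidden, 1)   # representative-sentence score
        self.role_head = nn.Linear(hidden, num_roles)  # outlook/event/financial/risk

    def forward(self, sent_embs, doc_mask):
        # sent_embs: (batch, num_sent, hidden); doc_mask: (batch, num_sent), 1 = real sentence
        ctx = self.inter(sent_embs, src_key_padding_mask=(doc_mask == 0))
        scores = torch.sigmoid(self.extract_head(ctx)).squeeze(-1)  # (batch, num_sent)
        role_logits = self.role_head(ctx)                           # (batch, num_sent, num_roles)
        return scores, role_logits

head = InterSentenceScorer()
embs = torch.randn(1, 30, 768)          # 30 sentence embeddings from the sentence encoder
mask = torch.zeros(1, 30); mask[0, :3] = 1  # only the first 3 sentences are real
scores, roles = head(embs, mask)
```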

Input Constraints

| Parameter | Value | Description |
|---|---|---|
| Max sentence length | 128 tokens | Maximum tokens per sentence (truncated if exceeded) |
| Max sentences per document | 30 | Maximum sentences per document (only the first 30 are kept) |
| Input format | Plain text | Sentences are split automatically on terminal punctuation (.!?) |
  • ์ž…๋ ฅ: ํ•œ๊ตญ์–ด ๊ธˆ์œต ํ…์ŠคํŠธ (์ฆ๊ถŒ ๋ฆฌํฌํŠธ, ๊ธˆ์œต ๋‰ด์Šค ๋“ฑ)
  • ์ถœ๋ ฅ: ๊ฐ ๋ฌธ์žฅ๋ณ„ ๋Œ€ํ‘œ๋ฌธ์žฅ ์ ์ˆ˜ (0~1) + ์—ญํ•  ๋ถ„๋ฅ˜ (outlook/event/financial/risk)

Performance

| Metric | Score |
|---|---|
| Extraction F1 | 0.705 |
| Role Accuracy | 0.851 |

Role Labels

| Label | Description |
|---|---|
| outlook | Forecast/outlook sentences |
| event | Event/incident sentences |
| financial | Financial/earnings sentences |
| risk | Risk-factor sentences |

Usage

import re
import torch
from transformers import AutoConfig, AutoModel, AutoTokenizer

repo_id = "LangQuant/LQ-FSE-base"

# ๋ชจ๋ธ ๋กœ๋“œ
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model.eval()

# ์ž…๋ ฅ ํ…์ŠคํŠธ
text = (
    "์‚ผ์„ฑ์ „์ž์˜ 2024๋…„ 4๋ถ„๊ธฐ ์‹ค์ ์ด ์‹œ์žฅ ์˜ˆ์ƒ์„ ์ƒํšŒํ–ˆ๋‹ค. "
    "๋ฉ”๋ชจ๋ฆฌ ๋ฐ˜๋„์ฒด ๊ฐ€๊ฒฉ ์ƒ์Šน์œผ๋กœ ์˜์—…์ด์ต์ด ์ „๋ถ„๊ธฐ ๋Œ€๋น„ 30% ์ฆ๊ฐ€ํ–ˆ๋‹ค. "
    "HBM3E ์–‘์‚ฐ์ด ๋ณธ๊ฒฉํ™”๋˜๋ฉด์„œ AI ๋ฐ˜๋„์ฒด ์‹œ์žฅ ์ ์œ ์œจ์ด ํ™•๋Œ€๋  ์ „๋ง์ด๋‹ค."
)

# ๋ฌธ์žฅ ๋ถ„๋ฆฌ ๋ฐ ํ† ํฐํ™”
sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]
max_len, max_sent = config.max_length, config.max_sentences

# Truncate to the document limit, then pad with empty strings up to max_sent
padded = sentences[:max_sent]
num_real = len(padded)
padded += [""] * (max_sent - num_real)

ids_list, mask_list = [], []
for s in padded:
    if s:
        enc = tokenizer(s, max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
    else:
        enc = {"input_ids": torch.zeros(1, max_len, dtype=torch.long),
               "attention_mask": torch.zeros(1, max_len, dtype=torch.long)}
    ids_list.append(enc["input_ids"])
    mask_list.append(enc["attention_mask"])

# Stack per-sentence tensors into (1, max_sent, max_len) and build the document mask
input_ids = torch.cat(ids_list).unsqueeze(0)
attention_mask = torch.cat(mask_list).unsqueeze(0)
doc_mask = torch.zeros(1, max_sent)
doc_mask[0, :num_real] = 1  # 1 = real sentence, 0 = padding

# ์ถ”๋ก 
with torch.no_grad():
    scores, role_logits = model(input_ids, attention_mask, doc_mask)

role_labels = config.role_labels
for i, sent in enumerate(sentences):
    score = scores[0, i].item()
    role = role_labels[role_logits[0, i].argmax().item()]
    marker = "*" if score >= 0.5 else " "
    print(f"  {marker} [{score:.4f}] [{role:10s}] {sent}")

Input Example

์‚ผ์„ฑ์ „์ž์˜ 2024๋…„ 4๋ถ„๊ธฐ ์‹ค์ ์ด ์‹œ์žฅ ์˜ˆ์ƒ์„ ์ƒํšŒํ–ˆ๋‹ค. ๋ฉ”๋ชจ๋ฆฌ ๋ฐ˜๋„์ฒด ๊ฐ€๊ฒฉ ์ƒ์Šน์œผ๋กœ ์˜์—…์ด์ต์ด ์ „๋ถ„๊ธฐ ๋Œ€๋น„ 30% ์ฆ๊ฐ€ํ–ˆ๋‹ค. HBM3E ์–‘์‚ฐ์ด ๋ณธ๊ฒฉํ™”๋˜๋ฉด์„œ AI ๋ฐ˜๋„์ฒด ์‹œ์žฅ ์ ์œ ์œจ์ด ํ™•๋Œ€๋  ์ „๋ง์ด๋‹ค.

Output Example

  * [0.8732] [financial ] ์‚ผ์„ฑ์ „์ž์˜ 2024๋…„ 4๋ถ„๊ธฐ ์‹ค์ ์ด ์‹œ์žฅ ์˜ˆ์ƒ์„ ์ƒํšŒํ–ˆ๋‹ค.
  * [0.7145] [financial ] ๋ฉ”๋ชจ๋ฆฌ ๋ฐ˜๋„์ฒด ๊ฐ€๊ฒฉ ์ƒ์Šน์œผ๋กœ ์˜์—…์ด์ต์ด ์ „๋ถ„๊ธฐ ๋Œ€๋น„ 30% ์ฆ๊ฐ€ํ–ˆ๋‹ค.
  * [0.9021] [outlook   ] HBM3E ์–‘์‚ฐ์ด ๋ณธ๊ฒฉํ™”๋˜๋ฉด์„œ AI ๋ฐ˜๋„์ฒด ์‹œ์žฅ ์ ์œ ์œจ์ด ํ™•๋Œ€๋  ์ „๋ง์ด๋‹ค.
  • * ํ‘œ์‹œ: ๋Œ€ํ‘œ๋ฌธ์žฅ์œผ๋กœ ์„ ์ •๋จ (score โ‰ฅ 0.5)
  • [score]: ๋Œ€ํ‘œ๋ฌธ์žฅ ํ™•๋ฅ  (0~1, ๋†’์„์ˆ˜๋ก ํ•ต์‹ฌ ๋ฌธ์žฅ)
  • [role]: ๋ฌธ์žฅ ์—ญํ•  ๋ถ„๋ฅ˜ (outlook / event / financial / risk)
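Applying the 0.5 selection threshold described above is straightforward post-processing. A minimal sketch, with helper name and placeholder sentence texts of our own choosing, and scores/roles taken from the output example:

```python
def select_representatives(sentences, scores, roles, threshold=0.5):
    """Keep sentences whose extraction score meets the threshold,
    pairing each with its score and predicted role."""
    return [(s, sc, r) for s, sc, r in zip(sentences, scores, roles) if sc >= threshold]

# Scores and roles from the output example above; sentence texts shortened
sents = ["sent-1", "sent-2", "sent-3"]
scores = [0.8732, 0.7145, 0.9021]
roles = ["financial", "financial", "outlook"]
print(select_representatives(sents, scores, roles))
```

Raising the threshold trades recall for precision: with `threshold=0.9`, only the third sentence would be kept.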

Disclaimer (๋ฉด์ฑ… ์กฐํ•ญ)

  • ๋ณธ ๋ชจ๋ธ์€ ์—ฐ๊ตฌ ๋ฐ ์ •๋ณด ์ œ๊ณต ๋ชฉ์ ์œผ๋กœ๋งŒ ์ œ๊ณต๋ฉ๋‹ˆ๋‹ค.
  • ๋ณธ ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์€ ํˆฌ์ž ์กฐ์–ธ, ๊ธˆ์œต ์ž๋ฌธ, ๋งค๋งค ์ถ”์ฒœ์ด ์•„๋‹™๋‹ˆ๋‹ค.
  • ๋ชจ๋ธ์˜ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ํˆฌ์ž ํŒ๋‹จ์— ๋Œ€ํ•ด LangQuant ๋ฐ ๊ฐœ๋ฐœ์ž๋Š” ์–ด๋– ํ•œ ๋ฒ•์  ์ฑ…์ž„๋„ ์ง€์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
  • ๋ชจ๋ธ์˜ ์ •ํ™•์„ฑ, ์™„์ „์„ฑ, ์ ์‹œ์„ฑ์— ๋Œ€ํ•ด ๋ณด์ฆํ•˜์ง€ ์•Š์œผ๋ฉฐ, ์‹ค์ œ ํˆฌ์ž ์˜์‚ฌ๊ฒฐ์ • ์‹œ ๋ฐ˜๋“œ์‹œ ์ „๋ฌธ๊ฐ€์˜ ์กฐ์–ธ์„ ๊ตฌํ•˜์‹œ๊ธฐ ๋ฐ”๋ž๋‹ˆ๋‹ค.
  • ๊ธˆ์œต ์‹œ์žฅ์€ ๋ณธ์งˆ์ ์œผ๋กœ ๋ถˆํ™•์‹คํ•˜๋ฉฐ, ๊ณผ๊ฑฐ ๋ฐ์ดํ„ฐ๋กœ ํ•™์Šต๋œ ๋ชจ๋ธ์ด ๋ฏธ๋ž˜ ์„ฑ๊ณผ๋ฅผ ๋ณด์žฅํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

Usage Restrictions (์‚ฌ์šฉ ์ œํ•œ)

  • ๊ธˆ์ง€ ์‚ฌํ•ญ:
    • ๋ณธ ๋ชจ๋ธ์„ ์ด์šฉํ•œ ์‹œ์„ธ ์กฐ์ข…, ํ—ˆ์œ„ ์ •๋ณด ์ƒ์„ฑ ๋“ฑ ๋ถˆ๋ฒ•์  ๋ชฉ์ ์˜ ์‚ฌ์šฉ
    • ์ž๋™ํ™”๋œ ํˆฌ์ž ๋งค๋งค ์‹œ์Šคํ…œ์˜ ๋‹จ๋… ์˜์‚ฌ๊ฒฐ์ • ์ˆ˜๋‹จ์œผ๋กœ ์‚ฌ์šฉ
    • ๋ชจ๋ธ ์ถœ๋ ฅ์„ ์ „๋ฌธ ๊ธˆ์œต ์ž๋ฌธ์ธ ๊ฒƒ์ฒ˜๋Ÿผ ์ œ3์ž์—๊ฒŒ ์ œ๊ณตํ•˜๋Š” ํ–‰์œ„
  • ํ—ˆ์šฉ ์‚ฌํ•ญ:
    • ํ•™์ˆ  ์—ฐ๊ตฌ ๋ฐ ๊ต์œก ๋ชฉ์ ์˜ ์‚ฌ์šฉ
    • ๊ธˆ์œต ํ…์ŠคํŠธ ๋ถ„์„ ํŒŒ์ดํ”„๋ผ์ธ์˜ ๋ณด์กฐ ๋„๊ตฌ๋กœ ํ™œ์šฉ
    • ์‚ฌ๋‚ด ๋ฆฌ์„œ์น˜/๋ถ„์„ ์—…๋ฌด์˜ ์ฐธ๊ณ  ์ž๋ฃŒ๋กœ ํ™œ์šฉ
  • ์ƒ์—…์  ์‚ฌ์šฉ ์‹œ LangQuant์— ์‚ฌ์ „ ๋ฌธ์˜๋ฅผ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค.

Model size: 0.1B parameters (Safetensors, F32)
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for LangQuant/LQ-FSE-base

Finetuned
(429)
this model