Scicom-intl/multilingual-dynamic-entity-decoder

---
library_name: transformers
base_model:
- Qwen/Qwen3-0.6B
---

# Model Overview

This model is a multilingual Named Entity Recognition (NER) transformer designed to extract
name and address entities from text in a Malaysian context.

It supports the following languages:
- English
- Malay
- Chinese
- Tamil

The model is built on top of Qwen3 (Qwen3-0.6B) and uses a custom non-causal attention
mechanism.
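For readers unfamiliar with the term, the difference between causal and non-causal attention can be illustrated with a toy visibility mask. This is only a conceptual sketch in plain Python: `attention_mask` is a hypothetical helper and not part of the model's code, which gets the non-causal behaviour by passing `causal=False` to FlashAttention.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Return a seq_len x seq_len matrix; 1 means query row i may attend to key column j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


causal_mask = attention_mask(4, causal=True)   # each token sees itself and earlier tokens
full_mask = attention_mask(4, causal=False)    # each token sees the whole sequence

for row in causal_mask:
    print(row)
print()
for row in full_mask:
    print(row)
```

With the non-causal mask, the representation of a word can also draw on the words that follow it, which is generally helpful for token classification.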

## Predicted Classes

- 0 : Non-entity token
- 1 : Name entity
- 2 : Address entity
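As a rough illustration of how these class ids might be consumed downstream, the sketch below maps them to readable labels and merges consecutive words of the same class into one entity. The label names (`O`, `NAME`, `ADDRESS`), the `group_entities` helper, and the sample predictions are all assumptions made for this example; they do not come from the model's configuration.

```python
from itertools import groupby

# Hypothetical label names for the three class ids listed above.
ID2LABEL = {0: "O", 1: "NAME", 2: "ADDRESS"}


def group_entities(words, class_ids):
    """Merge runs of consecutive words that share the same non-zero class id."""
    entities = []
    pairs = zip(words, class_ids)
    for class_id, run in groupby(pairs, key=lambda pair: pair[1]):
        if class_id == 0:
            continue  # skip non-entity words
        text = " ".join(word for word, _ in run)
        entities.append((ID2LABEL[class_id], text))
    return entities


words = ["my", "name", "is", "Alex", "and", "I'm", "from", "Perlis"]
class_ids = [0, 0, 0, 1, 0, 0, 0, 2]  # made-up predictions for illustration
print(group_entities(words, class_ids))
# [('NAME', 'Alex'), ('ADDRESS', 'Perlis')]
```

Joining with a space suits space-delimited languages; for Chinese text split into characters, joining with an empty string would likely be more appropriate.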

## Transformers Inference Example

```python
import re
from typing import Optional

import torch
from transformers import AttentionInterface, AutoTokenizer, Qwen3ForTokenClassification


def register_fa_attention():
    # flash-attn must be installed; it only runs on a CUDA GPU
    from flash_attn import flash_attn_func, flash_attn_varlen_func

    def custom_attention_forward(
        module: torch.nn.Module,
        query: torch.Tensor,
        key: torch.Tensor,
        value: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        **kwargs,
    ):
        cu_seqlens_q = kwargs.get("cu_seqlens_q", None)
        cu_seqlens_k = kwargs.get("cu_seqlens_k", None)
        max_seqlen_q = kwargs.get("max_seqlen_q", None)
        max_seqlen_k = kwargs.get("max_seqlen_k", None)
        # permute query, key, value from (batch, n_heads, seq_len, head_dim)
        # to (batch, seq_len, n_heads, head_dim), the layout flash-attn expects
        query_permute = query.permute(0, 2, 1, 3)
        key_permute = key.permute(0, 2, 1, 3)
        value_permute = value.permute(0, 2, 1, 3)

        if cu_seqlens_q is not None and cu_seqlens_k is not None:
            # Packed (variable-length) sequences: drop the batch dimension
            attn_output = flash_attn_varlen_func(
                q=query_permute.squeeze(0),
                k=key_permute.squeeze(0),
                v=value_permute.squeeze(0),
                cu_seqlens_q=cu_seqlens_q,
                cu_seqlens_k=cu_seqlens_k,
                max_seqlen_q=max_seqlen_q,
                max_seqlen_k=max_seqlen_k,
                causal=False,
            )
        else:
            attn_output = flash_attn_func(
                query_permute, key_permute, value_permute,
                causal=False,
            )
        return attn_output, None

    AttentionInterface.register("fa_noncausal", custom_attention_forward)


# Register the custom non-causal FA (feel free to use FA2/FA3); requires a GPU
register_fa_attention()


def tokenize_sentence_to_word(sentence: str):
    tokens = []
    chinese_char_pattern = re.compile(r'[\u4e00-\u9fff]')
    # Split text by spaces first
    parts = sentence.split()
    for part in parts:
        if chinese_char_pattern.search(part):
            # Character-level tokenization for Chinese
            tokens.extend(list(part))
        else:
            # Word-level tokenization for other languages
            tokens.append(part)
    return tokens


tokenizer = AutoTokenizer.from_pretrained("Scicom-intl/multilingual-dynamic-entity-decoder")
model = Qwen3ForTokenClassification.from_pretrained(
    "Scicom-intl/multilingual-dynamic-entity-decoder",
    attn_implementation="fa_noncausal",
    dtype=torch.bfloat16,
    device_map={"": "cuda:0"},
)

word_token = tokenize_sentence_to_word("Hi, my name is Alex and I'm from Perlis")
inputs = tokenizer(
    word_token,
    is_split_into_words=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model(**inputs)
# One class id per sub-token: 0 = non-entity, 1 = name, 2 = address
prediction = output.logits.argmax(dim=-1)
print(prediction)
```
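The printed tensor holds one class id per sub-token, while the input was supplied word by word. A common way to get back to word-level labels is to keep the prediction of each word's first sub-token; the sketch below shows that idea in plain Python. `word_level_predictions` is a hypothetical helper, and whether first-sub-token alignment matches how this model was trained is an assumption, not something the model card states.

```python
def word_level_predictions(word_ids, token_predictions):
    """Keep the prediction of the first sub-token of every word.

    word_ids: one entry per sub-token giving the index of the word it came from,
              or None for special tokens (the format returned by the
              `word_ids()` method of a fast tokenizer's output).
    token_predictions: one class id per sub-token.
    """
    result = []
    previous = None
    for word_id, pred in zip(word_ids, token_predictions):
        if word_id is not None and word_id != previous:
            result.append(pred)
        previous = word_id
    return result


# Made-up example: word 1 was split into two sub-tokens.
word_ids = [0, 1, 1, 2]
token_predictions = [0, 1, 1, 2]
print(word_level_predictions(word_ids, token_predictions))  # [0, 1, 2]
```

With the objects from the example above, the two lists would come from calling `.word_ids(0)` on the tokenizer output (assuming a fast tokenizer is loaded) and from `prediction[0].tolist()`.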

## Evaluation Result

- F1 macro: 0.81
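For reference, macro F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of how often it occurs. The sketch below computes it on made-up labels; it does not reproduce the reported score, and whether the authors evaluated at token level or entity level is not stated here.

```python
def macro_f1(y_true, y_pred, classes=(0, 1, 2)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 3))  # 0.656
```

Here the per-class scores are 0.5, 0.8, and about 0.667, which average to about 0.656.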

## Optimized inference

We built a small dynamic batching library for this model: https://github.com/Scicom-AI-Enterprise-Organization/entity-api
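To give a feel for what dynamic batching means in general, here is a minimal asyncio sketch: requests that arrive within a short time window are grouped and processed in a single call. This is a generic illustration of the pattern, not the API of the library linked above; `DynamicBatcher`, `fake_model`, and all parameter names are hypothetical.

```python
import asyncio


class DynamicBatcher:
    """Collect requests for up to `max_wait` seconds or until `max_batch` items arrive."""

    def __init__(self, batch_fn, max_batch=8, max_wait=0.01):
        self.batch_fn = batch_fn  # callable: list of inputs -> list of outputs
        self.max_batch = max_batch
        self.max_wait = max_wait
        self.queue = asyncio.Queue()

    async def submit(self, item):
        """Enqueue one request and wait for its individual result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((item, future))
        return await future

    async def run(self):
        """Worker loop: form a batch, run it once, fan the results back out."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait
            while len(batch) < self.max_batch:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            outputs = self.batch_fn([item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)


async def demo():
    batch_sizes = []

    def fake_model(sentences):
        # Stand-in for a real forward pass; records how many requests were grouped.
        batch_sizes.append(len(sentences))
        return [len(s.split()) for s in sentences]

    batcher = DynamicBatcher(fake_model, max_batch=4, max_wait=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(s) for s in ["a b", "c", "d e f"]))
    worker.cancel()
    return results, batch_sizes


results, batch_sizes = asyncio.run(demo())
print(results, batch_sizes)
```

The two knobs trade latency for throughput: a larger `max_wait` gives more requests a chance to share one forward pass, at the cost of a longer wait for the first request in each batch.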