---
language:
- ja
library_name: transformers
tags:
- myllm
- causal-lm
- custom-code
- safetensors
pipeline_tag: text-generation
---

# lambda-160m

lambda-160m is an experimental Japanese causal language model built on `myllm`, a custom decoder-only Transformer implementation.

All training code is publicly available at [KeisukeMiyamoto1324/myllm](https://github.com/KeisukeMiyamoto1324/myllm).

## Model Details

| Item | Value |
|---|---:|
| Parameters | 164.5M |
| Architecture | Decoder-only Transformer |
| Model type | `myllm` |
| Context length | 1,024 tokens |
| Tokenizer | Byte-level BPE |
| Vocabulary size | 65,536 |
| Layers | 16 |
| Hidden size | 768 |
| Attention heads | 12 |
| FFN intermediate size | 3,072 |
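As a rough consistency check on the 164.5M figure, the sketch below tallies parameters for a GPT-2-style decoder with the dimensions in the table. The architectural details are assumptions, not confirmed by this card: input and output embeddings are tied, positions use learned absolute embeddings, linear layers and LayerNorms carry biases, and the FFN is a two-matrix MLP. `estimate_params` is a hypothetical helper, not part of `myllm`. The head count does not enter the total, because 12 heads of dimension 64 (768 / 12) share the same projection matrices.

```python
def estimate_params(
    vocab_size: int = 65_536,
    context_length: int = 1_024,
    n_layers: int = 16,
    d_model: int = 768,
    d_ffn: int = 3_072,
) -> int:
    """Rough parameter tally for a GPT-2-style decoder (assumed layout, not confirmed)."""
    token_emb = vocab_size * d_model      # assumed tied with the output (LM head) weights
    pos_emb = context_length * d_model    # assumed learned absolute position embeddings

    attn = 4 * d_model * d_model + 4 * d_model   # Q, K, V, output projections + biases
    ffn = 2 * d_model * d_ffn + d_ffn + d_model  # up/down projections + biases
    norms = 2 * (2 * d_model)                    # two LayerNorms (weight + bias) per block
    per_layer = attn + ffn + norms

    final_norm = 2 * d_model
    return token_emb + pos_emb + n_layers * per_layer + final_norm


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")  # 164,525,568 parameters (~164.5M)
```

Dropping the position embeddings (for example, if `myllm` used rotary embeddings instead) would give about 163.7M, so treat the match as suggestive rather than conclusive.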

## Training Data

The model was pretrained on a mixture of two Japanese text datasets.

| Dataset | Notes |
|---|---|
| `hotchpotch/fineweb-2-edu-japanese` | Japanese web text, with Wikipedia domains excluded |
| `MK0727/CleanedWiki-jp` | Japanese Wikipedia-style text, ramped into the mixture starting at 50% of training progress |
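The card does not specify the shape of the Wikipedia ramp or its final share. The sketch below illustrates one possible reading, assuming a linear ramp of the `MK0727/CleanedWiki-jp` sampling weight from 0 at 50% of training to a final weight at the last step. `wiki_weight`, `mixture`, and the 0.3 target weight are hypothetical names and values for illustration only, not taken from the `myllm` training code.

```python
def wiki_weight(progress: float, target_weight: float, ramp_start: float = 0.5) -> float:
    """Hypothetical CleanedWiki-jp sampling weight at a given fraction of training.

    Assumes a linear ramp from 0 at `ramp_start` to `target_weight` at progress 1.0.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be in [0, 1]")
    if progress <= ramp_start:
        return 0.0  # web text only during the first half of training
    return target_weight * (progress - ramp_start) / (1.0 - ramp_start)


def mixture(step: int, max_steps: int = 40_960, target_weight: float = 0.3) -> dict:
    """Per-dataset sampling weights at `step`; target_weight=0.3 is an illustrative value."""
    w = wiki_weight(step / max_steps, target_weight)
    return {
        "hotchpotch/fineweb-2-edu-japanese": 1.0 - w,
        "MK0727/CleanedWiki-jp": w,
    }


for step in (0, 20_480, 30_720, 40_960):
    print(step, mixture(step))
```

A step-wise or cosine ramp would fit the card's description equally well; the linear form is chosen only because it is the simplest to read.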

## Training Setup

This model was trained on a single NVIDIA RTX PRO 6000 GPU.

| Item | Value |
|---|---:|
| Optimizer | AdamW |
| Peak learning rate | 2e-4 |
| LR schedule | Warmup + cosine decay |
| Warmup steps | 2,000 |
| Minimum LR ratio (of peak) | 0.1 |
| Batch size | 96 |
| Max steps | 40,960 |
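The schedule in the table can be sketched as follows, assuming a linear warmup from 0 to the peak rate over the first 2,000 steps, then a cosine decay that reaches the floor of 0.1 × 2e-4 = 2e-5 at step 40,960. The sketch also estimates the token budget, assuming the batch size counts sequences and every sequence fills the full 1,024-token context; both are assumptions, and `lr_at` and `tokens_seen` are hypothetical helpers rather than functions from `myllm`.

```python
import math

PEAK_LR = 2e-4
WARMUP_STEPS = 2_000
MAX_STEPS = 40_960
MIN_LR_RATIO = 0.1
BATCH_SIZE = 96        # assumed to count sequences per optimizer step
CONTEXT_LENGTH = 1_024


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to MIN_LR_RATIO * PEAK_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0 over the decay
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


def tokens_seen(steps: int = MAX_STEPS) -> int:
    """Upper bound on training tokens, assuming every sample is a full-length sequence."""
    return steps * BATCH_SIZE * CONTEXT_LENGTH


for step in (0, 1_000, 2_000, 21_480, 40_960):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
print(f"tokens: {tokens_seen():,}")  # tokens: 4,026,531,840
```

Under these assumptions the run covers roughly 4.0B tokens; padding or gradient accumulation would change that figure.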

## Usage

This repository ships custom modeling code for Transformers, so `trust_remote_code=True` is required when loading it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "MK0727/lambda-160m"

# trust_remote_code=True lets Transformers load the custom myllm classes from the repository
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# Prompt: "The capital of Japan is,"
inputs = tokenizer("日本の首都は、", return_tensors="pt")

# Generate up to 64 new tokens after the prompt
outputs = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

This model is not instruction-tuned or safety-aligned. It may generate incorrect, biased, unsafe, or low-quality text.

The model was trained on a limited Japanese corpus mixture and has not been evaluated on standard benchmarks.