	### Model Description

	This repository hosts three pre-trained models designed to standardize the attributes of genomic region metadata: `ENCODE`, `FAIRTRACKS`, and `BEDBASE`. `BEDMS` (BED Metadata Standardizer) uses these models, together with their associated files and schema designs, to perform the standardization. To learn more about BEDMS, visit https://github.com/databio/bedms

	### Directory structure

	```
	/attribute-standardizer-model6
	/bedbase
	- bedbase_schema_design.yaml # BEDBASE schema
	- label_encoder_bedbase.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the BEDBASE schema
	- model_bedbase.pth # BEDBASE schema trained model
	- vectorizer_bedbase.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model
	- config_bedbase.yaml # Config file with model parameters
	/encode
	- encode_schema_design.yaml # ENCODE schema
	- label_encoder_encode.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the ENCODE schema
	- model_encode.pth # ENCODE schema trained model
	- vectorizer_encode.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model
	- config_encode.yaml # Config file with model parameters
	/fairtracks
	- fairtracks_schema_design.yaml # FAIRTRACKS schema
	- label_encoder_fairtracks.pkl # Unique label values derived from the training data; the model classifies its output into these labels for the FAIRTRACKS schema
	- model_fairtracks.pth # FAIRTRACKS schema trained model
	- vectorizer_fairtracks.pkl # CountVectorizer instance from the `scikit-learn` library for Bag of Words encoding used as input to the model
	- config_fairtracks.yaml # Config file with model parameters
	```
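
The file names above follow a regular pattern, so the expected paths for a schema can be generated from its name. The following is a minimal sketch, assuming the five-file layout shown in the listing; `artifact_paths` and `ARTIFACT_TEMPLATES` are hypothetical names used for illustration and are not part of BEDMS.

```python
from pathlib import PurePosixPath

# File name templates copied from the directory listing above.
# "{schema}" stands for the lowercase schema directory name.
# These names are illustrative and not part of the BEDMS API.
ARTIFACT_TEMPLATES = (
    "{schema}_schema_design.yaml",  # schema design
    "label_encoder_{schema}.pkl",   # label values the model classifies into
    "model_{schema}.pth",           # trained model
    "vectorizer_{schema}.pkl",      # CountVectorizer for Bag of Words encoding
    "config_{schema}.yaml",         # model parameters
)


def artifact_paths(schema: str) -> list[str]:
    """Return the repository-relative paths expected for one schema."""
    name = schema.lower()
    return [
        str(PurePosixPath(name) / template.format(schema=name))
        for template in ARTIFACT_TEMPLATES
    ]


if __name__ == "__main__":
    for path in artifact_paths("ENCODE"):
        print(path)
```

Paths built this way are relative to the repository root, so they could be passed to whatever download mechanism you use to fetch individual files.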

	### Usage

	To use these models, refer to the `bedms` GitHub repository:

	[BEDMS](https://github.com/databio/bedms)

	### Contribution

	To add a schema model:
	1. Train the new model using [BEDMS](https://github.com/databio/bedms).
	2. Create a new directory within this repository, named after the new schema (for example, "new_schema").
	3. Give the new directory the following structure:

	```
	/attribute-standardizer-model6
	/new_schema
	- new_schema_design.yaml
	- label_encoder_new_schema.pkl
	- model_new_schema.pth
	- vectorizer_new_schema.pkl
	- config_new_schema.yaml
	```
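
Before opening a contribution, it may help to check locally that the new directory contains every expected file. The following is a minimal sketch, assuming the layout shown above; `missing_artifacts` is a hypothetical helper, not part of BEDMS. Because the example names the design file `new_schema_design.yaml`, the sketch matches the design file by its `_design.yaml` suffix rather than by an exact name.

```python
from pathlib import Path


def missing_artifacts(schema_dir: str) -> list[str]:
    """List the expected files that are absent from a schema directory.

    Hypothetical helper: the directory name is taken as the schema name,
    following the layout shown above.
    """
    root = Path(schema_dir).resolve()
    name = root.name

    # Files whose names embed the schema directory name.
    required = [
        f"label_encoder_{name}.pkl",
        f"model_{name}.pth",
        f"vectorizer_{name}.pkl",
        f"config_{name}.yaml",
    ]
    missing = [f for f in required if not (root / f).is_file()]

    # The schema design file is matched by suffix only (an assumption).
    if not any(root.glob("*_design.yaml")):
        missing.append("*_design.yaml")

    return missing
```

An empty list means the directory matches the layout; otherwise the returned names show what still needs to be added.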