sparkastML/translate
2024-11-02 15:46:12 +08:00
analytics ref: the intention-classification model 2024-09-22 03:58:56 +08:00
validation update: latest synthetic data script 2024-10-07 23:15:25 +08:00
fetcher.py add: dataset 2024-09-16 17:29:12 +08:00
hf-dataset.py add: forced alignment example 2024-11-02 15:46:12 +08:00
LLMtranslator.py update: latest synthetic data script 2024-10-07 23:15:25 +08:00
postprocess.py update: latest synthetic data script 2024-10-07 23:15:25 +08:00
README.md update: README for translate 2024-09-23 21:30:44 +08:00
spider.py add: content fetcher for translate 2024-09-15 23:43:01 +08:00
split_source.py update: latest synthetic data script 2024-10-07 23:15:25 +08:00

sparkastML NMT

A set of models that aim to offer the best open-source machine translation, based on OpenNMT.

News

sparkastML's translation model is now updated!

Details

  • Source Language: Chinese (Simplified)
  • Target Language: English
  • Training Time: 11.3 hours in total, 46,500 steps (~1×10¹⁸ FLOPs)
  • Training Device:
    • RTX 3080 (20GB): 0-20,000 steps
    • RTX 4070: 20,000-46,500 steps
  • Corpus Size: Over 10 million sentences
  • Validation BLEU Score: 21.28
  • Validation Loss (Cross Entropy): 3.152
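The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the reported cross entropy is a per-token value in nats (so perplexity is its exponential), and it treats the ~1×10¹⁸ FLOPs estimate as exact. The function names `perplexity` and `training_throughput` are hypothetical helpers, not part of this repository.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a per-token cross entropy (assumed to be in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


def training_throughput(total_steps: int, hours: float, total_flops: float) -> dict:
    """Derive average rates from the totals reported in the README."""
    seconds = hours * 3600.0
    return {
        "steps_per_hour": total_steps / hours,
        "seconds_per_step": seconds / total_steps,
        "flops_per_second": total_flops / seconds,
    }


if __name__ == "__main__":
    # Values copied from the Details list above.
    ppl = perplexity(3.152)
    rates = training_throughput(total_steps=46_500, hours=11.3, total_flops=1e18)

    print(f"validation perplexity  ~ {ppl:.2f}")
    print(f"steps per hour         ~ {rates['steps_per_hour']:.0f}")
    print(f"seconds per step       ~ {rates['seconds_per_step']:.3f}")
    print(f"average FLOP/s         ~ {rates['flops_per_second']:.3e}")
```

Under these assumptions, a loss of 3.152 corresponds to a validation perplexity of roughly 23.4, and the run averages a little under one second per step. The rates are averages over both GPUs, so they do not describe either device on its own.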

Model Download

Available soon.

Special thanks

Thanks to yumechi for sponsoring an RTX 4070 for training.

History

Sep 19, 2024

sparkastML's translation model is now updated!

Details

  • Source Language: Chinese (Simplified)
  • Target Language: English
  • Training Time: 5 hours, 20,000 steps
  • Training Device: RTX 3080 (20GB)
  • Corpus Size: Over 10 million sentences
  • Validation BLEU Score: 17
  • Version: 1.0

Model Download