MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR
We train an 8.3 billion parameter transformer language model with 8-way model parallelism and 64-way data parallelism on 512 GPUs, making it the largest transformer-based language model ever trained, at 24x the size of BERT and 5.6x the size of GPT-2.
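A minimal sketch of the parallelism and scale arithmetic implied by the summary above; the variable names and reference parameter counts (BERT-large ~340M, GPT-2 1.5B) are illustrative assumptions, not taken from the post.

```python
# Hypothetical names; values follow the configuration stated in the summary.
MODEL_PARALLEL_SIZE = 8    # ways each model replica is split across GPUs
DATA_PARALLEL_SIZE = 64    # number of model replicas, each on its own data shard

total_gpus = MODEL_PARALLEL_SIZE * DATA_PARALLEL_SIZE
print(total_gpus)  # 8 * 64 = 512 GPUs

# Rough size comparison against the models the post references.
megatron_params = 8.3e9
bert_large_params = 0.34e9   # assumed BERT-large, ~340M parameters
gpt2_params = 1.5e9          # GPT-2, 1.5B parameters
print(megatron_params / bert_large_params)  # ~24x
print(megatron_params / gpt2_params)        # ~5.5x
```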
Keywords: language model
Date: 2019/08/13 00:00