MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR

We train an 8.3 billion parameter transformer language model with 8-way model parallelism and 64-way data parallelism on 512 GPUs, making it the largest transformer-based language model ever trained, at 24x the size of BERT and 5.6x the size of GPT-2.

23 mentions: @chipro, @ctnzr, @ivan_bezdomny, @kentenglish, @gwern, @TheRealRPuri, @pareshkharya, @DataScientistFr
Date: 2019/08/13 00:00
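
The headline figures fit together with a quick back-of-the-envelope check. The sketch below is not the released code: the 72-layer, 3072-hidden, 51,200-token padded-vocabulary shape of the 8.3B model and the ~39 TFLOPS single-GPU baseline are assumptions drawn from the accompanying NVIDIA paper and post; the 8-way model / 64-way data split, 512 GPUs, 15.1 PFLOPS, and 76% scaling efficiency are the numbers quoted here.

```python
# Back-of-the-envelope check of the headline numbers (a sketch, not NVIDIA's code).
# Assumed model shape: 72 layers, hidden size 3072, padded vocab 51,200, context 1024.

hidden = 3072          # assumed hidden size of the 8.3B configuration
layers = 72            # assumed number of transformer layers
vocab = 51200          # assumed vocabulary size, padded for model parallelism
seq_len = 1024         # GPT-2 style context length

# Each transformer layer holds roughly 12 * hidden^2 weights:
#   4 * hidden^2 for attention (QKV + output projection)
#   8 * hidden^2 for the 4x-wide MLP (two linear layers)
per_layer = 12 * hidden ** 2
params = layers * per_layer + vocab * hidden + seq_len * hidden
print(f"~{params / 1e9:.1f}B parameters")          # ~8.3B

# Hybrid parallelism: 8-way model parallel x 64-way data parallel.
model_parallel, data_parallel = 8, 64
print(f"{model_parallel * data_parallel} GPUs")    # 512

# Sustained throughput quoted in the post: 15.1 PFLOPS over 512 GPUs,
# against an assumed ~39 TFLOPS sustained by the single-GPU baseline.
per_gpu_tflops = 15.1e15 / (model_parallel * data_parallel) / 1e12
print(f"~{per_gpu_tflops:.1f} TFLOPS/GPU, "
      f"~{per_gpu_tflops / 39 * 100:.0f}% scaling efficiency")
```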

Referring Tweets

@ivan_bezdomny Our team, training the biggest neural net to date (Transformer language model with 8.3B trainable parameters). Now let's find out what we can do with this capacity. t.co/xRvi6HBgKW
@chipro "8.3 billion parameters on 512 GPUs with 8-way model parallelism, we achieved up to 15.1 PetaFLOPS sustained performance over the entire application and reached 76% scaling efficiency compared to the single GPU case." Whoa t.co/p2ItGUyKD1 t.co/RlE1WSLxRm
@gwern @WahrMethode @MelMitchell1 And Nvidia's new 8.3b-parameter model does overfit quickly in terms of test loss: t.co/vm3USs0a7Y So GPT-2-1.5b can't be too far away from that level of memorization either.
@ctnzr Here’s how we trained an 8.3B parameter GPT-2. We alternate row- and column- partitioning in the Transformer in order to remove synchronization and use hybrid model/data parallelism. 15 PFlops sustained on 512 GPUs. Details and code: t.co/7eXA6r15yX t.co/sEk4q0hU7T
@kentenglish These #transformer models are getting more intense all the time. Anyone got a few hundred NVIDIA V100s that I can borrow? t.co/kIiPg350Dc
@pareshkharya NVIDIA research team trained the largest ever language model based on transformers with a novel model parallel approach using Pytorch. The code that implements this approach is published on GitHub. t.co/lKvRyUhysV
@TheRealRPuri We Just released a cool #PyTorch #NaturalLanguageProcessing project we've been working on: training an 8.3B GPT2 model with model parallelism. Check it out... Details: t.co/RPK91K4Dda Training Code: t.co/byti9HDZCE
@DataScientistFr We Just released a cool #PyTorch #NaturalLanguageProcessing project we've been working on: training an 8.3B GPT2 model with model parallelism. Check it out... Details: t.co/ir0q4mfC25 Training Code: t.co/zukMQ3IWwr… - t.co/h6gLldXJDq #datascience
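
The partitioning @ctnzr describes, alternating column-wise and row-wise weight splits so that the two GEMMs in each MLP (and attention) block need only a single all-reduce, can be illustrated in a few lines of PyTorch. This is a minimal sketch under the assumptions noted in the comments, not the released Megatron-LM implementation; the class name ModelParallelMLP and its exact layout are hypothetical.

```python
# Minimal sketch of Megatron-style column/row partitioning of a transformer MLP.
# Not the released Megatron-LM code; names and layout are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist


class ModelParallelMLP(nn.Module):
    """One model-parallel shard of a transformer MLP (hidden -> 4*hidden -> hidden)."""

    def __init__(self, hidden: int, world_size: int):
        super().__init__()
        assert (4 * hidden) % world_size == 0
        shard = 4 * hidden // world_size
        # Column-parallel: each rank owns a slice of the 4h output columns,
        # so GeLU can be applied locally with no communication.
        self.fc1 = nn.Linear(hidden, shard)
        # Row-parallel: each rank owns the matching slice of input rows.
        # Bias is omitted here so it is not added once per rank before the
        # all-reduce; real code adds it once after the reduction.
        self.fc2 = nn.Linear(shard, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = F.gelu(self.fc1(x))       # local: [..., 4h / world_size]
        y = self.fc2(y)               # local partial sum: [..., hidden]
        if dist.is_available() and dist.is_initialized():
            # Sum partial results across model-parallel ranks. A real
            # implementation wraps this in a custom autograd function so the
            # backward pass communicates correctly as well.
            dist.all_reduce(y)
        return y


# Degenerate single-process usage (world_size=1 behaves like an ordinary MLP).
mlp = ModelParallelMLP(hidden=3072, world_size=1)
out = mlp(torch.randn(1, 8, 3072))
print(out.shape)  # torch.Size([1, 8, 3072])
```

Putting the column split first means the GeLU nonlinearity is applied entirely locally, so the only synchronization point in the block's forward pass is the single all-reduce after the second linear layer.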

Related Entries

Read more Modeling Children's Language Acquisition and NN Language Models
0 users, 0 mentions 2018/10/05 03:23
Read more Generalized Language Models
1 users, 24 mentions 2019/02/03 02:18
Read more GitHub - facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretr...
0 users, 4 mentions 2019/09/03 17:17
Read more GitHub - facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretr...
0 users, 10 mentions 2019/08/21 02:16
Read more GitHub - NVIDIA/Megatron-LM: Ongoing research training transformer language models at scale, includi...
0 users, 3 mentions 2019/09/20 03:48