Commit

update readme
lucidrains committed Jul 17, 2020
1 parent a48cd5f commit ea97fb3
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -18,10 +18,10 @@ from mixture_of_experts import MoE
inputs = torch.randn(4, 1024, 512)

experts = MoE(
-    dim = 512,
-    num_experts = 16, # increase the experts (# parameters) of your model without increasing computation
-    hidden_dim = 512 * 4, # size of hidden dimension in each expert, defaults to 4 * dimension
-    activation = nn.LeakyReLU # use your preferred activation, will default to ReLU
+    dim = 512,
+    num_experts = 16, # increase the experts (# parameters) of your model without increasing computation
+    hidden_dim = 512 * 4, # size of hidden dimension in each expert, defaults to 4 * dimension
+    activation = nn.LeakyReLU # use your preferred activation, will default to ReLU
)

out, aux_loss = experts(inputs) # (4, 1024, 512), (1,)
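
The comment on `num_experts` claims that adding experts grows the parameter count without growing computation. A rough way to see why is to count weights, assuming each expert is a bias-free two-layer feed-forward block and each token is routed to only `top_k = 2` experts. Both are assumptions made for this sketch, and the helper `moe_cost` is hypothetical, not part of `mixture_of_experts`.

```python
def moe_cost(dim, num_experts, hidden_dim=None, top_k=2):
    """Rough weight counts for a sparsely gated feed-forward layer.

    Assumptions (not taken from the library): each expert has two
    bias-free weight matrices, and a linear router sends every token
    to `top_k` experts.
    """
    if hidden_dim is None:
        hidden_dim = 4 * dim  # mirrors the "defaults to 4 * dimension" comment

    per_expert = 2 * dim * hidden_dim   # dim -> hidden_dim -> dim
    gate = dim * num_experts            # router weights, one column per expert

    total = num_experts * per_expert + gate   # weights stored in the layer
    active = top_k * per_expert + gate        # weights one token actually uses
    return total, active


if __name__ == "__main__":
    for n in (4, 16, 64):
        total, active = moe_cost(dim=512, num_experts=n)
        print(f"num_experts={n:3d}  total={total:,}  active per token={active:,}")
```

Under these assumptions the stored weights scale linearly with `num_experts`, while the weights touched per token stay nearly flat; only the small router term grows.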
@@ -31,10 +31,10 @@ out, aux_loss = experts(inputs) # (4, 1024, 512), (1,)

```bibtex
@misc{lepikhin2020gshard,
-    title = {GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding},
-    author = {Dmitry Lepikhin and HyoukJoong Lee and Yuanzhong Xu and Dehao Chen and Orhan Firat and Yanping Huang and Maxim Krikun and Noam Shazeer and Zhifeng Chen},
-    year = {2020},
-    eprint = {2006.16668},
+    title = {GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding},
+    author = {Dmitry Lepikhin and HyoukJoong Lee and Yuanzhong Xu and Dehao Chen and Orhan Firat and Yanping Huang and Maxim Krikun and Noam Shazeer and Zhifeng Chen},
+    year = {2020},
+    eprint = {2006.16668},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}