Vanilla Policy Gradient (REINFORCE)
Actor-Critic Method
MADDPG (Multi-Agent Deep Deterministic Policy Gradient)