Release v4.0.0 · joshuaspear/offline_rl_ope

Various bug fixes (see release log in README.md)
Predefined propensity models including:
- Generic feedforward MLP for continuous and discrete action spaces built in PyTorch
- XGBoost for continuous and discrete action spaces built in sklearn
- Both PyTorch and sklearn models can handle sparse discrete action spaces, i.e., a propensity model can be exposed to 'new' actions provided the full action space definition is supplied when the propensity model is trained
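
To picture the point about 'new' actions: a classifier fitted on logged data only knows the actions it has seen, so its probabilities need to be mapped back onto the full action space. The sketch below is illustrative only and does not use the offline_rl_ope API. The helper name `expand_to_full_action_space` and its arguments are hypothetical, and assigning probability 0 to unseen actions is an assumption, not necessarily what the library does.

```python
from typing import List, Sequence


def expand_to_full_action_space(
    probs: Sequence[float],
    seen_actions: Sequence[int],
    n_actions: int,
) -> List[float]:
    """Map probabilities over actions seen in training onto the full action space.

    Hypothetical helper: actions never observed in training get probability 0.
    """
    if len(probs) != len(seen_actions):
        raise ValueError("probs and seen_actions must have the same length")
    full = [0.0] * n_actions
    for p, a in zip(probs, seen_actions):
        if not 0 <= a < n_actions:
            raise ValueError(f"action {a} is outside the declared action space")
        full[a] = p
    return full


# Training data only contained actions 0 and 3 out of 5 possible actions.
full_probs = expand_to_full_action_space([0.7, 0.3], seen_actions=[0, 3], n_actions=5)
```

Because the full action space is declared up front, the output always has one entry per action, so an evaluation policy that picks a previously unseen action can still be looked up.
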
Metrics pattern with:
- Effective sample size calculation
- Proportion of valid weights i.e., the mean proportion of weights between a min and max value across trajectories
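
The two metrics above can be sketched in plain Python. The release notes do not state the exact formulas, so this assumes the common Kish definition of effective sample size, (sum of weights)^2 / (sum of squared weights), and treats the bounds as inclusive. The function names are hypothetical and are not the library's API.

```python
from typing import List, Sequence


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2) (assumed definition)."""
    total = sum(weights)
    sq_total = sum(w * w for w in weights)
    if sq_total == 0:
        return 0.0
    return total * total / sq_total


def valid_weight_proportion(
    traj_weights: Sequence[Sequence[float]],
    min_w: float,
    max_w: float,
) -> float:
    """Mean over trajectories of the fraction of weights in [min_w, max_w]."""
    props: List[float] = []
    for ws in traj_weights:
        if not ws:
            continue  # skip empty trajectories
        n_valid = sum(1 for w in ws if min_w <= w <= max_w)
        props.append(n_valid / len(ws))
    return sum(props) / len(props) if props else 0.0


# Equal weights: every sample counts fully.
ess = effective_sample_size([1.0, 1.0, 1.0, 1.0])
# One dominant weight: effectively a single sample.
skewed_ess = effective_sample_size([4.0, 0.0, 0.0, 0.0])
# Two trajectories of importance weights, valid range [0.1, 10.0].
prop = valid_weight_proportion([[0.5, 1.0, 20.0, 0.9], [1.0, 1.2]], 0.1, 10.0)
```

With equal weights the effective sample size equals the number of samples, and it shrinks towards 1 as a single weight dominates. A low proportion of valid weights points to the same problem from a different angle: many importance weights are falling outside the chosen range.
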
Refactored the BehavPolicy class to accept a 'policy_func' that aligns with the other policy classes
