Release Version 6.0.0 · joshuaspear/offline_rl_ope

Updated the PropensityModels structure for sklearn and added a helper class for compatibility with torch
Added full runtime type checking with jaxtyping
Fixed a bug in the IS methods where the average was taken twice
Significantly simplified the API, in particular by integrating Policy classes with propensity models
Generalised the d3rlpy API so that continuous policies can be wrapped with D3RlPyTorchAlgoPredict
Added explicit stochastic policies for d3rlpy
Introduced 'policy_func': any function or method that returns Union[TorchPolicyReturn, NumpyPolicyReturn]
Simplified and unified ISCallback in d3rlpy/api using PolicyFactory
Added 'premade' doubly robust estimators for vanilla DR, weighted DR, per-decision DR and weighted per-decision DR
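For context on the IS averaging fix listed above, here is a minimal sketch of trajectory-wise importance sampling in plain Python, where the weighted returns are averaged exactly once across trajectories. It is not the library's implementation; the dict-based trajectory layout (`eval_probs`, `behav_probs`, `rewards`) and the function names are hypothetical.

```python
from math import prod


def trajectory_is_weight(eval_probs, behav_probs):
    """Product of per-step ratios pi_e(a|s) / pi_b(a|s) over one trajectory."""
    return prod(pe / pb for pe, pb in zip(eval_probs, behav_probs))


def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def vanilla_is(trajectories, gamma=1.0):
    """Trajectory-wise IS estimate; the mean is taken once, over trajectories."""
    weighted = [
        trajectory_is_weight(tr["eval_probs"], tr["behav_probs"])
        * discounted_return(tr["rewards"], gamma)
        for tr in trajectories
    ]
    # A single mean over trajectories. Dividing by len(weighted) a second
    # time would shrink the estimate by a factor of n.
    return sum(weighted) / len(weighted)


# Hypothetical data: two short trajectories.
trajs = [
    {"eval_probs": [0.5, 0.5], "behav_probs": [0.5, 0.5], "rewards": [1, 1]},
    {"eval_probs": [1.0], "behav_probs": [0.5], "rewards": [3]},
]
estimate = vanilla_is(trajs)  # (1.0 * 2 + 2.0 * 3) / 2
```

With these two trajectories the weighted returns are 2.0 and 6.0, so the single mean gives 4.0; averaging twice would incorrectly give 2.0.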
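The 'policy_func' idea above means any callable, whether a plain function or a bound method, can act as a policy, provided it returns a policy-return object. The sketch below illustrates that duck-typed contract only. `SketchPolicyReturn` is a hypothetical stand-in for `NumpyPolicyReturn`, and its fields (`actions`, `action_prs`) and the one-argument call signature are assumptions, not the library's actual definitions.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SketchPolicyReturn:
    """Hypothetical stand-in for NumpyPolicyReturn; field names are assumptions."""
    actions: List[int]
    action_prs: List[float]


# Any callable mapping a batch of states to a return object counts as a policy function.
SketchPolicyFunc = Callable[[Sequence[float]], SketchPolicyReturn]


def threshold_policy(states: Sequence[float]) -> SketchPolicyReturn:
    """Deterministic policy: action 1 if the state is positive, else action 0."""
    actions = [1 if s > 0 else 0 for s in states]
    return SketchPolicyReturn(actions=actions, action_prs=[1.0] * len(actions))


class UniformPolicy:
    """Stochastic policy over n discrete actions; its bound method is a policy function."""

    def __init__(self, n_actions: int, seed: int = 0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, states: Sequence[float]) -> SketchPolicyReturn:
        actions = [self.rng.randrange(self.n_actions) for _ in states]
        return SketchPolicyReturn(actions=actions, action_prs=[1.0 / self.n_actions] * len(states))


def collect_action_probs(policy_func: SketchPolicyFunc, states: Sequence[float]) -> List[float]:
    """Consumer code only relies on the return type, not on what produced it."""
    return policy_func(states).action_prs
```

Usage: `collect_action_probs(threshold_policy, [-1.0, 2.0])` and `collect_action_probs(UniformPolicy(4).act, [0.0, 0.0])` both work, because the consumer depends only on the returned object.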
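To illustrate what a per-decision doubly robust estimator computes, the sketch below uses the standard backward recursion V_t = V̂(s_t) + ρ_t (r_t + γ V_{t+1} − Q̂(s_t, a_t)), with V_T = 0, and averages across trajectories. It is a plain-Python illustration and not the library's premade estimator. The argument layout (per-step lists of rewards, ratios, and model estimates) is an assumption. The weighted variants listed above would differ mainly by normalising the importance ratios across trajectories.

```python
from typing import Sequence


def per_decision_dr(
    rewards: Sequence[float],
    ratios: Sequence[float],  # rho_t = pi_e(a_t|s_t) / pi_b(a_t|s_t)
    q_hat: Sequence[float],   # model estimate of Q(s_t, a_t)
    v_hat: Sequence[float],   # model estimate of V(s_t) under pi_e
    gamma: float = 1.0,
) -> float:
    """Per-decision doubly robust estimate for one trajectory (backward recursion)."""
    v_dr = 0.0
    for t in reversed(range(len(rewards))):
        v_dr = v_hat[t] + ratios[t] * (rewards[t] + gamma * v_dr - q_hat[t])
    return v_dr


def dr_estimate(trajectories, gamma: float = 1.0) -> float:
    """Average the per-trajectory DR values; each trajectory is (rewards, ratios, q_hat, v_hat)."""
    values = [per_decision_dr(*traj, gamma=gamma) for traj in trajectories]
    return sum(values) / len(values)
```

Two sanity checks follow from the recursion. With Q̂ = V̂ = 0 it reduces to per-decision importance sampling, and with all ratios equal to 1 and V̂ = Q̂ it returns the discounted return of the trajectory.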
