Differentiable Functions
The code explained below is no longer used (at least not much).
A crucial ingredient for building learning systems is what we call differentiable functions. We use the term differentiable function for a function equipped with a gradient (more precisely, an adjoint derivative), i.e., (an abstraction of) a differentiable function on a vector space with an inner product.
If A and B are Euclidean spaces (i.e., finite-dimensional real vector spaces with inner products), the derivative of a differentiable function f: A → B at a point a of A is a linear map A → B, and its adjoint with respect to the inner products is a linear map B → A; this adjoint is what we call the gradient at a. Thus, we can define the gradient as a function taking a point a of A to a map B → A.
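Concretely, in the standard Euclidean case (spelled out here as an illustration, not taken from the original text), the derivative is given by the Jacobian matrix and the gradient by its transpose:

```latex
% For f : R^n -> R^m with Jacobian J_f(a) at the point a,
% the derivative is v |-> J_f(a) v, and the gradient is its adjoint:
\[
  \mathrm{grad}(a)\colon \mathbb{R}^m \to \mathbb{R}^n,
  \qquad \mathrm{grad}(a)(w) = J_f(a)^{T}\, w.
\]
```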
In Scala code, we define differentiable functions as a trait consisting of functions with gradients:
```scala
trait DiffbleFunction[A, B] { self =>
  def apply(a: A): B

  def grad(a: A): B => A

  ...
}
```
The companion object includes a method for constructing differentiable functions.
```scala
object DiffbleFunction {
  def apply[A, B](f: => A => B)(grd: => A => (B => A)) =
    new DiffbleFunction[A, B] {
      def apply(a: A) = f(a)

      def grad(a: A) = grd(a)
    }
  ...
}
```
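As a small, hypothetical usage sketch (the value `square` is ours, not part of the library): the squaring map on `Double`, whose gradient at x is multiplication by 2x, since in one dimension the adjoint derivative is just multiplication by the ordinary derivative.

```scala
// A hypothetical example: squaring on Double as a DiffbleFunction.
// The gradient at x is the (self-adjoint) linear map v => 2 * x * v.
val square = DiffbleFunction((x: Double) => x * x)(
  (x: Double) => (v: Double) => 2 * x * v)

square(3.0)           // 9.0
square.grad(3.0)(1.0) // 6.0, the derivative at 3.0 applied to 1.0
```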
The gradient plays a crucial role in learning by back-propagation. Namely, we try to learn the best parameter in the domain of a differentiable function: an error (feedback) in the codomain is propagated back, using the gradient, to a correction of the parameter.
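A minimal gradient-descent sketch of this idea (our illustration, using the hypothetical `square` above; the names `step`, `target` and `rate` are ours):

```scala
// Nudge the parameter a against the error f(a) - target,
// pulled back to the domain by the gradient.
def step(f: DiffbleFunction[Double, Double], target: Double, rate: Double)(
    a: Double): Double =
  a - rate * f.grad(a)(f(a) - target)

// e.g. approximating sqrt(10.0):
// Iterator.iterate(1.0)(step(square, 10.0, 0.05)).drop(50).next()
```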
The composition of differentiable functions is a differentiable function, with gradient given by the chain rule together with taking adjoints; note that taking adjoints reverses the order of composition. In code:
```scala
trait DiffbleFunction[A, B] extends Any { self =>
  def apply(a: A): B

  def grad(a: A): B => A

  /**
   * Composition f *: g is f(g(_))
   */
  def *:[C](that: => DiffbleFunction[B, C]) = andthen(that)

  def andthen[C](that: => DiffbleFunction[B, C]): DiffbleFunction[A, C] =
    DiffbleFunction((a: A) => that(this(a)))(
      (a: A) => (c: C) => grad(a)(that.grad(this(a))(c)))
  ...
}
```
Remark: this is a case where type safety greatly helps in making the correct definition.
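For instance (a hypothetical check using the `square` sketch above), composing `square` with itself recovers the chain rule for x ↦ x⁴:

```scala
// square andthen square is x => x^4; the chain rule gives 4 * x^3.
val fourth = square andthen square
fourth(2.0)           // 16.0
fourth.grad(2.0)(1.0) // 32.0 = 4 * 2.0^3
```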
Given a function B => B on the codomain, for instance a feedback on the output, we can conjugate it by a differentiable function to get a map A => A:
```scala
trait DiffbleFunction[A, B] extends Any {
  ...
  /**
   * Conjugate that by this.
   */
  def ^:(that: B => B) = (a: A) => grad(a)(that(apply(a)))
  ...
}
```
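A hypothetical usage with the `square` sketch above, pulling an error signal on the output back to the input (recall that a method whose name ends in `:` is right-associative, so the differentiable function goes on the right):

```scala
// An error on the output, pulled back to the input by the gradient.
val error: Double => Double = (b: Double) => 1.0 - b
val pullback: Double => Double = error ^: square
pullback(2.0) // grad(2.0)(error(square(2.0))) = 4.0 * (1.0 - 4.0) = -12.0
```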
Often the feedback depends on the input as well; given a function A => B, post-composing by the gradient again gives a map A => A:
```scala
trait DiffbleFunction[A, B] extends Any {
  ...
  /**
   * post-compose by the gradient of this, for instance for a feedback.
   */
  def **:(that: A => B) = (a: A) => grad(a)(that(a))
  ...
}
```
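Again a hypothetical usage with `square` (the names `feedback` and `update` are ours):

```scala
// A feedback computed from the input itself, pulled back by the gradient.
val feedback: Double => Double = (a: Double) => 10.0 - a * a
val update: Double => Double = feedback **: square
update(2.0) // grad(2.0)(feedback(2.0)) = 4.0 * 6.0 = 24.0
```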
Given a pair of differentiable functions, we can define the direct sum of these:
```scala
trait DiffbleFunction[A, B] extends Any { self =>
  ...
  def oplus[C, D](that: DiffbleFunction[C, D]) = {
    def func(ac: (A, C)) = (this(ac._1), that(ac._2))

    def grad(ac: (A, C))(bd: (B, D)) =
      (self.grad(ac._1)(bd._1), that.grad(ac._2)(bd._2))

    DiffbleFunction(func)(grad)
  }
  ...
}
```
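The direct sum acts componentwise on both values and gradients, as in this hypothetical check with the `square` sketch above:

```scala
// Direct sum: values and gradients are computed in each component.
val pair = square oplus square
pair((2.0, 3.0))                  // (4.0, 9.0)
pair.grad((2.0, 3.0))((1.0, 1.0)) // (4.0, 6.0)
```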
Most of the differentiable functions we construct will be built using linear structures, which in turn are built from those on probability distributions. We shall describe these in other pages.