
Differentiable Functions


Obsolete documentation

The code explained below is no longer used (at least not much).

Learning system combinators.

A crucial ingredient for building learning systems is what we call differentiable functions. We use the term differentiable function for a function with a gradient (more precisely, an adjoint derivative), i.e., an abstraction of a differentiable function on a vector space with an inner product.

Differentiable functions

If A and B are Euclidean spaces (i.e., $R^n$ and $R^m$), a differentiable function $f: A \to B$ has a derivative at a point $a$ in $A$, which is a linear transformation $Df(a): A \to B$. In the case where $B = R$, this linear transformation is the inner product with a vector, the gradient $grad(f)(a)$. In general, we can define the gradient as the adjoint linear transformation, i.e., the one given by the transpose of the matrix of $Df(a)$.
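
For example, if $f: R^2 \to R$ is given by $f(x, y) = x^2 + y$, then at a point $a = (x, y)$ the derivative $Df(a)$ is the $1 \times 2$ matrix $(2x \;\; 1)$, and the gradient is given by its transpose, so $grad(f)(a)(b') = (2x b', b')$ for a feedback $b'$ in $R$.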

Thus, we can define the gradient as a function.

$$grad: A \to (B \to A)$$

In Scala code, we define differentiable functions as a trait bundling a function with its gradient:

trait DiffbleFunction[A, B]{self =>
    def apply(a: A): B

    def grad(a: A) : B => A

...
}

The companion object includes a method for constructing differentiable functions.

object DiffbleFunction{
  def apply[A, B](f: => A => B)(grd: => A => (B => A)) = new DiffbleFunction[A, B]{
    def apply(a: A) = f(a)

    def grad(a: A) = grd(a)
  }
...
}
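
As a quick illustration (a minimal sketch, not part of the original code, with square an illustrative name), a squaring function on Double, together with its adjoint derivative, could be built and used as follows:

val square = DiffbleFunction((x: Double) => x * x)(
  (x: Double) => (b: Double) => 2 * x * b) // adjoint derivative: multiply the feedback by 2x

square(3.0)           // 9.0
square.grad(3.0)(1.0) // 6.0, the derivative at 3.0 applied to the feedback 1.0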

Feedback and differentiable functions

The gradient plays a crucial role in learning by back-propagation. Namely, given $f: A \to B$, we try to learn the best parameter $a$ in $A$ from feedback on $b = f(a)$, i.e., its deviation from the ideal value of $b$. If the feedback is $b'$ in $B$, we get a corresponding feedback on $A$ as $a' = grad(f)(a)(b')$. We learn by flowing in the direction of $a'$.
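
For instance (a minimal sketch, assuming A = B = Double; learnStep and the step size epsilon are illustrative names not in the original code), one learning step could look like:

def learnStep(f: DiffbleFunction[Double, Double])(a: Double, ideal: Double, epsilon: Double): Double = {
  val feedback = ideal - f(a)     // deviation of f(a) from the ideal value in B
  val shift = f.grad(a)(feedback) // back-propagated feedback in A
  a + epsilon * shift             // flow in the direction of the feedback
}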

Composing differentiable functions

The composition of differentiable functions is again a differentiable function, with gradient given by the chain rule together with taking adjoints (which reverses the order in which the gradients are applied). In code:

trait DiffbleFunction[A, B] extends Any{self =>
    def apply(a: A): B

    def grad(a: A) : B => A

    /**
     * Composition f *: g is f(g(_))
     */
    def *:[C](that: => DiffbleFunction[B, C]) = andthen(that)

    def andthen[C](that: => DiffbleFunction[B, C]): DiffbleFunction[A, C] =
      DiffbleFunction((a: A) => that(this(a)))(
        (a: A) => (c: C) => grad(a)(that.grad(this(a))(c)))

    ...
}

Remark: This is a case where making the correct definition is helped greatly by type safety.
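
For example (a sketch using the hypothetical square function above, together with a similarly hypothetical double function):

val double = DiffbleFunction((x: Double) => 2 * x)(
  (x: Double) => (b: Double) => 2 * b)

val composed = double andthen square // a => (2 * a) * (2 * a)

composed(3.0)           // 36.0
composed.grad(3.0)(1.0) // 24.0 = double.grad(3.0)(square.grad(6.0)(1.0)), the chain rule with adjoints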

Conjugation: back-propagation of feedback

Given a function $g: B \to B$, which typically gives feedback on $B$, and a differentiable function $f: A \to B$, we can back-propagate the feedback to give a feedback $A \to A$ by conjugation, namely $a \mapsto grad(f)(a)(g(f(a)))$. In code:

trait DiffbleFunction[A, B] extends Any{
  ...

  /**
  * Conjugate that by this.
  */
  def ^:(that: B => B) = (a : A) => grad(a)(that(apply(a)))

  ...
}
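
For example (a sketch with a hypothetical feedback pushing the output towards 1.0, again using the square function from above):

val pushToOne: Double => Double = (b: Double) => 1.0 - b // feedback on B: deviation from 1.0

val feedbackOnA = pushToOne ^: square // a feedback A => A; ^: is right-associative

feedbackOnA(3.0) // square.grad(3.0)(1.0 - 9.0) = 6.0 * (-8.0) = -48.0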

Often the feedback depends on $a$ in $A$, not just on $b = f(a)$. In this case, we need another method, which post-composes a feedback $A \to B$ (possibly depending on $b = f(a)$) with the gradient.

trait DiffbleFunction[A, B] extends Any{
  ...

  /**
   * post-compose by the gradient of this, for instance for a feedback.
  */
  def **:(that: A => B) = (a : A) => grad(a)(that(a))  

  ...
}
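
For example (a sketch with a hypothetical feedback depending on the point a itself, using the square function from above):

val fb: Double => Double = (a: Double) => a - square(a) // feedback in B depending on a

val shifted = fb **: square // post-composed with the gradient; **: is right-associative

shifted(2.0) // square.grad(2.0)(2.0 - 4.0) = 4.0 * (-2.0) = -8.0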

Direct sums

Given a pair of differentiable functions, we can define their direct sum:

trait DiffbleFunction[A, B] extends Any{self =>
  ...

  def oplus[C, D](that: DiffbleFunction[C, D]) = {
    def func(ac: (A, C)) = (this(ac._1), that(ac._2))

    def grad(ac: (A, C))(bd: (B, D)) = (self.grad(ac._1)(bd._1), that.grad(ac._2)(bd._2))

    DiffbleFunction(func)(grad)
  }

  ...
}
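
For example (a sketch using the hypothetical square and double functions from above):

val pair = square oplus double // DiffbleFunction[(Double, Double), (Double, Double)]

pair((3.0, 5.0))                  // (9.0, 10.0)
pair.grad((3.0, 5.0))((1.0, 1.0)) // (6.0, 2.0): each gradient acts on its own component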

Most of the differentiable functions we construct will be built using linear structures, which in turn are built from those on probability distributions. We shall describe these in other pages.