
Differentiable Functions


Obsolete documentation

The code explained below is no longer used (at least not much).

Learning system combinators.

A crucial ingredient for building learning systems is what we call differentiable functions. We use the term differentiable function for a function with a gradient (more precisely, an adjoint derivative), i.e., an abstraction of a differentiable function on a vector space with an inner product.

Differentiable functions

If A and B are Euclidean spaces (i.e., $R^n$ and $R^m$), a differentiable function $f: A \to B$ has a derivative at a point $a$ in $A$, which is a linear transformation $Df(a): A \to B$. In the case where $B = R$, this linear transformation is the inner product with a vector, the gradient $grad(f)(a)$. In general, we can define the gradient as the adjoint linear transformation, i.e., the one given by the transpose of the matrix of $Df(a)$.
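
For example, if $f: R^2 \to R$ is given by $f(x, y) = x^2 + y$, then at a point $a = (x, y)$ the derivative $Df(a)$ is the $1 \times 2$ matrix $(2x \;\; 1)$, and the gradient is given by its transpose, so $grad(f)(a)(b') = (2x b', b')$ for a feedback $b'$ in $R$.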

Thus, we can define the gradient as a function.

$$grad: A \to (B \to A)$$

In Scala code, we define differentiable functions as a trait bundling a function with its gradient:

trait DiffbleFunction[A, B]{self =>
    def apply(a: A): B

    def grad(a: A) : B => A

...
}

The companion object includes a method for constructing differentiable functions.

object DiffbleFunction{
  def apply[A, B](f: => A => B)(grd: => A => (B => A)) = new DiffbleFunction[A, B]{
    def apply(a: A) = f(a)

    def grad(a: A) = grd(a)
  }
...
}
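
As a quick illustration (a minimal sketch, not part of the original code, with square an illustrative name), a squaring function on Double, together with its adjoint derivative, could be built and used as follows:

val square = DiffbleFunction((x: Double) => x * x)(
  (x: Double) => (b: Double) => 2 * x * b) // adjoint derivative: multiply the feedback by 2x

square(3.0)           // 9.0
square.grad(3.0)(1.0) // 6.0, the derivative at 3.0 applied to the feedback 1.0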

Feedback and differentiable functions

The gradient plays a crucial role in learning by back-propagation. Namely, given $f: A \to B$, we try to learn the best parameter $a$ in $A$ from feedback on $b = f(a)$, i.e., its deviation from the ideal value of $b$. If the feedback is $b'$ in $B$, we get a corresponding feedback on $A$ as $a' = grad(f)(a)(b')$. We learn by flowing in the direction of $a'$.
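
For instance (a minimal sketch, assuming A = B = Double; learnStep and the step size epsilon are illustrative names not in the original code), one learning step could look like:

def learnStep(f: DiffbleFunction[Double, Double])(a: Double, ideal: Double, epsilon: Double): Double = {
  val feedback = ideal - f(a)     // deviation of f(a) from the ideal value in B
  val shift = f.grad(a)(feedback) // back-propagated feedback in A
  a + epsilon * shift             // flow in the direction of the feedback
}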

Composing differentiable functions

The composition of differentiable functions is again a differentiable function, with gradient given by the chain rule together with taking adjoints (which reverses the order in which the gradients are applied). In code:

trait DiffbleFunction[A, B] extends Any{self =>
    def apply(a: A): B

    def grad(a: A) : B => A

    /**
     * Composition f *: g is f(g(_))
     */
    def *:[C](that: => DiffbleFunction[B, C]) = andthen(that)

    def andthen[C](that: => DiffbleFunction[B, C]): DiffbleFunction[A, C] =
      DiffbleFunction((a: A) => that(this(a)))(
        (a: A) => (c: C) => grad(a)(that.grad(this(a))(c)))

    ...
}

Remark: This is a case where making the correct definition is helped greatly by type safety.
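
For example (a sketch using the hypothetical square function above, together with a similarly hypothetical double function):

val double = DiffbleFunction((x: Double) => 2 * x)(
  (x: Double) => (b: Double) => 2 * b)

val composed = double andthen square // a => (2 * a) * (2 * a)

composed(3.0)           // 36.0
composed.grad(3.0)(1.0) // 24.0 = double.grad(3.0)(square.grad(6.0)(1.0)), the chain rule with adjoints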

Conjugation: back-propagation of feedback

Given a function $g: B \to B$, which typically gives feedback on $B$, and a differentiable function $f: A \to B$, we can back-propagate the feedback to give a feedback $A \to A$ by conjugation, namely $a \mapsto grad(f)(a)(g(f(a)))$. In code:

trait DiffbleFunction[A, B] extends Any{
  ...

  /**
  * Conjugate that by this.
  */
  def ^:(that: B => B) = (a : A) => grad(a)(that(apply(a)))

  ...
}
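
For example (a sketch with a hypothetical feedback pushing the output towards 1.0, again using the square function from above):

val pushToOne: Double => Double = (b: Double) => 1.0 - b // feedback on B: deviation from 1.0

val feedbackOnA = pushToOne ^: square // a feedback A => A; ^: is right-associative

feedbackOnA(3.0) // square.grad(3.0)(1.0 - 9.0) = 6.0 * (-8.0) = -48.0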

Often the feedback depends on $a$ in $A$, not just on $b = f(a)$. In this case, we need another method, which post-composes a feedback $A \to B$ (possibly depending on $b = f(a)$) with the gradient.

trait DiffbleFunction[A, B] extends Any{
  ...

  /**
   * post-compose by the gradient of this, for instance for a feedback.
  */
  def **:(that: A => B) = (a : A) => grad(a)(that(a))  

  ...
}
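
For example (a sketch with a hypothetical feedback depending on the point a itself, using the square function from above):

val fb: Double => Double = (a: Double) => a - square(a) // feedback in B depending on a

val shifted = fb **: square // post-composed with the gradient; **: is right-associative

shifted(2.0) // square.grad(2.0)(2.0 - 4.0) = 4.0 * (-2.0) = -8.0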

Direct sums

Given a pair of differentiable functions, we can define their direct sum:

trait DiffbleFunction[A, B] extends Any{self =>
  ...

  def oplus[C, D](that: DiffbleFunction[C, D]) = {
    def func(ac: (A, C)) = (this(ac._1), that(ac._2))

    def grad(ac: (A, C))(bd: (B, D)) = (self.grad(ac._1)(bd._1), that.grad(ac._2)(bd._2))

    DiffbleFunction(func)(grad)
  }

  ...
}
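
For example (a sketch using the hypothetical square and double functions from above):

val pair = square oplus double // DiffbleFunction[(Double, Double), (Double, Double)]

pair((3.0, 5.0))                  // (9.0, 10.0)
pair.grad((3.0, 5.0))((1.0, 1.0)) // (6.0, 2.0): each gradient acts on its own component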

Most of the differentiable functions we construct will be built using linear structures, which in turn are built from those on probability distributions. We shall describe these in other pages.