Prior to this change, there was a circular reference between Leaf and
Variable. This means that the objects (and referenced Tensors) are not
collected as soon as they go out of scope, which lead to higher memory
usage and out-of-memory errors.
* _forward is renamed forward since users should override it
* some __call__ overrides are changed to forward
* function which return a single variable are changed to return that
variable instead of a one-element tuple