Eager execution is a feature that makes TensorFlow execute operations immediately: concrete values are returned, instead of a computational graph to be executed later.
Getting Started
Eager execution is a NumPy-like library for numerical computation with support for GPU acceleration and automatic differentiation, and a flexible platform for machine learning research and experimentation. It's available as tf.contrib.eager, starting with version 1.5 of TensorFlow.
Eager execution is still an experimental feature: not all TensorFlow APIs currently work with it enabled, and some models may be slower to execute than the same models defined without it.
With TensorFlow installed, eager execution is enabled via a single call:

    import tensorflow as tf  # version >= 1.5
    import tensorflow.contrib.eager as tfe

    tfe.enable_eager_execution()
Enabling eager execution changes how TensorFlow functions behave (in particular, Tensor objects will reference concrete values instead of being symbolic handles to nodes in a computational graph). As a result, eager execution should be enabled at the beginning of a program and cannot be disabled afterwards in the same program.
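For instance, with eager execution enabled an operation runs and produces a value as soon as it is called. A minimal sketch (the exact printed representation depends on the TensorFlow version):

    a = tf.constant(2)
    b = tf.constant(3)
    print(a + b)  # Prints a concrete tf.Tensor with value 5 immediately; no Session needed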
Eager execution simplifies your code
You no longer need to worry about:
- placeholders
- sessions
- control dependencies
- “lazy loading”
- {name, variable, op} scopes
Boilerplate
    x = tf.placeholder(tf.float32, shape=[1, 1])
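Only the first line of the original snippet survives above; a rough sketch of the contrast it was presumably drawing, using the standard matmul example (illustrative, not the original code; the graph-mode half cannot actually run once eager execution is enabled, so it is shown in comments):

    # In graph mode, even a single matmul needs a placeholder, a Session, and a feed_dict:
    #
    #     x = tf.placeholder(tf.float32, shape=[1, 1])
    #     m = tf.matmul(x, x)
    #     with tf.Session() as sess:
    #         print(sess.run(m, feed_dict={x: [[2.]]}))
    #
    # With eager execution enabled, the same computation is just:
    x = tf.constant([[2.]])
    m = tf.matmul(x, x)
    print(m)  # A concrete tf.Tensor holding [[4.]]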
Lazy Loading
    x = tf.random_uniform([2, 2])
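Only the first line survives above; a sketch of what this section was presumably illustrating, continuing from that line. In graph mode, each x[i, j] access inside a loop would quietly add new nodes to the graph ("lazy loading"); with eager execution, each access simply yields a value:

    for i in range(2):      # x defined above has shape [2, 2]
        for j in range(2):
            print(x[i, j])  # Prints a concrete value; nothing is added to any graph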
Tensors Act Like NumPy Arrays
    x = tf.constant([1.0, 2.0, 3.0])
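A short sketch of the interoperability this heading refers to, continuing from the x defined above (the specific assertions are illustrative):

    import numpy as np

    assert isinstance(x.numpy(), np.ndarray)  # .numpy() exposes the backing array
    squared = np.square(x)                    # NumPy functions accept eager Tensors
    print(x + squared)                        # and the results mix freely with TF ops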
Gradients
TensorFlow eager execution provides an autograd-style API for automatic differentiation.
- tfe.gradients_function()
- tfe.value_and_gradients_function()
- tfe.implicit_gradients()
- tfe.implicit_value_and_gradients()
tfe.gradients_function()
Returns a Python function that computes the derivatives of the Python function f with respect to its arguments. f must return a scalar value. When the returned function is invoked, it returns a list of Tensor objects (one element for each argument of f).
    def f(x, y):
        return x ** 2 + y ** 2

    g = tfe.gradients_function(f)
    g(2., 3.)  # [<tf.Tensor: id=36, shape=(), dtype=float32, numpy=4.0>, <tf.Tensor: id=17, shape=(), dtype=float32, numpy=6.0>]

    def f(x):
        return tf.multiply(x, x)  # Or x * x

    assert 9 == f(3.).numpy()

    df = tfe.gradients_function(f)
    assert 6 == df(3.)[0].numpy()

    # Second order derivative.
    d2f = tfe.gradients_function(lambda x: df(x)[0])
    assert 2 == d2f(3.)[0].numpy()

    # Third order derivative.
    d3f = tfe.gradients_function(lambda x: d2f(x)[0])
    assert 0 == d3f(3.)[0].numpy()
tfe.value_and_gradients_function()
Similar to tfe.gradients_function, except that when the returned function is invoked, it returns the value of f in addition to the list of derivatives of f with respect to its arguments.
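A minimal sketch of how it can be used, reusing the earlier x ** 2 + y ** 2 example (the printed values are what that function implies, not output from the original article):

    def f(x, y):
        return x ** 2 + y ** 2

    vg = tfe.value_and_gradients_function(f)
    value, grads = vg(2., 3.)
    print(value.numpy())               # 13.0
    print([g.numpy() for g in grads])  # [4.0, 6.0]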
tfe.implicit_gradients()
tfe.gradients_function differentiates a function with respect to its input arguments, but in practice we usually want derivatives with respect to TensorFlow variables (Variable), since variables hold the model parameters, which are what gradient descent actually optimizes. tfe.implicit_gradients differentiates with respect to all the variables used during the computation.
    vx = tfe.Variable(initial_value=1.0, name="vx")
We define two variables, vx and vy, but only vx is used in the computation of f, so the derivative is taken only with respect to vx; the corresponding derivative is 2 * x, which is what g computes. g returns a list holding every computed gradient together with its variable as (gradient, variable) pairs; here the result should be [(4, 1)].
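Only the first line of that snippet survives above; a sketch reconstructing it, where the body of f and vy's initial value are assumptions chosen to match the described derivative 2 * x and the result [(4, 1)]:

    vx = tfe.Variable(initial_value=1.0, name="vx")
    vy = tfe.Variable(initial_value=2.0, name="vy")

    def f(x):
        # Only vx participates in the computation, so only vx receives a gradient.
        return 2 * vx * x

    g = tfe.implicit_gradients(f)
    print(g(2.))  # [(df/dvx = 2 * x = 4.0, vx)]; vy does not appear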
For example, a linear regression model can be written as a class and trained with tfe.implicit_gradients and an optimizer:
    class Model(object):
        def __init__(self):
            self.W = tfe.Variable(5., name='weight')
            self.B = tfe.Variable(10., name='bias')

        def predict(self, inputs):
            return inputs * self.W + self.B

    # The loss function to be optimized
    def loss(model, inputs, targets):
        error = model.predict(inputs) - targets
        return tf.reduce_mean(tf.square(error))

    # A toy dataset of points around 3 * x + 2
    NUM_EXAMPLES = 1000
    training_inputs = tf.random_normal([NUM_EXAMPLES])
    noise = tf.random_normal([NUM_EXAMPLES])
    training_outputs = training_inputs * 3 + 2 + noise

    # Define:
    # 1. A model
    # 2. Derivatives of a loss function with respect to model parameters
    # 3. A strategy for updating the variables based on the derivatives
    model = Model()
    grad = tfe.implicit_gradients(loss)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)

    # The training loop
    print("Initial loss: %f" % loss(model, training_inputs, training_outputs).numpy())
    for i in range(201):
        optimizer.apply_gradients(grad(model, training_inputs, training_outputs))
        if i % 20 == 0:
            print("Loss at step %d: %f" %
                  (i, loss(model, training_inputs, training_outputs).numpy()))
    print("Final loss: %f" % loss(model, training_inputs, training_outputs).numpy())
    print("W, B = %s, %s" % (model.W.numpy(), model.B.numpy()))