TensorFlow notes based on cs20si (see the course homepage).
Main topics of this section: Linear and Logistic Regression
Linear Regression
Model the relationship between a scalar dependent variable y and independent variables X.
Data
- X : number of incidents of fire in The City of Chicago
- Y : number of incidents of theft in The City of Chicago
Data file in https://github.com/chiphuyen/stanford-tensorflow-tutorials/blob/master/data/fire_theft.xls
Code
```python
# -*- coding:utf-8 -*-
```
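Only the first line of the original listing survived; the rest is reconstructed below as a minimal sketch of the lecture's linear regression script, assuming the fire_theft.xls file linked above and the xlrd package for reading it. The learning rate matches the line analyzed below; the epoch count is illustrative.

```python
import numpy as np
import tensorflow as tf
import xlrd

DATA_FILE = "data/fire_theft.xls"  # assumed local path to the data file above

# read the data into an (n_samples, 2) array of (fires, thefts) pairs
book = xlrd.open_workbook(DATA_FILE, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = np.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

# placeholders for the input X (fires) and the label Y (thefts)
X = tf.placeholder(tf.float32, name="X")
Y = tf.placeholder(tf.float32, name="Y")

# weight and bias, both trainable, initialized to 0
w = tf.Variable(0.0, name="weights")
b = tf.Variable(0.0, name="bias")

# linear model and squared-error loss
Y_predicted = X * w + b
loss = tf.square(Y - Y_predicted, name="loss")

# gradient descent training op
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):  # number of epochs is illustrative
        total_loss = 0.0
        for x, y in data:
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
            total_loss += l
        print("Epoch {0}: {1}".format(i, total_loss / n_samples))
    w_value, b_value = sess.run([w, b])
```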
Analyze the Code
```python
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)  # the learning rate can be a tensor
```
- Why is train_op in the fetches list of tf.Session.run()?
We can pass any TensorFlow ops as fetches in tf.Session.run(); TensorFlow will execute the part of the graph that those ops depend on. In this case, train_op minimizes loss, and loss depends on the variables w and b.
- How does TensorFlow know which variables to update?
The session looks at all trainable variables that the optimizer depends on and updates them.
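A toy sketch (not from the original notes) to make this concrete: an op only runs if it, or something that depends on it, appears in the fetches list, which is exactly why train_op must be fetched for the variables to change.

```python
import tensorflow as tf

v = tf.Variable(0.0)
update = tf.assign_add(v, 1.0)   # plays the role of train_op: it mutates v
doubled = v * 2.0                # depends on v, but not on update

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(doubled)            # update is not in the dependency path, so it never runs
    print(sess.run(v))           # 0.0
    sess.run([update, doubled])  # fetching update actually executes the assignment
    print(sess.run(v))           # 1.0
```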
Optimizers
By default, the optimizer (gradient descent) trains all trainable variables that its objective function depends on. If there are variables you do not want to train, set the keyword argument trainable to False when you declare the variable.
```python
tf.Variable(initial_value=None, trainable=True, collections=None,
            validate_shape=True, caching_device=None, name=None, variable_def=None,
            dtype=None, expected_shape=None, import_scope=None)
```
One example of a variable you don’t want to train is global_step, a common variable you will see in many TensorFlow models to keep track of how many times you’ve run your model. You can also ask your optimizer to take gradients of specific variables, and you can modify the gradients it calculates:
```python
# create an optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# compute the gradients for a list of variables
grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of (gradient, variable) tuples. Do whatever you
# need to the 'gradient' part, for example, subtract 1.0 from each of them.
subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]
# ask the optimizer to apply the subtracted gradients
train_op = optimizer.apply_gradients(subtracted_grads_and_vars)
```
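For the non-trainable global_step variable mentioned above, a minimal sketch (the toy objective is an assumption for illustration): passing global_step to minimize() makes the optimizer increment it on every training step.

```python
import tensorflow as tf

# global_step counts training steps, so it must not be trained itself
global_step = tf.Variable(0, trainable=False, name="global_step")

w = tf.Variable(10.0, name="w")
loss = tf.square(w - 3.0)  # toy objective, for illustration only

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# the optimizer increments global_step by one each time train_op runs
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        sess.run(train_op)
    print(sess.run(global_step))  # 5
```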
The optimizer classes automatically compute derivatives on your graph, but creators of new optimizers or expert users can call the lower-level function below:
```python
tf.gradients(ys, xs, grad_ys=None, name='gradients',
             colocate_gradients_with_ops=False, gate_gradients=False,
             aggregation_method=None)
```
This method constructs symbolic partial derivatives of the sum of ys w.r.t. each x in xs. ys and xs are each a Tensor or a list of Tensors. grad_ys is a list of Tensors holding the gradients received by the ys; the list must be the same length as ys.
_Technical detail: This is especially useful when training only parts of a model. For example, we can use tf.gradients() to take the derivative G of the loss w.r.t. the middle layer. Then we use an optimizer to minimize the difference between the middle-layer output M and M + G. This only updates the lower half of the network._
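A minimal sketch of that technical detail on a toy two-layer network (the network, names, and shapes are all illustrative assumptions, not from the original notes):

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.placeholder(tf.float32, shape=[None, 1])

# lower half of the network: produces the middle-layer output M
w1 = tf.Variable(tf.random_normal([4, 8]), name="w1")
M = tf.nn.relu(tf.matmul(x, w1))

# upper half: prediction and loss
w2 = tf.Variable(tf.random_normal([8, 1]), name="w2")
loss = tf.reduce_mean(tf.square(tf.matmul(M, w2) - y))

# G = d(loss)/dM; tf.gradients returns one gradient per tensor in xs
G = tf.gradients(loss, M)[0]

# minimize the difference between M and (M + G); stop_gradient freezes the target
target = tf.stop_gradient(M + G)
middle_loss = tf.reduce_mean(tf.square(M - target))

# only w1 (the lower half of the network) is updated by this op
train_lower = tf.train.GradientDescentOptimizer(0.01).minimize(middle_loss, var_list=[w1])
```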
List of optimizers
- tf.train.GradientDescentOptimizer
- tf.train.AdagradOptimizer
- tf.train.MomentumOptimizer
- tf.train.AdamOptimizer
- tf.train.ProximalGradientDescentOptimizer
- tf.train.ProximalAdagradOptimizer
- tf.train.RMSPropOptimizer
- And more
_More details in https://www.tensorflow.org/api_docs/python/train/_

“RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. It is identical to Adadelta, except that Adadelta uses the RMS of parameter updates in the numerator update rule. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances. Kingma et al. [15] show that its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.” — from Sebastian Ruder’s Blog
Huber loss
The Huber loss combines squared loss and absolute loss: when the prediction is close to the true value it behaves like squared error, and when the two differ by a lot it behaves like absolute error.
```python
def huber_loss(labels, predictions, delta=1.0):
    residual = tf.abs(predictions - labels)
    condition = tf.less(residual, delta)
    small_res = 0.5 * tf.square(residual)
    large_res = delta * residual - 0.5 * tf.square(delta)
    # conditional op, see https://www.tensorflow.org/api_docs/python/tf/cond
    return tf.cond(condition, lambda: small_res, lambda: large_res)
```
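A usage sketch, assuming the scalar placeholders Y and the prediction Y_predicted from the linear regression code above: swap the squared-error loss for the Huber loss. Note that tf.cond expects a scalar predicate, so this version suits the one-sample-at-a-time setup; for batched tensors an element-wise tf.where would be used instead.

```python
# replaces: loss = tf.square(Y - Y_predicted, name="loss")
loss = huber_loss(Y, Y_predicted, delta=1.0)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)
```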