Tensorflow_note_3

TensorFlow notes based on the CS20SI course; see the course homepage.
Main topics of this section: Linear and Logistic Regression

Linear Regression

Model the relationship between a scalar dependent variable y and independent variables X.

Data

  • X : number of incidents of fire in The City of Chicago
  • Y : number of incidents of theft in The City of Chicago

Data file in https://github.com/chiphuyen/stanford-tensorflow-tutorials/blob/master/data/fire_theft.xls

Code

# -*- coding:utf-8 -*-
# Created by Helic on 17-9-8

import tensorflow as tf
import numpy
import matplotlib.pyplot as plt
import xlrd

data_location = 'data/fire_theft.xls'

# Step 1: read in data from the .xls file
book = xlrd.open_workbook(data_location, encoding_override="utf-8")
sheet = book.sheet_by_index(0)
data = numpy.asarray([sheet.row_values(i) for i in range(1, sheet.nrows)])
n_samples = sheet.nrows - 1

# Step 2: create placeholders for input X (number of fires) and label Y (number of thefts)
X = tf.placeholder(tf.float32, name="X")
Y = tf.placeholder(tf.float32, name="Y")

# Step 3: create weight and bias, initialized to 0
w = tf.Variable(0.0, name="weights")
b = tf.Variable(0.0, name="bias")

# Step 4: construct the model to predict Y (number of thefts) from the number of fires
Y_predicted = X * w + b

# Step 5: use the squared error as the loss function
loss = tf.square(Y - Y_predicted, name="loss")

# Step 6: use gradient descent with a learning rate of 0.001 to minimize loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

with tf.Session() as sess:
    # Step 7: initialize the necessary variables, in this case, w and b
    sess.run(tf.global_variables_initializer())
    # Step 8: train the model
    for i in range(100):  # run 100 epochs
        total_loss = 0
        for x, y in data:
            # Session runs the optimizer to minimize loss and fetches the loss value
            _, l = sess.run([optimizer, loss], feed_dict={X: x, Y: y})
            total_loss += l
        print('Epoch {0}: {1}'.format(i, total_loss / n_samples))

    # Step 9: output the values of w and b
    w_value, b_value = sess.run([w, b])
    print(w_value, b_value)

# plot the results
X_data, Y_data = data.T[0], data.T[1]  # transpose: first column is fires, second is thefts
plt.plot(X_data, Y_data, 'bo', label='Real data')
plt.plot(X_data, X_data * w_value + b_value, 'r', label='Predicted data')
plt.legend()
plt.show()

Figure: the computation graph visualized in TensorBoard.
Figure: real vs. predicted data plotted with Matplotlib.
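
The TensorBoard view above is produced by writing the graph from inside the session. A minimal sketch of that pattern (the './graphs' log directory is an assumption, not from the original notes):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # write the graph definition so TensorBoard can render it
    writer = tf.summary.FileWriter('./graphs', sess.graph)
    # ... training loop as in Step 8 above ...
    writer.close()

Then launch TensorBoard with tensorboard --logdir=./graphs and open the Graphs tab.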

Analyze the Code

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss) # learning rate can be a tensor
sess.run(optimizer, feed_dict={X: x, Y:y})
  1. Why is train_op in the fetches list of tf.Session.run()?
    We can pass any TensorFlow op as a fetch in tf.Session.run(), and TensorFlow will execute the part of the graph that the op depends on. In this case, the train op
    minimizes loss, and loss depends on the variables w and b.
  2. How does TensorFlow know what variables to update?
    The session looks at all trainable variables that the optimizer's objective depends on and updates them (see the sketch below).
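
A quick sketch of inspecting that set of variables, which is simply the graph's trainable-variables collection:

# every variable created with trainable=True lands in this collection;
# these are the variables the optimizer's minimize() will update
for var in tf.trainable_variables():
    print(var.name)  # for the model above: weights:0 and bias:0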

Optimizers

By default, the optimizer (here, gradient descent) trains all the trainable variables that its objective function depends on. If there are variables you do not want to train, set the keyword argument trainable to False when you declare them.

tf.Variable(initial_value=None, trainable=True, collections=None,
            validate_shape=True, caching_device=None, name=None, variable_def=None,
            dtype=None, expected_shape=None, import_scope=None)

One example of a variable you don't want to train is global_step, a common variable you will see in many TensorFlow models that keeps track of how many times the model has been run (a sketch follows the snippet below). You can also ask your optimizer to take gradients with respect to specific variables, and you can modify the gradients it computes before applying them:

# create an optimizer
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# compute the gradients for a list of variables
grads_and_vars = optimizer.compute_gradients(loss, <list of variables>)
# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example, subtract 1.0 from each of them.
subtracted_grads_and_vars = [(gv[0] - 1.0, gv[1]) for gv in grads_and_vars]
# ask the optimizer to apply the subtracted gradients
train_op = optimizer.apply_gradients(subtracted_grads_and_vars)
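
Coming back to global_step mentioned above, a minimal sketch of declaring it as a non-trainable variable and letting the optimizer increment it (the learning rate here is only illustrative):

# excluded from training because trainable=False
global_step = tf.Variable(0, trainable=False, name='global_step')
# passing global_step to minimize() makes the optimizer increment it on every update step
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(
    loss, global_step=global_step)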

The optimizer classes automatically compute derivatives on your graph, but creators of new optimizers or expert users can call the lower-level function below:

tf.gradients(ys, xs, grad_ys=None, name='gradients',
             colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None)

This method constructs symbolic partial derivatives of the sum of ys with respect to each x in xs. ys and xs are each a Tensor or a list of Tensors. grad_ys is a list of Tensors holding the gradients received by the ys; the list must be the same length as ys.
_Technical detail: This is especially useful when training only part of a model. For example, we can use tf.gradients() to take the derivative G of the loss w.r.t. the middle layer output M, then use an optimizer to minimize the difference between M and M + G. This only updates the lower half of the network (a sketch follows)._
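
A minimal sketch of that technique, assuming a toy two-layer network built with tf.layers.dense (the layer sizes and placeholders are illustrative, not from the original notes):

import tensorflow as tf

X = tf.placeholder(tf.float32, shape=[None, 4], name='X')
Y = tf.placeholder(tf.float32, shape=[None, 1], name='Y')

hidden = tf.layers.dense(X, 8, activation=tf.nn.relu, name='lower')  # lower half, output M
output = tf.layers.dense(hidden, 1, name='upper')                    # upper half
loss = tf.reduce_mean(tf.square(Y - output))

# G: gradient of the loss w.r.t. the middle-layer output M
G = tf.gradients(loss, hidden)[0]
# minimize the difference between M and the fixed target M + G
target = tf.stop_gradient(hidden + G)
middle_loss = tf.reduce_mean(tf.square(target - hidden))
# restrict the update to the variables of the lower half only
lower_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='lower')
train_lower = tf.train.GradientDescentOptimizer(0.01).minimize(middle_loss, var_list=lower_vars)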

List of optimizers

  • tf.train.GradientDescentOptimizer
  • tf.train.AdagradOptimizer
  • tf.train.MomentumOptimizer
  • tf.train.AdamOptimizer
  • tf.train.ProximalGradientDescentOptimizer
  • tf.train.ProximalAdagradOptimizer
  • tf.train.RMSPropOptimizer
  • And more
    _More details in https://www.tensorflow.org/api_docs/python/train/_

    “RMSprop is an extension of Adagrad that deals with its radically diminishing learning rates. It is identical to Adadelta, except that Adadelta uses the RMS of parameter updates in the numerator update rule. Adam, finally, adds bias-correction and momentum to RMSprop. Insofar, RMSprop, Adadelta, and Adam are very similar algorithms that do well in similar circumstances. Kingma et al. [15] show that its bias-correction helps Adam slightly outperform RMSprop towards the end of optimization as gradients become sparser. Insofar, Adam might be the best overall choice.” from Sebastian Ruder’s Blog
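
Following the quote, swapping Adam in for plain gradient descent in the regression code above changes only one line (the learning rate is illustrative):

# Adam adds bias-correction and momentum on top of RMSprop-style adaptive learning rates
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)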

Huber loss

This loss function combines the squared loss and the absolute loss: when the prediction is close to the true value it uses the squared error, and when they differ greatly it uses the absolute error.
Figure: the Huber loss curve.
def huber_loss(labels, predictions, delta=1.0):
    residual = tf.abs(predictions - labels)
    condition = tf.less(residual, delta)
    small_res = 0.5 * tf.square(residual)
    large_res = delta * residual - 0.5 * tf.square(delta)
    # conditional op: https://www.tensorflow.org/api_docs/python/tf/cond
    return tf.cond(condition, lambda: small_res, lambda: large_res)
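
As a usage sketch (not part of the original notes), this function could replace the squared-error loss in Step 5 of the linear regression above:

# drop-in replacement for Step 5; delta=1.0 is an assumed threshold
loss = huber_loss(Y, Y_predicted, delta=1.0)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)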