In TensorFlow, we can use an Optimizer to train a model:

train_op = optimizer.minimize(loss, global_step=global_step)
....
sess.run([train_op], feed_dict={x: batch, y_: batch})
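
In TensorFlow 1.x, minimize() simply combines a compute_gradients() call with an apply_gradients() call. That split can be pictured with a plain-Python sketch. The helpers and the toy quadratic loss below are hypothetical stand-ins for illustration, not TensorFlow's implementation:

```python
# Toy stand-in for an optimizer's two-step update (hypothetical helpers,
# not TensorFlow code). Loss: (w_first - 3)^2 + (w_second + 1)^2.

def compute_gradients(params, var_list):
    """Return (gradient, name) pairs for the parameters named in var_list."""
    targets = {"w_first": 3.0, "w_second": -1.0}
    return [(2.0 * (params[name] - targets[name]), name) for name in var_list]


def apply_gradients(params, grads_and_vars, learning_rate=0.1):
    """Apply one plain SGD step to the parameters listed in grads_and_vars."""
    for grad, name in grads_and_vars:
        params[name] -= learning_rate * grad


params = {"w_first": 0.0, "w_second": 0.0}

# Step 1: compute gradients for the first part only.
first_grads_and_vars = compute_gradients(params, var_list=["w_first"])
# Step 2: apply them; w_second is not in the list, so it stays untouched.
apply_gradients(params, first_grads_and_vars)
```

After this single step only w_first has moved (to about 0.6), while w_second is still 0.0. Restricting the variable list in this way is what makes it possible to update one part of a model without touching the other.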


But sometimes the model needs to be split into two parts that are trained separately, so we need to compute the gradients and apply them in two steps:

first_part_vars = []
second_part_vars = []

# Sort the trainable variables into the two parts by their name scope.
for var in tf.trainable_variables():
    if "first_part/" in var.name:
        first_part_vars.append(var)
    if "second_part/" in var.name:
        second_part_vars.append(var)
....

....


Then how can we pass the gradients from the first part to the second part? The chain rule gives the answer:

$latex \frac{\partial Loss} {\partial W_{second-part}} = \frac{\partial Loss} {\partial IV} \cdot \frac{\partial IV} {\partial W_{second-part}} &s=4$
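
The chain rule above can be checked numerically on a toy scalar model. In this sketch the second part produces the intermediate value iv and the first part turns it into the loss. The function names, weights and inputs are made up for illustration and are not part of the original model:

```python
def second_part(w_second, x):
    """Hypothetical second part: produces the intermediate value IV."""
    return w_second * x


def first_part_loss(w_first, iv, y):
    """Hypothetical first part: consumes IV and returns a squared-error loss."""
    return (w_first * iv - y) ** 2


x, y = 2.0, 1.0
w_first, w_second = 0.5, 1.5

iv = second_part(w_second, x)

# dLoss/dIV, computed using only the first part.
dloss_div = 2.0 * (w_first * iv - y) * w_first
# dIV/dW_second, computed using only the second part.
div_dw = x
# Chain rule: dLoss/dW_second = dLoss/dIV * dIV/dW_second.
chained = dloss_div * div_dw


def full_loss(w):
    """End-to-end loss as a function of the second part's weight."""
    return first_part_loss(w_first, second_part(w, x), y)


# Central finite difference on the end-to-end model, as a sanity check.
eps = 1e-6
numeric = (full_loss(w_second + eps) - full_loss(w_second - eps)) / (2 * eps)
```

Here chained and numeric agree up to floating-point error, so the gradient of the second part's weights can be obtained from the gradient at iv alone, without differentiating through the first part again.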

Here $latex IV &s=1$ stands for ‘intermediate vector’: the interface vector between the first part and the second part, which belongs to both of them. $latex W_{second-part} &s=1$ denotes the weights of the second part of the model. Therefore we can use tf.gradients() to connect the gradients of the two parts:

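# list.extend() returns None, so build the combined list with + instead.
# This assumes iv is a single tensor.
first_grads_and_vars = opt.compute_gradients(loss, var_list=first_part_vars + [iv])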