When I run code like:

with tf.device('/GPU:0'):
  images = tf.random_crop(images, [IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS])
...

it reports:

Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.

It looks like tf.random_crop(), or one of the ops it is built from, doesn't have a CUDA kernel implementation, so I needed to write it myself. The solution is surprisingly simple: write a function that randomly crops a single image using tf.random_uniform() and tf.slice(), then use tf.map_fn() to apply it to every image in the batch.

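import tensorflow as tf  # TensorFlow 1.x API

def my_random_crop(value, size):
    # Randomly crop a single image tensor `value` to `size`.
    shape = tf.shape(value)
    size = tf.convert_to_tensor(size, dtype=tf.int32)
    # Number of valid start positions along each dimension.
    limit = shape - size + 1
    # Draw one random int32 per dimension and wrap it into [0, limit).
    offset = tf.random_uniform(tf.shape(shape), dtype=size.dtype, maxval=size.dtype.max) % limit
    return tf.slice(value, offset, size)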
...
images = tf.map_fn(lambda img: my_random_crop(img, [IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS]), images)

With this change, the crop runs on the GPU.
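The offset arithmetic in my_random_crop() can be checked without TensorFlow. The following is a minimal plain-Python sketch of the same `random % limit` idea, not the TensorFlow implementation: the helper name crop_offsets, the 32x32x3 input shape, and the explicit size check are assumptions added for illustration.

```python
import random

def crop_offsets(shape, size, rng=random):
    """Pick one random start index per dimension, mirroring `rand % limit`."""
    # limit[i] is the number of valid start positions along dimension i.
    limit = [s - c + 1 for s, c in zip(shape, size)]
    # This check is an addition of the sketch; my_random_crop() has no such guard.
    if any(l <= 0 for l in limit):
        raise ValueError("crop size must not exceed the input shape")
    int32_max = 2**31 - 1  # plays the role of size.dtype.max
    # A large random integer modulo limit[i] always lands in [0, limit[i]).
    return [rng.randrange(int32_max) % l for l in limit]

# Hypothetical 32x32 RGB image cropped to 24x24x3.
shape, size = [32, 32, 3], [24, 24, 3]
offset = crop_offsets(shape, size)
```

Because the channel dimension is cropped to its full size, its limit is 1 and its offset is always 0, so only the height and width offsets actually vary.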