Introduction:

TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. It is the component you use to put a model to work after you have trained it, and it makes deploying machine learning models much easier.

It is written in C++ and supports both gRPC and RESTful APIs. The C++ core is used to build the server binaries and to handle saving and loading TensorFlow models, including different versions of the same model.

Tensorflow Serving Architecture:

[Figure: TensorFlow Serving architecture diagram]

Here, the Source identifies the different versions of a TensorFlow model that are to be loaded, using the file-system plugin. The Loader it creates describes what a model needs in order to be loaded, specifying the memory, CPUs, or GPUs required. The Manager then loads the model once those resources are available, and the loaded model serves clients over gRPC or the REST API.

Features:

  1. Multiple Models:- It can serve several models at once: different versions of the same model, or entirely different models, can be loaded side by side. This lets you load an experimental model into serving while a production model keeps running on the same server.
  2. Isolation:- Each model is served on its own threads, so it can use the resources it needs without interfering with the resources used by other models.
  3. Low Latency:- It provides high throughput through dynamic request batching and handles asynchronous requests efficiently.

Protobuf:

Protobuf (Protocol Buffers) is a way of serializing structured data in an efficient binary format. Google uses Protocol Buffers for almost all of its internal RPC protocols and file formats, and that includes the models served by TensorFlow Serving.
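
Since Protocol Buffers is just a serialization format, the exported saved_model.pb can be parsed directly with the generated Python classes. Here is a minimal sketch, assuming a model has already been exported to ../../data/1 as described later in this post:

from tensorflow.core.protobuf import saved_model_pb2

# Parse the SavedModel protobuf directly (the path is an assumption from the export step below)
saved_model = saved_model_pb2.SavedModel()
with open('../../data/1/saved_model.pb', 'rb') as f:
    saved_model.ParseFromString(f.read())

# Each meta graph records its tags (e.g. 'serve') and its signature definitions
for meta_graph in saved_model.meta_graphs:
    print(meta_graph.meta_info_def.tags, list(meta_graph.signature_def.keys()))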

Getting Started:

Now you should have a working TensorFlow Serving environment.

Here we assume that we already have a trained model and its saved checkpoint files.

Export model into protobuf format:-

We briefly discussed the protobuf format above. The frozen TensorFlow model is saved in this format, which we also refer to as a SavedModel, and we can create it from a previously saved TensorFlow checkpoint. A checkpoint carries the model's information, such as weights, biases, and model structure, which is also recorded in the protobuf. The difference is that a checkpoint depends on the code that created the model, whereas a SavedModel is independent of it. So checkpoints are mainly saved as intermediate files while the model is training, and a SavedModel is used to export the model to a different environment such as TensorFlow Serving.
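
For reference, this is roughly how such a checkpoint gets written during training with tf.train.Saver. A minimal sketch: the graph and training loop are assumed to be defined elsewhere, and the directory matches the checkpoint_dir used in the export script below.

import tensorflow as tf

# Assumes the model graph (variables, ops) has already been built in this graph
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps would run here ...
    saver.save(sess, '../../data/cnn_model/model.ckpt', global_step=1000)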

We can export from both TensorFlow checkpoint files and Keras checkpoints; Keras saves its checkpoint files in HDF5 format.
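
For example, a Keras model ends up in a single .h5 file. The model below is a hypothetical stand-in, just to show the call:

from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Dense

# Hypothetical tiny model; any compiled tf.keras model works the same way
model = Sequential([Dense(10, input_shape=(100,), activation='softmax')])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.save('../../data/keras_model/cnn_model.h5')  # architecture + weights in one HDF5 file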

First, let's look at the code that saves a TensorFlow model in protobuf format:

import os
import sys
import shutil
import tensorflow as tf

sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(__file__))))

from cnn_model import conv_bias, conv_weight, conv_model

# Command line arguments
tf.app.flags.DEFINE_string('checkpoint_dir', '../../data/cnn_model',
                          """Directory where to read training checkpoints.""")
tf.app.flags.DEFINE_string('output_dir', '../../data',
                          """Directory where to export the model.""")
tf.app.flags.DEFINE_integer('model_version', 1,
                           """Version number of the model.""")
FLAGS = tf.app.flags.FLAGS

def preprocess_image(image_buffer):
    image = tf.image.decode_jpeg(image_buffer, channels=3)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image

def main(_):
    with tf.Graph().as_default():
     
       serialized_tf_example = tf.placeholder(tf.string, name='input_image')
       feature_configs = {
           'image/encoded': tf.FixedLenFeature(
               shape=[], dtype=tf.string),
       }

       tf_example = tf.parse_example(serialized_tf_example, feature_configs)
       jpegs = tf_example['image/encoded']
       images = tf.map_fn(preprocess_image, jpegs, dtype=tf.float32)

       # Create Image Classification model
       learning_rate = 0.01
       _dropout = 1.0

       weight = conv_weight()
       bias = conv_bias()

       y = conv_model(images, weight, bias, _dropout=_dropout, train=True)
       y_softmax = tf.nn.softmax(y, name="y_output")

       # Create saver to restore from checkpoints
       saver = tf.train.Saver()

       with tf.Session() as sess:
           # Restore the model from last checkpoints
           ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
           saver.restore(sess, ckpt.model_checkpoint_path)

           # (re-)create export directory
           export_path = os.path.join(
               tf.compat.as_bytes(FLAGS.output_dir),
               tf.compat.as_bytes(str(FLAGS.model_version)))
           if os.path.exists(export_path):
               shutil.rmtree(export_path)

           # create model builder
           builder = tf.saved_model.builder.SavedModelBuilder(export_path)

           # create tensors info
           predict_tensor_inputs_info = tf.saved_model.utils.build_tensor_info(jpegs)
           predict_tensor_scores_info = tf.saved_model.utils.build_tensor_info(
               y_softmax)

           # build prediction signature
           prediction_signature = (
               tf.saved_model.signature_def_utils.build_signature_def(
                   inputs={'images': predict_tensor_inputs_info},
                   outputs={'scores': predict_tensor_scores_info},
                    method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
               )
           )

           # save the model
           legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
           builder.add_meta_graph_and_variables(
               sess, [tf.saved_model.tag_constants.SERVING],
               signature_def_map={
                   'predict_images': prediction_signature
               },
               legacy_init_op=legacy_init_op)

           builder.save()

    print("Successfully exported Image Classification model version '{}' into '{}'".format(
       FLAGS.model_version, FLAGS.output_dir))

if __name__ == '__main__':
    tf.app.run()

Now let us walk through the code piece by piece:

serialized_tf_example = tf.placeholder(tf.string, name='input_image')
feature_configs = {
    'image/encoded': tf.FixedLenFeature(
        shape=[], dtype=tf.string),
}

tf_example = tf.parse_example(serialized_tf_example, feature_configs)
jpegs = tf_example['image/encoded']
images = tf.map_fn(preprocess_image, jpegs, dtype=tf.float32)

Here, serialized_tf_example takes the serialized image string.

feature_configs defines the expected feature: the key is image/encoded (the encoded JPEG, if you pass a JPEG image) and its type is tf.string. tf.parse_example then parses serialized_tf_example against this configuration and returns the JPEG strings, which are mapped through preprocess_image() into float32 tensors that can be passed to our conv_model().
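
For context, the serialized strings this placeholder expects are tf.train.Example protos. A minimal sketch of how one could be built for testing, where jpeg_bytes is assumed to hold the raw bytes of a JPEG file:

import tensorflow as tf

# jpeg_bytes is assumed, e.g. jpeg_bytes = open('some_image.jpg', 'rb').read()
example = tf.train.Example(features=tf.train.Features(feature={
    'image/encoded': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[jpeg_bytes]))
}))
serialized = example.SerializeToString()  # what tf.parse_example consumes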

def preprocess_image(image_buffer):
    image = tf.image.decode_jpeg(image_buffer, channels=3)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.subtract(image, 0.5)
    image = tf.multiply(image, 2.0)
    return image

The transformation is carried out by the preprocess_image() function: the given image string is decoded as a JPEG, converted to a float32 tensor, and its pixels rescaled to [-1, 1] (for example, a pixel value of 0 maps to -1 and 255 maps to 1).

prediction_signature = (
    tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': predict_tensor_inputs_info},
        outputs={'scores': predict_tensor_scores_info},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
    )
)

Next we create a SignatureDef (signature definition), which carries the information about the model that the server later needs in order to load and serve it. A SignatureDef is composed of three components:

  1. Inputs as a map of string to TensorInfo, i.e. predict_tensor_inputs_info
  2. Outputs as a map of string to TensorInfo, i.e. predict_tensor_scores_info
  3. Method name, which must be one of the methods recognised by the loading or tooling system, i.e. "tensorflow/serving/classify", "tensorflow/serving/predict" or "tensorflow/serving/regress". Here we are using tensorflow/serving/predict.

legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')

builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict_images': prediction_signature
    },
    legacy_init_op=legacy_init_op)

builder.save()

print("Successfully exported Image Classification model version '{}' into '{}'".format(
    FLAGS.model_version, FLAGS.output_dir))

Then we save the model. It is saved in protobuf format, inside a folder named after its version number, which is how TensorFlow Serving later identifies which version to run.

The saved model folder has the following layout:

    |-output_dir
        |-version number, i.e. 1, 2, ...
            |-saved_model.pb
            |-variables
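
After exporting, it is worth sanity-checking the result by loading the SavedModel back and listing its signatures. A minimal sketch, assuming the model was exported to ../../data/1:

import tensorflow as tf

export_path = '../../data/1'  # output_dir/model_version from the export script
with tf.Session(graph=tf.Graph()) as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_path)
    print(list(meta_graph.signature_def.keys()))  # expect ['predict_images']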

The model is then served in the previously created TensorFlow Serving environment as:

docker run -p 8501:8501 \
--mount type=bind,\
source=absolute_path_to_image_classification_model,\
target=/models/model_folder_name \
-e MODEL_NAME=model_name -t tensorflow/serving

The absolute path to the image classification model is the path to output_dir in this case.

If MODEL_NAME is not set, the image falls back to its default model name (model). Your exported model is now running in the server on the specified port.
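
Once the container is running, you can confirm that the model loaded by querying the model status endpoint. A quick sketch, assuming the container was started with MODEL_NAME=image_classification and port 8501 published:

import requests

# Returns the version states of the served model (e.g. "state": "AVAILABLE")
status = requests.get('http://localhost:8501/v1/models/image_classification')
print(status.json())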

Create Client Request:

You can use either gRPC or RESTful requests to connect to the TensorFlow server. The docker command above publishes port 8501, which handles RESTful requests; the server also accepts gRPC requests on port 8500, which you would additionally need to publish (e.g. -p 8500:8500) to use the gRPC client below.

Let us install grpcio and grpcio-tools with pip:

pip install grpcio grpcio-tools

And for the client code, let us install:

pip install tensorflow-serving-api

First, let's look at the client code that makes a gRPC request to the TensorFlow model server:

import time
from argparse import ArgumentParser

# Communication to TensorFlow server via gRPC
import grpc
from grpc.beta import implementations
from PIL import Image
import tensorflow as tf

# TensorFlow serving stuff to send messages
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc
from tensorflow.contrib.util import make_tensor_proto

from os import listdir
from os.path import isfile, join
import io

def parse_args():

  parser = ArgumentParser(description='Request a TensorFlow server for a prediction on the image')
  parser.add_argument('-s', '--server',
                      dest='server',
                      default='localhost:9000',
                      help='prediction service host:port')

  parser.add_argument('-i', '--image_path',
                      dest='image_path',
                      default='',
                      help='path to images folder', )
  parser.add_argument('-b', '--batch_mode',
                      dest='batch_mode',
                      default='true',
                      help='send image as batch or one-by-one')
  args = parser.parse_args()

  host, port = args.server.split(':')
  return host, port, args.image_path, args.batch_mode == 'true'

def main():

  # parse command line arguments
  host, port, image_path, batch_mode = parse_args()
  host = host + ":" + str(int(port))
  # host = 'localhost:9000'
  channel = grpc.insecure_channel(host)
  stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

  filenames = [(image_path + '/' + f) for f in listdir(image_path) if isfile(join(image_path, f))]
  files = []
  imagedata = []

  for filename in filenames:
      # read image, resize and re-encode as JPEG bytes
      img = Image.open(filename).convert("RGB")
      image = img.resize((100, 100))

      imgByteArr = io.BytesIO()
      image.save(imgByteArr, format='JPEG')
      imgByteArr = imgByteArr.getvalue()

      imagedata.append(imgByteArr)

  start = time.time()

  if batch_mode:
      print('In batch mode')
      request = predict_pb2.PredictRequest()
      request.model_spec.name = 'image_classification'
      request.model_spec.signature_name = 'predict_images'

      request.inputs['images'].CopyFrom(make_tensor_proto(imagedata, shape=[len(imagedata)]))

      result = stub.Predict(request, 60.0)  # 60 secs timeout
      print(result)

  else:
      print('In one-by-one mode')
      for data in imagedata:
          request = predict_pb2.PredictRequest()
          request.model_spec.name = 'image_classification'
          request.model_spec.signature_name = 'predict_images'

          request.inputs['images'].CopyFrom(make_tensor_proto(data, shape=[1]))

          result = stub.Predict(request, 60.0)  # 60 secs timeout
          print(result)

  end = time.time()
  time_diff = end - start
  print('time elapsed: {}'.format(time_diff))

if __name__ == '__main__':
  main()

Now let us look at the client-side code:

channel = grpc.insecure_channel(host)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

This establishes a connection to the address where the model server is running, i.e. host, which for us has the form localhost:9000 (it must match the gRPC port you published). We then create the prediction stub, which is later used to request predictions from the server.

request = predict_pb2.PredictRequest()
request.model_spec.name = 'image_classification'
request.model_spec.signature_name = 'predict_images'

request.inputs['images'].CopyFrom(make_tensor_proto(data, shape=[1]))
result = stub.Predict(request, 60.0)

Note that we have passed model_spec.name as image_classification here. This must match the model name specified when running the TensorFlow server, and model_spec.signature_name must be the signature name we specified when creating the SavedModel. The result is then obtained with stub.Predict().

Now, let's look at the code for a RESTful request.

import time
from argparse import ArgumentParser
from PIL import Image
import requests
import base64
from os import listdir
from os.path import isfile, join
import io

def parse_args():
    parser = ArgumentParser(description='Request a TensorFlow server for a prediction on the image')
    parser.add_argument('-s', '--server',
                       dest='server',
                       default='localhost:9000',
                       help='prediction service host:port')
    parser.add_argument('-i', '--image_path',
                       dest='image_path',
                       default='',
                       help='path to images folder', )
    parser.add_argument('-b', '--batch_mode',
                       dest='batch_mode',
                       default='true',
                       help='send image as batch or one-by-one')
    args = parser.parse_args()

    host, port = args.server.split(':')

    return host, port, args.image_path, args.batch_mode == 'true'

def classify(host, model, signature, images):
    json_data = {"signature_name": signature, "instances": images}
    r = requests.post('http://{}/v1/models/{}:predict'.format(host, model),
                      json=json_data)
    return r.json()

def main():
    # parse command line arguments
    host, port, image_path, batch_mode = parse_args()
    host = host + ":" + str(int(port))
    # host = 'localhost:9000'

    filenames = [(image_path + '/' + f) for f in listdir(image_path) if isfile(join(image_path, f))]
    imagedata = []
    for filename in filenames:
       # read image and resize
       img = Image.open(filename).convert("RGB")
       image = img.resize((100, 100))

       # encode img to jpg
       imgByteArr = io.BytesIO()
       image.save(imgByteArr, format='JPEG')
       imgByteArr = imgByteArr.getvalue()

       # encode jpg img to b64 encoding
       imagedata.append({"images": { "b64": base64.b64encode(imgByteArr) }})

    start = time.time()

    if batch_mode:
       print('In batch mode')
       result = classify(host, 'image_classification', 'predict_images', imagedata)

       print(result)
    else:
       print('In one-by-one mode')
       for data in imagedata:
           result = classify(host, 'image_classification', 'predict_images', [data])
           print(result)

    end = time.time()
    time_diff = end - start
    print('time elapsed: {}'.format(time_diff))

if __name__ == '__main__':
    main()

imagedata.append({"images": { "b64": base64.b64encode(imgByteArr) }})

Here, note that the image is base64-encoded. Whenever an input carries binary values such as images, the REST API expects it to be base64-encoded under a "b64" key (and, in Python 3, the encoded bytes must be decoded to a string so they can be serialized to JSON).
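
To make the wire format concrete, this is roughly the JSON body that classify() ends up posting for a single image (the base64 string is truncated and purely illustrative):

# POSTed to http://localhost:8501/v1/models/image_classification:predict
json_data = {
    "signature_name": "predict_images",
    "instances": [
        # each instance maps the signature's input name to a base64-encoded JPEG
        {"images": {"b64": "/9j/4AAQSkZJRgABAQ..."}}
    ]
}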

result = classify(host, 'image_classification', 'predict_images', imagedata)

The result for our saved model can be obtained by passing the request, in the format shown above, to the classify() function.

Then we can make request to our server as:

python3 image_classification_client.py --server=localhost:8501 --image_path=path_to_image_classification_data_directory --batch_mode=false

Thus we have successfully made a request to our TensorFlow server.

Note: installing the TensorFlow Serving API for gRPC requests can override an existing TensorFlow installation. So, if you want to use the code above directly, it is suggested to build it with bazel; otherwise, simply remove the tf.app code and take the arguments with parser.add_argument() instead.
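
If you take the second route, the tf.app.flags block at the top of the export script could be replaced with something along these lines (a sketch that keeps the same flag names and defaults):

from argparse import ArgumentParser

parser = ArgumentParser(description='Export the trained model as a SavedModel')
parser.add_argument('--checkpoint_dir', default='../../data/cnn_model',
                    help='Directory where to read training checkpoints.')
parser.add_argument('--output_dir', default='../../data',
                    help='Directory where to export the model.')
parser.add_argument('--model_version', type=int, default=1,
                    help='Version number of the model.')
FLAGS = parser.parse_args()

# ...and call main(None) directly instead of tf.app.run()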

Now let us look at how we can convert a Keras checkpoint file into a SavedModel.

The model can be saved using the following code:

import os
from tensorflow.python.keras import backend as K
from tensorflow.python.keras.models import load_model
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
from tensorflow.python.saved_model import tag_constants

# Function to export Keras model to Protocol Buffer format
# Inputs:
#       path_to_h5: Path to Keras h5 model
#       export_path: Path to store Protocol Buffer model
def export_h5_to_pb(path_to_h5, export_path):
  # Set the learning phase to Test since the model is already trained.
  K.set_learning_phase(0)

  # Load the Keras model
  keras_model = load_model(path_to_h5)
  # Build the Protocol Buffer SavedModel at 'export_path'
  builder = saved_model_builder.SavedModelBuilder(export_path)

  # Create prediction signature to be used by TensorFlow Serving Predict API
  signature = predict_signature_def(
      inputs={"images": keras_model.input},
      outputs={"scores": keras_model.output})

  with K.get_session() as sess:
      # Save the meta graph and the variables
      builder.add_meta_graph_and_variables(
          sess=sess,
          tags=[tag_constants.SERVING],
          signature_def_map={"predict": signature})
  builder.save()
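
A hypothetical call, pointing at the Keras checkpoint saved earlier and a fresh, versioned export directory:

# Paths are illustrative; the export directory must not already exist
export_h5_to_pb('../../data/keras_model/cnn_model.h5', '../../data/keras_export/1')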

We can serve this SavedModel the same way as above, and make requests in the same way, according to the model's requirements.

Conclusion:

Getting started with TensorFlow Serving can be quite difficult, but it is a great tool for deploying TensorFlow models. It gives you an out-of-the-box, standard way of hosting TensorFlow models and can be scaled with Kubernetes or on AWS.

References:

  1. https://www.tensorflow.org/serving/
  2. https://towardsdatascience.com/how-to-deploy-machine-learning-models-with-tensorflow-part-1-make-your-model-ready-for-serving-776a14ec3198
  3. https://www.youtube.com/watch?v=q_IkJcPyNl0&t=529s
  4. https://mc.ai/serving-image-based-deep-learning-models-with-tensorflow-servings-restful-api/
  5. https://www.youtube.com/watch?v=PbiYll21Jxg&list=PLZdsdjcJ44WU_cY2Y1LFLnmsSjFD5BZLZ
  6. https://medium.com/@johnsondsouza23/export-keras-model-to-protobuf-for-tensorflow-serving-101ad6c65142
  7. https://www.tensorflow.org/serving/docker
