How to Read the TFRecord File in TensorFlow

The TFRecord format is a simple format for storing a sequence of binary records.

Protocol buffers are a cross-platform, cross-language library for efficient serialization of structured data.

Protocol messages are defined by .proto files; these are often the easiest way to understand a message type.

The tf.train.Example message (or protobuf) is a flexible message type that represents a {"string": value} mapping. It is designed for use with TensorFlow and is used throughout the higher-level APIs such as TFX.

This notebook demonstrates how to create, parse, and use the tf.train.Example message, and then serialize, write, and read tf.train.Example messages to and from .tfrecord files.

Setup

    import tensorflow as tf

    import numpy as np
    import IPython.display as display

tf.train.Example

Data types for tf.train.Example

Fundamentally, a tf.train.Example is a {"string": tf.train.Feature} mapping.

The tf.train.Feature message type can accept one of the following three types (see the .proto file for reference). Most other generic types can be coerced into one of these:

  1. tf.train.BytesList (the following types can be coerced)

    • string
    • byte
  2. tf.train.FloatList (the following types can be coerced)

    • float (float32)
    • double (float64)
  3. tf.train.Int64List (the following types can be coerced)

    • bool
    • enum
    • int32
    • uint32
    • int64
    • uint64

In order to convert a standard TensorFlow type to a tf.train.Example-compatible tf.train.Feature, you can use the shortcut functions below. Note that each function takes a scalar input value and returns a tf.train.Feature containing one of the three list types above:

    # The following functions can be used to convert a value to a type compatible
    # with tf.train.Example.

    def _bytes_feature(value):
      """Returns a bytes_list from a string / byte."""
      if isinstance(value, type(tf.constant(0))):
        value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
      return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

    def _float_feature(value):
      """Returns a float_list from a float / double."""
      return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

    def _int64_feature(value):
      """Returns an int64_list from a bool / enum / int / uint."""
      return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

Below are some examples of how these functions work. Note the varying input types and the standardized output types. If the input type for a function does not match one of the coercible types stated above, the function will raise an exception. For example, _int64_feature(1.0) will error out because 1.0 is a float, so it should be used with the _float_feature function instead:

    print(_bytes_feature(b'test_string'))
    print(_bytes_feature(u'test_bytes'.encode('utf-8')))

    print(_float_feature(np.exp(1)))

    print(_int64_feature(True))
    print(_int64_feature(1))
    bytes_list {
      value: "test_string"
    }

    bytes_list {
      value: "test_bytes"
    }

    float_list {
      value: 2.7182817459106445
    }

    int64_list {
      value: 1
    }

    int64_list {
      value: 1
    }
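The sketch below shows the exception described above on its own. It wraps _int64_feature in a try/except and redefines _int64_feature so the block can run independently. The wrapper name try_int64_feature is hypothetical. The sketch assumes the protobuf runtime reports the type mismatch as a TypeError, which is the behaviour of the protobuf backends we know of.

```python
import tensorflow as tf


def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def try_int64_feature(value):
    """Hypothetical helper: return the Feature, or None if the type is rejected."""
    try:
        return _int64_feature(value)
    except TypeError as err:  # protobuf refuses to store a float in an int64 field
        print(f"rejected {value!r}: {err}")
        return None


accepted = try_int64_feature(1)    # an int is coercible to Int64List
rejected = try_int64_feature(1.0)  # a float is not; use _float_feature instead
```

Catching the error at conversion time keeps a bad value from ever reaching a serialized record.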

All proto messages can be serialized to a binary string using the .SerializeToString method:

    feature = _float_feature(np.exp(1))

    feature.SerializeToString()
b'\x12\x06\n\x04T\xf8-@'        

Creating a tf.train.Example message

Suppose you want to create a tf.train.Example message from existing data. In practice, the dataset may come from anywhere, but the procedure for creating the tf.train.Example message from a single observation is the same:

  1. Within each observation, each value needs to be converted to a tf.train.Feature containing one of the 3 compatible types, using one of the functions above.

  2. You create a map (dictionary) from the feature name string to the encoded feature value produced in #1.

  3. The map produced in step 2 is converted to a Features message.

In this notebook, you will create a dataset using NumPy.

This dataset will have 4 features:

  • a boolean feature, False or True with equal probability
  • an integer feature chosen uniformly at random from 0 to 4
  • a string feature generated from a string table by using the integer feature as an index
  • a float feature from a standard normal distribution

Consider a sample consisting of 10,000 independently and identically distributed observations from each of the above distributions:

    # The number of observations in the dataset.
    n_observations = int(1e4)

    # Boolean feature, encoded as False or True.
    feature0 = np.random.choice([False, True], n_observations)

    # Integer feature, random from 0 to 4.
    feature1 = np.random.randint(0, 5, n_observations)

    # String feature.
    strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
    feature2 = strings[feature1]

    # Float feature, from a standard normal distribution.
    feature3 = np.random.randn(n_observations)
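The code above draws from NumPy's legacy global random state, so every run produces a different dataset. If you want repeatable data, one option is to draw from a seeded np.random.default_rng Generator instead. The seed value 42 is arbitrary. The sketch also shows how the string feature is built by using the integer feature as an index into the string table.

```python
import numpy as np

# Hypothetical seed: any fixed integer makes the draws repeatable across runs.
rng = np.random.default_rng(seed=42)
n_observations = int(1e4)

feature0 = rng.choice([False, True], n_observations)  # boolean feature
feature1 = rng.integers(0, 5, n_observations)          # 0 to 4; the upper bound is exclusive
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]                           # integer feature used as an index
feature3 = rng.standard_normal(n_observations)         # float feature, N(0, 1)
```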

Each of these features can be coerced into a tf.train.Example-compatible type using one of _bytes_feature, _float_feature, _int64_feature. You can then create a tf.train.Example message from these encoded features:

    def serialize_example(feature0, feature1, feature2, feature3):
      """
      Creates a tf.train.Example message ready to be written to a file.
      """
      # Create a dictionary mapping the feature name to the tf.train.Example-compatible
      # data type.
      feature = {
          'feature0': _int64_feature(feature0),
          'feature1': _int64_feature(feature1),
          'feature2': _bytes_feature(feature2),
          'feature3': _float_feature(feature3),
      }

      # Create a Features message using tf.train.Example.

      example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
      return example_proto.SerializeToString()

For example, suppose you have a single observation from the dataset, [False, 4, bytes('goat'), 0.9876]. You can create and print the tf.train.Example message for this observation using serialize_example(). Each single observation will be written as a Features message, as described above. Note that the tf.train.Example message is just a wrapper around the Features message:

    # This is an example observation from the dataset.

    example_observation = []

    serialized_example = serialize_example(False, 4, b'goat', 0.9876)
    serialized_example
b'\nR\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04[\xd3|?'

To decode the message use the tf.train.Example.FromString method.

    example_proto = tf.train.Example.FromString(serialized_example)
    example_proto
    features {
      feature {
        key: "feature0"
        value {
          int64_list {
            value: 0
          }
        }
      }
      feature {
        key: "feature1"
        value {
          int64_list {
            value: 4
          }
        }
      }
      feature {
        key: "feature2"
        value {
          bytes_list {
            value: "goat"
          }
        }
      }
      feature {
        key: "feature3"
        value {
          float_list {
            value: 0.9876000285148621
          }
        }
      }
    }

TFRecords format details

A TFRecord file contains a sequence of records. The file can only be read sequentially.

Each record contains a byte string for the data payload, plus the data length and CRC-32C (32-bit CRC using the Castagnoli polynomial) hashes for integrity checking.

Each record is stored in the following format:

    uint64 length
    uint32 masked_crc32_of_length
    byte   data[length]
    uint32 masked_crc32_of_data

The records are concatenated together to produce the file. The mask of a CRC is:

          masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul                  
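To make the record layout and the mask concrete, here is a pure-Python sketch of the framing that uses only the standard library. It assumes the length and CRC fields are little-endian fixed-width integers, which is how TensorFlow encodes them. It computes CRC-32C bit by bit, which is slow but easy to check against the standard check value 0xE3069283 for b"123456789". Python integers are unbounded, so the 32-bit wraparound implied by the C expression above is applied explicitly with & 0xFFFFFFFF. The names crc32c, masked_crc, encode_record, and decode_records are hypothetical. For real files, prefer tf.io.TFRecordWriter and tf.data.TFRecordDataset.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """Rotate the CRC right by 15 bits and add the constant, modulo 2**32."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def encode_record(payload: bytes) -> bytes:
    """Frame one payload as: length, CRC of length, data, CRC of data."""
    length = struct.pack("<Q", len(payload))            # uint64 length
    return (length
            + struct.pack("<I", masked_crc(length))     # uint32 masked_crc32_of_length
            + payload                                   # byte data[length]
            + struct.pack("<I", masked_crc(payload)))   # uint32 masked_crc32_of_data


def decode_records(blob: bytes) -> list:
    """Read records sequentially, verifying both CRCs of each one."""
    records, offset = [], 0
    while offset < len(blob):
        length_bytes = blob[offset:offset + 8]
        (length,) = struct.unpack("<Q", length_bytes)
        (length_crc,) = struct.unpack("<I", blob[offset + 8:offset + 12])
        if length_crc != masked_crc(length_bytes):
            raise ValueError(f"corrupt length field at offset {offset}")
        start = offset + 12
        payload = blob[start:start + length]
        (data_crc,) = struct.unpack("<I", blob[start + length:start + length + 4])
        if data_crc != masked_crc(payload):
            raise ValueError(f"corrupt payload at offset {offset}")
        records.append(payload)
        offset = start + length + 4
    return records


blob = b"".join(encode_record(p) for p in [b"hello", b"", b"tfrecord"])
print(decode_records(blob))
```

Because each record carries its own length, a reader can only find record n by walking through records 0 to n-1. This is why the file can only be read sequentially.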

TFRecord files using tf.data

The tf.data module also provides tools for reading and writing data in TensorFlow.

Writing a TFRecord file

The easiest way to get the data into a dataset is to use the from_tensor_slices method.

Applied to an array, it returns a dataset of scalars:

          tf.data.Dataset.from_tensor_slices(feature1)                  
<TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>        

Applied to a tuple of arrays, it returns a dataset of tuples:

    features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))
    features_dataset
<TensorSliceDataset element_spec=(TensorSpec(shape=(), dtype=tf.bool, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None), TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.float64, name=None))>        
    # Use `take(1)` to only pull one example from the dataset.
    for f0, f1, f2, f3 in features_dataset.take(1):
      print(f0)
      print(f1)
      print(f2)
      print(f3)
    tf.Tensor(False, shape=(), dtype=bool)
    tf.Tensor(4, shape=(), dtype=int64)
    tf.Tensor(b'goat', shape=(), dtype=string)
    tf.Tensor(0.5251196235602504, shape=(), dtype=float64)

Use the tf.data.Dataset.map method to apply a function to each element of a Dataset.

The mapped function must operate in TensorFlow graph mode: it must operate on and return tf.Tensors. A non-tensor function, like serialize_example, can be wrapped with tf.py_function to make it compatible.

Using tf.py_function requires you to specify the shape and type information that is otherwise unavailable:

    def tf_serialize_example(f0, f1, f2, f3):
      tf_string = tf.py_function(
        serialize_example,
        (f0, f1, f2, f3),  # Pass these args to the above function.
        tf.string)      # The return type is `tf.string`.
      return tf.reshape(tf_string, ()) # The result is a scalar.
          tf_serialize_example(f0, f1, f2, f3)                  
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04=n\x06?\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04'>

Apply this function to each element in the dataset:

    serialized_features_dataset = features_dataset.map(tf_serialize_example)
    serialized_features_dataset
<MapDataset element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>        
    def generator():
      for features in features_dataset:
        yield serialize_example(*features)
    serialized_features_dataset = tf.data.Dataset.from_generator(
        generator, output_types=tf.string, output_shapes=())
          serialized_features_dataset                  
<FlatMapDataset element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>

And write them to a TFRecord file:

    filename = 'test.tfrecord'
    writer = tf.data.experimental.TFRecordWriter(filename)
    writer.write(serialized_features_dataset)
WARNING:tensorflow:From /tmp/ipykernel_25215/3575438268.py:2: TFRecordWriter.__init__ (from tensorflow.python.data.experimental.ops.writers) is deprecated and will be removed in a future version. Instructions for updating: To write TFRecords to disk, use `tf.io.TFRecordWriter`. To save and load the contents of a dataset, use `tf.data.experimental.save` and `tf.data.experimental.load`
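The deprecation warning points to tf.io.TFRecordWriter. A minimal sketch of that replacement iterates the dataset eagerly and writes the bytes of each element. It assumes the dataset yields scalar tf.string tensors, as serialized_features_dataset does. The helper write_serialized_dataset, the stand-in dataset, and the file name demo.tfrecord are hypothetical.

```python
import os
import tempfile

import tensorflow as tf


def write_serialized_dataset(serialized_dataset, path):
    """Write every scalar tf.string element of a dataset as one TFRecord.

    Returns the number of records written.
    """
    count = 0
    with tf.io.TFRecordWriter(path) as writer:
        for record in serialized_dataset:   # eager iteration yields scalar string tensors
            writer.write(record.numpy())    # TFRecordWriter.write expects bytes
            count += 1
    return count


# Hypothetical stand-in for serialized_features_dataset: three short byte strings.
demo_dataset = tf.data.Dataset.from_tensor_slices([b"first", b"second", b"third"])
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tfrecord")

written = write_serialized_dataset(demo_dataset, demo_path)
read_back = [r.numpy() for r in tf.data.TFRecordDataset(demo_path)]
```

With the real data you would call write_serialized_dataset(serialized_features_dataset, filename) instead.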

Reading a TFRecord file

You can also read the TFRecord file using the tf.data.TFRecordDataset class.

More information on consuming TFRecord files using tf.data can be found in the tf.data: Build TensorFlow input pipelines guide.

Using TFRecordDatasets can be useful for standardizing input data and optimizing performance.

    filenames = [filename]
    raw_dataset = tf.data.TFRecordDataset(filenames)
    raw_dataset
<TFRecordDatasetV2 element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>        

At this point the dataset contains serialized tf.train.Example messages. When iterated over it returns these as scalar string tensors.

Use the .take method to only show the first 10 records.

    for raw_record in raw_dataset.take(10):
      print(repr(raw_record))
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04=n\x06?'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x9d\xfa\x98\xbe\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04a\xc0r?\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x92Q(?'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04>\xc0\xe5>\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nU\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04I!\xde\xbe\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xe0\x1a\xab\xbf\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x87\xb2\xd7?\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04n\xe19>\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>
    <tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x1as\xd9\xbf\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>

These tensors can be parsed using the function below. Note that the feature_description is necessary here because tf.data.Datasets use graph execution and need this description to build their shape and type signature:

    # Create a description of the features.
    feature_description = {
        'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
        'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
        'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
        'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
    }

    def _parse_function(example_proto):
      # Parse the input `tf.train.Example` proto using the dictionary above.
      return tf.io.parse_single_example(example_proto, feature_description)

Alternatively, use tf.io.parse_example to parse the whole batch at once. Apply this function to each item in the dataset using the tf.data.Dataset.map method:

    parsed_dataset = raw_dataset.map(_parse_function)
    parsed_dataset
<MapDataset element_spec={'feature0': TensorSpec(shape=(), dtype=tf.int64, name=None), 'feature1': TensorSpec(shape=(), dtype=tf.int64, name=None), 'feature2': TensorSpec(shape=(), dtype=tf.string, name=None), 'feature3': TensorSpec(shape=(), dtype=tf.float32, name=None)}>
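The batch-at-once alternative mentioned above can be sketched as follows. The serialized records are batched first, and tf.io.parse_example is then called once per batch, so every parsed feature comes back with a leading batch dimension. To run on its own, the sketch builds three small examples in memory. The helper make_example and the sample values are hypothetical.

```python
import tensorflow as tf

feature_description = {
    'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
}


def make_example(f0, f1, f2, f3):
    """Hypothetical helper mirroring serialize_example from earlier."""
    feature = {
        'feature0': tf.train.Feature(int64_list=tf.train.Int64List(value=[f0])),
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[f1])),
        'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[f2])),
        'feature3': tf.train.Feature(float_list=tf.train.FloatList(value=[f3])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()


raw = tf.data.Dataset.from_tensor_slices([
    make_example(False, 4, b'goat', 0.5),
    make_example(True, 1, b'dog', -1.5),
    make_example(True, 0, b'cat', 2.0),
])

# Batch first, then parse each batch of serialized protos with a single call.
parsed_batches = raw.batch(2).map(
    lambda records: tf.io.parse_example(records, feature_description))

first_batch = next(iter(parsed_batches))
print(first_batch['feature1'].numpy())
```

Parsing per batch rather than per record generally reduces per-element overhead in a tf.data pipeline. Batching before mapping is therefore a common ordering, though the benefit depends on your workload.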

Use eager execution to display the observations in the dataset. There are 10,000 observations in this dataset, but you will only display the first 10. The data is displayed as a dictionary of features. Each item is a tf.Tensor, and the numpy element of this tensor displays the value of the feature:

    for parsed_record in parsed_dataset.take(10):
      print(repr(parsed_record))
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.5251196>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.29878703>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'dog'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.94824797>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.65749466>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.44873232>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'chicken'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.4338477>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-1.3367577>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=1.6851357>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.18152401>}
    {'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-1.6988251>}

Here, tf.io.parse_single_example (called inside _parse_function) unpacks the tf.train.Example fields into standard tensors.

TFRecord files in Python

The tf.io module also contains pure-Python functions for reading and writing TFRecord files.

Writing a TFRecord file

Next, write the 10,000 observations to the file test.tfrecord. Each observation is converted to a tf.train.Example message, then written to the file. You can then verify that the file test.tfrecord has been created:

    # Write the `tf.train.Example` observations to the file.
    with tf.io.TFRecordWriter(filename) as writer:
      for i in range(n_observations):
        example = serialize_example(feature0[i], feature1[i], feature2[i], feature3[i])
        writer.write(example)
    !du -sh {filename}
984K    test.tfrecord        

Reading a TFRecord file

These serialized tensors can be easily parsed using tf.train.Example.ParseFromString:

    filenames = [filename]
    raw_dataset = tf.data.TFRecordDataset(filenames)
    raw_dataset
<TFRecordDatasetV2 element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>        
    for raw_record in raw_dataset.take(1):
      example = tf.train.Example()
      example.ParseFromString(raw_record.numpy())
      print(example)
    features {
      feature {
        key: "feature0"
        value {
          int64_list {
            value: 0
          }
        }
      }
      feature {
        key: "feature1"
        value {
          int64_list {
            value: 4
          }
        }
      }
      feature {
        key: "feature2"
        value {
          bytes_list {
            value: "goat"
          }
        }
      }
      feature {
        key: "feature3"
        value {
          float_list {
            value: 0.5251196026802063
          }
        }
      }
    }

That returns a tf.train.Example proto, which is difficult to use as is, but it is fundamentally a representation of a:

    Dict[str,
         Union[List[float],
               List[int],
               List[str]]]

The following code manually converts the Example to a dictionary of NumPy arrays, without using TensorFlow ops. Refer to the PROTO file for details.

    result = {}
    # example.features.feature is the dictionary
    for key, feature in example.features.feature.items():
      # The values are the Feature objects which contain a `kind` which contains:
      # one of three fields: bytes_list, float_list, int64_list

      kind = feature.WhichOneof('kind')
      result[key] = np.array(getattr(feature, kind).value)

    result
    {'feature3': array([0.5251196]),
     'feature1': array([4]),
     'feature0': array([0]),
     'feature2': array([b'goat'], dtype='|S4')}
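Going the other way is also possible: a dictionary of NumPy arrays can be turned back into a tf.train.Example. In the sketch below, the NumPy dtype kind selects the list type: bool and integer kinds map to Int64List, floats to FloatList, and fixed-width bytes to BytesList. The helper names dict_to_example and example_to_dict are hypothetical. example_to_dict simply repeats the loop above as a function. The sketch assumes strings are already encoded as bytes and that unsigned values fit in an int64. Python str arrays are rejected rather than silently encoded.

```python
import numpy as np
import tensorflow as tf


def dict_to_example(data):
    """Hypothetical helper: build a tf.train.Example from {name: array-like}.

    The NumPy dtype picks the list type; arrays are flattened, so any shape
    information must be stored separately.
    """
    feature = {}
    for key, value in data.items():
        array = np.asarray(value).ravel()
        if array.dtype.kind in "biu":    # bool, signed int, unsigned int
            feature[key] = tf.train.Feature(int64_list=tf.train.Int64List(value=array.tolist()))
        elif array.dtype.kind == "f":    # float16/32/64
            feature[key] = tf.train.Feature(float_list=tf.train.FloatList(value=array.tolist()))
        elif array.dtype.kind == "S":    # fixed-width bytes, e.g. b'goat'
            feature[key] = tf.train.Feature(bytes_list=tf.train.BytesList(value=array.tolist()))
        else:
            raise TypeError(f"{key}: unsupported dtype {array.dtype}")
    return tf.train.Example(features=tf.train.Features(feature=feature))


def example_to_dict(example):
    """Same conversion as the loop above: one NumPy array per feature."""
    result = {}
    for key, feature in example.features.feature.items():
        kind = feature.WhichOneof('kind')
        result[key] = np.array(getattr(feature, kind).value)
    return result


original = {'feature0': [0], 'feature1': [4], 'feature2': [b'goat'], 'feature3': [0.5]}
round_trip = example_to_dict(dict_to_example(original))
```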

Walkthrough: Reading and writing image data

This is an end-to-end example of how to read and write image data using TFRecords. Using an image as input data, you will write the data as a TFRecord file, then read the file back and display the image.

This can be useful if, for example, you want to use several models on the same input dataset. Instead of storing the image data raw, it can be preprocessed into the TFRecords format, and that can be used in all further processing and modelling.

First, let's download this image of a cat in the snow and this photo of the Williamsburg Bridge, NYC under construction.

Fetch the images

    cat_in_snow  = tf.keras.utils.get_file(
        '320px-Felis_catus-cat_on_snow.jpg',
        'https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg')

    williamsburg_bridge = tf.keras.utils.get_file(
        '194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg',
        'https://storage.googleapis.com/download.tensorflow.org/example_images/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg')
    Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg
    24576/17858 [=========================================] - 0s 0us/step
    32768/17858 [=======================================================] - 0s 0us/step
    Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg
    16384/15477 [===============================] - 0s 0us/step
    24576/15477 [===============================================] - 0s 0us/step
    display.display(display.Image(filename=cat_in_snow))
    display.display(display.HTML('Image cc-by: <a href="https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg">Von.grzanka</a>'))


    display.display(display.Image(filename=williamsburg_bridge))
    display.display(display.HTML('<a href="https://commons.wikimedia.org/wiki/File:New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg">From Wikimedia</a>'))


Write the TFRecord file

As before, encode the features as types compatible with tf.train.Example. This stores the raw image string feature, as well as the height, width, depth, and an arbitrary label feature. The label is used when you write the file to distinguish between the cat image and the bridge image. Use 0 for the cat image, and 1 for the bridge image:

    image_labels = {
        cat_in_snow : 0,
        williamsburg_bridge : 1,
    }
    # This is an example, just using the cat image.
    image_string = open(cat_in_snow, 'rb').read()

    label = image_labels[cat_in_snow]

    # Create a dictionary with features that may be relevant.
    def image_example(image_string, label):
      image_shape = tf.io.decode_jpeg(image_string).shape

      feature = {
          'height': _int64_feature(image_shape[0]),
          'width': _int64_feature(image_shape[1]),
          'depth': _int64_feature(image_shape[2]),
          'label': _int64_feature(label),
          'image_raw': _bytes_feature(image_string),
      }

      return tf.train.Example(features=tf.train.Features(feature=feature))

    for line in str(image_example(image_string, label)).split('\n')[:15]:
      print(line)
    print('...')
    features {
      feature {
        key: "depth"
        value {
          int64_list {
            value: 3
          }
        }
      }
      feature {
        key: "height"
        value {
          int64_list {
            value: 213
          }
    ...

Notice that all of the features are now stored in the tf.train.Example message. Next, functionalize the code above and write the example messages to a file named images.tfrecords:

    # Write the raw image files to `images.tfrecords`.
    # First, process the two images into `tf.train.Example` messages.
    # Then, write to a `.tfrecords` file.
    record_file = 'images.tfrecords'
    with tf.io.TFRecordWriter(record_file) as writer:
      for filename, label in image_labels.items():
        image_string = open(filename, 'rb').read()
        tf_example = image_example(image_string, label)
        writer.write(tf_example.SerializeToString())
    !du -sh {record_file}
36K images.tfrecords        

Read the TFRecord file

You now have the file images.tfrecords and can iterate over the records in it to read back what you wrote. Given that in this example you will only reproduce the image, the only feature you will need is the raw image string. Extract it using the getters described above, namely example.features.feature['image_raw'].bytes_list.value[0]. You can also use the labels to determine which record is the cat and which one is the bridge:

    raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')

    # Create a dictionary describing the features.
    image_feature_description = {
        'height': tf.io.FixedLenFeature([], tf.int64),
        'width': tf.io.FixedLenFeature([], tf.int64),
        'depth': tf.io.FixedLenFeature([], tf.int64),
        'label': tf.io.FixedLenFeature([], tf.int64),
        'image_raw': tf.io.FixedLenFeature([], tf.string),
    }

    def _parse_image_function(example_proto):
      # Parse the input tf.train.Example proto using the dictionary above.
      return tf.io.parse_single_example(example_proto, image_feature_description)

    parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
    parsed_image_dataset
<MapDataset element_spec={'depth': TensorSpec(shape=(), dtype=tf.int64, name=None), 'height': TensorSpec(shape=(), dtype=tf.int64, name=None), 'image_raw': TensorSpec(shape=(), dtype=tf.string, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None), 'width': TensorSpec(shape=(), dtype=tf.int64, name=None)}>

Recover the images from the TFRecord file:

    for image_features in parsed_image_dataset:
      image_raw = image_features['image_raw'].numpy()
      display.display(display.Image(data=image_raw))
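Displaying the raw bytes is enough here, but a model usually needs a pixel tensor. A minimal sketch of that step decodes image_raw with tf.io.decode_jpeg and reshapes the result using the stored height, width, and depth. The reshape raises an error if the stored dimensions do not account for the decoded pixel count. To avoid network access, the sketch uses a small synthetic 4x6 RGB JPEG in place of the downloaded photos. That image, and the helpers int64_feature and decode_image, are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the downloaded photos: a 4x6 synthetic RGB image.
pixels = np.zeros((4, 6, 3), dtype=np.uint8)
pixels[:, :3] = [255, 0, 0]                     # left half red, right half black
image_string = tf.io.encode_jpeg(pixels).numpy()

image_feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
}


def int64_feature(value):
    """Hypothetical helper equivalent to _int64_feature above."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


example = tf.train.Example(features=tf.train.Features(feature={
    'height': int64_feature(4),
    'width': int64_feature(6),
    'depth': int64_feature(3),
    'label': int64_feature(0),
    'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_string])),
}))


def decode_image(parsed):
    """Decode the JPEG bytes and reshape them to the stored dimensions."""
    image = tf.io.decode_jpeg(parsed['image_raw'], channels=3)
    shape = tf.stack([parsed['height'], parsed['width'], parsed['depth']])
    return tf.reshape(image, shape)  # errors if the element counts disagree


parsed = tf.io.parse_single_example(example.SerializeToString(), image_feature_description)
image = decode_image(parsed)
print(image.shape, image.dtype)
```

JPEG is lossy, so the decoded pixel values are close to, but not guaranteed to equal, the original array. Only the shape and dtype are reliable.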


Source: https://www.tensorflow.org/tutorials/load_data/tfrecord
