How to Read the TFRecord File in TensorFlow
The TFRecord format is a simple format for storing a sequence of binary records.
Protocol buffers are a cross-platform, cross-language library for efficient serialization of structured data.
Protocol messages are defined by .proto files; these are often the easiest way to understand a message type.
The tf.train.Example
message (or protobuf) is a flexible message type that represents a {"string": value}
mapping. It is designed for use with TensorFlow and is used throughout the higher-level APIs such as TFX.
This notebook demonstrates how to create, parse, and use the tf.train.Example message, and then serialize, write, and read tf.train.Example messages to and from .tfrecord
files.
Setup
import tensorflow as tf
import numpy as np
import IPython.display as display
tf.train.Example
Data types for tf.train.Example
Fundamentally, a tf.railroad train.Example
is a {"string": tf.train.Feature}
mapping.
The tf.train.Feature
message type can accept one of the following three types (see the .proto file for reference). Most other generic types can be coerced into one of these:
- tf.train.BytesList (the following types can be coerced)
  - string
  - byte
- tf.train.FloatList (the following types can be coerced)
  - float (float32)
  - double (float64)
- tf.train.Int64List (the following types can be coerced)
  - bool
  - enum
  - int32
  - uint32
  - int64
  - uint64
In order to convert a standard TensorFlow type to a tf.train.Example-compatible tf.train.Feature, you can use the shortcut functions below. Note that each function takes a scalar input value and returns a tf.train.Feature containing one of the three list types above:
# The following functions can be used to convert a value to a type compatible
# with tf.train.Example.

def _bytes_feature(value):
  """Returns a bytes_list from a string / byte."""
  if isinstance(value, type(tf.constant(0))):
    value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
  """Returns a float_list from a float / double."""
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
  """Returns an int64_list from a bool / enum / int / uint."""
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
Below are some examples of how these functions work. Note the varying input types and the standardized output types. If the input type for a function does not match one of the coercible types stated above, the function will raise an exception (e.g. _int64_feature(1.0) will error out because 1.0 is a float; it should be used with the _float_feature function instead):
print(_bytes_feature(b'test_string'))
print(_bytes_feature(u'test_bytes'.encode('utf-8')))

print(_float_feature(np.exp(1)))

print(_int64_feature(True))
print(_int64_feature(1))
bytes_list {
  value: "test_string"
}
bytes_list {
  value: "test_bytes"
}
float_list {
  value: 2.7182817459106445
}
int64_list {
  value: 1
}
int64_list {
  value: 1
}
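To see that failure mode concretely, here is a minimal sketch (the exact error message comes from the protobuf runtime and may vary by version):

# Passing a float where an int64 is expected raises a TypeError from protobuf.
try:
  _int64_feature(1.0)
except TypeError as e:
  print(e)  # e.g. "1.0 has type float, but expected one of: int"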
All proto messages can be serialized to a binary string using the .SerializeToString
method:
feature = _float_feature(np.exp(1))
feature.SerializeToString()
b'\x12\x06\n\x04T\xf8-@'
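As a quick sanity check, the serialized bytes can be parsed back into a message with the protobuf FromString classmethod (a minimal sketch):

# Round-trip: parse the serialized bytes back into a tf.train.Feature.
restored = tf.train.Feature.FromString(feature.SerializeToString())
print(restored.float_list.value[0])  # ~2.7182817 (stored as float32)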
Creating a tf.train.Example
message
Suppose you want to create a tf.train.Example message from existing data. In practice, the dataset may come from anywhere, but the process of creating the tf.train.Example message from a single observation will be the same:
1. Within each observation, each value needs to be converted to a tf.train.Feature containing one of the 3 compatible types, using one of the functions above.
2. You create a map (dictionary) from the feature name string to the encoded feature value produced in #1.
3. The map produced in step 2 is converted to a Features message.
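Applied to a single hypothetical value (the name 'movie_rating' is made up for illustration), the three steps look like this:

# Step 1: convert the value to a tf.train.Example-compatible tf.train.Feature.
rating_feature = _float_feature(9.7)

# Step 2: map the feature name string to the encoded feature.
feature_map = {'movie_rating': rating_feature}

# Step 3: wrap the map in a Features message (inside an Example).
example = tf.train.Example(features=tf.train.Features(feature=feature_map))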
In this notebook, you will create a dataset using NumPy.

This dataset will have 4 features:
- a boolean feature, False or True with equal probability
- an integer feature uniformly randomly chosen from [0, 5]
- a string feature generated from a string table by using the integer feature as an index
- a float feature from a standard normal distribution
Consider a sample consisting of 10,000 independently and identically distributed observations from each of the above distributions:
# The number of observations in the dataset.
n_observations = int(1e4)

# Boolean feature, encoded as False or True.
feature0 = np.random.choice([False, True], n_observations)

# Integer feature, random from 0 to 4.
feature1 = np.random.randint(0, 5, n_observations)

# String feature.
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]

# Float feature, from a standard normal distribution.
feature3 = np.random.randn(n_observations)
Each of these features can be coerced into a tf.train.Example
-compatible type using one of _bytes_feature
, _float_feature
, _int64_feature
. You can then create a tf.train.Example
message from these encoded features:
def serialize_example(feature0, feature1, feature2, feature3):
  """
  Creates a tf.train.Example message ready to be written to a file.
  """
  # Create a dictionary mapping the feature name to the tf.train.Example-compatible
  # data type.
  feature = {
      'feature0': _int64_feature(feature0),
      'feature1': _int64_feature(feature1),
      'feature2': _bytes_feature(feature2),
      'feature3': _float_feature(feature3),
  }

  # Create a Features message using tf.train.Example.
  example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
  return example_proto.SerializeToString()
For example, suppose you have a single observation from the dataset, [False, 4, bytes('goat'), 0.9876]. You can create and print the tf.train.Example message for this observation using serialize_example(). Each single observation will be written as a Features message as per the above. Note that the tf.train.Example message is just a wrapper around the Features message:
# This is an example observation from the dataset.

example_observation = []

serialized_example = serialize_example(False, 4, b'goat', 0.9876)
serialized_example
b'\nR\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04[\xd3|?'
To decode the message use the tf.train.Example.FromString
method.
example_proto = tf.train.Example.FromString(serialized_example)
example_proto
features {
  feature {
    key: "feature0"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "feature1"
    value {
      int64_list {
        value: 4
      }
    }
  }
  feature {
    key: "feature2"
    value {
      bytes_list {
        value: "goat"
      }
    }
  }
  feature {
    key: "feature3"
    value {
      float_list {
        value: 0.9876000285148621
      }
    }
  }
}
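Individual fields of the decoded proto can be read with standard protobuf attribute access (a minimal sketch; a fuller dictionary-style traversal appears later in this tutorial):

# Pull a single value out of the parsed proto by feature name.
print(example_proto.features.feature['feature3'].float_list.value[0])
# 0.9876000285148621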
TFRecords format details
A TFRecord file contains a sequence of records. The file can only be read sequentially.
Each record contains a byte-string, for the data-payload, plus the data-length, and CRC-32C (32-bit CRC using the Castagnoli polynomial) hashes for integrity checking.
Each record is stored in the following format:
uint64 length
uint32 masked_crc32_of_length
byte   data[length]
uint32 masked_crc32_of_data
The records are concatenated together to produce the file. CRCs are described here, and the mask of a CRC is:
masked_crc = ((crc >> 15) | (crc << 17)) + 0xa282ead8ul
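In Python, this framing can be sketched roughly as follows, assuming the third-party crc32c package (its crc32c.crc32c() function) for the Castagnoli CRC. This is an illustration of the format, not TensorFlow's own implementation:

import struct

import crc32c  # third-party package: pip install crc32c

def masked_crc32c(data: bytes) -> int:
  # Implements the mask formula above, keeping arithmetic within 32 bits.
  crc = crc32c.crc32c(data)
  rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
  return (rotated + 0xA282EAD8) & 0xFFFFFFFF

def write_record(f, data: bytes) -> None:
  # length, masked CRC of length, payload, masked CRC of payload (little-endian).
  length_bytes = struct.pack('<Q', len(data))
  f.write(length_bytes)
  f.write(struct.pack('<I', masked_crc32c(length_bytes)))
  f.write(data)
  f.write(struct.pack('<I', masked_crc32c(data)))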
TFRecord files using tf.data
The tf.data
module also provides tools for reading and writing data in TensorFlow.
Writing a TFRecord file
The easiest way to get the data into a dataset is to use the from_tensor_slices
method.
Applied to an array, it returns a dataset of scalars:
tf.data.Dataset.from_tensor_slices(feature1)
<TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int64, name=None)>
Applied to a tuple of arrays, it returns a dataset of tuples:
features_dataset = tf.data.Dataset.from_tensor_slices((feature0, feature1, feature2, feature3))
features_dataset
<TensorSliceDataset element_spec=(TensorSpec(shape=(), dtype=tf.bool, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None), TensorSpec(shape=(), dtype=tf.string, name=None), TensorSpec(shape=(), dtype=tf.float64, name=None))>
# Use `take(1)` to only pull one example from the dataset.
for f0,f1,f2,f3 in features_dataset.take(1):
  print(f0)
  print(f1)
  print(f2)
  print(f3)
tf.Tensor(False, shape=(), dtype=bool)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(b'goat', shape=(), dtype=string)
tf.Tensor(0.5251196235602504, shape=(), dtype=float64)
Use the tf.data.Dataset.map method to apply a function to each element of a Dataset.

The mapped function must operate in TensorFlow graph mode: it must operate on and return tf.Tensors. A non-tensor function, like serialize_example, can be wrapped with tf.py_function to make it compatible.

Using tf.py_function requires you to specify the shape and type information that is otherwise unavailable:
def tf_serialize_example(f0,f1,f2,f3):
  tf_string = tf.py_function(
    serialize_example,
    (f0, f1, f2, f3),  # Pass these args to the above function.
    tf.string)         # The return type is `tf.string`.
  return tf.reshape(tf_string, ())  # The result is a scalar.
tf_serialize_example(f0, f1, f2, f3)
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04=n\x06?\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04'>
Apply this function to each element in the dataset:
serialized_features_dataset = features_dataset.map(tf_serialize_example)
serialized_features_dataset
<MapDataset element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>
def generator():
  for features in features_dataset:
    yield serialize_example(*features)
serialized_features_dataset = tf.data.Dataset.from_generator(
    generator, output_types=tf.string, output_shapes=())
serialized_features_dataset
<FlatMapDataset element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>
And write them to a TFRecord file:
filename = 'test.tfrecord'
writer = tf.data.experimental.TFRecordWriter(filename)
writer.write(serialized_features_dataset)
WARNING:tensorflow:From /tmp/ipykernel_25215/3575438268.py:2: TFRecordWriter.__init__ (from tensorflow.python.data.experimental.ops.writers) is deprecated and will be removed in a future version.
Instructions for updating:
To write TFRecords to disk, use `tf.io.TFRecordWriter`. To save and load the contents of a dataset, use `tf.data.experimental.save` and `tf.data.experimental.load`
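Following the deprecation notice in the warning above, a non-deprecated equivalent is to iterate the dataset eagerly and write each record with tf.io.TFRecordWriter (a minimal sketch):

# Write each serialized example with the pure tf.io writer instead.
with tf.io.TFRecordWriter(filename) as file_writer:
  for serialized in serialized_features_dataset:
    file_writer.write(serialized.numpy())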
Reading a TFRecord file
You can also read the TFRecord file using the tf.data.TFRecordDataset class.

More information on consuming TFRecord files using tf.data can be found in the tf.data: Build TensorFlow input pipelines guide.

Using TFRecordDatasets can be useful for standardizing input data and optimizing performance.
filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
raw_dataset
<TFRecordDatasetV2 element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>
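On the performance point above, TFRecordDataset accepts a num_parallel_reads argument and composes with prefetch; a minimal sketch (the values are illustrative):

# Read multiple record files in parallel and prefetch records ahead of the consumer.
tuned_dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=4)
tuned_dataset = tuned_dataset.prefetch(tf.data.AUTOTUNE)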
At this point the dataset contains serialized tf.train.Example
messages. When iterated over it returns these as scalar string tensors.
Use the .take method to only show the first 10 records.
for raw_record in raw_dataset.take(10):
  print(repr(raw_record))
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04=n\x06?'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x9d\xfa\x98\xbe\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03dog\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04a\xc0r?\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x92Q(?'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04>\xc0\xe5>\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nU\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04I!\xde\xbe\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x02\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x00\n\x17\n\x08feature2\x12\x0b\n\t\n\x07chicken'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xe0\x1a\xab\xbf\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nQ\n\x13\n\x08feature2\x12\x07\n\x05\n\x03cat\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x87\xb2\xd7?\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x00'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04n\xe19>\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>
<tf.Tensor: shape=(), dtype=string, numpy=b'\nR\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\x1as\xd9\xbf\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x04\n\x14\n\x08feature2\x12\x08\n\x06\n\x04goat'>
These tensors can be parsed using the function below. Note that the feature_description is necessary here because tf.data.Datasets use graph execution, and need this description to build their shape and type signature:
# Create a description of the features.
feature_description = {
    'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
}

def _parse_function(example_proto):
  # Parse the input `tf.train.Example` proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, feature_description)
Alternatively, use tf.parse_example to parse the whole batch at once. Apply this function to each item in the dataset using the tf.data.Dataset.map method:
parsed_dataset = raw_dataset.map(_parse_function)
parsed_dataset
<MapDataset element_spec={'feature0': TensorSpec(shape=(), dtype=tf.int64, name=None), 'feature1': TensorSpec(shape=(), dtype=tf.int64, name=None), 'feature2': TensorSpec(shape=(), dtype=tf.string, name=None), 'feature3': TensorSpec(shape=(), dtype=tf.float32, name=None)}>
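The batch-at-once alternative mentioned above looks roughly like this (a sketch; tf.io.parse_example is the batched counterpart of tf.io.parse_single_example):

# Batch the serialized protos, then parse each batch in a single call.
batched_parsed = raw_dataset.batch(32).map(
    lambda batch: tf.io.parse_example(batch, feature_description))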
Use eager execution to display the observations in the dataset. There are 10,000 observations in this dataset, but you will only display the first 10. The data is displayed as a dictionary of features. Each item is a tf.Tensor, and the numpy element of this tensor displays the value of the feature:
for parsed_record in parsed_dataset.take(10):
  print(repr(parsed_record))
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.5251196>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.29878703>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'dog'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.94824797>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.65749466>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.44873232>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'chicken'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-0.4338477>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-1.3367577>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=0>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'cat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=1.6851357>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=0.18152401>}
{'feature0': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'feature1': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'feature2': <tf.Tensor: shape=(), dtype=string, numpy=b'goat'>, 'feature3': <tf.Tensor: shape=(), dtype=float32, numpy=-1.6988251>}
Here, the tf.parse_example function unpacks the tf.train.Example fields into standard tensors.
TFRecord files in Python
The tf.io
module also contains pure-Python functions for reading and writing TFRecord files.
Writing a TFRecord file
Next, write the 10,000 observations to the file test.tfrecord. Each observation is converted to a tf.train.Example message, then written to file. You can then verify that the file test.tfrecord has been created:
# Write the `tf.train.Example` observations to the file.
with tf.io.TFRecordWriter(filename) as writer:
  for i in range(n_observations):
    example = serialize_example(feature0[i], feature1[i], feature2[i], feature3[i])
    writer.write(example)
du -sh {filename}
984K test.tfrecord
Reading a TFRecord file
These serialized tensors can be easily parsed using tf.train.Example.ParseFromString
:
filenames = [filename]
raw_dataset = tf.data.TFRecordDataset(filenames)
raw_dataset
<TFRecordDatasetV2 element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>
for raw_record in raw_dataset.take(1):
  example = tf.train.Example()
  example.ParseFromString(raw_record.numpy())
  print(example)
features {
  feature {
    key: "feature0"
    value {
      int64_list {
        value: 0
      }
    }
  }
  feature {
    key: "feature1"
    value {
      int64_list {
        value: 4
      }
    }
  }
  feature {
    key: "feature2"
    value {
      bytes_list {
        value: "goat"
      }
    }
  }
  feature {
    key: "feature3"
    value {
      float_list {
        value: 0.5251196026802063
      }
    }
  }
}
That returns a tf.train.Example proto which is difficult to use as is, but it's fundamentally a representation of a:

Dict[str, Union[List[float], List[int], List[str]]]
The following code manually converts the Example to a dictionary of NumPy arrays, without using TensorFlow Ops. Refer to the PROTO file for details.
result = {}
# example.features.feature is the dictionary
for key, feature in example.features.feature.items():
  # The values are the Feature objects which contain a `kind` which contains:
  # one of three fields: bytes_list, float_list, int64_list
  kind = feature.WhichOneof('kind')
  result[key] = np.array(getattr(feature, kind).value)

result
{'feature3': array([0.5251196]), 'feature1': array([4]), 'feature0': array([0]), 'feature2': array([b'goat'], dtype='|S4')}
Walkthrough: Reading and writing image data
This is an end-to-end example of how to read and write image data using TFRecords. Using an image as input data, you will write the data as a TFRecord file, then read the file back and display the image.

This can be useful if, for example, you want to use several models on the same input dataset. Instead of storing the image data raw, it can be preprocessed into the TFRecords format, and that can be used in all further processing and modelling.

First, let's download this image of a cat in the snow and this photo of the Williamsburg Bridge, NYC under construction.
Fetch the images
cat_in_snow = tf.keras.utils.get_file(
    '320px-Felis_catus-cat_on_snow.jpg',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg')

williamsburg_bridge = tf.keras.utils.get_file(
    '194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg')
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/320px-Felis_catus-cat_on_snow.jpg
24576/17858 [=========================================] - 0s 0us/step
32768/17858 [=======================================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/194px-New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg
16384/15477 [===============================] - 0s 0us/step
24576/15477 [===============================================] - 0s 0us/step
display.display(display.Image(filename=cat_in_snow))
display.display(display.HTML('Image cc-by: <a href="https://commons.wikimedia.org/wiki/File:Felis_catus-cat_on_snow.jpg">Von.grzanka</a>'))
display.display(display.Image(filename=williamsburg_bridge))
display.display(display.HTML('<a href="https://commons.wikimedia.org/wiki/File:New_East_River_Bridge_from_Brooklyn_det.4a09796u.jpg">From Wikimedia</a>'))
Write the TFRecord file
As before, encode the features as types compatible with tf.train.Example. This stores the raw image string feature, as well as the height, width, depth, and arbitrary label feature. The latter is used when you write the file to distinguish between the cat image and the bridge image. Use 0 for the cat image, and 1 for the bridge image:
image_labels = {
    cat_in_snow : 0,
    williamsburg_bridge : 1,
}
# This is an example, just using the cat image.
image_string = open(cat_in_snow, 'rb').read()

label = image_labels[cat_in_snow]

# Create a dictionary with features that may be relevant.
def image_example(image_string, label):
  image_shape = tf.io.decode_jpeg(image_string).shape

  feature = {
      'height': _int64_feature(image_shape[0]),
      'width': _int64_feature(image_shape[1]),
      'depth': _int64_feature(image_shape[2]),
      'label': _int64_feature(label),
      'image_raw': _bytes_feature(image_string),
  }

  return tf.train.Example(features=tf.train.Features(feature=feature))

for line in str(image_example(image_string, label)).split('\n')[:15]:
  print(line)
print('...')
features {
  feature {
    key: "depth"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "height"
    value {
      int64_list {
        value: 213
      }
...
Observe that all of the features are now stored in the tf.train.Example message. Next, functionalize the code above and write the example messages to a file named images.tfrecords:
# Write the raw image files to `images.tfrecords`.
# First, process the two images into `tf.train.Example` messages.
# Then, write to a `.tfrecords` file.
record_file = 'images.tfrecords'
with tf.io.TFRecordWriter(record_file) as writer:
  for filename, label in image_labels.items():
    image_string = open(filename, 'rb').read()
    tf_example = image_example(image_string, label)
    writer.write(tf_example.SerializeToString())
du -sh {record_file}
36K images.tfrecords
Read the TFRecord file
You now have the file images.tfrecords, and can now iterate over the records in it to read back what you wrote. Given that in this example you will only reproduce the image, the only feature you will need is the raw image string. Extract it using the getters described above, namely example.features.feature['image_raw'].bytes_list.value[0]. You can also use the labels to determine which record is the cat and which one is the bridge:
raw_image_dataset = tf.data.TFRecordDataset('images.tfrecords')

# Create a dictionary describing the features.
image_feature_description = {
    'height': tf.io.FixedLenFeature([], tf.int64),
    'width': tf.io.FixedLenFeature([], tf.int64),
    'depth': tf.io.FixedLenFeature([], tf.int64),
    'label': tf.io.FixedLenFeature([], tf.int64),
    'image_raw': tf.io.FixedLenFeature([], tf.string),
}

def _parse_image_function(example_proto):
  # Parse the input tf.train.Example proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, image_feature_description)

parsed_image_dataset = raw_image_dataset.map(_parse_image_function)
parsed_image_dataset
<MapDataset element_spec={'depth': TensorSpec(shape=(), dtype=tf.int64, name=None), 'height': TensorSpec(shape=(), dtype=tf.int64, name=None), 'image_raw': TensorSpec(shape=(), dtype=tf.string, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None), 'width': TensorSpec(shape=(), dtype=tf.int64, name=None)}>
Recover the images from the TFRecord file:
for image_features in parsed_image_dataset:
  image_raw = image_features['image_raw'].numpy()
  display.display(display.Image(data=image_raw))
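If you need the pixels rather than the encoded bytes, the raw string can also be decoded back into a tensor; a minimal sketch:

# Decode the JPEG bytes into an HxWxC uint8 tensor for further processing.
for image_features in parsed_image_dataset:
  image = tf.io.decode_jpeg(image_features['image_raw'])
  print(image.shape, image_features['label'].numpy())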
Source: https://www.tensorflow.org/tutorials/load_data/tfrecord