I have two TFRecords A and B of different sizes and containing different data elements.
I need to take all possible pairs of records, one from A and one from B (i.e. the Cartesian product of the two datasets). During training or testing, an epoch should end only once all combinations have been exhausted, after which the process should restart for the next epoch.
While doing this, I would of course like to specify a batch size.
I have gone through the documentation of tf.data.Dataset and have found nothing that does this directly.
Of course, this could be accomplished with a Python generator. Unfortunately, that is not useful here because, according to the documentation, Python generators are constrained by the GIL (the global interpreter lock).
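For concreteness, the generator-based version I would like to avoid looks something like this (the element names are just placeholders standing in for the parsed records of A and B):

```python
import itertools

# Placeholder elements standing in for the parsed records of TFRecords A and B.
A = ["image1", "image2", "image3"]
B = ["im1", "im2", "im3", "im4", "im5", "im6"]

def pair_generator():
    # One epoch: yield every (a, b) combination exactly once.
    for a, b in itertools.product(A, B):
        yield (a, b)

pairs = list(pair_generator())
print(len(pairs))  # 18 pairs per epoch
```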
Thus, suppose A contains {image1, image2, image3} and B contains {im1, im2, im3, im4, im5, im6}, and that I have specified a batch size of 2. Then I would like the output to be something like the following:
(image1, im1) and (image2, im4)
(image3, im2) and (image1, im2)
(image2, im1) and (image2, im3)
..............
12 more combinations (6 more batches)
and then the next epoch starts.
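To make the epoch arithmetic explicit: with 3 records in A and 6 in B there are 18 pairs in total, so a batch size of 2 yields 9 batches per epoch. A plain-Python sketch of that behaviour (the shuffle just mirrors the random-looking order in the example above):

```python
import itertools
import random

A = ["image1", "image2", "image3"]
B = ["im1", "im2", "im3", "im4", "im5", "im6"]

pairs = list(itertools.product(A, B))  # all 18 combinations for one epoch
random.shuffle(pairs)                  # pairs may come out in any order

batch_size = 2
batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
print(len(batches))  # 9 batches, after which the next epoch starts
```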
How can this be achieved with tf.data in TensorFlow?