I am reading input from stdin in a Hadoop Streaming reducer, iterating over it with a for loop.
I need to detect when the last record arrives.
I tried reading stdin once to count the total number of records and then reading it a second time for the business processing. However, stdin is a stream: each record I read
while computing total_cnt is consumed, so when I later try to read stdin for processing, no records are left.
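
    from sys import stdin

    total_cnt = 0
    for line in stdin:      # first pass: count the records
        total_cnt += 1
    for line in stdin:      # second pass: never runs, stdin is already exhausted
        pass                ## Some Processing ##
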
I don't want to store the stdin data somewhere else and read it from that location twice (once for the total record count and once for the data processing).
Is there any way to detect when the last record arrives from stdin?
I am using Python 2.7.11 and need to implement this approach in a Hadoop reducer.
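
For illustration, one way to detect the last record without a second pass might be a one-record look-ahead: hold back the current line until the next one arrives, and flag the held line as last once the stream ends. The following is only a minimal sketch under that assumption. The names `with_last_flag` and `process`, and the `LAST` tag written to the output, are hypothetical and not part of Hadoop or of the code above. Note that in a reducer, "last" would mean the last record of that reducer's own input, not of the whole job.

```python
import sys


def with_last_flag(lines):
    """Yield (line, is_last) pairs while buffering only one record at a time."""
    it = iter(lines)
    try:
        prev = next(it)          # hold back the first record
    except StopIteration:
        return                   # empty input: nothing to yield
    for line in it:
        yield prev, False        # a later record exists, so prev is not last
        prev = line
    yield prev, True             # stream ended: the held record is the last one


def process(stream, out):
    """Hypothetical reducer body: tag the final record, pass the rest through."""
    for line, is_last in with_last_flag(stream):
        record = line.rstrip("\n")
        if is_last:
            out.write("LAST\t%s\n" % record)
        else:
            out.write("%s\n" % record)
```

In the reducer script this would be driven with `process(sys.stdin, sys.stdout)`. Only one line is kept in memory at a time, so nothing has to be stored and re-read, and the sketch avoids Python 3 only syntax so it should also run on 2.7.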