Trouble reading large file via Stdin

What happens

I need to read a large data file, “words.txt” (~64k lines), to solve CA’s #131 challenge, but using cat to feed these words to my code line by line is too slow.

What do you understand or find about that problem

I am using cat words.txt DATA.lst | ./[my filename] to send the words first and then the problem’s input data. However, cat takes a very long time to send them all: I tried to measure the time, but around the 3-minute mark not even half of the ~64k words were in yet and my editor nearly crashed. The process also gets the poor Thinkpad’s cooling fan revving loudly and tends to crash before the code can deliver any output.

Did you make any workaround? What did you do?

I first tried to simply read the entire word file in one go and then read DATA.lst line by line from stdin. The entire running time, including the code’s own processing, was around 5 seconds, and I found out that the other accepted answers also use this approach.

(Optional) Why does your workaround fail?

It doesn’t, but apparently it doesn’t comply with the submissions’ input rule (#6) either, so my MR was closed. I’d argue, though, that the rule only explicitly says the code must read the DATA.lst file, and my code complies with that.

Evidence

  • Before input change (read entire file approach)
    Runs nearly instantly, taking 5 seconds at most, and produces the code’s output.
    image

  • After input change (feed the words.txt file through stdin line by line)
    I am printing the length of the list to gauge the reading progress. It crashed at around 22k words out of 64k total.
    image

I need help with

Either relaxing the input rule for this scenario a bit or a fast way to process this file without the code crashing and my laptop’s cooling fan performing its best attempt to mimic a wind tunnel. :wind_face: :dizzy_face:

Hi! This problem isn’t caused by reading a big input. It’s due to the way you are building the list that you are going to post-process.

image

FileContents ++ [Line] is a pretty slow way to build your list of data: every append has to copy the whole list built so far, so the total work grows quadratically with the number of lines. I already found a way to create a complete list from both files, DATA.lst and words.txt, in less than a second, and in fact you implemented that function in your last MR.
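To illustrate why the append pattern hurts, here is a minimal sketch in Python. It is only an analogy: the original code is visible solely in the screenshot, so the function names below are hypothetical and not taken from the MR. The first function rebuilds the list on every line, which mirrors FileContents ++ [Line]; the second accumulates in place. In a language with linked lists, the usual equivalent fix is to prepend each line and reverse the list once at the end.

```python
import time


def build_by_append_copy(lines):
    """Mimic `FileContents ++ [Line]`: each step builds a brand-new list."""
    contents = []
    for line in lines:
        contents = contents + [line]  # copies every element collected so far
    return contents


def build_by_accumulate(lines):
    """Collect the lines in amortized constant time per line."""
    contents = []
    for line in lines:
        contents.append(line)  # no copy of the existing elements
    return contents


def timed(fn, lines):
    """Return (result, elapsed seconds) for one call of fn(lines)."""
    start = time.perf_counter()
    result = fn(lines)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic stand-in for the word list; not the real words.txt.
    sample = [f"word{i}" for i in range(20_000)]
    slow, slow_s = timed(build_by_append_copy, sample)
    fast, fast_s = timed(build_by_accumulate, sample)
    assert slow == fast
    print(f"copying append: {slow_s:.3f}s, accumulate: {fast_s:.3f}s")
```

Both functions return the same list; only the amount of copying differs, and the gap widens as the input grows.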

The idea is to read all the data at once, split it into two lists if you consider it necessary, and then post-process.

Can you be more specific about how you did it? I tried using read_file_as_string, but to be honest I can’t get the input to work at all.

Since the requirement is to read from stdin, I just implemented this function; you don’t need the Words var…

image

And execute cat DATA.lst words.lst | ./test
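
The read-everything-then-split idea could look roughly like the Python sketch below. This is not the function from the screenshot. It assumes the DATA.lst lines come first on stdin and that the number of those lines is known; `split_input` and `data_line_count` are hypothetical names, and how to find the real count depends on the challenge’s input format, which this thread does not show.

```python
import sys


def split_input(text, data_line_count):
    """Split the whole input into (data_lines, word_lines).

    Assumption: the first `data_line_count` lines belong to DATA.lst and
    everything after them is the word list.
    """
    lines = text.splitlines()
    return lines[:data_line_count], lines[data_line_count:]


def main():
    text = sys.stdin.read()  # one bulk read instead of a per-line loop
    data_line_count = 1      # hypothetical value; adjust to the real format
    data_lines, words = split_input(text, data_line_count)
    print(len(data_lines), len(words))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as split_sketch.py, it would be run the same way as above: cat DATA.lst words.lst | python3 split_sketch.py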

Yeah that was it, the helper pred was causing that issue, thanks.