Trouble reading large file via Stdin

gusic-fan · March 12, 2021, 12:13am

What happens

I need to read a large data file (~64k lines) “words.txt” to solve CA’s #131 challenge. But using cat to feed the code these words line by line is too slow.

What do you understand or find about that problem

I am using cat words.txt DATA.lst | ./[my filename] to send the words first and then the problem’s input data. However, cat takes a very long time to send them all: I tried to measure the time, but around the 3-minute mark not even half of the 64k words were in yet and my editor nearly crashed. This process also gets the poor Thinkpad’s cooling fan revving loudly and tends to crash before the code can deliver any output.

Did you make any workaround? What did you do?

I first tried to simply read the entire word file in one go and then read DATA.lst line by line from stdin. The entire running time, including the code’s own processing, was about 5 seconds, and I found that the other accepted answers also use this approach.

(Optional) Why does your workaround fail?

It doesn’t, but apparently it doesn’t comply with the submissions’ input rule (#6) either and my MR was closed. Though I’d argue that the rule only explicitly mentions that the code must read the DATA.lst file, and my code complies with that.

Evidence

Before input change (read entire file approach):
Runs nearly instantly, taking 5 seconds at most, and produces the code’s output.
After input change (feed words.txt through stdin line by line):
I print the length of the list to gauge the reading progress; it crashed at around 22k words out of 64k total.

I need help with

Either relaxing the input rule for this scenario a bit or a fast way to process this file without the code crashing and my laptop’s cooling fan performing its best attempt to mimic a wind tunnel.

uneasy-ruler · March 12, 2021, 2:39pm

Hi! This problem is not caused by reading a big input. It is due to the way you build the list that you are going to post-process.

FileContents ++ [Line] is a pretty slow way to build your list of data. I already found a way to create a complete list from both files, DATA.lst and words.txt, in less than a second, and indeed you implemented that function in your last MR.
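
To illustrate why appending one element at a time with list concatenation gets so slow, here is a minimal sketch in Python (not the language used in the thread, and not the code from the MR). Each `contents + [line]` copies the whole list built so far, so the total work grows roughly with the square of the number of lines, while accumulating in place stays roughly linear. In languages with immutable linked lists, the usual equivalent fix is to prepend each line and reverse the list once at the end. The function names below are hypothetical.

```python
import time


def build_by_concat(lines):
    """Mimics `FileContents ++ [Line]`: a new list is created on every step."""
    contents = []
    for line in lines:
        contents = contents + [line]  # copies every element gathered so far
    return contents


def build_in_place(lines):
    """Accumulates in place; each step costs amortized constant time."""
    contents = []
    for line in lines:
        contents.append(line)
    return contents


if __name__ == "__main__":
    # Synthetic input; the size is an arbitrary choice for the demonstration.
    lines = [f"word{i}" for i in range(20_000)]
    for build in (build_by_concat, build_in_place):
        start = time.perf_counter()
        build(lines)
        print(build.__name__, f"{time.perf_counter() - start:.3f}s")
```

Both functions return the same list; only the cost of getting there differs, and the gap widens quickly as the input grows.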

The idea is to read all the data at once, split it into two lists if you consider it necessary, and then post-process.
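
A rough sketch of that idea, again in Python rather than the thread's language: read everything from stdin in one call, split it into lines once, and then cut the result into two lists. How to find the boundary between the two files depends on the challenge's input format, which is not described here, so the `first_count` parameter (the number of lines belonging to the first file) and the function name are assumptions made purely for illustration.

```python
import sys


def split_input(text, first_count):
    """Split the whole input into the first `first_count` lines and the rest.

    `first_count` is a hypothetical parameter: the real boundary between the
    two concatenated files depends on the challenge's input format.
    """
    lines = text.splitlines()
    return lines[:first_count], lines[first_count:]


if __name__ == "__main__":
    # Read all of stdin at once instead of growing a list line by line.
    first_part, second_part = split_input(sys.stdin.read(), first_count=1)
    print(len(first_part), len(second_part))
```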

gusic-fan · March 12, 2021, 6:39pm

Can you be more specific about how you did it? I tried using read_file_as_string, but to be honest I can’t get the input to work at all.

uneasy-ruler · March 12, 2021, 6:53pm

Since the problem is about reading stdin, I just implemented this function; you don’t need the Words var…

Then run cat DATA.lst words.lst | ./test

gusic-fan · March 12, 2021, 7:58pm

Yeah, that was it: the helper pred was causing the issue. Thanks.
