QUESTION: Hi, I'm writing a program using the GEANT4 Monte Carlo toolkit. It's all object oriented programming- I only took C++ 101 in undergrad. I have my program completed except for one part.
Here is my question: I need to read in a 1-2 GB text file, three lines at a time. I don't want to load the whole file into memory (the program already requires ~700 MB). Also, my understanding of getline() is that it scans lines until you tell it to stop, so my program would continually slow down as getline() had to go further into the file. Is it possible to have a persistent pointer (?) to the file/location in the file? Or is there a good, efficient way to jump to a specific line in the file?
All I can think of now is to break my files into 100 MB chunks, load one, use the data from it, free that memory, and load the next... not very elegant, but maybe the easiest.
My files are ASCII and I'm writing on a Fedora 6 system. Also, although most lines have the same number of characters, no line is guaranteed to have the same number of characters in it.
ANSWER: getline reads one line at a time. http://www.cplusplus.com/reference/iostream/istream/getline.html
So just call it as many times as you need, reading three lines at a time. The underlying code buffers in larger chunks and serves the lines from that buffer, so you'll find it about as efficient as reading in 1 MB yourself and processing it locally. There is also a way to control the size of the buffer the stream uses, but the default is probably adequate.
---------- FOLLOW-UP ----------
QUESTION: Thank you for your response, I really appreciate it!
I kinda meant for my question to be less on how to read data in the file and more on how to efficiently skip to, for example, line number 3,140,327 of a text file. Is there a good, efficient way to skip to an arbitrary line in a text file? If so, how could that be achieved?
Ah, OK. I think the fastest way to do this is to use raw I/O, read big chunks into memory, and scan with a tight loop, like:
char *skipLines( char *buffer, int toLine )
{
    int count = 0;
    char *ptr = buffer;
    while( count < toLine )
        if( *ptr++ == '\n' )
            ++count;
    return ptr;   // first character after the skipped lines
}
This assumes the buffer holds enough data to reach toLine. Checking for the end of the buffer, reloading more, and continuing isn't hard, but it takes time. Your system also has memory-mapped files (mmap on Linux), and you might consider mapping the disk file into memory instead of reading it in.
I think I told you what you expected here. You can count on the memory-versus-speed trade-off: the more memory you use (e.g. loading the whole file), the more speed you get. You may find that loading the whole file, which will force the OS to use paged memory, is faster than reading in chunks.
I recommend trying several methods and timing them. Sometimes I've been surprised when I thought I'd done something the fastest way, only to find out it wasn't.