Question
QUESTION: Hi, I'm writing a program using the GEANT4 Monte Carlo toolkit. It's all object-oriented programming, and I only took C++ 101 in undergrad. My program is complete except for one part.

Here is my question: I need to read a 1-2 GB text file, three lines at a time. I don't want to load the whole file into memory (the program already requires ~700 MB). Also, my understanding of getline() is that it scans lines until you tell it to stop, so my program would continually slow down as getline() had to go further into the file. Is it possible to have a persistent pointer (?) to the file/location in the file? Or is there a good, efficient way to jump to a specific line in the file?

All I can think of now is to break my files into 100 MB chunks, load one, use its data, free that chunk of memory, and load the next... not very elegant, but maybe the easiest.

My files are ASCII and I'm writing on a Fedora 6 system. Also, although most lines have the same number of characters, no line is guaranteed to be the same length.

Thank you,
Michael

ANSWER: getline reads one line at a time.  http://www.cplusplus.com/reference/iostream/istream/getline.html

So just read three lines at a time as you need them.  The underlying code buffers in larger chunks and hands you lines from that buffer, so you'll find it about as efficient as reading in 1 MB yourself and processing it locally.  There is also a way to control the size of the stream's buffer.  The default is probably adequate.

Bill

---------- FOLLOW-UP ----------

QUESTION: Thank you for your response, I really appreciate it!

I kinda meant for my question to be less on how to read data in the file and more on how to efficiently skip to, for example, line number 3,140,327 of a text file.  Is there a good, efficient way to skip to an arbitrary line in a text file?  If so, how could that be achieved?

Thank you,
Michael

Answer
Ah, OK.  I think the fastest way to do this is to use raw I/O: read big chunks into memory and skip lines with a tight loop, like:

// Returns a pointer to the start of line `toLine` (counting from 0),
// assuming `buffer` holds enough data to reach it.
char *skipLines( char *buffer, int toLine )
{
 int count = 0;        // must start at zero; it was uninitialized before
 char *ptr = buffer;

 while( count < toLine )
   if( *ptr++ == '\n' )
     ++count;
 return ptr;
}

This assumes you have enough in buffer to get to toLine.  Checking for the end of the buffer, reloading more, and continuing isn't hard, but it takes time.  Since you're on Fedora, note that Linux has memory-mapped files (mmap), and you might consider mapping the disk file into memory instead of copying it.
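The memory-mapped approach can be sketched with POSIX mmap(), which is available on the asker's Fedora system; the `mapFile`/`unmapFile` helper names are my own, not a standard API:

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

// Map an entire file read-only.  The kernel pages data in on demand, so
// even a 1-2 GB file is not copied into the process up front.  Returns
// nullptr on failure; on success, `length` receives the file size.
const char *mapFile(const char *path, std::size_t &length)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return nullptr;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return nullptr; }

    void *p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    // the mapping stays valid after close
    if (p == MAP_FAILED)
        return nullptr;

    length = static_cast<std::size_t>(st.st_size);
    return static_cast<const char *>(p);
}

void unmapFile(const char *data, std::size_t length)
{
    munmap(const_cast<char *>(data), length);
}
```

The returned pointer can then be scanned for '\n' exactly like the buffer in skipLines() above, with no explicit chunked reloading.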

I think I told you what you expected here.  You can count on the usual memory-versus-speed trade-off: the more memory you use (i.e. loading the whole file), the more speed you get.  You may find that loading the whole file, which will force the OS to use paged memory, is faster than reading in chunks.

I recommend trying several methods and timing them.  Sometimes I've been surprised when I think I've done something the fastest way only to find out it wasn't.

Bill

Bill A

Expertise

I can answer questions about C++, programming algorithms, Windows programming in MFC (which is C++). I cannot answer questions about STL (templates) and I have no experience with Linux. I do enjoy reviewing code and critiquing it or finding problems in it. I will also gladly show better algorithms or methods if you want to take advantage of that.

Experience

I've developed a commercial embedded C compiler/assembler and IDE with debugger toolset, of which the IDE and debugger are written in C++. I work in the industry writing high tech embedded programs and Windows programs to communicate with the embedded devices.

Publications
Book: Embedded Systems Design using the Rabbit 3000 Microprocessor (authored Chapter 10 in its entirety).

Education/Credentials
BS Computer Engineering

©2016 About.com. All rights reserved.