C++/writing a struct of variable size in a file

Question
Hi, I have a question. In my program I'm trying to write a struct of variable size to a file for random access later; for example, playlist number 1 has the variables artist and title in it. I already tried to build the struct from char* members, and I couldn't get it working with string either. Take a look at my code:

struct MusicRecord
{
  string Artist;
  string Album;
  string Title;
  string Path;
};

...

void FileSystem::ALP_InsertRecord(MusicRecord &Rec, int Index)
{
  ofstream Stream( "playlist.dat", ios::out|ios::binary);
  Stream.seekp( ( Index ) * sizeof ( MusicRecord ) );
  Stream.write( (char*) &Rec, sizeof( MusicRecord ) );
  Stream.close();
}

void FileSystem::ALP_GetRecord(MusicRecord &Rec, int Index)
{
  ifstream Stream( "playlist.dat", ios::binary|ios::in );
  Stream.seekg( ( Index ) * sizeof( MusicRecord ) );
  Stream.read( (char *) &Rec, sizeof( MusicRecord ) );
  Stream.close();
}

Any ideas? If you want to use char* instead of string, no problem.

Thank you

Answer
I am surprised that you have not twigged the problem here. You are trying to use random access using a fixed size on data you (rightly) term in your question subject "variable size".

In fact using std::string (I shall just use string from now on) or char * will give you a struct of a fixed size. The sizes of char * and string are fixed, but the character data will most likely be held elsewhere, even for string. These types are, or contain, pointers to the character data, and when you blat an instance of the struct to your file you just write the char * pointer, or any internal pointers in string, to the file. You do not write what these pointers (memory addresses) point to. Worse, loading these structs back into memory will leave the pointers pointing to invalid memory, as they still hold the addresses of their previous data. You will only possibly get away with this in trivial programs where you write the data then read it back without destroying the original data first.
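The point about fixed object size can be checked directly. The following is only a minimal sketch: the helper names SameObjectSize, CharactersStoredElsewhere and DemoStringLayout are made up for illustration, and the actual value of sizeof(std::string) differs between compilers and build settings.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical helper: true if the two string objects occupy the same number
// of bytes, however many characters each one holds.
bool SameObjectSize(const std::string& a, const std::string& b)
{
    return sizeof(a) == sizeof(b);   // sizeof depends only on the type
}

// Hypothetical helper: true if the characters of s live outside the bytes of
// the string object itself, so that writing sizeof(s) bytes would miss them.
bool CharactersStoredElsewhere(const std::string& s)
{
    const char* objBegin = reinterpret_cast<const char*>(&s);
    const char* objEnd   = objBegin + sizeof(s);
    std::less<const char*> before;   // gives a usable ordering for unrelated pointers
    return before(s.data(), objBegin) || !before(s.data(), objEnd);
}

// Usage sketch.
void DemoStringLayout()
{
    const std::string shortText("Muse");
    const std::string longText(200, 'x');   // longer than the string object itself on common implementations

    assert(SameObjectSize(shortText, longText));
    assert(CharactersStoredElsewhere(longText));
    // Whether shortText keeps its characters inside the object is
    // implementation-specific, so nothing is asserted about it here.
}
```

Writing sizeof(MusicRecord) bytes therefore captures the string objects but not necessarily their characters.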

In the case of some string implementations you might get some character data written to your record, as they may cache short strings directly in the structure of the class. Longer strings will still be stored elsewhere, however. You will also store much more information than you require. Here is an example use of your struct for a possible MS Windows system setup:

 MusicRecord mr, *pmr(&mr);
 size_t mr_size = sizeof(mr);

 mr.Album  = "Absolution";
 mr.Artist = "Muse";
 mr.Title  = "Hysteria";
 mr.Path   = "C:\\Documents and Settings\\Whoever\\My Documents"
         "\\My Music\\Muse.Absolution.Hysteria.08.mp3";

The path is a long string so I split it into two parts. It is still a single string, because the compiler joins adjacent string literals (this is a little known feature of C and C++). I have also defined a pointer to the MusicRecord and obtained the size of the struct. Compiling under MS Visual C++ 8 (aka 2005) for Win32 we find that MusicRecord is 128 bytes in size, thus each string is 32 bytes in size. In memory the struct looks like the following:

mr
.Artist
   00 00 00 00 cc cc cc cc 4d 75 73 65 00 cc cc cc  ........Muse....
   cc cc cc cc cc cc cc cc 04 00 00 00 0f 00 00 00  ................

.Album
   00 00 00 00 cc cc cc cc 41 62 73 6f 6c 75 74 69  ........Absoluti
   6f 6e 00 cc cc cc cc cc 0a 00 00 00 0f 00 00 00  on..............

.Title
   00 00 00 00 cc cc cc cc 48 79 73 74 65 72 69 61  ........Hysteria
   00 cc cc cc cc cc cc cc 08 00 00 00 0f 00 00 00  ................

.Path
   00 00 00 00 cc cc cc cc 28 60 b7 01 cc cc cc cc  ........(`......
   cc cc cc cc cc cc cc cc 57 00 00 00 5f 00 00 00  ........W..._...

I have added annotations to show which bytes belong to which members of the mr MusicRecord. They do however form a single contiguous block of memory. As you can see the values for the first three strings are short enough to be kept within the strings themselves. The last eight bytes of each string appear to hold its length and capacity: for Artist these are 4 and 15 (0x0f), and for Path 87 (0x57) and 95 (0x5f). The path string is too long and is held elsewhere. In its case, the part of the structure that held the character data for the other strings instead holds a pointer to the character data at 0x01b76028 (you have to place the bytes in reverse order to get the address, as x86 is little-endian). Pointing the VC++ debugger memory view at this location reveals:

   43 3a 5c 44 6f 63 75 6d 65 6e 74 73 20 61 6e 64  C:\Documents and
   20 53 65 74 74 69 6e 67 73 5c 57 68 6f 65 76 65   Settings\Whoeve
   72 5c 4d 79 20 44 6f 63 75 6d 65 6e 74 73 5c 4d  r\My Documents\M
   79 20 4d 75 73 69 63 5c 4d 75 73 65 2e 41 62 73  y Music\Muse.Abs
   6f 6c 75 74 69 6f 6e 2e 48 79 73 74 65 72 69 61  olution.Hysteria
   2e 30 38 2e 6d 70 33 00 cd cd cd cd cd cd cd cd  .08.mp3.........

So handling strings and other data referred to by pointers (or references) requires more work than just blatting the memory image to a file. The same is true of handling variable sized data, and of leaving out data that is irrelevant to the persistent state of the object. The VC++ std::string implementation is a good example of the last point: all you need are the characters and the length of the string, not all the internal housekeeping state of the std::string implementation.

So what options do you have? Well, the most obvious is to use fixed size strings for each field. However this means defining a maximum length for each field, which wastes space for strings that are shorter than the maximum and will still be too short for the odd one or two cases:

   struct MusicRecord
   {
       char Artist[128];
       char Album[128];
       char Title[128];
       char Path[256];
   };

You also have to be wary of security problems. If you have strings that are shorter than the maximum then the rest of the struct will contain 'junk'. However this so called 'junk' may contain data useful to a cracker (more commonly, and incorrectly, called a hacker these days). So you have to ensure that each struct is initialised with, say, zeros, or at least that the unused regions are cleared before writing to file.
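One way to keep junk out of such a record is to clear it before filling it in and to bound every copy. This is a minimal sketch only; the names FixedMusicRecord, SetField, MakeFixedRecord and DemoFixedRecord are invented for illustration, and silently truncating over-long strings is an assumption that may not suit every program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct FixedMusicRecord
{
    char Artist[128];
    char Album[128];
    char Title[128];
    char Path[256];
};

// Hypothetical helper: copies text into a fixed-size field, truncating it if
// it is too long, and zero-fills every remaining byte including the terminator.
template <std::size_t N>
void SetField(char (&field)[N], const std::string& text)
{
    const std::size_t count = text.size() < N - 1 ? text.size() : N - 1;
    std::memcpy(field, text.data(), count);
    std::memset(field + count, 0, N - count);
}

// Hypothetical helper: builds a record with no uninitialised bytes in it.
FixedMusicRecord MakeFixedRecord(const std::string& artist,
                                 const std::string& album,
                                 const std::string& title,
                                 const std::string& path)
{
    FixedMusicRecord rec;
    std::memset(&rec, 0, sizeof rec);   // clear everything first
    SetField(rec.Artist, artist);
    SetField(rec.Album, album);
    SetField(rec.Title, title);
    SetField(rec.Path, path);
    return rec;
}

// Usage sketch.
void DemoFixedRecord()
{
    const FixedMusicRecord rec =
        MakeFixedRecord("Muse", "Absolution", "Hysteria", "Hysteria.mp3");
    assert(std::strcmp(rec.Artist, "Muse") == 0);
    assert(rec.Artist[127] == 0);       // the unused tail is zero, not junk
}
```

A record built this way can be written with a single write call of sizeof(FixedMusicRecord) bytes, since it contains no pointers.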

The other options are more complex. Basically you store all strings in a special section of your file, one after another. However you track exactly where these strings are written. You can still use a fixed size record but it is not a direct match for the form you wish your data to appear in. In the file each record now looks like:

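   typedef long offset_t;   // assumption: offset_t is not a standard type, so it
                            // is taken here to be an integer that can hold a
                            // file offset

   struct MusicFileRecord
   {
       size_t   ArtistLength;
       offset_t ArtistFilePosition;
       size_t   AlbumLength;
       offset_t AlbumFilePosition;
       size_t   TitleLength;
       offset_t TitleFilePosition;
       size_t   PathLength;
       offset_t PathFilePosition;
   };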

An alternative approach is to just store the file offset positions and store the length of each string just before the character data at the specified file position. I shall go with the above scheme for this example.
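For comparison, the alternative scheme (length stored just before the characters) might be sketched as follows. The function names are invented, the 32-bit length is an assumption rather than anything the scheme requires, and the length is written in the machine's native byte order, so the file would not be portable between machines with different byte orders. As elsewhere, error checking is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Appends a 32-bit length followed by the characters of text to the end of
// the string data, and returns the offset at which the length was written.
std::streamoff AppendString(std::ostream& strings, const std::string& text)
{
    assert(text.size() <= UINT32_MAX);
    strings.seekp(0, std::ios_base::end);
    const std::streamoff position = strings.tellp();
    const std::uint32_t length = static_cast<std::uint32_t>(text.size());
    strings.write(reinterpret_cast<const char*>(&length), sizeof length);
    strings.write(text.data(), static_cast<std::streamsize>(length));
    return position;
}

// Reads back the string whose length prefix starts at position.
std::string ReadStringAt(std::istream& strings, std::streamoff position)
{
    std::uint32_t length = 0;
    strings.seekg(position);
    strings.read(reinterpret_cast<char*>(&length), sizeof length);
    std::string text(length, '\0');
    if (length > 0)
        strings.read(&text[0], static_cast<std::streamsize>(length));
    return text;
}

// Usage sketch, with an in-memory stream standing in for the string data file.
void DemoLengthPrefixedStrings()
{
    std::stringstream strings;
    const std::streamoff artist = AppendString(strings, "Muse");
    const std::streamoff album  = AppendString(strings, "Absolution");
    assert(ReadStringAt(strings, album) == "Absolution");
    assert(ReadStringAt(strings, artist) == "Muse");
}
```

With this layout each fixed size record only needs to hold one offset per field, at the cost of an extra read for every string.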

Your file is now composed of 2 parts: the fixed sized records part and the variable sized string part. In fact it is easiest to start with 2 separate files. To access a record you seek for it as before, using the size of the MusicFileRecord in the fixed size record file.

Once you have the MusicFileRecord you use the data in it to build a MusicRecord by seeking to the position of the character data for each of the fields and reading the number of characters for that field from the string data file. Assuming the files exist and are open on the streams recordStream and stringDataStream, and with no error checking etc., the code might look like:

   void GetMusicRecord( int index, MusicRecord & mr )
   {
       recordStream.seekg( index * sizeof(MusicFileRecord) );
       
       MusicFileRecord mfr;
       recordStream.read( (char *)&mfr, sizeof(MusicFileRecord) );

       size_t maxFieldLength( std::max( std::max( mfr.ArtistLength, mfr.AlbumLength )
                                      , std::max( mfr.TitleLength, mfr.PathLength )
                                      )
                            );

       char * charBuf = new char[maxFieldLength+1];

       stringDataStream.seekg( mfr.ArtistFilePosition );
       stringDataStream.read( charBuf, mfr.ArtistLength );
       charBuf[mfr.ArtistLength] = '\0';
       mr.Artist = charBuf;

       stringDataStream.seekg( mfr.AlbumFilePosition );
       stringDataStream.read( charBuf, mfr.AlbumLength );
       charBuf[mfr.AlbumLength] = '\0';
       mr.Album = charBuf;

    // And similarly for Title and Path...

       delete [] charBuf;
   }

Things to note: first a MusicFileRecord is read, and it is then used to read the strings and assign them to the passed in MusicRecord. Each string is read from stringDataStream into a dynamically allocated char buffer whose size is that of the longest field in the record plus one for the terminator, determined via repeated calls to std::max (include <algorithm>). This buffer is terminated by hand after the raw string data is read and is then used to assign the string to the relevant member of the passed in MusicRecord. It is manually deallocated; a smart pointer could have been used here instead. Finally, if anything goes wrong halfway through fixing up the passed in MusicRecord then the record will be left in an invalid state, having only been partially updated.
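The last two points (manual deallocation and partial updates) can both be avoided. The sketch below is one possible variation, not the code above: it passes the streams in as parameters, uses std::streamoff where the record struct above uses offset_t, reads each field straight into a std::string so there is no buffer to delete, and fills in a temporary record that is swapped into place only once every read has succeeded. The names ReadField and DemoGetMusicRecord are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

struct MusicRecord
{
    std::string Artist, Album, Title, Path;
};

struct MusicFileRecord   // std::streamoff stands in for offset_t (an assumption)
{
    std::size_t ArtistLength;  std::streamoff ArtistFilePosition;
    std::size_t AlbumLength;   std::streamoff AlbumFilePosition;
    std::size_t TitleLength;   std::streamoff TitleFilePosition;
    std::size_t PathLength;    std::streamoff PathFilePosition;
};

// Reads length characters starting at position; throws if they cannot be read.
std::string ReadField(std::istream& strings, std::streamoff position, std::size_t length)
{
    std::string field(length, '\0');          // the string owns the buffer
    strings.seekg(position);
    if (length > 0)
        strings.read(&field[0], static_cast<std::streamsize>(length));
    if (!strings)
        throw std::runtime_error("could not read string data");
    return field;
}

void GetMusicRecord(std::istream& records, std::istream& strings,
                    int index, MusicRecord& mr)
{
    MusicFileRecord mfr;
    records.seekg(static_cast<std::streamoff>(index)
                  * static_cast<std::streamoff>(sizeof mfr));
    records.read(reinterpret_cast<char*>(&mfr), sizeof mfr);
    if (!records)
        throw std::runtime_error("could not read record");

    MusicRecord fresh;                        // completed before mr is touched
    fresh.Artist = ReadField(strings, mfr.ArtistFilePosition, mfr.ArtistLength);
    fresh.Album  = ReadField(strings, mfr.AlbumFilePosition,  mfr.AlbumLength);
    fresh.Title  = ReadField(strings, mfr.TitleFilePosition,  mfr.TitleLength);
    fresh.Path   = ReadField(strings, mfr.PathFilePosition,   mfr.PathLength);
    std::swap(mr, fresh);                     // mr changes only on full success
}

// Usage sketch with in-memory streams standing in for the two files.
void DemoGetMusicRecord()
{
    std::stringstream strings("MuseAbsolutionHysteriaa.mp3");
    const MusicFileRecord mfr = { 4, 0, 10, 4, 8, 14, 5, 22 };
    std::stringstream records;
    records.write(reinterpret_cast<const char*>(&mfr), sizeof mfr);

    MusicRecord mr;
    GetMusicRecord(records, strings, 0, mr);
    assert(mr.Artist == "Muse" && mr.Album == "Absolution");
    assert(mr.Title == "Hysteria" && mr.Path == "a.mp3");
}
```

If any read fails, an exception is thrown and the caller's record is left exactly as it was.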

To create the files we do the reverse: create a MusicFileRecord from a MusicRecord after writing the strings to stringDataStream, so that we have the string file offset values to hand.

I am running out of room for this answer (I am only allowed 10000 characters), so I will leave this for you to work out. I would like to point out some problems with updating existing records. If you change a record then the new string data may or may not fit over the old data. If it does not then you will have to allocate a new string at the end of the string data and update the MusicFileRecord with the new offset(s). Deleting records has a similar problem. You cannot just remove the record and string data from a file, so you mark the record as dead instead (say by zeroing all the fields). Over time your data files will accumulate more of these dead areas, so it makes sense to rebuild them every now and then, which will probably give records new index values.
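As a possible starting point for the write side, the sketch below appends a new record: each string goes onto the end of the string data, its offset is noted, and the completed MusicFileRecord is then appended to the record data. It is one way of doing it under stated assumptions, not the only one: the streams are passed in as parameters, std::streamoff is used where the record struct above uses offset_t, the names AppendField, AppendMusicRecord and DemoAppendMusicRecord are invented, and there is no error checking. Updating or deleting existing records is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

struct MusicRecord
{
    std::string Artist, Album, Title, Path;
};

struct MusicFileRecord   // std::streamoff stands in for offset_t (an assumption)
{
    std::size_t ArtistLength;  std::streamoff ArtistFilePosition;
    std::size_t AlbumLength;   std::streamoff AlbumFilePosition;
    std::size_t TitleLength;   std::streamoff TitleFilePosition;
    std::size_t PathLength;    std::streamoff PathFilePosition;
};

// Appends the characters of text to the end of the string data and returns
// the offset at which they start.
std::streamoff AppendField(std::ostream& strings, const std::string& text)
{
    strings.seekp(0, std::ios_base::end);
    const std::streamoff position = strings.tellp();
    strings.write(text.data(), static_cast<std::streamsize>(text.size()));
    return position;
}

// Appends mr as a new record and returns the index of that record.
int AppendMusicRecord(std::ostream& records, std::ostream& strings,
                      const MusicRecord& mr)
{
    MusicFileRecord mfr;
    std::memset(&mfr, 0, sizeof mfr);         // no junk in any padding bytes

    mfr.ArtistLength       = mr.Artist.size();
    mfr.ArtistFilePosition = AppendField(strings, mr.Artist);
    mfr.AlbumLength        = mr.Album.size();
    mfr.AlbumFilePosition  = AppendField(strings, mr.Album);
    mfr.TitleLength        = mr.Title.size();
    mfr.TitleFilePosition  = AppendField(strings, mr.Title);
    mfr.PathLength         = mr.Path.size();
    mfr.PathFilePosition   = AppendField(strings, mr.Path);

    records.seekp(0, std::ios_base::end);
    const std::streamoff end = records.tellp();
    records.write(reinterpret_cast<const char*>(&mfr), sizeof mfr);
    return static_cast<int>(end / static_cast<std::streamoff>(sizeof mfr));
}

// Usage sketch with in-memory streams standing in for the two files.
void DemoAppendMusicRecord()
{
    std::stringstream records, strings;
    const MusicRecord a = { "Muse", "Absolution", "Hysteria", "a.mp3" };
    const MusicRecord b = { "Muse", "Showbiz", "Unintended", "b.mp3" };
    const int first  = AppendMusicRecord(records, strings, a);
    const int second = AppendMusicRecord(records, strings, b);
    assert(first == 0 && second == 1);
    assert(strings.str() == "MuseAbsolutionHysteriaa.mp3MuseShowbizUnintendedb.mp3");
}
```

With real files, both streams would need to be opened in binary mode without truncating the existing contents.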

You can combine the two files into one but then would have to leave some space for new records and you would require an extra offset to be fixed up at the start of the file to indicate the offset to the start of the second part. Getting around the problems of having no space to store more records or strings because you have run into the other part of the file would require your file to be composed of chunks containing records or strings and storing offsets to each of the chunks.  

Ralph McArdell

©2016 About.com. All rights reserved.