C++/writing a struct of variable size in a file
Hi, I have a question. In my program I am trying to write a struct of variable size to a file for random access later; for example, playlist no. 1 has the variables artist and title in it. I already tried to build the struct with char* members, but I could not get it to work, nor could I with string. Take a look at my code:
void FileSystem::ALP_InsertRecord(MusicRecord &Rec, int Index)
{
    ofstream Stream( "playlist.dat", ios::out|ios::binary);
    Stream.seekp( ( Index ) * sizeof ( MusicRecord ) );
    Stream.write( (char*) &Rec, sizeof( MusicRecord ) );
}
void FileSystem::ALP_GetRecord(MusicRecord &Rec, int Index)
{
    ifstream Stream( "playlist.dat", ios::binary|ios::in );
    Stream.seekg( ( Index ) * sizeof( MusicRecord ) );
    Stream.read( (char *) &Rec, sizeof( MusicRecord ) );
}
Any ideas? If you want to use char* instead of string, no problem.
I am surprised that you have not twigged the problem here. You are trying to use random access using a fixed size on data you (rightly) term in your question subject "variable size".
In fact, using std::string (I shall just write string from now on) or char * will give you a struct of a fixed size. The size of a char * and of a string object is fixed, but the character data will most likely be held elsewhere, even for string. Both are, or contain, pointers to the character data, and when you blat an instance of the struct to your file you write only the char * pointer, or the internal pointers of the string, to the file. You do not write what these pointers (memory addresses) point to. Worse, loading these structs back into memory leaves the pointers referring to wherever the data lived when the struct was written, which is almost certainly invalid memory by then. You will only possibly get away with this in trivial programs where you write the data and then read it back without destroying the original data first.
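The fixed size is easy to confirm. Here is a minimal sketch, assuming a MusicRecord made of four string members (the struct's definition is not shown in the question, so this layout and the helper name RecordSizeWith are assumptions):

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the questioner's struct.
struct MusicRecord
{
    std::string Artist;
    std::string Album;
    std::string Title;
    std::string Path;
};

// sizeof is fixed at compile time, so the length of the text stored in
// the strings cannot change the number of bytes a raw write would copy.
std::size_t RecordSizeWith(const std::string& title)
{
    MusicRecord mr;
    mr.Title = title;
    return sizeof(mr);
}
```

Calling RecordSizeWith with a one-character title and with a thousand-character title gives the same result, which is why the raw write cannot be carrying the longer text.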
Some string implementations cache short strings directly inside the string object, so you might get some character data written to your record; longer strings will still be stored elsewhere. You will also store much more information than you require. Here is an example use of your struct for a possible MS Windows system setup:
MusicRecord mr, *pmr(&mr);
size_t mr_size = sizeof(mr);
mr.Album = "Absolution";
mr.Artist = "Muse";
mr.Title = "Hysteria";
mr.Path = "C:\\Documents and Settings\\Whoever\\My Documents"
          "\\My Music\\Muse.Absolution.Hysteria.08.mp3";
The path is a long string, so I split it into two parts. It is still a single string: adjacent string literals are concatenated by the compiler (a little-known feature of C and C++). I have also defined a pointer to the MusicRecord and obtained the size of the struct. Compiling under MS Visual C++ 8 (aka 2005) for Win32 we find that MusicRecord is 128 bytes in size, thus each string is 32 bytes in size. In memory the struct looks like the following:
00 00 00 00 cc cc cc cc 4d 75 73 65 00 cc cc cc ....ÞÞÞÞMuse.ÞÞÞ
cc cc cc cc cc cc cc cc 04 00 00 00 0f 00 00 00 ÞÞÞÞÞÞÞÞ........
00 00 00 00 cc cc cc cc 41 62 73 6f 6c 75 74 69 ....ÞÞÞÞAbsoluti
6f 6e 00 cc cc cc cc cc 0a 00 00 00 0f 00 00 00 on.ÞÞÞÞÞ........
00 00 00 00 cc cc cc cc 48 79 73 74 65 72 69 61 ....ÞÞÞÞHysteria
00 cc cc cc cc cc cc cc 08 00 00 00 0f 00 00 00 .ÞÞÞÞÞÞÞ........
00 00 00 00 cc cc cc cc 28 60 b7 01 cc cc cc cc ....ÞÞÞÞ(`ú.ÞÞÞÞ
cc cc cc cc cc cc cc cc 57 00 00 00 5f 00 00 00 ÞÞÞÞÞÞÞÞW..._...
Each pair of rows is one 32-byte string, in the order Artist, Album, Title, Path; together they form a single contiguous block of memory. Within each string, 8 leading bytes are followed by a 16-byte buffer, then the current length (04, 0a, 08 and 57 hex) and the capacity (0f or 5f hex). As you can see, the values for the first three strings are short enough to be kept within the strings themselves. The path string is too long and is held elsewhere. In its case the buffer that held the character data for the other strings instead holds a pointer to the character data at 0x01b76028 (the bytes are stored in reverse order, so you have to reverse them to get the address). Pointing the VC++ debugger memory view at this location reveals:
43 3a 5c 44 6f 63 75 6d 65 6e 74 73 20 61 6e 64 C:\Documents and
20 53 65 74 74 69 6e 67 73 5c 57 68 6f 65 76 65 Settings\Whoeve
72 5c 4d 79 20 44 6f 63 75 6d 65 6e 74 73 5c 4d r\My Documents\M
79 20 4d 75 73 69 63 5c 4d 75 73 65 2e 41 62 73 y Music\Muse.Abs
6f 6c 75 74 69 6f 6e 2e 48 79 73 74 65 72 69 61 olution.Hysteria
2e 30 38 2e 6d 70 33 00 cd cd cd cd cd cd cd cd .08.mp3.ÖÖÖÖÖÖÖÖ
So handling strings and other data referred to by pointers (or references) requires more work than just blatting the memory image to a file. The same goes for handling variable-sized data, and for leaving out data that is irrelevant to the persistent state of the record. The VC++ std::string implementation is a good example of the last point: all you need are the characters and the length of the string, not all the internal housekeeping state of the std::string implementation.
So what options do you have? The most obvious is to use fixed-size strings for each field. However, this means defining a maximum length for each field, which wastes space for strings shorter than the maximum and will always be too short for the odd one or two cases.
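A fixed-size layout might be sketched as follows. The struct name FixedMusicRecord, the field capacities and the helper names are all assumptions chosen for illustration:

```cpp
#include <cstddef>
#include <istream>
#include <ostream>

// Hypothetical fixed-size layout; the capacities are arbitrary choices.
struct FixedMusicRecord
{
    char Artist[64];
    char Album[64];
    char Title[64];
    char Path[260];
};

// Every record occupies sizeof(FixedMusicRecord) bytes, so record N
// starts at byte N * sizeof(FixedMusicRecord).
std::streamoff RecordOffset(std::size_t index)
{
    return static_cast<std::streamoff>(index * sizeof(FixedMusicRecord));
}

// The struct holds only characters, no pointers, so its memory image
// can be written to and read from the file as it stands.
void WriteFixedRecord(std::ostream& out, const FixedMusicRecord& rec, std::size_t index)
{
    out.seekp(RecordOffset(index));
    out.write(reinterpret_cast<const char*>(&rec), sizeof(FixedMusicRecord));
}

void ReadFixedRecord(std::istream& in, FixedMusicRecord& rec, std::size_t index)
{
    in.seekg(RecordOffset(index));
    in.read(reinterpret_cast<char*>(&rec), sizeof(FixedMusicRecord));
}
```

With these capacities every record costs 452 bytes on disk however short the strings are, which is the wasted space mentioned above.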
You also have to be wary of security problems. If a string is shorter than the maximum then the rest of its field will contain 'junk'. However, this so-called 'junk' may contain data useful to a cracker (more commonly, and incorrectly, called a hacker these days). So you have to ensure that each struct is initialised with, say, zeros, or at least that the unused regions are cleared before writing to file.
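One way to keep stale memory out of the file is to clear each field before copying into it. A minimal sketch, where SetField is a hypothetical helper name and silent truncation of over-long strings is an assumed policy:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copy src into a fixed-size field, truncating if needed and zero-filling
// every unused byte so that no leftover memory contents reach the file.
template <std::size_t N>
void SetField(char (&field)[N], const std::string& src)
{
    std::memset(field, 0, N);  // clear the whole field first
    const std::size_t count = src.size() < N - 1 ? src.size() : N - 1;
    std::memcpy(field, src.data(), count);  // at most N - 1 bytes, so a '\0' always remains
}
```

Given a char field[8], SetField(field, "Muse") leaves "Muse" followed by four zero bytes, and SetField(field, "Absolution") stores the truncated "Absolut".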
The other options are more complex. Basically you store all strings in a special section of your file, one after another. However you track exactly where these strings are written. You can still use a fixed size record but it is not a direct match for the form you wish your data to appear in. In the file each record now looks like:
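A minimal sketch of such a record follows. The member names for Artist and Album, and TitleLength, are the ones used by the reading code further on; the remaining names follow the same pattern, and the 32-bit field types are an assumption:

```cpp
#include <cstdint>

// Fixed-size on-disk record: for each string, where its characters start
// in the string section and how many characters there are.
struct MusicFileRecord
{
    std::uint32_t ArtistFilePosition;  // offset of the first character
    std::uint32_t ArtistLength;        // character count, no terminator stored
    std::uint32_t AlbumFilePosition;
    std::uint32_t AlbumLength;
    std::uint32_t TitleFilePosition;
    std::uint32_t TitleLength;
    std::uint32_t PathFilePosition;
    std::uint32_t PathLength;
};

// Eight equally sized members should leave no padding between them.
static_assert(sizeof(MusicFileRecord) == 8 * sizeof(std::uint32_t),
              "MusicFileRecord is expected to contain no padding");
```

Unlike MusicRecord, this struct contains no pointers, so writing its memory image to a file is meaningful.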
An alternative approach is to just store the file offset positions and store the length of each string just before the character data at the specified file position. I shall go with the above scheme for this example.
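For comparison, the length-prefixed alternative might be sketched like this. The function names are hypothetical, and the 32-bit length written in the machine's native byte order is an assumption:

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write the length first, then the raw characters (no terminator).
void WriteLengthPrefixed(std::ostream& out, const std::string& s)
{
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(s.data(), static_cast<std::streamsize>(length));
}

// Seek to the stored offset, read the length, then read that many characters.
std::string ReadLengthPrefixed(std::istream& in, std::streamoff position)
{
    std::uint32_t length = 0;
    in.seekg(position);
    in.read(reinterpret_cast<char*>(&length), sizeof(length));
    std::string s(length, '\0');
    if (length > 0)
        in.read(&s[0], static_cast<std::streamsize>(length));
    return s;
}

// Usage: two strings stored back to back, each fetched by its offset alone.
bool LengthPrefixDemo()
{
    std::stringstream data(std::ios::in | std::ios::out | std::ios::binary);
    WriteLengthPrefixed(data, "Muse");            // starts at offset 0
    const std::streamoff second = data.tellp();   // just past the first string
    WriteLengthPrefixed(data, "Absolution");
    return ReadLengthPrefixed(data, second) == "Absolution"
        && ReadLengthPrefixed(data, 0) == "Muse";
}
```

The fixed-size record then needs only one offset per string, at the cost of an extra read to find the length.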
Your file is now composed of 2 parts: the fixed sized records part and the variable sized string part. In fact it is easiest to start with 2 separate files. To access a record you seek for it as before, using the size of the MusicFileRecord in the fixed size record file.
Once you have the MusicFileRecord, you use the data in it to build a MusicRecord: for each field, seek to the position of its character data in the string data file and read the number of characters recorded for that field. Assuming the files exist and are open on the streams recordStream and stringDataStream, and with no error checking etc., the code might look like:
void GetMusicRecord( int index, MusicRecord & mr )
{
    // Read the fixed-size record that says where each string lives
    MusicFileRecord mfr;
    recordStream.seekg( index * sizeof(MusicFileRecord) );
    recordStream.read( (char *)&mfr, sizeof(MusicFileRecord) );
    // One buffer large enough for the longest field plus a terminator
    size_t maxFieldLength( std::max( std::max( mfr.ArtistLength, mfr.AlbumLength )
                                   , std::max( mfr.TitleLength, mfr.PathLength ) ) );
    char * charBuf = new char[maxFieldLength+1];
    stringDataStream.seekg( mfr.ArtistFilePosition );
    stringDataStream.read( charBuf, mfr.ArtistLength );
    charBuf[mfr.ArtistLength] = '\0';
    mr.Artist = charBuf;
    stringDataStream.seekg( mfr.AlbumFilePosition );
    stringDataStream.read( charBuf, mfr.AlbumLength );
    charBuf[mfr.AlbumLength] = '\0';
    mr.Album = charBuf;
    // And similarly for Title and Path...
    delete [] charBuf; // array form, to match new []
}
Things to note are that first a MusicFileRecord is read and then used to read the strings and assign them to the passed in MusicRecord. Each string is read from the stringDataStream into a dynamically allocated char buffer, the size of which is the size of the field in the record having the greatest length, determined via repeated calls to std::max (include <algorithm>). This buffer is hand terminated after the raw string data is read and used to assign the string to the relevant member of the passed in MusicRecord. It is manually deallocated (with delete [], since it was allocated with new []). A smart pointer could have been used here. Finally, if anything goes wrong halfway through fixing up the passed in MusicRecord then the record will be in an invalid state, having only been partially updated.
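Both weaknesses can be avoided. The following is a sketch only: it lets a std::string own the buffer instead of a smart pointer, and it fills a temporary record that is swapped in once every read has succeeded. ReadField and FillMusicRecord are hypothetical names, and the member types of both structs are assumptions:

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <utility>

// Stand-ins for the two structs; the member types are assumptions.
struct MusicRecord { std::string Artist, Album, Title, Path; };
struct MusicFileRecord
{
    std::uint32_t ArtistFilePosition, ArtistLength;
    std::uint32_t AlbumFilePosition,  AlbumLength;
    std::uint32_t TitleFilePosition,  TitleLength;
    std::uint32_t PathFilePosition,   PathLength;
};

// Read `length` characters starting at `position`. The std::string owns
// the buffer, so there is nothing to delete by hand.
std::string ReadField(std::istream& in, std::uint32_t position, std::uint32_t length)
{
    std::string field(length, '\0');
    in.seekg(static_cast<std::streamoff>(position));
    if (length > 0)
        in.read(&field[0], static_cast<std::streamsize>(length));
    return field;
}

// Build the record in a temporary; mr is only touched if all reads worked.
bool FillMusicRecord(std::istream& strings, const MusicFileRecord& mfr, MusicRecord& mr)
{
    MusicRecord temp;
    temp.Artist = ReadField(strings, mfr.ArtistFilePosition, mfr.ArtistLength);
    temp.Album  = ReadField(strings, mfr.AlbumFilePosition,  mfr.AlbumLength);
    temp.Title  = ReadField(strings, mfr.TitleFilePosition,  mfr.TitleLength);
    temp.Path   = ReadField(strings, mfr.PathFilePosition,   mfr.PathLength);
    if (!strings)
        return false;        // a read failed: leave mr as it was
    std::swap(mr, temp);
    return true;
}

// Usage: one good record, then one whose path runs past the end of the data.
bool FillMusicRecordDemo()
{
    std::istringstream strings("MuseAbsolutionHysteriaC:\\x.mp3");
    const MusicFileRecord good = {0, 4, 4, 10, 14, 8, 22, 8};
    MusicRecord mr;
    if (!FillMusicRecord(strings, good, mr) || mr.Album != "Absolution")
        return false;
    const MusicFileRecord bad = {0, 4, 4, 10, 14, 8, 22, 500};
    return !FillMusicRecord(strings, bad, mr) && mr.Path == "C:\\x.mp3";
}
```

The price is one allocation per field rather than one shared buffer, which is unlikely to matter next to the file access itself.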
To create the files we do the reverse: write the strings of a MusicRecord to the stringDataStream first, so that we have the string file offset values to hand, and then create a MusicFileRecord from them.
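As a starting point only, the writing side might be sketched as below. AppendString and PutMusicRecord are hypothetical names, the struct member types are assumptions, and there is no error checking:

```cpp
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Stand-ins for the two structs; the member types are assumptions.
struct MusicRecord { std::string Artist, Album, Title, Path; };
struct MusicFileRecord
{
    std::uint32_t ArtistFilePosition, ArtistLength;
    std::uint32_t AlbumFilePosition,  AlbumLength;
    std::uint32_t TitleFilePosition,  TitleLength;
    std::uint32_t PathFilePosition,   PathLength;
};

// Append the characters to the end of the string data and report where they went.
std::uint32_t AppendString(std::ostream& strings, const std::string& s)
{
    strings.seekp(0, std::ios::end);
    const std::streamoff position = strings.tellp();
    strings.write(s.data(), static_cast<std::streamsize>(s.size()));
    return static_cast<std::uint32_t>(position);
}

// Write the strings first, then the fixed-size record that refers to them.
void PutMusicRecord(std::ostream& records, std::ostream& strings,
                    std::size_t index, const MusicRecord& mr)
{
    MusicFileRecord mfr = {};
    mfr.ArtistFilePosition = AppendString(strings, mr.Artist);
    mfr.ArtistLength       = static_cast<std::uint32_t>(mr.Artist.size());
    mfr.AlbumFilePosition  = AppendString(strings, mr.Album);
    mfr.AlbumLength        = static_cast<std::uint32_t>(mr.Album.size());
    mfr.TitleFilePosition  = AppendString(strings, mr.Title);
    mfr.TitleLength        = static_cast<std::uint32_t>(mr.Title.size());
    mfr.PathFilePosition   = AppendString(strings, mr.Path);
    mfr.PathLength         = static_cast<std::uint32_t>(mr.Path.size());
    records.seekp(static_cast<std::streamoff>(index * sizeof(MusicFileRecord)));
    records.write(reinterpret_cast<const char*>(&mfr), sizeof(mfr));
}

// Usage with in-memory streams standing in for the two files.
bool PutMusicRecordDemo()
{
    std::stringstream records, strings;
    MusicRecord mr;
    mr.Artist = "Muse"; mr.Album = "Absolution"; mr.Title = "Hysteria"; mr.Path = "p.mp3";
    PutMusicRecord(records, strings, 0, mr);
    MusicFileRecord back = {};
    records.seekg(0);
    records.read(reinterpret_cast<char*>(&back), sizeof(back));
    return strings.str() == "MuseAbsolutionHysteriap.mp3"
        && back.AlbumFilePosition == 4 && back.AlbumLength == 10
        && back.PathFilePosition == 22 && back.PathLength == 5;
}
```

With real files the two streams would be std::fstream objects opened in binary mode for both reading and writing, so that existing contents are not truncated.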
I am running out of room for this answer (I am only allowed 10000 characters), so I will leave the details for you to work out. I would like to point out some problems with updating existing records. If you change a record then the new string data may or may not fit over the old data. If it does not, you will have to write the new string at the end of the string data and update the MusicFileRecord with the new offset(s). Deleting records has a similar problem. You cannot just remove the record and its string data from a file, so you mark the record as dead instead (say by zeroing all the fields). Over time your data files will accumulate more of these dead areas, so it makes sense to rebuild them every now and then, which will probably give records new index values.
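The fit-or-append decision for a single string might be sketched like this. StringSlot and UpdateString are hypothetical names, and the 32-bit offsets are an assumption:

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Where one string lives in the string data (hypothetical helper type).
struct StringSlot
{
    std::uint32_t position;
    std::uint32_t length;
};

// Overwrite in place when the new text fits in the old slot; otherwise
// append at the end, which leaves the old bytes behind as a dead area.
StringSlot UpdateString(std::ostream& strings, const StringSlot& old, const std::string& s)
{
    StringSlot slot = old;
    if (s.size() <= old.length)
    {
        strings.seekp(static_cast<std::streamoff>(old.position));
    }
    else
    {
        strings.seekp(0, std::ios::end);
        const std::streamoff end = strings.tellp();
        slot.position = static_cast<std::uint32_t>(end);
    }
    strings.write(s.data(), static_cast<std::streamsize>(s.size()));
    slot.length = static_cast<std::uint32_t>(s.size());
    return slot;
}

// Usage: the first update fits in place, the second has to move to the end.
bool UpdateStringDemo()
{
    std::stringstream strings;
    strings << "MuseHysteria";
    const StringSlot artist = {0, 4};
    const StringSlot same  = UpdateString(strings, artist, "Blur");
    const StringSlot moved = UpdateString(strings, same, "Radiohead");
    return same.position == 0 && same.length == 4
        && moved.position == 12 && moved.length == 9
        && strings.str() == "BlurHysteriaRadiohead";
}
```

The caller still has to store the returned slot back into the MusicFileRecord and rewrite that record. After the second update the four bytes at offset 0 are a dead area of the kind a periodic rebuild would reclaim.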
You can combine the two files into one, but then you would have to leave some space for new records, and you would need an extra value at the start of the file giving the offset to the start of the second part. That value has to be fixed up whenever the second part moves. To get around the problem of having no room for more records or strings because one part has run into the other, the file would have to be composed of chunks, each containing either records or strings, together with stored offsets to each of the chunks.