C++/strtok in c++
Expert: Ralph McArdell - 1/24/2007
QuestionI have used strtok to tokenise a textfile to extract words by removing ."'.
But I am not able to put the words in an array below are my code so far
Thanks to help.
Ritesh
// reading a text file
#include <iostream>
#include <fstream>
#include <string>
#include <stdio.h>
using namespace std;
int main () {
const int maxCharsPerLine = 100;
const int maxLineInFile = 10;
char line[maxCharsPerLine];
char arrayOfChars[maxLineInFile][maxCharsPerLine];//2D array to hold 10 lines X 100 characters see ****
int i=0;
char *Words[50];
char *pch;
ifstream myFile ("text.txt"); // myfile is a file object of ifstream
if (myFile.is_open())
{
while (! myFile.eof() )
{
myFile.getline (line,100,'\n'); //(line , no of chars , delimeter)
strcpy(arrayOfChars[i], line);
i++;
//cout << line << endl;
//using strtok function to tokenise word with delimeters
pch = strtok (line," '\"' ");
while (pch != NULL)
{
pch = strtok (NULL, " '\"'!,.");
if(strlen(pch)==4)
{
cout<<pch<<'\n';
int i=0;
strcpy(Words[i],pch); // add words to array (not vector)????
i++;
}
pch = strtok (NULL, " '\"'!,.");
}
}
myFile.close();
}
else cout <<"Unable to open file"<<endl;
return 0;
}
AnswerI am not surprised. Although you carefully use an array of 10 * 100 characters for the storing of lines you use an array of 50 char pointers (char *) for the Words array.
Later you try to copy the characters for each token from the line memory to the memory pointed to by which ever element of the Words array using strcpy.
Oh dear! There is no such memory because you did _not_ allocate any and assign the pointers to it to the elements of the Words array. In fact the pointers in this array are complete rubbish and could point anywhere in memory. This is because local stack variables (also called automatic variables in C and C++) are _not_ initialised to any specific value, taking whatever the value of the memory is at the location the variable is allocated.
This means that if you are lucky you will get an immediate crash or error when executing the program because a pointer in Words points to memory the process cannot access. If you are unlucky they will all point to an areas of memory accessible to the process and it will therefore overwrite random sections of memory, something that may not (in a larger program) be picked up for some time, making locating the source of the problem much more difficult.
On another note your program is fragile, inconsistent, probably broken and seems to contain redundant operations and data.
- What happens if your file contains more than maxLineInFile lines or some lines are longer than or equal to maxCharsPerLine characters? Hint: you keep on incrementing i with no checks on its value.
- Why do you store the line data? You are processing each line as soon as it is read, and once read and tokenised you do not refer to that line again.
- What happens if myFile.getline fails?
- I am not sure what the token characters you were trying for in " '\"' ". It reads: space single-quote double-quote single-quote space. You could have used: " '\"" (space single-quote double-quote). If you wish to check just try printing the string to the console:
std::cout << " '\"" << '\n';
- Assuming you fix the Words array, what happens if each line contains an _average_ of 6 or more four letter words per line rather than 5 or less? Hint: what would be the total number of four letter words in the file?
- What happens to i when you store a word? What value will it have the next time you store a word. Hint: This second i is local to the if clause in which it is defined.
- You have nice explanatory constant values to indicate the dimensions of the line and arrayOfChars arrays but not the Words array. Why? You might also like to define (string literal) constants for the delimiter character strings, giving them some meaningful names that might indicate what they are for that might hint at why you need different sets of delimiters.
- Your naming style is inconsistent: line, arrayOfChars, myFile vs. Words. One that I use often is to use the arrayOfChars style for modifiable objects and reserve uppercase first letter names for constants, functions and types. So you might then have a words array and a MaxCharsPerLine constant.
On a related note it is a common practice to reserve ALL_UPPERCASE names for pre-processor #define macro names so they are obvious and different from all other symbols. They are therefore less likely to cause problems due to the disregard the pre-processor has for context: when the pre-processor substitutes a #define macro name for its value it is like doing a search and replace all, and it does this _before_ the compilation proper starts. This is why C++ has various facilities to reduce the need for using the pre-processor and its macro facility, such as its constant values, which I am very pleased to see you are using in preference to #define macros.
Hope this helps