You are here:

C++/strtok in c++

Advertisement


Question
I have used strtok to tokenise a textfile to extract words by removing ."'.
But I am not able to put the words in an array below are my code so far

Thanks to help.
Ritesh

// reading a text file
#include <iostream>
#include <fstream>
#include <string>
#include <stdio.h>

using namespace std;

int main () {


  const int maxCharsPerLine = 100;
  const int maxLineInFile = 10;
  char line[maxCharsPerLine];
  char arrayOfChars[maxLineInFile][maxCharsPerLine];//2D array to hold 10 lines X 100 characters see ****
  int i=0;
  
char *Words[50];
  char *pch;
 

 ifstream myFile ("text.txt"); // myfile is a file object of ifstream
 
 if (myFile.is_open())
 {
   while (! myFile.eof() )
   {
     myFile.getline (line,100,'\n'); //(line , no of chars , delimeter)
     strcpy(arrayOfChars[i], line);
         i++;
       //cout << line << endl;


//using strtok function to tokenise word with delimeters
  pch = strtok (line," '\"' ");
 while (pch != NULL)
 {


  

pch = strtok (NULL, " '\"'!,.");

if(strlen(pch)==4)
{
cout<<pch<<'\n';
int i=0;
strcpy(Words[i],pch); // add words to array (not vector)????
i++;
}  


  pch = strtok (NULL, " '\"'!,.");
 }

   }
   myFile.close();


 }

 else cout <<"Unable to open file"<<endl;



 return 0;
}


Answer
I am not surprised. Although you carefully use an array of 10 * 100 characters for the storing of lines you use an array of 50 char pointers (char *) for the Words array.

Later you try to copy the characters for each token from the line memory to the memory pointed to by which ever element of the Words array using strcpy.

Oh dear! There is no such memory because you did _not_ allocate any and assign the pointers to it to the elements of the Words array. In fact the pointers in this array are complete rubbish and could point anywhere in memory. This is because local stack variables (also called automatic variables in C and C++) are _not_ initialised to any specific value, taking whatever the value of the memory is at the location the variable is allocated.

This means that if you are lucky you will get an immediate crash or error when executing the program because a pointer in Words points to memory the process cannot access. If you are unlucky they will all point to an areas of memory accessible to the process and it will therefore overwrite random sections of memory, something that may not (in a larger program) be picked up for some time, making locating the source of the problem much more difficult.

On another note your program is fragile, inconsistent, probably broken and seems to contain redundant operations and data.

- What happens if your file contains more than maxLineInFile lines or some lines are longer than or equal to maxCharsPerLine characters? Hint: you keep on incrementing i with no checks on its value.

- Why do you store the line data? You are processing each line as soon as it is read, and once read and tokenised you do not refer to that line again.

- What happens if myFile.getline fails?

- I am not sure what the token characters you were trying for in " '\"' ". It reads: space single-quote double-quote single-quote space. You could have used: " '\"" (space single-quote double-quote). If you wish to check just try printing the string to the console:

   std::cout << " '\"" << '\n';

- Assuming you fix the Words array, what happens if each line contains an _average_ of 6 or more four letter words per line rather than 5 or less? Hint: what would be the total number of four letter words in the file?

- What happens to i when you store a word? What value will it have the next time you store a word. Hint: This second i is local to the if clause in which it is defined.

- You have nice explanatory constant values to indicate the dimensions of the line and arrayOfChars arrays but not the Words array. Why? You might also like to define (string literal) constants for the delimiter character strings, giving them some meaningful names that might indicate what they are for that might hint  at why you need different sets of delimiters.

- Your naming style is inconsistent: line, arrayOfChars, myFile vs. Words. One that I use often is to use the arrayOfChars style for modifiable objects and reserve uppercase first letter names for constants, functions and types. So you might then have a words array and a MaxCharsPerLine constant.

On a related note it is a common practice to reserve ALL_UPPERCASE names for pre-processor #define macro names so they are obvious and different from all other symbols. They are therefore less likely to cause problems due to the disregard the pre-processor has for context: when the pre-processor substitutes a #define macro name for its value it is like doing a search and replace all, and it does this _before_ the compilation proper starts. This is why C++ has various facilities to reduce the need for using the pre-processor and its macro facility, such as its constant values, which I am very pleased to see you are using in preference to #define macros.

Hope this helps  

C++

All Answers


Answers by Expert:


Ask Experts

Volunteer


Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000. XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

Education/Credentials

©2016 About.com. All rights reserved.