C++/Exploding a string into an array

Question
Okay, so I want to explode a given string into an array like PHP does. So I want something like this:
str = "my name is blah"
array_new = some_function(str)
and so then
array_new[0]=my
array_new[1]=name
array_new[2]=is
etc...

And then search for a given value within that array:
if is_in("blah", array_new) = true
{
//so if "blah" is a value within array_new, do this stuff
}


Answer
There are several ways to do this. The most obvious is to leverage the facilities of the C++ standard library.

One problem with your example is that in C and C++ you cannot return a built-in array from a function. Instead you would have to return a pointer to the first string in the array. Alternatively, you can pass the array into the function.

An alternative, following the C++ standard library route, is to use types from the library, the obvious ones being std::string and std::vector (a vector in this sense is a one-dimensional array). std::vector is a class template rather than a class, so it needs the type of its elements to complete it. These types have several advantages over their C counterparts (zero-terminated arrays of char and built-in arrays). Firstly, std::string has many useful operations. Secondly, both types take care of memory management; a std::vector will grow automatically as needed. I will use these types in my examples here for ease of exposition. You can of course translate the examples back into C strings, the <cstring> (<string.h>) functions such as strchr and strcpy, and built-in arrays. Using the names from your question, an initial stab at the code using these facilities would look something like:

   #include <string> // for std::string
   #include <vector> // for std::vector

   void some_function
   ( std::string const & source
   , std::vector<std::string> & parts
   )
   {
   // ...
   }

   int main()
   {
       std::string str( "my name is blah");
       std::vector<std::string> array_new;
       some_function( str, array_new );
   }

Now I do not like the names str, array_new and some_function for actual code. Maybe sentence, words and explode_string or something equally meaningful. Making these changes gives:

   int main()
   {
       std::string sentence( "my name is blah");
       std::vector<std::string> words;
       explode_string( sentence, words );
   }

It is common to create type aliases for types such as std::vector<std::string> as they can be a bit cumbersome:

   #include <string>
   #include <vector>

   typedef std::vector<std::string>  string_vector_type;

   // ...
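As a side note, and assuming a compiler that supports C++11 or later (the typedef above works with any C++ standard), the same alias can be sketched with an alias declaration, which some people find easier to read:

```cpp
#include <string>
#include <vector>

// Assumes C++11 or later: an alias declaration equivalent to
//   typedef std::vector<std::string> string_vector_type;
using string_vector_type = std::vector<std::string>;
```

Both forms name exactly the same type, so the rest of the code does not change.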

This leaves the problem of the implementation of explode_string. The basic idea is to:

   While we have a substring
       Get substring from source from current position to next space
       Append substring to parts vector
   End While

The std::string class provides many operations to locate the position of a character within a string, the most fundamental being find. The find operation has several variations. The best one here is the following:

   std::string::size_type std::string::find
         (char c, std::string::size_type pos ) const;

std::string::size_type is an unsigned integer type. This version (overload) of find takes the character to locate and the character index at which to start the search. It returns the position of the first occurrence of that character at or after the starting position. If the character is not found then the special value std::string::npos is returned. We use the returned values to calculate the range of characters in the substring for each part using:

   std::string std::string::substr
         ( std::string::size_type pos
         , std::string::size_type len
         ) const;

This takes the position of the first character of the substring and the length of the substring. Note that this differs from some substring functions that take first and last positions.
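To get a feel for these two operations before combining them, here is a minimal sketch. The helper names positions_of and substring_between are made up for illustration and are not part of the standard library. The first calls find repeatedly, restarting just past each match; the second converts a first/past-the-last pair of positions into the position and length that substr expects.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects the index of every occurrence of c in text.
std::vector<std::string::size_type> positions_of
( std::string const & text
, char c
)
{
    std::vector<std::string::size_type> positions;
    std::string::size_type pos( text.find(c, 0) );
    while ( std::string::npos != pos )
    {
        positions.push_back( pos );
        // Resume the search one character past the last match
        pos = text.find( c, pos + 1 );
    }
    return positions;
}

// Hypothetical helper: takes the half-open range [first, past_last) of
// character positions and converts it to substr's (position, length) form.
// Assumes first <= past_last and first <= text.length().
std::string substring_between
( std::string const & text
, std::string::size_type first
, std::string::size_type past_last
)
{
    return text.substr( first, past_last - first );
}
```

For the string "my name is blah", positions_of with a space gives 2, 7 and 10, and substring_between with 3 and 7 gives "name".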

Having obtained the substring we append it (or rather a copy of it) to our string vector using its push_back operation.

The above operations make it plain that we have to track the start and end of each part and calculate the length of each part. The length is the difference between the part end and start positions, i.e. end - start. When we have processed one part we move the start position to the character _after_ the previous end position to search for the next part of the string. This gives us something like:

       std::string::size_type part_start( 0 );
       std::string::size_type part_end( 0 );

       do
       {
         // Find position of next space
         part_end = source.find( ' ', part_start );

         // Calculate the part length
         std::string::size_type part_length( 0 );
         part_length = part_end - part_start;

         // Obtain part substring
         std::string part( source.substr(part_start, part_length) );

         // append to parts vector
         parts.push_back( part );

         // Move start ready to find next part
         part_start = part_end + 1;
       }
       while ( std::string::npos != part_end );

This is the basic logic; however it does not take into account the following situations:

1/ There are no more spaces in the string and find returns std::string::npos.

2/ Empty parts will be added to the parts vector if, for example, there is more than one space between parts of the string.

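The first problem is critical and has to be fixed. It adds a little complexity to the length calculation. When we look for the final part, the search reaches the end of the string without finding a space, and std::string::npos is returned. Subtracting the start position from npos gives a meaningless length. (substr limits an over-long length to the end of the string, so the basic loop can appear to work, but the value is still wrong and cannot be used to detect an empty part.) We have to check for this case and calculate the length of the final part from the length of the source string rather than from the part end value: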

         // Calculate the part length
         std::string::size_type part_length( 0 );
         if ( std::string::npos == part_end )
         {
           part_length = source.length() - part_start;
         }
         else // space found
         {
           part_length = part_end - part_start;
         }

The second problem is not critical but will make the function more resilient to badly formed source strings. If we have an empty part, in which case its length will be 0, we should not add it to the parts array but go immediately to the next part:

         if ( part_length > 0 )
         {
           // Obtain part substring
           std::string part
             ( source.substr(part_start, part_length) );

           // append to parts vector
           parts.push_back( part );
         }

This is the final explode_string function (without the comments, since AllExperts limits the length of answers):

   void explode_string
   ( std::string const & source
   , string_vector_type & parts
   )
   {
       std::string::size_type part_start( 0 );
       std::string::size_type part_end( 0 );

       do
       {
         part_end = source.find( ' ', part_start );

         std::string::size_type part_length( 0 );
         if ( std::string::npos == part_end )
         {
           part_length = source.length() - part_start;
         }
         else
         {
           part_length = part_end - part_start;
         }

         if ( part_length > 0 )
         {
           std::string part
             ( source.substr(part_start, part_length) );
           parts.push_back( part );
         }

         part_start = part_end + 1;
       }
       while ( std::string::npos != part_end );
   }

I tested explode_string with a modified version of your test string to which I added some spaces:

       std::string sentence( "    my   name is blah   ");

There are several things that you could do to improve explode_string. It ignores various error conditions, most notably running out of memory. The character used to split source strings into parts is hard coded to be a space, so an obvious improvement would be to allow this to be varied, e.g. by adding a third parameter to explode_string, defaulting to a space:

   void explode_string
   ( std::string const & source
   , string_vector_type & parts
   , char break_char = ' '
   )
   {
     // ...
         part_end = source.find( break_char, part_start );
     // ...
   }
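Putting the pieces together, a complete version with the extra parameter might look like the sketch below. It follows the same logic as the final function above; the only liberties taken are folding the length calculation into a conditional expression and pushing the substring directly rather than through a named temporary.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> string_vector_type;

// Splits source at each break_char and appends the non-empty parts to parts.
void explode_string
( std::string const & source
, string_vector_type & parts
, char break_char = ' '
)
{
    std::string::size_type part_start( 0 );
    std::string::size_type part_end( 0 );

    do
    {
        // Find position of next break character (npos if there is none)
        part_end = source.find( break_char, part_start );

        // The final part runs to the end of the source string
        std::string::size_type const part_length
            ( std::string::npos == part_end
            ? source.length() - part_start
            : part_end - part_start
            );

        // Skip empty parts caused by adjacent break characters
        if ( part_length > 0 )
        {
            parts.push_back( source.substr(part_start, part_length) );
        }

        // Move start ready to find next part
        part_start = part_end + 1;
    }
    while ( std::string::npos != part_end );
}
```

Existing calls that pass only two arguments keep splitting on spaces, while a call such as explode_string( line, fields, ',' ) splits on commas instead.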

To locate a string in the vector you can use the find algorithm from the C++ standard library. For example, you could add the following to main after the call to explode_string to see if sentence contained the word "is" (you will have to include <algorithm> for std::find and <iostream> for std::cout):

       if ( std::find(words.begin(), words.end(), "is") != words.end() )
       {
         std::cout << "is in sentence.\n";
       }
       else
       {
         std::cout << "is NOT in sentence.\n";
       }
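If you want something closer to the is_in call in your question, the std::find test can be wrapped in a small function. This is just one possible sketch; is_in is the name from the question, not a standard library function.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper named after the is_in call in the question:
// returns true if value is equal to one of the strings in values.
bool is_in
( std::string const & value
, std::vector<std::string> const & values
)
{
    return std::find( values.begin(), values.end(), value ) != values.end();
}
```

The check then reads if ( is_in("blah", words) ) { ... }. Note that std::find compares whole strings, so "bla" would not match "blah".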

For more information on the C++ standard library I suggest you obtain a good reference book. I use "The C++ Standard Library: A Tutorial and Reference" by Nicolai M. Josuttis. There are various references, tutorials and articles online, for example: http://www.sgi.com/tech/stl/, http://www.digilife.be/quickreferences/PT.htm, http://www.cplusplus.com/ (not all there yet).

Hope this helps you.

Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

©2016 About.com. All rights reserved.