You are here:

C++/fast vector search

Advertisement


Question
QUESTION: This is a follow up to a question I asked you earlier about reading wordlists into memory.

Yes, I believe you are correct about compiler optimizations being turned off. I compile code in Visual C++ 2008 express Edition, and if it has a single function call in the internal loop, it runs miserably slow. Compiling the same code with g++ yields 2x the benchmark, in some cases! But eliminate function calls and the two benchmarks tie. So I am pretty sure that inlining is turned off in Visual C++. Do you have any idea how I might turn it on? Looked in manuals, menus, etc - can't find anything.

Thank you for your detailed response. I implemented your binary search function and received a large speed boost. As far as sorting words by length, all of the words in the wordlists I am working with already have the same length. Unfortunately, the code still isn't quite fast enough.

I am implementing the vector search in two different ways. The one described above, in which I am searching a vector of strings for a string, and another way, in which each element of the vector contains a binary value and I use memcmp() to search a vector for a specific binary value.

Here is an example of implementation #2:

/* the goal of this implementation is to read MD4 hashes from a dump file (ex. pwdump format) and store them in a vector. The MD4 cracker will encrypt a password, and scan through the vector to see if the resulting MD4 hash matches any of the MD4 hashes in the dump file. If so, It will print the plaintext password that resulted in that hash*/

#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <ctime>
#include <algorithm>

using namespace std;

typedef unsigned char u8;

u8 ntlm_hash[16]={ 0xA6, 0x04, 0x67, 0x77, 0xA8, 0xE0, 0x5F, 0x53, 0x7F, 0x4C, 0xA1, 0x47, 0x9C, 0x26, 0x73, 0x3F };

u8 md4_hash[16]={ 0xCD, 0xD1, 0xC3, 0x28, 0x07, 0x1B, 0x79, 0xD9, 0xCE, 0xEE, 0xC3, 0x3F, 0x66, 0x3E, 0xC9, 0x65 };

u8 K[16]={ 0xA6, 0x04, 0x67, 0x77, 0xA8, 0xE0, 0x5F, 0x53, 0x7F, 0x4C, 0xA1, 0x47, 0x9C, 0x26, 0x73, 0x3F };//match

u8 K1[16]={ 0xA6, 0x04, 0x67, 0x77, 0xA8, 0xE0, 0x5F, 0x53, 0x7F, 0x4C, 0xA1, 0x47, 0x9C, 0x26, 0x73, 0x3A };//mismatch

void dumptwo(u8 *digest)
{
    for(int i = 0;i < 16;i++)
        printf("%02X",digest[i]);
}

int main( int argc, char** argv )
{
     clock_t begin = clock() ;
     
   vector< u8* > word_list ;
   word_list.push_back(ntlm_hash) ;
   word_list.push_back(md4_hash) ;
   cout << word_list.size() << " words\n" ;
   
   int loopnum = 500000000;
   
for(int c = 0; c<loopnum ; c++)
{
   for( size_t i = 0 ; i < word_list.size() ; ++i )
   {
        if(!memcmp(K1,word_list[i],16))
        {
         cout << "K1 and ";
         dumptwo(word_list[i]);
         cout << " are the same!\n";
        }
   }
}
   clock_t end = clock() ;
   cout << "\n" << loopnum << " scans in " << double(end-begin) / CLOCKS_PER_SEC
         << " seconds\n" ;
return 0;
}


this code runs at the rate of a sorted binary search for a string in a vector (~8 724 785 vector scans per second, for a vector containing a single element). I need a search function that constitutes a negligible portion of the overall runtime. I was hoping for something that could scan a large vector several hundred thousand times per second, at least. But I will take what I can get.

If there is no other way to speed up this code than with the "hash_map and fancier algorithm" you mentioned, could you please send me some links to research these optimizations? And I am assuming the C++09 and TR1 implementations are newer standards that are not widely supported, yet? Does Visual C++ support them? Could it be configured to support them?

Thank you for your time.










































Yes, I believe you are correct about compiler optimizations being turned off. I compile code in Visual C++ 2008 express Edition, and if it has a single function call in the internal loop, it runs miserably slow. Compiling the same code with g++ yields 2x the benchmark, in some cases! But eliminate function calls and the two benchmarks tie. So I am pretty sure that inlining is turned off in Visual C++. Do you have any idea how I might turn it on? Looked in manuals, menus, etc - can't find anything.

Thank you for your detailed response. I implemented your binary search function and received a large speed boost. As far as sorting words by length, all of the words in the wordlists I am working with already have the same length. Unfortunately, the code still isn't quite fast enough.

I have modified my code a little bit. Now, each element of the vector contains a binary value and I use memcmp() to search a vector for a specific binary value.

Here is an example of the modified code:

/* the goal of this implementation is to read MD4 hashes from a dump file (ex. pwdump format) and store them in a vector. The MD4 cracker will encrypt a password, and scan through the vector to see if the resulting MD4 hash matches any of the MD4 hashes in the dump file. If so, It will print the plaintext password that resulted in that hash*/

#include <string>
#include <vector>
#include <iostream>
#include <fstream>
#include <ctime>
#include <algorithm>

using namespace std;

typedef unsigned char u8;

u8 ntlm_hash[16]={ 0xA6, 0x04, 0x67, 0x77, 0xA8, 0xE0, 0x5F, 0x53, 0x7F, 0x4C, 0xA1, 0x47, 0x9C, 0x26, 0x73, 0x3F };

u8 md4_hash[16]={ 0xCD, 0xD1, 0xC3, 0x28, 0x07, 0x1B, 0x79, 0xD9, 0xCE, 0xEE, 0xC3, 0x3F, 0x66, 0x3E, 0xC9, 0x65 };

u8 K[16]={ 0xA6, 0x04, 0x67, 0x77, 0xA8, 0xE0, 0x5F, 0x53, 0x7F, 0x4C, 0xA1, 0x47, 0x9C, 0x26, 0x73, 0x3F };//match ntlm

u8 K1[16]={ 0xA6, 0x04, 0x67, 0x77, 0xA8, 0xE0, 0x5F, 0x53, 0x7F, 0x4C, 0xA1, 0x47, 0x9C, 0x26, 0x73, 0x3A };//mismatch ntlm

void dumptwo(u8 *digest)
{
    for(int i = 0;i < 16;i++)
        printf("%02X",digest[i]);
}

int main( int argc, char** argv )
{
     clock_t begin = clock() ;
     
   vector< u8* > word_list ;
   word_list.push_back(ntlm_hash) ;
   word_list.push_back(md4_hash) ;
   cout << word_list.size() << " words\n" ;
   
   int loopnum = 500000000;
   
for(int c = 0; c<loopnum ; c++)
{
   for( size_t i = 0 ; i < word_list.size() ; ++i )
   {
        if(!memcmp(K1,word_list[i],16))
        {
         cout << "K1 and ";
         dumptwo(word_list[i]);
         cout << " are the same!\n";
        }
   }
}
   clock_t end = clock() ;
   cout << "\n" << loopnum << " scans in " << double(end-begin) / CLOCKS_PER_SEC
         << " seconds\n" ;
return 0;
}


this code runs at the rate of a sorted binary search for a string in a vector (~8 724 785 vector scans per second, for a vector containing a single element). I need a search function that constitutes a negligible portion of the overall runtime. I was hoping for something that could scan a large vector several hundred thousand times per second, at least. But I will take what I can get.

If there is no other way to speed up this code than with the "hash_map and fancier algorithm" you mentioned, could you please send me some links to research these optimizations? And I am assuming the C++09 and TR1 implementations are newer standards that are not widely supported, yet? Does Visual C++ support them? Could it be configured to support them?

What about SIMD/SSE2? You probably don't want to get into that :)

Thank you for your time.


ANSWER: Ignore any earlier reply you might have got - I think I hit the wrong key by mistake.

"the goal of this implementation is to read MD4 hashes from a dump file (ex. pwdump format) and store them in a vector. The MD4 cracker will encrypt a password, and scan through the vector to see if the resulting MD4 hash matches any of the MD4 hashes in the dump file. If so, It will print the plaintext password that resulted in that hash"

Is this the only search you have to perform? (ie. as per you modified code for searching for MD5 hashes from a dump file). If so, we could exploit the fact that a MD5 hash is of a fixed, known length.

Using a hash table with a good hash function would definitely be the fastest way to do this. I would need some time to look into the alternatives available and develop the suitable algorithms - and I'm rushed for time right now. I'll start cracking on this asap, once i get your reply to the earlier question - but expect a delay of at least a few days, at most about a week or so.

For Visual C++ 2008 express Edition, inlining and other optimizations, you have to do a release build (instead of the default debug build).



---------- FOLLOW-UP ----------

QUESTION: Sorry, I didn't realize I sent you both drafts. I only meant to send the second one.

Yes, I am just searching 16-byte MD4/NTLM/MD5 hashes to see if my encrypted hash is in there. This is for a proof-of-concept cracker utilizing advanced human behavior analysis techniques. I originally had other intentions for the search function, but I have simplified my code and now I just need a quick hash lookup. This is for cryptanalysis, so speed is essential.

Thank you for your time. Sorry to offload this on you, but it is so much easier to ask questions to a human being rather than Google. I greatly appreciate it. :)

ANSWER: Preliminary investigations suggest that a google dense_hash_set is a good choice as a hash table implementation for you. It is also available (in a somewhat kludged form) for Visual C++ express 2008.

Download the source code from: http://google-sparsehash.googlecode.com/files/sparsehash-1.5.2.zip
The download includes minimal documentation.
Do not use any of the other tar balls; this is the only one with VC++ project files.

Extract the zip file into a directory of your choice. For example,
C:\sparsehash-1.5.2

You *must* create your vc++ projects in the vsprojects subdirectory here. In the above example, C:\sparsehash-1.5.2\vsprojects

In addition to my code, you need to add a #include "config.h" header to make it work with vc++.

I've written two versions of the lookup; one for MD5 digests and one for strings. The interesting parts are the hash functions.

On the following platform:

>uname -spr
FreeBSD 7.2 i386

>dmesg | grep CPU:
CPU: Intel(R) Core(TM)2 Duo CPU     T7100  @ 1.80GHz (1795.51-MHz 686-class CPU)

>dmesg | grep memory
real memory  = 2137718784 (2038 MB)
avail memory = 2082193408 (1985 MB)
agp0: detected 7676k stolen memory

> c++ --version | grep GCC
g++44 (GCC) 4.4.1 20090519 (prerelease)

these are the numbers:

>c++ -O3 md5_lookup.cc && ./a.out
inserted 8388608 random md5 digests into hash table, vector
3324162 look-ups per second

>c++ -O3 string_lookup.cc && ./a.out
235882 strings read
3019289 look-ups per second


////////////////////////// md5_lookup.cc /////////////////////////

#include <google/dense_hash_set>
#include <vector>
#include <ctime>
#include <cstdlib>
#include <iostream>
#include <stdint.h> // posix

// for VC++ 2008 Express 32 bit
// replace #include <stdint.h> withe the next two lines
// typedef unsigned char uint8_t ;
// typedef unsigned int uint32_t ;

struct MD5
{
   enum { DIGEST_NBYTES = 16, DIGEST_NWORDS = DIGEST_NBYTES / sizeof(uint32_t) } ;

   union
   {
       uint8_t bytes[ DIGEST_NBYTES ] ;
       uint32_t words[ DIGEST_NWORDS ] ;
   };

   explicit MD5( uint32_t a, uint32_t b, uint32_t c, uint32_t d )
   { words[0] = a ; words[1] = b ; words[2] = c ; words[3] = d ; }

   MD5() {}
};


inline uint32_t rotate( uint32_t x, uint32_t k )
{
   return ( x << k ) | ( x >> ( 32U - k ) ) ;
}

inline void prelim_mix( uint32_t& a, uint32_t& b, uint32_t& c )
{
   a -= c ; a ^= rotate( c, 4U ) ; c += b ;
   b -= a ; b ^= rotate( a, 6U ) ; a += c ;
   c -= b ; c ^= rotate( b, 8U ) ; b += a ;
   a -= c ; a ^= rotate( c, 16U ) ; c += b ;
   b -= a ; b ^= rotate( a, 19U ) ; a += c ;
   c -= b ; c ^= rotate( b, 4U ) ; b += a ;
}

inline void final_mix( uint32_t& a, uint32_t& b, uint32_t& c )
{
   c ^= b ; c -= rotate( b, 14U ) ;
   a ^= c ; a -= rotate( c, 11U ) ;
   b ^= a ; b -= rotate( a, 25U ) ;
   c ^= b ; c -= rotate( b, 16U ) ;
   a ^= c ; a -= rotate( c, 4U ) ;
   b ^= a ; b -= rotate( a, 14U ) ;
   c ^= b ; c -= rotate( b, 24U ) ;
}


struct hash_md5
{
   uint32_t operator() ( const MD5& md5 ) const ;
};

// hash a 128 bit MD5 digest to generate a 32 bit hash
uint32_t hash_md5::operator() ( const MD5& value ) const
{
   static uint32_t seed = 0U ;

   uint32_t a,b,c ;
   a = b = c = 0xdeadbeef + ( MD5::DIGEST_NWORDS << 2U ) + seed ;

   a += value.words[0] ;
   b += value.words[1] ;
   c += value.words[2] ;
   prelim_mix( a, b, c) ;

   a += value.words[3] ;
   final_mix( a, b, c ) ;

   return seed = c ;
}

struct equal_md5
{
   inline bool operator() ( const MD5& first, const MD5& second ) const
   { return std::memcmp( first.bytes, second.bytes, sizeof(MD5) ) == 0 ; }
};

int main() // small test driver for hash of random md5
{
    std::srand( std::time(0) ) ;
    enum { N = 1024 * 1024 * 8 } ;

    google::dense_hash_set< MD5, hash_md5, equal_md5 >  md5_set ;
    md5_set.set_empty_key( MD5() ) ; // empty key is null md5
    std::vector< MD5 > vec ;
    vec.reserve(N) ;

    // generate N (8 million) random md5 digests, insert into hash table, vector
    for( int i = 0 ; i < N ; ++i )
    {
         MD5 digest( std::rand(), std::rand(), std::rand(), std::rand() ) ;
         md5_set.insert( digest ) ;
         vec.push_back( digest ) ;
    }
    std::cout << "inserted " << N << " random md5 digests into hash table, vector\n" ;

    std::clock_t begin = clock() ;
    // look up in the hash table for each of the N (8 million) md5 digests in vector
    for( int i = 0 ; i < N ; ++i )
         md5_set.find( vec[i] ) ;
    std::clock_t end = clock() ;

    double secs = double(end-begin) / CLOCKS_PER_SEC ;
    std::cout << std::fixed << int( N / secs ) << " look-ups per second\n" ;
}



////////////////////////// string_lookup.cc /////////////////////////

#include <google/dense_hash_set>
#include <string>
#include <functional>
#include <ctime>
#include <vector>
#include <iostream>
#include <fstream>
#include <stdint.h> // posix

// for VC++ 2008 Express 32 bit
// replace #include <stdint.h> withe the next two lines
// typedef unsigned char uint8_t ;
// typedef unsigned int uint32_t ;

struct murmur_hash_str
{
   inline uint32_t operator() ( const std::string& str ) const ;
};

uint32_t murmur_hash_str::operator() ( const std::string& str ) const
{
   uint32_t size = str.size() ;
   if( size == 0U ) return 0 ;

   static uint32_t seed = 0 ;
   const uint32_t multiplier = 0x5bd1e995 ;
   const uint32_t rsh_nbits = 24U ;

   uint32_t hash = seed ^ size ;

   std::string::const_iterator iterator = str.begin() ;
   while( size >= 4U )
   {
       uint32_t word = *reinterpret_cast< const uint32_t* >( &*iterator ) ;

       word *= multiplier ;
       word ^= word >> rsh_nbits ;
       word *= multiplier ;

       hash *= multiplier ;
       hash ^= word ;

       iterator += 4U ;
       size -= 4U ;
   }

   switch(size)
   {
       case 3: hash ^= iterator[2] << 16U ;
       case 2: hash ^= iterator[1] << 8U ;
       case 1: hash ^= iterator[0] ;
         hash *= multiplier ;
   };

   // a few final mixes to ensure the last few bytes are well-incorporated.
   hash ^= hash >> 13U ;
   hash *= multiplier ;
   hash ^= hash >> 15U ;

   return seed = hash ;
}

int main() // small test driver for hash of random md5
{
    google::dense_hash_set< std::string, murmur_hash_str >  str_set ;
    str_set.set_empty_key( std::string() ) ; // empty key is null string
    std::vector< std::string > vec ;

    std::ifstream file( "/usr/share/dict/words" ) ;

    // read words (lines) from file into the set, vector
    std::string word ;
    while( std::getline( file, word ) )
    {
        str_set.insert( word ) ;
        vec.push_back(word) ;
    }
    std::cout << vec.size() << " strings read\n" ;

    // look up in the hash table for each of the strings
    std::clock_t begin = clock() ;
    for( std::size_t i = 0 ; i < vec.size() ; ++i )
        str_set.find( vec[i] ) ;
    std::clock_t end = clock() ;

    double secs = double(end-begin) / CLOCKS_PER_SEC ;
    std::cout << std::fixed << int( vec.size() / secs ) << " look-ups per second\n" ;
}


//////////////////////   end  ///////////////////////////////////


---------- FOLLOW-UP ----------

QUESTION: Hello. Sorry it took me so long to respond. Been doing lots of research into this lately.

The documentation for dense_hash was terrible. I had no idea what they were trying to say. I tried to modify your code, but because I couldn't understand dense_hash well enough, I didn't know what I could do. Correct me if I'm wrong, but from what I can tell, dense_hash is just a super-fast vector.

what I want to do is read a list of password hashes from a text file, one hash per line. I want to convert these hashes to binary and store them in dense_hash vector. I hope I am not limited to md5 hashes, I had wanted to do md4, ntlm, and even Kerberos encrypted timestamps. All are 16 bytes long. I would like to store and search for hashes without regard to what algorithm they have been encrypted by.

Here is pseudo code for what I'm trying to accomplish:

//read hashes from hash list, convert from string to binary using getHash
//function, and store each word in dense_hash vector.

//for each new plaintext candidate, encrypt using hash algorithm, and scan
//through dense_hash vector looking for a matching binary value. If found,
//print the hash and the plaintext that resulted in that hash...

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
using namespace std;

typedef unsigned char u8;

size_t getHash(char szHash[], unsigned char hash[])
{
      unsigned int number;
      size_t nHashLen;

      if(((nHashLen = strlen(szHash)) % 2) != 0)
         return(0);

      for(size_t i = 0;i < nHashLen;i++) {
         if(isxdigit(szHash[i]) == 0)
         return(0);
      }

      nHashLen /= 2;

      for(size_t i = 0;i < nHashLen;i++) {
         sscanf(&szHash[i*2],"%2x",&number);
         hash[i] = (unsigned char)number;
      }
      return(nHashLen);
}

void dump(u8 *digest)
{
    for(int i = 0;i < 16;i++)
        printf("%02X",digest[i]);
}

int main (int argc, char *argv[])//takes 1 argument (hash list file location)
{
   //make a buffer to store hash from wordlist in temporarily...
   u8 enc_hash[16];
   //make a buffer to hold each candidate hash...
   u8 candidate_hash[16];
   
   //open hash file
   ifstream hashfile(argv[1]);
   //check to ensure that hash list file opened correctly
   if (!hashfile.is_open())
   {
       cout<<"Error opening file " << (argv[1]);
       exit(1);
   }
   //read each hash from hashfile into dense_hash vector...
   char word[33];
   while(!hashfile.eof())
   {
       hashfile.getline(word,32); //get as string
       getHash(word,enc_hash); //string-->binary

       //store enc_hash in dense_hash vector...
   }
   
   hashfile.close();
   
   for(;;)//plaintext candidate password generation loop (simplified)...
   {
       candidate_hash=hash_algorithm(candidate);//encrypt 'candidate' using
       //hash algorithm - store in u8 'candidate_hash'
       
       if(hash_set.find( candidate_hash ))//I'm sure I'm slaughtering usage...
       {
         cout << "plaintext of ";
         dump(candidate_hash);
         cout << " is " << candidate << ".\n";
         //quit program...
       }
       
       candidate++;//increment word (ex. aaaa-->aaab)
   )

return 0;
}

I was wondering, since you seem to understand dense_hash much better than I, could you take me the rest of the way? (in regards to dense_hash usage)

1) I need to learn how to store each line of the hashfile in the dense_hash vector.

2) I need to learn how to search the dense_hash vector (fast) for a hash (x), and if that hash (x) is in the vector, return a bool TRUE.

Thank you. I think that dense_hash is well suited for my purpose, if I can just learn how to accomplish these two basic tasks.

Answer
> dense_hash is just a super-fast vector.

No, dense_hash is a super-fast hash table implementation.

dense_hash_map maps keys to values and syntactically is like a vector, though the implementation is quite different.

The dense_hash_set that we are using contains only the keys, it is used for fast key lookup.

here is some basic introductory material on hashing.
http://en.wikipedia.org/wiki/Hash_table
http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx


Before I try to answer your questions (the answers for which are quite trivial), let me ask: were you able to download the software and compile the sample programs that come with it under VC++?

Once you are able to do that, rest of the stuff is easy.

C++

All Answers


Answers by Expert:


Ask Experts

Volunteer


vijayan

Expertise

my primary areas of interest are generic and template metaprogramming, STL, algorithms, design patterns and c++11. i would not answer questions about gui and web programming.

Experience

about 15 years or so

Education/Credentials
post graduate engineer

©2016 About.com. All rights reserved.