C++/C++

Advertisement


Question
Hi,

I need to implement an algorithm for using c++ data structures.

I have got two files file1 and file2.

In file1, the words are split across individual lines.  For example, each word in the sentence "I live in London" appears in separate lines. Each word in the file1 has their equivalents patterns. For example the word "live” has its equivalent pattern "VB  KL  BL". Normally, the number of lines in the file will be file in terms of thousands (4 or 5 thousands)


The way lines are stored in file1 as follows:
=============================================
Letter/word                            patterns1
I                                 SJ      DL          KP   
live                              VB      KL          BL
in                                IJ      DL          ML
London                            OJ      KL          BL
......                            ..      ..          ..
......                            ..      ..          ..
etc.....


file2 is rule file which contains the input patterns and their corresponding output patterns

file2
------
input pattern                           output pattern
DL       KP                                MN       OP
KL       BL                                QR       QP
SJ       DL       KP                       AB       DF
VB       DL       BL                       FG       KE
IJ       DL       ML                       KQ       PT
OJ       KL       BL                       AK       BM
DL       ML       BT               
--       --       --                       ---      --

I need to parse file1 line by line and identify their corresponding patterns ( Ex: In the first line  "SJ DL KP" is the pattern for letter "I"). Once the pattern1 (say "SJ DL KP" in file1) is identified then I have to search this pattern (pattern1) in the file2 (“input pattern" column). So in the file2 the equivalent pattern for "SJ DL KP" is "AB DF"("output pattern" column)

My requirement is to parse the file1 for each word and identify the corresponding pattern in patterns1 column. Once the pattern is identified then search this pattern in the file2 (under input pattern column). Then match the pattern identified in "input pattern" column with the "output pattern" column. Then I have to generate file3 with new pattern selected from "output pattern" column of file2.

I need to implement an efficient (low memory,less iterations, more speed) algorithm using c++. Can any one from here help me with proper design, data structures and iterators

Regards
Devamani

Answer
> parse the file1 for each word and identify the corresponding pattern in patterns1 column.
This is trivial.

> then search this pattern in the file2
This is the key step. You could use a string searching algorithm like the Boyer–Moore string search algorithm:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

The rest of the steps just involve table look-ups.  

C++

All Answers


Answers by Expert:


Ask Experts

Volunteer


vijayan

Expertise

my primary areas of interest are generic and template metaprogramming, STL, algorithms, design patterns and c++09. i would not answer questions about gui and web programming.

Experience

over 15 years

Education/Credentials
post graduate engineer

©2012 About.com, a part of The New York Times Company. All rights reserved.