Unix/Linux OS/awk and file editing
Expert: mkitwrk - 2/9/2008
QUESTION: Hi there, I'm trying to edit a text file using AWK. It's been so long since I've used it that I can't remember how.
Simply put, I want to strip back a lot of the data in my file and save the output, even if I have to do it bit by bit.
I've installed Cygwin on my XP box and most of the functions are there, except I can't find a file editing tool like vi or edit. Any ideas? (It's in bash, and so far I can't find any other shell.)
Thanks, Ollie
ANSWER: It would help to know a bit more detail...
But, here are a couple of sources for the tools you are looking for:
http://unxutils.sourceforge.net/
http://sourceforge.net/projects/wintools/
I keep a C:\bin folder on every Windows box I use and add it to the default system PATH (and I keep a copy of C:\bin on a USB stick).
I also import the following registry file so that I can right-click on a folder and open a Windows command shell within that folder:
==============
Windows Registry Editor Version 5.00
[HKEY_CLASSES_ROOT\Directory\shell\Command]
@="Command &Prompt"
[HKEY_CLASSES_ROOT\Directory\shell\Command\command]
@="cmd.exe /k cd \"%1\" && prompt $N:$G"
==============
So then I have awk, sed, vi, grep, etc. available as needed.
Hope that helps...
If not, give me a bit more detail about the content before and after.
(i.e. how to identify what to strip out...)
---------- FOLLOW-UP ----------
QUESTION: Thanks, I've managed to install vim, so I have an editor now. In truth I may not need it, as awk (gawk) should be able to parse the data and output to a new file. The data to strip is ugly; here is an example:
"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"
That is one line. I only want this from it:
HARRISON ALAN 22 ALBION STREET LANCASTER LA1 1DY
Do you think using AWK is the better way?
ANSWER: Awk will let you format the output in a specific way, but it involves more syntax (C-like syntax) to do that formatting.
I think 'cut' (possibly combined with sed) is the fastest way:
cut -f5-9 -d"," < sourcefile > destfile
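To see what that command produces on the sample line from the question, here is a small sketch. The scratch directory and the variable names (tmp, raw, clean) are only for illustration. It shows that cut keeps the quotes and commas between the selected fields, and one possible way to strip them afterwards with sed.

```shell
# Write the sample record from the question to a scratch file.
tmp=$(mktemp -d)
printf '%s\n' '"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"' > "$tmp/sourcefile"

# cut on "," alone: the quotes and the commas between fields 5-9 survive.
raw=$(cut -f5-9 -d"," < "$tmp/sourcefile")
echo "$raw"
# "HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY"

# Turn each "," separator into a space, then drop the remaining quotes.
clean=$(cut -f5-9 -d"," < "$tmp/sourcefile" | sed -e 's/","/ /g' -e 's/"//g')
echo "$clean"
# HARRISON ALAN 22 ALBION STREET LANCASTER LA1 1DY
```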
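Note that cut leaves the quotes and the commas between the selected fields in its output. Also, if any of the data fields themselves contain a comma, this will fail on those lines.
To get around that: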
Find a char NOT in the file using grep (like | or ~)
grep "|" sourcefile
Hopefully you get nothing on screen. Then...
Replace each "," delimiter sequence (quote, comma, quote) with | (or ~):
sed -e "s/\",\"/|/g" < sourcefile > destfile1
Get rid of the leading quote:
sed -e "s/^\"//" < destfile1 > destfile2
Get rid of the trailing quote:
sed -e "s/\"$//" < destfile2 > destfile3
Now you have:
BLKA|33|||HARRISON|ALAN|22 ALBION STREET|LANCASTER|LA1 1DY||||||22||ALBION STREET|LANCASTER||||LA1 1DY
And then use cut on the new delimiter:
cut -f5-9 -d"|" < destfile3 > destfile4
And then, if you want to get rid of the |'s, replace them with spaces so the fields stay separated:
sed -e "s/|/ /g" < destfile4
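The intermediate files are handy for checking each step, but the same sequence can also be chained into one pipeline. The following is only a sketch under the same assumption as above (that | does not occur anywhere in the data); the scratch directory and the name result are illustrative. The last step replaces the delimiter with a space so the fields stay separated.

```shell
# Write the sample record from the question to a scratch file.
tmp=$(mktemp -d)
printf '%s\n' '"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"' > "$tmp/sourcefile"

if grep -q '|' "$tmp/sourcefile"; then
    # The chosen delimiter already occurs in the data: stop and pick another.
    echo "'|' appears in the data; choose a different delimiter" >&2
else
    # 1) "," -> |   2) drop leading quote   3) drop trailing quote
    # 4) keep fields 5-9   5) turn the remaining | into spaces
    sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//' < "$tmp/sourcefile" |
        cut -f5-9 -d'|' |
        sed -e 's/|/ /g' > "$tmp/destfile"
    result=$(cat "$tmp/destfile")
    echo "$result"
    # HARRISON ALAN 22 ALBION STREET LANCASTER LA1 1DY
fi
```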
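Or, with awk, now that you have a single delimiter (note that awk reads destfile3, which still has all the fields, rather than the already-cut destfile4):
awk '
BEGIN { FS="[|]" }
{
    printf "%-15s %-15s %-25s %-15s %-7s\n", $5, $6, $7, $8, $9
}
' < destfile3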
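As a possible shortcut, awk can also split on the three-character sequence "," directly, which skips the sed preprocessing for the fields in the middle of the line. This is a sketch rather than a tested recipe for the full data set: it assumes every field is quoted as in the sample line, and only the first and last fields keep a stray quote (neither is needed here). The scratch directory and the name result are illustrative.

```shell
# Write the sample record from the question to a scratch file.
tmp=$(mktemp -d)
printf '%s\n' '"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"' > "$tmp/sourcefile"

# Use the sequence "," (quote, comma, quote) as the field separator.
# A comma inside a data field does not split it, because only "," does.
# print with commas joins the fields with a single space (the default OFS).
awk -F'","' '{ print $5, $6, $7, $8, $9 }' "$tmp/sourcefile" > "$tmp/destfile"

result=$(cat "$tmp/destfile")
echo "$result"
# HARRISON ALAN 22 ALBION STREET LANCASTER LA1 1DY
```

Swapping the print for the printf format string shown above would give fixed-width columns instead of single spaces.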