Unix/Linux OS/awk and file editing

Question
QUESTION: Hi there, I'm trying to edit a text file using AWK. It's been so long since I've used it that I can't remember how.
Simply put, I want to strip back a lot of the data in my file and save the output, even if I have to do it bit by bit.
I've installed Cygwin on my XP box and most of the functions are there, except I can't find a file editing tool like vi or edit. Any ideas? (It's in bash, and so far I can't find any other shell.)
Thanks, Ollie

ANSWER: It would help to know a bit more detail...

But, here are a couple of sources for the tools you are looking for:
http://unxutils.sourceforge.net/
http://sourceforge.net/projects/wintools/

I keep a C:\bin folder on every Windows box I use and add it to the default system PATH (and I keep a copy of C:\bin on a USB stick).

And I import the following registry entries (saved as a .reg file) so I can right-click on a folder and open a Windows command shell within that folder:
==============
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\Directory\shell\Command]
@="Command &Prompt"
[HKEY_CLASSES_ROOT\Directory\shell\Command\command]
@="cmd.exe /k cd \"%1\" && prompt $N:$G"
==============

So then I have awk, sed, vi, grep, etc. available as needed.

Hope that helps...
If not, give me a bit more detail about the content before and after.
(i.e. how to identify what to strip out...)

---------- FOLLOW-UP ----------

QUESTION: Thanks, I've managed to install vim, so I have an editor now. In truth I may not need it, as awk (gawk) should be able to parse the data and output to a new file. The data to strip is ugly; here is an example:
"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"

That is one line.  I only want this from it:
HARRISON ALAN 22 ALBION STREET LANCASTER LA1 1DY

Do you think using AWK is the better way?

Answer
Awk will let you format the output in a specific way, but it involves more syntax (similar to C) to do so.

I think 'cut' (and possibly using sed also) is the fastest way:
cut -f5-9 -d"," < sourcefile > destfile
Note that the surrounding quotes and the commas stay in the output. Also, if any data field itself contains a comma, this will fail on those lines.
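
To see that failure concretely, here is a small sketch. The sample line is hypothetical (I added a comma inside the street address), so treat it as an illustration only, not as data from the real file:

```shell
# Hypothetical sample line: note the comma inside the address field.
line='"BLKA","33","","","HARRISON","ALAN","22, ALBION STREET","LANCASTER","LA1 1DY"'

# cut counts every comma, so the address is split into two fields
# and the postcode drops off the end of the 5-9 range.
result=$(printf '%s\n' "$line" | cut -d"," -f5-9)
printf '%s\n' "$result"
```

The postcode is missing from the result, and the quotes and commas are still there, which is why the steps below switch to a different delimiter first.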

To get around that:
Find a char NOT in the file using grep (like | or ~)
grep "|" sourcefile
Hopefully you get nothing on screen. Then...
Replace each "," sequence (quote, comma, quote) between fields with | (or ~):
sed -e "s/\",\"/|/g" < sourcefile > destfile1
Get rid of the leading quote:
sed -e "s/^\"//" < destfile1 > destfile2
Get rid of the trailing quote:
sed -e "s/\"$//" < destfile2 > destfile3
Now you have (all on one line):
BLKA|33|||HARRISON|ALAN|22 ALBION STREET|LANCASTER|LA1 1DY||||||22||ALBION STREET|LANCASTER||||LA1 1DY
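
If you would rather skip the intermediate files, the three sed passes can probably be chained in one command with several -e expressions. This is just a sketch run against the sample line from the question; the variable names are mine:

```shell
# Sample line taken from the question.
line='"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"'

# 1) turn every "," into |   2) drop the leading quote   3) drop the trailing quote
converted=$(printf '%s\n' "$line" | sed -e 's/","/|/g' -e 's/^"//' -e 's/"$//')
printf '%s\n' "$converted"
```

On a real file the same expressions would be run as sed -e ... -e ... -e ... < sourcefile > destfile3. Single quotes are used here so bash does not need the backslashes.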
And then use cut on the new delimiter:
cut -f5-9 -d"|" < destfile3 > destfile4
And then, if you want to get rid of the |'s, replace each one with a space (otherwise the fields run together):
sed -e "s/|/ /g" < destfile4
Or, with awk, now that you have a single delimiter (read destfile3, which still has every field, so that $5 to $9 line up):
awk '
BEGIN  { FS="[|]" }
{
 printf "%-15s %-15s %-25s %-15s %-7s\n", $5, $6, $7, $8, $9
}
' < destfile3
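
One more option, offered as a sketch rather than the definitive way: awk can split on the three-character sequence "," directly, which skips the sed steps. It assumes every field is quoted as in your sample, and that you only need fields from the middle of the line (the first field keeps its leading quote and the last keeps its trailing quote):

```shell
# Sample line taken from the question.
line='"BLKA","33","","","HARRISON","ALAN","22 ALBION STREET","LANCASTER","LA1 1DY","","","","","","22","","ALBION STREET","LANCASTER","","","","LA1 1DY"'

# -F'","' makes the quote-comma-quote sequence the field separator,
# so fields 5 to 9 come out clean and print joins them with spaces.
result=$(printf '%s\n' "$line" | awk -F'","' '{ print $5, $6, $7, $8, $9 }')
printf '%s\n' "$result"
# prints: HARRISON ALAN 22 ALBION STREET LANCASTER LA1 1DY
```

Against the real file that would be awk -F'","' '{ print $5, $6, $7, $8, $9 }' < sourcefile > destfile. A comma inside a data field is harmless here, since only the full "," sequence counts as a separator.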


mkitwrk

Expertise

Expert: Creating and managing *nix database/application servers for use with dl4/unibasic/mysql/apache/thoroughbred applications, especially in medical environments. Strengths: scripting, backup and disaster recovery, mysql, apache2, routing, samba/smbfs/cifs, LPRng, CUPS, telnet/ssh/sftp, vsftp, rsync, new system preparation, system duplication, database design, system conversions (AIX/SCO-OS5/Linux) Currently working on scripted setup of LAMP servers using PDO for MySQL and Oracle. Compiling Apache2, openssl, php and libxml2 from source and linking to libraries for MySQL and Oracle InstantClient. Works great so far! Familiar With: php, c, awk, sed, gnome, nfs and lots of other *nix tools

Experience

I've been head of development at our company since 1984. Our OS's at that time were Point 4's IRIS and Altos' Xenix. Then: SCO Xenix, SCO Unix, AIX, SCO-OS5, Caldera, RedHat 7, Debian Sarge, RedHat ES4, Debian Etch, Redhat ES5, Debian Lenny, RedHat ES6, Debian Squeeze. I've migrated our clients through those various versions with minimal interruption while preserving their investments in hardware and staff knowledge over time.

Education/Credentials
1980 BSBA Washington University, Saint Louis, Missouri

©2012 About.com, a part of The New York Times Company. All rights reserved.