Saturday, August 23, 2008

Delete multiple lines in a text file using regular expressions and sed

I recently ran into a problem. I had a very large flat file in a specific format, and I needed to delete certain record types. This is easy with regular expressions when each record sits on a single line, but that was not the case here: the record types I needed to remove spanned multiple lines. Below is an example of the type of record I needed to delete (the file spec is proprietary, so I have changed the actual values):
START PAYMENT_RECORD
123456789 80000 JOHN DOE BANK OF ANYWHERE USA 123 ANY STREET ANYTOWN USA
END PAYMENT_RECORD

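With sed, you can still match and delete a multi-line block using a command of the following form (sed writes the result to standard output and leaves the input file untouched, so redirect the output to keep it):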
sed '/pattern/{N;N;N;d}' filename  

Applying that command to my record example, I accomplished the task with:

sed '/^START PAYMENT_RECORD/{N;N;N;d}' payment_data_file.txt

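This command tells sed to find a line that begins with START PAYMENT_RECORD. Each N then appends the next line of input to the pattern space (sed's working buffer), and d deletes everything in the pattern space, so the matching line and the three lines after it are removed. Note that N is a command, not a line count: use one N for each line that follows the start line in your record. If a record occupies exactly three lines, as the sample above is laid out here, N;N is enough; with N;N;N the line after END PAYMENT_RECORD would be deleted too.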
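
Counting lines only works when every record has the same length. As a rough sketch of an alternative, sed can also delete an address range that runs from a start pattern to an end pattern, whatever lies in between. The example below builds a made-up sample file (the HEADER_RECORD and TRAILER_RECORD entries are invented for illustration and are not part of the real file spec) and runs both forms, assuming each payment record is three lines long as in the sample above:

```shell
# Build a small sample file (made-up data, not the real file spec).
sample_file=$(mktemp)
cat > "$sample_file" <<'EOF'
START HEADER_RECORD
20080823 BATCH 001
END HEADER_RECORD
START PAYMENT_RECORD
123456789 80000 JOHN DOE BANK OF ANYWHERE USA 123 ANY STREET ANYTOWN USA
END PAYMENT_RECORD
START TRAILER_RECORD
1 RECORD
END TRAILER_RECORD
EOF

# Fixed-count form: the start line plus the next two lines (one N per extra line).
fixed_count_output=$(sed '/^START PAYMENT_RECORD/{N;N;d;}' "$sample_file")

# Range form: delete everything from the start marker through the end marker,
# however many lines lie between them.
range_output=$(sed '/^START PAYMENT_RECORD/,/^END PAYMENT_RECORD/d' "$sample_file")

# Show what is left after the payment record has been removed.
printf '%s\n' "$range_output"

rm -f "$sample_file"
```

Both forms leave only the header and trailer records in this sample. The range form keeps working if a record grows or shrinks by a line, as long as every START PAYMENT_RECORD line has a matching END PAYMENT_RECORD line.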

© 2010 Confessions of a Java Programmer, All Rights Reserved