Saturday, February 21, 2009

How to read and parse flat files in Java

Parsing files and their formats can be painful no matter which programming language you use. One open source project, Flatworm, aims to make reading, parsing, and writing flat files much easier in Java. You describe the file format in XML, and Flatworm breaks the records out into Java beans for you. The API can handle large files, formats with multiple-line records, and a wide range of other flat file layouts.

Simple Example

Let's look at a simple example in which we parse a flat file into Java objects for processing. Our example parses client data laid out in the file format below.

Name      Start  End  Length  Type
Type      1      2    2       Char
First     3      27   25      Char
Middle    28     52   25      Char
Last      53     77   25      Char
Acct. ID  78     92   15      Char
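Each End value follows from the Start and Length columns (End = Start + Length - 1), and each field begins right after the previous one ends. The small Java check below rebuilds the End column from the lengths alone; the class name FieldLayout is just an illustration and is not part of Flatworm.

```java
import java.util.Arrays;

// Illustrative helper (hypothetical name) that rebuilds the End column from the Length column.
public class FieldLayout {

    /** Returns the 1-based end position of each field, given the field lengths in file order. */
    static int[] endPositions(int... lengths) {
        int[] ends = new int[lengths.length];
        int start = 1; // the table uses 1-based positions
        for (int i = 0; i < lengths.length; i++) {
            ends[i] = start + lengths[i] - 1; // End = Start + Length - 1
            start = ends[i] + 1;              // the next field starts right after this one
        }
        return ends;
    }

    public static void main(String[] args) {
        // Lengths of Type, First, Middle, Last and Acct. ID
        int[] ends = endPositions(2, 25, 25, 25, 15);
        System.out.println(Arrays.toString(ends)); // [2, 27, 52, 77, 92]
    }
}
```

The last value, 92, is the full width of each client record.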

Below is the sample flat file we will be parsing.
CDJOHN                     MARK                     DOE                      111111111111111
CDPAUL                     RICHARD                  STEPHENS                 222222222222222
CDRINGO                    JACK                     ERICSON                  333333333333333
Now that we know the format of our flat file and have some sample data to parse, we need to create an XML document that describes the file format for Flatworm.
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE file-format SYSTEM "http://www.blackbear.com/dtds/flatworm-data-description_1_0.dtd">
<file-format>
    <converter name="char" class="com.blackbear.flatworm.converters.CoreConverters" method="convertChar" return-type="java.lang.String"/>
    <record name="clientData">
        <record-ident>
            <field-ident field-start="0" field-length="2">
                <match-string>CD</match-string>
            </field-ident>
        </record-ident>
        <record-definition>
            <bean name="client" class="org.javaconfessions.sample.Client"/>
            <line>
                <record-element length="2"/>
                <record-element length="25" beanref="client.firstName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="25" beanref="client.middleName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="25" beanref="client.lastName" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
                <record-element length="15" beanref="client.accountId" type="char">
                    <conversion-option name="justify" value="left"/>
                </record-element>
            </line>
        </record-definition>
    </record>
</file-format>
Closer Look at the Descriptor File

<file-format> - This tag is required and serves as the root node of our descriptor file.

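<converter> - This tag declares a converter that the Flatworm parser uses to turn a field's raw text into a Java value. Here, the converter named char calls the convertChar method of com.blackbear.flatworm.converters.CoreConverters and returns a java.lang.String; record elements select it with type="char".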

Looking further at the XML descriptor file, you can see that describing our file format to Flatworm is fairly simple. The record tag begins the description of our client data records. Within the record tag is the record-ident tag, which tells Flatworm how to identify each type of record in a flat file. Most flat file formats contain several record types, such as headers, footers, detail records, batch headers, and batch footers, and this mechanism lets Flatworm parse all of them from the same file. The field-ident tag gives the specifics of how to identify the record: field-start and field-length select the characters to test (field-start is a zero-based offset, so 0 here corresponds to position 1 in the table above), and the match-string tag holds the text those characters are compared against. In the descriptor above, clientData records are the lines that start with the characters CD.
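To make the record-ident test concrete, here is a minimal plain-Java sketch of the check a field-ident describes, assuming a simple exact comparison of the selected characters against the match string. It is not Flatworm's implementation; the class RecordIdent and the "HD" header line are hypothetical.

```java
// Hypothetical sketch of the check that <field-ident> describes; not Flatworm's code.
public class RecordIdent {

    /**
     * Returns true if the line holds matchString at the zero-based offset fieldStart,
     * comparing exactly fieldLength characters.
     */
    static boolean matches(String line, int fieldStart, int fieldLength, String matchString) {
        int end = fieldStart + fieldLength;
        if (line.length() < end) {
            return false; // line too short to contain the identifying field
        }
        return line.substring(fieldStart, end).equals(matchString);
    }

    public static void main(String[] args) {
        String detail = "CDJOHN                     MARK";
        String header = "HD20090221"; // made-up header record, for contrast
        System.out.println(matches(detail, 0, 2, "CD")); // true
        System.out.println(matches(header, 0, 2, "CD")); // false
    }
}
```

A parser handling several record types could run a check like this for each record definition in turn and use the first one that matches.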

The next section of the record description is the record-definition tag, where we map each record element to a bean property for Flatworm. It starts with a bean tag that tells Flatworm which Java class to create for this record type and gives it a name (client) that the record elements refer to. The record-element tags then describe, in order, how long each field is, its data type, and which bean property receives it during parsing. Because the record-elements here specify only a length, each field begins where the previous one ended; the first record-element has no beanref, so the two-character CD identifier is consumed without being stored.
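The positional mapping can be illustrated with a plain-Java sketch, independent of Flatworm, that walks the same lengths the record-element tags declare (2, 25, 25, 25, 15) and cuts a sample line into fields. Stripping trailing spaces is an assumption about what left justification implies for these space-padded values; it matches the trimmed names in the output shown later in this post, but Flatworm's exact behavior may differ. FixedWidthSplit, split, and pad are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the layout the <record-element> list describes; not Flatworm's code.
public class FixedWidthSplit {

    /** Cuts a line into consecutive fields of the given lengths and strips trailing spaces. */
    static List<String> split(String line, int... lengths) {
        List<String> fields = new ArrayList<>();
        int pos = 0; // each field starts where the previous one ended
        for (int len : lengths) {
            int end = Math.min(pos + len, line.length()); // tolerate a short last line
            String raw = pos < end ? line.substring(pos, end) : "";
            // Assumption: left-justified values are padded with trailing spaces
            fields.add(raw.replaceAll(" +$", ""));
            pos += len;
        }
        return fields;
    }

    /** Pads a value with trailing spaces to a fixed width, like the sample data. */
    static String pad(String value, int width) {
        return String.format("%-" + width + "s", value);
    }

    public static void main(String[] args) {
        String line = "CD" + pad("JOHN", 25) + pad("MARK", 25) + pad("DOE", 25) + "111111111111111";
        // 2 for the "CD" identifier, then First, Middle, Last and Acct. ID
        List<String> fields = split(line, 2, 25, 25, 25, 15);
        System.out.println(fields); // [CD, JOHN, MARK, DOE, 111111111111111]
    }
}
```

The first element is the CD identifier, which the descriptor consumes but does not map to any bean property.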

Here is the source code for my Client bean.
package org.javaconfessions.sample;

public class Client {

    private String firstName;
    private String middleName;
    private String lastName;
    private String accountId;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String pFirstName) {
        firstName = pFirstName;
    }

    public String getMiddleName() {
        return middleName;
    }

    public void setMiddleName(String pMiddleName) {
        middleName = pMiddleName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String pLastName) {
        lastName = pLastName;
    }

    public String getAccountId() {
        return accountId;
    }

    public void setAccountId(String pAccountId) {
        accountId = pAccountId;
    }

    @Override
    public String toString() {
        return "First Name: " + firstName + "\nMiddleName: " + middleName
                + "\nLastName: " + lastName + "\nAccount ID: " + accountId
                + "\n";
    }

}
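A beanref such as client.firstName combines the bean name declared in the bean tag (client) with a property name (firstName), which is why the Client class exposes a matching getter and setter for every mapped field. How Flatworm performs the assignment internally is not covered here; the sketch below instead uses the JDK's java.beans.PropertyDescriptor to show how a property name resolves to its setter under JavaBeans naming conventions. BeanRefDemo and its nested Person class are hypothetical names.

```java
import java.beans.IntrospectionException;
import java.beans.PropertyDescriptor;

// Sketch only: maps a property name such as the "firstName" in beanref="client.firstName"
// to its setter using the JDK's java.beans API. This is not Flatworm's internal code.
public class BeanRefDemo {

    // Minimal stand-in for the Client bean with a single property
    public static class Person {
        private String firstName;

        public String getFirstName() { return firstName; }

        public void setFirstName(String firstName) { this.firstName = firstName; }
    }

    /** Sets a String property by name through the setter that JavaBeans naming implies. */
    static void setProperty(Object bean, String property, String value) {
        try {
            // "firstName" resolves to getFirstName()/setFirstName(String)
            PropertyDescriptor pd = new PropertyDescriptor(property, bean.getClass());
            pd.getWriteMethod().invoke(bean, value);
        } catch (IntrospectionException | ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot set property " + property, e);
        }
    }

    public static void main(String[] args) {
        Person p = new Person();
        setProperty(p, "firstName", "JOHN");
        System.out.println(p.getFirstName()); // JOHN
    }
}
```

If a beanref names a property with no matching getter/setter pair, PropertyDescriptor fails with an IntrospectionException, so keeping the descriptor and the bean's accessor names in sync matters.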
Now that we have all of our data model code set up, the next step is to write the code that populates our Java bean with the parsed data. Below is a simple parsing class that parses the flat file with Flatworm and prints each record to the console.
package org.javaconfessions.sample;

import com.blackbear.flatworm.ConfigurationReader;
import com.blackbear.flatworm.FileFormat;
import com.blackbear.flatworm.MatchedRecord;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ClientDataParser {

    private static final Logger LOG = Logger.getLogger(ClientDataParser.class.getName());

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java org.javaconfessions.sample.ClientDataParser <format.xml> <datafile>");
            System.exit(1);
        }

        ConfigurationReader parser = new ConfigurationReader();
        // try-with-resources (Java 7+) closes the reader and the underlying file stream
        try (BufferedReader bufIn = new BufferedReader(
                new InputStreamReader(new FileInputStream(args[1])))) {
            // Load the XML descriptor that describes the flat file layout
            FileFormat ff = parser.loadConfigurationFile(args[0]);
            MatchedRecord results;
            // getNextRecord returns null once there are no more records
            while ((results = ff.getNextRecord(bufIn)) != null) {
                // "client" is the bean name declared in the descriptor's <bean> tag
                System.out.println(results.getBean("client"));
            }
        } catch (Exception ex) {
            LOG.log(Level.SEVERE, "Failed to parse flat file", ex);
        }
    }

}
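After compiling, run the parser with the Flatworm JAR (plus any libraries it depends on) on the classpath, passing the descriptor and the data file as arguments: java org.javaconfessions.sample.ClientDataParser /path/to/format.xml /path/to/datafile.txt. You should get the following output: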
First Name: JOHN
MiddleName: MARK
LastName: DOE
Account ID: 111111111111111

First Name: PAUL
MiddleName: RICHARD
LastName: STEPHENS
Account ID: 222222222222222

First Name: RINGO
MiddleName: JACK
LastName: ERICSON
Account ID: 333333333333333
Summary
Now you have an idea of how Flatworm simplifies parsing flat files. Next, learn how to write flat files using Flatworm.

Wednesday, February 4, 2009

Switching from Windows to a Macbook

I had never owned a Mac before and recently took the plunge into the Mac world by purchasing the new aluminum-case MacBook. As a longtime Windows user, I have to say it is an amazing device. However, some differences take some getting used to, and I wanted to help others who are diving into the Mac world for the first time.

Common Keyboard Shortcuts
The first thing you'll notice on your new MacBook is that the keyboard is different, so some of your well-known Windows keyboard shortcuts won't work. For most common functions, you can simply replace Ctrl with the Command key, as the table below illustrates.

Function  Windows  Mac
Copy      Ctrl-C   Command-C
Cut       Ctrl-X   Command-X
Paste     Ctrl-V   Command-V


Not So Obvious Keyboard Operations

You'll notice that some keys are missing on your new MacBook. When you started using it, you may have asked, "Why does my backspace key say delete, and where is the delete key?" This table answers those questions.


Function                                   Windows    Mac
Move to the next page (Page Down)          Page Dn    Fn-Down Arrow
Move to the previous page (Page Up)        Page Up    Fn-Up Arrow
Move to the end of the current line        End        Command-Right Arrow
Move to the beginning of the current line  Home       Command-Left Arrow
Move to the end of a document              Ctrl-End   Command-Down Arrow
Move to the beginning of a document        Ctrl-Home  Command-Up Arrow
Delete text to the right of the cursor     Delete     Fn-Delete
Find next                                  F3         Command-G
Switch running applications                Alt-Tab    Command-Tab

© 2010 Confessions of a Java Programmer, All Rights Reserved