This page describes how to Read, wRite and perform aRithmetic on flat files using the Spring Batch framework. We will take a comma-separated values (CSV) file that contains employee information, add some information to each record, and write the result back to the file system.
Background
The basic building blocks of any batch process are:
- Reading an Item
- Performing an operation on it
- Writing the Item back
Please take some time to review The Domain Language of Batch before proceeding. It covers many of the fundamental concepts used here.
Batch Steps
This page is focused on an individual step of the batch process.
The following is from the Spring Batch documentation:
A Step is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A Step contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given Step are at the discretion of the developer writing a Job. A Step can be as simple or complex as the developer desires. A simple Step might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex Step may have complicated business rules that are applied as part of the processing.
Step Processing types
There are two ways a step can process data:
Tasklet
If the step only needs to execute a single task, you can use a tasklet. Typical use cases are running a stored procedure or copying a file from one location to another. In the “Hello World” example we used a Tasklet to print a message to the console.
Chunk oriented
Chunk-oriented processing involves specifying a reader, a processor and a writer. The input is read one item at a time, in sequence, and each item is passed to the processor. The processed items are collected into a chunk. Once the number of items reaches the commit interval, the whole chunk is passed to the writer and the transaction is committed. Chunk-oriented processing is what we will cover on this page.
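The read, process and write cycle described above can be sketched in plain Java. This is only an illustration of the idea, not how Spring Batch is implemented: the class name ChunkLoopSketch, the run method and the upper-casing "processor" are all made up for this example, and there is no real transaction here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of chunk-oriented processing; not Spring Batch code.
class ChunkLoopSketch {

    // Reads items one at a time, processes each one, and hands them to the
    // "writer" in groups of commitInterval. Returns the chunks that were written.
    static List<List<String>> run(Iterator<String> reader, int commitInterval) {
        List<List<String>> written = new ArrayList<List<String>>();
        List<String> chunk = new ArrayList<String>();
        while (reader.hasNext()) {
            String item = reader.next();           // read one item
            chunk.add(item.toUpperCase());         // process it (hypothetical rule)
            if (chunk.size() == commitInterval) {  // commit interval reached
                written.add(chunk);                // write the whole chunk at once
                chunk = new ArrayList<String>();   // start the next chunk
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                    // final, possibly smaller chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("adams", "allen", "blake", "clark", "ford");
        // prints [[ADAMS, ALLEN], [BLAKE, CLARK], [FORD]]
        System.out.println(run(input.iterator(), 2));
    }
}
```

With a commit interval of 1, as used in the job configuration on this page, every item becomes its own chunk.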
Requirements
- Java 5
- Maven 2
- Review and implementation of the following article.
- And this one:
http://numberformat.wordpress.com/2011/10/01/hello-world-with-spring-batch-2-1-x/
Project Setup
We will be modifying an existing project so please review the articles listed in the section above.
Input Data
The following is the input CSV file that will be read. Please create the following file in the project's resources directory.
src/main/resources/input_data.txt
7876,ADAMS,CLERK,1100
7499,ALLEN,SALESMAN,1600
7698,BLAKE,MANAGER,2850
7782,CLARK,MANAGER,2450
7902,FORD,ANALYST,3000
7900,JAMES,CLERK,950
7566,JONES,MANAGER,2975
7839,KING,PRESIDENT,-5000
7654,MARTIN,SALESMAN,1250
7934,MILLER,CLERK,1300
7788,SCOTT,ANALYST,3000
7369,SMITH,CLERK,800
7844,TURNER,SALESMAN,1500
7521,WARD,SALESMAN,1250
Employee Bean
This is a simple bean that represents a single Employee.
src/main/java/com/test/Employee.java
package com.test;

public class Employee {
    private Integer empId;
    private String lastName;
    private String title;
    private Integer salary;
    private String rank;

    public Integer getEmpId() {
        return empId;
    }
    public void setEmpId(Integer empId) {
        this.empId = empId;
    }
    public String getLastName() {
        return lastName;
    }
    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }
    public Integer getSalary() {
        return salary;
    }
    public void setSalary(Integer salary) {
        this.salary = salary;
    }
    public void setRank(String rank) {
        this.rank = rank;
    }
    public String getRank() {
        return rank;
    }

    @Override
    public String toString() {
        return "Employee [empId=" + empId + ", lastName=" + lastName
                + ", title=" + title + ", salary=" + salary + ", rank=" + rank
                + "]";
    }
}
Reading
To read data from a file we only need to write a FieldSetMapper class that takes a FieldSet object and maps its contents into a bean.
src/main/java/com/test/EmployeeFieldSetMapper.java
package com.test;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

public class EmployeeFieldSetMapper implements FieldSetMapper<Employee> {
    // Maps one tokenized input line (a FieldSet) to an Employee bean.
    public Employee mapFieldSet(FieldSet fieldSet) throws BindException {
        if (fieldSet == null) {
            return null;
        }
        Employee emp = new Employee();
        // Unlike JDBC, the field index is 0-based.
        // Column order in input_data.txt: empId, lastName, title, salary
        emp.setEmpId(fieldSet.readInt(0));
        emp.setLastName(fieldSet.readString(1));
        emp.setTitle(fieldSet.readString(2));
        emp.setSalary(fieldSet.readInt(3));
        return emp;
    }
}
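To make the division of work between the tokenizer and the mapper concrete, here is a rough plain-Java stand-in for the two steps. It does not use Spring Batch at all: LineMappingSketch, Row and mapLine are hypothetical names for this sketch, and a simple split on commas is assumed to be enough for the input file above (it would not handle quoted fields).

```java
// Plain-Java stand-in for the tokenizer + mapper pair; not Spring Batch code.
class LineMappingSketch {

    // Hypothetical minimal holder; the Employee bean plays this role in the tutorial.
    static class Row {
        int empId;
        String lastName;
        String title;
        int salary;
    }

    static Row mapLine(String line) {
        // "Tokenizer" step: break the delimited line into fields.
        String[] fields = line.split(",");
        // "Mapper" step: copy each field, by 0-based index, into the holder.
        Row row = new Row();
        row.empId = Integer.parseInt(fields[0].trim());
        row.lastName = fields[1].trim();
        row.title = fields[2].trim();
        row.salary = Integer.parseInt(fields[3].trim());
        return row;
    }

    public static void main(String[] args) {
        Row r = mapLine("7698,BLAKE,MANAGER,2850");
        // prints 7698 BLAKE MANAGER 2850
        System.out.println(r.empId + " " + r.lastName + " " + r.title + " " + r.salary);
    }
}
```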
Arithmetic
Not really! All we are doing is assigning a rank based on the salary amount. The item processor takes an input bean and converts it to an output bean. In this case the input and output types are the same (Employee), but they don’t have to be.
src/main/java/com/test/EmployeeProcessor.java
package com.test;

import org.springframework.batch.item.ItemProcessor;

public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
    // Called once per item read; the returned bean is handed to the writer.
    public Employee process(Employee emp) throws Exception {
        // If salary >= 2500 then set rank to "Director", otherwise "N/A".
        if (emp.getSalary() >= 2500) {
            emp.setRank("Director");
        } else {
            emp.setRank("N/A");
        }
        return emp;
    }
}
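The rule itself can be tried out without any batch infrastructure. The following is a small sketch of the same threshold check; RankRuleSketch and rankFor are hypothetical names, and treating a missing (null) salary as "N/A" is an assumption of this sketch rather than something the processor above does.

```java
// Stand-alone version of the rank rule; not part of the tutorial's project.
class RankRuleSketch {

    static String rankFor(Integer salary) {
        // Assumption: a missing salary is treated like a low salary.
        if (salary != null && salary.intValue() >= 2500) {
            return "Director";
        }
        return "N/A";
    }

    public static void main(String[] args) {
        System.out.println(rankFor(2850));   // prints Director
        System.out.println(rankFor(2500));   // prints Director (threshold is inclusive)
        System.out.println(rankFor(2450));   // prints N/A
        System.out.println(rankFor(-5000));  // prints N/A
    }
}
```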
Writing
No Java code is needed for writing. All the objects needed to write the file are configured in XML. See the XML file below for details.
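Conceptually, the writer turns each bean back into one delimited line: a field extractor pulls the property values out in a fixed order, and a line aggregator joins them with the delimiter. A minimal plain-Java sketch of the joining step, with the hypothetical names LineAggregatorSketch and aggregate, might look like this. It is only meant to show the idea, not the framework's implementation.

```java
// Plain-Java illustration of joining field values into one output line.
class LineAggregatorSketch {

    static String aggregate(Object[] fields, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(delimiter);   // delimiter goes between fields only
            }
            sb.append(fields[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field order assumed here: empId, lastName, title, salary, rank
        Object[] fields = { Integer.valueOf(7698), "BLAKE", "MANAGER", Integer.valueOf(2850), "Director" };
        // prints 7698,BLAKE,MANAGER,2850,Director
        System.out.println(aggregate(fields, ","));
    }
}
```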
Job Configuration
Modify the job xml file to define the new beans.
src/main/resources/simpleJob.xml
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/batch"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="
        http://www.springframework.org/schema/batch
        http://www.springframework.org/schema/batch/spring-batch-2.1.xsd
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">
    <beans:import resource="applicationContext.xml"/>
    <!-- Tokenizer - Converts a delimited string into a set of fields -->
    <beans:bean name="defaultTokenizer"
        class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"/>
    <!-- FieldSetMapper - Populates a bean's attributes using the FieldSet -->
    <beans:bean name="employeeFieldSetMapper" class="com.test.EmployeeFieldSetMapper"/>
    <!-- LineMapper - Uses the tokenizer and mapper to create instances of a bean -->
    <beans:bean name="employeeLineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
        <beans:property name="lineTokenizer" ref="defaultTokenizer"/>
        <beans:property name="fieldSetMapper" ref="employeeFieldSetMapper"/>
    </beans:bean>
    <!-- Reader - used by the chunk-oriented step to read one item at a time from the input -->
    <beans:bean name="empReader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <beans:property name="lineMapper" ref="employeeLineMapper"/>
        <!-- Spring Integration could supply the file; for now the filename is hard-coded -->
        <beans:property name="resource" value="input_data.txt"/>
    </beans:bean>
    <!-- Processor -->
    <beans:bean name="empProcessor" class="com.test.EmployeeProcessor">
    </beans:bean>
    <!-- Writer -->
    <beans:bean id="empWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
        <beans:property name="resource" value="file:target/output_data.txt" />
        <beans:property name="lineAggregator">
            <beans:bean class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
                <beans:property name="delimiter" value=","/>
                <beans:property name="fieldExtractor">
                    <beans:bean class="org.springframework.batch.item.file.transform.BeanWrapperFieldExtractor">
                        <beans:property name="names" value="empId,lastName,title,salary,rank"/>
                    </beans:bean>
                </beans:property>
            </beans:bean>
        </beans:property>
    </beans:bean>
    <job id="helloWorldJob">
        <step id="step1" next="step2">
            <tasklet ref="helloWorldTasklet" />
        </step>
        <step id="step2">
            <tasklet>
                <chunk reader="empReader" processor="empProcessor" writer="empWriter" commit-interval="1"/>
            </tasklet>
        </step>
    </job>
    <beans:bean name="helloWorldTasklet" class="com.test.HelloWorldTasklet"/>
    <!--
    To run the job from the command line type the following:
    mvn clean compile exec:java -Dexec.mainClass=org.springframework.batch.core.launch.support.CommandLineJobRunner -Dexec.args="simpleJob.xml helloWorldJob"
    -->
</beans:beans>
Execute the job
Go to the command line and type the following:
mvn clean compile exec:java -Dexec.mainClass=org.springframework.batch.core.launch.support.CommandLineJobRunner -Dexec.args="simpleJob.xml helloWorldJob"
View the Results
The output file will appear in the target/ folder of the project.
Further Reading
To keep things simple, we read and wrote files located in the project's own folders. There are many enterprise design patterns that describe best practices for feeding data into batch programs. For further reading on this topic, please see the Spring Integration framework homepage.