
Spring Batch

Vishnu Garg edited this page Aug 22, 2018 · 1 revision


Spring Batch is an open source framework for batch processing, that is, the execution of a series of jobs. It provides classes and APIs for reading and writing resources, transaction management, job processing statistics, job restart, and partitioning techniques for processing high volumes of data.

Architecture

Architecture and Flow

Primary Components

1. Job : A single execution unit that groups a series of processes for a batch application in Spring Batch.

2. Step : A unit of processing that makes up a Job. One Job can contain 1 to N Steps. Dividing a Job into multiple Steps makes it possible to reuse processes, run them in parallel, and branch conditionally. A Step is implemented with either the chunk model or the tasklet model (described later).

3. JobLauncher : An interface for running a Job. A user can call JobLauncher directly, but a batch process can also be started simply by running CommandLineJobRunner from the java command. CommandLineJobRunner takes care of the various steps needed to start JobLauncher.

4. JobRepository : A mechanism for managing the state of Jobs and Steps. The management information is persisted in a database using the table schema specified by Spring Batch.

5. Tasklet : In the tasklet model, a single Tasklet interface implementation takes the place of ItemReader/ItemProcessor/ItemWriter, which are used in the chunk model. ItemReader and ItemWriter handle data input and output, which usually means converting database records and files to Java objects and vice versa, so Spring Batch provides standard implementations. For typical batch applications that read from and write to files and databases, these standard implementations are often sufficient as they are. ItemProcessor, which handles the data processing itself, is where input checks and business logic are implemented.
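To make the tasklet idea concrete, here is a minimal sketch in plain Java. It assumes a step that keeps calling one unit of work until that unit reports it is finished. `SimpleTasklet`, `Status`, and `runTaskletStep` are hypothetical stand-ins written for this page; they are not the Spring Batch interfaces.

```java
// Simplified stand-ins for illustration only; NOT the Spring Batch API.
class TaskletSketch {

    // Hypothetical result of one call: keep going, or done.
    enum Status { CONTINUABLE, FINISHED }

    // Hypothetical single-method unit of work that replaces reader/processor/writer.
    interface SimpleTasklet {
        Status execute();
    }

    // Calls the tasklet repeatedly until it reports FINISHED; returns the number of calls.
    static int runTaskletStep(SimpleTasklet tasklet) {
        int calls = 0;
        Status status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    // Demo: a tasklet that has three units of work to do (for example, three files to delete).
    static int demo() {
        int[] remaining = {3};
        return runTaskletStep(() -> {
            remaining[0]--;
            return remaining[0] > 0 ? Status.CONTINUABLE : Status.FINISHED;
        });
    }
}
```

A tasklet suits work that does not fit a read/process/write shape, such as cleaning up a directory or calling a stored procedure.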

Process Flow

  1. JobLauncher is initiated from the job scheduler.
  2. Job is executed from JobLauncher.
  3. Step is executed from Job.
  4. Step fetches input data by using ItemReader.
  5. Step processes input data by using ItemProcessor.
  6. Step outputs processed data by using ItemWriter.
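Steps 4 to 6 above can be sketched in plain Java as a read/process/write loop that writes in chunks. This is only an illustration under stated assumptions: `SimpleReader`, `SimpleWriter`, `runStep`, and the chunk size of 2 are hypothetical and are not the Spring Batch interfaces, which also handle transactions, restart, and error handling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified stand-ins for illustration only; NOT the Spring Batch API.
class ChunkFlowSketch {

    // Hypothetical reader: returns null when the input is exhausted.
    interface SimpleReader<T> {
        T read();
    }

    // Hypothetical writer: receives one chunk of processed items at a time.
    interface SimpleWriter<T> {
        void write(List<T> items);
    }

    // Reads items one by one, processes each, and writes them in chunks of chunkSize.
    // Returns the total number of items written.
    static <I, O> int runStep(SimpleReader<I> reader,
                              Function<I, O> processor,
                              SimpleWriter<O> writer,
                              int chunkSize) {
        int written = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.apply(item));
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk)); // hand over a copy of the full chunk
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(new ArrayList<>(chunk)); // flush the final partial chunk
            written += chunk.size();
        }
        return written;
    }

    // Demo: upper-case five strings in chunks of two and collect the written chunks.
    static List<List<String>> demo() {
        Iterator<String> source = List.of("a", "b", "c", "d", "e").iterator();
        List<List<String>> output = new ArrayList<>();
        ChunkFlowSketch.<String, String>runStep(
                () -> source.hasNext() ? source.next() : null,
                String::toUpperCase,
                output::add,
                2);
        return output;
    }
}
```

With five input items and a chunk size of two, the writer in `demo()` is called three times: twice with a full chunk and once with the remaining item.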

A flow for persisting job information

  1. JobLauncher registers the JobInstance in the database through JobRepository.
  2. JobLauncher registers that the Job execution has started in the database through JobRepository.
  3. Step updates miscellaneous information, such as I/O record counts and status, in the database through JobRepository.
  4. JobLauncher registers that the Job execution has completed in the database through JobRepository.
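The order of these writes can be pictured with a small plain-Java sketch. It assumes an in-memory list in place of the database tables; `InMemoryRepository`, `launch`, and the record strings are hypothetical and do not reflect the actual Spring Batch schema or API.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the Spring Batch API.
class PersistenceFlowSketch {

    // Hypothetical repository: a real one would write rows to database tables.
    static final class InMemoryRepository {
        final List<String> records = new ArrayList<>();

        void save(String record) {
            records.add(record);
        }
    }

    // Records the four persistence events in the order described above.
    static List<String> launch(String jobName, int itemsRead, int itemsWritten) {
        InMemoryRepository repository = new InMemoryRepository();
        repository.save("JobInstance registered: " + jobName);   // 1. register the instance
        repository.save("JobExecution STARTED");                  // 2. register the start
        repository.save("StepExecution read=" + itemsRead         // 3. step updates counts and status
                + " written=" + itemsWritten + " status=COMPLETED");
        repository.save("JobExecution COMPLETED");                // 4. register the completion
        return repository.records;
    }
}
```

For example, `PersistenceFlowSketch.launch("importJob", 5, 5)` returns four records, starting with the JobInstance registration and ending with the completed JobExecution.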

JobInstance: A JobInstance represents a "logical" execution of a Job and is identified by the Job name and its arguments. In other words, executions with the same Job name and arguments are treated as the same JobInstance, and the Job runs as a continuation of the previous run. If the Job supports re-execution and the previous run was interrupted by an error, the Job resumes from the point where it stopped. If the Job does not support re-execution, or the JobInstance has already completed successfully, an exception is thrown and the Java process terminates abnormally. For example, JobInstanceAlreadyCompleteException is thrown when the JobInstance has already completed successfully.
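The identity rule can be sketched in plain Java as a lookup keyed by Job name plus arguments. This is a rough model under stated assumptions: `InstanceKey`, `Status`, and `launch` are hypothetical, and `IllegalStateException` stands in for the framework's own exception such as JobInstanceAlreadyCompleteException.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Simplified stand-ins for illustration only; NOT the Spring Batch API.
class JobInstanceSketch {

    // Hypothetical key: an instance is identified by job name plus arguments.
    static final class InstanceKey {
        final String jobName;
        final Map<String, String> parameters;

        InstanceKey(String jobName, Map<String, String> parameters) {
            this.jobName = jobName;
            this.parameters = Map.copyOf(parameters);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof InstanceKey)) {
                return false;
            }
            InstanceKey other = (InstanceKey) o;
            return jobName.equals(other.jobName) && parameters.equals(other.parameters);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jobName, parameters);
        }
    }

    enum Status { FAILED, COMPLETED }

    // Outcome of the most recent execution of each instance.
    private final Map<InstanceKey, Status> lastStatus = new HashMap<>();

    // Returns "NEW" for a first run and "RESTART" for a re-run of a failed instance.
    // Throws if the instance has already completed successfully.
    String launch(String jobName, Map<String, String> parameters, Status outcome) {
        InstanceKey key = new InstanceKey(jobName, parameters);
        Status previous = lastStatus.get(key);
        if (previous == Status.COMPLETED) {
            throw new IllegalStateException("instance already complete: " + jobName);
        }
        lastStatus.put(key, outcome);
        return previous == null ? "NEW" : "RESTART";
    }
}
```

Launching the same job name with different arguments, for example a different date, produces a different key and is therefore treated as a new instance.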

JobExecution and ExecutionContext: A JobExecution represents a "physical" execution of a Job. Unlike JobInstance, each re-execution of the same Job is a separate JobExecution, so JobInstance and JobExecution have a one-to-many relationship. ExecutionContext is an area for sharing metadata, such as the progress of processing, within a single JobExecution. It is used primarily by Spring Batch to record framework state, but applications can access it as well. Objects stored in the JobExecution's ExecutionContext must be instances of a class that implements java.io.Serializable.

StepExecution and ExecutionContext: A StepExecution represents a "physical" execution of a Step. JobExecution and StepExecution have a one-to-many relationship. As with JobExecution, the ExecutionContext of a StepExecution is an area for sharing data within the Step. To keep data local, information that does not need to be shared across multiple Steps should go in the target Step's ExecutionContext rather than the Job's. Objects stored in the StepExecution's ExecutionContext must be instances of a class that implements java.io.Serializable.
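One way to see why the stored objects must implement java.io.Serializable is to persist a context and restore it, as a restart in a new Java process would need to do. The sketch below assumes plain Java serialization to a byte array; `SimpleContext`, `save`, `load`, and the key `read.count` are hypothetical, and the framework's actual storage format may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Simplified stand-ins for illustration only; NOT the Spring Batch API.
class ContextSketch {

    // Hypothetical context: a map that only accepts serializable values.
    static final class SimpleContext implements Serializable {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, Serializable> values = new HashMap<>();

        void put(String key, Serializable value) {
            values.put(key, value);
        }

        Object get(String key) {
            return values.get(key);
        }
    }

    // Turns the context into bytes, as a repository might before storing it.
    static byte[] save(SimpleContext context) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(context);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the context from bytes, as a restarted process would.
    static SimpleContext load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SimpleContext) in.readObject();
        }
    }

    // Demo: record progress, persist it, restore it, and read the progress back.
    static Object demo() {
        try {
            SimpleContext context = new SimpleContext();
            context.put("read.count", 1500L);
            return load(save(context)).get("read.count");
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("context round trip failed", e);
        }
    }
}
```

Because `put` only accepts `Serializable` values, an attempt to store a non-serializable object fails at compile time in this sketch instead of at save time.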

JobRepository: JobRepository manages and persists the data that tracks the execution results and status of a batch application, such as JobExecution and StepExecution. A typical batch application starts by launching a Java process, and that process terminates when the batch processing ends. Because this data may need to be referenced across Java processes, it can be stored not only in volatile memory but also in a persistent layer such as a database. Storing it in a database requires database objects such as tables and sequences for JobExecution and StepExecution, and these must be created from the schema information provided by Spring Batch.

Reference

Example links: http://www.mkyong.com/tutorials/spring-batch-tutorial/
