Pentaho data integration scaling

3/14/2024

Scaling up means making the most of a single server with multiple CPU cores. Scaling out means using the resources of multiple machines and having them operate in parallel. Both approaches are part of ETL subsystem #31, the Parallelizing/Pipelining System. The first part of this chapter deals with the parallelism inside a transformation and the various ways to use it to scale up.
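To make the idea of parallelism inside a single process concrete, here is a minimal sketch of a pipeline in which a source feeds rows through a bounded buffer to several copies of the same worker. This is not Kettle's API; `run_pipeline`, `transform`, and `copies` are hypothetical names chosen for illustration. Python threads share one interpreter lock, so the sketch shows the structure of the approach rather than a real multi-core speedup.

```python
import queue
import threading

_END = object()  # sentinel: no more rows will arrive


def run_pipeline(rows, transform, copies=2):
    """Push rows through `copies` parallel worker threads and return the results."""
    inbox = queue.Queue(maxsize=100)  # bounded buffer between source and workers
    outbox = queue.Queue()            # unbounded buffer between workers and collector

    def source():
        for row in rows:
            inbox.put(row)
        for _ in range(copies):       # one end marker per worker copy
            inbox.put(_END)

    def worker():
        while True:
            row = inbox.get()
            if row is _END:
                outbox.put(_END)      # tell the collector this copy is done
                return
            outbox.put(transform(row))

    threads = [threading.Thread(target=source)]
    threads += [threading.Thread(target=worker) for _ in range(copies)]
    for t in threads:
        t.start()

    # Collect in the calling thread until every worker copy has finished.
    results, finished = [], 0
    while finished < copies:
        item = outbox.get()
        if item is _END:
            finished += 1
        else:
            results.append(item)
    for t in threads:
        t.join()
    return results
```

With more than one copy, the order of the output rows is not guaranteed, so a caller that needs ordered output has to sort or otherwise reorder the results.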

In this chapter, we unravel the secrets behind making your transformations and jobs scale up and out. Whether you have a single personal computer or hundreds of large servers at your disposal, you want Kettle to use all available resources to get results in an acceptable timeframe.

When you have a lot of data to process, it's important to be able to use all the computing resources available to you.

Chapter 16. Parallelization, Clustering, and Partitioning
