Suitable for applications that require a series of independent computations to be performed on a stream of data.
Consists of:
- Filters:
  - Transform data input to data output
  - No shared state between filters (pure functions)
  - Outputs can begin before all inputs are consumed (incremental processing)
- Pipes:
  - Carry data between filters
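The structure above can be sketched in Python, where each filter is a pure generator function and the iterator chain between them plays the role of the pipes. The function names and data are illustrative, not part of any standard API:

```python
def parse_ints(lines):
    # Filter: parse each incoming line into an integer
    for line in lines:
        yield int(line.strip())

def square(numbers):
    # Filter: pure transformation, no shared state with other filters
    for n in numbers:
        yield n * n

def keep_even(numbers):
    # Filter: pass through only even values
    for n in numbers:
        if n % 2 == 0:
            yield n

# The chained iterators act as the pipes between filters.
# Because these are generators, downstream filters can emit
# output before upstream input is fully consumed.
source = ["1", "2", "3", "4"]
pipeline = keep_even(square(parse_ints(source)))
print(list(pipeline))  # [4, 16]
```

Note that any filter can be swapped for another with the same input/output contract (e.g. replacing `square` with a cube filter) without touching the rest of the pipeline.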
Pros and Cons
Pros:
- Filters can be replaced or improved locally as long as contracts remain the same
- Supports concurrency naturally through parallelization
- Performing throughput and deadlock analysis is possible

Cons:
- Deserialization across pipes can be expensive
- Adding pipe variants adds complexity
- Debugging end-to-end behaviour can be non-trivial
- Not ideal for interactive systems
Variants
Pipelines:
- Requires a linear sequence of filters, where each filter is responsible for its own domain (each component handles one concern)
- See `|` in Unix shells
Batch-Sequential:
- Based on the pipeline variant, requires that each filter processes all input before producing output
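The difference between the incremental pipeline and the batch-sequential variant can be shown with a small Python sketch (the filters here are illustrative): a running-total filter can emit output as each item arrives, while a sorting filter must consume the entire input first.

```python
def running_total(numbers):
    # Incremental pipeline filter: emits an output
    # as soon as each input arrives
    total = 0
    for n in numbers:
        total += n
        yield total

def sort_all(numbers):
    # Batch-sequential filter: must consume every input
    # before producing any output (sorting needs the full dataset)
    return iter(sorted(numbers))

data = [3, 1, 2]
print(list(running_total(data)))  # [3, 4, 6]
print(list(sort_all(data)))       # [1, 2, 3]
```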
An important example is distributed data processing systems, which use pipes and filters to perform a graph of transformations across a large dataset. This includes MapReduce.
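A single-process word-count sketch in Python can illustrate the MapReduce shape: the map and reduce phases are filters, and the shuffle step routes data between them like a pipe. Real MapReduce systems distribute these phases across machines; the function names here are illustrative:

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    # Map filter: emit (word, 1) pairs for each word in the document
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle step (the "pipe"): group values by key so all
    # pairs with the same key reach the same reducer
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce filter: aggregate the values for each key
    return {key: sum(values) for key, values in groups.items()}

docs = ["a b a", "b c"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

Because the map and reduce filters share no state, each phase can run in parallel across partitions of the dataset, which is the property the pros list above attributes to this style.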