Friday, 22 November 2013

Partition Components

A partition is: 1. A file that is a portion of a multifile. 2. A segment of a parallel computation. To partition data is to divide it into segments so that the segments can be processed in parallel. Some components partition data. There are a number of partition components, described below.

Partition by key


Partition by Key reads records from the in port and distributes data records to its output flow partitions according to key values.
Picture

The key must be specified in the key parameter.

A Partition by Key component is generally followed by a Sort component.
See the example below:
Picture

[In the above example, the sort is needed because the Join component is configured so that its input must be sorted.]
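To illustrate the idea behind key partitioning, here is a minimal Python sketch. It is not Ab Initio code: the function name partition_by_key, the field cust_id, and the use of a CRC32 hash are all assumptions made for the illustration, since the component's actual hashing is not described here. The point it shows is that records with the same key value always land in the same partition.

```python
import zlib
from collections import defaultdict


def partition_by_key(records, key, num_partitions):
    """Route each record to a partition chosen from a hash of its key value."""
    partitions = defaultdict(list)
    for record in records:
        # crc32 is a stand-in hash (assumption); any deterministic hash works.
        digest = zlib.crc32(str(record[key]).encode("utf-8"))
        partitions[digest % num_partitions].append(record)
    return partitions


# Hypothetical sample data: two records share cust_id 101.
records = [
    {"cust_id": 101, "amount": 5},
    {"cust_id": 202, "amount": 7},
    {"cust_id": 101, "amount": 9},
]
parts = partition_by_key(records, "cust_id", 4)

# Collect the partition index in which each cust_id 101 record ended up.
homes_of_101 = {
    index
    for index, bucket in parts.items()
    for record in bucket
    if record["cust_id"] == 101
}
```

Because both records for cust_id 101 end up in one partition, a later per-partition sort and join on cust_id sees all matching records together.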

Partition by Round Robin 


Partition by Round-robin distributes blocks of data records evenly to each output flow in round-robin fashion. A partitioning key is not required.
The difference between Partition by Key and Partition by Round-robin is that the former may not distribute data uniformly across all the partitions of a multifile system, whereas the latter does.
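The difference in uniformity can be seen in a small Python sketch. This is only an illustration, not Ab Initio code: the function partition_round_robin, its block_size argument, and the sample keys are hypothetical.

```python
from collections import Counter


def partition_round_robin(records, num_partitions, block_size=1):
    """Deal blocks of records to partitions 0, 1, 2, ... in turn."""
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        block = position // block_size
        partitions[block % num_partitions].append(record)
    return partitions


# Skewed sample data: key "A" appears far more often than "B" or "C".
keys = ["A"] * 8 + ["B"] * 2 + ["C"] * 2

# Round-robin ignores the key, so 12 records over 3 partitions split evenly.
round_robin_sizes = [len(p) for p in partition_round_robin(keys, 3)]

# Any key-based scheme must keep all records of one key together, so the
# largest partition holds at least as many records as the most common key.
largest_key_group = max(Counter(keys).values())
```

Here round_robin_sizes is [4, 4, 4], while key partitioning would force at least 8 of the 12 records into a single partition.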

Partition by Expression

Partition by Expression distributes data records to its output flow partitions according to a specified expression.
Picture

The required expression is specified in the function parameter.
For example, the expression
((next_in_sequence()*number_of_partition() + this_partition())/number_of_partition())/1000
distributes the records in blocks of 1000, in round-robin fashion, across all partitions.
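The arithmetic of the block expression above can be traced with a short Python sketch. It rests on two assumptions that the text does not state: that every division is an integer division, and that the resulting block number is wrapped onto the available output partitions with a modulo. The function and argument names are hypothetical.

```python
def block_round_robin_partition(sequence_number, this_partition,
                                num_partitions, block_size=1000):
    """Return the output partition for one record under the block scheme."""
    # (seq * P + p) / P collapses back to seq when 0 <= p < P and the
    # division truncates (assumption: integer division).
    record_index = (sequence_number * num_partitions + this_partition) // num_partitions
    # Dividing by the block size numbers the blocks 0, 1, 2, ...
    block = record_index // block_size
    # Assumption: block numbers wrap around the available partitions.
    return block % num_partitions


first = block_round_robin_partition(1, 0, 4)         # block 0
second_block = block_round_robin_partition(1000, 3, 4)  # block 1
wrapped = block_round_robin_partition(4000, 0, 4)    # block 4 wraps to 0
```

Under these assumptions, consecutive records stay on the same output partition until the sequence number crosses a multiple of 1000, and then move on to the next partition.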
As another example, the expression
if (record_sub_typ=="cg1") 0
else if (record_sub_typ=="cg2") 1
else 2
sends all the records whose record_sub_typ is "cg1" to flow 0, all the records whose record_sub_typ is "cg2" to flow 1, and the rest of the records to flow 2.
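The same routing rule can be written as a small Python function for illustration. It follows the description above, in which all remaining records go to flow 2; the function name flow_for and the sample values are hypothetical.

```python
def flow_for(record_sub_typ):
    """Return the output flow number for a record's record_sub_typ value."""
    if record_sub_typ == "cg1":
        return 0
    elif record_sub_typ == "cg2":
        return 1
    return 2  # every other value


# Hypothetical sample values, grouped by the flow they are routed to.
sample = ["cg1", "cg2", "cg9", "cg1", ""]
flows = {0: [], 1: [], 2: []}
for value in sample:
    flows[flow_for(value)].append(value)
```

Note that with this kind of expression the distribution across flows depends entirely on how often each value occurs in the data.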


Partition by Range 


Partition by Range distributes data records to its output flow partitions according to the ranges of key values specified for each partition. This component is not frequently used.
It works together with the Find Splitters component. When using the two together:
Use the same key specifier for both components.
Make the number of partitions on the flow connected to the out port of Partition by Range the same as the value (n) in the num_partitions parameter of Find Splitters.

This component:
Reads splitter records from the split port, and assumes that these records are sorted according to the key parameter.
Determines whether the number of flows connected to the out port is equal to n (where n-1 is the number of splitter records). If not, Partition by Range writes an error message and stops the execution of the graph.
Reads data records from the flows connected to the in port in arbitrary order.
Distributes the data records to the flows connected to the out port according to the values of the key field(s), as follows:
a) Assigns records with key values less than or equal to the first splitter record to the first output flow.
b) Assigns records with key values greater than the first splitter record, but less than or equal to the second splitter record to the second output flow, and so on.
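The steps above can be sketched in Python as follows. This is an illustration of the range rule only, not the component itself: the function name, the dictionary records, and the sample splitter values are assumptions. The sorted splitters are searched with bisect_left, which returns the number of splitters strictly less than the key, so a key equal to a splitter stays in the lower flow.

```python
import bisect


def partition_by_range(records, key, splitters, num_flows):
    """Distribute records to flows using sorted splitter values."""
    # n flows require n-1 splitters; otherwise stop with an error.
    if num_flows != len(splitters) + 1:
        raise ValueError("number of output flows must be len(splitters) + 1")
    flows = [[] for _ in range(num_flows)]
    for record in records:
        # Keys <= first splitter go to flow 0, keys in (s1, s2] to flow 1, ...
        flows[bisect.bisect_left(splitters, record[key])].append(record)
    return flows


# Hypothetical data: two splitters define three ranges.
records = [{"k": 5}, {"k": 10}, {"k": 11}, {"k": 20}, {"k": 21}]
flows = partition_by_range(records, "k", [10, 20], 3)
keys_per_flow = [[r["k"] for r in flow] for flow in flows]
```

With splitters 10 and 20, keys_per_flow is [[5, 10], [11, 20], [21]].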

Partition with Load Balance 

Partition with Load Balance distributes data records to its output flow partitions, writing more records to the flow partitions that consume records faster. This component is not frequently used.
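A toy Python simulation can make the idea concrete. It is only a model under stated assumptions, not a description of how the component is implemented: each output flow is given a hypothetical consume rate (records drained per tick), and each record is written to the flow with the smallest backlog at that moment.

```python
def partition_with_load_balance(num_records, consume_rates, records_per_tick=4):
    """Simulate writing records to whichever flow is least backed up."""
    backlog = [0] * len(consume_rates)   # records waiting on each flow
    written = [0] * len(consume_rates)   # total records sent to each flow
    remaining = num_records
    while remaining > 0:
        for _ in range(min(records_per_tick, remaining)):
            target = backlog.index(min(backlog))  # least backed-up flow
            backlog[target] += 1
            written[target] += 1
            remaining -= 1
        # Each consumer drains up to its rate before the next tick.
        backlog = [max(0, b - rate) for b, rate in zip(backlog, consume_rates)]
    return written


# Hypothetical setup: flow 0's consumer is three times faster than flow 1's.
written = partition_with_load_balance(400, consume_rates=[3, 1])
```

In this model the faster flow ends up receiving most of the records, which is the behaviour described above.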
