Mirror Maker has been used extensively to replicate data between Kafka clusters. It's time to broaden the horizon and build a more stable data transfer framework that removes Mirror Maker's shortcomings.
At its core, Mirror Maker is a Kafka consumer and producer pair that takes data from one cluster and pushes it to another. It does the job decently but has a basic drawback: it is not scalable. For example, if a server running Mirror Maker dies, it is then your responsibility to get it running on another server.
LinkedIn seems to have gone a step further with Brooklin. A good architecture description is here.
Benefits of using Brooklin as a replacement for Mirror Maker:
- Scalable: Like other Kafka components, it's scalable, so if one Brooklin server goes down, another takes over. As the snippet below shows (taken from server.properties in the config folder of the Brooklin installable), you provide the address of the ZooKeeper instance or cluster that the server should register with. If you plan to start Brooklin on multiple servers to form a cluster, make sure the cluster name is the same on each of them; in this case it is 'brooklin-cluster'.
    brooklin.server.coordinator.cluster=brooklin-cluster
    brooklin.server.coordinator.zkAddress=localhost:2181
    brooklin.server.httpPort=32311
- Pause and resume transmission: You can pause the transfer of data from one cluster to another at any time and resume it later using curl commands, since Brooklin exposes a REST endpoint. In the example above, it is on port 32311.
- Monitoring: You can monitor the data stream using the REST endpoint it provides.
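As a rough sketch of what pause, resume and monitoring calls against that endpoint could look like, the Python snippet below builds the HTTP requests without sending them. The URL shape `/datastream/<name>?action=pause` and the stream name `first-mirroring-stream` are assumptions made for illustration and are not taken from this post, so check them against the REST documentation for your Brooklin version. The port matches the httpPort value from the properties above.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed base URL: server on localhost, httpPort=32311 from server.properties.
BROOKLIN_URL = "http://localhost:32311"


def datastream_action(name: str, action: str, base_url: str = BROOKLIN_URL) -> Request:
    """Build (but do not send) a POST request for a datastream action.

    The path layout /datastream/<name>?action=<action> is an assumption.
    """
    if action not in ("pause", "resume"):
        raise ValueError(f"unsupported action: {action}")
    url = f"{base_url}/datastream/{quote(name, safe='')}?action={action}"
    return Request(url, data=b"", method="POST")


def list_datastreams(base_url: str = BROOKLIN_URL) -> Request:
    """Build a GET request for monitoring; /datastream is an assumed path."""
    return Request(f"{base_url}/datastream", method="GET")


if __name__ == "__main__":
    # 'first-mirroring-stream' is a hypothetical stream name.
    req = datastream_action("first-mirroring-stream", "pause")
    print(req.get_method(), req.full_url)
    # To send it to a running server: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes it easy to inspect the URL first, or to translate it into the equivalent curl command.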
Limitations of Brooklin
- Topics are created with defaults: As with Mirror Maker, if the destination cluster doesn't have the topic being replicated, Brooklin creates the topic for you, but with the default number of partitions and default configs. For example, if the source cluster has a topic with 2 partitions, the destination topic is created with just 1 partition. The same problem shows up with configs: settings such as expiration time are not replicated.
- Cannot provide destination while creating a data stream: At this point you cannot provide a destination while creating a stream. You can provide the source, but the destination is always taken from the server.properties file supplied when the Brooklin server starts.
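For the first limitation in this list (topics created with defaults), one possible guard is to compare partition counts on both clusters before starting a stream. The sketch below is a minimal illustration, not part of Brooklin. The two dictionaries are hypothetical inputs that you would fill in yourself, for example from the output of `kafka-topics.sh --describe` on each cluster.

```python
def partition_mismatches(source: dict, destination: dict) -> dict:
    """Return {topic: (source_count, destination_count)} for differing topics.

    Topics missing on the destination are reported with None as their count.
    """
    mismatches = {}
    for topic, src_count in source.items():
        dst_count = destination.get(topic)
        if dst_count != src_count:
            mismatches[topic] = (src_count, dst_count)
    return mismatches


if __name__ == "__main__":
    # Hypothetical topic -> partition count maps for the two clusters.
    source = {"orders": 2, "payments": 4}
    destination = {"orders": 1, "payments": 4}
    print(partition_mismatches(source, destination))  # {'orders': (2, 1)}
```

Any topic reported here could be created by hand on the destination, with the right partition count and configs, before the stream starts.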
But overall, it's worth moving from Mirror Maker to Brooklin for the benefits it provides.
As for performance, tests didn't show any degradation compared to Mirror Maker.
Modifying consumer properties
There are times when we want to control various aspects of the Kafka consumer/producer that Brooklin uses to transfer data, for example poll time or SSL configuration.
All of this can be done by modifying properties in Brooklin's server.properties file under the config folder.
For example, to change the consumer's poll time, you can set the corresponding property in server.properties.
The general pattern is:
brooklin.server.connector.<connector name mentioned in connectorNames property at top>.<property name> = <value>
You can find many more properties for the Kafka mirroring connector in com.linkedin.datastream.connectors.kafka.KafkaBasedConnectorConfig.java.
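To make the property pattern above concrete, here is a small Python helper that renders one server.properties line from its parts. The connector name `myConnector` and the property name `somePollTimeoutMs` are placeholders invented for this example and are not confirmed Brooklin settings. Use the connector name from your own connectorNames property and a property name from KafkaBasedConnectorConfig.

```python
def connector_property(connector: str, prop: str, value) -> str:
    """Render one server.properties line for a connector-level setting."""
    if not connector or not prop:
        raise ValueError("connector and property names must be non-empty")
    return f"brooklin.server.connector.{connector}.{prop}={value}"


if __name__ == "__main__":
    # Both names are placeholders, not real Brooklin settings.
    print(connector_property("myConnector", "somePollTimeoutMs", 5000))
```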