Lambda architecture: No Silver Bullet

I been reading a lot of criticism about the lambda architecture lately, and it reminded me a lot about that famous essay about Software Engineering. And this doesn’t mean the Lambda architecture is not good, but that just because one architectural pattern exists doesn’t mean you have to use it in every single case.

Diagram_of_Lambda_Architecture_(generic)

My brief romance with Lambda

I’ve only worked in a couple of small projects related with Big Data / Modern Data Architecture that involved a Lambda architecture. The one that comes to mind was a proof of concept for IoT using the Azure Platform: Azure IoT hub, Azure Stream Analytic Jobs, SQL Azure Data Warehouse and Azure ML.

Win10IoT-Arch

The goal was to capture telemetry data generated by Raspberry Pi devices (using Windows 10 IoT) and sent to an IoT hub. The data was then read by a Stream Analytics Job that sent it to the SQL DWH (batch layer), a PowerBI dasbhoard and also to an Event Hub for posterior feedback back to the Pi (speed layer). This case was particularly simple because the dashboard only had to show data from the speed layer (so there were no joins done with the speed layer) and also the batch layer only reprocessed the AzureML model on a daily basis. So far so good, lambda was my friend.

But this is only a rare case where both the batch and speed layer go separate ways. Usually they are combined at the end to show data in a dashboard, thus requiring to assemble data from both worlds. Plus, the logic on the batch layer has to be rewritten using speed layer tools.

What are the alternatives?

If you haven’t heard about the ‘Kappa architecture‘, you can take a look here. Basically, they propose using streaming as the common layer for both the speed and the batch layer

Another alternative is using micro-batches. This is probably the one I feel more comfortable with, coming from the world of BI, Data warehouses and batch processing.

Another interesting idea is presented here by folks from Uber in an O’Reilly data post: change Hadoop API to add mini-batches by basically adding just to primitives: Upserts and Incremental Consumption.

As you can see, not everything has to be Lambda-fied. As with all architectural patterns in software engineering, you just have to make the design the right solution for the problem at hand.