Azure Machine Learning: Data Mining 2.0

Azure Machine Learning (aka AzureML) is one of the new products/services in the bold new ‘cloud first, mobile first’ world that Microsoft is pursuing. It lets you build predictive analytics from your data quickly and simply, and easily integrate them with all your applications. And you can do that armed with nothing but your browser!

But I think I’ve heard about this before… Haven’t I?

Remember when, a couple of years ago, everything was 2.0? Web 2.0 was the paradigm everyone swore by, wrapping ‘social’ and ‘services’ around everything we already knew.

That is how I feel about Azure Machine Learning: it is a great, improved 2.0 version of the old Data Mining concept we’ve known for years (SQL Server implemented it with the SSAS Data Mining feature). Don’t get me wrong, I’m not saying it should be dismissed just because the idea already existed. I think Microsoft took a page from its own book and put a lot of thought into how to bring that concept into 2015. And that is great!

Out with the old…

If you remember, Analysis Services Data Mining has always offered a handful of algorithm types:

  • Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.
  • Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
  • Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
  • Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
  • Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
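To make the association category a little more concrete, here is a minimal Python sketch of the two numbers market basket rules are usually ranked by: support (the fraction of baskets that contain both items) and confidence (how often the consequent shows up in baskets that already contain the antecedent). It only counts item pairs by brute force; it is not the Microsoft Association Rules algorithm that SSAS ships, nor Apriori, and the function name `pair_rules`, the thresholds and the sample baskets are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples for item pairs.

    Hypothetical helper: a brute-force pair count, not a production association algorithm.
    """
    n = len(baskets)
    item_counts = Counter()  # number of baskets containing each item
    pair_counts = Counter()  # number of baskets containing each unordered item pair
    for basket in baskets:
        items = sorted(set(basket))  # dedupe so each basket counts an item at most once
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Every frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Made-up shopping baskets for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
]

for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules such as "butter -> bread" with high confidence are exactly what a market basket analysis would surface as cross-selling candidates.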

To use them, you would create a mining model in SSAS, load data (with help from SSIS) to train it, and then query it through DMX (Data Mining Extensions). Running DMX queries meant connecting to SSAS through proprietary, Windows-only native drivers and sending the queries over to get your results back.

… and in with the new!

The principle behind AzureML is pretty much the same. There are a few notable differences, though:

– You don’t need SSAS: In fact, you don’t need SQL Server at all: no database, no SSIS, no SSAS. This is a pure online service, born in and for the cloud. There has been talk of bringing it on-premises, but honestly I don’t think that is going to happen any time soon (and nobody would bat an eye either).

– Data loading and manipulation inside the tool: As mentioned before, you don’t need SSIS. The experiment designer in AzureML has a workflow view that resembles SSIS, in the sense that you have components to scrub and manipulate data before feeding it into your model. One less thing to worry about.

– No DMX or weird query languages to use: As this is a cloud service, the output of your model is a web service. Anybody with the corresponding API key can call it and make use of your model. This makes your model available and online-ready in almost no time.
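For illustration, here is a minimal Python sketch of what calling such a scoring web service could look like, using only the standard library. The endpoint URL, the API key, the `Bearer` authorization header and the `Inputs`/`ColumnNames`/`Values` payload shape are all assumptions based on the classic request/response samples; verify them against the sample code generated for your own service. The helper names `build_scoring_request` and `score` are hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholders: copy the real values from your web service's dashboard.
SCORING_URL = "https://example.invalid/workspaces/<id>/services/<id>/execute?api-version=2.0"
API_KEY = "<your-api-key>"


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a request/response scoring call.

    The payload layout and auth header are assumptions, not a confirmed API contract.
    """
    payload = {
        "Inputs": {
            "input1": {"ColumnNames": column_names, "Values": rows},
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # the API key travels in this header
        },
        method="POST",
    )


def score(request, timeout=30):
    """Send the request over the network and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# One input row with two (hypothetical) feature columns, sent as strings.
req = build_scoring_request(SCORING_URL, API_KEY, ["age", "income"], [["35", "52000"]])
# result = score(req)  # uncomment once SCORING_URL and API_KEY point at a real service
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending any processing time on an actual call.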

– Integration with R: R is ‘THE’ language for creating models. In the old world, you could still build your own algorithms with the SSAS Data Mining SDK (in C++ or C#), but they had to be compiled into native Windows code, deployed, managed and exposed only through SSAS. Being able to take any available R algorithm and use it as a component opens the platform up to experimentation.

– One-click deployment to Azure: Deploying an old data mining model used to require building some kind of component (or service) to wrap the SSAS DMX call. With AzureML, deploying to the cloud literally takes one click, and you are ready to go. There’s even boilerplate code provided for calling the production-ready web service from C#, Python and R.

– Really low barrier to entry: No infrastructure setup, no licensing costs, no development tools to install. All you need to do is sign up for the AzureML service online and pay the processing cost when you run your model. That’s it!

Summary

AzureML is one of those products (services?) that makes me excited about the future of Business Intelligence. It is so easy to set up, work with and deploy that it’s almost a crime not to use it!

Now, this is still a 1.0 version of a product, and some features are still missing:

– Heavy data encryption: Training models often involves highly sensitive or private data, and everybody needs a trusted, strongly encrypted transport for it. This is where most of the requests are going to come from: people from the enterprise world concerned about their data travelling over public networks.

– Easy model retraining: Models should be retrained frequently. Once you train your model, you need to keep it up to date so it responds to changes in its environment and to potential drops in accuracy. Right now there is no easy way to automate this.

– More algorithms: This is mitigated by the fact that you can extend the tool almost without limit using R, but this is still where most of the growth will come from. Also, Microsoft recently bought Revolution Analytics, so I would expect more algorithms and features to be added.

Your next steps

If you’re interested in using AzureML, register a new account (there’s a one-month, $200 trial) and start using it. Some resources you can use to start learning are:

Books

Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes

By: Roger Barga; Valentine Fontama; Wee Hyong Tok
Publisher: Apress
Pub. Date: November 26, 2014
Print ISBN-13: 978-1-484-20445-0
Pages in Print Edition: 188

Videos

– If you only have 5 minutes or less, watch this: Azure ML Overview, a great 5-minute overview of what AzureML is.

https://www.youtube.com/watch?v=uJhVZ58b8Fs&list=PL8nfc9haGeb4SjrnQWPuJsSitvxN9hSdc

– If you have one hour, watch this: Intro to Azure Machine Learning: The full product tour, with demos, from TechEd 2014.

https://www.youtube.com/watch?v=kZ04LnSjWek

If you have more time, you can work through the rest of the YouTube playlist that the overview video above belongs to.