Paul Bakker bio photo

Paul Bakker

Software architect, author of Modular Cloud Apps with OSGi and Java 9 Modularity

Java 9 Modularity Email Twitter Google+ LinkedIn Github Stackoverflow

I spent this week at a conference way out of my comfort zone: Microsoft Ignite. Because we are recently working more and more with Azure, we wanted to get more familiar with it’s ecosystem at this conference. To jump to the conclusion immediatly: Microsoft really seems to get it, they are doing all the right things when it comes to Azure. I’m impressed.

The “Microsoft loves Linux” message was repeated continiously. This doesn’t necessarily mean anything without any backing, but I have not had the feeling that this was just a marketing message even once. All the news around Azure is about embracing open source tools and giving developers choice. Looking back a few years this is a huge change. Java has been “available” on Azure for a long time, but this was pretty much useless. It all felt very much like trying to onboard Java developers on the platform just to convince them to start using .NET. Times have changed…

Highlights

We’re already using most of the basic IaaS services, running (CoreOS) clusters, managing resource groups and the new portal all work well. At the conference I was mostly looking to learn more about some of the more advanced new services and ways to automate more of the infrastructure.

One of the really new services is Azure Machine Learning. This service makes it relatively easy to define, train and validate models. Relatively, because you obviously still need at least some basic understanding of the algoritms. This is normally the world of tools like R, but although R is a great tool, it’s difficult to integrate such a model in an application. Azure ML makes it possible to export a trained model (which includes R support) as a web service, which makes it trivial to use.

Another interesting service is HD Insight, basically hosted Storm and Hadoop clusters. This makes it much more accessible to use these kind of tools. It’s also interesting because there is nothing Azure specific about working with these clusters; it’s Storm and Hadoop as you know it, just without the trouble of setting up the infrastructure. This is exactly what I would hope for in cloud services.

A service of a different type is the new DocumentDB. DocumentDB is a document store that “speaks” JSON, while queries are written in SQL like syntax. DocumentDB is probably best compared to MongoDB, although there are differences. Although a very interesting service, I would feel a bit more troubled to build on top of a service like this. In contrast to Storm and Hadoop, DocumentDB only runs in the Azure cloud, and building on top of it will lock you in to the service. This is not too different from using other Document stores, because there is no standard to build on top of, but I’m not exactly excited to only be able to run in the cloud. At the same time, it does save a lot of infrastructure complexity when you don’t have to manage clusters of db servers. It’s a difficult tradoff to make.

There were also some very interesting talks about Azure Resource Manager, a way to declartively define a cluster including machines, networks, storage etc. When completely automating cluster configuration this takes care of the step of getting all the resources ready. Mechanisms such as CloudInit and systemd can than handle setting up the different components once the machines and other resources are available.

Finally I really enjoyed a deep dive session about Azure Storage. Azure Storage is not so much a service on itself, but is about the underlying infrastructure of the Blobstore, Table storage and attachable storage. The talk gave a lot of insights in partitioning mechanics, which are very helpful when optimizing performance. As always, performance in the cloud is not something magical. You application and data needs to be designed in a way to maximize performance.

Docker

I was a bit disappointed by (lack of) coverage about Docker related topics. There were some talks, which were good, but still quite basic. My team is working on making Docker based clustered infrastructure available to our customers, including logging and monitoring features. This really does seem to hit a sweet spot which is currently still missing in most offerings.