June 14, 2017
Cloudera recently introduced Cloudera Altus, a Hadoop-in-the-cloud offering with an interesting processing model:
- Altus manages jobs for you.
- But you actually run them on your own cluster, and so you never have to put your data under Altus’ control.
Thus, you avoid a potential security risk (shipping your data to Cloudera’s service). I’ve tentatively named this strategy light-touch managed services, and am interested in exploring how broadly applicable it might or might not be.
For light-touch to be a good approach, there should be (sufficiently) little downside in performance, reliability and so on from having your service not actually control the data. That assumption is trivially satisfied in the case of Cloudera Altus, because it’s not an ordinary kind of app; rather, its whole function is to improve the job-running part of your stack. Most kinds of apps, however, want to operate on your data directly. For those, it is more challenging to meet acceptable SLAs (Service-Level Agreements) on a light-touch basis.
Let’s back up and consider what “light-touch” for data-interacting apps (i.e., almost all apps) would actually mean. The basics are:
- The user has some kind of environment that manages data and executes programs.
- The light-touch service, running outside this environment, spawns one or more app processes inside it.
- Useful work ensues …
- … with acceptable reliability and performance.
- The environment’s security guarantees ensure that data doesn’t leak out.
Cases where that doesn’t even make sense include but are not limited to:
- Transaction-processing applications that are carefully tuned for efficient database access.
- Applications that need to be carefully installed on or in connection with a particular server, DBMS, app server or whatever.
On the other hand:
- A light-touch service is at least somewhat reasonable in connection with analytics-oriented data-management-plus-processing environments such as Hadoop/Spark clusters.
- There are many workloads over Hadoop clusters that don’t need efficient database access. (Otherwise Hive use would not be so prevalent.)
- Light-touch efforts seem more likely to be helped than hurt by abstraction environments such as the public cloud.
So we can imagine some kind of outside service that spawns analytic jobs to be run on your preferred — perhaps cloudy — Hadoop/Spark cluster. That could be a safe way to get analytics done over data that really, really, really shouldn’t be allowed to leak.
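To make that division of labor concrete, here is a minimal sketch in Python. Every name in it (`JobSpec`, `JobStatus`, `CustomerCluster`, `LightTouchService`) is hypothetical and invented for illustration; none of it reflects the actual Cloudera Altus API. The assumption being illustrated is that the outside service only submits jobs and sees status metadata, while the data and the job results stay inside the customer's environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical description of a job handed over by the outside service."""
    name: str
    run: Callable[[List[dict]], dict]  # code that executes inside the cluster


@dataclass(frozen=True)
class JobStatus:
    """The only thing that crosses back out: metadata, never records."""
    name: str
    succeeded: bool
    rows_processed: int


class CustomerCluster:
    """Stand-in for the user's own environment (e.g. a Hadoop/Spark cluster)."""

    def __init__(self, records: List[dict]) -> None:
        self._records = list(records)        # data stays here
        self.results: Dict[str, dict] = {}   # job output stays here too

    def execute(self, spec: JobSpec) -> JobStatus:
        try:
            self.results[spec.name] = spec.run(self._records)
        except Exception:
            # Report the failure without exposing any data in the message.
            return JobStatus(spec.name, False, 0)
        return JobStatus(spec.name, True, len(self._records))


class LightTouchService:
    """Stand-in for the outside service: it manages jobs, not data."""

    def __init__(self) -> None:
        self.history: List[JobStatus] = []

    def submit(self, cluster: CustomerCluster, spec: JobSpec) -> JobStatus:
        status = cluster.execute(spec)
        self.history.append(status)
        return status


def count_by_region(records: List[dict]) -> dict:
    """Example analytic job: count records per region."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record["region"]] = counts.get(record["region"], 0) + 1
    return counts


cluster = CustomerCluster([
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 40},
])
service = LightTouchService()
status = service.submit(cluster, JobSpec("count_by_region", count_by_region))
print(status)
```

Nothing in Python enforces the boundary here, of course. In a real deployment the guarantee that a spawned job cannot ship data back out would have to come from the environment's own security controls, not from the service's good manners.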
But before we anoint light-touch managed services as the NBT (Next Big Thing/Newest Bright Thought), there’s one more hurdle for it to overcome — why bother at all? What would a light-touch managed service provide that you wouldn’t also get from installing packaged software onto your cluster and running it in the usual way? The simplest answer is “The benefits of SaaS (Software as a Service)”, and so we can rephrase the challenge as “Which benefits of SaaS still apply in the light-touch managed service scenario?”
From the vendor perspective, with special cases such as Cloudera Altus excepted, the answer might start:
- The cost-saving benefits of multi-tenancy mostly don’t apply. Each instance winds up running on a separate cluster, namely the customer’s own. (But that cluster is itself likely to be running on some SaaS/cloud service.)
- The benefits of controlling your execution environment apply at best in part. You may be able to assume the customer’s core cluster is run through some cloud service, but you don’t get to run the operation yourself.
- The benefits of a SaaS-like product release cycle do mainly apply.
  - The benefit of only having to support the current version(s) of the product is somewhat limited when you don’t wholly control your execution environment.
  - Light-touch doesn’t seem to interfere with the traditional SaaS approach of a rapid, incremental product release cycle.
When we flip to the user perspective, however, the idea looks a little better.
Bottom line: Light-touch managed services are well worth thinking about. But they’re not likely to be a big deal soon.