Thomas Dinsmore’s ML/DL blog recently concluded a look back on significant advancements in data science, machine learning and deep learning — many of which involved R and/or Microsoft. Here are those highlights (reproduced with permission):
The R Project
R and Python maintained their leadership as primary tools for open data science. The Python versus R debate continued amid an emerging consensus that data scientists should consider learning both. R has a stronger library of statistics and machine learning techniques and is more agile when working with small data. Python is better suited to developing applications, and the Python open source license is less restrictive for commercial application development.
- The R user community continued to expand in 2016. It ranked second only to SQL in the 2016 O’Reilly Data Science Salary Survey; first in the KDNuggets poll; and first in the Rexer survey. R ranked fifth in the IEEE Spectrum ranking.
- R functionality grew at a rapid pace. In April, Microsoft’s Andrie de Vries reported that there were more than 8,000 packages in CRAN, R’s primary repository for contributed packages. As of mid-December, there are 9,737 packages. Machine learning packages in CRAN continued to grow in number and functionality.
- The R Consortium, a Collaborative Project of the Linux Foundation, made some progress in 2016. IBM and ESRI joined the Consortium, whose membership now also includes Alteryx, Avant, DataCamp, Google, Ketchum Trading, Mango Solutions, Microsoft, Oracle, RStudio, and TIBCO. There are now three working groups and eight funded projects.
- Hadley Wickham had a good year. One of the top contributors to the R project, Wickham co-wrote R for Data Science and released tidyverse 1.0.0 in September. In The tidy tools manifesto, Wickham explained the four basic principles of a tidy API.
- Max Kuhn, the author of Applied Predictive Modeling and developer of the caret package for machine learning, joined RStudio in November. RStudio previously hired Joseph Rickert away from Microsoft.
- AT&T Labs is doing some impressive work with R, including the development of a distributed back-end for out-of-core processing with Hadoop and other data platforms. At the UseR! Conference, Simon Urbanek presented a summary.
- It is impossible to enumerate all of the interesting analysis performed in R this year. David Robinson’s analysis of Donald Trump’s tweets resonated; using tidyverse, tidytext, and twitteR, Robinson was able to distinguish between the candidate’s “voice” and that of his staffers on the same account.
- On the Revolutions blog, Microsoft’s David Smith surveyed the growing role of women in the R community.
Microsoft had quite a year in machine learning and deep learning. As I noted in Parts One and Two, in 2016 MSFT launched cognitive APIs in Azure for vision, speech, language, knowledge, and search; a managed service for Spark in Azure HDInsight; enhancements to Azure Machine Learning; and Version 2.0 of its deep learning framework, rebranded as Microsoft Cognitive Toolkit.
That’s just for starters.
- In January, Microsoft announced Microsoft R Server, a rebranding of the product it acquired with Revolution Analytics in 2015. Microsoft R Server includes an enhanced R distribution, a scalable back-end, and integration tools. During the year, Microsoft shipped two major releases of R Server. In Release 8, the company added push-down integration with Spark. Release 9 updated the Spark integration for Spark 2.0, and added MicrosoftML, a new R package for machine learning.
- Microsoft announced SQL Server 2016 in March with embedded SQL Server R Services. On the Revolutions blog, David Smith reports on the launch. Tomaž Kaštrun explains what you can do with R services in SQL Server.
- In November, after an extended preview, Microsoft announced the general availability of R Server for Azure HDInsight, a scale-out implementation of R integrated with Spark clusters created from HDInsight.
- Also in Azure, Microsoft added a Linux version of the Data Science Virtual Machine (DSVM). Previously available as a Windows instance, DSVM includes Revolution R Open, Anaconda, Visual Studio Community Edition, PowerBI Desktop, SQL Server Express and the Azure SDK.
- PowerBI, Microsoft’s powerful visualization tool, added R support in August. In ComputerWorld, Sharon Machlis, an R user, enthused. More here, on the Revolutions blog.
- R Tools for Visual Studio launched to public preview in March, and to general availability in September. Also in September, Microsoft released the Microsoft R Client, a free data science tool that works with Microsoft R Open and the ScaleR distributed back end.
- Microsoft data scientists Gopi Krishna Kumar, Hang Zhang and Jacob Spoelstra developed a methodology for data science, which they presented at the Microsoft Machine Learning and Data Science Summit 2016 in September. David Smith reports. The method, which the authors call Team Data Science Process, includes a standard directory structure for managing project artifacts using a system such as Git. It also includes open source utilities to support the process.
- A Microsoft team developed a system that recognizes conversational speech as well as humans do. The team used convolutional and long short-term memory (LSTM) neural networks built with Microsoft Cognitive Toolkit (CNTK).
- Microsoft rebranded its deep learning framework, released in 2015 as CNTK, as Microsoft Cognitive Toolkit (MCT) and released Version 2.0, with a new Python API and many other enhancements. In VentureBeat, Jordan Novet reports.
- The company also launched 22 cognitive APIs in Azure for vision, speech, language, knowledge, and search. Separately, MSFT released its managed service for Spark in Azure HDInsight and continued to enhance Azure Machine Learning.
- MSFT announced the Azure N-Series compute instances powered by NVIDIA GPUs for general availability in December.
Other than that, it was a quiet year in Redmond. [ha! — ed.]
Catch up on the other roundups from the ML/DL blog here:
- Part 1: General trends in machine learning and deep learning
- Part 2: Open source machine learning and deep learning projects, including R, Python, Spark, H2O and TensorFlow
- Part 3: Machine learning and deep learning initiatives from major vendors, including SAS, IBM, Microsoft, Oracle, SAP and Teradata
- Part 4: Startups involved in machine learning and deep learning, including Alpine, Continuum, Databricks, DataRobot, KNIME, and RapidMiner