Posts tagged Hadoop
Software that enables distributed processing for big data by using clusters and simple programming models. For more information, see http://hadoop.apache.org.
6 ‘data’ buzzwords you need to understand

Take one major trend spanning the business and technology worlds, add countless vendors and consultants hoping to cash in, and what do you get? A whole lot of buzzwords with unclear definitions. In the world of big data, the surrounding hype has spawned a brand-new lingo. Need a little clarity? Read on for a glossary of sorts highlighting some of the main data types you should understand.

1. Fast data

The shining star in this constellation of terms is "fast data," which is popping up with increasing frequency. It refers to "data whose utility is going to decline over time," said ...
PolyBase Setup Errors and Possible Solutions
Blog Authors: Murshed Zaman and Sumin Mohanan
Reviewer(s): Barbara Kess

Prologue

PolyBase is a new feature in SQL Server 2016. It was popularized by APS (Microsoft Analytics Platform System) and Azure SQL DW. PolyBase allows access to relational and non-relational data from SQL Server using the familiar T-SQL language. It lets you run queries on external data that resides in Hadoop or Azure Blob storage. Optionally, it can push query operations down to Hadoop. If you are interested in learning more about PolyBase, see the PolyBase Guide on MSDN. PolyBase setup is well documented on MSDN. But since ...
How big data is changing the game for backup and recovery

It's a well-known fact in the IT world: Change one part of the software stack, and there's a good chance you'll have to change another. For a shining example, look no further than big data. First, big data shook up the database arena, ushering in a new class of "scale out" technologies. That's the model exemplified by products like Hadoop, MongoDB, and Cassandra, where data is distributed across multiple commodity servers rather than packed into a single massive one. The beauty there, of course, is the flexibility: To accommodate more petabytes, you just add another inexpensive machine or two rather than ...
Meet Microsoft’s ‘planet scale’ NoSQL database

Given the strength of SQL Server in business, you might be surprised to learn that Microsoft has spent the last five years building a distributed NoSQL database – until you remember that services like Power BI, Bing and the Office Web apps face the same challenges as services like Netflix. They’re problems more and more enterprises have to deal with too: the deluge of data, the demands of mobility and the need for low latency even though you’re relying on cloud services. That’s why Microsoft’s Dharma Shukla, who previously built key technologies like Windows Workflow Foundation (and worked on both ...
Using Apache Spark to predict attack vectors among billions of users and trillions of events

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science: Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episode of the O’Reilly Data Show, I spoke with Fang Yu, co-founder and CTO of DataVisor. We discussed her days as a researcher at Microsoft, the application of data science and distributed computing to security, and hiring and training data scientists and engineers for the security domain. DataVisor is a startup that uses data science and big data to detect fraud and malicious users across many different application domains in the U.S. and China ...
Metadata services can lead to performance and organizational improvements

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science: Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episode of the O’Reilly Data Show, I spoke with one of the most popular speakers at Strata+Hadoop World: Joe Hellerstein, professor of Computer Science at UC Berkeley and co-founder/CSO of Trifacta. We talked about his past and current academic research (which spans HCI, databases, and systems), data wrangling, large-scale distributed systems, and his recent work on metadata services.

Data wrangling and preparation

The most interactive tasks that people do with data are essentially data ...
Building a business that combines human experts and data science

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science. In this episode of the O’Reilly Data Show, I spoke with Eric Colson, chief algorithms officer at Stitch Fix, and former VP of data science and engineering at Netflix. We talked about building and deploying mission-critical, human-in-the-loop systems for consumer Internet companies. Knowing that many companies are grappling with incorporating data science, I also asked Colson to share his experiences building, managing, and nurturing large data science teams at both Netflix and Stitch Fix.

Augmented systems: “Active learning,” “human-in-the-loop,” and “human ...
SQL Polybase to the test
Feed: Henk's tech blog. Author: Henk.

Introduction

The Microsoft Analytics Platform System (APS) comes with a very powerful feature called Polybase. Polybase was introduced more than 2.5 years ago and has been extended ever since to integrate the worlds of structured and unstructured data, both on-premises and in the Microsoft Azure cloud. The concept is simple: within an APS database you create an ‘External’ table that points to data located in a Hadoop HDFS file system or in Windows Azure Blob Storage, enabling hybrid data access. It allows you to seamlessly import, export & access data even with a small ...
Columbia data science course, week 1: what is data science?

I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester, and I’m planning to blog the class. Here’s what happened yesterday at the first meeting.

Syllabus

Rachel started by going through the syllabus. Here were her main points:
- The prerequisites for this class are linear algebra, basic statistics, and some programming.
- The goals of this class are to learn what data scientists do and to learn to do some of those things.
- Rachel will teach for a couple of weeks, then we will have guest lectures. The profiles of those speakers vary considerably, as do their backgrounds. Yet they are all ...
Franz Tech Corner – July 2014

Feed: Allegro CL General / Technical Announcements. Author: cnorvell.

Franz Tech Corner News July, 2014

In this issue:
- Tech Corner Article: New Universal Date/Time Parser Facility
- Tech Corner Article: Loop Over Sequence Extension to Loop Macro
- International Lisp Conference, 2014 - August 15-17, Montreal
- Video - Gabor Melis' talk at ELS'14 - "Sending Beams into the Parallel Cube"
- Free Webcast Series: Graph vs. Semantic Graph Databases - Selecting the Right Database for Your Next Project
- Discovering the Social Networks in your Customer Data
- Gruff v5.3 Now Available
- AllegroGraph 4.14 Available July 14th
- YouTube - The Allegro CL and AllegroGraph Channels
...