Managing the tricky balance between data pooling and data retention with predictive platforms
Feed: Featured Blog Posts - Data Science Central. Author: Jean-Cyril Schütterlé. It’s only when there’s enough representative data from the field they’re applied to, that process automation by Machine Learning technologies can really be harnessed, says Jean-Cyril Schütterlé, VP Product and Data Science at Sidetrade. Likewise, spam detection is most effective when the learning algorithm has been populated with relevant examples. The steering system of a self-driving car is the same story. It won’t function properly until it has learned to recognise other vehicles and road signage via traffic imaging. Similarly, a healthcare diagnostic support tool depends upon medical image matching. And an automatic translation tool ... Read More
Apache Flink: The Next Distributed Data Processing Revolution?

Feed: Featured Blog Posts - Data Science Central. Author: Kevin Jacobs. By Kevin Jacobs, Data Blogger. Disclaimer: The results are valid only when network-attached storage is used in the computing cluster. The amount of data has grown significantly over the past few years. It is not feasible for a single machine to process large amounts of data. Therefore, the need for distributed data processing frameworks is growing. It all started back in 2011 when the first version of Apache Hadoop was released (version 1.0.0). The Hadoop framework is capable of storing a large amount of data on a ... Read More
Why embracing disruptive technology = increased £
Feed: Featured Blog Posts - Data Science Central. Author: Fabiola Pinheiro. Having been in the tech sector for many years and starting in a technical role, I have always had an affinity for the technical side of the industry. I was a programmer in a variety of languages from Cobol, Pascal and C through REXX, Basic and Z80a. I have worked for the past 12 years in the Cloud sector and circled around, presented on and been close to the edge of a wide range of new emerging technologies from Cloud, Big Data, IOT, AI and Drones. I present often at ... Read More
Top IT Training Trends to Watch Out For in 2017

Feed: Featured Blog Posts - Data Science Central. Author: Venkatesan M. At myTectra we have been continuously monitoring Information Technology (IT) course trends year on year. Personally, I’m amazed at the technology we have available to us and how frequently new technology is adopted in the IT industry. There’s always something new on the horizon, and we can’t help but wait and wonder what technological marvels are coming next. The way I see it, there are seven major tech trends we’re in store for in 2017. If you’re eyeing an IT job as a fresher or an experienced ... Read More
Are You Ready For IoT – Internet of Things

Feed: Featured Blog Posts - Data Science Central. Author: Venkatesan M. Come 2020, millions or even billions of smart electronic devices, linked by the Internet, would interact with each other independent of human intervention. This network of interacting electronic devices is called the Internet of Things (IoT). Looking at it from our times (2013), one could expect the IoT to consist of PCs, tablet computers, digital cameras, e-Book readers, mobile phones, robots, private and public computer networks and whatever new smart electronic devices are developed between now and 2020. What would that mean to ... Read More
Data Cleaning and Wrangling With R

Feed: Featured Blog Posts - Data Science Central. Author: Michael Grogan. One of the big issues when it comes to working with data in any context is the issue of data cleaning and merging of datasets, since it is often the case that you will find yourself having to collate data across multiple files, and will need to rely on R to carry out functions that you would normally carry out using commands like VLOOKUP in Excel. The 10 tips I give below for data manipulation in R are not exhaustive - there are a myriad of ways in which ... Read More
Me, Myself and Digital Twins

Feed: Featured Blog Posts - Data Science Central. Author: Bill Schmarzo. It’s hard to get into the world of the Internet of Things (IOT) without eventually talking about Digital Twins. I was first exposed to the concept of Digital Twins when working with GE. Great concept. But are Digital Twins only relevant to physical machines such as wind turbines, jet engines, and locomotives? What can we learn about the concept of digital twins that we can apply more broadly – to other physical entities (like contracts and agreements) and even humans? A Digital Twin is a digital representation of an industrial ... Read More
Ph.D. Interns at Cloudera: Bringing Big Data Back to School
Feed: Cloudera Engineering Blog » Hadoop. Author: Justin Kestelyn. The following is a series of stories from people who in the recent past worked as Engineering Interns at Cloudera. These experiences concretely illustrate how collaboration between commercial companies like Cloudera and academia, such as in the form of these internships, helps promote big data research at universities. (These experiences were previously published in the ACM student journal, XRDS.) Yanpei Chen (Intern 2011) I interned with Cloudera during my last summer of grad school. My dissertation was on “Workload Driven Design and Evaluation of Large-Scale Data-Centric Systems”, and I already had collaborations ... Read More
Understanding MapReduce via Boggle, Part 2: Performance Optimization
Feed: Cloudera Engineering Blog » Hadoop. Author: Jesse Anderson. In Part 1 of this series, you learned about MapReduce’s ability to process graphs via the example of Boggle*. The project’s full source code can be found on my GitHub account. The example comprised a 4×4 matrix of letters, which doesn’t come close to the number of relationships in a large graph. To calculate the number of possible combinations, we turned off the Bloom Filter with “-D bloom=false”. This enters a brute-force mode where all possible combinations in the graph are traversed. In a 4×4 or 16-letter matrix, there are 11,686,456 combinations, and a ... Read More
Webinar: Introduction to Hadoop Developer Training (Jan. 31)
Feed: Cloudera Engineering Blog » Hadoop. Author: Ryan Goldman (@ClouderaU). Are you new to Apache Hadoop and need to start processing data fast and effectively? Have you been playing with CDH and are ready to move on to development supporting a technical or business use case? Are you prepared to unlock the full potential of all your data by building and deploying powerful Hadoop-based applications? If you’re wondering whether Cloudera’s Developer Training for Apache Hadoop is right for you and your team, sign up for this webinar now! In this live session on Thurs., Jan. 31, at 11am PT, you ... Read More