Category: Cloudera
Use Your Favorite Editor in Cloudera Data Science Workbench 1.6

Feed: Cloudera Engineering Blog. Author: Bethann Noble. Since the launch of Cloudera Data Science Workbench (CDSW) in 2017, we’ve focused on accelerating enterprise data science from research to production. We’re helping hundreds of customers like IQVIA and Deutsche Telekom build their own AI factories by giving large data science teams secure, self-service access to the business data, compute resources, and open-source tools and libraries they need to innovate and impact business faster. Improving data science team productivity with a joyful user experience remains a key focus in our mission to empower customers to industrialize machine learning and AI, and ... Read More
Apache Phoenix for CDH

Feed: CDH – Cloudera Engineering Blog. Author: Krishna Maheshwari. Apache Phoenix for CDH: Best New Feature for DBMS. Cloudera is adopting and will support Apache Phoenix for CDH while integrating it into its Cloudera Data Platform going forward. Cloudera’s CDH releases have included Apache HBase, which provides a resilient NoSQL DBMS for customers’ operational applications that want to leverage the power of big data. These applications have grown into mission-important and mission-critical applications that drive top-line revenue and bottom-line profitability. These applications include customer-facing applications, ecommerce platforms, risk & fraud detection used behind the ... Read More
How-to: Automate the Systems Security Services Daemon Installation and Troubleshoot it with Ansible – Part 2

Feed: Cloudera Engineering Blog. Author: Gabor Roczei. Background: We summarized the technical details of the Systems Security Services Daemon’s configuration and installation in the previous blog post: Best Practices Guide for Systems Security Services Daemon Configuration and Installation (Part 1). Manual installation, configuration, and troubleshooting can be exceptionally time-consuming and run the risk of inconsistencies because the work needs to be replicated individually on each host. This leaves us with one final question: How can these tasks be automated on all hosts? One possible solution is Ansible [1]. It is a widely used and accepted automation tool, which is part of the ... Read More
YuniKorn: a universal resource scheduler

Feed: Hadoop – Cloudera Engineering Blog. Author: Wangda Tan. Hello world, it’s been a while! We are super excited today to announce the open-sourcing of one of the exciting new projects we’ve been working on behind the scenes at the intersection of big-data and computation platforms – YuniKorn! YuniKorn is a new standalone universal resource scheduler responsible for allocating and managing resources for big-data workloads, including batch jobs and long-running services. Let’s dive right in! Introduction: YuniKorn is a lightweight, universal resource scheduler for container orchestrator systems. It was created to achieve efficient, fine-grained resource sharing for various workloads in large-scale, multi-tenant environments ... Read More
Diagnostic Data Processing on Cloudera Altus

Feed: Cloud – Cloudera Engineering Blog. Author: Shelby Khan. Fig 1 – Architecture. Introduction: Many of Cloudera’s customers set up Cloudera Manager to collect their clusters’ diagnostic data on a regular schedule and automatically send that data to Cloudera. Cloudera analyzes this data, identifies potential problems in the customer’s environment, and alerts customers. This requires fewer back-and-forths with our customers when they file a support case and provides Cloudera with critical information to improve future versions of all of Cloudera’s software. If Cloudera discovers a serious issue, Cloudera searches this diagnostic data and proactively notifies Cloudera customers who might encounter problems ... Read More
Best Practices Guide for Systems Security Services Daemon Configuration and Installation – Part 1

Feed: Cloudera Engineering Blog. Author: Gabor Roczei. Background: Authentication is a basic security requirement for any computing environment. In simple terms, users and services must prove their identity (authenticate) to the system before they can use system features. Kerberos provides strong authentication, which is used in the exchange between the requesting user or process and the service. When a user authenticates to a particular Hadoop component, the user’s Kerberos principal is presented in the form user@REALM. The Kerberos principal is mapped [1] to a short name after authentication. For example: user@EXAMPLE.COM --> user ... Read More
Cloudera Fast Forward Labs Quarterly Updates – July 2019
Feed: Cloudera Engineering Blog. Author: Bethann Noble. Cloudera Fast Forward Labs is an applied machine learning research and consulting services group within Cloudera that helps enterprises accelerate data value creation through the adoption of emerging ML techniques, cutting-edge technical architectures, and industry-leading ML best practices. We focus on expert knowledge transfer and skills development, empowering organizations to continually evolve, differentiate themselves, and ultimately own the future of their business by leveraging open technologies and data. Enabling ethical and responsible ML outcomes for our customers, at scale, is our highest priority. We like to think of ourselves as your data ... Read More
Cloudera at ACM SIGMOD/PODS 2019

Feed: Cloudera Engineering Blog. Author: Jesus Camacho Rodriguez. The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences. This year ACM SIGMOD/PODS will be held in Amsterdam, The Netherlands, on June 30th – July 5th, 2019, and Cloudera will be present at the conference, contributing to and learning from the broader research community. Last year, Apache Hive was recognized with the SIGMOD Software Systems Award “for developing seminal software systems that served to bring relational-style declarative programming to the Hadoop ... Read More
Putting Machine Learning Models into Production

Feed: Cloudera Engineering Blog. Author: Jeff Fletcher. Once the data science is done (and you know where your data comes from, what it looks like, and what it can predict) comes the next big step: you now have to put your model into production and make it useful for the rest of the business. This is the start of the model operations life cycle. The key focus areas (detailed in the diagram below) are usually managed by machine learning engineers after the data scientists have done their work. ML Engineering includes (but isn’t necessarily limited to): the data pipeline (the ... Read More
HDFS Erasure Coding in Production

Feed: CDH – Cloudera Engineering Blog. Author: Shelby Khan. HDFS erasure coding (EC), a major feature delivered in Apache Hadoop 3.0, is also available in CDH 6.1 for use in certain applications like Spark, Hive, and MapReduce. The development of EC has been a long collaborative effort across the wider Hadoop community. Including EC with CDH 6.1 helps customers adopt this new feature by adding Cloudera’s first-class enterprise support. While previous versions of HDFS achieved fault tolerance by replicating multiple copies of data (similar to RAID1 on traditional storage arrays), EC in HDFS significantly reduces storage overhead while achieving similar ... Read More