Posts tagged BigInsights
Troubleshooting Map Reduce Errors
Feed: Hadoop Dev. Author: Nailah Bissoon. The Big SQL LOAD HADOOP command uses the Map Reduce framework. For database administrators who are not familiar with combing through Map Reduce logs, this blog hopes to shed some light on how to troubleshoot such issues. In this blog I will focus on one error I experienced in my environment and the steps I took to debug the issue. Most likely you won’t hit this exact issue, but you can use these steps to figure out the root cause of your problem. Example Error: load hadoop using file url '/user/bigsql/hadoopds1000g/store_sales' with source ... Read More
Big SQL Automatic Catalog Synchronization (Part 3 – Problem Determination)
Feed: Hadoop Dev. Author: Shay Roe. Introduction: This blog is part of a series outlining all you need to know to start working with Big SQL’s Automatic Catalog Synchronization (Auto-Sync). Part 1 of the series provided an introduction to Auto-Sync, discussing its significance, the problem it addresses and how it can be enabled or disabled in Ambari. Part 2 presented a high-level view of the Auto-Sync architecture and provided details on the feature’s main configuration parameters. In this third and final blog of the series, we’ll take a look at problem determination and explain what you can do if you are experiencing issues related to ... Read More
Announcing Big SQL 5.0.1

Feed: Hadoop Dev. Author: JessicaLeeYau. Announcing the immediate availability of Big SQL v5.0.1, a maintenance release. Big SQL, a SQL engine on Hadoop, has been making strides with the fast-evolving open source ecosystem. The core capabilities of Big SQL focus on federation, SQL compatibility, scalability, performance, and of course enterprise security, making it a desirable query engine for seeking insights from disparate data sources including Hadoop. We announced Big SQL v5.0 in July 2017, following our partnership announcement with Hortonworks. We now announce the release of Big SQL v5.0.1, which focuses on consumability (support for Zeppelin notebook, ... Read More
Big SQL Automatic Catalog Synchronization (Part 1 – Introduction)
Feed: Hadoop Dev. Author: Shay Roe. Introduction: Automatic synchronization of the Hive metastore and Big SQL catalog was introduced in Big SQL 4.2 and is a significant enhancement to how Big SQL manages its catalog tables. With this feature enabled, Big SQL automatically synchronizes Hive metastore changes into the Big SQL catalog, so that any Hive DDL operations (CREATE, ALTER, DROP) are automatically reflected in the Big SQL catalog. If a new table is created in Hive, for example, that table will automatically be available in Big SQL. This blog is the first in a three-part series that will ... Read More
Announcing IBM Big Replicate 2.1.1

Feed: Hadoop Dev. Author: Vinayak Agrawal. Today (Aug 11, 2017) we are announcing the GA of IBM Big Replicate 2.1.1 ... Read More
Microsoft Active Directory (AD) Integration for Linux on a Hadoop Cluster
Feed: Hadoop Dev. Author: Linda.Liu. Introduction: A previous post demonstrated how to configure LDAP integration with IBM Open Platform on a BigInsights cluster. Here is the link. This post concentrates on the content missing from the previous post: Microsoft Active Directory (AD) integration. Objective: This technical document is intended to give step-by-step instructions on how to set up AD on the RedHat and/or CentOS operating system and integrate it with a Hadoop cluster. Version Tested: RedHat v7.x, CentOS v7.x, Ambari v2.4.x. Lessons Learned: Don’t include spaces in a bind id (ex. “Service, EnterpriseLdap”) AD ... Read More
Scheduling a SparkSQL or SparkML job written in Java or Scala on YARN with Oozie

Feed: Hadoop Dev. Author: DONGYINGJIAO. Apache Oozie is a workflow scheduler that is used to manage Apache Hadoop jobs. Oozie combines multiple jobs sequentially into one logical unit of work as a directed acyclic graph (DAG) of actions. Oozie is reliable, scalable, extensible, and well integrated with the Hadoop stack, with YARN as its architectural center. It supports several types of Hadoop jobs out of the box, such as Java map-reduce, Pig, Hive, Sqoop, SSH, and DistCp, as well as system-specific jobs, such as Java programs and shell scripts. Apache Spark is a fast, general-purpose cluster computing system. It ... Read More
Starting multiple Titan servers on IOP

Feed: Hadoop Dev. Author: HuiHugoCao. The following directions detail the manual installation of software into IBM Open Platform for Apache Hadoop. These directions, and any binaries that may be provided as part of this article (either hosted by IBM or otherwise), are provided for convenience and make no guarantees as to stability, performance, or functionality of the software being installed. Product support for this software will not be provided (including upgrade support for either IOP or the software described). Questions or issues encountered should be discussed on the BigInsights StackOverflow forum or the appropriate Apache Software Foundation mailing list for ... Read More
Configure Logical Big SQL Workers from Ambari to boost performance
Feed: Hadoop Dev. Author: Nailah Bissoon. Adding logical Big SQL workers should be done in non-peak hours when there are no Big SQL applications running on the system. Big SQL may be restarted one or more times so that the underlying memory can be shuffled correctly within each compute node during logical worker operations. An introduction to this feature is presented in Logical Big SQL Workers Boost Performance. Important: Add and reconfigure logical Big SQL workers when all compute nodes (or Big SQL Workers from Ambari) are active. In the ‘Add Services Wizard’ -> ‘Customize Services’ -> ‘Advanced bigsql-env’ drop-down menu there are 2 ... Read More
When Data Met Science and Anything Became Possible

Feed: Hortonworks Blog – Hortonworks. Author: Frank Mong. Big Data was challenged from the day it was coined…it’s a problem looking for an answer. Likewise, data science presents an amazing opportunity to answer our biggest questions, but its fate has been tied to the breadth, quantity, freshness and speed of the data it relies upon. For years, these competing issues have been left for organizations to solve on their own. No more. Hortonworks executed a major coup in the battle for big data dominance today with IBM selecting the Hortonworks Data Platform (HDP) distribution for their existing and future customers ... Read More