Posts tagged Hadoop
Software that enables distributed processing for big data by using clusters and simple programming models. For more information, see http://hadoop.apache.org.
Feed: Neo4j Graph Database Platform. Author: Corin Gainey. Editor’s Note: This presentation was given by Eric Wespi and Eric Spiegelburg at GraphConnect New York in September 2018. Presentation Summary: We’re going to talk about some project themes that make sense when you’re developing a graph project. We’ll dive into the business problem and discuss its specifics. We are going to cover manufacturing quality, how graphs apply to that problem and how this helped us get some business value through performance improvements and new capabilities. We’ll also talk about the evolution of our graph model. We will ... Read More
Feed: Big Data Made Simple. Author: Guest. Modern society is rapidly turning data-centric, and digitalization already dominates most aspects of professional life. In such conditions, you can expect an outstanding career as a big data engineer. Leading MNCs in the country are always eager to hire promising professionals who can work as big data engineers. If you are currently on a break and want to kick-start your professional career, big data engineering is the pick for you. As a big data engineer, you will need to deal with massive chunks of data ... Read More
Feed: AWS Big Data Blog. Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud that offers fast query performance using the same SQL-based tools and business intelligence applications that you use today. Many customers also like to use Amazon Redshift as an extract, transform, and load (ETL) engine to use existing SQL developer skillsets, to quickly migrate pre-existing SQL-based ETL scripts, and, because Amazon Redshift is fully ACID-compliant, as an efficient mechanism to merge change data from source data systems. In this post, I show how to use AWS Step Functions and AWS Glue Python Shell to orchestrate tasks ... Read More
Feed: Planet PostgreSQL. While managing a small team of development resources working on PostgreSQL development, I sometimes get resources in my team that have good development experience but are new to PostgreSQL. I have developed a short set of training instructions to get these resources started with PostgreSQL and to familiarise them with Postgres and its internals. The purpose of this blog is to share these instructions so they can benefit others in a similar situation. The instructions involve going through a lot of documentation, white-papers, and online books; they also include a few development exercises that can ... Read More
Feed: Featured Blog Posts - Data Science Central. Author: Vincent Granville. Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link. Featured Resources and Technical Contributions: Free Book: A Comprehensive Guide to Machine Learning (Berkeley University); How exactly do you determine causation?; Significance Level vs Confidence level vs Confidence Interval; Introduction and New Open-sourced Tool for Tableau; Surprising Uses of Synthetic Random Data Sets +; Visualizing AWS Cost and Usage with Amazon Athena and QuickSight; Basics of Hive and Impala for ... Read More
Operationalizing Analytics: How Can Analytical Models Truly Support Business Success?
Feed: SAS Blogs. Author: Javier Alexander Rengifo. By Javier Rengifo, Customer Advisory Manager for SAS Colombia and Ecuador. Success in developing and implementing enterprise analytics initiatives requires clear purposes, alignment with business objectives, adequate data capture and quality, management and continuous improvement of the analytical models developed and, no less important, the operationalization, or deployment to production, of the models to support the decision-making process of the different business areas. Although the word "operacionalización" does not formally exist in the dictionary of the Spanish language, ... Read More
Feed: Featured Blog Posts - Data Science Central. Author: Stephanie Shen. The evolution of the technologies in Big Data in the last 20 years has presented a history of battles with growing data volume. The challenge of big data has not been solved yet, and the effort will certainly continue, with the data volume continuing to grow in the coming years. The original relational database system (RDBMS) and the associated OLTP (Online Transaction Processing) make it so easy to work with data using SQL in all aspects, as long as the data size is small enough to manage. However, when ... Read More
Feed: IBM Big Data & Analytics Hub - All Content. Author: alex-fleischer. Today, machine learning (ML), artificial intelligence (AI) and decision optimization are not just buzzwords found all over the news. They are urgent requirements for many companies that fear disruption, want to perform pragmatic analysis and make better decisions with their data. Data has been called the next natural resource, like oil. But just as with oil, it must be refined to be valuable, and its end value must exceed the cost of refining it. With data, the value of AI is the cost of investing in collecting, organizing ... Read More
Feed: Radar. Author: Mac Slocum. Pete Warden has spent the last few years laser focused on building a useful TinyML platform, a scheme to combine battery-powered, small, low-powered CPUs and controllers with machine learning algorithms. Why the push? Many of the tasks in the billions of microcontrollers and CPUs included in small, battery-powered devices can be improved or made more useful with machine learning. For example, tasks like voice recognition, compression, sensor data, anomaly detection, and imminent failure alerts can be improved or reasonably added to many devices and scenarios. Building TinyML presents an array of difficult, but solvable hurdles, ... Read More
Feed: SAS Blogs. Author: Kumar Thangamuthu. In part 1 of this post, we looked at setting up Spark jobs from Cloud Analytic Services (CAS) to load and save data to and from Hadoop. Now we are moving on to the next step in the analytic cycle: scoring data in Hadoop and executing SAS code as a Spark job. The Spark scoring jobs execute using SAS In-Database Technologies for Hadoop. The integration of the SAS Embedded Process and Hadoop allows scoring code to run directly on Hadoop. As a result, publishing and scoring of both DS2 and DATA step models occur ... Read More