Category: Big Data
Accelerating Your Data Culture Journey: Customer Best Practices in Data Governance

Feed: Alation. Author: Michelle Cloutier. June 30, 2022. Data Governance – The Pillar No One Wants to Talk About. This blog builds on a previous blog that explored our customers’ best practices in building the data culture pillars of Search & Discovery and Data Literacy. Now, we’ll dive into what many consider the most difficult, but most critical, pillar of data culture: Data Governance. Every organization that takes data seriously must embrace data governance. However, for many organizations, the term data governance has earned a negative reputation that must be overcome. Done right, data governance simply ensures that the ... Read More
Scaling for Complexity – Architecting for Performant Embedded Devices at the Edge – Part 2
Feed: The Internet of Things on AWS – Official Blog. Author: Channa Samynathan. The following is a survey paper, published and presented to the Academic Congress of Embedded World 2022 at Nuremberg, Germany on June 21st 2022. Part 1 – Scaling for Complexity – Architecting for Performant Embedded Devices at the Edge – Part 1. Provisioning Layer. The provisioning layer of your IoT workloads consists of the Public Key Infrastructure (PKI) used to create unique device identities and the application workflow that provides configuration data to the device. The provisioning layer is also involved with ongoing maintenance and eventual decommissioning of devices ... Read More
Scaling for Complexity – Architecting for Performant Embedded Devices at the Edge – Part 1
Feed: The Internet of Things on AWS – Official Blog. Author: Channa Samynathan. The following is a survey paper, published and presented to the Academic Congress of Embedded World 2022 at Nuremberg, Germany on June 21st 2022. Abstract Embedded edge devices with multi-sensor data sources are proliferating at an accelerating rate. Devices must be designed, manufactured, installed, connected, and controlled through seven distinct logical layers to securely connect and interact with complementary cloud-based and edge-based components to deliver business value. These Internet of Things (IoT) applications must gather, process, analyze, and act on data generated by the connected devices. In ... Read More
Is Big Data Useful to Anyone? Has it Got Intrinsic Value?
Feed: Actian. Author: Traci Curran. Embarking on a big data project can be daunting. If you believe that you need insight into your business activity, that you have to collect a large “big data” stash, and that you need to analyze it all to generate worthwhile insight, you will likely feel swamped by the task ahead to get any meaningful value from this project. Big data, especially open data, has huge potential. Many businesses hold the same data – that is, data gathered from open-data sites – and combine it with their own data to find something unique. Being able to ... Read More
Apache Iceberg: An Introduction from Rackspace on Running the New Open Table Format on AWS

Feed: AWS Partner Network (APN) Blog. Author: Chaitanya Varma Mudundi. By Chaitanya Varma Mudundi, Professional Services Big Data Engineer – Rackspace. Data-driven decision making is accelerating and defining the way organizations work. With this transformation, there has been a rapid adoption of data lakes across the industry. To fuel this transformation, data lakes have evolved over the last decade. Apache Hive is a standard for data lakes, but while Apache Hive can solve some of the issues with the processing of data, it falls short at a few other objectives for next-generation data processing. In this post, I will ... Read More
Migrate from Snowflake to Amazon Redshift using AWS Glue Python shell

Feed: AWS Big Data Blog. As the most widely used cloud data warehouse, Amazon Redshift makes it simple and cost-effective to analyze your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to analyze exabytes of data per day and power analytics workloads such as BI, predictive analytics, and real-time streaming analytics without having to manage the data warehouse infrastructure. It natively integrates with other AWS services, facilitating the process of building enterprise-grade analytics applications in a manner that is not only cost-effective, ... Read More
Disaster recovery considerations with Amazon EMR on Amazon EC2 for Spark workloads

Feed: AWS Big Data Blog. Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto. Amazon EMR launches all nodes for a given cluster in the same Amazon Elastic Compute Cloud (Amazon EC2) Availability Zone to improve performance. During an Availability Zone failure or due to any unexpected interruption, Amazon EMR may not be accessible, and we need a disaster recovery (DR) strategy to mitigate this problem. Part of architecting a resilient, highly available Amazon ... Read More
Fraud Detection with Cloudera Stream Processing Part 1

Feed: Cloudera Blog. Author: André Araújo. Posted in Technical | June 28, 2022 | 9 min read. In a previous blog of this series, Turning Streams Into Data Products, we talked about the increased need for reducing the latency between data generation/ingestion and producing analytical results and insights from this data. We discussed how Cloudera Stream Processing (CSP) with Apache Kafka and Apache Flink could be used to process this data in real time and at scale. In this blog we will show a real example of how that is done, looking at how we can use CSP to perform real-time ... Read More
How to Build a Successful Metadata Management Framework

Feed: Alation. Author: Anthony Zumpano. June 28, 2022 — Collecting and using data to make informed decisions is the new foundation for businesses. The key term here is usable: Anyone can be data rich, and collect vast troves of data. The real challenge lies in getting people to access, manage, and search for it appropriately. This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable ... Read More
Microsoft Cost Management updates – June 2022

Feed: Microsoft Azure Blog. Author: Michael Flanakin. Whether you're a new student, a thriving startup, or the largest enterprise, you have financial constraints, and you need to know what you're spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Microsoft Cost Management comes in. We're always looking for ways to learn more about your challenges and how Microsoft Cost Management can help you better understand where you're accruing costs in the cloud, identify and prevent bad spending patterns, and optimize costs to empower you to do ... Read More