Posts tagged partitions
Processing satellite imagery with serverless architecture

Feed: AWS Compute Blog. Author: James Beswick. This post was written by Justin Downes, Machine Learning Consultant. The amount of satellite imagery publicly available is growing and images from satellites tend to be large. Architectures for processing those images for machine learning must scale to meet this demand. Since many machine learning models need smaller images of a fixed size to make predictions, these images are broken into smaller sections in a process known as chipping. This post explains a serverless approach to chipping images and sending the results to an inference engine for predictions. These predictions use the smaller ... Read More
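To make the chipping step concrete, here is a minimal sketch of how a large image might be divided into fixed-size tiles. It is not the code from the post: the function name `chip_bounds`, the 256-pixel chip size, and the example image dimensions are all assumptions chosen for illustration, and it only computes pixel boxes rather than reading any imagery.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def chip_bounds(width: int, height: int, chip: int) -> Iterator[Box]:
    """Yield pixel boxes that tile a width x height image row by row.

    Hypothetical helper: chips on the right and bottom edges are clipped
    to the image, so they may be smaller than chip x chip.
    """
    if chip <= 0:
        raise ValueError("chip size must be positive")
    for top in range(0, height, chip):
        for left in range(0, width, chip):
            yield (left, top, min(left + chip, width), min(top + chip, height))


# Example: a 1000 x 600 image split into 256-pixel chips (assumed sizes).
boxes = list(chip_bounds(1000, 600, 256))
```

Because edge chips are clipped, a model that needs a strictly fixed input size would still have to pad or discard them; which choice is right depends on the inference engine.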
Amazon EC2 now allows you to copy Amazon Machine Images across AWS GovCloud, AWS China and other AWS Regions
Feed: Recent Announcements. You can now quickly and conveniently copy Amazon Machine Images (AMIs) to AWS GovCloud (US) Regions, AWS China Regions and other AWS regions (also known as partitions) to ensure that your AMIs are available and consistent globally. Previously, to copy AMIs across these AWS regions, you had to rebuild the AMI in each of them. These partitions enabled data isolation but often made this copy process complex, time-consuming and expensive. With this feature, you no longer need to maintain complex processes to duplicate AMIs across partitions. This feature provides a packaged format that allows AMIs of size ... Read More
Journey to Adopt Cloud-Native Architecture Series: #2 – Maximizing System Throughput

Feed: AWS Architecture Blog. In the last blog, Preparing your Applications for Hypergrowth, we talked about hypergrowth and the technical challenges it presents to companies. As a reminder, we presented an example ecommerce company running a monolithic application on Elastic Compute Cloud (Amazon EC2). This application connects with Amazon Relational Database Service (Amazon RDS). The company recently experienced a hypergrowth event where user traffic grew exponentially (10 times) within a few days. During this event, we observed degraded performance at peak times. In this blog, we talk about improving performance by maximizing system throughput through incremental improvements to application and ... Read More
Easily ingest and analyze Google Analytics data with Upsolver and Amazon AppFlow

Feed: AWS Big Data Blog. This post is co-written by Mei Long at Upsolver. Software as a service (SaaS) based applications are in demand today, and customers have a growing need to adopt many of them in their use cases. As adoption grows, extracting data from these various SaaS applications and running analytics across them gets complicated. Although there are several common use cases, in this post, we focus on a solution for easily ingesting, transforming, and analyzing Google Analytics data using Amazon AppFlow and Upsolver. We walk you through the architecture and detailed steps to ingest data from Google Analytics ... Read More
How 1Strategy simplified their spreadsheet ETL process using AWS Glue DataBrew

Feed: AWS Big Data Blog. This is a guest blog post by Pat Reilly and Gary Houk at 1Strategy. In their own words, “1Strategy is an APN Premier Consulting Partner focusing exclusively on AWS solutions. 1Strategy consultants help businesses architect, migrate, and optimize their workloads on AWS, creating scalable, cost-effective, secure, and reliable solutions. 1Strategy holds the AWS DevOps, Migration, Data & Analytics, and Machine Learning Competencies, and is a member of the AWS Well-Architected and the AWS Public Sector partner programs.” Accurately reporting hours billed to each customer is critical to 1Strategy’s business operations. Each consultant is responsible for keeping records up to date. To promote ... Read More
Automate dynamic mapping and renaming of column names in data files using AWS Glue: Part 2

Feed: AWS Big Data Blog. In Part 1 of this two-part post, we looked at how we can create an AWS Glue ETL job that is agnostic enough to rename columns of a data file by mapping to column names of another file. The solution focused on using a single file that was populated in the AWS Glue Data Catalog by an AWS Glue crawler. However, for enterprise solutions, ETL developers may be required to process hundreds of files of varying schemas, even files that they might not have seen previously. Manually crawling and cataloging these tables may not be ... Read More
Table Partitioning In MySQL NDB Cluster And What’s New (Part IV)

Feed: Planet MySQL; Author: Saroj Tripathy. What's new in NDB Cluster 8.0 (8.0.23): With the new configuration variables introduced in NDB Cluster 8.0.23, users now have more control over table partitioning. The new configuration variables that can influence the table partitioning scheme are PartitionsPerNode and ClassicFragmentation. PartitionsPerNode: In earlier cluster versions, the default number of table partitions was based on the number of LDM threads running on a node multiplied by the number of data nodes in the cluster. Users cannot set an arbitrary value for MaxNoOfExecThreads (#LDM); the value must be less than or equal to NoOfFragmentLogParts. With cluster version 8.0.23, ... Read More
Emil Shkolnik: Is Greenplum Database “just a big sharded PostgreSQL”?

Feed: Planet PostgreSQL. 29 Mar. Introduction: What is Greenplum Database? It is one of the PostgreSQL forks optimized for OLAP and analytics workloads. In my opinion, the second life of Greenplum Database began in 2015, the year Greenplum became an open source project. The current version 6 is based on PostgreSQL 9.4, and the Greenplum community is actively developing version 7, which should be compatible with PostgreSQL 13! So, this is really cool! But what prompted me to write this article? The fact is that sometimes we are confronted with the opinion ... Read More
Simplify data integration pipeline development using AWS Glue custom blueprints

Feed: AWS Big Data Blog. Organizations spend significant time developing and maintaining data integration pipelines that hydrate data warehouses, data lakes, and lake houses. As data volume increases, data engineering teams struggle to keep up with new requests from business teams. Although these requests may come from different teams, they’re often similar, such as ingesting raw data from a source system into a data lake, partitioning data based on a certain key, writing data from data lakes to a relational database, or assigning default values for empty attributes. To keep up with these requests, data engineers modify pipelines in a ... Read More
Migrate terabytes of data quickly from Google Cloud to Amazon S3 with AWS Glue Connector for Google BigQuery

Feed: AWS Big Data Blog. The cloud is often seen as advantageous for data lakes because of better security, faster time to deployment, better availability, more frequent feature and functionality updates, more elasticity, more geographic coverage, and costs linked to actual utilization. However, recent studies from Gartner and Harvard Business Review show multi-cloud and intercloud architectures are something leaders need to be prepared for as data management, governance, and integration become more complex. To make sure your data scientist has access to the right data to build their analytics processes, no matter where the data is stored, it’s imperative that ... Read More