Tag: partition
How to partition your geospatial data lake for analysis with Amazon Redshift

Feed: AWS Public Sector Blog. Author: Jeff DeMuth. Data lakes are becoming increasingly common in many different workloads and geospatial is no exception. A data lake allows you to store all your structured and unstructured data at any scale, as-is, which can help you break down data silos between data and data types and incorporate multiple types of analytics features, like machine learning, to get the most insight from your data. But while the concept and applications of data lakes have been around for a while, data lakes have seen mixed adoption in the geospatial world. Geospatial analysts were quick ... Read More
AWS GovCloud (US) or standard? Selecting the right AWS partition

Feed: AWS Public Sector Blog. Author: Christopher Smith. There are many options to consider when deploying workloads onto Amazon Web Services (AWS). Public sector organizations must consider how to meet their compliance, security, cost, and availability requirements in the cloud. AWS has over 200 fully featured services offered across 26 Regions over six different continents. With so many choices, it’s understandable for public sector organizations and businesses to have questions about what AWS features are right for their missions’ needs. This blog post explores the options US public sector customers and their business partners should evaluate when selecting an AWS ... Read More
Laurenz Albe: Automatic partition creation in PostgreSQL

Feed: Planet PostgreSQL. © Laurenz Albe 2022. Table partitioning is one of the best-liked features out of the more recent PostgreSQL developments. However, there is no support for automatic partition creation yet. This article shows what you can do to remedy that. Use cases for automatic partition creation: There are essentially two use cases: (1) Create partitions triggered by time, for example for the next month at the end of the current month. (2) Create partitions on demand if a row is inserted that does not fit in any existing partition. I will call the first option time-triggered partitioning and the latter on-demand partitioning. Automatic partition creation for time-triggered ... Read More
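To make the time-triggered case concrete, here is a minimal sketch that builds the PostgreSQL statement for next month's partition. It is an illustration only, not the method from the article: the helper name `next_month_partition_ddl`, the parent table name `events`, and the `parent_YYYY_MM` naming scheme are all assumptions, and the parent is assumed to be range-partitioned on a date or timestamp column.

```python
from datetime import date


def next_month_partition_ddl(parent: str, today: date) -> str:
    """Build a CREATE TABLE ... PARTITION OF statement covering next month.

    Hypothetical helper: `parent` is assumed to be an existing table that is
    range-partitioned on a date or timestamp column.
    """
    # First day of the month after `today`.
    if today.month == 12:
        start = date(today.year + 1, 1, 1)
    else:
        start = date(today.year, today.month + 1, 1)

    # First day of the month after `start`; the upper bound is exclusive.
    if start.month == 12:
        end = date(start.year + 1, 1, 1)
    else:
        end = date(start.year, start.month + 1, 1)

    # Assumed naming scheme: <parent>_<YYYY>_<MM>.
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


if __name__ == "__main__":
    print(next_month_partition_ddl("events", date(2022, 12, 15)))
```

A scheduled job (cron or a similar scheduler) could run the generated statement near the end of each month. `IF NOT EXISTS` keeps a repeated run from failing when the partition is already there.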
Amazon Athena accelerates queries with AWS Glue Data Catalog partition indexes
Feed: Recent Announcements. Today, we're excited to announce that Amazon Athena supports AWS Glue Data Catalog partition indexes to optimize query planning and reduce query runtime. When you query a table containing a large number of partitions, Athena retrieves the available partitions from the AWS Glue Data Catalog and determines which are required by your query. As new partitions are added, the time needed to retrieve the partitions increases and can cause query runtime to increase. AWS Glue Data Catalog allows customers to create partition indexes which reduce the time required to retrieve and filter partition metadata on tables with tens ... Read More
Improve Amazon Athena query performance using AWS Glue Data Catalog partition indexes

Feed: AWS Big Data Blog. The AWS Glue Data Catalog provides partition indexes to accelerate queries on highly partitioned tables. In the post Improve query performance using AWS Glue partition indexes, we demonstrated how partition indexes reduce the time it takes to fetch partition information during the planning phase of queries run on Amazon EMR, Amazon Redshift Spectrum, and AWS Glue extract, transform, and load (ETL) jobs. We’re pleased to announce Amazon Athena support for AWS Glue Data Catalog partition indexes. You can use the same indexes configured for Amazon EMR, Redshift Spectrum, and AWS Glue ETL jobs with Athena ... Read More
The partition problem: An optimization approach

Feed: SAS Blogs. Author: Rick Wicklin. I previously wrote about one way to solve the partition problem in SAS. In the partition problem, you divide (or partition) a set of N items into two groups of size k and N-k such that the sum of the items' weights is the same in each group. For example, if the weights of six items are X = {0.4, 1.0, 1.2, 1.7, 2.6, 2.7} and k=3, you can put the weights {0.4, 1.7, 2.7} in one group and the weights {1.0, 1.2, 2.6} in the other group. Both groups contain 4.8 ... Read More
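The six-item example in this excerpt can be checked with a small brute-force search. This is a sketch in Python, not the SAS optimization approach the post describes; the function name `partition_equal_sum` and the tolerance are assumptions, and trying every size-k subset is practical only for small N.

```python
from itertools import combinations
from math import isclose


def partition_equal_sum(weights, k):
    """Find a size-k group whose weights sum to half the total.

    Returns (group, rest) as two lists, or None if no such split exists.
    Exhaustive search over all size-k subsets, so suitable only for small N.
    """
    target = sum(weights) / 2
    indices = range(len(weights))
    for chosen in combinations(indices, k):
        # Compare with a tolerance because the weights are floats.
        if isclose(sum(weights[i] for i in chosen), target, abs_tol=1e-9):
            chosen_set = set(chosen)
            group = [weights[i] for i in chosen]
            rest = [weights[i] for i in indices if i not in chosen_set]
            return group, rest
    return None


if __name__ == "__main__":
    X = [0.4, 1.0, 1.2, 1.7, 2.6, 2.7]
    print(partition_equal_sum(X, 3))
```

For the weights X and k=3 from the excerpt, the search returns the same two groups quoted there.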
The partition problem
Feed: SAS Blogs. Author: Rick Wicklin. Photograph by Poussin Jean, license CC BY-SA 3.0, via Wikimedia Commons. The partition problem has many variations, but recently I encountered it as an interactive puzzle on a computer. (Try a similar game yourself!) The player is presented with an old-fashioned pan-balance scale and a set of objects of different weights. The challenge is to divide (or partition) the objects into two groups. You put one group of weights on one side of the scale and the remaining group on the other side so that the scale balances. Here's a canonical ... Read More
Improve query performance using AWS Glue partition indexes

Feed: AWS Big Data Blog. While creating data lakes on the cloud, the data catalog is crucial to centralize metadata and make the data visible, searchable, and queryable for users. With the recent exponential growth of data volume, it becomes much more important to optimize data layout and maintain the metadata on cloud storage to keep the value of data lakes. Partitioning has emerged as an important technique for optimizing data layout so that the data can be queried efficiently by a variety of analytic engines. Data is organized in a hierarchical directory structure based on the distinct values of ... Read More
Speed up your Amazon Athena queries using partition projection
Feed: AWS Big Data Blog. This post is co-written with Steven Wasserman of Vertex, Inc. Amazon Athena is an interactive query service that makes it easy to analyze data stored in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Athena is easy to use—simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Athena has added support for partition projection, a new functionality that you can use to speed up query processing ... Read More
Naive Bayes Classification in R
Feed: R-bloggers. Author: finnstats. [This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. In this tutorial, we are going to discuss the prediction model based on Naive Bayes classification. Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. The Naive Bayes model is easy to build and particularly useful for very large ... Read More