Posts tagged cluster
Amazon ECS: A logical grouping of container instances that you can place tasks on. Amazon Elasticsearch Service (Amazon ES): A logical grouping of one or more data nodes, optional dedicated master nodes, and storage required to run Amazon ES and operate your Amazon ES domain.
Continuous Deployment to Amazon ECS using AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS CloudFormation

Feed: AWS Compute Blog. Author: Chris Barclay. Thanks to my colleague John Pignata for a great blog on how to create a continuous deployment pipeline to Amazon ECS. Delivering new iterations of software at a high velocity is a competitive advantage in today’s business environment. The speed at which organizations can deliver innovations to customers and adapt to changing markets is increasingly a pivotal attribute that can make the difference between success and failure. AWS provides a set of flexible services designed to enable organizations to embrace the combination of cultural philosophies, practices, and tools called DevOps that increases an organization’s ... Read More
MySQL group replication: installation with Docker

Feed: Planet MySQL. Author: Giuseppe Maxia. Overview: MySQL Group Replication was released as GA with MySQL 5.7.17. It is essentially a plugin that, when enabled, allows users to set up replication in this new way. There has been some confusion about the stability and usability of this release. Until recently, MySQL Group Replication (MGR) was only available in the Labs, which traditionally denotes a preview or a use-at-your-own-risk feature. Several months ago we saw the release of Group Replication as a Docker image, which allowed users to deploy a peer-to-peer cluster (every node is a master). However, about one month after that release, ... Read More
Real World AWS Scalability | AWS Compute Blog

Feed: AWS Compute Blog. Author: Stefano Buliani. This is a guest post from Linda Hedges, Principal SA, High Performance Computing. One question we often hear is, “How well will my application scale on AWS?” For high performance computing (HPC) workloads that cross multiple nodes, the cluster network is at the heart of scalability concerns. AWS uses advanced Ethernet networking technology, which, like all things AWS, is designed for scale, security, high availability, and low cost. This network is exceptional and continues to benefit from Amazon’s rapid pace of development. Again and again, customers find that the most demanding applications run very ... Read More
Postgres Autovacuum is Not the Enemy

Feed: Planet PostgreSQL. It’s a common misconception that high-volume read-write workloads in PostgreSQL inevitably cause database inefficiency. We’ve heard of cases where users encounter slowdowns doing only a few hundred writes per second and turn to systems like Dynamo or Cassandra out of frustration. However, PostgreSQL can handle these workloads without a problem as long as it is configured correctly. The problems stem from what’s known as “bloat,” a phenomenon of PostgreSQL and other MVCC databases which causes increased space usage and decreased performance. We’ll see how autovacuum, a tool to combat bloat, is typically misunderstood and misconfigured. By ... Read More
Open-sourcing Rocksplicator, a real-time RocksDB data replicator

Feed: Planet MySQL. Author: Pinterest Engineering. Pinterest’s stateful online systems process tens of petabytes of data every day. As we build products and scale billions of Pins to 150 million people, we need new applications that work in a way where computation co-locates with data. That’s why we adopted RocksDB. It’s adaptable, supports basic and advanced database operations with high performance and meets the majority of requirements for building large-scale, production-strength distributed stateful services. Yet two critical pieces were missing for us: real-time data replication and cluster management for RocksDB-based stateful services. To fill this gap, we built a RocksDB ... Read More
Announcing availability of PostgreSQL instance level encryption

Feed: Planet PostgreSQL. There are a couple of different ways to implement database encryption – commonly at the operating system, filesystem, file, or column level, leaving out transport-level encryption, which has been supported for 15 years. Each of those approaches counters a different threat model, and one can easily imagine that in the case of databases, where the systems were originally not designed with encryption in mind, it is not exactly easy to first agree on a certain way of doing things – and then there would surely be a lot of technical pitfalls along the way. But today, after a ... Read More
Much Kudu About Something

Feed: Planet big data. Author: dereksdata. Apache Kudu is now out in the wild and already making waves. Here’s why you should care. In short, Apache Kudu (http://kudu.apache.org/) is a distributed datastore-as-a-filesystem. Kudu essentially takes over at the Avro/Parquet/ORC layer for what is nowadays called ‘lambda architectures’ (because we need more jargon, yeah?). There is considerable engineering, operations and data munging being performed on a mature Hadoop cluster that can be attributed to working around the limitations of HDFS, which is designed and optimised for analysis of large scale read-only type data. Solutions to support near-real time work such as HBase work around this ... Read More
Context Matters When Text Mining
Feed: Featured Blog Posts - Data Science Central. Author: Dalila Benachenhou. Many times the most followed approach can result in failure. The reason has more to do with thinking that one approach works in all cases. This is especially true in text mining. For instance, a common approach in clustering documents is to create a tf-idf matrix for all documents, apply SVD or another dimension reduction algorithm, and then run a clustering algorithm. In most cases, this will work; however, as I will present here, there are instances where this process will not provide the intended result. It will not work because ... Read More
Introducing pg_squeeze: auto-rebuild bloated tables

Feed: Planet PostgreSQL. One of the few areas where the out-of-the-box functionality of PostgreSQL is not 100% satisfying is the “bloat problem”. Combating bloat, or just trying to ensure that your table data is physically ordered according to some column(s) (a.k.a. clustering), until now required accepting some inconvenient compromises. Extended periods of full table locking (no read or write activities) with the built-in VACUUM FULL or CLUSTER commands, or involving third-party tooling, usually meaning “pg_repack”, were necessary. “pg_repack” offers good benefits like a much shorter full-lock time and ordering by specific columns, but needs a bit of fiddling around – installing the ... Read More
7 Cases Where Big Data Isn’t Better
Feed: Featured Blog Posts - Data Science Central. Author: William Vorhies. Summary: It’s become almost part of our culture to believe that more data, particularly Big Data quantities of data, will result in better models and therefore better business value. The problem is it’s just not always true. Here are 7 cases that make the point. Following the literature and the technology, you would think there is universal agreement that more data means better models. With the explosion of in-memory analytics, Big Data quantities of data can now realistically be processed to produce a variety of different predictive models and ... Read More