Choosing PostgreSQL Bloom Index Parameters

Feed: Planet PostgreSQL. PostgreSQL 9.6 bloom is an extension contributed by Teodor Sigaev, Alexander Korotkov and Oleg Bartunov which provides a new index type for integer and text columns. There is some coverage on how to use it and how it works, which is good because documentation is scarce. This blog entry briefly describes how the index works and discusses how to choose the parameters associated with this index, namely the signature size (default 80, allocated in chunks of 16 bits) and the number of bits for each indexed column (default 2), with a rule of thumb approach based on ... Read More
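As a rough illustration of how the two parameters interact, here is a minimal Python sketch. It is not the rule of thumb from the post, which is cut off above. It assumes the textbook Bloom filter model (independent, uniformly distributed bit positions), and the function names are hypothetical. The only details taken from the excerpt are the defaults and the 16-bit allocation granularity.

```python
def rounded_length(bits: int) -> int:
    """Round a requested signature length up to the next multiple of 16 bits."""
    return ((bits + 15) // 16) * 16


def false_positive_rate(length: int = 80, bits_per_col: int = 2,
                        n_cols: int = 1, queried_cols: int = 1) -> float:
    """Estimate the chance that a non-matching row passes the signature test.

    Assumption (not from the post): each indexed column sets `bits_per_col`
    independent, uniformly chosen bits in a signature of `length` bits.
    """
    m = rounded_length(length)
    bits_set_per_row = bits_per_col * n_cols
    # Expected fraction of signature bits that are 1 for a given row.
    fill = 1.0 - (1.0 - 1.0 / m) ** bits_set_per_row
    # A query on `queried_cols` columns probes bits_per_col bits for each.
    return fill ** (bits_per_col * queried_cols)


if __name__ == "__main__":
    for length in (80, 160, 320):
        rate = false_positive_rate(length=length, bits_per_col=2, n_cols=3)
        print(f"length={length:4d}  estimated false positive rate={rate:.5f}")
```

Under this model, a longer signature lowers the fill ratio and so the false positive rate, at the cost of a larger index.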
The Value of R’s Open Source Ecosystem

Feed: Planet big data. Author: David Smith. I was thrilled to be invited to speak at the Monktoberfest conference, held this past October in Portland, Maine. Not only have I been a great fan of the analysis from the RedMonk team for many years, I'd heard that it was one of the most interesting and diverse tech conferences around. (Also, beer.) And indeed, it turned out to be all of those things, and one of the most memorable and interesting conferences I've ever been to. You can find many of the talks on RedMonk's YouTube channel. There were so many great talks it's ... Read More
Do you have the ability to tell stories with your data?

Feed: Planet big data. Author: Corinium. "The ability to take data…understand it, process it, extract value from it, visualize it, and communicate it, that’s going to be a hugely important skill in the next decades." - Dr. Hal R. Varian, Chief Economist, Google, 2009. In today’s rapidly advancing analytical world, I think it is fair to say that Dr. Varian was and still is very right. Embedding an analytics strategy into an organisation doesn’t fail because we don’t know what to do or how to execute it. It often fails to resonate because we fail to create a story that is meaningful in ... Read More
Android 7.0 for Developers: New Features, Performance Upgrades & Other Stuff You Won’t Care About
Feed: Featured Posts - Hadoop360. Author: Irina Papuc. Google formally announced Android 7.0 Nougat a few weeks ago, but as usual, you’ll have to wait. Most users won’t get their over-the-air (OTA) updates until early next year. Many others will receive them a week from never, as some device vendors simply don’t bother. This may sound like a snarky pet peeve of mine, but Android fragmentation is no joke; it’s been a serious headache for users and developers for years. Android 7.0 won’t solve that issue, which is a shame because it enables a number of new features and performance improvements ... Read More
Introduction to Apache Spark with Examples and Use Cases
Feed: Featured Posts - Hadoop360. Author: Irina Papuc. I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic. This turned out to be a great way to get further introduced to Spark concepts and programming. I highly recommend it for any aspiring Spark developers looking for a place to get started. Today, Spark is being adopted by major players like Amazon, eBay, and Yahoo! Many organizations run Spark on clusters with thousands of nodes. According ... Read More
Digital Transformation in Utilities sector

Feed: Planet big data. Author: Sandeep Raut. It is easy to take for granted the technology we have at our disposal. We flick a switch and the lights go on, we turn on the tap and clean water comes out. We don’t have to worry about gas for cooking. But today the Utilities industry is under pressure to simultaneously reduce costs and improve operational performance. The Utilities sector has been slower to adopt digital innovations than Retail, Banking or Insurance. With energy getting on the digital bandwagon with online customer engagement, smart sensors and better use of analytics, Utilities are now ... Read More
pg_catalog visualized

Feed: Planet PostgreSQL. I couldn’t find any graph showing all the relations between all the pg_catalog tables, so just for fun I wrote a little script to parse the SGML and generate a graph using GraphViz.
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp qw(slurp);
use Data::Dumper;
open my $fh, "<:encoding(utf8)", './doc/src/sgml/catalogs.sgml' or die "$!";
my $table;
my $column;
my $references;
my $pg_catalog_fk_map = {};
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ m!^\s+<title><structname>([^<>]+)</> Columns</title>$!) {
        $table = $1;
    } elsif ($line =~ m!^\s+<entry><structfield>([^<>]+)</structfield></entry>$!) {
        $column = $1;
    } elsif ($line =~ m!^\s+<entry><type>(oid|regproc)</type></entry>$!) {
} ... Read More
Objectification of Incidents
Feed: Featured Blog Posts - Data Science Central. Author: Don Philip Faithful. Probably like most people, I tend to recognize data as a stream of values. Notice that I use the term values rather than numbers although in practice I guess that values are usually numerical. A data-logger gathering one type of data would result in data all of a particular type. Perhaps the concept of “big data” surrounds this preconception of data of type except that there are much larger amounts. Consider an element of value in symbolic terms, which I present below: there is an index such as ... Read More
Create a clone database in Oracle Cloud
Feed: Kamran Agayev's Oracle Blog. Posted by Kamran Agayev A. on December 10th, 2016. In this step-by-step tutorial, we will create a clone database for development or testing purposes. Using the Oracle Database Cloud service, you don’t need to configure and run the RMAN DUPLICATE command to create a clone of the production database for the development team. All you need is to create a snapshot of your production database and clone it in a few minutes. So first of all, let’s create a new database. Open cloud.oracle.com, log in with your credentials and create a new database service. Please check ... Read More
Announcing MapR Ecosystem Pack (MEP) 2.0!

Feed: Big Data Feed. Author: itsing. We’re pleased to announce the general release of the MapR Ecosystem Pack (MEP) version 2.0. This represents the second major release of a MapR Ecosystem Pack since the beginning of this new process of delivering ecosystem upgrades. If you’re new to this process, MapR Ecosystem Packs (MEPs) are a way to deliver ecosystem upgrades decoupled from core upgrades - allowing you to upgrade your tooling independently of your Converged Data Platform. Each MEP contains a subset of the greater supported Hadoop ecosystem certified to work fully and completely with the other components in each release. This ... Read More