Posts tagged statistics
Motivated Reasoning: What it is and how to avoid it in Data Analysis

Feed: Planet big data. Author: Thomas Maydon. Use versus abuse of statistics can often be characterised by the analytical approach adopted to the problem at hand. In this blog post, which is part of a series on Logical Fallacies to avoid in Data Analysis, I’ll be focusing on defining the motivated reasoning logical fallacy and how to avoid it in data analysis ... Read More
R for Big Data in One Picture
Feed: Featured Blog Posts - Data Science Central. Author: Vincent Granville. This picture originally posted here covers the following topics: Basic stack Integrated platforms Visualization Data formats Large & out-of-memory data Hadoop Glue As backend GPU in-database analytics Parallel Efficiency Packages To zoom in, view picture in the original article, or click on picture. The original article also provides a detailed listing of all the 100+ entities listed in the picture. Anyone interested in creating a clickable link for each of these entities? For instance, entity 1.6 (in the original article) is ggplot2, while 2.1.2 is shiny server. Other interesting pictures worth checking out: ... Read More
Data Scientist Vs Data Analyst, What’s the Big Difference?
Feed: Featured Posts - Hadoop360. Author: Sharma Niti. Big data is one of the biggest buzzwords & trends in the IT world at the moment, & the increasing demand for it shows that it’s not going to slow down anytime soon. With the emergence of Big Data, roles such as Data Scientist & Data Analyst have been around as job titles for a while now. There are plenty of job openings with titles such as data scientists & data analysts. Although they sound similar, the two differ in many ways. Data Scientist The Data Scientist professional understands data with a commercial ... Read More
Linear, Machine Learning and Probabilistic Approaches for Time Series Analysis
Feed: Featured Blog Posts - Data Science Central. Author: Bohdan Pavlyshenko. In this post, we consider different approaches for time series modeling. The forecasting approaches using linear models, ARIMA algorithm, XGBoost machine learning algorithm are described. Results of different model combinations are shown. For probabilistic modeling the approaches using copulas and Bayesian inference are considered. INTRODUCTION Time series analysis, especially forecasting, is an important problem of modern predictive analytics. The goal of this study is to consider different approaches for time series modeling. For our analysis, we used stores sales historical data from a particular competition “Rossmann Store Sales”. These ... Read More
Data Science, Machine Learning, BI Explained in a Amazing Few Pictures
Feed: Featured Blog Posts - Data Science Central. Author: Vincent Granville. Guest blog post by Rubens Zimbres, PhD. This article brings images from my work modeling with Mathematica, my experience as a Business Analyst and also my doctorate lessons. For me, the borders between a properly executed Business Intelligence and Data Science (with substantive knowledge in Management) are fuzzy. See the picture below: What is a Data Scientist? In my understanding, someone can be a data scientist according to his domain expertise: Business management, physics, computer science, etc. DATA SCIENCE AND BUSINESS INTELLIGENCE PHASES 1) UNDERSTAND PROCESSES First of all, ... Read More
When order of appearance of indexes matters in MySQL

Feed: Planet MySQL. Author: Rick Pizzi. Sometimes MySQL surprises you in ways you would have never imagined. Would you think that the order in which the indexes appear in a table matters? It does. Mind you, not the order of the columns - the order of the indexes. MySQL optimizer can, in specific circumstances, take different paths, sometimes with nefarious effects. Please consider the following table: CREATE TABLE `mypartitionedtable ` ( `HASH_ID` char(64) NOT NULL, `RAW_DATA` mediumblob NOT NULL, `EXPIRE_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, KEY `EXPIRE_DATE_IX` (`EXPIRE_DATE`), KEY `HASH_ID_IX` (`HASH_ID`)) ENGINE=TokuDB DEFAULT CHARSET=latin1 ROW_FORMAT=TOKUDB_UNCOMPRESSED /*!50100 PARTITION BY RANGE (UNIX_TIMESTAMP(EXPIRE_DATE)) (PARTITION p2005 VALUES LESS THAN (1487847600) ENGINE = TokuDB, PARTITION p2006 ... Read More
Breaking Down Communication Barriers in Tech – Silicon Valley Data Science

Feed: Planet big data. Author: Meg Blanchette. An Interview with Travis Oliphant | February 21st, 2017 In late 2016 I spoke with Travis Oliphant, co-founder of Continuum Analytics. We covered many topics, including building a community and balancing enterprise with open source. I’ve broken our conversation up into a series of posts, which will be published over the next several weeks. In this first part of our interview, we discuss breaking down silos, the importance of effectively communicating about cutting-edge technology, and where Anaconda is going next. What are you most excited about right now? I’m looking for a ... Read More
The 7 Logical Fallacies to avoid in Data Analysis

Feed: Planet big data. Author: Thomas Maydon. “Lies, damned lies and statistics” is the frequently quoted adage attributed to former British Prime Minister Benjamin Disraeli. The manipulation of data to fit a narrative is a very common occurrence from politics, economics to business and beyond. In this blog post, we'll touch on the more common logical fallacies that can be encountered and should be avoided in data analysis ... Read More
Job trends for R and Python

Feed: Planet big data. Author: David Smith. When we last looked at job trends from indeed.com, job listings for "R statistics" were on the rise but were still around half the volume of listings for "SAS statistics". Three-and-a-half years later, R has overtaken SAS in job listings for "statistics". I added Python to the search this time; job listings for "Python statistics" have risen at a similar rate to those for R, but with a somewhat higher volume for R. Since data science is a popular job role these days, let's do the same search for "data scientist": For "data scientist" jobs, R and Python ... Read More
Fighting Advanced Persistent Security Threats with Anomaly Detection: Sometimes More is More

Feed: Big Data Feed. Author: itsing. Does this sound disturbing? You try to reach a particular website only to find the site is down. But it’s not that simple. You try another site – also not reachable. And another and another… You look to social media for in-the-moment reports about what’s happening and while you are reading about a huge swath of the country under cyber attack, that social media site goes out, too. This was what many people in North America experienced on 21 October 2016 when a widespread DDoS (distributed denial of service) attack took place in several waves, ... Read More