Posts by tomaztsql
Author: tomaztsql
Eight R Tidyverse tips for everyday data engineering
Feed: R-bloggers. Author: tomaztsql. [This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. Tidyverse is a collection of R packages, primarily for data engineering and analytics. These packages are ggplot2, purrr, tibble, dplyr, tidyr, stringr, readr, and forcats, and they all share the same language, design, and “grammar” structures. [Figure: Collection of Tidyverse resources. Source: Tidyverse] Piping (or chaining) is a great way to link the data manipulation ... Read More
Comparing performances of CSV to RDS, Parquet, and Feather file formats in R
From the previous blog post, “CSV or alternatives? Exporting data from SQL Server data to ORC, AVRO, Parquet, Feather files and store them into Azure data lake”, we have created Azure blob storage, set up a secure connection using Python, and started uploading files to the blob store from SQL Server. Alongside, we compared the performance of ... Read More
Little useless-useful R functions – Animating datasets
I firmly believe that animation and transitions between different data states can give end users much better insight into, and understanding of, the data than a single table with data points or correlation metrics. With the help of ggplot2 and gganimate, you can quickly create an animation based on your needs. This is a simple IRIS ... Read More
Simple R merge method and how to compare it with T-SQL
The merge function in R is a powerful, simple, and straightforward way to join data frames. It also comes with some neat features that give R users fast data wrangling. I will compare it with the T-SQL language to show the simplicity of the merge method. Creating data.frames and tables We ... Read More
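To make the comparison with T-SQL joins concrete, here is a minimal sketch using base R's `merge()`. The `customers` and `orders` data frames and their columns are hypothetical and are not taken from the original post; only the `by`, `all.x`, and `all` arguments of `merge()` are relied on.

```r
# Hypothetical example data (not from the original post).
customers <- data.frame(id = c(1, 2, 3), name = c("Ana", "Bor", "Cene"))
orders    <- data.frame(id = c(2, 3, 4), amount = c(10, 20, 30))

# Roughly: SELECT ... FROM customers INNER JOIN orders ON customers.id = orders.id
inner_join_df <- merge(customers, orders, by = "id")

# Roughly: LEFT JOIN (keep every row of customers, NA where no order matches)
left_join_df <- merge(customers, orders, by = "id", all.x = TRUE)

# Roughly: FULL OUTER JOIN (keep unmatched rows from both sides)
full_join_df <- merge(customers, orders, by = "id", all = TRUE)

print(inner_join_df)
print(left_join_df)
print(full_join_df)
```

The join type is chosen purely through the `all`, `all.x`, and `all.y` flags, which is where much of the brevity relative to T-SQL comes from.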
Kadane’s algorithm – finding maximum sum in contiguous sub-array
A great algorithm for analyzing time-series data or an array of numbers. Kadane’s algorithm finds the contiguous subarray within a one-dimensional array of numbers that has the largest sum. Largest sum How does the algorithm work? It looks for a global maximum of positive-sum on any sub-array, regardless ... Read More
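The idea described above can be sketched in a few lines of base R. This is the textbook formulation of Kadane's algorithm and not necessarily the code from the original post; the function name `kadane_max_sum` and the example vector are assumptions chosen for illustration. This variant also returns a sensible answer when every element is negative.

```r
# Kadane's algorithm: largest sum over all contiguous, non-empty sub-arrays.
# kadane_max_sum is a hypothetical name used only for this sketch.
kadane_max_sum <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  best <- x[1]     # best sum seen so far (global maximum)
  current <- x[1]  # best sum of a sub-array ending at the current element
  for (value in x[-1]) {
    # Either extend the running sub-array or start a new one at `value`.
    current <- max(value, current + value)
    best <- max(best, current)
  }
  best
}

example <- c(-2, 1, -3, 4, -1, 2, 1, -5, 4)
kadane_max_sum(example)  # 6, from the sub-array 4, -1, 2, 1
```

The single pass keeps only two running values, so the function needs linear time and constant extra memory.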
Little useless-useful R functions – benchmarking vectors and data.frames on simple GroupBy problem
After an interesting conversation on using data.frames or strings (as a vector), I have decided to put this to a simple benchmark test. The problem is straightforward and simple: a MapReduce or Count with GroupBy. Problem Given a simple string of letters: BBAARRRDDDAAAA Return a sorted string (per letter) as a ... Read More
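The counting step of this problem can be sketched in base R in two ways, one on a plain character vector and one on a data.frame. The excerpt is cut off before the required output format, so this sketch stops at the per-letter counts and does not attempt to reproduce the final string; the variable names are assumptions made for illustration, and no benchmark is implied.

```r
letters_in <- "BBAARRRDDDAAAA"

# Split the string into a vector of single characters.
chars <- strsplit(letters_in, "")[[1]]

# Vector approach: table() counts occurrences and sorts by letter.
counts_vec <- table(chars)

# data.frame approach: a GroupBy-style sum of ones per letter.
counts_df <- aggregate(
  n ~ letter,
  data = data.frame(letter = chars, n = 1L),
  FUN = sum
)

print(counts_vec)
print(counts_df)
```

Both approaches give A = 6, B = 2, D = 3, and R = 3 for the input above, which makes them a fair starting point for timing one against the other.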
A prerelease version of Jupyter Notebooks and unleashing features in JupyterLab
Jupyter Notebook also offers developer or prerelease versions. Improved Jupyter Notebook outlook What you need to do is simply run: python -m pip install notebook --pre --upgrade And with this prerelease version of the Jupyter notebook, you have in addition several options to enhance your workspace ... Read More
SQL vs. NoSQL for Data Science
Data come in a variety of forms, at different paces, and in different volumes. And even if all three criteria define the difference between SQL and NoSQL, all three are still irrelevant for data science. My theorem is that no matter what shape, size, frequency, value and trustworthiness, SQL type of presenting ... Read More
Little useless-useful R functions – Mastermind board game for R
Playing a simple guessing game with R. It’s called Mastermind! This game was originally created for two people, but the R version will be single-player, for when an R developer or R data scientist needs a break. The gameplay is simple and so are the rules. The board contains 10 rows ... Read More
Little useless-useful R functions – Creating tiny Fireworks with R
New Year’s Eve is almost here, and what better way to celebrate than with fireworks? Snap, pop, crack, boom. These are the most peaceful, animal-friendly, harmless, eco, child-friendly, no-fire-needed, educational and nifty fireworks. To get the fireworks, fire up the following R function. ########################################## # # Tiny fireworks with R for ... Read More