Posts tagged datasets
Use AWS Nitro Enclaves to perform computation of multiple sensitive datasets

Feed: AWS Compute Blog. Author: Sheila Busser. This blog post is written by Jeff Wisman, Principal Solutions Architect, and Andrew Lee, Solutions Architect. Introduction: Many organizations have sensitive datasets that they do not want to share with others because of stringent security and compliance requirements. However, they would still like to use each other’s data to perform processing and aggregation. For example, B2B (business to business) companies often want to augment their customer information dataset with additional demographic or psychographic signals. This enrichment of data is often done by one party sending customer information to be matched against another party’s ... Read More
Creating access control mechanisms for highly distributed datasets

Feed: AWS Public Sector Blog. Author: Erin Chu. Security is priority number one at Amazon Web Services (AWS). Data stored in Amazon Simple Storage Service (Amazon S3) is private by default. However, some datasets are made to be shared. Research organizations such as the Chan Zuckerberg Biohub (CZB) and the Allen Institute have missions to produce high quality, open access datasets for the research community. Other entities are required by law to openly share data for a period of time. On the AWS Open Data Team, we work with data providers who distribute datasets that follow a “one-to-many,” or even “one-to ... Read More
Amazon SageMaker Canvas accelerates onboarding with new interactive product tours and sample datasets
Feed: Recent Announcements. Amazon SageMaker Canvas accelerates onboarding with new interactive product tours and sample datasets for different use cases. Amazon SageMaker Canvas is a visual point-and-click interface that enables business analysts to generate accurate machine learning (ML) models for insights and predictions on their own — without requiring any machine learning experience or having to write a single line of code. Starting today, Amazon SageMaker Canvas introduces interactive product tours to help you get started quickly and easily. When you log into SageMaker Canvas, you can try the product tours guiding you with each step of the ML journey ... Read More
Analyze Amazon Ion datasets using Amazon Athena

Feed: AWS Big Data Blog. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. Amazon Ion is a richly typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. The text format extends JSON (meaning all JSON files are valid Ion files), and is easy to read and author, supporting rapid prototyping. The binary representation is efficient to store, transmit, and skip-scan parse ... Read More
Data Cleaning in R: 2 R Packages to Clean and Validate Datasets
Feed: R-bloggers. Author: Dario Radečić. Real-world datasets are messy. Unless the dataset was created for teaching purposes, it’s likely you’ll have to spend hours or even tens of hours cleaning it before you can show it on a dashboard. That’s where two packages for data cleaning in R come into play – janitor and data.validator. And today you’ll learn how to use them together. If you’re a software engineer, think of data cleaning and validation as writing and testing code. Think of data cleaning as coding an app – it takes a huge amount of time to get it working ... Read More
Little useless-useful R functions – Animating datasets
Feed: R-bloggers. Author: tomaztsql. [This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. I firmly believe that animation and transition between different data states can give end-users much better insights and understanding of the data than a single table with data points or correlation metrics. With the help of ggplot and gganimate, you can quickly create an animation based on your needs. This is a simple IRIS ... Read More
MLDataR – Real-world Datasets for Machine Learning Applications
Feed: R-bloggers. Author: R Views. This is a guest post from Gary Hutson, lead of Machine Learning at Crisp Thinking, a company that provides AI solutions to moderate and detect offensive and abusive content online. His website is available at https://hutsons-hacks.info/ and he can be reached through Twitter, @StatsGary. MLDataR package motivation: I love all things Machine Learning. The MLDataR package was driven by the need to have example datasets across the healthcare system for machine learning problems. I have been a machine learning practitioner for over nine years; however, I still find it interesting to explore new examples and ... Read More
New datasets available on the Registry of Open Data from Space Telescope Science Institute, DNAStack, National Archives and Records Administration, and others
Feed: Recent Announcements. Read below for the 16 new or updated datasets from Space Telescope Science Institute, DNAStack, National Archives and Records Administration, and others available on the Registry of Open Data in the following categories. Statistical and regulatory: 1950 Census from National Archives and Records Administration. Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available, high-value, cloud-optimized datasets. We work with data providers who seek to: democratize access to data by making it available for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the ... Read More
Update of compiled datasets (2022)
Feed: R-bloggers. Author: R | msperlin. [This article was first published on R | msperlin, and kindly contributed to R-bloggers]. Back in 2020 I started to compile and share financial data in dataverse. The data covers corporate finance events from the DFP and FRE systems. The available tables are the same I use for my research and teaching material, and will be updated once a year. Today I updated all ... Read More
Using Snowflake to Access and Combine Multiple Datasets Hosted by the Amazon Sustainability Data Initiative

Feed: AWS Partner Network (APN) Blog. Author: Aaron Soto. By Aaron Soto, Sustainability Solutions Architect – AWS; Bosco Albuquerque, Partner Solutions Architect – AWS; and Andries Engelbrecht, Partner Solution Architect – Snowflake. In his book, Greening Through IT, sustainability professor Bill Tomlinson writes that a fundamental challenge with how humans understand and act on environmental issues is that humans are not naturally equipped to operate at the scales of time, space, and complexity that these issues exist. Throughout history, IT systems have helped humans broaden how we operate across time, space, and complexity. So, while Amazon CEO Andy Jassy states ... Read More