Posts tagged datasets
Copy large datasets from Google Cloud Storage to Amazon S3 using Amazon EMR
Feed: AWS Big Data Blog. Many organizations have data sitting in various data sources in a variety of formats. Even though data is a critical component of decision-making, for many organizations this data is spread across multiple public clouds. Organizations are looking for tools that make it easy and cost-effective to copy large datasets across cloud vendors. With Amazon EMR and the Hadoop file copy tools Apache DistCp and S3DistCp, we can migrate large datasets from Google Cloud Storage (GCS) to Amazon Simple Storage Service (Amazon S3). Apache DistCp is an open-source tool for Hadoop clusters that you can use ... Read More
New datasets available on the Registry of Open Data from University of Sydney, International Brain Laboratory, Taiwanese Central Weather Bureau, and others
Feed: Recent Announcements. Read below for 26 new or updated datasets from University of Sydney, International Brain Laboratory, Taiwanese Central Weather Bureau, and others, now available on the Registry of Open Data in the following categories. Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available, high-value, cloud-optimized datasets. We work with data providers who seek to: democratize access to data by making it available for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; encourage the development of communities that ... Read More
Create and reuse governed datasets in Amazon QuickSight with new Dataset-as-a-Source feature

Feed: AWS Big Data Blog. Amazon QuickSight is a fast, cloud-powered, business intelligence (BI) service that makes it easy to deliver insights to everyone in your organization. QuickSight recently introduced Dataset-as-a-Source, a new feature that allows data owners to create authoritative datasets that can then be reused and further extended by thousands of users across the enterprise. This post walks through an example of how QuickSight makes it easy to create datasets that are reusable and easy to govern, with Dataset-as-a-Source. Introducing Dataset-as-a-Source: Dataset-as-a-Source allows QuickSight authors and data owners to create authoritative datasets, as a single source of truth, ... Read More
Amazon Fraud Detector now supports event datasets
Feed: Recent Announcements. We are excited to announce event dataset storage for Amazon Fraud Detector. The new capability enables customers to easily send and store their production fraud data directly within Amazon Fraud Detector. Customers can use their event datasets to train machine learning (ML) models with higher predictive performance since the models can apply historical context to new events by automatically calculating values such as account age and purchase frequency. Customers can also move faster by retraining models without needing to upload a new training dataset to S3, and they can close the feedback loop from offline fraud investigations ... Read More
Prepare and visualize time series datasets in Amazon SageMaker Data Wrangler
Feed: Recent Announcements. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization from a single visual interface. Starting today, you can use new capabilities of Amazon SageMaker Data Wrangler that help make it easier and faster to prepare data for ML including a new collection of time series transformations and two new time series visualizations to quickly ... Read More
How Rapid7 built multi-tenant analytics with Amazon Redshift using near-real-time datasets

Feed: AWS Big Data Blog. This is a guest post co-written by Rahul Monga, Principal Software Engineer at Rapid7. Rapid7 InsightVM is a vulnerability assessment and management product that provides visibility into the risks present across an organization. It equips you with the reporting, automation, and integrations needed to prioritize and fix those vulnerabilities in a fast and efficient manner. InsightVM has more than 5,000 customers across the globe, runs exclusively on AWS, and is available for purchase on AWS Marketplace. To provide near-real-time insights to InsightVM customers, Rapid7 has recently undertaken a project to enhance the dashboards in their ... Read More
Celebrate Open Science Week with the Allen Institute and available open datasets

Feed: AWS Public Sector Blog. Author: Jenny Burns. The Allen Institute seeks to understand how our brains, cells, and immune systems work when we are healthy and, ultimately, how they go wrong in disease. In the course of their studies, Allen researchers have generated and shared atlases that map the brain, gene-edited stem cell lines, and many more publicly available resources that have been used by millions of scientists around the world to accelerate their research. The Allen Institute collaborates with Amazon Web Services (AWS) and the Registry of Open Data on AWS to make many of their datasets publicly ... Read More
Code performance in R: Working with large datasets
Feed: R-bloggers. Author: INWT-Blog-RBloggers. [This article was first published on INWT-Blog-RBloggers, and kindly contributed to R-bloggers]. This is the fourth part of our series about code performance in R. In the first part, I introduced methods to measure which part of a given code is slow. The second part lists general techniques to make R code faster. The third part deals with parallelization. In this part we are going to ... Read More
Announcing general availability of backup and restore for Power BI datasets
Feed: Microsoft Power BI Blog | Microsoft Power BI. We are thrilled to announce the general availability (GA) of Backup and Restore for datasets in Power BI Premium and Premium per User (PPU). Whether you are migrating Azure AS workloads to Power BI, must consolidate Power BI tenants due to a merger or acquisition, or simply want to back up Power BI datasets on a regular basis to meet the data retention and disaster recovery requirements of your organization, you can now rely on the Backup and Restore capabilities of Power BI as a fully supported feature ... Read More
Amazon Rekognition Custom Labels makes it easy for customers to learn how to train machine learning models by providing tutorial videos, and sample datasets
Feed: Recent Announcements. Amazon Rekognition Custom Labels introduces a simplified onboarding experience with the ability to explore images, labels, and datasets through one-click creation of example projects. Amazon Rekognition Custom Labels provides out-of-the-box video tutorials and example projects with hundreds of images for single-class classification, multi-class classification, object detection, and logo detection. To get started, launch the Rekognition Custom Labels console and select the Get started button. You can select the option to learn through quick two-minute tutorial videos, or select the option to learn through example projects. Once you select an example project, Rekognition Custom Labels ... Read More