Posts tagged metadata
Information about other data or objects. In Amazon Simple Storage Service (Amazon S3) and Amazon EMR, metadata takes the form of name–value pairs that describe the object. These include default metadata such as the date last modified and standard HTTP metadata such as Content-Type. Users can also specify custom metadata at the time they store an object. In Amazon Elastic Compute Cloud (Amazon EC2), metadata includes data about an EC2 instance that the instance can retrieve to determine things about itself, such as the instance type and the IP address.
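As a rough illustration of the name–value pairs described above: in the Amazon S3 REST API, user-defined metadata travels as HTTP headers whose names begin with `x-amz-meta-`, next to standard headers such as Content-Type. The sketch below only builds such a header set. The helper name `build_object_headers` and the sample values are hypothetical and not taken from any post listed here.

```python
from typing import Dict

# Prefix that Amazon S3 uses for user-defined (custom) object metadata headers.
USER_META_PREFIX = "x-amz-meta-"


def build_object_headers(content_type: str, custom: Dict[str, str]) -> Dict[str, str]:
    """Combine standard HTTP metadata with custom name-value pairs.

    Hypothetical helper for illustration: it only assembles headers and does
    not upload anything.
    """
    headers = {"Content-Type": content_type}
    for name, value in custom.items():
        # S3 stores user-defined metadata names in lowercase.
        headers[USER_META_PREFIX + name.lower()] = value
    return headers


# Sample values are made up for the example.
headers = build_object_headers("text/csv", {"Owner": "analytics", "retention": "90d"})
```

Default metadata such as the date last modified is set by the service itself, so it does not appear in the request headers.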
How to Build a Successful Metadata Management Framework

Feed: Alation. Author: Anthony Zumpano. June 28, 2022 — Collecting and using data to make informed decisions is the new foundation for businesses. The key term here is usable: Anyone can be data rich, and collect vast troves of data. The real challenge lies in getting people to access, manage, and search for it appropriately. This is where metadata, or the data about data, comes into play. Having a data catalog is the cornerstone of your data governance strategy, but what supports your data catalog? Your metadata management framework provides the underlying structure that makes your data accessible and manageable ... Read More
Persist and analyze metadata in a transient Amazon MWAA environment

Feed: AWS Big Data Blog. Customers can harness sophisticated orchestration capabilities through the open-source tool Apache Airflow. Airflow can be installed on Amazon EC2 instances or can be dockerized and deployed as a container on AWS container services. Alternatively, customers can also opt to leverage Amazon Managed Workflows for Apache Airflow (MWAA). Amazon MWAA is a fully managed service that enables customers to focus more of their efforts on high-impact activities such as programmatically authoring data pipelines and workflows, as opposed to maintaining or scaling the underlying infrastructure. Amazon MWAA offers auto-scaling capabilities where it can respond to surges in ... Read More
Matillion ETL Supports Data Lineage Tools with Metadata API Integrations

Feed: Matillion. Author: Julie Polito. At Matillion, we spend a lot of our time talking to users. We’re especially interested in hearing your thoughts and opinions on Matillion ETL: what you love about it, and how we could make it better. In these conversations, a topic that comes up time and time again is data governance, and in particular how Matillion handles data lineage. You’ve told us that you want Matillion ETL to be a platform for your chosen lineage tool (including in-house lineage reporting). We believe that Matillion ETL should work seamlessly with best-of-breed partner applications, and data lineage is ... Read More
Amazon EC2 Auto Scaling instance lifecycle states are now available via the Instance Metadata Service
Feed: Recent Announcements. For example, you may want to run additional initialization steps on your instance when it is transitioning between lifecycle states, such as downloading and installing software after instance launch. To do this, you can have your application perform the necessary action when it sees the appropriate state in IMDS, and use Amazon EC2 Auto Scaling Lifecycle hooks to wait for your initialization steps to complete before moving to the next lifecycle state ... Read More
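The check described in this excerpt can be sketched as follows, assuming IMDSv2 (session-token) access and the `autoscaling/target-lifecycle-state` metadata path. The function names and the injectable `fetch` parameter are hypothetical choices for the sketch, made so that the logic can be exercised off-instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Build the IMDSv2 request that obtains a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request for one metadata path, authenticated by the token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def http_fetch(request: urllib.request.Request, timeout: float = 2.0) -> str:
    """Send a request and return the body as text (only works on an instance)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def read_lifecycle_state(fetch=http_fetch) -> str:
    """Return the Auto Scaling target lifecycle state reported by IMDS."""
    token = fetch(token_request())
    return fetch(metadata_request("autoscaling/target-lifecycle-state", token)).strip()
```

An application might call `read_lifecycle_state()` in a loop and run its initialization steps once the value it is waiting for (for example `InService`) appears. The polling interval and the action taken are left to the application.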
Instance Tags now available on the Amazon EC2 Instance Metadata Service
Feed: Recent Announcements. You can now access your instance's tags from the EC2 Instance Metadata Service. Tags enable you to categorize your AWS resources in different ways, for example, by purpose, owner, or environment. This is useful when you have many resources of the same type: you can quickly identify a specific resource based on the tags that you've assigned to it. Previously, you could access your instance tags from the console or by using the describe-tags API. Now, by accessing tags from your instance metadata, you no longer need to use the DescribeInstances or DescribeTags API calls to retrieve tag ... Read More
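A minimal sketch of reading tags this way, assuming tag access is enabled in the instance's metadata options and that `tags/instance` lists one tag key per line, with each value available under `tags/instance/<key>`. The `get` callable is a hypothetical stand-in for whatever function fetches a path under `/latest/meta-data/`; here it is backed by canned responses so the example runs anywhere.

```python
from typing import Callable, Dict

TAGS_PATH = "tags/instance"


def read_instance_tags(get: Callable[[str], str]) -> Dict[str, str]:
    """Collect all instance tags exposed through instance metadata.

    `get` takes a path relative to /latest/meta-data/ and returns the body text.
    """
    keys = [line for line in get(TAGS_PATH).splitlines() if line]
    return {key: get(f"{TAGS_PATH}/{key}") for key in keys}


# Canned responses standing in for the metadata service (made-up sample tags).
canned = {
    "tags/instance": "Name\nenv\n",
    "tags/instance/Name": "web-1",
    "tags/instance/env": "prod",
}
tags = read_instance_tags(canned.__getitem__)
```

On a real instance, `get` would issue the HTTP request to the metadata endpoint, so no AWS API credentials or Describe calls are involved in the lookup.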
Amazon Location Service adds metadata to help customers reduce costs
Feed: Recent Announcements. Today, Amazon Location Service added metadata for tracking position updates to help developers reduce cost, improve accuracy, and simplify the development of tracking applications. Amazon Location Service Trackers already make it easy for developers to build highly scalable device-tracking applications by enabling them to retrieve the current and historical location of their tracked devices, and automatically evaluate device positions relative to linked areas of interest (geofences). With the new metadata feature, developers can enrich these applications with additional information about each device’s position, for example the speed, direction, or engine temperature of vehicles, by including three user-defined key-value pairs ... Read More
Amazon SageMaker Model Registry now supports endpoint visibility, custom metadata and model metrics
Feed: Recent Announcements. SageMaker Model Registry, a purpose-built service that enables customers to catalog their ML models, now provides endpoint visibility from the Studio UI, the ability to store custom metadata, and the ability to view and store a broad array of metrics for a given model. SageMaker Model Registry catalogs customers’ models in a logical group (a.k.a. model group) and stores incremental versions of models as model package versions. Now, customers can associate custom metadata and custom metrics with a model package version. They can also store a broad array of metrics and baselines on a model package version, such as data quality, model quality, model bias ... Read More
Swiftly Search Metadata with an Amazon S3 Serverless Architecture

Feed: AWS Architecture Blog. As you increase the number of objects in Amazon Simple Storage Service (Amazon S3), you’ll need the ability to search through them and quickly find the information you need. In this blog post, we offer you a cost-effective solution that uses a serverless architecture to search through your metadata. Using a serverless architecture helps you reduce operational costs because you only pay for what you use. Our solution is built with Amazon S3 event notifications, AWS Lambda, AWS Glue Catalog, and Amazon Athena. These services allow you to search thousands of objects in an S3 bucket ... Read More
Apache Ozone Metadata Explained
Feed: Cloudera Blog. Author: Xiaoyu Yao. June 02, 2021. Apache Ozone is a distributed object store built on top of Hadoop Distributed Data Store service. It can manage billions of small and large files that are difficult to handle by other distributed file systems. As an important part of achieving better scalability, Ozone separates the metadata management among different services: Ozone Manager (OM) service manages the metadata of the namespace such as volume, bucket and keys. Storage Container Manager (SCM) service manages the metadata of the cluster nodes and all the containers and pipelines. Datanode service ... Read More
Amazon EMR now supports Apache Spark SQL to insert data into and update Apache Hive metadata tables when Apache Ranger integration is enabled
Feed: Recent Announcements. This January, we launched Amazon EMR integration with Apache Ranger, a feature that allows you to define and enforce database, table, and column-level permissions when Apache Spark users access data in Amazon S3 through the Hive Metastore. Previously, with Apache Ranger enabled, you could only read data using Spark SQL statements such as SHOW DATABASES and DESCRIBE TABLE. Now, you can also insert data into, or update, the Apache Hive metadata tables with these statements: INSERT INTO, INSERT OVERWRITE, and ALTER TABLE. This feature is enabled on Amazon EMR 6.4 in the following ... Read More