Category: Polybase
Microsoft Ignite Announcements Nov 2021
Feed: James Serra's Blog. Author: James Serra. Microsoft Ignite has always announced many new products and new product features, and this year was no exception. There were many exciting announcements, and below I list the major data platform and AI-related ones: SQL Server 2022: A new version of SQL Server has arrived! It is now in private preview. Some of the top new features are: Integration with Azure SQL Database Managed Instance, the Microsoft-managed, cloud-based deployment of the SQL Server box product. This integration supports migrations to Managed Instance through the use of Distributed Availability Group (DAG), which will enable near-zero-downtime database migrations ... Read More
Podcast and presentation decks on data architectures
Feed: James Serra's Blog. Author: James Serra. Tomorrow (Tuesday, 8/10/21) I will be on a podcast for SaxonGlobal called “The Alphabet Soup of Data Architectures” where I will talk about the modern data warehouse, data fabric, data lakehouse, data mesh, and more. I hope you can check it out live here (I’ll post a link here when the recording is available). I have also created two more presentation decks: Data Lakehouse, Data Mesh, and Data Fabric So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to ... Read More
External tables vs T-SQL views on files in a data lake
Feed: James Serra's Blog. Author: James Serra. A question that I have been hearing recently from customers using Azure Synapse Analytics (the public preview version) is: what is the difference between using an external table and a T-SQL view on a file in a data lake? Note that a T-SQL view and an external table pointing to a file in a data lake can be created in both a SQL Provisioned pool and a SQL On-demand pool. Here are the differences that I have found: Overall summary: views are generally faster and have more features such as OPENROWSET Virtual ... Read More
Query options in Azure Synapse Analytics
Feed: James Serra's Blog. Author: James Serra. The public preview version of Azure Synapse Analytics has three compute options and four types of storage that it can access (mentioned in my blog at SQL on-demand in Azure Synapse Analytics). This gives twelve possible combinations of querying data. Not all of these combinations are currently supported, and some have a few quirks, which I list below. (NOTE: I’ll demo these features at my sessions at European Digital Week on 9/25 (session info), SQL Bits on 10/3 (session info), PASS Summit on 11/10 (session info), and Big Data Conference Europe on ... Read More
Ways to access data in ADLS Gen2
Feed: James Serra's Blog. Author: James Serra. With data lakes becoming popular, and Azure Data Lake Store (ADLS) Gen2 being used for many of them, a common question I am asked is “How can I access data in ADLS Gen2 instead of a copy of the data in another product (e.g. Azure SQL Data Warehouse)?”. The benefits of accessing ADLS Gen2 directly are less ETL, less cost, to see if the data in the data lake has value before making it part of ETL, for a one-time report, for a data scientist who wants to use the data to ... Read More
Big Data Workshop
Feed: James Serra's Blog. Author: James Serra. A challenge I have with customers who want to get hands-on experience with the Azure products that are found in a modern data warehouse architecture is finding a workshop that covers many of those products. To the rescue is a workshop created by my Microsoft colleagues Fabio Braga and Rod Colledge, explained in their blog post Azure Data Platform End2End with the GitHub located here. This is an on-demand workshop with labs that you can run at any time. The idea of this workshop is to give experienced BI professionals (but new to ... Read More
Where should I clean my data?
Feed: James Serra's Blog. Author: James Serra. As a follow-up to my blogs What product to use to transform my data? and Should I load structured data into my data lake?, I wanted to talk about where you should clean your data when building a modern data warehouse in Azure. As an example, let’s say I have an on-prem SQL Server database and I want to copy one million rows from a few tables to a data lake (ADLS Gen2) and then to Azure SQL DW, where the data will be used to generate Power BI reports (for background on a ... Read More
What product to use to transform my data?
Feed: James Serra's Blog. Author: James Serra. If you are building a big data solution in the cloud, you will likely be landing most of the source data into a data lake. And much of this data will need to be transformed (i.e. cleaned and joined together – the “T” in ETL). Since the data lake is just storage (i.e. Azure Data Lake Storage Gen2 or Azure Blob Storage), you need to pick a product that will be the compute and will do the transformation of the data. There is good news and bad news when it comes to which ... Read More
Should I load structured data into my data lake?
Feed: James Serra's Blog. Author: James Serra. With data lakes becoming very popular, a common question I have been hearing from customers is, “Should I load structured/relational data into my data lake?”. I talked about this a while back in my blog post What is a data lake? and will expand on it in this blog. Melissa Coates also talked about this recently, and I used her graphic below to illustrate: I would not say it’s commonplace to load structured data into the data lake, but I do see it frequently. In most cases it is not necessary to first ... Read More
SQL Server 2019 Big Data Clusters
Feed: James Serra's Blog. Author: James Serra. At the Microsoft Ignite conference, Microsoft announced that SQL Server 2019 is now in preview and that SQL Server 2019 will include Apache Spark and Hadoop Distributed File System (HDFS) for scalable compute and storage. This new architecture, which combines the SQL Server database engine, Spark, and HDFS into a unified data platform, is called a “big data cluster” and is deployed as containers on Kubernetes. Big data clusters can be deployed in any cloud where there is a managed Kubernetes service, such as Azure Kubernetes Service (AKS), or in on-premises Kubernetes clusters, such as AKS on ... Read More