Microsoft Fabric: CI/CD considerations

This blog is the third post in a series on Microsoft Fabric, a relatively new Software-as-a-Service solution for data platforms and data warehouses. In the previous blog posts, we explained the use of Link to Microsoft Fabric to seamlessly integrate Microsoft Dataverse into Fabric, and we discussed security options in Security Considerations when using Microsoft Fabric.

In this blog post we will look at how you can set up Continuous Integration / Continuous Deployment (CI/CD) when using Microsoft Fabric, what the different CI/CD options are, including their pros and cons, and the specific things you need to be aware of before making a final decision on your CI/CD setup and the Fabric components you will use. In the next blog post we will talk more about how to set up Fabric Workspaces for your solution and what you need to consider there.

When we first started working with Microsoft Fabric (January 2024), the options for working with DTAP (Development, Test, Acceptance, Production) environments were limited; lately, however, Microsoft has improved support considerably. There are several topics that need to be addressed in order to have a successful deployment strategy, which we will discuss:

  1. Git integration, APIs and Fabric Deployment Pipelines.
  2. Different CI/CD scenarios.
  3. Supported Fabric components for CI/CD.

Keep in mind that Microsoft is still improving and changing Microsoft Fabric, so some of the drawbacks mentioned here may already be resolved by the time you read this blog.

Git integration and Fabric Deployment Pipelines

The use of a DTAP environment is a must for any (enterprise) data or software environment, and Microsoft Fabric is no exception. For almost six months now, Microsoft has offered options in Fabric to set this up, some of which are still in preview. There are a couple of services and APIs you can use to develop your own setup, which are explained next:

Git integration

It is possible to set up Git integration from within the Fabric portal, but what does that actually mean? Microsoft Fabric uses the concept of Workspaces: resources are part of a Workspace, and a developer team works in one or multiple Workspaces. It is easy to share data between Workspaces by directly connecting to data in another Workspace (Shortcuts). Git integration uses the concept of one Workspace per environment, which implies that your DTAP environments will be mapped over multiple Workspaces, with Git integration enabled on at least one Workspace. Instructions on how to set up Git integration can be found here. The most important part is that you can select which branch to use and, optionally, a Git folder (in case you have, for example, multiple Workspaces per environment).
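For those who prefer scripting over the portal, the same connection can also be made through the Fabric Git APIs. Below is a hedged sketch for an Azure DevOps repository, assuming an identity with the required permissions; all names and IDs are placeholders.

```python
# Minimal sketch: connect a Workspace to a Git branch via the Fabric Git APIs.
# All names/IDs are placeholders; token acquisition is assumed to be handled elsewhere.
import requests

base = "https://api.fabric.microsoft.com/v1"
headers = {"Authorization": "Bearer <token>"}
workspace_id = "<dev-workspace-id>"

body = {
    "gitProviderDetails": {
        "gitProviderType": "AzureDevOps",
        "organizationName": "<org>",
        "projectName": "<project>",
        "repositoryName": "<repo>",
        "branchName": "develop",
        "directoryName": "/dwh-workspace",  # Git folder, e.g. one folder per Workspace
    }
}
resp = requests.post(f"{base}/workspaces/{workspace_id}/git/connect",
                     headers=headers, json=body)
resp.raise_for_status()
```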


You have full control over syncing (committing) new and updated resource items from the Workspace to Git, and the current status of each resource item is shown. Note that it is only possible to see the actual differences after you have committed a change to Git.


Other options are branching out to a new Workspace or checking out a new Git branch.

Microsoft Fabric REST APIs

Microsoft Fabric REST APIs are supported from Azure DevOps and GitHub, and together with the Git integration option they enable automatic deployments of both infrastructure (since Fabric is SaaS, think of deploying Workspaces, Workspace roles and permissions) and Fabric resource items with the use of the Fabric Item APIs. In addition, since OneLake supports the same APIs as ADLS Gen2, it is also possible to upload content like files and folders to Lakehouses. Lakehouse and Warehouse table contents can be updated with general SQL statements, which requires Fabric Notebooks, Spark jobs or T-SQL in the Warehouse.
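To give an impression, here is a minimal sketch of calling these APIs from a deployment script: create a Workspace and list its resource items. It assumes a service principal that is allowed to call the Fabric APIs; all names and IDs are placeholders.

```python
# Minimal sketch: call the Fabric REST APIs from a deployment script.
# Assumes a service principal with the required Fabric API permissions.
import requests
from azure.identity import ClientSecretCredential  # pip install azure-identity

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-secret>",
)
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {"Authorization": f"Bearer {token}"}
base = "https://api.fabric.microsoft.com/v1"

# Create a Workspace (infrastructure deployment, since Fabric is SaaS).
ws = requests.post(f"{base}/workspaces",
                   headers=headers,
                   json={"displayName": "dwh-test"}).json()

# List the resource items in the new Workspace.
items = requests.get(f"{base}/workspaces/{ws['id']}/items", headers=headers).json()
for item in items.get("value", []):
    print(item["type"], item["displayName"])
```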

Fabric Deployment Pipelines

Deployment Pipelines already existed in Power BI, but they have been extended to also include Fabric-specific resources and have had a new GUI for some time now (still in preview). Deployment Pipelines are part of the Software-as-a-Service (SaaS) experience, which means they are integrated into the Fabric portal and only require configuration work. A Fabric Deployment Pipeline always deploys a single 'stage' (a Workspace) to the next stage.


The following main features are currently present:

  • Compare and deploy: You can compare each resource item in your Workspace and deploy it individually (or all at once) to the next stage.
  • Deployment history: There is an overview of previous deployment attempts (and their results).
  • Deployment rules: You can define which connections and parameters need to be changed during deployment.
  • Stage definitions: You can define your own stages for your DTAP.

Keep in mind that Deployment Pipelines are still in preview and there are some things you can't do (yet). The most important ones we found are the following:

  • You cannot perform a rollback at the moment; you would either have to change the code in the original Workspace or, when connected with Git, work with branches to perform a rollback. In Azure DevOps Pipelines rollbacks are possible (with some coding); see the sketch after this list.
  • Some components still lack support for deployment rules; specifically, connections to data sources/destinations outside of Microsoft Fabric cannot be changed dynamically. This is on the Microsoft Fabric roadmap for 2024 Q4, so hopefully deployment rules will support this accordingly.
  • If your DTAP setup has multiple Workspaces per environment, you also need multiple Deployment Pipelines to support deployment. This is a minor point, but something to keep in mind. It is not possible to trigger a Deployment Pipeline from another Deployment Pipeline; orchestration needs to be done externally (e.g., Azure DevOps Pipelines).
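As an aside, here is a rough sketch of what a Git-based rollback could look like when scripted: revert the offending commit in the connected branch first (a plain `git revert`), then pull the branch state back into the Workspace. The endpoint and field names follow the public Fabric Git APIs as we understand them; the IDs and token are placeholders.

```python
# Hedged sketch of a Git-based rollback for a Workspace connected to Git:
# after `git revert <bad-commit>` has been pushed to the connected branch,
# update the Workspace from Git so the branch (not the Workspace) wins.
import requests

base = "https://api.fabric.microsoft.com/v1"
headers = {"Authorization": "Bearer <token>"}
workspace_id = "<workspace-id>"

# 1. Read the current Git sync state of the Workspace.
status = requests.get(f"{base}/workspaces/{workspace_id}/git/status",
                      headers=headers).json()

# 2. Pull the (reverted) branch state back into the Workspace.
body = {
    "remoteCommitHash": status["remoteCommitHash"],
    "workspaceHead": status["workspaceHead"],
    "conflictResolution": {
        "conflictResolutionType": "Workspace",
        "conflictResolutionPolicy": "PreferRemote",  # Git wins over the Workspace
    },
    "options": {"allowOverrideItems": True},
}
requests.post(f"{base}/workspaces/{workspace_id}/git/updateFromGit",
              headers=headers, json=body)
```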

Different CI/CD scenarios

With all the options mentioned above there are many different scenarios you could create, but for this blog post we will present three of them and explain the pros and cons of each. It is ultimately up to you to decide how to set things up based on your own requirements. As a side note, we used Azure DevOps in the examples, but you could also use GitHub instead; we have not tested that yet, since the GitHub integration is relatively new.

Also, Microsoft recently published some options for CI/CD in Fabric (18 September 2024) that you can use as input. Most of those options are similar to the ones in this blog post, but here we also mention specific functionalities that don't work yet and other specifics you need to be aware of.

In the following scenarios we will use as an example a simple Workspace containing the following elements:

  • A Lakehouse for storage of several managed tables.
  • A Data Pipeline retrieving data from some external Data source outside of Microsoft Fabric and updating the managed tables in the Lakehouse.
  • A custom semantic model with a report referencing the Lakehouse (DirectLake connection).
 

We will have the following requirements to test the options:

  • There should be a Development, Test and Production environment.
  • The Data Pipeline connection to the external source needs to be dynamically changed.
  • The connections to the Lakehouse need to be dynamically changed.
  • All schemas in the Lakehouse need to be updated automatically.

Next, let's create three different scenarios and see if we can fulfil those requirements.

Scenario 1: Git integration on Dev Workspace, Fabric Pipelines for orchestration


In this scenario, Git integration is only used for syncing with the Development Workspace, and the Dev Workspace is the source of truth. Keep in mind that the Git branch is not the source of truth when using a Fabric Deployment Pipeline; these deployments have a Workspace as their source.

The benefit of this scenario is that it is quick to set up and requires no code at all. The downside is that deployment rules are currently limited for some components; in the case of the Data Pipeline connection, you need to make a manual change after every deployment. Depending on your Data Pipeline, the same goes for Lakehouse managed tables: they will not be updated after deployment. It is true that a Data Pipeline can also update a Lakehouse managed table, but only while the Data Pipeline is running. Another solution is to have a Fabric Notebook that runs as part of your ETL and updates any table upfront where needed, as sketched below.
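Here is a minimal sketch of such a Notebook (PySpark in Fabric) that ensures a managed table exists with the expected schema before the rest of the ETL runs; the table and column names are illustrative, not part of the example Workspace above.

```python
# Minimal notebook sketch (PySpark in Fabric): make sure a managed Delta table
# exists with the expected schema before the ETL touches it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided by the Fabric runtime

# Create the managed table if this Workspace was freshly deployed to.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_orders (
        order_id   BIGINT,
        customer   STRING,
        amount     DECIMAL(18, 2),
        order_date DATE
    ) USING DELTA
""")

# Apply additive schema changes so existing data is preserved.
existing_columns = {f.name for f in spark.table("sales_orders").schema.fields}
if "channel" not in existing_columns:
    spark.sql("ALTER TABLE sales_orders ADD COLUMNS (channel STRING)")
```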

Scenario 2: Git integration on all Workspaces, Azure DevOps Pipelines for orchestration


In this scenario every Workspace is connected to a Git branch, and updates are performed with the Fabric Git APIs. Orchestration is done with Azure DevOps Pipelines, which compared to Fabric Deployment Pipelines are far more mature, with features like branch policies, branch security, reviewers/approvals and so on.

Since connections and schema updates are not supported with Git updates, these need to be handled in a post-deployment step in Azure DevOps Pipelines. For example, updating the semantic model connection now also needs to be implemented with REST APIs. Updating Lakehouse tables is only possible by executing a Fabric Notebook/Spark job or during a Data Pipeline run; you can find some examples here if you want to make use of Spark. A sketch of such a post-deployment step follows below.
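For illustration, here is a minimal sketch of a post-deployment step that triggers a schema-update Notebook (like the one in scenario 1) in the target Workspace through the on-demand job API. All IDs are placeholders you would pass in from your pipeline.

```python
# Hedged sketch of a post-deployment step (e.g., in an Azure DevOps Pipeline):
# run a Notebook in the target Workspace via the Fabric on-demand job API.
import requests

base = "https://api.fabric.microsoft.com/v1"
headers = {"Authorization": "Bearer <token>"}
workspace_id = "<test-workspace-id>"
notebook_id = "<schema-update-notebook-id>"

resp = requests.post(
    f"{base}/workspaces/{workspace_id}/items/{notebook_id}"
    f"/jobs/instances?jobType=RunNotebook",
    headers=headers,
)
# 202 Accepted: poll the Location header to follow the job status.
print(resp.status_code, resp.headers.get("Location"))
```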

When you consider this scenario, also make sure to protect your test and main branches so they do not get accidentally overwritten by a user action in the Fabric Deployment Pipeline. You can use branch policies to accomplish this.

Scenario 3: Git integration on Dev Workspace, DevOps Pipelines for orchestration  


Just like in scenario 1, Git integration is only used for syncing with the Development Workspace; however, in this scenario we make Git the source of truth. You can also combine this with trunk-based or git-flow development if needed. Instead of using Fabric Deployment Pipelines or Git integration deployments, we use Azure DevOps Pipelines, build artifacts and the Fabric REST API (Item) for orchestration and deployments.

The benefit of this scenario over the others is that you can dynamically change anything in the code while building your artifact, thus supporting dynamic connections; see the sketch below. Just like scenario 2, this requires more implementation work.
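As an impression, here is a hedged sketch of the deployment step: read an item definition from the build artifact, substitute environment-specific values, and push it with the Fabric Items API (updateDefinition). The artifact layout, the placeholder token and all IDs are illustrative assumptions, not fixed conventions.

```python
# Hedged sketch of scenario 3's deployment step: patch an item definition from
# the build artifact and push it with the Fabric Items API (updateDefinition).
import base64
import requests

base = "https://api.fabric.microsoft.com/v1"
headers = {"Authorization": "Bearer <token>"}
workspace_id = "<prod-workspace-id>"
item_id = "<data-pipeline-item-id>"

# Read the Data Pipeline definition from the build artifact and replace an
# illustrative placeholder token with the production connection.
with open("artifact/IngestOrders.DataPipeline/pipeline-content.json") as f:
    content = f.read().replace("#{EXTERNAL_CONNECTION_ID}#", "<prod-connection-id>")

body = {
    "definition": {
        "parts": [{
            "path": "pipeline-content.json",
            "payload": base64.b64encode(content.encode()).decode(),
            "payloadType": "InlineBase64",
        }]
    }
}
requests.post(
    f"{base}/workspaces/{workspace_id}/items/{item_id}/updateDefinition",
    headers=headers, json=body)
```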

Summary

Depending on your requirements you could opt for one of the above scenarios or build a hybrid solution. If you want to keep it simple and low-code, and you can wait for dynamic connection support until 2024 Q4, scenario 1 could be the right one for you. If you can handle the additional effort of developing your own DevOps pipelines, the other two scenarios are definitely interesting enough to investigate, especially if you have already invested heavily in CI/CD with Azure DevOps Pipelines.

Supported Fabric components for CI/CD

At the time of writing this blog, Dataflows Gen2 is not supported for either Git integration or Fabric Deployment Pipelines. Spark Job Definitions are supported in Git integration, but not yet in Fabric Deployment Pipelines. The following list contains the components we verified to have support for both; please also check the Fabric documentation:

  • Data Pipeline: supported; however, external data connections don't have deployment rules support yet (expected Q4 2024).
  • Lakehouse: supported. Keep in mind that folder structures and managed tables (schemas) will not be part of deployments; you need to implement that yourself if required.
  • Warehouse: supported, including schemas, stored procedures and functions.
  • Notebook: supported. Keep in mind that only the default Lakehouse can be changed with deployment rules.
  • Semantic model: supported (Direct Lake), as long as it is not a push semantic model.
  • Report: supported (if the semantic model is also supported).

Conclusion

This guide provided insights into the various technical options in Microsoft Fabric for source control and deployment of your Fabric resources, including their drawbacks and important things to know before using them. We hope this will help you out in your projects!
