Anonymizing Dataverse Data using Power Automate Flows
Leveraging Power Automate flows for Bulk Updating Data in Dataverse
Introduction
How do we anonymize data in Dataverse? How do we bulk update, say, 100,000+ records in Dataverse?
I list some of the most commonly used methods.
XRM Toolbox
Write a standard workflow (classic workflows) and run it against a View using Bulk Workflow Execution add-in via XRM Toolbox.
Use the Bulk Data Updater add-in via XRM Toolbox.
I have used both XRM add-ins and they are both excellent and I cannot recommend them enough.
Are there any other low code options that allow us to run them in the background sans licensing requirements?
Let us write a requirement and build in public, build together and learn something along the way.
Low Code Engineering
This assumes that you are appropriately licensed for running Power Automate Flows.
Requirement
We constantly refresh our sandbox environments with a copy of production environment and most of these refreshes opt for Everything option instead of Customization and Schema option.
This copies the Contact table including email address, phone number, mobile number and address fields.
For our requirement, let us keep the requirement very simple.
All PII (Personal Identifiable Information) should be anonymized.
We are happy with the same value being used for email addresses for all contacts.
We want all Contact records to be anonymized, irrespective of the status.
We want a low code solution that easy to understand, easy to maintain, easy to extend and it should be very very FAST.
Build
Standard Workflows
Previously, we created an on-demand workflow called Anonymize Contacts which was an on-demand workflow where we set the various PII fields as below.
Email Address 1(schema name: emailaddress1) is set to text value of dummy@uat-client.com
Email Address 2(schema name: emailaddress2) is set to text value of dummy@uat-client.com
Mobile Phone (schema name: mobilephone) is set to 123456789
Address 1: Street 1 (schema name: address1_line1) is set to the text value of address1-street1
etc.,
We then launched XRM Toolbox, connected to the environment and ran the Bulk Workflow Execution add-in wherein we selected the Active Contact view and selected the standard workflow that we built above.
This worked when we had a smaller count of Contacts, but with larger count of Contacts (running into several hundred thousands and a requirement for a quick turn around!), it was time for something modern.
Power Automate Flows
The modern version of the above solution is to use Power Automate Flows. Let us build that Power Automate Flow and let us build it together.
The following assumes that you are familiar with Dataverse and you are appropriately licensed to build and run Power Automate flows.
Login as a user with appropriate privileges to Power Apps Maker (https://make.powerapps.com)
Make sure that you pick the correct Environment in the top right corner.
Click on Solutions in the left navigation
Open your development (preferred solution) and click New > Automation > Cloud Flow > Instant
Why Instant Flow?
It is an instant flow because you are anonymizing data and it needs to be an explicit user or human initiated action rather than anything automated or scheduled.
The first step is Manually trigger a flow. To this trigger, add an input parameter of type Text called Email Address Domain. Add another input parameter of type Yes/No called Developer Testing. We will be using these parameters later on.
Add an Action to Initialize a Variable. The name of the variable is v_s_EmailAddressDomain, the type of the variable is String and the initial value of the variable is -client.com
Why do we need this variable? This is to set the domain on the email address. For instance, if your company’s email domain is client.com, i.e., your email address is Joe@client.com, then for the UAT environment we want the email address as dummy@uat-client.com
Add an Action to Initialize a Variable. The name of the variable is v_s_FilterRows, the type of variable is String and the initial value of variable is an expression.
concat('(statecode eq 0 and not contains(emailaddress1,', '''', triggerBody()['text'],'''','))')
What are we doing in this expression? We want to be able to filter for ALL records that are Active and haven’t been anonymized.
We then add a Condition to check if the Input Parameter Developer Testing is set to true or false.
Now, depending upon how the expression is evaluated we will either update 1 row or we will update every single row that meets the filter criteria. The steps are exactly the same with 3 major changes and we will discuss them in detail.
We add an action to List Rows to the branching where the expression evaluates to true (i.e., where Developer Testing is set to true) where
Table is Contact
Select Columns is contactid
Filter Rows is set to the variable v_s_FilterRows
Row Count is set to 1
Why Row Count of 1? Well, we want to be able to test if our Flow works as expected for 1 record before executing the Flow for ALL the records meeting the Filter Rows criteria in the system.
We then add an Action of Apply to each where we update the Contact and set the fields to the values as per our requirement.
What is the expression for the Email Address 1 field? The expression is
concat('dummy@', triggerBody()['text'], variables('v_s_EmailAddressDomain'))
which will result in dummy@uat-client.comWe then take the steps of 10 and 11 using Copy to Clipboard and paste it to the other branch (i.e., where Developer Testing is set to false). However, in this branch, we will be making a few critical changes.
Row Count for the List rows action will not be set to 1 as we want to get all the rows meeting the filter criteria and update them.
For the List rows action, within Settings, Pagination will be switched ON and Threshold will be set to 100,000
For the Apply to each control, within Settings, Concurrency Control will be switched ON and Degree of Parallelism will be set to 50.
Your flow is now complete.
Run
Developer Testing
When you run the Flow, you will need to provide to the two Input Parameters. So, when I run it in the development environment, I will provide the value of dev
for the Input Parameter Email Address Domain so that the Contact record is updated with an email address of dummy@dev-client.com. I will also set the Input Parameter Developer Testing as true
so that one and only row is updated.
Run in Non-Development Sandbox Environment
After we package the Flow in a Solution and deploy it to the target sandbox environment, we can run the Flow but this time, we will set the Input Parameter Developer Testing to true
and the Input Parameter Email Address Domain will be set to uat
so that the Contact record is updated with an email address of dummy@uat-client.com for one record.
After we are happy, we will re-run the Flow where will set the Input Parameter Email Address Domain will be set to uat
but leave the Input Parameter Developer Testing as false
If you have over 100,000 contacts, you can re-run the Flow a few more times to anonymize all of your data.
Summary
This low code engineered solution allows you to anonymize over 300,000+ Contacts in less than a working day (approx. 6 hours and 17 minutes). It is easy to build and easy to maintain.