Entra ID Service Principals: Audit expiring secrets and certificates

Building a simple, in-house, and cheap solution to find Service Principals with expiring authentication methods and orphaned Service Principals with no assigned owner.
1 Sep, 2024

Introduction

In this blog post, I will address a common but often overlooked issue in many organizations today: managing orphaned service principals (those without assigned owners) and handling situations where the secrets and certificates of a service principal are about to expire or have already expired.

Additionally, I will cover how to warn both the service principal’s owner (if one is assigned) and the relevant IT, governance, or IAM teams, who will receive categorized lists of these objects based on your choices.

This post marks the beginning of a new category of posts for me. The concept behind this new category is to create a functional solution that can be deployed in any Azure tenant by simply running the Terraform code. This process will automatically provision the necessary resources, providing a fully operational solution to a specific problem while maintaining a least-privilege focus.

One aspect you might consider customizing is the Microsoft-generated domain used for sending notifications. Each domain is randomly generated by Microsoft and is unique to your deployment.

Prerequisites

The prerequisites to enable this solution are fairly limited, as you primarily need permissions to deploy the solution and create a Service Principal for handling lookups in Entra ID.

The following are the requirements for implementing this solution.

User running Terraform (these permissions are only required during deployment or updates):

  • Create a new Service Principal
  • Create a new Resource Group
  • Set permissions on the resource group and the resources inside it
  • Add secrets to the Key Vault (e.g. Key Vault Secrets Officer)

Service Principal created using Terraform:

  • API permissions (someone needs to approve these AFTER Terraform has run)
    • Microsoft Graph
      • Application.Read.All (Microsoft docs)
        • Used to get the required information about when a secret or certificate is expiring, and who the owner of the Service Principal is
      • Organization.Read.All (Microsoft docs)
      • User.Read.All (Microsoft docs)
        • Required to look up information about the Service Principal owner

Terraform

The client executing the code must have Terraform installed. Terraform is free to use and can be installed by following the Install Terraform guide on the HashiCorp website.

Azure CLI

To use Terraform with Azure, you’ll need the Azure CLI. Please refer to Microsoft Learn for installation instructions.

Files

Download from GitHub

After the Terraform code is deployed, the permissions required by the deploying user are no longer needed and can be safely removed.

Solution

Let’s take a closer look at the solution.

Each step in the process will be described so that you understand exactly what you are deploying. As mentioned earlier, we are using Terraform to deploy the needed infrastructure. All variables are defined in the variables.tf file.

Confidential information will be securely stored in a Key Vault, while all other variables will be saved in the Automation Account’s variables section. You only have to focus on the variables.tf file; Terraform will save the variables in their respective locations.

Note: We will not use your own domain in the default configuration. Azure Communication Services will create a custom Microsoft domain. You can change this later using the GUI, or by customizing the Terraform file “communication_service.tf”.

The following is a visualization of what will be deployed.

Solution design

Step 1 – Automation account trigger

The Automation Account will be triggered by the method you choose—either a timer or a manual trigger. Once triggered, it will execute a PowerShell script that uses the managed identity to gain access to both the Key Vault and the Storage Account. The script will retrieve the secret for the Service Principal with Entra ID access from the Key Vault.
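As a rough sketch, the start of the runbook might look like this. It is a minimal illustration that assumes the Key Vault and secret names are read from Automation Account variables (the actual values come from variables.tf):

```powershell
# Authenticate as the Automation Account's system-assigned managed identity
Connect-AzAccount -Identity | Out-Null

# Read configuration stored by Terraform as Automation variables
$vaultName  = Get-AutomationVariable -Name 'key_vault_resource_name'
$secretName = Get-AutomationVariable -Name 'key_vault_secret_key_name'

# Retrieve the Service Principal secret used for the Entra ID lookups
$spSecret = Get-AzKeyVaultSecret -VaultName $vaultName -Name $secretName -AsPlainText
```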

Step 2 – Collect data and sort it

Once the secret is obtained, the script will connect to Microsoft Graph using the Service Principal. It will gather a list of all Service Principals, their owners, and any expiring secrets or certificates associated with them (a sketch follows the list below).

Based on the boolean settings defined in the variable section, the script will perform one or more of the following actions:

  • Notify Owners: The script will iterate through all Service Principals. If it finds any with expiring secrets or certificates and an assigned owner, it will store the owner’s information and the relevant secret details in an array. After processing all Service Principals, it will send an email to each owner’s registered email address. The timing of these emails is controlled by the email_inform_owners_days_with_warnings variable, which specifies the number of days before expiry when notifications are sent.
  • Expired Secrets and Certificates: The script will identify all Service Principals with already expired secrets or certificates. This list will be sent to the defined contact email.
  • About-to-Expire Secrets and Certificates: The script will compile a list of all Service Principals with secrets or certificates nearing expiration. This list will also be sent to the defined contact email.
  • Orphaned Service Principals: The script will identify all Service Principals without defined owners. This list will include a column indicating whether the specific Service Principal contains any secrets or certificates. The list will be sent to the defined contact email.
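A minimal sketch of the collection step, assuming the Microsoft.Graph PowerShell module is in use; the variable names are illustrative, not the exact ones from the script:

```powershell
# Connect to Microsoft Graph with the Service Principal's client secret
# ($appId and $tenantId are illustrative; they come from the solution's variables)
$secure = ConvertTo-SecureString $spSecret -AsPlainText -Force
$cred   = New-Object System.Management.Automation.PSCredential($appId, $secure)
Connect-MgGraph -TenantId $tenantId -ClientSecretCredential $cred -NoWelcome

$cutoff = (Get-Date).AddDays(30)   # e.g. the secret_cert_days_to_expire value

foreach ($sp in Get-MgServicePrincipal -All) {
    # Secrets live in PasswordCredentials, certificates in KeyCredentials
    $creds = @($sp.PasswordCredentials) + @($sp.KeyCredentials)
    $expiring = $creds | Where-Object { $_.EndDateTime -and $_.EndDateTime -lt $cutoff }
    if ($expiring) {
        $owners = Get-MgServicePrincipalOwner -ServicePrincipalId $sp.Id -All
        # ...store owner and credential details for the notification steps
    }
}
```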

Step 3 – Store the data we need for later

The lists created in the previous step will be saved as CSV files in the Storage Account. These files will be used in a later stage by the Logic App.
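As a rough illustration of the upload, assuming the managed identity has a data role on the Storage Account and a container named “reports” (the container name and variable names are assumptions):

```powershell
# Write one of the result lists to CSV and upload it to the Storage Account
$csvPath = Join-Path $env:TEMP 'orphaned-service-principals.csv'
$orphanedList | Export-Csv -Path $csvPath -NoTypeInformation

# -UseConnectedAccount reuses the managed identity signed in with Connect-AzAccount -Identity
$ctx = New-AzStorageContext -StorageAccountName $storageAccountName -UseConnectedAccount
Set-AzStorageBlobContent -File $csvPath -Container 'reports' `
    -Blob 'orphaned-service-principals.csv' -Context $ctx -Force
```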

Step 4 – Trigger the logic app

For each of the above tasks, we are going to trigger the workflow in the Logic App. The Logic App branches based on the “request type” in the object that is created by the PowerShell script.

Depending on which call we are sending, it will be handled differently in the Logic App. The managed identity of the Logic App requires permissions on the Storage Account to fetch the CSV files that we created in step 2.
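As a hedged sketch, the call from the runbook to the Logic App might look like this; the trigger URL and property names are illustrative, not the exact contract used by the solution:

```powershell
# Post one request per task to the Logic App's HTTP trigger
$payload = @{
    requestType = 'OrphanedServicePrincipals'   # the Logic App branches on this
    csvBlobName = 'orphaned-service-principals.csv'
    recipient   = $contactEmail
} | ConvertTo-Json

Invoke-RestMethod -Method Post -Uri $logicAppTriggerUrl -Body $payload -ContentType 'application/json'
```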

Step 5 – Sending the notification

The Logic App will forward the message to the Azure Communication Services environment with the e-mail addresses, CSV files (where applicable), and other information that we gathered using the PowerShell script.

Overview of the logic app

Variables

Time to take a look at the variables and what they do. Anything written in the variables.tf file will be stored as either a variable inside the Automation Account or as a secret in the Key Vault.

They are split into two categories based on whether they require a custom value.

You can set any tags you need in the locals file, or just leave it empty. The tags will be applied to all resources that support them.

Requires customisation

email_Contact_email_for_notification_emails

Description: This is the e-mail address that will be used to send a message about all the expiring secrets/certificates where an owner could not be found. Note: they will be sent as an attachment in CSV format.

Type: String

Requires an e-mail address in the correct format; there are no checks to verify whether it is correct.

subscription_id

Description: Provide the subscription ID for the deployment

Type: String

tenant_id

Description: Provide the tenant ID for the specific tenant. This is used during sign-in.

Type: String

baseline_resource_group_name

Description: Resource group where all resources are deployed

Type: String

key_vault_resource_name

Description: Provide a name for the Key Vault. The Key Vault will be used to store secrets that we don’t wish to store in clear text.

Type: String

automation_account_solution_name

Description: This is the name of the automation account. NOTE: This name has to be unique

Type: String

Communication_service_naming_convention

Description: This is a short prefix that will be placed in front of each of the Communication Services resources. The name is used on the resources, so you can align it with your naming convention.

Type: String

Service_Principal_name

Description: Name of the Service Principal used for connecting to Entra ID and collecting secret and certificate information.

Type: String

key_vault_secret_key_name

Description: Name of the secret used for the Service Principal that has access to read values in Entra ID. This name is also used on the SP to identify the key.

Type: String

Default values

email_Contact_email_get_list_of_orphaned_Service_Principals

Description: This will send an email to the governance team with a list of all SPs that do not have an owner assigned (default: true)

Type: Boolean

email_Contact_email_for_all_SPs_with_expired_secrets_status

Description: Enable this value to notify the governance or IT team about the status of all SPs with expired secrets or certificates (default: true)

Type: Boolean

email_Contact_email_for_all_SPs_where_secret_is_about_to_expire

Description: This will send an email to the governance team with a list of all SPs where a secret is about to expire (default: true)

Type: Boolean

email_inform_owners_directly

Description: This boolean defines whether or not owners will be contacted directly about expiring or expired secrets and certificates. All owners of the specific SP will be contacted, on the days specified in the ’email_inform_owners_days_with_warnings’ variable (default: true)

Type: Boolean

email_inform_owners_days_with_warnings

Description: Define, as a string, on which days the owner of an SP should receive the notification. E.g. 0,1,2 means they will receive the email on the day it expires, 1 day before, and 2 days before, and so on (default: 1,2,3,4,5,6,7,14,21,28,30). A sketch of how this could be evaluated follows below.

Type: String

Default: 1,2,3,4,5,6,7,14,21,28,30
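To make the semantics concrete, here is a sketch of how such a value could be evaluated in the runbook (the variable names are illustrative):

```powershell
# Parse the comma-separated day list, e.g. "1,2,3,4,5,6,7,14,21,28,30"
$warnDays = $daysWithWarnings -split ',' | ForEach-Object { [int]$_.Trim() }

# Whole days until the credential expires (negative once it has expired)
$daysLeft = ([datetime]$credential.EndDateTime - (Get-Date)).Days

# Notify the owner only on the configured days
if ($warnDays -contains $daysLeft) {
    # ...queue a notification for this owner
}
```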

secret_cert_days_to_expire

Description: Used in the PowerShell script: the value defines when a secret will be reported as expiring (default: 30 days)

Type: String

Default: 30 days

logic_app_communication_service_primary_connection_string

Description: Name of the secret used for the Communication Service connection string. The name is irrelevant, but for good measure you can still choose one if you wish.

Type: String

location

Description: Define the Azure region (datacenter) where the solution should be deployed

Type: String

Default: sweden central

Deployment

Populate the required variables

Let’s get it deployed. By now you should have downloaded the files from GitHub, and depending on your preferred editor it should look something like this. The only file that you need to open is variables.tf.

Ensure that you have populated the fields that are in the “Requires customisation” above and press save.

Configure the variables.tf with your preferred editor

Create resources by running Terraform

I’m assuming that Terraform is installed and correctly added to the PATH environment variable (if you are using Windows). Be sure that you cd into the terraform folder.

Run terraform init to download the required Terraform providers and modules, then run terraform plan to check that everything looks correct.
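In a PowerShell session, the full sequence looks like this (the folder name depends on where you extracted the download):

```powershell
# Move into the folder that contains the .tf files
Set-Location .\terraform

terraform init    # downloads the required providers and modules
terraform plan    # shows what will be created, without changing anything
terraform apply   # creates the resources after you confirm with 'yes'
```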

terraform init downloads the terraform modules that are required.

At this step you will be notified if you do not follow the naming conventions for each resource. Common mistakes include using underscores in resource names where they are not supported. Please note that this process does not check for the uniqueness of names where it is required. This validation will only occur when you run terraform apply.

It should create 46 new resources in version 1.

Confirm the deployment with terraform apply

Now, this will run for a few minutes. It should finish with a message that everything has been created. If not, it is usually because a resource name did not follow the naming convention or was not unique.

Should show all resources as created

Checking the resource group should now show that all resources have been created.

All created resources should now be visible

If you do not expect to maintain this using Terraform, you can delete the state file and Terraform files from your local drive now. The rest can be handled from the portal.

Provide permissions to service principal

The next step is to grant API permissions to the Entra ID service principal. Usually, this step is handled by a specific team, so I have not automated it. If you have the necessary permissions, you can manually grant the required permissions to the Service Principal in Entra ID.
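If you prefer scripting this over clicking through the portal, a hedged sketch using the Microsoft Graph PowerShell module could look like the following. It looks up the app role IDs dynamically so no GUIDs need to be hardcoded; $spObjectId (the object ID of the Service Principal created by Terraform) is an assumption you need to fill in:

```powershell
# The Microsoft Graph service principal in your tenant (well-known appId)
$graphSp = Get-MgServicePrincipal -Filter "appId eq '00000003-0000-0000-c000-000000000000'"

foreach ($perm in 'Application.Read.All', 'Organization.Read.All', 'User.Read.All') {
    $appRole = $graphSp.AppRoles | Where-Object { $_.Value -eq $perm }
    # Grant the application permission (admin consent) to the solution's SP
    New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $spObjectId `
        -PrincipalId $spObjectId -ResourceId $graphSp.Id -AppRoleId $appRole.Id
}
```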

Remember to grant the permissions for the Service Principals

Update modules on automation account (Requires update 1.1.0)

Because Azure Automation accounts have some default modules installed, and since we can’t control these with Terraform unless we import them, the update process is done with PowerShell.

Run the runbook named “update-az-modules”.

This lets PowerShell update all the default modules to the versions stated for the current Az module version.
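You can start it from the portal, or from PowerShell; the resource names below are whatever you chose in variables.tf:

```powershell
Start-AzAutomationRunbook -Name 'update-az-modules' `
    -AutomationAccountName $automationAccountName `
    -ResourceGroupName $resourceGroupName
```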

You can upgrade to higher versions if you prefer. The versions are pinned as they are to ensure that the modules have been tested with this specific use case.

If a module fails, you can run the runbook again, and it should solve the problem.

Add the custom domain to the allowlist (Optional)

As part of the script, we have created a custom Microsoft domain with SPF, DKIM, and DMARC enabled. This configuration should help prevent the domain from being flagged as spam. However, depending on your email settings, the domain may still be marked with an “external” tag. This tagging can lead to users being cautious or distrusting of the email, which is a valid concern.

Remember to add the domain to your organisation’s allowlist

Change which parts are running

To change which parts of the script run, you can change the variables in variables.tf and run terraform apply again, or open the Automation account –> Shared Resources –> Variables blade and change the values manually.

The script looks at this location for the information; variables.tf basically just updates it here.
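The manual change can also be scripted; a sketch using one of the boolean variables from above:

```powershell
# Flip one of the feature switches directly on the Automation Account
Set-AzAutomationVariable -Name 'email_inform_owners_directly' -Value $false `
    -Encrypted $false -AutomationAccountName $automationAccountName `
    -ResourceGroupName $resourceGroupName
```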

Configuring which part of the script to run

Run manually or set up a schedule

I have not terraformed this part, so in this version you basically need to open the Automation account and either run the runbook manually, or create a schedule under Automation account –> Shared Resources –> Schedules and attach it to the runbook.
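If you would rather script the schedule, a sketch follows (the schedule and runbook names are illustrative):

```powershell
# Create a schedule that fires every day at 06:00, starting tomorrow
New-AzAutomationSchedule -Name 'daily-sp-audit' -StartTime (Get-Date '06:00').AddDays(1) `
    -DayInterval 1 -AutomationAccountName $automationAccountName `
    -ResourceGroupName $resourceGroupName

# Attach the schedule to the audit runbook
Register-AzAutomationScheduledRunbook -RunbookName $runbookName -ScheduleName 'daily-sp-audit' `
    -AutomationAccountName $automationAccountName -ResourceGroupName $resourceGroupName
```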

Creating a schedule for the runbook

Or run the runbook manually with the start command

Manually starting the runbook

The Result

Based on the enabled features, the script will now run through all Service Principals and send notifications. Let’s run through what each setting does.

Each template shown below can be edited in the Logic App.

email_Contact_email_get_list_of_orphaned_Service_Principals

The defined contact will receive an e-mail like the following.

The attachment will show a simple 5-column overview. In this case, you should get an idea of how many times I have deployed this solution in my test tenant.

email_Contact_email_for_all_SPs_with_expired_secrets_status

Same template as above, but with a new text and subject.

The attachment contains both the orphaned Service Principals, as well as the ones that have a defined owner.

email_Contact_email_for_all_SPs_where_secret_is_about_to_expire

The last list contains all secrets and certificates that are about to expire but have not yet expired. The list includes items both with and without owners.

email_inform_owners_directly

An email with the following template will be sent to each owner of each Service Principal, covering every secret or certificate that is about to expire or has already expired.

In this example it expired 7 days ago
