Deploying the Microsoft Purview Information Protection Scanner

Introduction

Most organizations protect their Microsoft 365 data well — but what about the sensitive files sitting on a file server that’s been running since 2014? The Microsoft Purview Information Protection Scanner bridges that gap. It extends your Purview labeling policies into on-premises repositories so that passport numbers, financial records, and confidential documents don’t slip through the cracks just because they live outside the cloud.

In this blog post I explain the prerequisites and show how to set up your own information protection scanner in a lab environment. After that I share some key findings.

Prerequisites

Scanner server

Requirement | Minimum spec
OS | Windows Server 2016/2019/2022/2025 (64-bit, no Server Core)
CPU | 4 cores
RAM | 8 GB
Disk | 10 GB free (temp files: 4 per core × max file size)
SQL | Port 1433 open to SQL server
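
The disk rule in the table (four temporary files per core, each up to the size of the largest file scanned) can be turned into a quick estimate. The sketch below is only an illustration: the function name and parameter names are hypothetical, and it assumes the worst case where every temporary file is as large as your largest file.

```powershell
# Hypothetical helper: worst-case temp space = 4 temp files per core x core count x largest file size.
function Get-ScannerTempSpaceEstimate {
    param(
        [Parameter(Mandatory)][int]$CoreCount,
        [Parameter(Mandatory)][long]$MaxFileSizeBytes
    )

    $tempFilesPerCore = 4
    $bytes = [long]$tempFilesPerCore * $CoreCount * $MaxFileSizeBytes

    [pscustomobject]@{
        TempFiles = $tempFilesPerCore * $CoreCount
        Bytes     = $bytes
        GB        = [math]::Round($bytes / 1GB, 2)
    }
}

# Example: a 4-core server whose largest file is 625 MB
Get-ScannerTempSpaceEstimate -CoreCount 4 -MaxFileSizeBytes 625MB
```

If the result is well above the free space on the scanner server, add disk before the first full scan.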

Required outbound URLs (port 443)

*.aadrm.com
*.azurerms.com
*.informationprotection.azure.com
informationprotection.hosting.portal.azure.net
*.aria.microsoft.com
*.protection.outlook.com
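
Before installing anything, it can help to confirm that the scanner server can actually open these connections. The sketch below is a minimal TCP check, not an official tool. Test-TcpEndpoint is a hypothetical helper name, SQL01 is a placeholder for your SQL server, and the wildcard entries above cannot be tested literally, so only the one concrete hostname from the list is included.

```powershell
# Hypothetical helper: returns $true when a TCP connection succeeds within the timeout.
function Test-TcpEndpoint {
    param(
        [Parameter(Mandatory)][string]$HostName,
        [Parameter(Mandatory)][int]$Port,
        [int]$TimeoutMs = 3000
    )

    $client = New-Object System.Net.Sockets.TcpClient
    try {
        $task = $client.ConnectAsync($HostName, $Port)
        if (-not $task.Wait($TimeoutMs)) { return $false }   # timed out
        return $client.Connected
    }
    catch {
        return $false   # DNS failure, refused connection, and so on
    }
    finally {
        $client.Close()
    }
}

$endpoints = @(
    @{ HostName = 'informationprotection.hosting.portal.azure.net'; Port = 443 },
    @{ HostName = 'SQL01'; Port = 1433 }   # placeholder SQL server name (assumption)
)

foreach ($endpoint in $endpoints) {
    $ok = Test-TcpEndpoint -HostName $endpoint.HostName -Port $endpoint.Port
    '{0}:{1} reachable: {2}' -f $endpoint.HostName, $endpoint.Port, $ok
}
```

A successful TCP connection only shows that the port is open. It does not prove that a proxy or TLS inspection device will let the scanner traffic through.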

Service account requirements

Microsoft requires the scanner service account to be an Active Directory account synchronized to Microsoft Entra ID. If you want to avoid syncing the account, you can pass a cloud-only account to the -DelegatedUser parameter of Set-Authentication instead, which is the approach this guide takes.

Important (SQL permissions): The service account needs securityadmin rights on the SQL instance during initial setup. These rights can be removed after the database is created and the scanner is operational.

Beyond SQL, the service account carries several other requirements depending on what the scanner needs to do. The table below covers everything:

Requirement | Notes
Log on locally | Needed for install and config only. Can be removed once the scanner is operational.
Log on as a service | Granted automatically during installation. Required for ongoing operation.
Repository permissions | File shares: Read, Write, Modify · SharePoint: Full Control · Discovery only: Read is sufficient.
RMS super user | Required if labels reprotect or remove protection. Enable the super user feature and add this account.
Site Collection Auditor | Required for scanning specific SharePoint URLs. Grant at farm level.
License | An Information Protection license must be assigned to the service account.

SQL server

Host SQL and the scanner on separate machines for any production deployment — a dedicated SQL instance is recommended and should not be shared with other applications. SQL Server 2016 is the minimum supported version.

Edition | Notes
SQL Server Express | Test environments only. Database size limits apply.
SQL Server Standard | Suitable for most production deployments.
SQL Server Enterprise | Large-scale or high-availability deployments.

Purview portal roles

To create scanner clusters and content scan jobs in the Purview portal, you need one of the following roles:

Role
Organization Management
Compliance Administrator
Compliance Data Administrator
Security Administrator

SharePoint requirements

If you are scanning SharePoint Server document libraries, your farm must meet the requirements below. This lab does not cover SharePoint on-premises; the table is included for completeness.

Requirement | Details
Supported versions | SharePoint 2013, 2016, and 2019. Other versions are not supported.
Versioning | The scanner inspects the last published version. If content approval is required, the labeled file must be approved before users can access it.
Large farms | Check whether you need to increase the list view threshold (default: 5,000) so the scanner can access all files.
Long file paths | If paths exceed 260 characters, increase httpRuntime.maxUrlLength on your SharePoint server to avoid scan timeouts.

Storage and capacity planning

There is no universal answer for how many scanner nodes you need — performance depends on server specs, storage throughput, network latency, file sizes, and policy complexity. The best approach is to run a representative pilot and measure your baseline.

Microsoft’s own benchmarks show a significant gap between modes: a 100 GB dataset completed in 68 minutes in discovery, versus 425 minutes in enforcement. Use the formula below to estimate your SQL storage requirements:

100 KB + <file count> × (1000 + 4 × <average file name length>)
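
To make the formula concrete, here is a small worked example in PowerShell. It is a sketch under two assumptions that the formula itself does not spell out: the result is in bytes, and KB means 1,024 bytes. The function name is hypothetical.

```powershell
# Hypothetical helper implementing: 100 KB + <file count> x (1000 + 4 x <average file name length>)
# Assumption: the result is in bytes and 1 KB = 1024 bytes.
function Get-ScannerSqlStorageEstimate {
    param(
        [Parameter(Mandatory)][long]$FileCount,
        [Parameter(Mandatory)][int]$AverageFileNameLength
    )

    $baseBytes    = 100KB
    $perFileBytes = 1000 + 4 * $AverageFileNameLength
    $totalBytes   = $baseBytes + $FileCount * $perFileBytes

    [pscustomobject]@{
        Bytes = $totalBytes
        GB    = [math]::Round($totalBytes / 1GB, 2)
    }
}

# Example: one million files with an average file name length of 50 characters
Get-ScannerSqlStorageEstimate -FileCount 1000000 -AverageFileNameLength 50
```

For one million files with 50-character names, each file contributes 1,200 bytes, so the database needs a little over 1.2 billion bytes.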

Further reading: full prerequisites and capacity guidance are at learn.microsoft.com/en-us/purview/deploy-scanner-prereqs

Lab environment

This guide walks through a realistic hybrid lab with the following four machines:

The goal: automatically detect and label any file containing a passport number stored on the file share.

Step 1 Configure sensitivity labels in Purview and publish them

Create two labels

General — label color grey, priority 0. This is your baseline label for everyday business documents with no sensitive content.

Sensitive — label color red, priority 1. Used for files containing passport numbers or other identity documents. Under Auto-labeling, add all passport number sensitive information types.

Field | Value
Name | General
Display name | General
Label priority | 0
Description for users | For general business files with no sensitive or critical data. Use this label for everyday documents that do not require additional protection.
Description for admins | Default baseline label for non-sensitive business content. Intended for general documents that do not contain confidential, regulated, or business-critical information. Can be used as a default label to support consistent classification and user awareness.
Label color | Grey

Field | Value
Name | Sensitive
Display name | Sensitive
Label priority | 1
Description for users | Use this label for files that contain passport details or copies of passports. This information is highly sensitive personal data and must be handled carefully.
Description for admins | Label for documents containing passport information, passport numbers, scanned passports, or related identity documents. Intended to protect highly sensitive personal data and support stronger compliance and access controls.
Label color | Red

When you create the Sensitive label, open the auto-labeling step, select “Sensitive info types”, and add all the passport number types.

Scope the labels to Files & other data. This lab does not configure access control; the labels are only used to classify data.

Label Policies

1. Go to Label Policies and create a label policy with the settings below.

Field | Value
Name | Sensitive
Description | Sensitive label for documents containing passport numbers
Published labels | General, Sensitive
Publish to users and groups | Exchange email – All accounts
Policy settings | Default label for documents is: General · Default label for emails is: General · Default label for meetings is: General · Users must provide justification to remove a label or lower its classification

Step 2 Create the scanner cluster and content scan job

In the Purview portal, go to Settings → Information Protection → Information Protection Scanner.

Create a cluster and give it a meaningful name. In this lab: RockitoneFileShareCluster. Write this down exactly, because it must match what you pass to Install-Scanner later.

Create a content scan job with the settings below, then add your file share repository as a UNC path (e.g. \\FS01\Z$).

Field | Value
Content scan job name | Rockitone Content Scan
Description | Content Scanner
Cluster | RockitoneFileShareCluster
Schedule | Always
Info types to be discovered | Enabled
Treat recommended labeling as automatic | Disabled
Enable DLP policy rules | Disabled
Enforce sensitivity labeling policy | Disabled
Label files based on content | Disabled
Default label | General
Relabel files | Disabled
Preserve “Date modified”, “Last modified”, and “Modified by” | Disabled
Include or exclude file type to scan | Exclude specified file types
Excluded file types | .lnk, .exe, .com, .cmd, .bat, .dll, .ini, .pst, .sca, .drm, .sys, .cpl, .inf, .drv, .dat, .tmp, .msp, .msi, .pdb, .jar, .ocx, .rtf, .rar, .msg
Default owner | Set repository owner
Repository owner | General

Step 3 Create content to scan

Go to your file share server and upload some test files that contain passport numbers.
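
If you have no test data at hand, a few throwaway text files are enough for a lab. The sketch below generates files with a random nine-digit number next to the phrase "Passport number". The function name and file names are hypothetical, and the numbers are random rather than valid. Whether a given passport sensitive information type matches depends on its own pattern and keyword rules, so treat this as a starting point.

```powershell
# Hypothetical helper: writes small text files containing a passport-style keyword and a random 9-digit number.
function New-PassportSampleFile {
    param(
        [Parameter(Mandatory)][string]$Directory,
        [int]$Count = 5
    )

    if (-not (Test-Path -Path $Directory)) {
        New-Item -ItemType Directory -Path $Directory | Out-Null
    }

    foreach ($i in 1..$Count) {
        # Nine random digits; not a real passport number
        $digits = -join (1..9 | ForEach-Object { Get-Random -Minimum 0 -Maximum 10 })
        $body   = "Test record $i`r`nPassport number: $digits`r`n"
        $path   = Join-Path $Directory ('passport-sample-{0:D2}.txt' -f $i)

        Set-Content -Path $path -Value $body -Encoding UTF8
        $path   # emit the path of each file created
    }
}
```

In this lab you could point it at a folder under the scanned share, for example New-PassportSampleFile -Directory '\\FS01\Z$\ScannerTest', where ScannerTest is a made-up folder name.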

Step 4 Create the Azure App Registration

The scanner needs a Microsoft Entra token to authenticate with the Purview service unattended. This requires an App Registration with the correct API permissions.

In the Azure Portal, go to Microsoft Entra ID → App Registrations → New registration.

Name: InformationProtectionScanner · Redirect URI: Web · http://localhost

From the Overview page, note down your Application (client) ID and Directory (tenant) ID.

Go to Certificates & Secrets → New client secret. Set expiry to 1 year and immediately copy the secret value — it is only shown once.

Add the following API permissions, then click Grant admin consent:

API | Permission | Type
Azure Rights Management Services | Content.DelegatedReader | Application
Azure Rights Management Services | Content.DelegatedWriter | Application
Microsoft Information Protection Sync Service | UnifiedPolicy.Tenant.Read | Application

Step 5 Install and configure scanner

Install Microsoft Purview Information Protection

Download the Microsoft Purview Information Protection client from the Microsoft Download Center (https://www.microsoft.com/en-us/download/details.aspx?id=53018) and install it on your scan server.

Step 6 Create the scan account as shown in the prerequisites

Service Account

In Active Directory, create a new user account named svc-mips. Because a gMSA is not an option, compensate with a strong static password: 32 to 64 randomly generated characters is the recommendation, aligned with Microsoft and NIST guidance. Since no human ever types this password, length has no usability cost.

Set the password to never expire. Forced rotation of service account passwords causes more outages than it prevents — a silently expired password will take the scanner offline without any obvious error.
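
One way to produce such a password is to draw characters from a cryptographic random number generator. The sketch below is an illustration, not a vetted tool. The function name and the character set are my own choices, and you should adjust the alphabet to whatever your password policy accepts.

```powershell
# Hypothetical helper: builds a random password from a cryptographically secure byte source.
function New-RandomPassword {
    param([ValidateRange(32, 64)][int]$Length = 48)

    # Look-alike characters (I, O, l, 0, 1) are left out; adjust to your policy
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789!#%+-_=?'

    # Reject bytes at or above this limit so every character is equally likely (no modulo bias)
    $limit = 256 - (256 % $alphabet.Length)

    $rng     = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    $buffer  = New-Object byte[] 1
    $builder = New-Object System.Text.StringBuilder

    try {
        while ($builder.Length -lt $Length) {
            $rng.GetBytes($buffer)
            if ($buffer[0] -lt $limit) {
                [void]$builder.Append($alphabet[$buffer[0] % $alphabet.Length])
            }
        }
    }
    finally {
        $rng.Dispose()
    }

    return $builder.ToString()
}
```

Calling New-RandomPassword -Length 64 returns a 64-character string. Store it straight in your password vault rather than on screen or in a script.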

Grant the account Modify rights on the share.

Add the account to the SQL instance and give it securityadmin rights. This is temporary; you can remove these rights later.

Add the account to the Allow log on locally policy.

Create a cloud-only user in Entra named “Cloud-SVC-MIPS”.

Step 7 Install the Scanner

  1. Log in as svc-mips on the server.
  2. Open PowerShell as administrator and run the following command. It's important that the cluster name is the same as the one we specified earlier in Purview.
Install-Scanner -SqlServerInstance Servername\SQLInstancename -Cluster RockitoneFileShareCluster

If it fails, you can check the log here: %localappdata%\Microsoft\MSIP\Logs\MSIPScanner.iplog
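
Rather than opening the log in an editor, you can read its last lines from the same PowerShell session. This is a small convenience sketch. Get-ScannerLogTail is a hypothetical helper name, and the default path is simply the location mentioned above for the account you are logged in with.

```powershell
# Hypothetical helper: shows the last lines of the scanner log if the file exists.
function Get-ScannerLogTail {
    param(
        [string]$Path = (Join-Path $env:LOCALAPPDATA 'Microsoft\MSIP\Logs\MSIPScanner.iplog'),
        [int]$Tail = 50
    )

    if (-not (Test-Path -Path $Path)) {
        Write-Warning "Log file not found: $Path"
        return
    }

    Get-Content -Path $Path -Tail $Tail
}
```

Run Get-ScannerLogTail while logged in as the account that ran the installation, because %localappdata% is per user.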

Step 8 Authenticate the scanner with a certificate (Preview)

Go to “Configure and install the Microsoft Purview Information Protection scanner” on Microsoft Learn.

Choose the option that fits your environment. Because this is a lab, I go for option B, a self-signed certificate.

Create the certificate

# Create certificate in Local Machine store with RSA provider
$cert = New-SelfSignedCertificate -Subject "CN=PurviewScanner" -CertStoreLocation Cert:\LocalMachine\My -KeyExportPolicy Exportable -KeySpec Signature -KeyLength 2048 -KeyAlgorithm RSA -HashAlgorithm SHA256 -NotAfter (Get-Date).AddYears(2) -Provider "Microsoft Enhanced RSA and AES Cryptographic Provider"

# Display the certificate details
Write-Host "Certificate created successfully"
Write-Host "Thumbprint: $($cert.Thumbprint)"
Write-Host "Subject: $($cert.Subject)"

Export the certificate

# Export public key certificate (.cer file)
Export-Certificate -Cert $cert -FilePath "C:\temp\PurviewScanner.cer"

The scanner service account needs read access to the certificate’s private key. Open the certificate store by pressing Win + R, typing certlm.msc, and pressing Enter.

Navigate to the certificate you created, right-click it and select All Tasks → Manage Private Keys. Click Add, enter the scanner service account (<domain>\svc-mips), ensure Read is checked, and click OK.

In the Azure Portal, open the InformationProtectionScanner app registration and navigate to Certificates & Secrets. Select the Certificates tab, click Upload certificate, and upload the file from C:\Temp\PurviewScanner.cer.

Once uploaded, copy the thumbprint. You will need it when running Set-Authentication.

Run the following command in PowerShell:

# Get credentials for the scanner service account
$pscreds = Get-Credential CONTOSO\ScannerService

# Set authentication using certificate thumbprint
Set-Authentication `
    -AppId "your-app-id-guid" `
    -TenantId "your-tenant-id-guid" `
    -DelegatedUser "scanner@contoso.com" `
    -CertificateThumbprint "your-certificate-thumbprint" `
    -CertificateStoreLocation LocalMachine `
    -CertificateStoreName My `
    -OnBehalfOf $pscreds `
    -SkipCertificateChainValidation

Because we are using a self-signed certificate, we add the -SkipCertificateChainValidation option.

Results should look like this:

Step 9 Run your first scan

Once the scanner node appears in the Purview portal under Information Protection Scanner → Nodes, you are ready to go. In the portal, navigate to your content scan job and click Scan now.

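Open a file from the share that contains a passport number. With enforcement enabled on the content scan job, it should now carry the Sensitive label; with Enforce sensitivity labeling policy disabled, as in the Step 2 settings, the scanner only reports what it finds and leaves the file unlabeled. Either way, you have extended your Purview compliance policies beyond the cloud and into your on-premises infrastructure.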

Step 10 Remove the service account's SQL rights

Because we want to follow the principle of least privilege, remove the elevated SQL role (securityadmin) that was granted to svc-mips in Step 6.

Final Thoughts

I had some trouble acquiring the access token on behalf of the local service account. After some troubleshooting, I logged in as the service account itself and ran the commands, and that worked.

The following command can be helpful when troubleshooting:

Start-ScannerDiagnostics

Thank you for reading my blog!

