Introduction
Most organizations protect their Microsoft 365 data well — but what about the sensitive files sitting on a file server that’s been running since 2014? The Microsoft Purview Information Protection Scanner bridges that gap. It extends your Purview labeling policies into on-premises repositories so that passport numbers, financial records, and confidential documents don’t slip through the cracks just because they live outside the cloud.
In this blog post I explain the prerequisites and show how you can set up your own Information Protection scanner in a lab environment. After that I share some key findings.
Prerequisites
Scanner server
| Requirement | Minimum spec |
|---|---|
| OS | Windows Server 2016/2019/2022/2025 (64-bit, no Server Core) |
| CPU | 4 cores |
| RAM | 8 GB |
| Disk | 10 GB free (the scanner creates 4 temporary files per core, each up to the size of the largest file scanned) |
| SQL | Port 1433 open to SQL server |
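The disk requirement is easier to judge with a quick calculation. The sketch below only restates the "4 temporary files per core × max file size" rule from the table; the function name and the 625 MB example file size are my own assumptions for illustration.

```powershell
# Rough temp-space estimate: temp files per core x cores x largest file size.
# Get-ScannerTempSpaceBytes is a hypothetical helper, not part of any Microsoft module.
function Get-ScannerTempSpaceBytes {
    param(
        [Parameter(Mandatory)] [int]  $Cores,
        [Parameter(Mandatory)] [long] $MaxFileSizeBytes,
        [int] $TempFilesPerCore = 4
    )
    [long]($TempFilesPerCore * $Cores * $MaxFileSizeBytes)
}

# Example: 4 cores, largest file assumed to be 625 MB
$bytes = Get-ScannerTempSpaceBytes -Cores 4 -MaxFileSizeBytes 625MB
[math]::Round($bytes / 1GB, 2)   # 9.77
```

With 4 cores that is 16 temporary files of 625 MB each, which lands close to the 10 GB minimum. If your share holds larger files, scale the disk accordingly.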
Required outbound URLs (port 443)
*.aadrm.com
*.azurerms.com
*.informationprotection.azure.com
informationprotection.hosting.portal.azure.net
*.aria.microsoft.com
*.protection.outlook.com
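Before installing anything, it can save time to check that the scanner server can actually reach these endpoints. The snippet below is a minimal sketch using Test-NetConnection. Test-ScannerEndpoint is a hypothetical helper name, and wildcard entries are skipped because they cannot be resolved as written; you would need to substitute a concrete hostname for those yourself.

```powershell
# Probe a list of hosts on a TCP port (443 by default).
# Wildcard entries such as *.aadrm.com are reported as skipped.
function Test-ScannerEndpoint {
    param(
        [Parameter(Mandatory)] [string[]] $HostName,
        [int] $Port = 443
    )
    foreach ($name in $HostName) {
        if ($name.StartsWith('*')) {
            [pscustomobject]@{ Host = $name; Port = $Port; Reachable = $null; Note = 'wildcard, skipped' }
            continue
        }
        # -InformationLevel Quiet returns only $true or $false
        $ok = Test-NetConnection -ComputerName $name -Port $Port -InformationLevel Quiet -WarningAction SilentlyContinue
        [pscustomobject]@{ Host = $name; Port = $Port; Reachable = $ok; Note = '' }
    }
}

# Usage (run on the scanner server):
# Test-ScannerEndpoint -HostName 'informationprotection.hosting.portal.azure.net', '*.aadrm.com'
# Test-ScannerEndpoint -HostName '<your-sql-server>' -Port 1433
```

The same helper covers the SQL requirement from the table above by passing port 1433.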
Service account requirements
Microsoft requires the scanner service account to be an Active Directory account synchronized to Microsoft Entra ID. If you want to avoid syncing the account, you can instead pass a cloud-only account to the -DelegatedUser parameter of Set-Authentication (Set-AIPAuthentication in the older AIP client). That is the approach this guide takes.
Important: SQL permissions
The service account needs securityadmin rights on the SQL instance during initial setup. These rights can be removed after the database is created and the scanner is operational.
Beyond SQL, the service account carries several other requirements depending on what the scanner needs to do. The table below covers everything:
| Requirement | Notes |
|---|---|
| Log on locally | Needed for install and config only — can be removed once the scanner is operational. |
| Log on as a service | Granted automatically during installation. Required for ongoing operation. |
| Repository permissions | File shares: Read, Write, Modify · SharePoint: Full Control · Discovery only: Read is sufficient. |
| RMS super user | Required if labels reprotect or remove protection. Enable the super user feature and add this account. |
| Site Collection Auditor | Required for scanning specific SharePoint URLs. Grant at farm level. |
| License | An Information Protection license must be assigned to the service account. |
SQL server
Host SQL and the scanner on separate machines for any production deployment — a dedicated SQL instance is recommended and should not be shared with other applications. SQL Server 2016 is the minimum supported version.
| Edition | Notes |
|---|---|
| SQL Server Express | Test environments only — database size limits apply. |
| SQL Server Standard | Suitable for most production deployments. |
| SQL Server Enterprise | Large-scale or high-availability deployments. |
Purview portal roles
To create scanner clusters and content scan jobs in the Purview portal, you need one of the following roles:
| Role |
|---|
| Organization Management |
| Compliance Administrator |
| Compliance Data Administrator |
| Security Administrator |
SharePoint requirements
If you are scanning SharePoint Server document libraries, your farm must meet the requirements below. We won't cover SharePoint on-premises in this lab.
| Requirement | Details |
|---|---|
| Supported versions | SharePoint 2013, 2016, and 2019. Other versions are not supported. |
| Versioning | The scanner inspects the last published version. If content approval is required, the labeled file must be approved before users can access it. |
| Large farms | Check whether you need to increase the list view threshold (default: 5,000) so the scanner can access all files. |
| Long file paths | If paths exceed 260 characters, increase httpRuntime.maxUrlLength on your SharePoint server to avoid scan timeouts. |
Storage and capacity planning
There is no universal answer for how many scanner nodes you need — performance depends on server specs, storage throughput, network latency, file sizes, and policy complexity. The best approach is to run a representative pilot and measure your baseline.
Microsoft’s own benchmarks show a significant gap between modes: a 100 GB dataset completed in 68 minutes in discovery, versus 425 minutes in enforcement. Use the formula below to estimate your SQL storage requirements:
100 KB + <file count> × (1000 + 4 × <average file name length>)
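To make the formula concrete, here is a small worked example. It is a sketch under two assumptions of mine: that the result is in bytes, and that 100 KB means 100 × 1024 bytes. The function name is hypothetical.

```powershell
# SQL storage estimate: 100 KB + <file count> x (1000 + 4 x <average file name length>)
# Assumption: the formula yields bytes.
function Get-ScannerSqlStorageBytes {
    param(
        [Parameter(Mandatory)] [long] $FileCount,
        [Parameter(Mandatory)] [int]  $AverageFileNameLength
    )
    [long](100KB + $FileCount * (1000 + 4 * $AverageFileNameLength))
}

# Example: one million files with an average file name length of 50 characters
$bytes = Get-ScannerSqlStorageBytes -FileCount 1000000 -AverageFileNameLength 50
$bytes                           # 1200102400
[math]::Round($bytes / 1GB, 2)   # 1.12
```

In other words, a million files would need a little over 1 GB for the scanner database under these assumptions, which is also why SQL Server Express with its size limits only fits test environments.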
Further reading: full prerequisites and capacity guidance are available at learn.microsoft.com/en-us/purview/deploy-scanner-prereqs
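The benchmark figures quoted above (100 GB in 68 minutes for discovery, 425 minutes for enforcement) can be turned into a first rough estimate for your own data. The sketch below assumes scan time scales linearly with data size on comparable hardware, which is my simplification and not something the benchmark promises. Treat the result as a starting point for your pilot, not a prediction.

```powershell
# Linear extrapolation from the quoted benchmark: minutes per 100 GB per mode.
# Get-ScanDurationEstimate is a hypothetical helper for illustration only.
function Get-ScanDurationEstimate {
    param(
        [Parameter(Mandatory)] [double] $DatasetGB,
        [ValidateSet('Discovery', 'Enforcement')] [string] $Mode = 'Discovery'
    )
    $minutesPer100GB = @{ Discovery = 68; Enforcement = 425 }
    ($DatasetGB / 100) * $minutesPer100GB[$Mode]
}

Get-ScanDurationEstimate -DatasetGB 500 -Mode Discovery     # 340 minutes
Get-ScanDurationEstimate -DatasetGB 500 -Mode Enforcement   # 2125 minutes
```

Enforcement takes 6.25 times as long as discovery in this benchmark, so it pays to start in discovery mode and only enforce once you trust the findings.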
Lab environment
This guide walks through a realistic hybrid lab with the following four machines:

The goal: automatically detect and label any file containing a passport number stored on the file share.

Step 1 Configure sensitivity labels in Purview and publish them
Create two labels
General — label color grey, priority 0. This is your baseline label for everyday business documents with no sensitive content.
Sensitive — label color red, priority 1. Used for files containing passport numbers or other identity documents. Under Auto-labeling, add all passport number sensitive information types.
| Field | Value |
|---|---|
| Name | General |
| Display name | General |
| Label priority | 0 |
| Description for users | For general business files with no sensitive or critical data. Use this label for everyday documents that do not require additional protection. |
| Description for admins | Default baseline label for non-sensitive business content. Intended for general documents that do not contain confidential, regulated, or business-critical information. Can be used as a default label to support consistent classification and user awareness. |
| Label color | Grey |
| Field | Value |
|---|---|
| Name | Sensitive |
| Display name | Sensitive |
| Label priority | 1 |
| Description for users | Use this label for files that contain passport details or copies of passports. This information is highly sensitive personal data and must be handled carefully. |
| Description for admins | Label for documents containing passport information, passport numbers, scanned passports, or related identity documents. Intended to protect highly sensitive personal data and support stronger compliance and access controls. |
| Label color | Red |
When you create the Sensitive label, go to Auto-labeling, open “Sensitive info types”, and add all the passport number types.



Scope the label to Files & other data. We don’t want to configure access control in this lab; we only want to add labels to data.
Label Policies
Go to Label Policies and create a label policy with the settings below.
| Field | Value |
|---|---|
| Name | Sensitive |
| Description | Sensitive label for documents containing passport numbers |
| Published labels | General, Sensitive |
| Publish to users and groups | Exchange email – All accounts |
| Policy settings | Default label for documents is: General |
| | Default label for emails is: General |
| | Default label for meetings is: General |
| | Users must provide justification to remove a label or lower its classification |
Step 2 Create the scanner cluster and content scan job
In the Purview portal, go to Settings → Information Protection → Information Protection Scanner.
Create a cluster and give it a meaningful name. In this lab: RockitoneFileShareCluster. Write this down exactly, because it must match the value you pass to Install-Scanner later.

| Field | Value |
|---|---|
| Content scan job name | Rockitone Content Scan |
| Description | Content Scanner |
| Cluster | RockitoneFileShareCluster |
| Schedule | Always |
| Info types to be discovered | Enabled |
| Treat recommended labeling as automatic | Disabled |
| Enable DLP policy rules | Disabled |
| Enforce sensitivity labeling policy | Disabled |
| Label files based on content | Disabled |
| Default label | General |
| Relabel files | Disabled |
| Preserve “Date modified”, “Last modified”, and “Modified by” | Disabled |
| Include or exclude file type to scan | Exclude specified file types |
| Excluded file types | .lnk, .exe, .com, .cmd, .bat, .dll, .ini, .pst, .sca, .drm, .sys, .cpl, .inf, .drv, .dat, .tmp, .msp, .msi, .pdb, .jar, .ocx, .rtf, .rar, .msg |
| Default owner | Set repository owner |
| Repository owner | General |
Create a content scan job with the settings above, then add your file share repository as a UNC path (e.g. \\FS01\Z$).
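If you are curious how much of your share the exclusion list will skip, you can check it up front. The helper below is a sketch: Test-ScannerExcludedFile is a hypothetical name, and the extension list is simply copied from the "Excluded file types" row above. It only looks at the file extension and says nothing about how the scanner itself evaluates files.

```powershell
# Extensions copied from the "Excluded file types" setting of the content scan job
$ExcludedExtensions = '.lnk', '.exe', '.com', '.cmd', '.bat', '.dll', '.ini', '.pst',
                      '.sca', '.drm', '.sys', '.cpl', '.inf', '.drv', '.dat', '.tmp',
                      '.msp', '.msi', '.pdb', '.jar', '.ocx', '.rtf', '.rar', '.msg'

# Returns $true when the file extension is on the exclusion list (case-insensitive)
function Test-ScannerExcludedFile {
    param([Parameter(Mandatory)] [string] $Path)
    $ext = [System.IO.Path]::GetExtension($Path).ToLowerInvariant()
    $ExcludedExtensions -contains $ext
}

# Usage against the lab share: count excluded versus scannable files
# Get-ChildItem -Path '\\FS01\Z$' -Recurse -File |
#     Group-Object { Test-ScannerExcludedFile $_.FullName } -NoElement
```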
Step 3 Create content to scan
Go to your file share server and add some test files that contain passport numbers.
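If you do not have test documents at hand, you can generate a few. The sketch below writes small text files with made-up nine-digit values next to the words "Passport number". Both the helper name and the number format are my own assumptions; whether a given passport sensitive information type matches depends on its pattern and keywords, so you may need to adjust the content for the country types you added.

```powershell
# Create dummy text files for the scanner to find. The numbers are random digits,
# not real passport numbers. New-SamplePassportFile is a hypothetical helper.
function New-SamplePassportFile {
    param(
        [Parameter(Mandatory)] [string] $Directory,
        [int] $Count = 5
    )
    New-Item -ItemType Directory -Path $Directory -Force | Out-Null
    foreach ($i in 1..$Count) {
        $number = -join (1..9 | ForEach-Object { Get-Random -Minimum 0 -Maximum 10 })
        $path = Join-Path $Directory "sample-passport-$i.txt"
        Set-Content -Path $path -Value "Traveller record $i`r`nPassport number: $number"
        Get-Item $path
    }
}

# Usage on the file server (the target folder is an example):
# New-SamplePassportFile -Directory 'Z:\ScannerTest' -Count 10
```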

Step 4 Create the Azure App Registration
The scanner needs a Microsoft Entra token to authenticate with the Purview service unattended. This requires an App Registration with the correct API permissions.
In the Azure Portal, go to Microsoft Entra ID → App Registrations → New registration.
Name: InformationProtectionScanner · Redirect URI: Web · http://localhost
From the Overview page, note down your Application (client) ID and Directory (tenant) ID.
Go to Certificates & Secrets → New client secret. Set expiry to 1 year and immediately copy the secret value — it is only shown once.
Add the following API permissions, then click Grant admin consent:
| API | Permission | Type |
|---|---|---|
| Azure Rights Management Services | Content.DelegatedReader | Application |
| Azure Rights Management Services | Content.DelegatedWriter | Application |
| Microsoft Information Protection Sync Service | UnifiedPolicy.Tenant.Read | Application |
Step 5 Install and configure scanner
Install Microsoft Purview Information Protection
Download the Microsoft Purview Information Protection client from the Microsoft Download Center (https://www.microsoft.com/en-us/download/details.aspx?id=53018) and install it on your scan server.

Step 6 Create Scan Account as shown in the prerequisites.
Service Account
In Active Directory, create a new user account named svc-mips. Because a gMSA is not an option here, compensate with a strong static password: 32 to 64 randomly generated characters is the recommendation, aligned with Microsoft and NIST guidance. Since no human ever types this password, length has no usability cost.
Set the password to never expire. Forced rotation of service account passwords causes more outages than it prevents — a silently expired password will take the scanner offline without any obvious error.
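Generating such a password by hand is tedious, so here is one way to script it. This is a sketch, not a vetted password tool: the helper name and the character set are my own choices. The set contains exactly 64 characters so that mapping a random byte with modulo 64 stays uniform.

```powershell
# Generate a random password of 32 to 64 characters from a cryptographic RNG.
# New-ServiceAccountPassword is a hypothetical helper.
function New-ServiceAccountPassword {
    param([ValidateRange(32, 64)] [int] $Length = 48)
    # 24 upper + 25 lower + 8 digits + 7 symbols = 64 characters (look-alikes left out)
    $chars = 'ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz23456789!#%+-_='
    $bytes = New-Object byte[] $Length
    $rng = [System.Security.Cryptography.RandomNumberGenerator]::Create()
    try { $rng.GetBytes($bytes) } finally { $rng.Dispose() }
    -join ($bytes | ForEach-Object { $chars[$_ % $chars.Length] })
}

# Usage with the ActiveDirectory module (run on a machine that has RSAT installed):
# $secure = ConvertTo-SecureString (New-ServiceAccountPassword) -AsPlainText -Force
# New-ADUser -Name 'svc-mips' -AccountPassword $secure -PasswordNeverExpires $true -Enabled $true
```

Store the generated value in your password vault before you close the session, because the script does not keep it anywhere.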

Give modify rights to the share

Add the user to the SQL Server instance and give it securityadmin rights. This is temporary; you can remove these rights later.

Add the user to the “Allow log on locally” policy on the scanner server.

Create a cloud-only user in Entra named “Cloud-SVC-MIPS”.
Step 7 Install the Scanner
- Log in to the server as svc-mips.
- Open PowerShell as administrator and run the following command. It's important that the cluster name is the same as the one we specified earlier in Purview.
Install-Scanner -SqlServerInstance "<ServerName>\<InstanceName>" -Cluster RockitoneFileShareCluster


If it fails, check the log at %localappdata%\Microsoft\MSIP\Logs\MSIPScanner.iplog
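Opening the log in Notepad works, but during troubleshooting I find it quicker to look at the last lines from PowerShell. The helper below is a sketch that assumes the current log file is plain text; Get-ScannerLogTail is a hypothetical name. Remember that %localappdata% belongs to the account you are logged in with, so run it as the account that installed the scanner.

```powershell
# Show the last lines of the scanner log, or warn if the file is not there
function Get-ScannerLogTail {
    param(
        [string] $Path = (Join-Path $env:LOCALAPPDATA 'Microsoft\MSIP\Logs\MSIPScanner.iplog'),
        [int] $Lines = 40
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        Write-Warning "Log file not found: $Path"
        return
    }
    Get-Content -LiteralPath $Path -Tail $Lines
}

# Usage:
# Get-ScannerLogTail -Lines 100
```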
Step 8 Authenticate the scanner with a certificate (Preview)
Open the Microsoft Learn article “Configure and install the Microsoft Purview Information Protection scanner” and choose the authentication option that fits your environment. Because this is a lab, I went with option B, a self-signed certificate.
Create the certificate
# Create certificate in Local Machine store with RSA provider
$cert = New-SelfSignedCertificate -Subject "CN=PurviewScanner" -CertStoreLocation Cert:\LocalMachine\My -KeyExportPolicy Exportable -KeySpec Signature -KeyLength 2048 -KeyAlgorithm RSA -HashAlgorithm SHA256 -NotAfter (Get-Date).AddYears(2) -Provider "Microsoft Enhanced RSA and AES Cryptographic Provider"
# Display the certificate details
Write-Host "Certificate created successfully"
Write-Host "Thumbprint: $($cert.Thumbprint)"
Write-Host "Subject: $($cert.Subject)"
Export the certificate
# Export public key certificate (.cer file)
Export-Certificate -Cert $cert -FilePath "C:\temp\PurviewScanner.cer"
The scanner service account needs read access to the certificate’s private key. Open the certificate store by pressing Win + R, typing certlm.msc, and pressing Enter.
Navigate to the certificate you created, right-click it and select All Tasks → Manage Private Keys. Click Add, enter the scanner service account (<domain>\svc-mips), ensure Read is checked, and click OK.

In the Azure Portal, open the InformationProtectionScanner app registration and navigate to Certificates & Secrets. Select the Certificates tab, click Upload certificate, and upload the file from C:\Temp\PurviewScanner.cer.
Once uploaded, copy the thumbprint. You will need it when running Set-Authentication.
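If you closed the PowerShell window in the meantime, you can read the thumbprint back from the local machine store and compare it with the value shown in the portal. This is a small sketch; Get-ScannerCertificate is a hypothetical helper name, and the subject matches the one used when the certificate was created.

```powershell
# Look up the scanner certificate in the local machine store and show the
# details that matter for authentication
function Get-ScannerCertificate {
    param(
        [string] $Subject = 'CN=PurviewScanner',
        [string] $StorePath = 'Cert:\LocalMachine\My'
    )
    Get-ChildItem -Path $StorePath |
        Where-Object { $_.Subject -eq $Subject } |
        Select-Object Thumbprint, Subject, NotAfter, HasPrivateKey
}

# Usage on the scanner server:
# Get-ScannerCertificate
```

HasPrivateKey should be True and NotAfter should lie in the future. If more than one certificate comes back, you probably ran the creation step twice, so make sure you upload and reference the same one.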
Run the following command in PowerShell, replacing the placeholder values with your own:
# Get credentials for the scanner service account (in this lab: <domain>\svc-mips)
$pscreds = Get-Credential "CONTOSO\ScannerService"
# Set authentication using the certificate thumbprint.
# Every line except the last must end with a backtick, otherwise the command is cut short.
Set-Authentication `
    -AppId "your-app-id-guid" `
    -TenantId "your-tenant-id-guid" `
    -DelegatedUser "scanner@contoso.com" `
    -CertificateThumbprint "your-certificate-thumbprint" `
    -CertificateStoreLocation LocalMachine `
    -CertificateStoreName My `
    -OnBehalfOf $pscreds `
    -SkipCertificateChainValidation
Because we are using a self-signed certificate, we add the -SkipCertificateChainValidation option.
The result should look like this:

Step 9 Run your first scan
Once the scanner node appears in the Purview portal under Information Protection Scanner → Nodes, you are ready to go. In the portal, navigate to your content scan job and click Scan now.


Open a file from the share that contains a passport number — it should now carry the Sensitive label. You have successfully extended your Purview compliance policies beyond the cloud and into your on-premises infrastructure.
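Opening files one by one gets old quickly. The Purview Information Protection client also ships a Get-FileStatus cmdlet, so a check like the one below should give you an overview of a whole folder. Treat it as a sketch: Get-ShareLabelStatus is a hypothetical wrapper, and the property names are the ones I expect from the client's output, so adjust the Select-Object list if your version returns different ones.

```powershell
# Summarize label status for a file or folder using the client's Get-FileStatus cmdlet
function Get-ShareLabelStatus {
    param([Parameter(Mandatory)] [string] $Path)
    Get-FileStatus -Path $Path |
        Select-Object FileName, IsLabeled, MainLabelName
}

# Usage on a machine with the Purview Information Protection client installed:
# Get-ShareLabelStatus -Path '\\FS01\Z$'
```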

Step 10 Remove the service account's temporary SQL rights
To follow the principle of least privilege, remove svc-mips from the SQL server role it was given for the installation (securityadmin in Step 6).
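You can do this in SQL Server Management Studio, or script it. The sketch below builds the T-SQL statement for changing server role membership; Get-ServerRoleMembershipSql is a hypothetical helper, and the login name is an example. Running it against the server requires the SqlServer PowerShell module for Invoke-Sqlcmd.

```powershell
# Build an ALTER SERVER ROLE statement that adds or drops a login
function Get-ServerRoleMembershipSql {
    param(
        [Parameter(Mandatory)] [string] $Login,
        [Parameter(Mandatory)] [string] $Role,
        [ValidateSet('ADD', 'DROP')] [string] $Action = 'DROP'
    )
    # Escape closing brackets so the names stay safe inside [ ] delimiters
    $safeLogin = $Login.Replace(']', ']]')
    $safeRole = $Role.Replace(']', ']]')
    "ALTER SERVER ROLE [$safeRole] $Action MEMBER [$safeLogin];"
}

$query = Get-ServerRoleMembershipSql -Login 'CONTOSO\svc-mips' -Role 'securityadmin'
$query   # ALTER SERVER ROLE [securityadmin] DROP MEMBER [CONTOSO\svc-mips];

# Usage (replace the instance name with your own):
# Invoke-Sqlcmd -ServerInstance '<ServerName>\<InstanceName>' -Query $query
```

The same helper with -Action ADD produces the statement for granting the role at the start of the lab.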
Final Thoughts
I had some trouble acquiring the access token on behalf of the local service account. After some troubleshooting, I logged in as the service account itself and ran the commands there, and that worked.
The following command can be helpful when troubleshooting:
Start-ScannerDiagnostics
Thank you for reading my blog!
