Azure AI Search – Indexing

Required Azure Role

One of the following built-in roles (or a custom role with equivalent permissions) is required:

  • Owner — full access including role assignments
  • Contributor — manage resources (no role assignments)

If a user with the Contributor role also needs to assign roles, additionally grant the User Access Administrator role.

Azure Administration Tools / Interface

This guide uses the Azure Portal and a manual creation process.

Create and Manage Steps (Azure Portal)

  1. In the Azure AI Search service, click Import and vectorize data.
  1. Select Azure Blob Storage as the data source and choose the storage account created previously.
  1. Select the RAG scenario for indexing.
  1. Click RAG to proceed to the RAG configuration page. Select the highlighted parameters and also select “Enable deletion tracking”.
  1. On Vectorize your text, select your Azure OpenAI resource and choose the deployment text-embedding-ada-002. For authentication, choose either API key or User assigned identity (specify the identity if you use one).
  1. On Vectorize and enrich your images, leave defaults (current data scope has no images).
  1. In Advanced settings, enable Semantic ranker. Keep index fields as-is for now. Set an indexing schedule to match repository usage (e.g., every 10 minutes for busy repos, daily for low-change repos).
  1. Review configuration and click Create.
  1. Open the created Index and copy the Index name and Semantic configuration for use by OpenAI Chat Completions. Define these values in the Repository Administrator and the Azure App Service (bot backend).
  1. Go to the Indexer to verify run status and adjust scheduling or trigger manual runs as needed.
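
The index name and semantic configuration copied in the steps above are typically handed to Chat Completions as a data source entry. The sketch below assumes the bot backend uses the Azure OpenAI "On Your Data" request format (a `data_sources` list with type `azure_search`); the function name, endpoint, index name, and key are hypothetical placeholders, and the field names should be checked against the API version actually in use.

```python
import json


def build_search_data_source(endpoint, index_name, semantic_configuration,
                             embedding_deployment, api_key):
    """Build one data_sources entry for a Chat Completions request.

    Assumption: the backend uses the Azure OpenAI "On Your Data" format.
    All argument values passed below are placeholders, not real resources.
    """
    return {
        "type": "azure_search",
        "parameters": {
            "endpoint": endpoint,
            # Values copied from the created index in the portal.
            "index_name": index_name,
            "semantic_configuration": semantic_configuration,
            # Hybrid vector + semantic ranking; assumes the semantic ranker is enabled.
            "query_type": "vector_semantic_hybrid",
            # Embedding deployment used to vectorize the user query.
            "embedding_dependency": {
                "type": "deployment_name",
                "deployment_name": embedding_deployment,
            },
            "authentication": {"type": "api_key", "key": api_key},
        },
    }


example = build_search_data_source(
    endpoint="https://<search-service>.search.windows.net",  # placeholder
    index_name="<index-name>",                                # placeholder
    semantic_configuration="<semantic-configuration-name>",   # placeholder
    embedding_deployment="text-embedding-ada-002",
    api_key="<search-api-key>",                               # placeholder
)
print(json.dumps(example, indent=2))
```

Keeping the two copied values as explicit arguments makes it easier to keep the Repository Administrator and the App Service settings in sync when an index is recreated under a new name.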

11. Go to Indexer Settings and configure the following parameters. Set “Schedule” according to how quickly new or updated repository content must be indexed; the AI agent cannot access new or updated repository content until it has been indexed. Enable indexer caching and select the previously created storage account to store the indexing cache data.
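
For teams that later script these settings instead of using the portal, the sketch below shows how a schedule interval could be expressed in the indexer definition. It assumes the Azure AI Search REST indexer format, where `schedule.interval` is an ISO 8601 duration between 5 minutes and 1 day; the `cache` block is a preview feature and is included only as an assumption about how the caching option maps to the definition. The function names and the connection string are hypothetical.

```python
from datetime import timedelta

# Documented limits for indexer schedules (assumption: unchanged in your API version).
MIN_INTERVAL = timedelta(minutes=5)
MAX_INTERVAL = timedelta(days=1)


def to_iso8601_duration(interval: timedelta) -> str:
    """Convert a timedelta to an ISO 8601 duration such as PT10M or P1D.

    Seconds are truncated, since schedules are expressed in whole minutes here.
    """
    if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
        raise ValueError("interval must be between 5 minutes and 1 day")
    total_minutes = int(interval.total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    date_part = f"{days}D" if days else ""
    time_part = (f"{hours}H" if hours else "") + (f"{minutes}M" if minutes else "")
    return "P" + date_part + ("T" + time_part if time_part else "")


def indexer_settings(interval: timedelta, cache_connection_string=None) -> dict:
    """Return the schedule (and optional cache) fragment of an indexer definition."""
    body = {"schedule": {"interval": to_iso8601_duration(interval)}}
    if cache_connection_string:
        # Assumption: preview "cache" property for incremental enrichment.
        body["cache"] = {
            "storageConnectionString": cache_connection_string,
            "enableReprocessing": True,
        }
    return body


print(indexer_settings(timedelta(minutes=10)))  # busy repository
print(indexer_settings(timedelta(days=1)))      # low-change repository
```

The two calls correspond to the "every 10 minutes" and "daily" schedules mentioned earlier and produce the intervals `PT10M` and `P1D`.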

12. Go to Skillsets to make changes for performance and cost optimization. Select the skillset used by the indexer and update it with the suggested optimizations.

Update the following skillset fields:

1. maximumPageLength – set to 3000

2. pageOverlapLength – set to 300

3. maximumPagesToTake – set to 10
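
To make the effect of these three values concrete, the sketch below applies them to the text split skill in a skillset definition and estimates how much of each document ends up chunked. It assumes the skillset JSON has been exported from the portal, that the split skill has the type `#Microsoft.Skills.Text.SplitSkill`, and that lengths are measured in characters; the sample skillset and helper names are hypothetical.

```python
import copy

SPLIT_SKILL_TYPE = "#Microsoft.Skills.Text.SplitSkill"

# Values suggested in this guide.
SUGGESTED = {
    "maximumPageLength": 3000,
    "pageOverlapLength": 300,
    "maximumPagesToTake": 10,
}


def apply_split_settings(skillset: dict) -> dict:
    """Return a copy of the skillset with the suggested split settings applied."""
    updated = copy.deepcopy(skillset)
    for skill in updated.get("skills", []):
        if skill.get("@odata.type") == SPLIT_SKILL_TYPE:
            skill.update(SUGGESTED)
    return updated


def max_characters_covered(page_length: int, overlap: int, pages: int) -> int:
    """Upper bound on the characters of one document covered by the chunks."""
    if not 0 <= overlap < page_length:
        raise ValueError("overlap must be non-negative and smaller than page_length")
    if pages < 1:
        raise ValueError("pages must be at least 1")
    # The first page is full length; each later page adds only its non-overlapping part.
    return page_length + (pages - 1) * (page_length - overlap)


# Hypothetical skillset, trimmed to the fields that matter here.
sample = {
    "name": "<skillset-name>",
    "skills": [
        {"@odata.type": SPLIT_SKILL_TYPE, "textSplitMode": "pages",
         "maximumPageLength": 2000, "pageOverlapLength": 500},
        {"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill"},
    ],
}

print(apply_split_settings(sample)["skills"][0])
print(max_characters_covered(3000, 300, 10))  # 27300
```

With these settings at most 10 chunks are embedded per document, covering up to roughly 27,300 characters; the real skill may cut pages at sentence boundaries, so actual coverage can be lower. Content beyond that limit would not be chunked, which is the trade-off for the lower embedding cost.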