In this blog, I will show you how you can build your own custom classification model using the Document Intelligence Studio. If this is your first time looking at the Document Intelligence Studio, I recommend reading this blog first: First look at Document Intelligence Studio
Blogs
This blog is part of a series of blogs about the capabilities of the Azure Document Intelligence resource.
Requirements
- Azure ‘Document Intelligence’ or ‘Azure AI services multi-service account’ resource
- Azure ‘Storage account’
- A minimum of 10 documents (at least 5 per category, and at least 2 categories)
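Before uploading anything, it can help to check locally that your training set meets the minimums listed above. The snippet below is only a minimal sketch: the function name `check_training_set` and the idea of keeping a file-to-category mapping are my own assumptions for illustration and are not part of the Studio.

```python
from collections import Counter

# Minimums taken from the requirements list above.
MIN_CATEGORIES = 2
MIN_DOCS_PER_CATEGORY = 5


def check_training_set(labels):
    """Return a list of problems for a {file name: category} mapping.

    An empty list means the set meets the minimums. This is a hypothetical
    helper for illustration, not a Document Intelligence API.
    """
    counts = Counter(labels.values())
    problems = []
    if len(counts) < MIN_CATEGORIES:
        problems.append(
            f"need at least {MIN_CATEGORIES} categories, found {len(counts)}"
        )
    for category, count in sorted(counts.items()):
        if count < MIN_DOCS_PER_CATEGORY:
            problems.append(
                f"category '{category}' has {count} documents, "
                f"need at least {MIN_DOCS_PER_CATEGORY}"
            )
    return problems
```

Calling `check_training_set` with five invoices and four receipts would report one problem, for the receipt category.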
Project setup
Open the custom classification model page in Document Intelligence Studio (1). Scroll down the page and select Create a project in the My projects list (2).
This will open a wizard with four steps for setting up your project.
- Provide a name and description for the project.
- Select the Document Intelligence resource and API version you want to use.
- Select the storage account, container, and folder to store the training data.
- Review and create your project.
Training
Upload the files you want to use to train your model (1). Upload at least 5 documents for each category you wish to create. You can see the files you uploaded in the middle of the screen (2).
Use the Add type button (3) to define the categories you wish to use in this model.
Then select each document and assign it to the corresponding category.
When documents are assigned to the correct category, you can train your model by selecting the Train button in the top right corner of the screen.
Be patient: training a model can take a little while.
You can see the status of your model by navigating to the Models tab from the menu on the left of the screen.
Testing
When the model is ready, you can test it using the Test (1) option in the menu. Upload the file you want to use to test your model (2). Do not use the same files you used to train the model. Select Run analysis (3) on the file. On the right side of the screen you will see the results of the analysis (4).
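If you copy the analysis result as JSON, you can also read the predicted category in code. The sketch below assumes the result contains a `documents` list whose entries carry `docType` and `confidence` fields, which is how the classifier response looks as far as I know. The helper name `top_category` and the sample values are made up for illustration.

```python
def top_category(analyze_result):
    """Return (docType, confidence) of the most confident document, or None.

    Assumes `analyze_result` is the parsed `analyzeResult` object with a
    `documents` list; check this against the JSON you actually get back.
    """
    documents = analyze_result.get("documents", [])
    if not documents:
        return None
    best = max(documents, key=lambda doc: doc.get("confidence", 0.0))
    return best["docType"], best["confidence"]


# Made-up sample, only to show the shape the helper expects.
sample_result = {
    "documents": [
        {"docType": "invoice", "confidence": 0.91},
        {"docType": "receipt", "confidence": 0.07},
    ]
}
category, confidence = top_category(sample_result)
```

A low confidence value is usually a sign that the category needs more or more varied training documents.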
File types
At the time of writing, the Document Intelligence Studio can only classify the following file types: .jfif, .pjpeg, .jpeg, .pjp, .jpg, .pdf, .png, .tif, and .tiff.
In API versions 2023-10-31-preview, 2024-02-29-preview, and later, you can train a custom classification model on Microsoft Office documents (.docx, .xlsx, and .pptx), but only by using the REST API.
If you want to know more about how to do this, then read my blog about Training a custom classification model with Word or Excel Documents.
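To filter a folder of files before uploading them to the Studio, you could check each extension against the list above. This is a small sketch; the names `STUDIO_EXTENSIONS` and `is_studio_supported` are my own and only mirror the file types listed in this blog at the time of writing.

```python
from pathlib import Path

# File types the Studio could classify at the time of writing (see above).
STUDIO_EXTENSIONS = {
    ".jfif", ".pjpeg", ".jpeg", ".pjp", ".jpg",
    ".pdf", ".png", ".tif", ".tiff",
}


def is_studio_supported(filename):
    """Return True if the file extension is in the list above (case-insensitive)."""
    return Path(filename).suffix.lower() in STUDIO_EXTENSIONS
```

For example, `is_studio_supported("scan.PDF")` returns `True`, while `is_studio_supported("report.docx")` returns `False`.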
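To give a rough idea of what the REST route involves, the sketch below only builds the URL and JSON body for a classifier build request and does not send anything. It assumes the `documentClassifiers:build` operation and the `classifierId`, `docTypes`, and `azureBlobSource` body fields as I understand them for the preview API versions; please verify them against the REST reference for your API version. The endpoint, container URL, classifier name, and folder prefixes are placeholders.

```python
import json


def build_classifier_request(endpoint, api_version, classifier_id,
                             container_url, doc_type_prefixes):
    """Return (url, body) for a classifier build call.

    `doc_type_prefixes` maps a category name to the folder prefix in the
    container that holds its training documents. The field names are an
    assumption based on the preview REST API; nothing is sent here.
    """
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/"
        f"documentClassifiers:build?api-version={api_version}"
    )
    body = {
        "classifierId": classifier_id,
        "docTypes": {
            doc_type: {
                "azureBlobSource": {
                    "containerUrl": container_url,
                    "prefix": prefix,
                }
            }
            for doc_type, prefix in doc_type_prefixes.items()
        },
    }
    return url, body


# Placeholder values, for illustration only.
url, body = build_classifier_request(
    endpoint="https://example.cognitiveservices.azure.com/",
    api_version="2024-02-29-preview",
    classifier_id="my-office-classifier",
    container_url="https://example.blob.core.windows.net/training",
    doc_type_prefixes={"contract": "contracts/", "report": "reports/"},
)
payload = json.dumps(body, indent=2)
```

You would then POST `payload` to `url` with your resource key in the request headers, using the HTTP client of your choice.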