Passport Loader

Splitting a multipage PDF and extracting specific pages

Introduction

In this tutorial, you will learn how to split a multipage PDF file into single pages documents. You will be doing this using PassportPDF and C#.

You should have your machine already set up using instructions from our getting started guide.

Splitting PDF files with PassportPDF

Now, we’ll see how to use PassportPDF to split a multipage PDF document into multiple single-page documents. Then we’ll show how to download these documents and save them on a local disk.

We will use the following endpoints from PassportPDF:

  • DocumentLoadFromURIAsync to load a document from a URI.
  • ExtractPageAsync to extract single pages from the PDF document.
  • SaveDocumentAsync to save every single page as a separate document.

These endpoints are demonstrated in the following example:

using PassportPDF.Api;
using PassportPDF.Client;
using PassportPDF.Model;

namespace PDFSplitter
{

    public class PDFSplitter
    {
        static async Task Main(string[] args)
        {
            GlobalConfiguration.ApiKey = "YOUR-PASSPORT-CODE";

            PassportManagerApi apiManager = new();
            PassportPDFPassport passportData = await apiManager.PassportManagerGetPassportInfoAsync(GlobalConfiguration.ApiKey);

            if (passportData == null)
            {
                throw new ApiException("The Passport number given is invalid, please set a valid passport number and try again.");
            }
            else if (passportData.IsActive is false)
            {
                throw new ApiException("The Passport number given not active, please go to your PassportPDF dashboard and active your plan.");
            }

            string uri = "https://passportpdfapi.com/test/multiple_pages.pdf";

            DocumentApi api = new();

            Console.WriteLine("Loading document into PassportPDF...");
            DocumentLoadResponse document = await api.DocumentLoadFromURIAsync(new LoadDocumentFromURIParameters(uri));
            Console.WriteLine("Document loaded.");

            PDFApi pdfApi = new();

            Console.WriteLine("Splitting PDF into single-page documents..");

            string pagesToExtract = "*";

            PdfExtractPageResponse pdfExtractPageResponse = await pdfApi.ExtractPageAsync(new PdfExtractPageParameters(document.FileId, pagesToExtract)
            {
                ExtractAsSeparate = true
            });

            if (pdfExtractPageResponse.Error is not null)
            {
                throw new ApiException(pdfExtractPageResponse.Error.ExtResultMessage);
            }
            else
            {
                Console.WriteLine("Splitting PDF document is done.");
            }

            // Download every page as a separate document
            Console.WriteLine("Downloading single page documents..");
            try
            {
                for(int i=0; i< pdfExtractPageResponse.FileIds.Count; i++)
                {
                    var pageId = pdfExtractPageResponse.FileIds[i];

                    PdfSaveDocumentResponse savePageResponse = await pdfApi.SaveDocumentAsync(new PdfSaveDocumentParameters(pageId));

                    string savePath = Path.Join(Directory.GetCurrentDirectory(), "extracted_page_" + (i+1).ToString() + ".pdf");

                    File.WriteAllBytes(savePath, savePageResponse.Data);

                    Console.WriteLine("Done downloading extracted page. Document has been saved in : {0}", savePath);
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Could not download single pages! : {0}", ex.Message);
            }

        }
    }
}

For the complete .NET project, please visit the github repo of this tutorial.

Some important points to notice in the code:

  • When making a request to the ExtractPageAsync endpoint, you need to set the parameter ExtractAsSeparate to “true.” This will create different “file IDs” for each page, allowing you to save each page separately.

After running the sample code above, you should have 4 PDF files saved on your local machine. Each file contains one page from the original PDF document.

Extracting specific pages from the document

If you would like to extract some specific pages from your PDF document, then you can do so by changing the PageRange argument value in the PdfExtractPageParameters to include the numbers of the specific pages that you would like to extract.

For example, if you want to extract only page 2, page 3, and page 4 from the previous document, then in the previous code, you need to change this part:

string pagesToExtract = "*";

To this:

string pagesToExtract = "2,3,4";

Final remarks

If you don’t set the parameter ExtractAsSeparate to true, then no IDs will be generated for each page, and you will get only one ID representing the whole document. This means that you will download the same multipage document you started with.

For more information regarding the use of this endpoint, as well as other PassportPDF endpoints, please visit the API reference page.