Validating a PDF/A document using Python
Introduction
In this tutorial, you will learn how to validate a PDF/A document using the PassportPDF API and Python.
Your machine should already be set up following the instructions in the getting started guide.
The importance of PDF/A validation
PDF/A is the ISO standard for the long-term archiving of electronic documents (ISO 19005). It is also a complex format with many variations.
Because of this complexity, PDF/A converters differ in quality, and converted files may contain errors.
As long-term archiving is a crucial (and legal!) need for companies and organizations, it is important to be sure that archived files are fully compliant and do not contain errors that could prevent their use in the future. It can also be necessary to verify whether a PDF/A document is saved under a specific version (1, 2, 3, 4) and conformance level (a, b, u, e, f). This is why validation is necessary.
The PDF/A standard evolves over time. As it is a subset of PDF, new PDF/A versions have followed new releases of the PDF format:
- PDF/A-1 (a & b), ISO 19005-1:2005 - based on PDF 1.4.
- PDF/A-2 (a, b & u), ISO 19005-2:2011 - based on PDF 1.7.
- PDF/A-3 (a, b & u), ISO 19005-3:2012 - based on PDF 1.7.
- PDF/A-4 (e & f), ISO 19005-4:2020 - based on PDF 2.0.
You can read more about the features of each version and conformance level on the AvePDF Blog articles about PDF/A conversion and PDF/A-4 (the AvePDF web app is built with PassportPDF).
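The version list above can be captured in a small lookup table. The following is a minimal sketch, not part of the PassportPDF API: the names PDFA_VERSIONS and parse_conformance are hypothetical, and the parser only assumes labels shaped like "PDFA3u" (the form used later in this tutorial) or "PDF/A-3u". It only accepts the conformance levels listed above, so a label without a level letter is rejected.

```python
import re

# ISO part, base PDF version and conformance levels, as listed in this article.
PDFA_VERSIONS = {
    1: {"iso": "ISO 19005-1:2005", "base_pdf": "1.4", "levels": {"a", "b"}},
    2: {"iso": "ISO 19005-2:2011", "base_pdf": "1.7", "levels": {"a", "b", "u"}},
    3: {"iso": "ISO 19005-3:2012", "base_pdf": "1.7", "levels": {"a", "b", "u"}},
    4: {"iso": "ISO 19005-4:2020", "base_pdf": "2.0", "levels": {"e", "f"}},
}


def parse_conformance(label):
    """Split a label such as 'PDFA3u' or 'PDF/A-3u' into (version, level)."""
    match = re.fullmatch(r"PDF/?A-?([1-4])([a-z])", label.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized PDF/A label: {label!r}")
    version, level = int(match.group(1)), match.group(2).lower()
    if level not in PDFA_VERSIONS[version]["levels"]:
        raise ValueError(f"PDF/A-{version} has no conformance level {level!r}")
    return version, level


if __name__ == "__main__":
    version, level = parse_conformance("PDFA3u")
    info = PDFA_VERSIONS[version]
    print(f"PDF/A-{version}{level}: {info['iso']}, based on PDF {info['base_pdf']}")
```

A helper like this can be handy for rejecting a mistyped target conformance level before sending any request.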
PDF/A, just like PDF, is built to keep backward compatibility with earlier versions. Once a file is saved as valid PDF/A, it remains compliant with the version it was saved under, even when newer versions are released. In most cases, you shouldn’t need to update it to the latest version of the standard.
However, in some contexts (like when creating hybrid invoices that require the use of PDF/A-3), it is necessary to verify and modify the version and the conformance level.
Regardless of the version and conformance level, PDF/A documents need to conform to specific rules. This is to ensure that their content will stay the same over the years, no matter the device or OS used to open them. This means that:
- All fonts must be embedded, and must be legally embeddable for unlimited, universal rendering.
- External content references are forbidden.
- Use of standard-based metadata is required.
- Encryption is forbidden.
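Two of these rules leave traces that can be spotted in the raw bytes of a file before it is sent for validation. The sketch below is only a rough heuristic under stated assumptions: it assumes the XMP metadata packet is stored uncompressed and declares the PDF/A version through the pdfaid:part and pdfaid:conformance properties, and it treats any occurrence of /Encrypt as a hint of encryption. The function name quick_pdfa_hints is hypothetical, and the result says what a file claims, not whether it is compliant.

```python
import re

# Matches both XMP serializations: attribute form (pdfaid:part="3")
# and element form (<pdfaid:part>3</pdfaid:part>).
_PDFAID = re.compile(rb'''pdfaid:(part|conformance)\s*(?:=\s*["']|>)\s*([0-9A-Za-z]+)''')


def quick_pdfa_hints(pdf_bytes):
    """Return rough hints about the PDF/A claims found in raw PDF bytes."""
    claims = {}
    for key, value in _PDFAID.findall(pdf_bytes):
        # Keep the first occurrence of each property.
        claims.setdefault(key.decode(), value.decode())
    return {
        "claimed_part": claims.get("part"),
        "claimed_conformance": claims.get("conformance"),
        # Heuristic only: an /Encrypt entry usually means the file is encrypted.
        "mentions_encrypt": b"/Encrypt" in pdf_bytes,
    }


if __name__ == "__main__":
    sample = b'<pdfaid:part>3</pdfaid:part><pdfaid:conformance>U</pdfaid:conformance>'
    print(quick_pdfa_hints(sample))
```

A check like this does not replace validation: the font and external-reference rules cannot be judged this way, which is why a full validator is still needed.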
How to use PassportPDF API and Python to validate a PDF/A document
Now we’re going to cover how to use the PassportPDF API to check the version and conformance level of a PDF/A document. If the document has the right parameters, we don’t need to make any changes. Otherwise, we will convert it.
We will use the following endpoints:
- DocumentLoadFromURI to load a document from a URI.
- ValidatePDFA to validate the PDF/A conformance level.
- ConvertToPDFA to convert a document to PDF/A format with a chosen conformance level.
The code below illustrates how to use these endpoints to validate a PDF/A file:

```python
""" PDF/A validation tutorial """
import requests

if __name__ == "__main__":
    endpoint = "https://passportpdfapi.com/api/document/DocumentLoadFromURI"
    headers = {
        "X-PassportPDF-API-Key": "YOUR-PASSPORT-CODE",
    }
    data = {
        "URI": "https://passportpdfapi.com/test/pdfa_file.pdf"
    }

    # Load the document from its URI
    response = requests.post(endpoint, json=data, headers=headers)
    if response.status_code == 200:
        json_response = response.json()
        file_id = json_response["FileId"]

        # Validate the document standard
        data = {
            "FileId": file_id,
            "Conformance": "AutoDetect"
        }
        validate_pdfa_endpoint = "https://passportpdfapi.com/api/pdf/ValidatePDFA"
        validate_pdfa_response = requests.post(validate_pdfa_endpoint, json=data, headers=headers)
        if validate_pdfa_response.status_code == 200:
            json_response = validate_pdfa_response.json()
            is_valid = json_response["IsValid"]
            conformance = json_response["Conformance"]
            target_conformance = "PDFA3u"
            print("PDF/A document has conformance level : ", conformance)
            if conformance == target_conformance:
                print(f"PDF/A document has the desired conformance level which is : {target_conformance}")
            else:
                print(f"PDF/A document does not have {target_conformance} conformance level, running conversion process...")
                # Convert to the target conformance level
                data = {
                    "FileId": file_id,
                    "Conformance": target_conformance
                }
                convert_endpoint = "https://passportpdfapi.com/api/pdf/ConvertToPDFA"
                convert_response = requests.post(convert_endpoint, json=data, headers=headers)
                if convert_response.status_code == 200:
                    # Validate again, this time against the target conformance level
                    validate_pdfa_response = requests.post(validate_pdfa_endpoint, json=data, headers=headers)
                    json_response = validate_pdfa_response.json()
                    is_valid = json_response["IsValid"]
                    conformance = json_response["Conformance"]
                    if conformance == target_conformance:
                        print(f"PDF/A document now has {target_conformance} conformance level.")
                    else:
                        print(f"PDF/A document was not correctly converted to {target_conformance} conformance level!!")
                else:
                    print("Could not convert the document.")
        else:
            print("Something went wrong when trying to validate PDFA conformance level.")

        # Close document
        close_document_endpoint = "https://passportpdfapi.com/api/document/DocumentClose"
        close_response = requests.post(close_document_endpoint, json={"FileId": file_id}, headers=headers)
        if close_response.status_code == 200:
            print("Document closed successfully.")
        else:
            print("Could not close document!")
    else:
        print("Something went wrong when trying to load the document!")
```
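The same load, validate, convert and close sequence can also be written with a small helper, which removes the repeated status checks and closes the document even when a step fails. This is a sketch under assumptions rather than an official client: call_api and ensure_conformance are hypothetical names, the endpoints and JSON fields are the ones used in this tutorial, and the optional post argument exists only so that a stand-in for requests.post can be supplied when testing without network access.

```python
BASE_URL = "https://passportpdfapi.com/api"


def call_api(path, payload, api_key, post=None):
    """POST a JSON payload to an endpoint and return the decoded JSON body."""
    if post is None:
        import requests  # imported lazily so a stand-in `post` can be injected
        post = requests.post
    response = post(
        f"{BASE_URL}/{path}",
        json=payload,
        headers={"X-PassportPDF-API-Key": api_key},
    )
    if response.status_code != 200:
        raise RuntimeError(f"{path} failed with HTTP {response.status_code}")
    return response.json()


def ensure_conformance(uri, target, api_key, post=None):
    """Load a document, convert it if needed, and return its final conformance."""
    loaded = call_api("document/DocumentLoadFromURI", {"URI": uri}, api_key, post)
    file_id = loaded["FileId"]
    try:
        report = call_api(
            "pdf/ValidatePDFA",
            {"FileId": file_id, "Conformance": "AutoDetect"},
            api_key,
            post,
        )
        if report["Conformance"] != target:
            payload = {"FileId": file_id, "Conformance": target}
            call_api("pdf/ConvertToPDFA", payload, api_key, post)
            report = call_api("pdf/ValidatePDFA", payload, api_key, post)
        return report["Conformance"]
    finally:
        # Runs on success and on failure, so the document is always closed.
        call_api("document/DocumentClose", {"FileId": file_id}, api_key, post)
```

Raising an exception on a non-200 response keeps the main flow flat. The trade-off is that a failed DocumentClose call also raises, which may or may not be what you want in a batch job.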
When we ran this script, the first validation showed that the PDF/A file did not have the PDF/A-3u conformance level. The script therefore ran the conversion process using the ConvertToPDFA endpoint. It then checked whether the converted document had the right conformance level, and this time it did.
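Note that the script reads the IsValid field but bases its decision only on Conformance. A stricter decision could take both into account. The helper below is a hypothetical sketch that assumes the ValidatePDFA response carries the IsValid and Conformance fields used above, and that IsValid is a boolean.

```python
def needs_conversion(report, target):
    """Return True unless the report is valid AND matches the target level."""
    is_valid = report.get("IsValid") is True
    matches_target = report.get("Conformance") == target
    return not (is_valid and matches_target)


if __name__ == "__main__":
    report = {"IsValid": False, "Conformance": "PDFA3u"}
    if needs_conversion(report, "PDFA3u"):
        print("Conversion needed.")
```

With this check, a file that claims the right conformance level but fails validation is converted again instead of being accepted as is.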
Final remarks
Many organizations require PDF/A standards for their archives.
The PassportPDF API streamlines document validation and conversion and helps with compliance. This tutorial has shown how to validate a PDF/A document in Python and, when needed, convert it to the required conformance level.
For more information about the endpoints used in this tutorial, please visit the PassportPDF API reference.