Extracting barcode information from PDF documents
Introduction
In this tutorial, we will look at how to use PassportPDF to extract the information embedded in a barcode.
Your machine should already be set up following the instructions from the previous article, and you should also have requests_toolbelt installed.
Extracting barcode information from PDF documents
In this tutorial, we are going to:
- Load a document from a URI using the api/document/DocumentLoadFromURI endpoint.
- Extract barcode information using the api/pdf/ReadBarcodes endpoint.
- Close the document using the api/document/DocumentClose endpoint.
These steps are shown in the code below:
import requests

if __name__ == "__main__":
    endpoint = "https://passportpdfapi.com/api/document/DocumentLoadFromURI"
    headers = {
        "X-PassportPDF-API-Key": "YOUR-PASSPORT-CODE",
    }
    data = {
        "URI": "https://passportpdfapi.com/test/invoice_with_barcode.pdf"
    }
    response = requests.post(endpoint, json=data, headers=headers)
    if response.status_code == 200:
        json_response = response.json()
        file_id = json_response["FileId"]
        data = {
            "FileId": file_id,
            "PageRange": "1"
        }
        # Extract barcode
        read_barcodes_endpoint = "https://passportpdfapi.com/api/pdf/ReadBarcodes"
        barcodes_response = requests.post(read_barcodes_endpoint, json=data, headers=headers)
        if barcodes_response.status_code == 200:
            print(barcodes_response.json())
        else:
            print("Something went wrong when trying to read barcodes!")
        # Close document
        close_document_endpoint = "https://passportpdfapi.com/api/document/DocumentClose"
        close_response = requests.post(close_document_endpoint, json={"FileId": file_id}, headers=headers)
        if close_response.status_code == 200:
            print("Document closed successfully.")
        else:
            print("Could not close document!")
    else:
        print("Something went wrong!")
The first step passes the document to the API, and the response contains a field called “FileId”. This file ID allows you to perform all sorts of operations on your document: you just include it in your subsequent requests, as the code does when it calls the ReadBarcodes and DocumentClose endpoints.
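Because every later request depends on the file ID, and the document should be closed once you are done with it, it can help to wrap the load and close calls in a context manager. The sketch below is one possible way to do that and is not part of any PassportPDF client: the names `loaded_document`, `BASE_URL` and `post` are hypothetical, and `post` is assumed to be a callable with the same interface as `requests.post`. The endpoints and the `FileId` field are the ones used in the code above.

```python
from contextlib import contextmanager

BASE_URL = "https://passportpdfapi.com/api"


@contextmanager
def loaded_document(post, uri, headers):
    """Load a document by URI, yield its file ID, and always close it.

    `post` is assumed to behave like requests.post (hypothetical parameter).
    """
    response = post(f"{BASE_URL}/document/DocumentLoadFromURI",
                    json={"URI": uri}, headers=headers)
    if response.status_code != 200:
        raise RuntimeError(f"Could not load document (HTTP {response.status_code})")
    file_id = response.json()["FileId"]
    try:
        yield file_id
    finally:
        # Runs even if the caller's block raises, so the document is not left open.
        post(f"{BASE_URL}/document/DocumentClose",
             json={"FileId": file_id}, headers=headers)


# Example usage (requires the requests package and a valid API key):
#
#   import requests
#   headers = {"X-PassportPDF-API-Key": "YOUR-PASSPORT-CODE"}
#   uri = "https://passportpdfapi.com/test/invoice_with_barcode.pdf"
#   with loaded_document(requests.post, uri, headers) as file_id:
#       r = requests.post(f"{BASE_URL}/pdf/ReadBarcodes",
#                         json={"FileId": file_id, "PageRange": "1"},
#                         headers=headers)
#       print(r.json())
```

Passing `post` in as an argument keeps the helper independent of the network, so it can be exercised with a stand-in function before you spend any tokens on real requests.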
The document that we passed to the API (invoice_with_barcode.pdf) is an invoice with a barcode on it.
The barcode contains a message, which we extracted using the code above.
Now, let’s take a look at the response we received.
It gives us a lot of information about the barcode: its type, its symbology, its coordinates, and the message or data embedded in it. In this case the data is: https://passportpdf.com/
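The exact field names in the ReadBarcodes response are not listed in this tutorial, so the following is only a sketch: a small helper that walks the parsed JSON and collects every value stored under a given key, however deeply it is nested. The helper name `find_values`, the key `"Data"` and the sample dictionary are all illustrative assumptions. Print the real response first and substitute the key names you actually see.

```python
def find_values(node, key):
    """Return every value stored under `key` anywhere in nested dicts/lists."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep descending: the same key may also appear deeper down.
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# Made-up sample shaped like a generic JSON response (NOT the real schema).
sample = {
    "Barcodes": [
        {"Type": "QR", "Data": "https://passportpdf.com/"},
    ],
}
print(find_values(sample, "Data"))  # ['https://passportpdf.com/']
```

With the tutorial's code, the helper would be called on `barcodes_response.json()` instead of `sample`.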
As with any other endpoint in the PassportPDF API, the response also includes the number of remaining tokens.
Final remarks
Using the barcode reader endpoint from PassportPDF, you can read all sorts of barcodes, including:
- Linear barcodes.
- QR codes.
- Micro QR codes.
- Data Matrix.
- PDF417.
- Aztec.
- MaxiCode.
If you combine this tutorial with the previous one, you can see how much PassportPDF allows you to do: you can extract the information you need from scanned form-like documents such as invoices, purchase orders, and receipts.