Extract Text From a PDF Using Python pdftotext

In this tutorial, we will introduce a simple way to extract text from a PDF file in Python using the pdftotext library.

1. Install pdftotext

pip install pdftotext

pdftotext is built against the Poppler library, so the Poppler development files and a C++ compiler need to be installed on the system before running this command.

2. Import library

import pdftotext

3. Open the PDF file in binary mode

pdf_file = open("test.pdf", "rb")

4. Extract text from the PDF file

gvj_pdf = pdftotext.PDF(pdf_file)
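
The object returned by pdftotext.PDF behaves like a read-only sequence of page strings, so it works with len() and zero-based indexing. Below is a minimal sketch of that idea. The helper name describe_pages is hypothetical (it is not part of pdftotext), and a plain list stands in for the PDF object so the snippet runs without a real file:

```python
def describe_pages(pages):
    """Return (page_count, first_page_text) for a sequence of page strings.

    `pages` can be a pdftotext.PDF object or, as in this demo, a plain list.
    """
    page_count = len(pages)
    # Guard against an empty document before indexing the first page.
    first_page = pages[0] if page_count else ""
    return page_count, first_page


# Stand-in data (assumption); with a real file you would pass gvj_pdf instead.
sample_pages = ["Page one text", "Page two text"]
count, first = describe_pages(sample_pages)
print(count)
print(first)
```

With the objects from this tutorial, the call would be describe_pages(gvj_pdf).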

5. Print the text of each page

for page in gvj_pdf:  # iterate over every page in the PDF
    print(page)
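
Instead of printing page by page, the pages can also be joined into one string and written to a text file. The following is a rough sketch under a few assumptions: save_pages is a hypothetical helper, the blank-line separator is only one possible choice, and a plain list of strings stands in for the PDF object:

```python
import os
import tempfile


def save_pages(pages, out_path, separator="\n\n"):
    """Join page strings with `separator` and write them to `out_path`.

    `pages` can be a pdftotext.PDF object or any iterable of strings.
    Returns the joined text so the caller can reuse it.
    """
    text = separator.join(pages)
    with open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
    return text


# Stand-in data (assumption); with a real file you would pass gvj_pdf instead.
sample_pages = ["Page one text", "Page two text"]
out_path = os.path.join(tempfile.gettempdir(), "pdf_text_demo.txt")
save_pages(sample_pages, out_path)
```

With the objects from this tutorial, the call would be save_pages(gvj_pdf, "test.txt").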

6. Close the PDF file

pdf_file.close()
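
The open and close steps can also be handled by a with statement, which closes the file automatically even if an error occurs. The sketch below wraps the whole tutorial in one function. The name extract_pages and its loader parameter are assumptions made for illustration: loader defaults to pdftotext.PDF, and exists only so that the function can be tried without the library installed.

```python
def extract_pages(path, loader=None):
    """Open `path` in binary mode and return its pages as a list of strings.

    `loader` is a hypothetical hook: any callable that takes an open binary
    file and returns an iterable of page strings. It defaults to pdftotext.PDF.
    """
    if loader is None:
        import pdftotext  # imported lazily, only when the default is needed
        loader = pdftotext.PDF
    # The with block closes the file on exit, so no explicit close() is needed.
    with open(path, "rb") as pdf_file:
        # Copy the pages into a list while the file is still open.
        return list(loader(pdf_file))
```

With pdftotext installed, extract_pages("test.pdf") returns the page texts, which can then be printed in a loop as in step 5.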