Python File Processing: Extract Text From DOCX Using Mammoth

In this tutorial, we will use an example to show how to extract text from a docx file using the Python mammoth library.

1. Install the mammoth library

pip install mammoth
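
To confirm that the install worked, we can ask Python which version of the package it sees. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); installed_version is a hypothetical helper name chosen for this example, not part of mammoth.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing.

    installed_version is a hypothetical helper for this tutorial.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if mammoth is installed, otherwise None
print(installed_version("mammoth"))
```

If this prints None, run the pip command again in the same Python environment you use to run the script.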

2. Import the library

import mammoth

3. Open a docx file

with open(input_filename, "rb") as docx_file:

4. Extract text from the docx file

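# Extraction must happen inside the with block from step 3, while docx_file is open
with open(input_filename, "rb") as docx_file:
    result = mammoth.extract_raw_text(docx_file)
    text = result.value  # The raw text

# Save the extracted text; UTF-8 avoids encoding errors on non-ASCII characters
with open("output.txt", "w", encoding="utf-8") as text_file:
    text_file.write(text)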
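
According to the mammoth documentation, extract_raw_text() ignores all formatting and follows each paragraph with two newlines. Assuming that convention, the raw text can be split back into paragraphs with plain string handling. In this sketch, split_paragraphs is a hypothetical helper and sample stands in for result.value.

```python
def split_paragraphs(raw_text):
    """Split raw text into a list of non-empty paragraphs.

    Assumes paragraphs are separated by a blank line (two newlines).
    """
    return [part.strip() for part in raw_text.split("\n\n") if part.strip()]


# sample is a stand-in for result.value from the step above
sample = "First paragraph.\n\nSecond paragraph.\n\n"
paragraphs = split_paragraphs(sample)
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

This is handy when you want to process the document paragraph by paragraph instead of writing it out as one block.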

In this tutorial, we use the mammoth.extract_raw_text() function to extract the text from a docx file. Then, we save it to the output.txt file.
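
The steps above can also be wrapped in a single reusable function. This is only a sketch: docx_to_text is a hypothetical helper name, and the file names in the usage line are placeholders. It uses the same mammoth.extract_raw_text() call as above and also returns result.messages, the list of messages mammoth attaches to its result.

```python
from pathlib import Path


def docx_to_text(input_filename, output_filename="output.txt"):
    """Extract the raw text of a .docx file and save it as UTF-8 text.

    docx_to_text is a hypothetical helper, not part of mammoth.
    Returns the extracted text and any messages mammoth reported.
    """
    # Imported here so a missing install only fails when the helper is called
    import mammoth

    # mammoth reads from a file object opened in binary mode
    with open(input_filename, "rb") as docx_file:
        result = mammoth.extract_raw_text(docx_file)

    text = result.value  # the raw text, with formatting ignored
    Path(output_filename).write_text(text, encoding="utf-8")
    return text, result.messages
```

For example, text, messages = docx_to_text("example.docx", "example.txt") would write the text of example.docx to example.txt and return it, assuming example.docx exists in the current directory.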