I am looking for a server-side way to convert a .doc file to .docx or PDF format using Python, without using win32com.client, comtypes, or an API. I am running this on Azure cloud services. If there is any other way, please help!
1
S.R Rahul
- What do you mean by "Azure cloud services"? Do you have SSH access to it? Is it a cloud-based app like Google Docs, or a VM running Linux/Windows? There are many approaches, depending on the answer. – Francesco Mantovani Mar 11 '20 at 10:24
- Thanks for the reply! It is a VM running Linux. – S.R Rahul Mar 12 '20 at 18:26
- Last question: is that VM running Windows 10 or Linux? – Francesco Mantovani Mar 12 '20 at 21:14
- VM running Linux. – S.R Rahul Mar 16 '20 at 07:28
- If you tell me why you cannot install 'win32com.client', I can give you more advice. – Francesco Mantovani Mar 17 '20 at 08:35
2 Answers
2
There are a few approaches:
- with unoconv:
    unoconv -d document --format=docx test.doc
- with lowriter:
    lowriter --convert-to docx test.doc
- with soffice:
    soffice --headless --convert-to docx test.doc
- with libreoffice:
    libreoffice --convert-to docx test.doc
You can run these commands directly from your terminal, but if you want, you can integrate them into Python as described here:
    #!/usr/bin/env python
    import glob
    import subprocess

    # Convert every .doc file in the current directory to .docx.
    for doc in glob.iglob("*.doc"):
        subprocess.call(['soffice', '--headless', '--convert-to', 'docx', doc])
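In the example I'm using soffice, but you can substitute lowriter or libreoffice with the same arguments. unoconv takes its own options, so the argument list has to change to match the unoconv command above.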
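Since the question also asks about PDF, here is a minimal sketch of the same idea wrapped in a function with a selectable target format. The names `build_convert_command` and `convert`, the 120-second timeout, and the output-file check are my own assumptions for illustration, not part of LibreOffice; only the `--headless`, `--convert-to` and `--outdir` options come from the commands above.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, fmt="docx", outdir=".", binary="soffice"):
    """Return the argument list for a headless LibreOffice conversion."""
    return [binary, "--headless", "--convert-to", fmt,
            "--outdir", str(outdir), str(src)]


def convert(src, fmt="docx", outdir=".", timeout=120):
    """Run the conversion and return the path of the converted file."""
    # check=True raises CalledProcessError on a non-zero exit status;
    # the timeout (an arbitrary choice here) guards against a hung process.
    subprocess.run(build_convert_command(src, fmt, outdir),
                   check=True, timeout=timeout)
    # LibreOffice names the output after the input file's stem.
    out = Path(outdir) / (Path(src).stem + "." + fmt)
    # The exit status alone may not be a reliable failure signal,
    # so also confirm that the output file was actually written.
    if not out.exists():
        raise RuntimeError(f"conversion produced no output: {out}")
    return out
```

With LibreOffice installed on the VM, `convert("test.doc", "pdf")` should write `test.pdf` to the current directory, and `convert("test.doc")` should write `test.docx`.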
Francesco Mantovani
-1
This needs LibreOffice to be installed.
    import os
    import subprocess
    import tempfile


    def doc2docx(content):
        """Convert .doc bytes to .docx bytes with LibreOffice."""
        with tempfile.TemporaryDirectory() as tmpdirname:
            filename = os.path.join(tmpdirname, 'filename.doc')
            with open(filename, 'wb') as doc:
                doc.write(content)
            # An argument list avoids shell quoting problems with the paths.
            subprocess.run(['soffice', '--headless', '--convert-to', 'docx',
                            '--outdir', tmpdirname, filename], check=True)
            # LibreOffice writes filename.docx into the output directory.
            with open(filename + 'x', 'rb') as docx:
                return docx.read()


    with open('test.doc', 'rb') as f:
        content = f.read()
    content = doc2docx(content)
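Because this depends on LibreOffice being present on the server, it may help to check for the executable before attempting a conversion. The sketch below is an assumption-laden illustration: `find_office_binary` is a hypothetical helper name, and the candidate list simply reuses the command names mentioned in the other answer.

```python
import shutil


def find_office_binary(candidates=("soffice", "libreoffice", "lowriter")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If `find_office_binary()` returns `None`, LibreOffice is probably not installed or not on the `PATH` of the user running the script, and the conversion call would fail.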
xmduhan