Extracting text/HTML out of Word (.docx) files

November 3, 2018
Been a ColdFusion Developer since 1996
Web Developer

Repositories

https://github.com/jmohler1970/WordExtractor

https://github.com/jmohler1970/WordExtractor_demo

Introduction

We are going to extract HTML from a Word (.docx) file.

.docx is an Office Open XML (OOXML) format. It is a ZIP archive containing several XML files, with the main document body stored in word/document.xml.
By unzipping the file and locating the appropriate XML file, we can process the data and generate HTML.
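
The unzip-and-parse idea can be sketched in a few lines. This is an illustrative Python sketch, not the ColdFusion code from the repositories above: the function name docx_to_html is hypothetical, and it assumes all we want is one <p> per Word paragraph, ignoring bold, italics, lists, tables, and images.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by the elements in word/document.xml
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def docx_to_html(docx_bytes: bytes) -> str:
    """Hypothetical helper: return one <p> per non-empty Word paragraph."""
    # A .docx file is a ZIP archive; the body text lives in word/document.xml
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # <w:p> is a paragraph; its text is split across <w:t> elements inside runs
    for p in root.iterfind(".//w:p", NS):
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        if text:
            paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)
```

To try it, read any .docx file as bytes and pass it in, for example docx_to_html(open("example.docx", "rb").read()), where example.docx is a placeholder name. Paragraphs nested inside text boxes may appear twice with this simple traversal, so treat it as a starting point only.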
Comments (4)
2018-11-07 21:13:37

Excellent content, James. This was really well done.

2018-11-07 21:58:03
In reply to David Byers's comment:

Glad you liked it!

2018-11-05 18:38:01

Well done, James! I really enjoyed this walkthrough. Been a loooooOOOOooooong time since I’ve seen any great CF tutorials. And I think I can use this. I’ve been wanting to convert my Word documents into markdown files. Thanks for the head start!

BTW: I had no idea docx files were really just zip files. (mind-blown)

2018-11-06 04:52:25
In reply to chrisg57685480's comment:

Glad you liked it!
