[DH] Manage Document Formats with Pandoc
Using a reference document to keep styles consistent, such as font family, font size and caption placement
Previously...
In my previous article in this series, I shared how Markdown documents and Pandoc can help with note-taking and writing documentation. Projects commonly require documents to follow a standardised format.
In this article, I will be sharing how to create a custom reference document to use for style management.
Getting Started
I will be using the same Markdown document, `sample.md`, to demonstrate the difference between Pandoc's default reference document and a custom reference document.
The document contains the following elements.
Paragraphs of content (dummy text)
Images
Tables
Pandoc base reference document
To see the style specifications Pandoc uses, you can retrieve its base reference document, which is `reference.docx` by default. The command only prints the content of `reference.docx`, so you need to redirect the output to a file. From the image below, you can see the styles applied to the different headings, captions, tables and other elements.
pandoc --print-default-data-file reference.docx > pandoc-reference.docx
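A `.docx` file is a zip archive whose style definitions live in `word/styles.xml`, so the style IDs can also be listed from the terminal. The sketch below is one way to do that, assuming `unzip` is installed; the helper names `extract_style_ids` and `list_docx_styles` are my own and not part of Pandoc.

```shell
# Read styles XML on stdin and print one style ID per line.
extract_style_ids() {
    grep -o 'w:styleId="[^"]*"' \
        | sed 's/^w:styleId="//; s/"$//' \
        | sort -u
}

# Print the style IDs defined in a .docx file.
list_docx_styles() {
    docx="$1"
    if [ ! -f "$docx" ]; then
        echo "file not found: $docx" >&2
        return 1
    fi
    unzip -p "$docx" word/styles.xml | extract_style_ids
}

# Inspect the file produced by the command above, if it exists.
if [ -f pandoc-reference.docx ]; then
    list_docx_styles pandoc-reference.docx
fi
```

Style IDs typically drop the spaces from the names shown in Word, so a style displayed as "Heading 1" is usually listed as `Heading1`.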
Sample Report
Without a custom reference document, this is how the Word document looks using Pandoc's base reference document.
Create Custom Reference Document
First, create a reference document with a different name from the base reference document. In this case, I named mine `custom-reference.docx`.
pandoc -o custom-reference.docx --print-default-data-file reference.docx
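Re-running the command above writes a fresh copy of the default styles to `custom-reference.docx`, which would discard any style edits already saved in it. A minimal guard against that, assuming a POSIX shell; the function name `create_reference_if_missing` is hypothetical.

```shell
# Create the reference document only when it does not exist yet,
# so that saved style edits are never overwritten.
create_reference_if_missing() {
    ref="$1"
    if [ -e "$ref" ]; then
        echo "$ref already exists, leaving it untouched"
        return 0
    fi
    if ! command -v pandoc >/dev/null 2>&1; then
        echo "pandoc not found on PATH" >&2
        return 1
    fi
    pandoc -o "$ref" --print-default-data-file reference.docx
}

create_reference_if_missing custom-reference.docx \
    || echo "reference document was not created" >&2
```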
Manage Custom Reference Document
There are different ways to modify the styles in the reference document. The changes I would like to make are:
Change the heading font colour from blue to black
Add borders to tables
Support block quotes
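Once the styles have been edited, a small test document makes it easy to check all three changes in one conversion. The sketch below writes one and converts it. The file names `style-check.md` and `style-check.docx` are placeholders, and the conversion step is skipped when Pandoc or `custom-reference.docx` is not available.

```shell
# Write a tiny Markdown file that exercises headings, a block quote and a table.
cat > style-check.md <<'EOF'
# Heading level 1

## Heading level 2

> A block quote, to check the block quote style.

| Item | Value |
|------|-------|
| A    | 1     |
| B    | 2     |
EOF

# Convert it with the custom reference document, if both are available.
if command -v pandoc >/dev/null 2>&1 && [ -f custom-reference.docx ]; then
    pandoc style-check.md \
        --reference-doc=custom-reference.docx \
        -o style-check.docx
fi
```

Opening `style-check.docx` in Word then shows whether the headings, table and block quote picked up the intended styles.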
Copy styles from existing document
The steps below are for a Mac. Please refer to this post for the steps on a Windows machine.
In the top navigation bar, go to "Format" > "Style"
When the "Style" prompt appears, click on "Organizer..."
In the "Organizer..." prompt, you can close the "Normal.dotm" file and open the desired file to copy styles between the selected documents.
Create/Modify styles from scratch
In general, it is recommended to modify only the styles used by Pandoc. These fall into three groups:
Paragraph styles
Character styles
Table style
You can refer to the official Pandoc documentation for more information [here](https://pandoc.org/MANUAL.html#option--reference-doc).
Modify Table Style
One of the trickier changes to make is to the table style. By default, the table does not have any borders. To modify the table borders in Microsoft Word, take the following steps:
Highlight the target table. The "Table Design" tab will become available in the toolbar.
Select "Modify Table Style".
In the "Modify Style" prompt, make the desired changes and click OK.
Remember to save the changes to the reference document.
Convert Markdown to Word Document using Reference Document
This command includes an extra option, `--reference-doc`, where you specify the path to a custom reference document for Pandoc to use for styling during conversion.
# Note the --reference-doc option should be provided
# with the filepath to the reference document
pandoc sample.md -t docx+native_numbering \
--reference-doc=custom-reference.docx \
-o sample-report.docx --trace
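If a project has several Markdown files that should share the same styling, the command can be wrapped in a loop. This is a sketch under the assumption that every `.md` file in the current directory should be converted; `docx_name` and `convert_with_reference` are hypothetical helper names.

```shell
# Derive the output name, e.g. notes.md -> notes.docx
docx_name() {
    printf '%s\n' "${1%.md}.docx"
}

# Convert one Markdown file using the given reference document.
convert_with_reference() {
    src="$1"
    ref="$2"
    if [ ! -f "$ref" ]; then
        echo "reference document not found: $ref" >&2
        return 1
    fi
    pandoc "$src" -t docx+native_numbering \
        --reference-doc="$ref" \
        -o "$(docx_name "$src")"
}

# Convert every Markdown file in the current directory.
if command -v pandoc >/dev/null 2>&1 && [ -f custom-reference.docx ]; then
    for f in *.md; do
        [ -e "$f" ] || continue   # no .md files: the glob stays literal
        convert_with_reference "$f" custom-reference.docx
    done
fi
```

Checking for the reference document up front stops the batch from running with a mistyped path.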
Final Output
Here's the end result of converting the same Markdown document with the custom reference document.
Caveats
Type of Word Processor
Your mileage may vary if you are not using the typical word processor, i.e. Microsoft Word.
Alternatives such as WPS Office, LibreOffice and Google Docs may handle styles differently.
For example, WPS Office does not seem to have an equivalent of "Table Style", so even with the custom reference document, the table style will not be applied as intended.
Tables and Captions
If you have very large tables, you will have to manually redistribute the width of the columns and/or rows to make the tables look more presentable.
The annotations used to generate the table captions exist specifically to guide Pandoc during conversion. The syntax may not be valid Markdown on your version control platform, e.g. GitLab, so take note if the Markdown document is intended for a wider audience.
While the table of figures and table of contents can be generated programmatically, I feel it's more intuitive to generate them through Microsoft Word's native feature.