How to export documentation

Pandoc

Because we use GitBook, all our documentation is in Markdown. When we want to export it to another format we use Pandoc, a command-line tool that converts documents from one format to another. The Pandoc website has a guide on how to install and use it.

Pandoc can convert Markdown to many formats, but currently we only convert Markdown to docx.
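As a minimal sketch of that Markdown-to-docx conversion, the shell functions below name both formats explicitly instead of relying on file extensions. The function names docx_name and md_to_docx and the path guide/intro.md are illustrative assumptions, not part of our tooling.

```shell
# Derive the output name: guide/intro.md becomes guide/intro.docx.
docx_name() {
    printf '%s\n' "${1%.md}.docx"
}

# Convert one Markdown file to docx, stating the input and output
# formats explicitly (hypothetical helper; requires pandoc on PATH).
md_to_docx() {
    pandoc --from markdown --to docx --output "$(docx_name "$1")" "$1"
}

# Usage (placeholder path):
#   md_to_docx guide/intro.md
```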

Lua filters

Images are a problem. Markdown allows images to be written as HTML, and GitBook inserts them that way, but Pandoc does not convert HTML images embedded in Markdown files. We therefore wrote a Lua filter, htmlImageParser.lua, that turns those HTML images into Pandoc image elements so that they can be written to the desired output format.

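-- Convert inline HTML <img> tags (as inserted by GitBook) into Pandoc Image elements.
function convertHtmlImageToMarkdownImage (elem)
    if elem.tag == "RawInline" and elem.format == "html" and elem.text:match '%<img' then
        local imgHtml = elem.text
        -- Pull the attributes out of the tag; a missing attribute becomes "".
        local alt = imgHtml:match('alt="(.-)"') or ""
        local src = imgHtml:match('src="(.-)"') or ""
        local title = imgHtml:match('title="(.-)"') or ""
        -- pandoc.Image takes (caption, src, title, attr): the alt text is the caption,
        -- not the fourth (attr) argument.
        return pandoc.Image({pandoc.Str(alt)}, src, title)
    end
    -- Returning nothing leaves any other raw inline element unchanged.
end

return {
    {RawInline = convertHtmlImageToMarkdownImage}
}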

To use a Lua filter, pass the path to its file with the --lua-filter option of the conversion command.

pandoc --lua-filter ./htmlImageParser.lua
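
One way to check the filter without generating a docx is to run a one-line sample through it and print the result as Markdown. This is only a sketch: the function name check_filter and the sample line (with logo.png) are assumptions made for illustration.

```shell
# Run a sample line containing an HTML image through the filter and
# print the converted Markdown so the result can be inspected by eye.
# The filter path defaults to ./htmlImageParser.lua.
check_filter() {
    filter=${1:-./htmlImageParser.lua}
    if [ ! -f "$filter" ]; then
        echo "filter not found: $filter" >&2
        return 1
    fi
    printf '%s\n' 'Logo: <img src="logo.png" alt="Logo" title="Our logo">' |
        pandoc --from markdown --to markdown --lua-filter "$filter"
}

# Usage:
#   check_filter ./htmlImageParser.lua
```

If the filter works, the printed line should contain a Markdown image rather than the original HTML tag.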

Example

With all the files on your local machine, open a terminal and run the following command:

pandoc example.md --file-scope=true --lua-filter=htmlImageParser.lua -o exampleOutput.docx
  • example.md: The original file we want to convert. To include more than one file in the final document, list them separated by spaces.

  • --file-scope: Makes Pandoc parse each input file individually before combining them, so that annotations and footnotes that reuse the same identifiers in different files still work as expected. See the Pandoc options documentation.

  • --lua-filter: As mentioned in the previous section, this runs our filter code to transform the images into a valid format.

  • -o exampleOutput.docx: The name of the generated file. Since we do not specify the output format, Pandoc infers it from the file extension.
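
For documents built from several files, the command above can be wrapped in a small helper. The sketch below assumes the filter sits in the current directory; the function name export_docs and the file names in the usage line are hypothetical.

```shell
# Combine several Markdown files into one docx using the same options
# as the example command.
# Usage: export_docs OUTPUT.docx FILE.md [FILE.md ...]
export_docs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: export_docs OUTPUT.docx FILE.md [FILE.md ...]" >&2
        return 2
    fi
    output=$1
    shift
    # The remaining arguments are the input files, in document order.
    pandoc "$@" --file-scope --lua-filter=./htmlImageParser.lua -o "$output"
}

# Usage (placeholder file names):
#   export_docs manual.docx intro.md setup.md usage.md
```

The input files are passed in the order given, so that order is also the order of the sections in the generated document.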
