Dutch government: document to HTML transformation as a new baseline

Hidde de Vries & Jeroen Hulscher

AC 2025
Sophia-Antipolis, France & online
7 April 2025

Today, I want to share something about how the W3C's Web Content Accessibility Guidelines (WCAG) help us, at the Dutch government, solve a major problem that involves millions of inaccessible PDF files.

Government bodies throughout Europe have to conform to the European standard EN 301 549. It is a harmonized standard used in the EU that relies heavily on WCAG. As part of the Dutch Ministry of the Interior, we provide tools for accessibility statements and dashboards to help government bodies conform to EN 301 549. Unfortunately, almost none of them succeed. The reason? PDF documents. Millions of them. And they produce many more every day.

Obviously, the problem would be solved if people were simply able to create accessible PDF documents, but that requires skills, knowledge and software that are not widely available, especially not within government ICT.

NLdoc

Extracts all content from PDF documents
Available as an API or HTML download
Provides hints on what to improve to comply with WCAG

Image: Screenshot of a scanned document from 2008 about the Island of Aruba in Dutch

So, two years ago, we decided to try a different approach. The project is called NLdoc, and it's going live in early June. It extracts all content from any PDF file, including scanned documents, and creates WCAG-compliant structured documents that government bodies or their software partners can use either via an API or as an HTML download. The process is based on machine learning, not AI, because of regulations.

After the processing is done, we’ll show the HTML version and provide hints about what needs attention regarding accessibility, based on various success criteria from WCAG, such as 'hey, this image lacks a text alternative'.
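To give a rough idea of what such a hint involves, here is a minimal sketch in Python. It is not NLdoc's actual implementation, which this talk does not describe; the names `MissingAltChecker` and `find_missing_alt` are hypothetical and only serve as an illustration. The sketch flags `<img>` elements that have no `alt` attribute at all, which relates to WCAG success criterion 1.1.1 (Non-text Content). An empty `alt=""` is left alone, because that is a valid way to mark a decorative image.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Hypothetical checker: collects hints for <img> elements without an alt attribute."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        # Only images are checked in this sketch.
        if tag != "img":
            return
        attributes = dict(attrs)
        # alt="" is allowed (decorative image); only a missing attribute is flagged.
        if "alt" not in attributes:
            line, _ = self.getpos()
            src = attributes.get("src", "(no src)")
            self.hints.append(
                f"line {line}: image {src} lacks a text alternative"
            )


def find_missing_alt(html: str) -> list[str]:
    """Return one hint per image that has no alt attribute."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.hints
```

Calling `find_missing_alt` on a generated HTML document returns a list of human-readable hints, one per image without an `alt` attribute. A real checker would cover many more success criteria and would also need a person to judge whether an existing text alternative is actually meaningful.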

We recently developed the first version of a WordPress plugin for NLdoc to show how it works together with existing software, and we will expand to other CMS and DMS platforms after June. It's open source, free to use for government bodies and, in terms of funding and support, soon here to stay.

Thank you for the opportunity to share this very brief introduction. Feel free to approach me for more details or to share your thoughts. Are there any questions right now?

Dutch government: document to HTML transformation as a new baseline

EN 301 549 (WCAG 2.x)

Why not accessible PDF documents?

NLdoc

Thank you!