Creative Commons License Chooser Output vs. W3C Validator: Prospective CC License User Loses
Summary (tl;dr)
Creative Commons (CC) claims their License Chooser’s RDFa expression of the user’s chosen CC license in machine-readable form is valid code for HTML 4 and 5, but it uses XML constructs which only validate in XHTML documents. I had to give up several days of my life diving deep, deep into the weeds to figure out for myself the proper syntax that is both valid ccREL (the CC machine-readable language standard) AND valid HTML 5 according to the World Wide Web Consortium validator. Presently i can find no way to get valid ccREL RDFa machine-readable licensing code to validate as HTML 4.01 Transitional, the least strict version of HTML 4.
Trying To Do The Best Thing
Good Intentions
I wanted to be a good netizen and citizen of the world—i truly did! Having long vaguely known about less-restrictive forms of licensing of intellectual property than the traditional anglosphere (and particularly U.S.) concept of copyright, and having often felt slight twinges of discomfort using the traditional All Rights Reserved clause on my website’s pages, i felt motivated to learn about other options. Having often seen CC-BY licenses, i was drawn to Creative Commons.
It should have been easy:
1. Go to the Creative Commons site and read to learn about their less-restrictive licensing options
2. Use their handy License Chooser to help select the best CC license match for my needs
3. Let the License Chooser generate copy/paste-ready text, including machine-readable code using their ccREL Creative Commons Rights Expression Language expressed in RDFa format
4. Copy and paste the License Chooser’s output into an appropriate area of my first CC-licensed web page
5. Validate the finished page using the trusty W3C Validator
6. Publish the new page publicly and be happy about encouraging others to copy, remix, etc.!
Blocked
Everything came to a grinding halt at Step 5: The CC-generated license code failed W3C validation.
How could this be?! Creative Commons is all about establishing easy-to-implement, easy-to-parse standards. They are using RDFa, which is part of the W3C’s RDF family of standards. The W3C Validator has long been a reference for verifying proper syntax of HTML documents. What is wrong with this picture?
At the time (early 2020s, a few years before this article was first published), i found no way out of the dilemma. I used the CC license code as the Chooser gave it to me (other than adding an empty alt="" attribute to each icon to avoid several wholly unnecessary errors), with a disclaimer at the page bottom that the page would validate if not for the errant CC license code.
I did contact Creative Commons at that time, explaining that i really wanted to re-license a majority of my site’s content with one or several flavors of CC licenses, but that i was unwilling to go to the great effort of rewriting my site as XHTML just to get the CC code to validate, and equally unwilling to accept that my site’s pages would no longer validate even though real-world i was assured they’d work OK. A CC representative did respond—which i appreciate—but the response was unhelpful and the issue unresolved.
In summary:
- CC insists their license code is valid, and fully workable as-is in HTML 4 and 5 documents
- The W3C Validator insists that the CC license code is invalid in documents declared as HTML 4 or HTML 5
What is the point of standards and validation procedures to ensure standards compliance if the validation is supposed to be ignored?
Time Lost: Making the User Fix Things
I shouldn’t have had to set aside days of my life to go deep into the nitty-gritty niggly details of the Creative Commons Rights Expression Language and RDFa, but that’s what i had to do if there was ever going to be a solution.
The unsolvable validation errors revolved around the following vocabulary definition declarations at the start of the CC license code:
<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/">
I’ll spare you all the detours and dead ends that ate up so much of my time. Eventually i figured out that the following is equally legitimate for vocabulary definition declarations, and satisfies the W3C Validator for HTML 5 (and i wanted a <div> rather than a <p>):
<div prefix="dct: http://purl.org/dc/terms/ cc: http://creativecommons.org/ns#" class="license-text">
The class="license-text" isn’t essential, but is a nice add-on. I doubt that the order of the declarations matters. What mattered to the W3C validator was using prefix="dct: http: etc. syntax rather than xmlns:cc="http: etc. syntax, and having both declarations in one prefix attribute rather than two separate serial prefix attributes.
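For anyone who would rather not make this change by hand each time, the rewrite described above can be sketched as a small Python helper. This is only an illustration, not anything Creative Commons provides: the function name xmlns_to_prefix is made up for this example, and it assumes a single opening tag that uses double-quoted attribute values, is not self-closing, and does not already carry a prefix attribute.

```python
import re

# Matches one namespace declaration such as xmlns:cc="http://creativecommons.org/ns#"
# (hypothetical helper pattern; assumes double-quoted attribute values).
XMLNS_ATTR = re.compile(r'\s+xmlns:([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')


def xmlns_to_prefix(open_tag: str) -> str:
    """Rewrite the xmlns:* declarations of one opening tag as a single prefix attribute."""
    declarations = XMLNS_ATTR.findall(open_tag)
    if not declarations:
        return open_tag  # nothing to convert, leave the tag untouched

    # Remove every xmlns:* attribute from the tag.
    stripped = XMLNS_ATTR.sub("", open_tag).rstrip()
    if not stripped.endswith(">") or stripped.endswith("/>"):
        raise ValueError("expected a plain opening tag ending in '>'")

    # Combine all declarations into ONE prefix attribute: "name: uri name: uri".
    mappings = " ".join(f"{name}: {uri}" for name, uri in declarations)
    return f'{stripped[:-1]} prefix="{mappings}">'


if __name__ == "__main__":
    chooser_tag = (
        '<p xmlns:cc="http://creativecommons.org/ns#" '
        'xmlns:dct="http://purl.org/dc/terms/">'
    )
    print(xmlns_to_prefix(chooser_tag))
```

Only the opening tag changes; the rest of the Chooser’s snippet is left as generated, and the result should still be run through the validator rather than trusted blindly.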
Why can’t the CC License Chooser output its code snippets formatted this way? I don’t know. From my reading, i’m not seeing any downsides to this syntax alteration.
Still Not As Promised
It would be really, really nice and really, really useful to me to have my many existing valid HTML 4.01 Transitional pages continue to validate with some machine-readable CC license code. Sadly, this does not appear to be an option. Maybe it’s because the W3C has better things to do than update its older, stable validator to correctly identify CC’s RDFa, and the Nu validator only does HTML 5? I don’t know. I do know that for me it’s unnecessary friction for redoing the licensing of my many legacy pages, some going back to 1996. And that i don’t want to just go along with someone’s opinion of “Oh, it’s actually OK in practice, no matter what the validator says”.
Why retain HTML 4 in the 2020s and later, when there are newer alternatives? Ignoring the added work to convert, using HTML 4.01 when no newer features are required allows a bevy of older WWW browser software, which may not have implemented HTML 5 or may not have done so per current standards, to properly render the pages. Least common denominator: use the oldest standard which covers the needs of the given page and those who access it, for greatest availability including to those who do not have the latest and greatest tech hardware and software. More welcoming, less privileged elitist.
If we can’t have working validators and easily-accessible code snippets which validate (when put into an otherwise conformant HTML 4 document), what’s the point? Why try? Should we just all go around doing whatever the f*ck we want, and let the WWW browser makers continue to have an ever-harder job cleaning up random syntax messes?
I’m much happier now finally being able to present HTML 5-formatted web pages which are CC licensed, like this one. I’m quite unhappy with how much work i had to do to get to this point. I’m sullen that for my legacy pages i’m forced to choose between two options: updating each one to HTML 5, or slapping in the CC license, removing any claim that the page validates (or leaving a claim with a caveat about “other than the CC license code”, which is how i’ve handled it in the past), and hoping that the WWW browsers, crawlers, and other software that interacts with my site can figure things out.
Please, Creative Commons: fix your License Chooser to output license code which properly validates in both HTML 4 and HTML 5.
Please, W3C or WHATWG or some other entity responsible and trusted for HTML, CSS, and related standards: kindly provide us a validator for both HTML 4 and 5 (and ideally other related WWW page standards that others may prefer) which understands at least the parts of RDFa that CC is using for their licenses, and gives us all correct answers about what is and is not valid.