UTF-8 displayed incorrectly
My site uses a template engine to wrap standard page HTML around templates that contain only the body text. However, UTF-8 multibyte characters are displayed correctly when there is no HTML header, but incorrectly when the standard page header, which explicitly specifies UTF-8 encoding, is presented to Firefox.
https://www.jamescobban.net/templates/Articles/FrenchOccupations.html is the template containing the body text, and https://www.jamescobban.net/displayPage.php?template=Articles/FrenchOccupations is the same page with the HTML header and footer wrapped around it. You can look at it yourself, but to save time, the header added by displayPage is:
<title> ...text extracted by DisplayPage from the h1 tag... </title>
<meta charset="utf-8">
<meta http-equiv="default-style" content="text/css">
<meta name="author" content="James A. Cobban">
<meta name="copyright" content="© 2018 James A. Cobban">
<meta name="keywords" content="genealogy, family, tree, ontario, canada ">
<link rel="stylesheet" type="text/css" href="/styles.css">
Chosen solution
Let's be very clear about the situation. Firefox renders this page --
https://www.jamescobban.net/displayPage.php?template=Articles/FrenchOccupations
-- as directed by the server. There is no bug there.
However, this page --
https://www.jamescobban.net/templates/Articles/FrenchOccupations.html
-- does not use a DOCTYPE Declaration on the first line, and therefore renders in Quirks mode. Please understand that Quirks mode is not going to be updated because its entire purpose is backwards compatibility with poor web design practices of decades past. See: https://developer.mozilla.org/docs/Web/HTML/Quirks_Mode_and_Standards_Mode
All that actually is not relevant to you because you would never serve that raw HTML fragment to a visitor under normal circumstances, would you? Presumably you can redirect a request for anything in the templates directory to the proper page.
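If you want to check which HTML files would fall into Quirks Mode when served on their own, a small script can look for the doctype. This is a minimal sketch: `has_doctype` and `files_without_doctype` are hypothetical helper names, and the check is simplified (the HTML standard also allows comments before the doctype, which this ignores). Body-only templates are expected to lack a doctype, so the check only matters for files meant to be served directly.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches "<!DOCTYPE html" case-insensitively. Simplification: comments
# before the doctype, which the HTML standard permits, are not handled.
_DOCTYPE_RE = re.compile(r"<!doctype\s+html", re.IGNORECASE)


def has_doctype(text: str) -> bool:
    """Return True if the document starts with an HTML doctype.

    A leading byte order mark (decoded as U+FEFF) and whitespace are skipped.
    """
    stripped = text.lstrip("\ufeff").lstrip()
    return _DOCTYPE_RE.match(stripped) is not None


def files_without_doctype(root: str) -> list[Path]:
    """List .html files under `root` that would render in Quirks Mode
    if they were served as standalone pages."""
    missing = []
    for path in sorted(Path(root).rglob("*.html")):
        # errors="replace" so a mis-encoded file does not stop the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_doctype(text):
            missing.append(path)
    return missing
```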
All Replies (7)
When Firefox loads this page, it uses windows-1252 encoding:
https://www.jamescobban.net/templates/Articles/FrenchOccupations.html
If I use the classic menu bar (tap Alt to display)
View > Text Encoding > Unicode
then I see the same problem as on the other page.
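The symptoms fit a template that was saved in an 8-bit encoding rather than UTF-8. The sketch below reproduces that, assuming windows-1252 (Latin-1, which is what Vim calls "latin1", agrees with it on accented letters); "Métiers" is only a sample word. It shows why the raw template looks right under the windows-1252 fallback but breaks once UTF-8 is declared or selected.

```python
# Assumption: the template was saved as windows-1252 (or Latin-1), not UTF-8.
# "Métiers" is only a sample French word.
text = "Métiers"

# What the editor wrote to disk: a single byte, 0xE9, for "é".
saved_bytes = text.encode("cp1252")  # == b"M\xe9tiers"

# Raw template, no charset declared: the windows-1252 fallback reads it correctly.
as_fallback = saved_bytes.decode("cp1252")  # == "Métiers"

# Wrapped page with <meta charset="utf-8">, or View > Text Encoding > Unicode:
# 0xE9 starts a multibyte sequence that is never completed, so it becomes
# U+FFFD, the replacement character.
as_utf8 = saved_bytes.decode("utf-8", errors="replace")  # == "M\ufffdtiers"

# After re-saving as UTF-8, "é" is two bytes. They display correctly under the
# utf-8 declaration, but through the windows-1252 fallback they become "Ã©".
utf8_bytes = text.encode("utf-8")  # == b"M\xc3\xa9tiers"
utf8_as_fallback = utf8_bytes.decode("cp1252")  # == "MÃ©tiers"
```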
I'm not sure how you author your templates, but can you check the encoding used by your editor and see whether you can re-save the file as UTF-8?
(Note: I see the same issue in Chrome for Windows.)
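Besides changing the editor setting, the check and the re-save can be scripted. This is a minimal sketch: `is_valid_utf8` and `resave_as_utf8` are hypothetical names, and it assumes that any file which is not valid UTF-8 was written as windows-1252. Pure ASCII also passes the UTF-8 check, which is harmless because ASCII displays the same under either encoding.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def resave_as_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Re-encode `path` from `source_encoding` to UTF-8 in place.

    Returns True if the file was rewritten, False if it was already valid
    UTF-8. Assumption: non-UTF-8 files were written as windows-1252.
    Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    data = path.read_bytes()
    if is_valid_utf8(data):
        return False
    text = data.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True
```

Running it a second time on the same file returns False, so it is safe to apply to a whole templates directory.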
Thank you. I was using VIM. I explicitly set the encoding and fileencoding to utf-8 and it corrected the problem.
By the way, why would the default text encoding for an HTML file, in the absence of an explicit specification in the head, not be UTF-8? And since it is not, I cannot find anywhere in about:config where I can change that.
There is a fallback setting in Options/Preferences, but that is about 8-bit encoding and doesn't support Unicode.
- Options/Preferences -> Content -> Fonts & Colors -> Advanced -> Character Encoding for Legacy Content
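The order in which a browser settles on an encoding explains both the windows-1252 default and why a server-side declaration helps. The sketch below is a simplified reading of the HTML Standard's encoding sniffing algorithm: a byte order mark wins, then the HTTP `charset` parameter, then a `<meta>` declaration in the first 1024 bytes, and only then a locale-dependent default, which the standard suggests should be windows-1252 for many locales. The "Character Encoding for Legacy Content" setting changes that last step. `sniff_encoding` is a hypothetical name; manual overrides, content-based detection, and label normalisation are deliberately left out.

```python
from __future__ import annotations

import re

# Byte order marks, checked first. UTF-8's BOM must be tested before UTF-16's.
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Finds charset=... inside a single <meta> tag, covering both
# <meta charset="utf-8"> and <meta http-equiv=... content="...; charset=utf-8">.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def sniff_encoding(body: bytes, http_charset: str | None = None,
                   fallback: str = "windows-1252") -> str:
    # 1. A byte order mark wins over everything else.
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if http_charset:
        return http_charset.strip().lower()
    # 3. A <meta> declaration within the first 1024 bytes.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: the locale-dependent "legacy content" fallback.
    return fallback
```

With the original template (no BOM, no header charset, no `<meta>`), only step 4 applies, which is the behaviour reported above.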
You can set up the server to send files with a UTF-8 (Unicode) charset in the Content-Type header, as that takes precedence over an in-page declaration and over the browser's fallback.
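On Apache the usual way to do this is the `AddDefaultCharset UTF-8` directive, for example in an .htaccess file. As a self-contained illustration of the same idea, here is a sketch using Python's standard `http.server`; `Utf8Handler` and `serve` are hypothetical names. It relies on `SimpleHTTPRequestHandler` consulting its `extensions_map` before the `mimetypes` module when choosing the Content-Type.

```python
import http.server
from functools import partial


class Utf8Handler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that labels HTML as UTF-8 in the HTTP header."""

    # Entries here are checked before the mimetypes module, so these
    # override the plain "text/html" that would otherwise be sent.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".html": "text/html; charset=utf-8",
        ".htm": "text/html; charset=utf-8",
    }


def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve `directory` on 127.0.0.1:`port` until interrupted."""
    handler = partial(Utf8Handler, directory=directory)
    with http.server.HTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling `serve("site")` and requesting any .html file under `site` should then return `Content-Type: text/html; charset=utf-8`, regardless of what the file itself declares.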
The template (first of your two links) doesn't have a DOCTYPE declaration, so it renders in Quirks Mode. Quirks Mode is not standardized, and may use behaviors from the 1990s such as using the OS default for character encoding.
All of the standards state that the preferred and default character encoding for HTML documents is UTF-8. So why does Firefox implement a proprietary encoding belonging to a single manufacturer, especially when displaying documents on systems for which Windows is a swear word? Even if you think that this default is in the best interests of your customers, why would you not let your customers decide for themselves what the default encoding is?
I have created an .htaccess file on my development site to explicitly set what should have been the server default in the first place, but FileZilla doesn't show the .htaccess file on the local side, and the only documentation I can find explains how to display hidden files on the server side.
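Whichever way the file gets uploaded, it is worth confirming that the server now declares the charset. A minimal sketch: `charset_from_content_type` and `served_charset` are hypothetical names; they use the standard library's `email.message.Message`, whose `get_content_charset()` returns the lower-cased `charset` parameter or `None`.

```python
from __future__ import annotations

import urllib.request
from email.message import Message


def charset_from_content_type(content_type: str) -> str | None:
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, or None if absent


def served_charset(url: str) -> str | None:
    """Fetch `url` and return the charset its server declares, if any."""
    with urllib.request.urlopen(url) as resp:
        # resp.headers is an http.client.HTTPMessage, a Message subclass.
        return resp.headers.get_content_charset()
```

For example, `served_charset("https://www.jamescobban.net/templates/Articles/FrenchOccupations.html")` returns `None` if the header still carries no charset, and `"utf-8"` once the .htaccess setting is in effect.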
"using the OS default for character encoding." But I am running Ubuntu, and there is no way that the Ubuntu default character encoding is a Microsoft proprietary code page!
Please fix the broken default.