Flourish PHP Unframework

Why force use of fUTF8?

posted by marcus 4 years ago

Hello.

I have been looking at your library for some time now and I've found it very interesting. However, I am wondering why you are forcing us to use fUTF8 even if I'm really not interested in using that part of the library.

I got the impression that I could pick and choose the classes I want from the library but this doesn't seem to be true? I thought one of the ideas was to get away from the close coupling that the frameworks use.

Kindly, Marcus.


Hi Marcus,

I apologize that I didn't get a chance to respond sooner; I just got back from vacation yesterday.

I decided to hard-code UTF-8 into Flourish for a few reasons. One of the most important is that most websites run into "extended" characters very quickly, usually in the form of curly quotes. By default, HTML is treated as ISO-8859-1; however, this character set does not include curly quotes, just the straight single and double quote marks. If any content that includes curly quotes is pasted into an input, the browser has to figure out what the best solution is. Some send it as UTF-8 multi-byte sequences and some use Windows-1252, which extends ISO-8859-1 with extra characters such as curly quotes. Basically, there is no defined behavior that browsers need to implement when users input characters that don't fit into the page's character set. This leads to data corruption, which is obviously not a good thing.
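To make the curly quote problem concrete, here is a minimal sketch of how the same character arrives as different bytes depending on the encoding. The helper name `is_valid_utf8` is made up for this illustration and is not part of Flourish; it relies only on the PCRE `/u` modifier, which rejects byte strings that are not valid UTF-8.

```php
<?php
// U+201D RIGHT DOUBLE QUOTATION MARK, as it would arrive in two encodings.
$utf8Quote   = "\xE2\x80\x9D"; // UTF-8: a three-byte sequence
$cp1252Quote = "\x94";         // Windows-1252: a single byte

// Hypothetical helper (not a Flourish function): returns true only if
// $bytes is a well-formed UTF-8 string.
function is_valid_utf8(string $bytes): bool
{
    // With the /u modifier, preg_match() returns false instead of 1
    // when the subject is not valid UTF-8.
    return preg_match('//u', $bytes) === 1;
}

echo bin2hex($utf8Quote), "\n";   // hex dump of the UTF-8 bytes
echo bin2hex($cp1252Quote), "\n"; // hex dump of the Windows-1252 byte
var_dump(is_valid_utf8($utf8Quote));
var_dump(is_valid_utf8($cp1252Quote));
```

If the server assumes one encoding and the browser sends the other, the lone `0x94` byte is exactly the kind of input that ends up stored or displayed as garbage.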

As you can see, a character set that includes support beyond the characters of ISO-8859-1 is very important. UTF-8 is pretty much the universal option across almost all software. UTF-8 can handle almost every character in every language and will realistically never cause you any encoding problems. All of the databases, save SQL Server, natively support UTF-8. All XML parsers support UTF-8. All browsers support UTF-8. Email clients also support UTF-8 with the appropriate MIME headers and encoding. JSON uses UTF-8.
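As a small illustration of the JSON point, PHP's own `json_encode()` expects UTF-8 input and refuses anything else. This is just a sketch using the standard json extension, not Flourish code, and the variable names are made up for the example.

```php
<?php
$utf8Text   = "\xE2\x80\x9CHello\xE2\x80\x9D"; // curly-quoted Hello in UTF-8
$cp1252Text = "\x93Hello\x94";                 // the same text in Windows-1252

// Valid UTF-8 encodes fine; non-ASCII characters are escaped by default.
$encodedUtf8 = json_encode($utf8Text);

// Bytes that are not valid UTF-8 make json_encode() return false.
$encodedCp1252 = json_encode($cp1252Text);
$lastError     = json_last_error();

var_dump($encodedUtf8);
var_dump($encodedCp1252);
var_dump($lastError === JSON_ERROR_UTF8);
```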

Now, it would be possible to support other character sets with some of the different end-points for Flourish; however, I wanted to provide a solution for everything without having to worry about translating encodings and specifying encodings throughout the end-developer code. In addition, I was able to write UTF-8 versions of all of the PHP string functions in fUTF8 that work even if the mbstring extension is not installed. Trying to do this for multiple encodings would increase the size of Flourish and would have taken quite a bit more time. In the end, I don't support anything other than UTF-8 because the costs outweigh the benefits.
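To give a rough idea of what a UTF-8-aware string function looks like without mbstring, here is a minimal sketch of a character count in plain PHP. This is not the fUTF8 implementation; `utf8_length` is a hypothetical name, and the sketch assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper (not part of Flourish): counts characters in a
// string that is assumed to be valid UTF-8, without using mbstring.
function utf8_length(string $s): int
{
    $count = 0;
    $bytes = strlen($s); // strlen() counts bytes, not characters

    for ($i = 0; $i < $bytes; $i++) {
        // Continuation bytes look like 10xxxxxx. Every other byte starts
        // a new character, so count only those.
        if ((ord($s[$i]) & 0xC0) !== 0x80) {
            $count++;
        }
    }

    return $count;
}

$word = "caf\xC3\xA9"; // "café" encoded as UTF-8
echo strlen($word), " bytes, ", utf8_length($word), " characters\n";
```

Doing the same for several encodings would need a separate rule per encoding for where a character starts, which is part of why supporting only UTF-8 keeps things small.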

If you aren't very familiar with UTF-8, I highly recommend checking out a few of these links for more information. There is a good reason why the majority of software out there supports UTF-8. Certainly there are situations where UTF-8 is less desirable, but everything has its trade-offs.

posted by wbond 4 years ago