6. Templating

Templates provide a convenient way of separating your controller and domain logic from your presentation logic. Templates typically contain the HTML of your application, but may also be used for other formats, such as XML. Templates are often referred to as “views”, which make up part of the second component of the model–view–controller (MVC) software architecture pattern.

6.1 Working with UTF-8

This section was originally written by Alex Cabal over at PHP Best Practices and has been used as the basis for our own UTF-8 advice.

There’s no one-liner. Be careful, detailed, and consistent.

Right now PHP does not support Unicode at a low level. There are ways to ensure that UTF-8 strings are processed correctly, but it’s not easy, and it requires digging into almost all levels of the web app, from HTML to SQL to PHP. We’ll aim for a brief, practical summary.

UTF-8 at the PHP level

The basic string operations, like concatenating two strings and assigning strings to variables, don’t need anything special for UTF-8. However, most string functions, like strpos() and strlen(), do need special consideration. These functions often have an mb_* counterpart: for example, mb_strpos() and mb_strlen(). These mb_* functions are made available to you via the Multibyte String Extension, and are specifically designed to operate on Unicode strings.

You must use the mb_* functions whenever you operate on a Unicode string. For example, if you use substr() on a UTF-8 string, there’s a good chance the result will include some garbled half-characters. The correct function to use would be the multibyte counterpart, mb_substr().
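As a minimal sketch of the difference, the snippet below compares the byte-oriented functions with their mb_* counterparts. It assumes the mbstring extension is loaded and that the file is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$word = 'Êl síla';

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($word);             // 9, because 'Ê' and 'í' take two bytes each
$charLength = mb_strlen($word, 'UTF-8'); // 7

// substr() cuts by byte offset and can split a multibyte character in half.
$byteCut = substr($word, 0, 1);             // only the first byte of 'Ê'
$charCut = mb_substr($word, 0, 1, 'UTF-8'); // the whole character 'Ê'

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

The half-character returned by substr() is no longer valid UTF-8, which is how garbled output typically starts.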

The hard part is remembering to use the mb_* functions at all times. If you forget even just once, your Unicode string has a chance of being garbled during further processing.

Not all string functions have an mb_* counterpart. If there isn’t one for what you want to do, then you might be out of luck.

You should use the mb_internal_encoding() function at the top of every PHP script you write (or at the top of your global include script), and the mb_http_output() function right after it if your script is outputting to a browser. Explicitly defining the encoding of your strings in every script will save you a lot of headaches down the road.
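A sketch of what such a preamble might look like follows. Where it lives (a bootstrap file, a front controller, and so on) is an assumption that depends on your application, and the mbstring extension must be loaded.

```php
<?php
// Hypothetical preamble for the first file every request includes.
mb_internal_encoding('UTF-8'); // default encoding for the mb_* functions
mb_http_output('UTF-8');       // HTTP output encoding used by mbstring

// With the internal encoding set, mb_* calls can omit the encoding argument.
$length = mb_strlen('vîn'); // 3 characters, although the string is 4 bytes long
```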

Additionally, many PHP functions that operate on strings have an optional parameter letting you specify the character encoding. You should always explicitly indicate UTF-8 when given the option. For example, htmlentities() has an option for character encoding, and you should always specify UTF-8 if dealing with such strings. Note that as of PHP 5.4.0, UTF-8 is the default encoding for htmlentities() and htmlspecialchars().
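For instance, escaping user-supplied text for HTML with the encoding spelled out could look like the sketch below. The sample string is made up for illustration.

```php
<?php
// A made-up piece of untrusted input containing markup and non-ASCII characters.
$comment = '<b>Êl síla</b> & "lû"';

// Pass the encoding explicitly instead of relying on the configured default.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe; // &lt;b&gt;Êl síla&lt;/b&gt; &amp; &quot;lû&quot;
```

Only the HTML special characters are replaced; the accented characters pass through unchanged because the function knows they are valid UTF-8.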

Finally, if you are building a distributed application and cannot be certain that the mbstring extension will be enabled, then consider using the symfony/polyfill-mbstring Composer package. This will use mbstring if it is available, and fall back to non-UTF-8 functions if not.
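If you want to see which of the two is active on a given machine, a rough check might look like this. It is a diagnostic sketch, not part of the polyfill package; the variable names and messages are illustrative.

```php
<?php
// extension_loaded() is true only for the native extension, while
// function_exists() is also true when a polyfill defines the function in PHP code.
$hasNativeMbstring = extension_loaded('mbstring');
$hasMbStrlen = function_exists('mb_strlen');

if ($hasNativeMbstring) {
    $source = 'native mbstring extension';
} elseif ($hasMbStrlen) {
    $source = 'userland polyfill';
} else {
    $source = 'nothing (install the extension or the polyfill)';
}

echo "mb_strlen() is provided by: {$source}\n";
```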

UTF-8 at the Database level

If your PHP script accesses MySQL, there’s a chance your strings could be stored as non-UTF-8 strings in the database even if you follow all of the precautions above.

To make sure your strings go from PHP to MySQL as UTF-8, make sure your database and tables are all set to the utf8mb4 character set and collation, and that you use the utf8mb4 character set in the PDO connection string. See example code below. This is critically important.

Note that you must use the utf8mb4 character set for complete UTF-8 support, not the utf8 character set! See Further Reading for why.
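The short version is that MySQL's legacy utf8 character set stores at most three bytes per character, while UTF-8 itself needs up to four. The sketch below simply prints the byte length of a few sample characters to make that visible; it assumes the file is saved as UTF-8.

```php
<?php
// One sample character for each possible UTF-8 byte length.
$samples = ['e', 'û', '€', '😀'];

// strlen() counts bytes, so this gives the encoded size of each character.
$byteCounts = array_map('strlen', $samples);

foreach ($samples as $index => $char) {
    printf("%s needs %d byte(s)\n", $char, $byteCounts[$index]);
}
```

The last character needs four bytes, so it does not fit in a three-byte utf8 column but does fit in utf8mb4.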

UTF-8 at the browser level

Use the mb_http_output() function to ensure that your PHP script outputs UTF-8 strings to your browser.

The browser will then need to be told by the HTTP response that this page should be considered as UTF-8. Today, it is common to set the character set in the HTTP response header like this:

<?php
header('Content-Type: text/html; charset=UTF-8');

The historic approach was to include the charset <meta> tag in your page’s <head> tag. The full example below does both.

<?php
// Tell PHP that we're using UTF-8 strings until the end of the script
mb_internal_encoding('UTF-8');
// ini_set() returns the old value on success and false on failure
$utf_set = ini_set('default_charset', 'utf-8');
if ($utf_set === false) {
    throw new Exception('could not set default_charset to utf-8, please ensure it\'s set on your system!');
}

// Tell PHP that we'll be outputting UTF-8 to the browser
mb_http_output('UTF-8');

// Our UTF-8 test string
$string = 'Êl síla erin lû e-govaned vîn.';

// Transform the string in some way with a multibyte function
// Note how we cut the string at a non-ASCII character for demonstration purposes
$string = mb_substr($string, 0, 15);

// Connect to a database to store the transformed string
// See the PDO example in this document for more information
// Note the `charset=utf8mb4` in the Data Source Name (DSN)
$link = new PDO(
    'mysql:host=your-hostname;dbname=your-db;charset=utf8mb4',
    'your-username',
    'your-password',
    array(
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_PERSISTENT => false
    )
);

// Store our transformed string as UTF-8 in our database
// Your DB and tables are in the utf8mb4 character set and collation, right?
$handle = $link->prepare('insert into ElvishSentences (Id, Body, Priority) values (default, :body, :priority)');
$handle->bindParam(':body', $string, PDO::PARAM_STR);
$priority = 45;
$handle->bindParam(':priority', $priority, PDO::PARAM_INT); // explicitly tell PDO to expect an int
$handle->execute();

// Retrieve the string we just stored to prove it was stored correctly
// (assumes Id is an auto-increment column)
$handle = $link->prepare('select * from ElvishSentences where Id = :id');
$id = (int) $link->lastInsertId();
$handle->bindParam(':id', $id, PDO::PARAM_INT);
$handle->execute();

// Store the result into an array of objects that we'll output later in our HTML
// fetchAll() loads every matching row into memory, which is fine for a single row
$result = $handle->fetchAll(PDO::FETCH_OBJ);

// An example wrapper to allow you to escape data to html
function escape_to_html($dirty){
    echo htmlspecialchars($dirty, ENT_QUOTES, 'UTF-8');
}

header('Content-Type: text/html; charset=UTF-8'); // Unnecessary if your default_charset is set to utf-8 already
?><!doctype html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>UTF-8 test page</title>
    </head>
    <body>
        <?php
        foreach($result as $row){
            escape_to_html($row->Body);  // This should correctly output our transformed UTF-8 string to the browser
        }
        ?>
    </body>
</html>

Further reading