Differences between revisions 6 and 15 (spanning 9 versions)
Revision 6 as of 2007-05-28 14:22:20
Size: 1449
Editor: dslb-084-058-237-229
Comment:
Revision 15 as of 2007-06-04 14:43:59
Size: 1993
Editor: guest-154
Comment:
About character encoding (28May07 Markus)

CompletionSearch supports ISO-8859-1 and the multibyte character encoding UTF-8. UTF-8 is the default encoding with the following consequences:

  • $AC->settings->encoding is 'utf-8' unless overridden in autocomplete_config.php

  • The texts in text.php are saved as UTF-8
  • The css file uses '@charset "utf-8";'
  • We call mb_strtolower (instead of strtolower) and pass $AC->settings->encoding as the encoding parameter so that UTF-8 text is lowercased correctly
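
The last point can be illustrated with a minimal sketch. The variable $encoding is a hypothetical stand-in for $AC->settings->encoding, and the sample word is not taken from the CompletionSearch code:

```php
<?php
// Hypothetical stand-in for $AC->settings->encoding.
$encoding = 'utf-8';

// "ÄPFEL", with "Ä" written as its explicit UTF-8 byte sequence (0xC3 0x84)
// so the example does not depend on how this file itself is saved.
$word = "\xC3\x84PFEL";

// mb_strtolower() is told the encoding, so it treats the two bytes of "Ä"
// as one character and maps it to "ä" (0xC3 0xA4).
$lower = mb_strtolower($word, $encoding);

// Print the result as hex to make the byte-level change visible.
echo bin2hex($lower), "\n";
```

Plain strtolower works byte by byte, so whether it touches the non-ASCII bytes at all depends on the PHP version and locale; passing the encoding explicitly avoids that dependency.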

We do the following depending on the defined encoding:

  • We UTF-8 encode $AC->settings->capitals if $AC->settings->encoding is UTF-8

  • In ajax.php we UTF-8 encode the query string if $AC->settings->encoding is UTF-8 and the charset of content_type is not UTF-8 (that is, the request was sent in a non-UTF-8 encoding)

  • We set the page encoding of index.php, options.php and change_options.php according to $AC->settings->encoding (<meta http-equiv="content-type" content="text/html;charset=<?php echo $AC->settings->encoding; ?>">)

  • Texts from text.php are UTF-8 decoded by $AC->get_text() if $AC->settings->encoding is ISO-8859-1

  • We URL-encode the JavaScript code in function javascript_rhs (in generate_javascript.php) if $AC->settings->encoding is not UTF-8 (this is not necessary if UTF-8 is used)
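
The query-string rule for ajax.php might look roughly like the sketch below. The function query_to_utf8 and its parameters are hypothetical and not taken from ajax.php, and the sketch assumes that any request not declared as UTF-8 is ISO-8859-1, the only other encoding named on this page:

```php
<?php
// Hypothetical helper, not the actual ajax.php code.
// $settingsEncoding stands in for $AC->settings->encoding,
// $contentType for the Content-Type header of the request.
function query_to_utf8(string $query, string $settingsEncoding, string $contentType): string
{
    $wantsUtf8  = strcasecmp($settingsEncoding, 'utf-8') === 0;
    $sentAsUtf8 = stripos($contentType, 'charset=utf-8') !== false;

    if ($wantsUtf8 && !$sentAsUtf8) {
        // Assumption: a request that is not UTF-8 is ISO-8859-1.
        return mb_convert_encoding($query, 'UTF-8', 'ISO-8859-1');
    }
    // Already UTF-8, or UTF-8 is not the configured encoding: leave as is.
    return $query;
}

// "café" as ISO-8859-1 bytes ("é" is the single byte 0xE9).
$latin1 = "caf\xE9";
$converted = query_to_utf8(
    $latin1,
    'utf-8',
    'application/x-www-form-urlencoded; charset=ISO-8859-1'
);
echo bin2hex($converted), "\n";
```

The reverse direction, decoding the UTF-8 texts from text.php for an ISO-8859-1 page, could use the same mb_convert_encoding call with the two encoding arguments swapped.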

Note: The form attribute accept-charset

If the form attribute accept-charset is set to "UTF-8", the form variables are UTF-8 encoded before they are sent to the server (even if the page encoding is not UTF-8).

The PHP Apache extension php_mbstring

Using mb_strtolower (and the other mb_ functions) requires the php_mbstring extension to be enabled in php.ini:

On Windows:
extension=php_mbstring.dll

or on Linux:
extension=php_mbstring.so

(On geek, the mb_... functions were available by default; on Markus' laptop the line above had to be added.) If this is the first extension you enable, make sure the extension_dir directive in php.ini points to the directory that contains the extension.
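
Whether the extension is actually available on a given machine can be checked from PHP itself. The following is a minimal sketch; the function name mbstring_status and its messages are made up for illustration:

```php
<?php
// Hypothetical check, not part of CompletionSearch.
// Reports whether the mbstring extension is loaded in this PHP installation.
function mbstring_status(): string
{
    if (extension_loaded('mbstring') && function_exists('mb_strtolower')) {
        return 'mbstring is loaded';
    }
    return 'mbstring is missing: enable the extension in php.ini';
}

echo mbstring_status(), "\n";
```

Running this once from the command line and once through the web server is useful, because the two may read different php.ini files.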

CompleteSearch: FrontPage (last edited 2017-03-19 13:30:19 by Hannah Bast)