Differences between revisions 6 and 17 (spanning 11 versions)
Revision 6 as of 2007-05-28 14:22:20
Size: 1449
Editor: dslb-084-058-237-229
Comment:
Revision 17 as of 2007-08-10 23:47:23
Size: 2042
Editor: vpn-113
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
= About character encoding = [wiki:Self:Installation Installation Guide]
Line 3: Line 3:
In general we use the multibyte character encoding UTF-8 as default encoding with the follwing consequences: = About character encoding (28May07 Markus) =
Line 5: Line 5:
- The $AC->settings->encoding is 'utf-8' unless overriden in autocomplete_config.php
- The text.php is saved as UTF-8
- The css file uses '@charset "utf-8";'
CompletionSearch supports ISO-8859-1 and the multibyte character encoding UTF-8.
UTF-8 is the default encoding with the following consequences:
Line 9: Line 8:
- $AC->settings->capitals is utf-8 encoded
- In ajax.php we utf-8 encode the query string if $AC->settings->encoding is utf-8 and the charset of content_type is not utf-8 (means the request is sent in a non-utf-8 type)
 * The $AC->settings->encoding is 'utf-8' unless overriden in autocomplete_config.php
 * The texts in text.php are saved as UTF-8
 * The css file uses '@charset "utf-8";'
 * We use mb_strtolower (instead of strtolower) with parameter $AC->settings->encoding to enable UTF-8
Line 12: Line 13:
We have to support other encodings like iso-8859-1 because some collections are not utf-8 encoded.
The default encoding can be overriden by $config->encoding in the autocomplete_config.php.
We do the following depending on the defined encoding:
 * We UTF-8 encode $AC->settings->capitals if $AC->settings->encoding is UTF-8
 * In ajax.php we UTF-8 encode the query string if $AC->settings->encoding is UTF-8 and the charset of content_type is not UTF-8 (means the request is sent as a non-UTF-8 type)
 * We set the page encoding of index.php, options.php and change_options.php according to $AC->settings->encoding (<meta http-equiv="content-type" content="text/html;charset=<?php echo $AC->settings->encoding; ?>">)
 * Texts from text.php are UTF-8 decoded by $AC->get_text() if $AC->settings->encoding is ISO-8859-1
 * We url encode the javascript code in function javascript_rhs (in generate_javascript.php) if $AC->settings->encoding is not UTF-8 (this is not necessary if utf-8 is used)
Line 15: Line 20:
To handle non-utf-8 encoding we do the following:
- the page encoding of index.php is determined by $AC->settings->encoding
( <meta http-equiv="content-type" content="text/html;charset=<?php echo $AC->settings->encoding; ?>">
)
- Texts from text.php are utf-8 decoded by $AC->get_text()
=== Note: The form attribute accept-charset ===
If the form attribute accept-charset is set to "UTF-8" the form variables are UTF-8 encoded before sent to server (even if the page encoding is not UTF-8).
Line 21: Line 23:
== The PHP Apache extension php_mbstring ==
Line 22: Line 25:

== UTF-8 lowercase in PHP (23May07 Markus) ==

Requires extension mbstring (for functions like mb_strtolower). Following line required in php.ini
The use of the mb_strtolower function (and other mb_ functions) requires the extension php_mbstring in php.ini:
Line 35: Line 35:
(On geek, the mb_... functions were available by default, on Markus's laptop the line above had to be added.) (On geek, the mb_... functions were available by default, on Markus' laptop the line above had to be added.)
Line 37: Line 37:

== Texts in text.php are now UTF-8 encoded (23May07 Markus) ==
If this is the first extension you use be sure to have specified the location of the extension with the extension_dir directive.

[wiki:Installation Installation Guide]

About character encoding (28May07 Markus)

CompletionSearch supports ISO-8859-1 and the multibyte character encoding UTF-8. UTF-8 is the default encoding with the following consequences:

  • The $AC->settings->encoding is 'utf-8' unless overriden in autocomplete_config.php

  • The texts in text.php are saved as UTF-8
  • The css file uses '@charset "utf-8";'
  • We use mb_strtolower (instead of strtolower) with parameter $AC->settings->encoding to enable UTF-8

We do the following depending on the defined encoding:

  • We UTF-8 encode $AC->settings->capitals if $AC->settings->encoding is UTF-8

  • In ajax.php we UTF-8 encode the query string if $AC->settings->encoding is UTF-8 and the charset of content_type is not UTF-8 (means the request is sent as a non-UTF-8 type)

  • We set the page encoding of index.php, options.php and change_options.php according to $AC->settings->encoding (<meta http-equiv="content-type" content="text/html;charset=<?php echo $AC->settings->encoding; ?>">)

  • Texts from text.php are UTF-8 decoded by $AC->get_text() if $AC->settings->encoding is ISO-8859-1

  • We url encode the javascript code in function javascript_rhs (in generate_javascript.php) if $AC->settings->encoding is not UTF-8 (this is not necessary if utf-8 is used)

Note: The form attribute accept-charset

If the form attribute accept-charset is set to "UTF-8" the form variables are UTF-8 encoded before sent to server (even if the page encoding is not UTF-8).

The PHP Apache extension php_mbstring

The use of the mb_strtolower function (and other mb_ functions) requires the extension php_mbstring in php.ini:

In windows:
extension=php_mbstring.dll

or in linux:
extension=php_mbstring.so

(On geek, the mb_... functions were available by default, on Markus' laptop the line above had to be added.)

If this is the first extension you use be sure to have specified the location of the extension with the extension_dir directive.

CompleteSearch: FrontPage (last edited 2017-03-19 13:30:19 by Hannah Bast)