Please read the LICENSE file, which ships with this software.
*** QUICK START ***
For compilation of the C library call "make c-library", for compilation of the ruby library call "make ruby-library" and for compilation of the PostgreSQL extension call "make pgsql-library".
For ruby you can also create a gem-file by calling "make ruby-gem".
"make all" can be used to build everything, but both ruby and PostgreSQL installations are required in this case.
*** GENERAL INFORMATION ***
The C library files are found in this directory after successful compilation and are named "libutf8proc.a" and "libutf8proc.so". The ruby library consists of the files "utf8proc.rb" and "utf8proc_native.so", which are found in the subdirectory "ruby/". If you choose to create a gem-file, it is placed in the "ruby/gem" directory. The PostgreSQL extension is named "utf8proc_pgsql.so" and resides in the "pgsql/" directory.
Both the ruby library and the PostgreSQL extension are built as stand-alone libraries and therefore do not depend on the dynamic version of the C library files, but this behaviour might change in future releases.
The Unicode version being supported is 5.0.0. Note: Version 4.1.0 of Unicode Standard Annex #29 was used, as version 5.0.0 had not been available at the time of implementation.
For Unicode normalizations, the following options have to be used:
  Normalization Form C:  STABLE, COMPOSE
  Normalization Form D:  STABLE, DECOMPOSE
  Normalization Form KC: STABLE, COMPOSE, COMPAT
  Normalization Form KD: STABLE, DECOMPOSE, COMPAT
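The four option sets above differ in two independent choices: composing versus decomposing, and whether compatibility mappings (COMPAT) are applied. The following minimal ruby sketch illustrates the effect of each form on a sample string. It is an illustration only: it uses ruby's built-in String#unicode_normalize (ruby 2.2 or newer) instead of utf8proc, and the names FORMS, codepoints_hex and normalization_table are hypothetical helpers that are not part of this library.

```ruby
# Illustration with ruby's built-in normalizer, NOT with utf8proc.
# All helper names below are hypothetical.
FORMS = [:nfc, :nfd, :nfkc, :nfkd].freeze

# Render a string as a list of "U+XXXX" code point labels.
def codepoints_hex(str)
  str.codepoints.map { |cp| format("U+%04X", cp) }
end

# Map each normalization form to the code points it yields for str.
def normalization_table(str)
  FORMS.map { |form| [form, codepoints_hex(str.unicode_normalize(form))] }.to_h
end

# U+00E9 (e with acute) has a canonical decomposition,
# U+FB01 (fi ligature) only has a compatibility decomposition.
sample = "\u00E9\uFB01"

normalization_table(sample).each do |form, cps|
  puts "#{form}: #{cps.join(' ')}"
end
# nfc:  U+00E9 U+FB01
# nfd:  U+0065 U+0301 U+FB01
# nfkc: U+00E9 U+0066 U+0069
# nfkd: U+0065 U+0301 U+0066 U+0069
```

Only the two compatibility forms (KC and KD) split the ligature, and only the two decomposing forms (D and KD) separate the accent from its base letter.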
*** C LIBRARY ***
The documentation for the C library is found in the utf8proc.h header file. "utf8proc_map" is most likely the function you will be using for mapping UTF-8 strings, unless you want to allocate memory yourself.
*** RUBY API ***
The ruby library adds the methods "utf8map" and "utf8map!" to the String class, and the method "utf8" to the Integer class.
The String::utf8map method does the same as the "utf8proc_map" C function. Options for the mapping procedure are passed as symbols, e.g.:
  "Hello".utf8map(:casefold) => "hello"
The descriptions of all options are found in the C header file "utf8proc.h". Please note that the corresponding symbols in ruby are all lowercase.
String::utf8map! is the destructive variant: the string itself is replaced by the result.
There are shortcuts for the 4 normalization forms specified by Unicode: String::utf8nfd, String::utf8nfd!, String::utf8nfc, String::utf8nfc!, String::utf8nfkd, String::utf8nfkd!, String::utf8nfkc, String::utf8nfkc!
The method Integer::utf8 returns a UTF-8 string containing the Unicode character given by the code point:
  0x000A.utf8 => "\n"
  0x2028.utf8 => "\342\200\250"
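To see where the bytes in the second example come from, the same code point can be encoded with plain ruby. This is a minimal sketch that does not use utf8proc: it relies on ruby's built-in Array#pack with the "U" directive, and codepoint_to_utf8 and octal_escapes are hypothetical helper names chosen for this illustration.

```ruby
# Illustration with ruby built-ins only; this is not how Integer::utf8
# is implemented in utf8proc. Helper names are hypothetical.

# Encode a single Unicode code point as a UTF-8 string.
def codepoint_to_utf8(code_point)
  [code_point].pack("U")
end

# Show every byte of a string as a three-digit octal escape.
def octal_escapes(str)
  str.bytes.map { |byte| format("\\%03o", byte) }.join
end

puts codepoint_to_utf8(0x000A) == "\n"          # prints "true"
puts octal_escapes(codepoint_to_utf8(0x2028))   # prints "\342\200\250"
```

U+2028 lies in the range that UTF-8 encodes with three bytes, which is why the result consists of three octal escapes.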
*** POSTGRESQL API ***
For PostgreSQL there are two SQL functions supplied, named "unifold" and "unistrip". These functions can be used to prepare index fields so that they are folded in a way where string comparisons make more sense, e.g. where "bathtub" == "bath<soft hyphen>tub" or "Hello World" == "hello world".
CREATE TABLE people (
  id    serial8 primary key,
  name  text,
  CHECK (unifold(name) NOTNULL)
);
CREATE INDEX name_idx ON people (unifold(name));
SELECT * FROM people WHERE unifold(name) = unifold('John Doe');
The function "unistrip" removes character marks such as accents or diaereses, while "unifold" keeps them.
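The difference between the two functions can be approximated in plain ruby. The sketch below is a rough stand-in and not the extension's implementation: it assumes that folding amounts to compatibility normalization, removal of the soft hyphen and case folding, and that stripping additionally drops nonspacing marks. The exact option set used by "unifold" and "unistrip" is not stated here, so the results of the real functions may differ. The names approx_unifold and approx_unistrip are hypothetical.

```ruby
# Rough approximation with ruby built-ins (ruby 2.4 or newer), NOT the
# PostgreSQL extension. Helper names and the chosen steps are assumptions.
SOFT_HYPHEN = "\u00AD"

# Fold a string for comparison while keeping accents.
def approx_unifold(text)
  text.unicode_normalize(:nfkc).delete(SOFT_HYPHEN).downcase(:fold)
end

# Fold a string for comparison and also drop nonspacing marks (accents etc.).
def approx_unistrip(text)
  decomposed = text.unicode_normalize(:nfkd)
  without_marks = decomposed.gsub(/\p{Mn}/, "")
  approx_unifold(without_marks)
end

puts approx_unifold("bath\u00ADtub") == "bathtub"                      # prints "true"
puts approx_unifold("Hello World") == approx_unifold("hello world")    # prints "true"
puts approx_unifold("Caf\u00E9") == "caf\u00E9"                        # prints "true"
puts approx_unistrip("Caf\u00E9") == "cafe"                            # prints "true"
```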
NOTICE: The outputs of these functions can change between releases, as utf8proc does not follow a versioning stability policy. You have to rebuild your database indices if you upgrade to a newer version of utf8proc.
*** TODO ***
*** CONTACT ***
If you find any bugs or experience difficulties in compiling this software, please contact us:
Project page: http://www.public-software-group.org/utf8proc