diff options
Diffstat (limited to 'static/netbsd/man7/nls.7')
| -rw-r--r-- | static/netbsd/man7/nls.7 | 518 |
1 files changed, 518 insertions, 0 deletions
diff --git a/static/netbsd/man7/nls.7 b/static/netbsd/man7/nls.7 new file mode 100644 index 00000000..7e57562c --- /dev/null +++ b/static/netbsd/man7/nls.7 @@ -0,0 +1,518 @@ +.\" $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $ +.\" +.\" Copyright (c) 2003 The NetBSD Foundation, Inc. +.\" All rights reserved. +.\" +.\" This code is derived from software contributed to The NetBSD Foundation +.\" by Gregory McGarry. +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS +.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS +.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +.\" POSSIBILITY OF SUCH DAMAGE. +.\" +.Dd February 21, 2007 +.Dt NLS 7 +.Os +.Sh NAME +.Nm NLS +.Nd Native Language Support Overview +.Sh DESCRIPTION +Native Language Support (NLS) provides commands for a single +worldwide operating system base. +An internationalized system has no built-in assumptions or dependencies +on language-specific or cultural-specific conventions such as: +.Pp +.Bl -bullet -offset indent -compact +.It +Character classifications +.It +Character comparison rules +.It +Character collation order +.It +Numeric and monetary formatting +.It +Date and time formatting +.It +Message-text language +.It +Character sets +.El +.Pp +All information pertaining to cultural conventions and language is +obtained at program run time. +.Pp +.Dq Internationalization +(often abbreviated +.Dq i18n ) +refers to the operation by which system software is developed to support +multiple cultural-specific and language-specific conventions. +This is a generalization process by which the system is untied from +calling only English strings or other English-specific conventions. +.Dq Localization +(often abbreviated +.Dq l10n ) +refers to the operations by which the user environment is customized to +handle its input and output appropriate for specific language and cultural +conventions. +This is a specialization process, by which generic methods already +implemented in an internationalized system are used in specific ways. +The formal description of cultural conventions for some country, together +with all associated translations targeted to the native language, is +called the +.Dq locale . +.Pp +.Nx +provides extensive support to programmers and system developers to +enable internationalized software to be developed. +.Nx +also supplies a large variety of locales for system localization. +.Ss Localization of Information +All locale information is accessible to programs at run time so that +data is processed and displayed correctly for specific cultural +conventions and language. +.Pp +A locale is divided into categories. +A category is a group of language-specific and culture-specific conventions +as outlined in the list above. +ISO C specifies the following six standard categories supported by +.Nx : +.Pp +.Bl -tag -compact -width LC_MONETARYXX +.It Ev LC_COLLATE +string-collation order information +.It Ev LC_CTYPE +character classification, case conversion, and other character attributes +.It Ev LC_MESSAGES +the format for affirmative and negative responses +.It Ev LC_MONETARY +rules and symbols for formatting monetary numeric information +.It Ev LC_NUMERIC +rules and symbols for formatting nonmonetary numeric information +.It Ev LC_TIME +rules and symbols for formatting time and date information +.El +.Pp +Localization of the system is achieved by setting appropriate values +in environment variables to identify which locale should be used. +The environment variables have the same names as their respective +locale categories. +Additionally, the +.Ev LANG , +.Ev LC_ALL , +and +.Ev NLSPATH +environment variables are used. +The +.Ev NLSPATH +environment variable specifies a colon-separated list of directory names +where the message catalog files of the NLS database are located. +The +.Ev LC_ALL +and +.Ev LANG +environment variables also determine the current locale. +.Pp +The values of these environment variables contains a string format as: +.Pp +.Bd -literal + language[_territory][.codeset][@modifier] +.Ed +.Pp +Valid values for the language field come from the ISO639 standard which +defines two-character codes for many languages. +Some common language codes are: +.Pp +.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN" +.It Sy Language Name Ta Sy Code Ta Sy Language Family +.It ABKHAZIAN AB IBERO-CAUCASIAN +.It AFAN (OROMO) OM HAMITIC +.It AFAR AA HAMITIC +.It AFRIKAANS AF GERMANIC +.It ALBANIAN SQ INDO-EUROPEAN (OTHER) +.It AMHARIC AM SEMITIC +.It ARABIC AR SEMITIC +.It ARMENIAN HY INDO-EUROPEAN (OTHER) +.It ASSAMESE AS INDIAN +.It AYMARA AY AMERINDIAN +.It AZERBAIJANI AZ TURKIC/ALTAIC +.It BASHKIR BA TURKIC/ALTAIC +.It BASQUE EU BASQUE +.It BENGALI BN INDIAN +.It BHUTANI DZ ASIAN +.It BIHARI BH INDIAN +.It BISLAMA Ta BI Ta "" +.It BRETON BR CELTIC +.It BULGARIAN BG SLAVIC +.It BURMESE MY ASIAN +.It BYELORUSSIAN BE SLAVIC +.It CAMBODIAN KM ASIAN +.It CATALAN CA ROMANCE +.It CHINESE ZH ASIAN +.It CORSICAN CO ROMANCE +.It CROATIAN HR SLAVIC +.It CZECH CS SLAVIC +.It DANISH DA GERMANIC +.It DUTCH NL GERMANIC +.It ENGLISH EN GERMANIC +.It ESPERANTO EO INTERNATIONAL AUX. +.It ESTONIAN ET FINNO-UGRIC +.It FAROESE FO GERMANIC +.It FIJI FJ OCEANIC/INDONESIAN +.It FINNISH FI FINNO-UGRIC +.It FRENCH FR ROMANCE +.It FRISIAN FY GERMANIC +.It GALICIAN GL ROMANCE +.It GEORGIAN KA IBERO-CAUCASIAN +.It GERMAN DE GERMANIC +.It GREEK EL LATIN/GREEK +.It GREENLANDIC KL ESKIMO +.It GUARANI GN AMERINDIAN +.It GUJARATI GU INDIAN +.It HAUSA HA NEGRO-AFRICAN +.It HEBREW HE SEMITIC +.It HINDI HI INDIAN +.It HUNGARIAN HU FINNO-UGRIC +.It ICELANDIC IS GERMANIC +.It INDONESIAN ID OCEANIC/INDONESIAN +.It INTERLINGUA IA INTERNATIONAL AUX. +.It INTERLINGUE IE INTERNATIONAL AUX. +.It INUKTITUT Ta IU Ta "" +.It INUPIAK IK ESKIMO +.It IRISH GA CELTIC +.It ITALIAN IT ROMANCE +.It JAPANESE JA ASIAN +.It JAVANESE JV OCEANIC/INDONESIAN +.It KANNADA KN DRAVIDIAN +.It KASHMIRI KS INDIAN +.It KAZAKH KK TURKIC/ALTAIC +.It KINYARWANDA RW NEGRO-AFRICAN +.It KIRGHIZ KY TURKIC/ALTAIC +.It KURUNDI RN NEGRO-AFRICAN +.It KOREAN KO ASIAN +.It KURDISH KU IRANIAN +.It LAOTHIAN LO ASIAN +.It LATIN LA LATIN/GREEK +.It LATVIAN LV BALTIC +.It LINGALA LN NEGRO-AFRICAN +.It LITHUANIAN LT BALTIC +.It MACEDONIAN MK SLAVIC +.It MALAGASY MG OCEANIC/INDONESIAN +.It MALAY MS OCEANIC/INDONESIAN +.It MALAYALAM ML DRAVIDIAN +.It MALTESE MT SEMITIC +.It MAORI MI OCEANIC/INDONESIAN +.It MARATHI MR INDIAN +.It MOLDAVIAN MO ROMANCE +.It MONGOLIAN Ta MN Ta "" +.It NAURU Ta NA Ta "" +.It NEPALI NE INDIAN +.It NORWEGIAN NO GERMANIC +.It OCCITAN OC ROMANCE +.It ORIYA OR INDIAN +.It PASHTO PS IRANIAN +.It PERSIAN (farsi) FA IRANIAN +.It POLISH PL SLAVIC +.It PORTUGUESE PT ROMANCE +.It PUNJABI PA INDIAN +.It QUECHUA QU AMERINDIAN +.It RHAETO-ROMANCE RM ROMANCE +.It ROMANIAN RO ROMANCE +.It RUSSIAN RU SLAVIC +.It SAMOAN SM OCEANIC/INDONESIAN +.It SANGHO SG NEGRO-AFRICAN +.It SANSKRIT SA INDIAN +.It SCOTS GAELIC GD CELTIC +.It SERBIAN SR SLAVIC +.It SERBO-CROATIAN SH SLAVIC +.It SESOTHO ST NEGRO-AFRICAN +.It SETSWANA TN NEGRO-AFRICAN +.It SHONA SN NEGRO-AFRICAN +.It SINDHI SD INDIAN +.It SINGHALESE SI INDIAN +.It SISWATI SS NEGRO-AFRICAN +.It SLOVAK SK SLAVIC +.It SLOVENIAN SL SLAVIC +.It SOMALI SO HAMITIC +.It SPANISH ES ROMANCE +.It SUNDANESE SU OCEANIC/INDONESIAN +.It SWAHILI SW NEGRO-AFRICAN +.It SWEDISH SV GERMANIC +.It TAGALOG TL OCEANIC/INDONESIAN +.It TAJIK TG IRANIAN +.It TAMIL TA DRAVIDIAN +.It TATAR TT TURKIC/ALTAIC +.It TELUGU TE DRAVIDIAN +.It THAI TH ASIAN +.It TIBETAN BO ASIAN +.It TIGRINYA TI SEMITIC +.It TONGA TO OCEANIC/INDONESIAN +.It TSONGA TS NEGRO-AFRICAN +.It TURKISH TR TURKIC/ALTAIC +.It TURKMEN TK TURKIC/ALTAIC +.It TWI TW NEGRO-AFRICAN +.It UIGUR Ta UG Ta "" +.It UKRAINIAN UK SLAVIC +.It URDU UR INDIAN +.It UZBEK UZ TURKIC/ALTAIC +.It VIETNAMESE VI ASIAN +.It VOLAPUK VO INTERNATIONAL AUX. +.It WELSH CY CELTIC +.It WOLOF WO NEGRO-AFRICAN +.It XHOSA XH NEGRO-AFRICAN +.It YIDDISH YI GERMANIC +.It YORUBA YO NEGRO-AFRICAN +.It ZHUANG Ta ZA Ta "" +.It ZULU ZU NEGRO-AFRICAN +.El +.Pp +For example, the locale for the Danish language spoken in Denmark +using the ISO 8859-1 character set is da_DK.ISO8859-1. +The da stands for the Danish language and the DK stands for Denmark. +The short form of da_DK is sufficient to indicate this locale. +.Pp +The environment variable settings are queried by their priority level +in the following manner: +.Pp +.Bl -bullet +.It +If the +.Ev LC_ALL +environment variable is set, all six categories use the locale it +specifies. +.It +If the +.Ev LC_ALL +environment variable is not set, each individual category uses the +locale specified by its corresponding environment variable. +.It +If the +.Ev LC_ALL +environment variable is not set, and a value for a particular +.Ev LC_* +environment variable is not set, the value of the +.Ev LANG +environment variable specifies the default locale for all categories. +Only the +.Ev LANG +environment variable should be set in /etc/profile, since it makes it +most easy for the user to override the system default using the individual +.Ev LC_* +variables. +.It +If the +.Ev LC_ALL +environment variable is not set, a value for a particular +.Ev LC_* +environment variable is not set, and the value of the +.Ev LANG +environment variable is not set, the locale for that specific +category defaults to the C locale. +The C or POSIX locale assumes the ASCII character set and defines +information for the six categories. +.El +.Ss Character Sets +A character is any symbol used for the organization, control, or +representation of data. +A group of such symbols used to describe a +particular language make up a character set. +It is the encoding values in a character set that provide +the interface between the system and its input and output devices. +.Pp +The following character sets are supported in +.Nx : +.Bl -tag -width ISO_8859_family +.It ASCII +The American Standard Code for Information Exchange (ASCII) standard +specifies 128 Roman characters and control codes, encoded in a 7-bit +character encoding scheme. +.It ISO 8859 family +Industry-standard character sets specified by the ISO/IEC 8859 +standard. +The standard is divided into 15 numbered parts, with each +part specifying broad script similarities. +Examples include Western European, Central European, Arabic, Cyrillic, +Hebrew, Greek, and Turkish. +The character sets use an 8-bit character encoding scheme which is +compatible with the ASCII character set. +.It Unicode +The Unicode character set is the full set of known abstract characters of +all real-world scripts. It can be used in environments where multiple +scripts must be processed simultaneously. +Unicode is compatible with ISO 8859-1 (Western European) and ASCII. +Many character encoding schemes are available for Unicode, including UTF-8, +UTF-16 and UTF-32. +These encoding schemes are multi-byte encodings. +The UTF-8 encoding scheme uses 8-bit, variable-width encodings which is +compatible with ASCII. +The UTF-16 encoding scheme uses 16-bit, variable-width encodings. +The UTF-32 encoding scheme using 32-bit, fixed-width encodings. +.El +.Ss Font Sets +A font set contains the glyphs to be displayed on the screen for a +corresponding character in a character set. +A display must support a suitable font to display a character set. +If suitable fonts are available to the X server, then X clients can +include support for different character sets. +.Xr xterm 1 +includes support for Unicode with UTF-8 encoding. +.Xr xfd 1 +is useful for displaying all the characters in an X font. +.Pp +The +.Nx +.Xr wscons 4 +console provides support for loading fonts using the +.Xr wsfontload 8 +utility. +Currently, only fonts for the ISO8859-1 family of character sets are +supported. +.Ss Internationalization for Programmers +To facilitate translations of messages into various languages and to +make the translated messages available to the program based on a +user's locale, it is necessary to keep messages separate from the +programs and provide them in the form of message catalogs that a +program can access at run time. +.Pp +Access to locale information is provided through the +.Xr setlocale 3 +and +.Xr nl_langinfo 3 +interfaces. +See their respective man pages for further information. +.Pp +Message source files containing application messages are created by +the programmer and converted to message catalogs. +These catalogs are used by the application to retrieve and display +messages, as needed. +.Pp +.Nx +supports two message catalog interfaces: the X/Open +.Xr catgets 3 +interface and the Uniforum +.Xr gettext 3 +interface. +The +.Xr catgets 3 +interface has the advantage that it belongs to a standard which is +well supported. +Unfortunately the interface is complicated to use and +maintenance of the catalogs is difficult. +The implementation also doesn't support different character sets. +The +.Xr gettext 3 +interface has not been standardized yet, however it is being supported +by an increasing number of systems. +It also provides many additional tools which make programming and +catalog maintenance much easier. +.Ss Support for Multi-byte Encodings +Some character sets with multi-byte encodings may be difficult to decode, +or may contain state (i.e., adjacent characters are dependent). +ISO C specifies a set of functions using 'wide characters' which can handle +multi-byte encodings properly. +The behaviour of these functions is affected +by the +.Ev LC_CTYPE +category of the current locale. +.Pp +A wide character is specified in ISO C +as being a fixed number of bits wide and is stateless. +There are two types for wide characters: +.Em wchar_t +and +.Em wint_t . +.Em wchar_t +is a type which can contain one wide character and operates like 'char' +type does for one character. +.Em wint_t +can contain one wide character or WEOF (wide EOF). +.Pp +There are functions that operate on +.Em wchar_t , +and substitute for functions operating on 'char'. +See +.Xr wmemchr 3 +and +.Xr towlower 3 +for details. +There are some additional functions that operate on +.Em wchar_t . +See +.Xr wctype 3 +and +.Xr wctrans 3 +for details. +.Pp +Wide characters should be used for all I/O processing which may rely +on locale-specific strings. +The two primary issues requiring special use of wide characters are: +.Bl -bullet -offset indent +.It +All I/O is performed using multibyte characters. +Input data is converted into wide characters immediately after +reading and data for output is converted from wide characters to +multi-byte encoding immediately before writing. +Conversion is controlled by the +.Xr mbstowcs 3 , +.Xr mbsrtowcs 3 , +.Xr wcstombs 3 , +.Xr wcsrtombs 3 , +.Xr mblen 3 , +.Xr mbrlen 3 , +and +.Xr mbsinit 3 . +.It +Wide characters are used directly for I/O, using +.Xr getwchar 3 , +.Xr fgetwc 3 , +.Xr getwc 3 , +.Xr ungetwc 3 , +.Xr fgetws 3 , +.Xr putwchar 3 , +.Xr fputwc 3 , +.Xr putwc 3 , +and +.Xr fputws 3 . +They are also used for formatted I/O functions for wide characters +such as +.Xr fwscanf 3 , +.Xr wscanf 3 , +.Xr swscanf 3 , +.Xr fwprintf 3 , +.Xr wprintf 3 , +.Xr swprintf 3 , +.Xr vfwprintf 3 , +.Xr vwprintf 3 , +and +.Xr vswprintf 3 , +and wide character identifier of %lc, %C, %ls, %S for conventional +formatted I/O functions. +.El +.Sh SEE ALSO +.Xr gencat 1 , +.Xr xfd 1 , +.Xr xterm 1 , +.Xr catgets 3 , +.Xr gettext 3 , +.Xr nl_langinfo 3 , +.Xr setlocale 3 , +.Xr wsfontload 8 +.Sh BUGS +This man page is incomplete. |
