

*mbyte.txt*     For Vim version 6.1.  Last change: 2004 Jan 13

                VIM REFERENCE MANUAL      by Bram Moolenaar et al.
                <Translator: yemao http://vimcdoc.sf.net>

Translator's note: this text is still being revised; reading it is not recommended yet.

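Multi-byte support                              *multibyte* *multi-byte*
                                                *Chinese* *Japanese* *Korean*
This chapter covers editing text in languages that use multi-byte characters,
such as Chinese, Japanese and Korean.  Unicode is also covered here.

For an introduction to the most common features, see |usr_45.txt| in the user
manual.
For changing the language of messages and menus see |mlang.txt|.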

{not available when compiled without the +multi_byte feature}


1.  Getting started                     |mbyte-first|
2.  Locale                              |mbyte-locale|
3.  Encoding                            |mbyte-encoding|
4.  Using a terminal                    |mbyte-terminal|
5.  Fonts on X11                        |mbyte-fonts-X11|
6.  Fonts on MS-Windows                 |mbyte-fonts-MSwin|
7.  Input on X11                        |mbyte-XIM|
8.  Input on MS-Windows                 |mbyte-IME|
9.  Input with a keymap                 |mbyte-keymap|
10. Using UTF-8                         |mbyte-utf8|
11. Overview of options                 |mbyte-options|


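1. Getting started                                      *mbyte-first*
This is a quick overview of the multi-byte features of Vim.  If you are lucky,
everything works as expected; if not, read on.  You may need to spend some
time and try several things before Vim handles multi-byte text correctly,
because unfortunately every system has its own, quite complicated, way of
dealing with multi-byte languages.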


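COMPILING

If you already have a compiled Vim, check whether it includes the
|+multi_byte| feature; the |:version| command shows this.  If it is not
included, you need to compile Vim with the "big" features.  You can further
refine which features are included; see the INSTALL file in the source
directory.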


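LOCALE

First of all, make sure your current locale is set correctly.  If your system
has been set up to support the language, everything should work right away.
Otherwise you need to set the $LANG environment variable in your shell:

        setenv LANG ja_JP.EUC

Unfortunately, the name of the locale depends on your system.  Chinese might
also be called "zh_CN.gbk" or just "zh".  To see which locale is currently
used:

        :language

To change the locale used by Vim:

        :language zh_CN.gbk

Vim gives an error message if the locale does not work.  This is a good way
to find out which locale name to use.  But it is better to set the locale in
the shell, so that it is used right from the start.  See |mbyte-locale|.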


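ENCODING

If your locale works properly, Vim will set the 'encoding' option
accordingly.  If this doesn't work, you can overrule its value:

        :set encoding=utf-8

See |encoding-values| for a list of acceptable values.

The result is that all the text used inside Vim is in this encoding: not only
the text in the buffers, but also in registers, variables, etc.  This means
that changing the value of 'encoding' makes the existing text invalid!  The
text itself doesn't change, but it will be displayed wrong.

You can edit files in another encoding than the one 'encoding' is set to.
Vim converts the file when reading it and converts it back to the original
encoding when writing it.  See 'fileencoding', 'fileencodings' and |++enc|.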
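As a rough illustration of the idea behind 'fileencodings', the Python sketch below tries a list of candidate encodings in order and keeps the first one that decodes the bytes without errors.  It is a simplified analogy, not Vim's actual algorithm (Vim also looks for a byte order mark, for example); the candidate list and the name `guess_decode` are assumptions made for this example.

```python
# Sketch: pick the first candidate encoding that decodes the bytes cleanly,
# similar in spirit to how Vim walks through 'fileencodings'.
# The candidate list is an assumption; Python codec names are used.
CANDIDATES = ["utf-8", "latin-1"]  # latin-1 accepts any byte, so it never fails


def guess_decode(data, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes `data`."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # illegal byte sequence for this encoding: try the next
    raise ValueError("no candidate encoding could decode the data")


if __name__ == "__main__":
    print(guess_decode(b"caf\xc3\xa9"))  # valid UTF-8
    print(guess_decode(b"caf\xe9"))      # not valid UTF-8, falls back to latin-1
```

Putting latin-1 last matters in this sketch: because it accepts every byte, any encoding listed after it would never be tried.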


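DISPLAY AND FONTS

If you are working in a terminal, you must make sure it accepts the same
encoding that Vim is working with.  Otherwise you have to set the
'termencoding' option so that Vim converts the text automatically.

For the GUI you must select fonts that work with the current encoding.  This
works differently from a terminal: it depends on the system you are using,
the locale and a few other things.  See |mbyte-fonts-X11| and
|mbyte-fonts-MSwin|.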

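On X11 you can set the 'guifontset' option to a list of fonts that together
cover the characters being used.  Example for Korean:

        :set guifontset=k12,r12

Alternatively, you can set 'guifont' and 'guifontwide'.  'guifont' is used for
single-width characters, 'guifontwide' for double-width characters.  Thus the
'guifontwide' font must be exactly twice as wide as the 'guifont' font.
Example for UTF-8:

        :set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
        :set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1

You can also set only 'guifont'; Vim will then try to find a matching
'guifontwide' for you.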

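INPUT

There are several ways to enter multi-byte characters:
- On X11, XIM can be used.  See |XIM|.
- On MS-Windows, IME can be used.  See |IME|.
- On all systems, keymaps can be used.  See |mbyte-keymap|.
The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
between the input methods or to disable them temporarily.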


2.  Locale                                              *mbyte-locale*

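The easiest setup is when your whole system uses the locale you want to work
in.  But it is also possible to set the locale for the shell you are working
in, or to use a certain locale only inside Vim.

There are many languages in the world, and at least as many different
cultures and environments.  The combination of a language with the
conventions of a region is called a "locale".  It includes the language used,
the characters used, sorting order, date and time formats, currency formats,
etc.  For Vim only the language and the characters matter.

You can only use a locale that your system supports.  Some systems have only
a few locales, especially in the USA.  The language you want to use may not be
on your system; in that case you may be able to install it as an extra
package.  Check your system documentation for how to do that.  The location
where locales are installed varies between systems, for example
"/usr/share/locale" or "/usr/lib/locale".  See your system's setlocale() man
page.

Looking in these directories shows you the full name of each locale.  Names
are case sensitive, thus "ja_JP.EUC" and "ja_jp.euc" are different.  Some
systems have a locale.alias file, which translates a short name like "nl" to
the full name "nl_NL.ISO_8859-1".
Note: X-windows has its own locale settings, and unfortunately it uses locale
names that differ from those used elsewhere.  This is confusing!  Vim uses
what the setlocale() function uses, which is not the X-windows setting.  You
may need to experiment to find out which locale names X-windows accepts.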

                                                *locale-name*
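The simplified format of a locale name is:
        language
or      language_territory
or      language_territory.codeset

Territory means the country (or part of it), codeset means the character set.
For example, the locale name "ja_JP.eucJP" means:
        ja      the language is Japanese
        JP      the country is Japan
        eucJP   the codeset is EUC-JP

But it could also be "ja", "ja_JP.EUC", "ja_JP.ujis", etc.  Unfortunately,
the locale name for a specific language, territory and codeset is not
standardized; it depends on your system.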

Examples:
    charset         language              locale name
    GB2312          Chinese (simplified)  zh_CN.EUC, zh_CN.GB2312
    Big5            Chinese (traditional) zh_TW.BIG5, zh_TW.Big5
    CNS-11643       Chinese (traditional) zh_TW
    EUC-JP          Japanese              ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
    Shift_JIS       Japanese              ja_JP.SJIS, ja_JP.Shift_JIS
    EUC-KR          Korean                ko, ko_KR.EUC
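The locale name format above can be taken apart mechanically.  The Python sketch below splits a name into language, territory and codeset; `parse_locale` and `Locale` are hypothetical names, and suffixes that some systems add (such as "@euro") are deliberately not handled.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


def parse_locale(name):
    """Split "language[_territory[.codeset]]" into its parts.
    Suffixes such as "@euro" are not handled by this sketch."""
    base, _, codeset = name.partition(".")
    language, _, territory = base.partition("_")
    return Locale(language, territory or None, codeset or None)


for name in ["ko", "ja_JP", "ja_JP.eucJP", "zh_TW.Big5"]:
    print(f"{name:12} -> {parse_locale(name)}")
```

The parts are kept exactly as written, so "ja_JP.EUC" and "ja_jp.euc" give different results, in line with the case sensitivity noted above.  Whether a name is actually usable still depends on the locales installed on your system.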

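USING A LOCALE

To use a locale for the whole system, see the documentation of your system.
Mostly you need to set it in a configuration file in "/etc".

To use a locale in a shell, set the $LANG environment variable.  When you want
to use Korean and the locale name is "ko", do this:
    sh:    export LANG=ko
    csh:   setenv LANG ko
Put this in your ~/.profile or ~/.cshrc file to always use it.

To use a locale only in Vim, use the |:language| command:

        :language ko

Put this in your ~/.vimrc file to always use it.

Or set $LANG when starting Vim:
   sh:    LANG=ko vim {vim-arguments}
   csh:   env LANG=ko vim {vim-arguments}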


3.  Encoding                            *mbyte-encoding*

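Vim uses the 'encoding' option to specify how characters are identified and
encoded inside Vim.  This applies to all places where text is used, including
buffers, registers and variables.

                                                        *charset* *codeset*
Charset is another name for encoding.  There are subtle differences, but they
don't matter for Vim.  "codeset" is another similar name.

Each character is encoded as one or more bytes.  When all characters are
encoded with one byte, we call this a single-byte encoding.  The most often
used one is "latin1".  It limits the number of characters to 256.  Some of
these are control characters, thus even fewer can be used for text.

When some characters use two or more bytes, we call this a multi-byte
encoding.  This allows many more characters, which is required for most East
Asian languages.

Most multi-byte encodings use one byte for the first 128 characters.  These
are equal to ASCII, which makes it easy to exchange plain-ASCII text, no
matter what language is used.  Thus you might see the right text even when
the encoding was set wrong.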

                                                        *encoding-names*
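Vim can use many different character encodings.  There are three major groups:
1   8bit        Single-byte encodings, 256 different characters.  Mostly
                used in the USA and Europe.  Example: ISO-8859-1 (Latin1).
                All characters occupy one screen cell.

2   2byte       Double-byte encodings, over 10000 different characters.
                Mostly used in Asian countries.  Example: euc-cn (Chinese).
                The number of screen cells is equal to the number of bytes
                (except for euc-jp when the first byte is 0x8e).

u   Unicode     Universal encoding, can replace all others.  ISO 10646.
                Millions of different characters.  Example: UTF-8.  The
                relation between bytes and screen cells is complex.

Other encodings cannot be used by Vim internally.  But files in other
encodings can be edited by using conversion, see 'fileencoding'.  Note that
all encodings must use ASCII for the characters below 128 (except when Vim is
compiled for EBCDIC).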
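The byte counts and screen cells described above can be checked with Python's codecs.  This is an illustration only: the codec names are Python's, not Vim's 'encoding' values (Python's "gb2312" codec produces the EUC-CN byte form), and `screen_cells` is a hypothetical approximation based on the Unicode East Asian Width property, which ignores combining and ambiguous-width characters.

```python
import unicodedata

# Python codec names (not Vim 'encoding' values): a single-byte, two
# double-byte and one Unicode encoding.
ENCODINGS = ["latin-1", "gb2312", "euc_jp", "utf-8"]


def screen_cells(text):
    """Approximate cell count: wide/fullwidth characters take 2 cells, all
    others 1.  Combining and ambiguous-width characters are ignored."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)


# ASCII text is byte-for-byte identical in all of these encodings.
assert all("vim".encode(enc) == b"vim" for enc in ENCODINGS)

for text, enc in [("\u4e2d", "gb2312"),   # CJK ideograph in euc-cn
                  ("\uff71", "euc_jp"),   # half-width katakana in euc-jp
                  ("\u4e2d", "utf-8")]:   # the same ideograph in UTF-8
    data = text.encode(enc)
    print(f"{enc:7} {data.hex():8} bytes={len(data)} cells={screen_cells(text)}")
```

The half-width katakana is the euc-jp exception mentioned above: two bytes (the first is 0x8e) but only one screen cell.  In UTF-8 the ideograph takes three bytes but still two cells.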

Supported 'encoding' values are:                        *encoding-values*
1   latin1      8-bit characters (ISO 8859-1)
1   iso-8859-n  ISO_8859 variant (n = 2 to 15)
1   koi8-r      Russian
1   koi8-u      Ukrainian
1   8bit-{name} any 8-bit encoding (Vim specific name)
1   cp{number}  MS-Windows: any installed single-byte codepage
2   cp932       Japanese (Windows only)
2   euc-jp      Japanese (Unix only)
2   sjis        Japanese (Unix only)
2   cp949       Korean (Unix and Windows)
2   euc-kr      Korean (Unix only)
2   cp936       simplified Chinese (Windows only)
2   euc-cn      simplified Chinese (Unix only)
2   cp950       traditional Chinese (on Unix alias for big5)
2   big5        traditional Chinese (on Windows alias for cp950)
2   euc-tw      traditional Chinese (Unix only)
2   2byte-{name} Unix: any double-byte encoding (Vim specific name)
2   cp{number}  MS-Windows: any installed double-byte codepage
u   utf-8       32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2       16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le     like ucs-2, little endian
u   utf-16      ucs-2 extended with double-words for more characters
u   utf-16le    like utf-16, little endian
u   ucs-4       32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le     like ucs-4, little endian

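The {name} can be any encoding name that your system supports.  It is passed
to iconv() to convert between that encoding and the encoding of the current
locale.  For MS-Windows "cp{number}" means using codepage {number}.
Examples: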
                :set encoding=8bit-cp1252
                :set encoding=2byte-cp932

Several aliases can be used.  This is an incomplete list:

1   ansi        same as latin1 (obsolete, for backward compatibility)
2   japan       Japanese: on Unix "euc-jp", on MS-Windows cp932
2   korea       Korean: on Unix "euc-kr", on MS-Windows cp949
2   prc         simplified Chinese: on Unix "euc-cn", on MS-Windows cp936
2   taiwan      traditional Chinese: on Unix "euc-tw", on MS-Windows cp950
u   utf8        same as utf-8
u   unicode     same as ucs-2
u   ucs2be      same as ucs-2 (big endian)
u   ucs-2be     same as ucs-2 (big endian)
u   ucs-4be     same as ucs-4 (big endian)

For the UCS encodings the byte order matters.  This is tricky; use UTF-8
whenever you can.  The default is big-endian (most significant byte first):
            name        bytes           char
            ucs-2       11 22           1122
            ucs-2le     22 11           1122
            ucs-4       11 22 33 44     11223344
            ucs-4le     44 33 22 11     11223344

On MS-Windows systems you often need "ucs-2le", because Windows uses
little-endian UCS-2.
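The byte order table above can be reproduced with Python's `struct` module, where ">" packs big-endian and "<" little-endian.  The value 0x11223344 is only a byte pattern, not a valid Unicode character, so the second part uses a real character, U+1122, with Python's UTF-16/UTF-32 codecs; for characters up to U+FFFF, UTF-16 produces the same bytes as UCS-2.

```python
import struct

# The byte patterns from the table above: ">" packs big-endian, "<" little-endian.
# 0x11223344 is just a byte pattern here, not a valid Unicode character.
assert struct.pack(">H", 0x1122) == bytes([0x11, 0x22])                  # ucs-2
assert struct.pack("<H", 0x1122) == bytes([0x22, 0x11])                  # ucs-2le
assert struct.pack(">I", 0x11223344) == bytes([0x11, 0x22, 0x33, 0x44])  # ucs-4
assert struct.pack("<I", 0x11223344) == bytes([0x44, 0x33, 0x22, 0x11])  # ucs-4le

# The same for a real character, U+1122, using Python's codecs.  For
# characters up to U+FFFF, "utf-16-be"/"utf-16-le" produce the same bytes as
# UCS-2; the explicit -be/-le variants write no byte order mark.
ch = "\u1122"
for codec in ("utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{codec:10} {ch.encode(codec).hex(' ')}")
```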

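There are a few encodings that look similar but are not exactly the same.
Vim treats them as different encodings and converts between them when needed.
When conversion is not needed, or should be avoided, you can use the similar
name instead: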

        cp932, shift-jis, sjis
        cp936, euc-cn

                                                        *encoding-table*
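Normally 'encoding' is equal to your current locale and 'termencoding' is
empty.  This means that your keyboard and display work with characters
encoded in your current locale, and Vim uses the same encoding internally.

You can make Vim use a different encoding by setting the 'encoding' option to
a different value.  Since the keyboard and display still use the current
locale, conversion is needed: 'termencoding' then takes the value of the
current locale, and Vim converts between 'encoding' and 'termencoding'.
Example: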

        :let &termencoding = &encoding
        :set encoding=utf-8

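However, not all combinations are possible.  The table below shows how each of
the nine combinations works.  What actually works also depends on the
capabilities of iconv(), which vary between systems, so no more detailed
information can be given here.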

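'tenc'      'enc'       remark

 8bit       8bit        Works.  When 'termencoding' differs from 'encoding',
                        typing and displaying may be wrong for some
                        characters; Vim does NOT convert (set 'encoding' to
                        utf-8 to avoid this).
 8bit      2byte        MS-Windows: works for all codepages installed on the
                        system, but you can only type 8bit characters.
                        Other systems: does NOT work.
 8bit      Unicode      Works, but you can only type 8bit characters; in a
                        terminal you can only see 8bit characters; the GUI can
                        show all characters that 'guifont' supports.
 2byte      8bit        Works, but typing non-ASCII characters might be a
                        problem.
 2byte     2byte        MS-Windows: works for all codepages installed on the
                        system; typing may be a problem when the locale
                        differs from 'encoding'.
                        Other systems: only works when 'termencoding' is equal
                        to 'encoding' or empty.
 2byte     Unicode      Works; Vim converts typed characters.
 Unicode    8bit        Works (unusual).
 Unicode    2byte       Does NOT work.
 Unicode   Unicode      Works very well (leaving 'termencoding' empty works
                        the same, because all Unicode is handled internally as
                        UTF-8).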
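One reason some combinations in the table are limited is simply the character repertoire: converting into a smaller encoding fails for characters it does not contain.  The Python sketch below shows only that aspect, using Python's codecs rather than iconv(); it does not model how Vim handles typing or display.  `convert` and `can_convert` are hypothetical helper names.

```python
def convert(data, from_enc, to_enc):
    """Convert bytes between two encodings, going through Unicode (str)."""
    return data.decode(from_enc).encode(to_enc)


def can_convert(text, to_enc):
    """True if every character of `text` exists in `to_enc`."""
    try:
        text.encode(to_enc)
    except UnicodeEncodeError:
        return False
    return True


# 8bit -> Unicode: every Latin-1 byte has a Unicode equivalent.
print(convert(b"caf\xe9", "latin-1", "utf-8"))
# 2byte -> Unicode: works as well.
print(convert("中文".encode("gb2312"), "gb2312", "utf-8"))
# Unicode -> 8bit: only characters that exist in Latin-1 survive.
print(can_convert("café", "latin-1"), can_convert("中文", "latin-1"))
# Unicode -> 2byte: GB 2312 has Chinese characters but no Hebrew letters.
print(can_convert("中文", "gb2312"), can_convert("\u05d0", "gb2312"))
```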

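CONVERSION                                              *charset-conversion*

Vim automatically converts from one encoding to another in these cases:
- when reading a file and 'fileencoding' differs from 'encoding'
- when writing a file and 'fileencoding' differs from 'encoding'
- when displaying characters and 'termencoding' differs from 'encoding'
- when reading input and 'termencoding' differs from 'encoding'
- when displaying messages and the encoding used for LC_MESSAGES differs from
  'encoding' (requires a gettext version that supports this)
- when reading a Vim script where |:scriptencoding| differs from 'encoding'
- when reading or writing a |viminfo| file
Most of these require the |+iconv| feature.  Conversion for reading and
writing files may also be specified with the 'charconvert' option.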
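The first two cases above (reading and writing a file whose 'fileencoding' differs from 'encoding') amount to a round trip through the internal encoding.  The Python sketch below illustrates that round trip with a Shift_JIS file and a UTF-8 "buffer"; `read_buffer` and `write_buffer` are hypothetical names, Python codec names are used, and the plain string comparison of encoding names is a simplification.

```python
import os
import tempfile


def read_buffer(path, fileencoding, encoding="utf-8"):
    """Read `path` and convert its bytes from 'fileencoding' to 'encoding'."""
    with open(path, "rb") as f:
        raw = f.read()
    if fileencoding == encoding:
        return raw  # same encoding: no conversion needed
    return raw.decode(fileencoding).encode(encoding)


def write_buffer(path, buf, fileencoding, encoding="utf-8"):
    """Convert `buf` from 'encoding' back to 'fileencoding' and write it."""
    if fileencoding != encoding:
        buf = buf.decode(encoding).encode(fileencoding)
    with open(path, "wb") as f:
        f.write(buf)


if __name__ == "__main__":
    original = "日本語".encode("shift_jis")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            f.write(original)
        buf = read_buffer(path, "shift_jis")   # buffer now holds UTF-8
        print(buf.decode("utf-8"))
        write_buffer(path, buf, "shift_jis")   # saved in the file's encoding
        with open(path, "rb") as f:
            print(f.read() == original)
```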
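Useful utilities for converting the charset:
    All:            iconv
        GNU iconv can convert most encodings.  Unicode is used as the
        intermediate encoding, which allows conversion from and to all other
        encodings.  See http://www.gnu.org/directory/libiconv.html

    Japanese:       nkf
        Nkf is the "Network Kanji code conversion Filter".  Its most special
        feature is that it can guess the encoding of the input Kanji text, so
        you don't need to know the |charset| of the input file.  To convert
        from ISO-2022-JP or Shift_JIS to EUC-JP, use this command in Vim:
            :%!nkf -e
        Nkf can be found at:
        http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz

    Chinese:        hc
        Hc is the "Hanzi Converter".  It converts a GB file to a Big5 file, or
        a Big5 file to a GB file.  Hc can be found at:
        ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz

    Korean:         hmconv
        Hmconv is a Korean code conversion utility, especially for E-mail.  It
        can convert between EUC-KR and ISO-2022-KR.  Hmconv can be found at:
        ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/hmconv1.0pl3

    Multilingual:   lv
        Lv is a powerful multilingual file viewer.  It can also be used as a
        |charset| converter.  Supported |charset|s: ISO-2022-CN, ISO-2022-JP,
        ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7, UTF-8, the
        ISO-8859 series, Shift_JIS, Big5 and HZ.  Lv can be found at:
        http://www.ff.iij4u.or.jp/~nrt/freeware/lv4493.tar.gz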

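4. Using a terminal                                     *mbyte-terminal*

The GUI version of Vim fully supports multi-byte characters.  Using a
multi-byte encoding in a terminal requires support from the terminal itself,
which makes it less flexible.

For example, you can run Vim in an xterm that supports multi-byte characters
and/or |XIM|.  Examples are kterm (Kanji term), hanterm (Korean),
Eterm (Enlightened terminal) and rxvt.

If your terminal does not support the right encoding, you can set the
'termencoding' option.  Vim then converts typed characters from
'termencoding' to 'encoding', and displayed characters from 'encoding' to
'termencoding'.  If the encoding supported by the terminal doesn't include
all the characters Vim uses, characters will be lost and the display may get
messed up.  If your terminal supports Unicode, such as the xterm mentioned
above, it should work, because nearly all encodings can be converted to
Unicode without loss.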


USING UTF-8 IN XFREE86 XTERM                            *UTF8-xterm*

This is a short explanation of how to use UTF-8 in the xterm by Thomas Dickey
that comes with XFree86 (text by Markus Kuhn).

Get the latest xterm version with UTF-8 support from:

        http://invisible-island.net/xterm/xterm.html

Compile it with "./configure --enable-wide-chars;make"

Get the ISO 10646-1 versions of various fonts from:

       http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

and install the fonts according to the instructions in the README file.

Now start xterm with:
  xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
or, for bigger characters:
  xterm -u8 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

You now have a working UTF-8 terminal emulator.  Try both:

   cat utf-8-demo.txt
   vim utf-8-demo.txt

The demo file comes with ucs-fonts.tar.gz and is meant to check whether there
are any problems with UTF-8 in your xterm.

For Vim you may also need to set 'encoding' to "utf-8".


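5. Fonts on X11                                         *mbyte-fonts-X11*

Unfortunately, using fonts in X11 is complicated.  The name of a single-byte
font is a long string; for multi-byte fonts several of these are needed.
First of all, Vim only accepts fixed-width fonts for displaying text.  You
cannot use proportionally spaced fonts, which excludes many of the available
(and nicer looking) fonts.  However, menus and tooltips can use any font.

Note: Display and input are independent.  It is possible to see your language
even though you have no input method for it.
You should get a default font for menus and tooltips that works, but it might
be ugly.  Read on to find out how to select a better font.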

X LOGICAL FONT DESCRIPTION (XLFD)
                                                        *XLFD*
XLFD is the X font name; it contains information such as the font size and
charset.  The format is:

FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE

Each field means:
- FOUNDRY:  FOUNDRY field.  The company that created the font.
- FAMILY:   FAMILY_NAME field.  Basic font family name (helvetica, gothic,
            times, etc.).
- WEIGHT:   WEIGHT_NAME field.  How thick a letter is (light, medium, bold,
            etc.).
- SLANT:    SLANT field.
                r:  Roman (no slant)
                i:  Italic
                o:  Oblique
                ri: Reverse Italic
                ro: Reverse Oblique
                ot: Other
                number: Scaled font
- WIDTH:    SETWIDTH_NAME field.  Width of characters (normal, condensed,
            narrow, double wide, etc.).
- STYLE:    ADD_STYLE_NAME field.  Extra information describing the font
            (Serif, Sans-Serif, Informal, Decorated, etc.).
- PIXEL:    PIXEL_SIZE field.  Height of characters, in pixels.
- POINT:    POINT_SIZE field.  Ten times the height of characters, in points.
- X:        RESOLUTION_X field.  Horizontal resolution (dots per inch).
- Y:        RESOLUTION_Y field.  Vertical resolution (dots per inch).
- SPACE:    SPACING field.
                p:  Proportional
                m:  Monospaced
                c:  CharCell
- AVE:      AVERAGE_WIDTH field.  Ten times the average width, in pixels.
- CR:       CHARSET_REGISTRY field.  The name of the charset group.
- CE:       CHARSET_ENCODING field.  The rest of the charset name.  For some
            charsets, such as JIS X 0208, if this field is 0, code points have
            the same value as GL, and GR if 1.

For example, a font for the JIS X 0208 charset can be written as:
    -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0
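Because an XLFD name is a fixed sequence of 14 dash-separated fields, it can be split mechanically.  The Python sketch below maps the example name onto the field names listed above.  `parse_xlfd` is a hypothetical helper; it assumes a complete name without wildcards and with no "-" inside any field.

```python
XLFD_FIELDS = ["FOUNDRY", "FAMILY", "WEIGHT", "SLANT", "WIDTH", "STYLE",
               "PIXEL", "POINT", "X", "Y", "SPACE", "AVE", "CR", "CE"]


def parse_xlfd(name):
    """Split a complete XLFD name into its 14 fields.
    Assumes no wildcards and no '-' inside a field."""
    parts = name.split("-")
    # A valid name starts with '-', so the first element is empty.
    if parts[0] != "" or len(parts) != len(XLFD_FIELDS) + 1:
        raise ValueError(f"not a complete XLFD name: {name!r}")
    return dict(zip(XLFD_FIELDS, parts[1:]))


font = parse_xlfd("-misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0")
print(font["PIXEL"], "pixels high,", int(font["POINT"]) / 10, "points")
print("average width:", int(font["AVE"]) / 10, "pixels;",
      "charset:", font["CR"] + "-" + font["CE"])
```

Note the empty STYLE field, which is what the "--" in the name stands for.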

X FONTSET
                                                *fontset* *xfontset*
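A single-byte charset is typically associated with one font.  Multi-byte
charsets often use a combination of fonts: one set of glyphs is used for
single-byte characters and another set (possibly double-width) for multi-byte
characters.  Such a set of fonts is called a fontset.  Which fonts are used in
a fontset depends on your system locale: X-windows maintains a table of the
groups of characters each locale requires, and you must specify all the fonts
that the locale requires in the 'guifontset' option.

Note: The fontset always uses the current locale, even when 'encoding' is set
to a different charset.  In that situation you might want to use 'guifont'
and 'guifontwide' instead of 'guifontset'.
Example:
    |charset| language              "groups of characters"
    GB2312    Chinese (simplified)  ISO-8859-1 and GB 2312
    Big5      Chinese (traditional) ISO-8859-1 and Big5
    CNS-11643 Chinese (traditional) ISO-8859-1, CNS 11643-1 and CNS 11643-2
    EUC-JP    Japanese              JIS X 0201 and JIS X 0208
    EUC-KR    Korean                ISO-8859-1 and KS C 5601 (KS X 1001)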
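Since 'guifontset' must name a font for every group of characters the locale needs, a fontset can be checked against the table above.  The Python sketch below does this by comparing CHARSET_REGISTRY fields.  `REQUIRED`, `registry` and `missing_groups` are hypothetical, and the mapping from the table's character groups to registry prefixes is an assumption; in reality X-windows decides which groups a locale requires.

```python
# Hypothetical mapping from the "groups of characters" in the table above to
# the CHARSET_REGISTRY prefixes used in XLFD names.  Real registry strings
# depend on the installed fonts; this mapping is an assumption.
REQUIRED = {
    "GB2312": ["iso8859", "gb2312"],
    "EUC-JP": ["jisx0201", "jisx0208"],
    "EUC-KR": ["iso8859", "ksc5601"],
}


def registry(xlfd):
    """Return the CHARSET_REGISTRY field of a complete XLFD name.
    Splitting on '-' gives an empty string first, so CR is at index 13."""
    return xlfd.split("-")[13]


def missing_groups(charset, guifontset):
    """List the required character groups that no font in the
    comma-separated 'guifontset' value covers."""
    registries = [registry(name) for name in guifontset.split(",")]
    return [group for group in REQUIRED[charset]
            if not any(reg.startswith(group) for reg in registries)]


JIS0208 = "-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0"
JIS0201 = "-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0"
print(missing_groups("EUC-JP", JIS0208 + "," + JIS0201))  # nothing missing
print(missing_groups("EUC-JP", JIS0208))                  # jisx0201 missing
```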


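You can search for fonts with the xlsfonts command.  For example, to find a
font for KS C 5601:
    xlsfonts | grep ksc5601

This is complicated and confusing.  You may want to consult the X-Windows
documentation for anything you don't understand.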
                                                *base_font_name_list*
When you have found the fonts you want to use, assign them to 'guifontset'.
To specify several fonts, separate their names with a comma.  For example:

 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0,
        \-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0

Alternatively, you can supply a list of base font names that omit the
charset, and let X-Windows select the fonts the locale requires.  For
example:

 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140,
        \-misc-fixed-medium-r-normal--14-130-75-75-c-70

You can also supply a single base font name that allows X-Windows to select
from all available fonts.  For example:

 :set guifontset=-misc-fixed-medium-r-normal--14-*
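A base font name with "*" works like a wildcard pattern over the installed font names.  The Python sketch below approximates that matching with `fnmatch.fnmatchcase`, lowercasing both sides because X font names are matched case-insensitively.  It only shows which names a pattern can match; it does not model how X-windows then picks one font per character group.  `match_fonts` and the `INSTALLED` list are hypothetical; on a real system, xlsfonts shows the available names.

```python
from fnmatch import fnmatchcase


def match_fonts(pattern, available):
    """Return the font names matching an XLFD pattern.  '*' matches any run
    of characters (including '-') and '?' a single character.  Both sides are
    lowercased, since X font names are matched case-insensitively."""
    pat = pattern.lower()
    return [name for name in available if fnmatchcase(name.lower(), pat)]


# A hypothetical list of installed fonts (in practice, see xlsfonts).
INSTALLED = [
    "-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0",
    "-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0",
    "-misc-fixed-medium-r-normal--14-130-75-75-c-70-iso8859-1",
    "-misc-fixed-medium-r-normal--16-150-75-75-c-160-jisx0208.1983-0",
]

for name in match_fonts("-misc-fixed-medium-r-normal--14-*", INSTALLED):
    print(name)
```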
You can also specify alias names; see the fonts.alias file in the fonts
directory (for example, /usr/X11R6/lib/X11/fonts/).  For example:

 :set guifontset=k14,r14

                                                                *E253*
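Note that in East Asian fonts the standard character cell is square.  When
mixing a Latin font and an East Asian font, the East Asian font should be
twice as wide as the Latin font.
If 'guifontset' is not empty, the "font" argument of the |:highlight| command
is also interpreted as a fontset.  For example, you can set highlighting like
this:
        :hi
If you use a wrong "font" argument you will get an error message.  Also make
sure that you set 'guifontset' before setting fonts for highlight groups.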

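USING RESOURCE FILES

Instead of setting 'guifontset', you can set X11 resources and Vim will pick
them up.  This is only for people who know how X resource files work.
For Motif and Athena, insert these three lines in your $HOME/.Xdefaults file:


Note: Vim.font      is for the text area.
      Vim*fontSet   is for the menu.
      Vim*fontList  is for the menu (Motif GUI).

For example, when you are using Japanese and a 14 dots font: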

        Vim.font: -misc-fixed-medium-r-normal--14-*
        Vim*fontSet: -misc-fixed-medium-r-normal--14-*
        Vim*fontList: -misc-fixed-medium-r-normal--14-*

或者:

        Vim*font: k14,r14
        Vim*fontSet: k14,r14
        Vim*fontList: k14,r14

To make them take effect immediately, you can do this:

        xrdb -merge ~/.Xdefaults

Otherwise you will have to stop and restart X before the changes take effect.

The GTK+ version of GUI Vim does not use .Xdefaults; it uses ~/.gtkrc
instead.  The defaults mostly work fine.  To change the menu font, you can
use something like this:

        style "default"
        {
                fontset="-*-*-medium-r-normal--14-*-*-*-c-*-*-*"
        }
        widget_class "*" style "default"


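6.  Fonts on MS-Windows                                 *mbyte-fonts-MSwin*

The simplest way is to select a font in the font dialog, which you can find in
the "Edit/Select Font..." menu.  Once you have found a font that works well,
use this command to see its name:

        :set guifont

Then add a command to your |gvimrc| file to set 'guifont':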

        :set guifont=courier_new:h12


7.  Input on X11                                *mbyte-XIM*

X INPUT METHOD (XIM) BACKGROUND                 *XIM* *xim* *x-input-method*

XIM is an international input module for X.  There are two kinds of
structures: the Xlib unit type and the |IM-server| (Input-Method server) type.
The |IM-server| type is suitable for complex input, such as CJK.

- IM-server
                                                        *IM-server*
  In |IM-server| type input structures, the input event is handled in one of
  two ways: the FrontEnd system or the BackEnd system.  In the FrontEnd
  system, input events are snatched by the |IM-server| first, then the
  |IM-server| gives the application the result of the input.  The BackEnd
  system works in the reverse order.  MS-Windows adopts the BackEnd system;
  in X, most |IM-server|s adopt the FrontEnd system.  The drawback of the
  BackEnd system is the large communication overhead, but it provides safe
  synchronization with no restrictions on applications.

  For example, xwnmo and kinput2 are Japanese |IM-server|s; both use the
  FrontEnd system.  Xwnmo is distributed with Wnn (see below), kinput2 can be
  found at: ftp://ftp.sra.co.jp/pub/x11/kinput2/

  For Chinese, there's a great XIM server named "xcin", with which you can
  input both Traditional and Simplified Chinese characters.  It can also
  accept other locales if you make a correct input table.  Xcin can be found at:
  http://xcin.linux.org.tw/

- Conversion Server
                                                        *conversion-server*
  Some systems need an additional server: a conversion server.  Most Japanese
  |IM-server|s need a Kana-Kanji conversion server.  For Chinese input it
  depends on the input method; some methods need a PinYin or ZhuYin to HanZi
  conversion server.  For Korean input, a Hangul-Hanja conversion server is
  needed if you want to input Hanja.

  For example, the Japanese input process is divided into 2 steps: first we
  pre-input Hira-gana, then we do Kana-Kanji conversion.  There are very many
  Kanji characters (6349 Kanji characters are defined in JIS X 0208), while
  the number of Hira-gana characters is 76.  So, first, we pre-input text as
  pronounced in Hira-gana; second, we convert Hira-gana to Kanji or
  Kata-Kana, if needed.  There are several Kana-Kanji conversion servers:
  jserver (distributed with Wnn, see below) and canna.  Canna can be found at:
  ftp://ftp.nec.co.jp/pub/Canna/

There is a good input system: Wnn 4.2.  Wnn 4.2 contains:
    xwnmo (|IM-server|)
    jserver (Japanese Kana-Kanji conversion server)
    cserver (Chinese PinYin or ZhuYin to simplified HanZi conversion server)
    tserver (Chinese PinYin or ZhuYin to traditional HanZi conversion server)
    kserver (Hangul-Hanja conversion server)
Wnn 4.2 can be found at:
    ftp://ftp.FreeBSD.ORG/pub/FreeBSD/ports/distfiles/Wnn4.2.tar.gz
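The two-step Japanese input process described above (pre-input a Hira-gana reading, then choose a Kanji spelling) can be illustrated with a toy model.  In the Python sketch below, `CANDIDATES` is a tiny hypothetical stand-in for the dictionary of a conversion server such as jserver or canna, and `convert` is a hypothetical name; real servers use large dictionaries and context and work very differently internally.

```python
# Toy model of the two-step Japanese input described above.  CANDIDATES is a
# tiny hypothetical stand-in for the dictionary of a conversion server such as
# jserver or canna; real servers use large dictionaries and context.
CANDIDATES = {
    "かんじ": ["漢字", "感じ", "幹事"],  # one reading, several Kanji spellings
    "にほん": ["日本", "二本"],
}


def convert(reading, choice=0):
    """Step 2: replace a pre-input Hira-gana reading with a chosen candidate.
    Unknown readings are left as Hira-gana."""
    options = CANDIDATES.get(reading, [reading])
    return options[choice % len(options)]


# Step 1 (pre-input) produced the reading "かんじ"; step 2 selects a candidate.
print(convert("かんじ"))      # first candidate
print(convert("かんじ", 1))   # next candidate, as when cycling the list
```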


- Input Style
                                                        *xim-input-style*
  When inputting CJK, there are four areas:
      1. The area to display the input while it is being composed.
      2. The area to display the currently active input mode.
      3. The area to display the next candidate for the selection.
      4. The area to display other tools.

  The third area is needed when converting.  For example, in Japanese
  inputting, multiple Kanji characters could have the same pronunciation, so
  a sequence of Hira-gana characters could map to a distinct sequence of Kanji
  characters.

  The first and second areas are defined in international input of X with the
  names of "Preedit Area", "Status Area" respectively.  The third and fourth
  areas are not defined and are left to be managed by the |IM-server|.  In the
  international input, four input styles have been defined using combinations
  of Preedit Area and Status Area: |OnTheSpot|, |OffTheSpot|, |OverTheSpot|
  and |Root|.

  Currently, GUI Vim supports three styles: |OverTheSpot|, |OffTheSpot| and
  |Root|.

*.  on-the-spot                                         *OnTheSpot*
    Preedit Area and Status Area are performed by the client application in
    the area of application.  The client application is directed by the
    |IM-server| to display all pre-edit data at the location of text
    insertion. The client registers callbacks invoked by the input method
    during pre-editing.
*.  over-the-spot                                       *OverTheSpot*
    Status Area is created in a fixed position within the area of the
    application; in the case of Vim, the position is the additional status
    line.  Preedit Area is made at the present input position of the
    application.  The input method displays pre-edit data in a window which
    it brings up directly over the text insertion position.
*.  off-the-spot                                        *OffTheSpot*
    Preedit Area and Status Area are performed in the area of the
    application; in the case of Vim, the area is the additional status line.
    The client application provides display windows for the pre-edit data to
    the input method, which displays into them directly.
*.  root-window                                         *Root*
    Preedit Area and Status Area are outside of the application.  The input
    method displays all pre-edit data in a separate area of the screen in a
    window specific to the input method.


USING XIM                       *multibyte-input* *E284* *E286* *E287* *E288*
                                *E285* *E291* *E292* *E290* *ez4* *E289*

Note that Display and Input are independent.  It is possible to see your
language even though you have no input method for it.  But when your Display
method doesn't match your Input method, the text will be displayed wrong.

        Note: You can not use IM unless you specify 'guifontset'.
              Therefore, Latin users, you have to also use 'guifontset'
              if you use IM.

To input your language you should run the |IM-server| which supports your
language and |conversion-server| if needed.

The next 3 lines should be put in your ~/.Xdefaults file.  They are common for
all X applications that use |XIM|.  If you already use |XIM|, you can skip
this.

        *international: True
        *.inputMethod: your_input_server_name
        *.preeditType: your_input_style

your_input_server_name  is your |IM-server| name (check your |IM-server|
                        manual).
your_input_style        is one of |OverTheSpot|, |OffTheSpot|, |Root|.  See
                        also |xim-input-style|.

*international may not be necessary if you use X11R6.
*.inputMethod and *.preeditType are optional if you use X11R6.

For example, when you are using kinput2 as |IM-server|,

        *international: True
        *.inputMethod: kinput2
        *.preeditType: OverTheSpot

When using |OverTheSpot|, GUI Vim always connects to the IM Server even in
Normal mode, so you can input your language with commands like "f" and "r".
But when using one of the other two methods, GUI Vim connects to the IM Server
only if it is not in Normal mode.

If your IM Server does not support |OverTheSpot|, and if you want to use your
language with some Normal mode command like "f" or "r", then you should use a
localized xterm or an xterm which supports |XIM|.

If needed, you can set the XMODIFIERS environment variable:

        sh:  export XMODIFIERS="@im=input_server_name"
        csh: setenv XMODIFIERS "@im=input_server_name"

For example, when you are using kinput2 as |IM-server| and sh,

        export XMODIFIERS="@im=kinput2"


FULLY CONTROLLED XIM

You can fully control XIM, like with IME of MS-Windows (see |multibyte-ime|).
This is currently only available for the GTK GUI.

Before using fully controlled XIM, one setting is required.  Set the
'imactivatekey' option to the key that is used for the activation of the input
method.  For example, when you are using kinput2 + canna as IM Server, the
activation key is probably Shift+Space:

        :set imactivatekey=S-space

See 'imactivatekey' for the format.


8.  Input on MS-Windows                                 *mbyte-IME*

(Windows IME support)                           *multibyte-ime* *IME*

{only works in the Windows GUI when compiled with the |+multi_byte_ime|
feature}

To input multibyte characters on Windows, you have to use an Input Method
Editor (IME).  While editing text, you must switch the IME status (on/off)
many, many times.  Because an IME that is on hooks all of your key input, you
cannot type 'j', 'k', or almost any other key to Vim directly.

The |+multi_byte_ime| feature helps with this by reducing the number of times
you have to switch the IME manually.  In Normal mode there is hardly any need
for the IME, even when editing multibyte text.  So when you exit Insert mode
with <Esc>, Vim memorizes the last IME status and forces the IME off.  When
you re-enter Insert mode, Vim automatically restores the memorized status.

This works not only when switching between Insert and Normal mode, but also
for search-command input and Replace mode.
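The save-and-restore behavior described above is a small state machine.  The Python sketch below models it; `ImeState` and its methods are hypothetical names used only to illustrate the described behavior, not Vim's implementation.

```python
class ImeState:
    """Hypothetical model of the behavior described above: the IME status is
    remembered when leaving Insert mode and restored when entering it again."""

    def __init__(self):
        self.ime_on = False  # current IME status
        self.saved = False   # status remembered from the last Insert mode

    def leave_insert(self):
        self.saved = self.ime_on
        self.ime_on = False  # Normal mode: IME forced off so j, k, ... reach Vim

    def enter_insert(self):
        self.ime_on = self.saved


if __name__ == "__main__":
    s = ImeState()
    s.enter_insert()
    s.ime_on = True   # the user switches the IME on while typing
    s.leave_insert()  # <Esc>: status memorized, IME turned off
    print(s.ime_on)
    s.enter_insert()  # back to Insert mode: status restored
    print(s.ime_on)
```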

Cursor color when IME or XIM is on                              *CursorIM*
    There is a cute little feature for IME: the cursor can indicate the IME
    status by changing its color.  Usually the IME status is indicated by a
    small icon in a corner of the desktop (or taskbar), which is not easy to
    check.  This feature makes it easier.
    This works in the same way when using XIM.

    You can select cursor color when status is on by using highlight group
    CursorIM.  For example, add these lines to your _gvimrc:

        if has('multi_byte_ime')
            highlight Cursor guibg=Green guifg=NONE
            highlight CursorIM guibg=Purple guifg=NONE
        endif

    With the IME off the cursor is green; a purple cursor indicates that the
    IME is on.

WHAT IS IME
    IME is part of the East Asian versions of Windows.  It helps you to input
    multibyte characters.  English and other language versions of Windows do
    not have an IME (usually there is no need for one).  But there is one
    called Microsoft Global IME, which is part of Internet Explorer 4.0 and
    above.  You can find more information about Global IME at the URLs
    below.

WHAT IS GLOBAL IME                                      *global-ime*
    Global IME makes it possible to input Chinese, Japanese, and Korean text
    into a Vim buffer on any language version of Windows 98, Windows 95, and
    Windows NT 4.0.  Please see the URLs below for details of Global IME.  You
    can also find various language versions of Global IME there.

    - Global IME detailed information.
        http://www.microsoft.com/windows/ie/features/ime.asp

    - Active Input Method Manager (Global IME)
        http://msdn.microsoft.com/workshop/misc/AIMM/aimm.asp

    Support for Global IME is an experimental feature.

NOTE: For IME to work you must make sure in the "Language settings for the
system" the default locale is set to your language.  The exact location of
this depends on the version of Windows you use.


9. Input with a keymap                                  *mbyte-keymap*

When the keyboard doesn't produce the characters you want to enter in your
text, you can use the 'keymap' option.  This will translate one or more
(English) characters to another (non-English) character.  This only happens
when typing text, not when typing Vim commands.  This avoids having to switch
between two keyboard settings.

The value of the 'keymap' option specifies a keymap file to use.  The name of
this file is one of these two:

        keymap/{keymap}_{encoding}.vim
        keymap/{keymap}.vim

Here {keymap} is the value of the 'keymap' option and {encoding} of the
'encoding' option.  The file name with the {encoding} included is tried first.

'runtimepath' is used to find these files.  To see an overview of all
available keymap files, use this:
        :echo globpath(&rtp, "keymap/*.vim")
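The lookup described above can be sketched in Python: the {encoding}-specific name is searched in every 'runtimepath' directory before the generic name is tried.  `find_keymap` is a hypothetical helper that takes the runtimepath as a list of directories; it only illustrates the search order, Vim itself does this with its own runtime-file mechanism.

```python
import os
import tempfile


def find_keymap(runtimepath, keymap, encoding):
    """Return the path of the keymap file to load, or None.

    The {encoding}-specific name is looked up in every 'runtimepath'
    directory before the generic name is tried.
    """
    for name in (f"{keymap}_{encoding}.vim", f"{keymap}.vim"):
        for directory in runtimepath:
            path = os.path.join(directory, "keymap", name)
            if os.path.isfile(path):
                return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for d, fname in ((a, "hebrew.vim"), (b, "hebrew_utf-8.vim")):
            os.makedirs(os.path.join(d, "keymap"))
            open(os.path.join(d, "keymap", fname), "w").close()
        print(find_keymap([a, b], "hebrew", "utf-8"))   # specific file in b
        print(find_keymap([a, b], "hebrew", "cp1255"))  # generic file in a
```

Note that the encoding-specific file wins even though it lives in a later directory.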

In Insert and Command-line mode you can use CTRL-^ to toggle between using the
keyboard map or not. |i_CTRL-^| |c_CTRL-^|
This flag is remembered for Insert mode with the 'iminsert' option.  When
leaving and entering Insert mode the previous value is used.  The same value
is also used for commands that take a single character argument, like |f| and
|r|.
For Command-line mode the flag is NOT remembered.  You are expected to type an
Ex command first, which is ASCII.
For typing search patterns the 'imsearch' option is used.  It can be set to
use the same value as for 'iminsert'.

It is possible to give the GUI cursor another color when the language mappings
are being used.  This is disabled by default, to avoid the cursor becoming
invisible when you use a non-standard background color.  Here is an example
that uses a brightly colored cursor:
        :highlight Cursor guifg=NONE guibg=Green
        :highlight lCursor guifg=NONE guibg=Cyan

                        *keymap-file-format* *:loadk* *:loadkeymap* *E105*
The keymap file looks something like this:

        " Maintainer:   name <email@address>
        " Last Changed: 2001 Jan 1

        let b:keymap_name = "short"

        loadkeymap
        a       A
        b       B       comment

The lines starting with a " are comments and will be ignored.  Blank lines are
also ignored.  The lines with the mappings may have a comment after the useful
text.

The "b:keymap_name" can be set to a short name, which will be shown in the
status line.  The idea is that this takes less room than the value of
'keymap', which might be long to distinguish between different languages,
keyboards and encodings.

The actual mappings are in the lines below "loadkeymap".  In the example "a"
is mapped to "A" and "b" to "B".  Thus the first item is mapped to the second
item.  This is done for each line, until the end of the file.
These items are exactly the same as what can be used in a |:lnoremap| command.
You can check the result with this command:
        :lmap
The two items must be separated by white space.  You cannot include white
space inside an item, use the special names "<Tab>" and "<Space>" instead.
The length of the two items together must not exceed 200 bytes.
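The mapping-line rules above can be sketched as a small Python parser.  `parse_keymap` is a hypothetical helper: it ignores everything before "loadkeymap" (in Vim that part is ordinary Vim script), skips blank and comment lines, and takes the first two whitespace-separated items.  It does not model the special meaning of backslashes or the <char-> notation, and counting the 200-byte limit in UTF-8 is an assumption.

```python
def parse_keymap(text):
    """Return {lhs: rhs} for the mapping lines of a keymap file."""
    mappings = {}
    seen_loadkeymap = False
    for line in text.splitlines():
        stripped = line.strip()
        if not seen_loadkeymap:
            # Everything before "loadkeymap" is ordinary Vim script.
            seen_loadkeymap = stripped == "loadkeymap"
            continue
        if not stripped or stripped.startswith('"'):
            continue  # blank line or comment line
        items = stripped.split()
        if len(items) < 2:
            raise ValueError(f"mapping needs two items: {line!r}")
        lhs, rhs = items[0], items[1]  # items[2:] is a trailing comment
        if len((lhs + rhs).encode("utf-8")) > 200:  # byte count in UTF-8 (assumption)
            raise ValueError(f"mapping too long: {line!r}")
        mappings[lhs] = rhs
    return mappings


SAMPLE = '''\
" Maintainer:   name <email@address>
" Last Changed: 2001 Jan 1

let b:keymap_name = "short"

loadkeymap
a       A
b       B       comment
'''

print(parse_keymap(SAMPLE))
```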

It's possible to have more than one character in the first column.  This works
like a dead key.  Example:
                    'a     *
Since Vim doesn't know if the next character after a quote is really an "a",
it will wait for the next character.  To be able to insert a single quote,
also add this line:
        ''      '
Since the mapping is defined with |:lnoremap| the resulting quote will not be
used for the start of another character.

Although it's possible to have more than one character in the second column,
this is unusual.  But you can use various ways to specify the character:
        A       a               literal character
        A       <char-97>       decimal value
        A       <char-0x61>     hexadecimal value
        A       <char-0141>     octal value
        x       <Space>         special key name

The characters are assumed to be encoded for the current value of 'encoding'.
It's possible to use ":scriptencoding" when all characters are given
literally.  That doesn't work when using the <char-> construct, because the
conversion is done on the keymap file, not on the resulting character.
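The three number formats in the table follow the usual C conventions: plain
decimal, a "0x" prefix for hexadecimal and a leading zero for octal.  A minimal
Python sketch of that decoding step (decode_char_notation is a hypothetical
name; Python strings are Unicode, so this models an 'encoding' of utf-8):

```python
import re

# Hex must be tried before octal so that "0x61" is not read as octal "0".
CHAR_RE = re.compile(r"<char-(0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*)>")


def decode_char_notation(item: str) -> str:
    """Replace every <char-N> in item by the character with that value."""
    def convert(m: re.Match) -> str:
        number = m.group(1)
        if number[:2].lower() == "0x":
            value = int(number, 16)   # hexadecimal
        elif number.startswith("0"):
            value = int(number, 8)    # octal
        else:
            value = int(number, 10)   # decimal
        return chr(value)
    return CHAR_RE.sub(convert, item)
```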

The lines after "loadkeymap" are interpreted with 'cpoptions' set to "C".
This means that continuation lines are not used and a backslash has a special
meaning in the mappings.  Examples:

        " a comment line
        \"      x       maps " to x
        \\      y       maps \ to y
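Read this way, `\"` and `\\` each stand for one quoted character.  The Python
sketch below assumes the backslash simply quotes the character after it, much
like CTRL-V does when the 'B' flag is absent from 'cpoptions'; the helper name
unescape_item is hypothetical.

```python
def unescape_item(item: str) -> str:
    """Remove one level of backslash quoting from a keymap item."""
    out = []
    chars = iter(item)
    for ch in chars:
        if ch == "\\":
            # Take the next character literally; a trailing backslash is kept.
            out.append(next(chars, "\\"))
        else:
            out.append(ch)
    return "".join(out)
```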

If you write a keymap file that will be useful for others, consider submitting
it to the Vim maintainer for inclusion in the distribution:
<[email protected]>


HEBREW KEYMAP                                           *keymap-hebrew*

This file explains what characters are available in UTF-8 and CP1255 encodings,
and what the keymaps are to get those characters:

glyph   encoding           keymap 
Char   utf-8 cp1255  hebrew  hebrewp  name 


10. Using UTF-8                         *mbyte-utf8* *UTF-8* *utf-8* *utf8*
                                                                *Unicode*
The Unicode character set was designed to include all characters from other
character sets.  Therefore it is possible to write text in any language using
Unicode (with a few rarely used languages excluded).  And it's mostly possible
to mix these languages in one file, which is impossible with other encodings.

Unicode can be encoded in several ways.  The two most popular ones are UCS-2,
which uses 16-bit words, and UTF-8, which uses one or more bytes for each
character.  Vim can support all of these encodings, but always uses UTF-8
internally.
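The size difference is easy to see with a few characters.  This Python sketch
uses only the standard library; Python's "utf-16-be" codec stands in for
UCS-2, which is an assumption that holds only for characters up to U+FFFF.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Bytes needed for ch in UTF-8 and in UCS-2."""
    if ord(ch) > 0xFFFF:
        raise ValueError("UCS-2 only covers U+0000 to U+FFFF")
    # Up to U+FFFF (surrogates excepted), big-endian UTF-16 gives the UCS-2 bytes.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["a", "\u00e9", "\u4e2d"]:  # a, e-acute, a CJK character
    utf8, ucs2 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: utf-8 {utf8} byte(s), ucs-2 {ucs2} bytes")
```

ASCII text is half the size in UTF-8, while CJK text is larger; UTF-8 also
keeps every ASCII byte unchanged.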

Vim has comprehensive UTF-8 support.  It appears to work in:
- xterm with utf-8 support enabled
- Athena, Motif and GTK GUI
- MS-Windows GUI

Double-width characters are supported.  This works best with 'guifontwide' or
'guifontset'.  When using only 'guifont' the wide characters are drawn in the
normal width and a space to fill the gap.

Up to two combining characters can be used.  The combining character is drawn
on top of the preceding character.  When editing text a composing character is
mostly considered part of the preceding character.  For example "x" will
delete a character and its following composing characters by default. If the
'delcombine' option is on, then pressing 'x' will delete the combining
characters, one at a time, then the base character.  But when inserting, you
type the first character and the following composing characters separately,
after which they will be joined.  The "r" command will not allow you to type a
combining character, because it doesn't know one is coming.  Use "R" instead.
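The deletion behaviour can be modelled by grouping each base character with
the composing characters that follow it.  In this Python sketch
unicodedata.combining() approximates Vim's own idea of a composing character,
the function names are hypothetical, and removing the last composing
character first when 'delcombine' is set is an assumption.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the composing characters after it."""
    groups: list[str] = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch  # composing character joins the preceding one
        else:
            groups.append(ch)
    return groups


def delete_at(text: str, index: int, delcombine: bool = False) -> str:
    """Model of "x" on the character cluster at position index."""
    groups = clusters(text)
    if delcombine and len(groups[index]) > 1:
        groups[index] = groups[index][:-1]  # drop one composing character
    else:
        del groups[index]  # drop the base character with its composing chars
    return "".join(groups)
```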

Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.
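A rough Python equivalent of this display rule uses a custom decoding error
handler.  The handler name "show_as_hex" is made up for this sketch.  Unlike
Vim, Python's UTF-8 decoder also rejects overlong sequences, so those would be
shown as <xx> here too.

```python
import codecs


def _show_as_hex(err: UnicodeDecodeError) -> tuple[str, int]:
    """Error handler: render each offending byte as <xx> and continue after it."""
    bad = err.object[err.start:err.end]
    return "".join(f"<{b:02x}>" for b in bad), err.end


# Registered under a made-up name; the default "strict" handler would raise.
codecs.register_error("show_as_hex", _show_as_hex)


def display(raw: bytes) -> str:
    """Decode raw as UTF-8, showing bytes that are not valid UTF-8 as <xx>."""
    return raw.decode("utf-8", errors="show_as_hex")


print(display(b"a\xffb"))  # a<ff>b
```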

Overlong sequences are not handled specially and are displayed like a valid
character.  However, search patterns may not match on an overlong sequence.
(An overlong sequence is one where more bytes are used than required for the
character.)  An exception is NUL (zero), which is displayed as "<00>".
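The following Python sketch decodes one sequence permissively, accepting
overlong forms and the old five- and six-byte forms that reach 31 bits, and
then checks whether a shorter encoding would have sufficed.  It illustrates
the definition above and is not Vim's decoder; the function names are
hypothetical.

```python
def leading_ones(byte: int) -> int:
    """Count the 1 bits at the top of a byte; this gives the sequence length."""
    count = 0
    while count < 8 and byte & (0x80 >> count):
        count += 1
    return count


def decode_one(seq: bytes) -> tuple[int, int]:
    """Decode the sequence at the start of seq; return (code_point, length)."""
    lead = seq[0]
    n = leading_ones(lead)
    if n == 0:
        return lead, 1  # plain ASCII byte
    if n == 1 or n > 6:
        raise ValueError(f"invalid lead byte {lead:#04x}")
    if len(seq) < n:
        raise ValueError("truncated sequence")
    code = lead & (0x7F >> n)  # payload bits of the lead byte
    for b in seq[1:n]:
        if (b & 0xC0) != 0x80:
            raise ValueError(f"invalid continuation byte {b:#04x}")
        code = (code << 6) | (b & 0x3F)  # six payload bits per continuation byte
    return code, n


def shortest_length(code: int) -> int:
    """Fewest bytes needed to encode code in this UTF-8 scheme."""
    for n, limit in ((1, 0x80), (2, 0x800), (3, 0x10000),
                     (4, 0x200000), (5, 0x4000000), (6, 0x80000000)):
        if code < limit:
            return n
    raise ValueError("code point exceeds 31 bits")


def is_overlong(seq: bytes) -> bool:
    """True when seq uses more bytes than its code point requires."""
    code, n = decode_one(seq)
    return n > shortest_length(code)
```

For example, the two bytes 0xc0 0x80 decode to NUL, the case that is shown as
"<00>".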

In the file and buffer the full range of Unicode characters can be used (31
bits).  However, displaying only works for 16-bit characters, and only for the
characters present in the selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
  the cursor.  If there are composing characters these are shown too.  (If the
  message is truncated, use ":messages".)
- "g8" shows the bytes used in a UTF-8 character, also the composing
  characters, as hex numbers.
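The values these commands report can be computed in Python for comparison.
The layout below is not Vim's message format, and the function names are
hypothetical; the sketch only returns the numbers for the base character and
each composing character.

```python
def ga_values(cluster: str) -> list[tuple[int, str, str]]:
    """(decimal, hex, octal) for the base character and each composing one."""
    return [(ord(ch), format(ord(ch), "x"), format(ord(ch), "o")) for ch in cluster]


def g8_bytes(cluster: str) -> list[str]:
    """UTF-8 bytes of each character in the cluster, as hex numbers."""
    return [ch.encode("utf-8").hex(" ") for ch in cluster]


# "e" followed by U+0301 COMBINING ACUTE ACCENT
print(ga_values("e\u0301"))  # [(101, '65', '145'), (769, '301', '1401')]
print(g8_bytes("e\u0301"))   # ['65', 'cc 81']
```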


STARTING VIM

If your current locale is in a utf-8 encoding, Vim will automatically start
in utf-8 mode.

If you are using another locale:

        set encoding=utf-8

You might also want to select the font used for the menus.  Unfortunately this
doesn't always work.  See the system specific remarks below, and 'langmenu'.


USING UTF-8 IN X-Windows                                *utf-8-in-xwindows*

You need to specify a font to be used.  For double-wide characters another
font is required, which is exactly twice as wide.  There are three ways to do
this:

1. Set 'guifont' and let Vim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'
3. Set 'guifontset'

See the documentation for each option for details.  Example:

   :set guifont=-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

You might also want to set the font used for the menus.  This only works for
Motif.  Use the ":hi Menu font={fontname}" command for this. |:highlight|


TYPING UTF-8                                            *utf-8-typing*

If you are using X-Windows, you should find an input method that supports
utf-8.

If your system does not provide support for typing utf-8, you can use the
'keymap' feature.  This allows writing a keymap file, which defines a utf-8
character as a sequence of ASCII characters.  See |mbyte-keymap|.

Another method is to set the current locale to the language you want to use
and for which you have an XIM available.  Then set 'termencoding' to the
encoding of that locale and Vim will convert the typed characters to
'encoding' for you.

If everything else fails, you can type any character as four hex digits:

        CTRL-V u 1234

"1234" is interpreted as a hex number.  You must type four digits; prepend
zeros if necessary.
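Working out the digits to type is a matter of zero-padding the code point to
four hex digits.  A small Python sketch (both function names are
hypothetical):

```python
def ctrl_v_u_digits(ch: str) -> str:
    """The four hex digits to type after CTRL-V u for a character up to U+FFFF."""
    if ord(ch) > 0xFFFF:
        raise ValueError("four hex digits only cover U+0000 to U+FFFF")
    return format(ord(ch), "04x")  # zero-padded to four digits


def from_digits(digits: str) -> str:
    """The character produced by four typed hex digits."""
    if len(digits) != 4:
        raise ValueError("type exactly four hex digits")
    return chr(int(digits, 16))


print(ctrl_v_u_digits("\u00e9"))  # 00e9
```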


COMMAND ARGUMENTS                                       *utf-8-char-arg*

Commands like |f|, |F|, |t| and |r| take an argument of one character.  For
UTF-8 this argument may include one or two composing characters.  These need
to be produced together with the base character; Vim doesn't wait for the next
character to be typed to find out if it is a composing character or not.
Using 'keymap' or |:lmap| is a nice way to type these characters.

The commands that search for a character in a line handle composing characters
as follows.  When searching for a character without a composing character,
this will find matches in the text with or without composing characters.  When
searching for a character with a composing character, this will only find
matches with that composing character.  It was implemented this way, because
not everybody is able to type a composing character.
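The matching rule can be modelled on character clusters (a base character
plus its composing characters).  In this Python sketch find_char is a
hypothetical name, the result is a cluster index rather than a byte column,
and unicodedata.combining() is used as an approximation of which characters
Vim treats as composing.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the composing characters after it."""
    groups: list[str] = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


def find_char(text: str, target: str) -> int:
    """Index of the first cluster matching target, or -1 (model of "f")."""
    for i, cluster in enumerate(clusters(text)):
        if len(target) == 1:
            # Base character only: match with or without composing characters.
            if cluster[0] == target:
                return i
        elif cluster == target:
            # Target has composing characters: only an exact match counts.
            return i
    return -1
```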



11. Overview of options                                 *mbyte-options*

These options are relevant for editing multi-byte files.  Check the help in
options.txt for detailed information.

'encoding'      Encoding used for the keyboard and display.  It is also the
                default encoding for files.

'fileencoding'  Encoding of a file.  When it's different from 'encoding'
                conversion is done when reading or writing the file.

'fileencodings' List of possible encodings of a file.  When opening a file
                these will be tried and the first one that doesn't cause an
                error is used for 'fileencoding'.

'charconvert'   Expression used to convert files from one encoding to another.

'formatoptions' The 'm' flag can be included to have formatting break a line
                at a multi-byte character with a value of 256 or higher.  This
                is useful for languages where a sequence of characters can be
                broken anywhere.

'guifontset'    The list of font names used for a multi-byte encoding.  When
                this option is not empty, it replaces 'guifont'.

'keymap'        Specify the name of a keyboard mapping.
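'fileencodings', 'fileencoding' and 'encoding' work together when a file is
read and written.  A rough Python model of that round trip follows.  The
function names are hypothetical, Python codec names such as "latin-1" stand in
for Vim's encoding names, and Vim's actual conversion (for example through
iconv or 'charconvert') is not modelled.  Since latin-1 accepts any byte
sequence, it only makes sense as the last entry of such a list.

```python
import codecs


def same_codec(a: str, b: str) -> bool:
    """True when two Python codec names refer to the same codec."""
    return codecs.lookup(a).name == codecs.lookup(b).name


def detect_fileencoding(raw: bytes, fileencodings: list[str]) -> str:
    """Return the first encoding in the list that decodes raw without an error."""
    for enc in fileencodings:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
        return enc
    raise ValueError("none of the listed encodings fits this file")


def read_into_buffer(raw: bytes, fileencodings: list[str],
                     encoding: str = "utf-8") -> tuple[str, bytes]:
    """Choose 'fileencoding' and convert the file's bytes to 'encoding'."""
    fenc = detect_fileencoding(raw, fileencodings)
    if same_codec(fenc, encoding):
        return fenc, raw  # no conversion needed
    return fenc, raw.decode(fenc).encode(encoding)


def write_from_buffer(buf: bytes, fileencoding: str,
                      encoding: str = "utf-8") -> bytes:
    """Convert buffer bytes from 'encoding' back to 'fileencoding' for writing."""
    if same_codec(fileencoding, encoding):
        return buf
    return buf.decode(encoding).encode(fileencoding)
```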


Contributions specifically for the multi-byte features by:
        Chi-Deok Hwang <[email protected]>
        Nam SungHyun <[email protected]>
        K.Nagano <[email protected]>
        Taro Muraoka  <[email protected]>
        Yasuhiro Matsumoto <[email protected]>

 vim:tw=78:ts=8:ft=help:norl:

Generated by vim2html on Tue Jul 27 00:35:24 CST 2004