
Notes on Chinese Text Handling in Java

Posted by 火星人 @ 2014-03-12
  










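Hello Unicode

    Notes on Chinese Text Handling in Java



Author: 車東 (Che Dong) chedong@bigfoot.com



Last updated: 2002-12-30 13:20:57


Copyright notice: this article may be freely reproduced, provided the original source and the author are clearly credited.

Keywords: linux java multibyte encoding locale i18n l10n chinese


Abstract: two test programs show how the system default encoding and an application's encoding strategy affect character handling, so that a suitable encoding strategy can be chosen and general-purpose applications can be built that follow internationalization conventions more closely.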


Test Program 1

==========


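To understand how a Java application handles encodings, one first needs to understand how the operating system affects the JVM's default encoding. I therefore wrote Env.java, which prints the JVM properties and the locales supported on different systems. The program is simple: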


/*
 * Copyright (c) 2002 chedong@bigfoot.com
 * $Id: Env.java,v 1.1 2002/07/30 09:48:12 chedong Exp $
 */

import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Purpose:
 *   Display the environment and the JVM default properties.
 * Input: none
 * Output:
 *   1. the supported locales
 *   2. the JVM default properties
 */
public class Env {
    /**
     * Main entry point.
     */
    public static void main(String[] args) {

        System.out.println("Hello, it's: " + new Date());

        // Print the available locales: locale code, a tab, then the display name.
        Locale[] list = DateFormat.getAvailableLocales();
        System.out.println("======System available locales:======== ");
        for (int i = 0; i < list.length; i++) {
            System.out.println(list[i].toString() + "\t" + list[i].getDisplayName());
        }

        // Print the JVM default system properties.
        System.out.println("======System property======== ");
        System.getProperties().list(System.out);
    }
}

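The property to watch most closely is the JVM's file.encoding. It determines the JVM's default encoding and decoding, and therefore how every byte stream ==> character stream conversion is decoded and how every character stream ==> byte stream conversion is encoded, whenever the application does not name an encoding explicitly.


    On Linux, the locale can be set with LANG=zh_CN; LC_ALL=zh_CN.GBK; export LANG LC_ALL. The locale command shows the current environment settings.

    On Windows, the locale can be set through Control Panel ==> Regional Settings.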
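The effect of the default encoding can be seen with a small sketch. It is not part of the original test programs; the class name DefaultEncodingDemo and its helper are made up for illustration, and the Charset-based calls require a newer JDK than the J2SE 1.3 used in this article. Note also that recent JDKs may default to UTF-8 regardless of the locale, so the first three lines of output depend on the JDK as well as on the environment.

```java
import java.nio.charset.Charset;

public class DefaultEncodingDemo {
    // The two Chinese characters for "world", written as escapes so that
    // this source file stays pure ASCII.
    static final String WORLD = "\u4e16\u754c";

    /** Number of bytes produced when text is encoded with the named charset. */
    static int encodedLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        // No charset argument: the platform default decides the result.
        System.out.println("default bytes   = " + WORLD.getBytes().length);
        // Explicit charset: the result is the same in every locale.
        System.out.println("GBK bytes       = " + encodedLength(WORLD, "GBK"));
        System.out.println("UTF-8 bytes     = " + encodedLength(WORLD, "UTF-8"));
    }
}
```

Only the calls that name a charset give the same answer in every environment, which is the point the rest of the article builds on.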















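The program was run in four environments. The four outputs below appear in this order:

1. Linux (J2SE 1.3.1), LANG=en_US LC_ALL=en_US
2. Linux (J2SE 1.3.1), LANG=zh_CN LC_ALL=zh_CN.GBK
3. Windows (J2SE 1.3.0), regional settings: China, Chinese
4. Windows (J2SE 1.3.0), regional settings: United Kingdom, English
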
Output 1 (Linux, J2SE 1.3.1, LANG=en_US LC_ALL=en_US):

Hello, it's: Tue Jul 30 11:05:44 CST 2002

======System available locales:======== 

en English

en_US English (United States)

ar Arabic

ar_AE Arabic (United Arab Emirates)

ar_BH Arabic (Bahrain)

ar_DZ Arabic (Algeria)

ar_EG Arabic (Egypt)

ar_IQ Arabic (Iraq)

ar_JO Arabic (Jordan)

ar_KW Arabic (Kuwait)

ar_LB Arabic (Lebanon)

ar_LY Arabic (Libya)

ar_MA Arabic (Morocco)

ar_OM Arabic (Oman)

ar_QA Arabic (Qatar)

ar_SA Arabic (Saudi Arabia)

ar_SD Arabic (Sudan)

ar_SY Arabic (Syria)

ar_TN Arabic (Tunisia)

ar_YE Arabic (Yemen)

be Byelorussian

be_BY Byelorussian (Belarus)

bg Bulgarian

bg_BG Bulgarian (Bulgaria)

ca Catalan

ca_ES Catalan (Spain)

ca_ES_EURO Catalan (Spain,Euro)

cs Czech

cs_CZ Czech (Czech Republic)

da Danish

da_DK Danish (Denmark)

de German

de_AT German (Austria)

de_AT_EURO German (Austria,Euro)

de_CH German (Switzerland)

de_DE German (Germany)

de_DE_EURO German (Germany,Euro)

de_LU German (Luxembourg)

de_LU_EURO German (Luxembourg,Euro)

el Greek

el_GR Greek (Greece)

en_AU English (Australia)

en_CA English (Canada)

en_GB English (United Kingdom)

en_IE English (Ireland)

en_IE_EURO English (Ireland,Euro)

en_NZ English (New Zealand)

en_ZA English (South Africa)

es Spanish

es_BO Spanish (Bolivia)

es_AR Spanish (Argentina)

es_CL Spanish (Chile)

es_CO Spanish (Colombia)

es_CR Spanish (Costa Rica)

es_DO Spanish (Dominican Republic)

es_EC Spanish (Ecuador)

es_ES Spanish (Spain)

es_ES_EURO Spanish (Spain,Euro)

es_GT Spanish (Guatemala)

es_HN Spanish (Honduras)

es_MX Spanish (Mexico)

es_NI Spanish (Nicaragua)

et Estonian

es_PA Spanish (Panama)

es_PE Spanish (Peru)

es_PR Spanish (Puerto Rico)

es_PY Spanish (Paraguay)

es_SV Spanish (El Salvador)

es_UY Spanish (Uruguay)

es_VE Spanish (Venezuela)

et_EE Estonian (Estonia)

fi Finnish

fi_FI Finnish (Finland)

fi_FI_EURO Finnish (Finland,Euro)

fr French

fr_BE French (Belgium)

fr_BE_EURO French (Belgium,Euro)

fr_CA French (Canada)

fr_CH French (Switzerland)

fr_FR French (France)

fr_FR_EURO French (France,Euro)

fr_LU French (Luxembourg)

fr_LU_EURO French (Luxembourg,Euro)

hr Croatian

hr_HR Croatian (Croatia)

hu Hungarian

hu_HU Hungarian (Hungary)

is Icelandic

is_IS Icelandic (Iceland)

it Italian

it_CH Italian (Switzerland)

it_IT Italian (Italy)

it_IT_EURO Italian (Italy,Euro)

iw Hebrew

iw_IL Hebrew (Israel)

ja Japanese

ja_JP Japanese (Japan)

ko Korean

ko_KR Korean (South Korea)

lt Lithuanian

lt_LT Lithuanian (Lithuania)

lv Latvian (Lettish)

lv_LV Latvian (Lettish) (Latvia)

mk Macedonian

mk_MK Macedonian (Macedonia)

nl Dutch

nl_BE Dutch (Belgium)

nl_BE_EURO Dutch (Belgium,Euro)

nl_NL Dutch (Netherlands)

nl_NL_EURO Dutch (Netherlands,Euro)

no Norwegian

no_NO Norwegian (Norway)

no_NO_NY Norwegian (Norway,Nynorsk)

pl Polish

pl_PL Polish (Poland)

pt Portuguese

pt_BR Portuguese (Brazil)

pt_PT Portuguese (Portugal)

pt_PT_EURO Portuguese (Portugal,Euro)

ro Romanian

ro_RO Romanian (Romania)

ru Russian

ru_RU Russian (Russia)

sh Serbo-Croatian

sh_YU Serbo-Croatian (Yugoslavia)

sk Slovak

sk_SK Slovak (Slovakia)

sl Slovenian

sl_SI Slovenian (Slovenia)

sq Albanian

sq_AL Albanian (Albania)

sr Serbian

sr_YU Serbian (Yugoslavia)

sv Swedish

sv_SE Swedish (Sweden)

th Thai

th_TH Thai (Thailand)

tr Turkish

tr_TR Turkish (Turkey)

uk Ukrainian

uk_UA Ukrainian (Ukraine)

zh Chinese

zh_CN Chinese (China)

zh_HK Chinese (Hong Kong)

zh_TW Chinese (Taiwan)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386

java.vm.version=1.3.1_04-b02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=http://java.sun.com/

path.separator=:

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=/home/chedong/src/char_test

java.runtime.version=1.3.1_04-b02

java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment

os.arch=i386

java.io.tmpdir=/tmp

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Linux

java.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386:/u...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=2.4.7-10

user.home=/home/chedong

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.motif.PSPrinterJob

file.encoding=ISO-8859-1

java.specification.version=1.3

user.name=chedong

java.class.path=/home/chedong/classes

java.vm.specification.version=1.0

java.home=/usr/java/jdk1.3.1_04/jre

user.language=en

java.specification.vendor=Sun Microsystems Inc.

java.vm.info=mixed mode

java.version=1.3.1_04

java.ext.dirs=/usr/java/jdk1.3.1_04/jre/lib/ext

sun.boot.class.path=/usr/java/jdk1.3.1_04/jre/lib/rt.jar:...

java.vendor=Sun Microsystems Inc.

file.separator=/

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=US

sun.cpu.isalist=


Output 2 (Linux, J2SE 1.3.1, LANG=zh_CN LC_ALL=zh_CN.GBK):

Hello, it's: Tue Jul 30 11:07:34 CST 2002

======System available locales:======== 

en 英文

en_US 英文 (美國)

ar 阿拉伯文

ar_AE 阿拉伯文 (阿拉伯聯合大公國)

ar_BH 阿拉伯文 (巴林)

ar_DZ 阿拉伯文 (阿爾及利亞)

ar_EG 阿拉伯文 (埃及)

ar_IQ 阿拉伯文 (伊拉克)

ar_JO 阿拉伯文 (約旦)

ar_KW 阿拉伯文 (科威特)

ar_LB 阿拉伯文 (黎巴嫩)

ar_LY 阿拉伯文 (利比亞)

ar_MA 阿拉伯文 (摩洛哥)

ar_OM 阿拉伯文 (阿曼)

ar_QA 阿拉伯文 (卡達)

ar_SA 阿拉伯文 (沙烏地阿拉伯)

ar_SD 阿拉伯文 (蘇丹)

ar_SY 阿拉伯文 (敘利亞)

ar_TN 阿拉伯文 (突尼西亞)

ar_YE 阿拉伯文 (葉門)

be 白俄羅斯文

be_BY 白俄羅斯文 (白俄羅斯)

bg 保加利亞文

bg_BG 保加利亞文 (保加利亞)

ca 加泰羅尼亞文

ca_ES 加泰羅尼亞文 (西班牙)

ca_ES_EURO 加泰羅尼亞文 (西班牙,Euro)

cs 捷克文

cs_CZ 捷克文 (捷克共和國)

da 丹麥文

da_DK 丹麥文 (丹麥)

de 德文

de_AT 德文 (奧地利)

de_AT_EURO 德文 (奧地利,Euro)

de_CH 德文 (瑞士)

de_DE 德文 (德國)

de_DE_EURO 德文 (德國,Euro)

de_LU 德文 (盧森堡)

de_LU_EURO 德文 (盧森堡,Euro)

el 希臘文

el_GR 希臘文 (希臘)

en_AU 英文 (澳大利亞)

en_CA 英文 (加拿大)

en_GB 英文 (英國)

en_IE 英文 (愛爾蘭)

en_IE_EURO 英文 (愛爾蘭,Euro)

en_NZ 英文 (紐西蘭)

en_ZA 英文 (南非)

es 西班牙文

es_BO 西班牙文 (玻利維亞)

es_AR 西班牙文 (阿根廷)

es_CL 西班牙文 (智利)

es_CO 西班牙文 (哥倫比亞)

es_CR 西班牙文 (哥斯大黎加)

es_DO 西班牙文 (多明尼加)

es_EC 西班牙文 (厄瓜多)

es_ES 西班牙文 (西班牙)

es_ES_EURO 西班牙文 (西班牙,Euro)

es_GT 西班牙文 (瓜地馬拉)

es_HN 西班牙文 (宏都拉斯)

es_MX 西班牙文 (墨西哥)

es_NI 西班牙文 (尼加拉瓜)

et 愛沙尼亞文

es_PA 西班牙文 (巴拿馬)

es_PE 西班牙文 (秘魯)

es_PR 西班牙文 (波多黎哥)

es_PY 西班牙文 (巴拉圭)

es_SV 西班牙文 (薩爾瓦多)

es_UY 西班牙文 (烏拉圭)

es_VE 西班牙文 (委內瑞拉)

et_EE 愛沙尼亞文 (愛沙尼亞)

fi 芬蘭文

fi_FI 芬蘭文 (芬蘭)

fi_FI_EURO 芬蘭文 (芬蘭,Euro)

fr 法文

fr_BE 法文 (比利時)

fr_BE_EURO 法文 (比利時,Euro)

fr_CA 法文 (加拿大)

fr_CH 法文 (瑞士)

fr_FR 法文 (法國)

fr_FR_EURO 法文 (法國,Euro)

fr_LU 法文 (盧森堡)

fr_LU_EURO 法文 (盧森堡,Euro)

hr 克羅埃西亞文

hr_HR 克羅埃西亞文 (克羅埃西亞)

hu 匈牙利文

hu_HU 匈牙利文 (匈牙利)

is 冰島文

is_IS 冰島文 (冰島)

it 義大利文

it_CH 義大利文 (瑞士)

it_IT 義大利文 (義大利)

it_IT_EURO 義大利文 (義大利,Euro)

iw 希伯來文

iw_IL 希伯來文 (以色列)

ja 日文

ja_JP 日文 (日本)

ko 朝鮮文

ko_KR 朝鮮文 (南朝鮮)

lt 立陶宛文

lt_LT 立陶宛文 (立陶宛)

lv 拉托維亞文(列托)

lv_LV 拉托維亞文(列托) (拉脫維亞)

mk 馬其頓文

mk_MK 馬其頓文 (馬其頓王國)

nl 荷蘭文

nl_BE 荷蘭文 (比利時)

nl_BE_EURO 荷蘭文 (比利時,Euro)

nl_NL 荷蘭文 (荷蘭)

nl_NL_EURO 荷蘭文 (荷蘭,Euro)

no 挪威文

no_NO 挪威文 (挪威)

no_NO_NY 挪威文 (挪威,Nynorsk)

pl 波蘭文

pl_PL 波蘭文 (波蘭)

pt 葡萄牙文

pt_BR 葡萄牙文 (巴西)

pt_PT 葡萄牙文 (葡萄牙)

pt_PT_EURO 葡萄牙文 (葡萄牙,Euro)

ro 羅馬尼亞文

ro_RO 羅馬尼亞文 (羅馬尼亞)

ru 俄文

ru_RU 俄文 (俄羅斯)

sh 塞波尼斯-克羅埃西亞文

sh_YU 塞波尼斯-克羅埃西亞文 (南斯拉夫)

sk 斯洛伐克文

sk_SK 斯洛伐克文 (斯洛伐克)

sl 斯洛維尼亞文

sl_SI 斯洛維尼亞文 (斯洛維尼亞)

sq 阿爾巴尼亞文

sq_AL 阿爾巴尼亞文 (阿爾巴尼亞)

sr 塞爾維亞文

sr_YU 塞爾維亞文 (南斯拉夫)

sv 瑞典文

sv_SE 瑞典文 (瑞典)

th 泰文

th_TH 泰文 (泰國)

tr 土耳其文

tr_TR 土耳其文 (土耳其)

uk 烏克蘭文

uk_UA 烏克蘭文 (烏克蘭)

zh 中文

zh_CN 中文 (中國)

zh_HK 中文 (香港)

zh_TW 中文 (台灣)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386

java.vm.version=1.3.1_04-b02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=http://java.sun.com/

path.separator=:

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=/home/chedong/src/char_test

java.runtime.version=1.3.1_04-b02

java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment

os.arch=i386

java.io.tmpdir=/tmp

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Linux

java.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386:/u...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=2.4.7-10

user.home=/home/chedong

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.motif.PSPrinterJob

file.encoding=GBK

java.specification.version=1.3

user.name=chedong

java.class.path=/home/chedong/classes

java.vm.specification.version=1.0

java.home=/usr/java/jdk1.3.1_04/jre

user.language=zh

java.specification.vendor=Sun Microsystems Inc.

java.vm.info=mixed mode

java.version=1.3.1_04

java.ext.dirs=/usr/java/jdk1.3.1_04/jre/lib/ext

sun.boot.class.path=/usr/java/jdk1.3.1_04/jre/lib/rt.jar:...

java.vendor=Sun Microsystems Inc.

file.separator=/

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=CN

sun.cpu.isalist=


Output 3 (Windows, J2SE 1.3.0, regional settings: China, Chinese):

Hello, it's: Tue Jul 30 11:49:36 CST 2002

======System available locales:======== 

en English

en_US English (United States)

ar Arabic

ar_AE Arabic (United Arab Emirates)

ar_BH Arabic (Bahrain)

ar_DZ Arabic (Algeria)

ar_EG Arabic (Egypt)

ar_IQ Arabic (Iraq)

ar_JO Arabic (Jordan)

ar_KW Arabic (Kuwait)

ar_LB Arabic (Lebanon)

ar_LY Arabic (Libya)

ar_MA Arabic (Morocco)

ar_OM Arabic (Oman)

ar_QA Arabic (Qatar)

ar_SA Arabic (Saudi Arabia)

ar_SD Arabic (Sudan)

ar_SY Arabic (Syria)

ar_TN Arabic (Tunisia)

ar_YE Arabic (Yemen)

be Byelorussian

be_BY Byelorussian (Belarus)

bg Bulgarian

bg_BG Bulgarian (Bulgaria)

ca Catalan

ca_ES Catalan (Spain)

ca_ES_EURO Catalan (Spain,Euro)

cs Czech

cs_CZ Czech (Czech Republic)

da Danish

da_DK Danish (Denmark)

de German

de_AT German (Austria)

de_AT_EURO German (Austria,Euro)

de_CH German (Switzerland)

de_DE German (Germany)

de_DE_EURO German (Germany,Euro)

de_LU German (Luxembourg)

de_LU_EURO German (Luxembourg,Euro)

el Greek

el_GR Greek (Greece)

en_AU English (Australia)

en_CA English (Canada)

en_GB English (United Kingdom)

en_IE English (Ireland)

en_IE_EURO English (Ireland,Euro)

en_NZ English (New Zealand)

en_ZA English (South Africa)

es Spanish

es_AR Spanish (Argentina)

es_BO Spanish (Bolivia)

es_CL Spanish (Chile)

es_CO Spanish (Colombia)

es_CR Spanish (Costa Rica)

es_DO Spanish (Dominican Republic)

es_EC Spanish (Ecuador)

es_ES Spanish (Spain)

es_ES_EURO Spanish (Spain,Euro)

es_GT Spanish (Guatemala)

es_HN Spanish (Honduras)

es_MX Spanish (Mexico)

es_NI Spanish (Nicaragua)

es_PA Spanish (Panama)

es_PE Spanish (Peru)

es_PR Spanish (Puerto Rico)

es_PY Spanish (Paraguay)

es_SV Spanish (El Salvador)

es_UY Spanish (Uruguay)

es_VE Spanish (Venezuela)

et Estonian

et_EE Estonian (Estonia)

fi Finnish

fi_FI Finnish (Finland)

fi_FI_EURO Finnish (Finland,Euro)

fr French

fr_BE French (Belgium)

fr_BE_EURO French (Belgium,Euro)

fr_CA French (Canada)

fr_CH French (Switzerland)

fr_FR French (France)

fr_FR_EURO French (France,Euro)

fr_LU French (Luxembourg)

fr_LU_EURO French (Luxembourg,Euro)

hr Croatian

hr_HR Croatian (Croatia)

hu Hungarian

hu_HU Hungarian (Hungary)

is Icelandic

is_IS Icelandic (Iceland)

it Italian

it_CH Italian (Switzerland)

it_IT Italian (Italy)

it_IT_EURO Italian (Italy,Euro)

iw Hebrew

iw_IL Hebrew (Israel)

ja Japanese

ja_JP Japanese (Japan)

ko 韓文

ko_KR 韓文 (大韓民國)

lt Lithuanian

lt_LT Lithuanian (Lithuania)

lv Latvian (Lettish)

lv_LV Latvian (Lettish) (Latvia)

mk Macedonian

mk_MK Macedonian (Macedonia)

nl Dutch

nl_BE Dutch (Belgium)

nl_BE_EURO Dutch (Belgium,Euro)

nl_NL Dutch (Netherlands)

nl_NL_EURO Dutch (Netherlands,Euro)

no Norwegian

no_NO Norwegian (Norway)

no_NO_NY Norwegian (Norway,Nynorsk)

pl Polish

pl_PL Polish (Poland)

pt Portuguese

pt_BR Portuguese (Brazil)

pt_PT Portuguese (Portugal)

pt_PT_EURO Portuguese (Portugal,Euro)

ro Romanian

ro_RO Romanian (Romania)

ru Russian

ru_RU Russian (Russia)

sh Serbo-Croatian

sh_YU Serbo-Croatian (Yugoslavia)

sk Slovak

sk_SK Slovak (Slovakia)

sl Slovenian

sl_SI Slovenian (Slovenia)

sq Albanian

sq_AL Albanian (Albania)

sr Serbian

sr_YU Serbian (Yugoslavia)

sv Swedish

sv_SE Swedish (Sweden)

th Thai

th_TH Thai (Thailand)

tr Turkish

tr_TR Turkish (Turkey)

uk Ukrainian

uk_UA Ukrainian (Ukraine)

zh 中文

zh_CN 中文 (中華人民共和國)

zh_HK 中文 (香港)

zh_TW 中文 (台灣)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vm.version=1.3.0_02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=http://java.sun.com/

path.separator=;

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=D:\java\src\char_test

java.runtime.version=1.3.0_02

java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment

os.arch=x86

java.io.tmpdir=D:\TEMP\

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Windows 98

java.library.path=C:\WINDOWS;.;C:\WINDOWS\SYSTEM;C:\WIN...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=4.90

user.home=C:\WINDOWS

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.windows.WPrinterJob

file.encoding=GBK

java.specification.version=1.3

user.name=Sicci

java.class.path=d:\java\classes

java.vm.specification.version=1.0

java.home=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_02

user.language=zh

java.specification.vendor=Sun Microsystems Inc.

awt.toolkit=sun.awt.windows.WToolkit

java.vm.info=mixed mode

java.version=1.3.0_02

java.ext.dirs=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

sun.boot.class.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vendor=Sun Microsystems Inc.

file.separator=\

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=CN

sun.cpu.isalist=pentium i486 i386


Output 4 (Windows, J2SE 1.3.0, regional settings: United Kingdom, English):

Hello, it's: Tue Jul 30 11:53:27 CST 2002

======System available locales:======== 

en English

en_US English (United States)

ar Arabic

ar_AE Arabic (United Arab Emirates)

ar_BH Arabic (Bahrain)

ar_DZ Arabic (Algeria)

ar_EG Arabic (Egypt)

ar_IQ Arabic (Iraq)

ar_JO Arabic (Jordan)

ar_KW Arabic (Kuwait)

ar_LB Arabic (Lebanon)

ar_LY Arabic (Libya)

ar_MA Arabic (Morocco)

ar_OM Arabic (Oman)

ar_QA Arabic (Qatar)

ar_SA Arabic (Saudi Arabia)

ar_SD Arabic (Sudan)

ar_SY Arabic (Syria)

ar_TN Arabic (Tunisia)

ar_YE Arabic (Yemen)

be Byelorussian

be_BY Byelorussian (Belarus)

bg Bulgarian

bg_BG Bulgarian (Bulgaria)

ca Catalan

ca_ES Catalan (Spain)

ca_ES_EURO Catalan (Spain,Euro)

cs Czech

cs_CZ Czech (Czech Republic)

da Danish

da_DK Danish (Denmark)

de German

de_AT German (Austria)

de_AT_EURO German (Austria,Euro)

de_CH German (Switzerland)

de_DE German (Germany)

de_DE_EURO German (Germany,Euro)

de_LU German (Luxembourg)

de_LU_EURO German (Luxembourg,Euro)

el Greek

el_GR Greek (Greece)

en_AU English (Australia)

en_CA English (Canada)

en_GB English (United Kingdom)

en_IE English (Ireland)

en_IE_EURO English (Ireland,Euro)

en_NZ English (New Zealand)

en_ZA English (South Africa)

es Spanish

es_AR Spanish (Argentina)

es_BO Spanish (Bolivia)

es_CL Spanish (Chile)

es_CO Spanish (Colombia)

es_CR Spanish (Costa Rica)

es_DO Spanish (Dominican Republic)

es_EC Spanish (Ecuador)

es_ES Spanish (Spain)

es_ES_EURO Spanish (Spain,Euro)

es_GT Spanish (Guatemala)

es_HN Spanish (Honduras)

es_MX Spanish (Mexico)

es_NI Spanish (Nicaragua)

es_PA Spanish (Panama)

es_PE Spanish (Peru)

es_PR Spanish (Puerto Rico)

es_PY Spanish (Paraguay)

es_SV Spanish (El Salvador)

es_UY Spanish (Uruguay)

es_VE Spanish (Venezuela)

et Estonian

et_EE Estonian (Estonia)

fi Finnish

fi_FI Finnish (Finland)

fi_FI_EURO Finnish (Finland,Euro)

fr French

fr_BE French (Belgium)

fr_BE_EURO French (Belgium,Euro)

fr_CA French (Canada)

fr_CH French (Switzerland)

fr_FR French (France)

fr_FR_EURO French (France,Euro)

fr_LU French (Luxembourg)

fr_LU_EURO French (Luxembourg,Euro)

hr Croatian

hr_HR Croatian (Croatia)

hu Hungarian

hu_HU Hungarian (Hungary)

is Icelandic

is_IS Icelandic (Iceland)

it Italian

it_CH Italian (Switzerland)

it_IT Italian (Italy)

it_IT_EURO Italian (Italy,Euro)

iw Hebrew

iw_IL Hebrew (Israel)

ja Japanese

ja_JP Japanese (Japan)

ko Korean

ko_KR Korean (South Korea)

lt Lithuanian

lt_LT Lithuanian (Lithuania)

lv Latvian (Lettish)

lv_LV Latvian (Lettish) (Latvia)

mk Macedonian

mk_MK Macedonian (Macedonia)

nl Dutch

nl_BE Dutch (Belgium)

nl_BE_EURO Dutch (Belgium,Euro)

nl_NL Dutch (Netherlands)

nl_NL_EURO Dutch (Netherlands,Euro)

no Norwegian

no_NO Norwegian (Norway)

no_NO_NY Norwegian (Norway,Nynorsk)

pl Polish

pl_PL Polish (Poland)

pt Portuguese

pt_BR Portuguese (Brazil)

pt_PT Portuguese (Portugal)

pt_PT_EURO Portuguese (Portugal,Euro)

ro Romanian

ro_RO Romanian (Romania)

ru Russian

ru_RU Russian (Russia)

sh Serbo-Croatian

sh_YU Serbo-Croatian (Yugoslavia)

sk Slovak

sk_SK Slovak (Slovakia)

sl Slovenian

sl_SI Slovenian (Slovenia)

sq Albanian

sq_AL Albanian (Albania)

sr Serbian

sr_YU Serbian (Yugoslavia)

sv Swedish

sv_SE Swedish (Sweden)

th Thai

th_TH Thai (Thailand)

tr Turkish

tr_TR Turkish (Turkey)

uk Ukrainian

uk_UA Ukrainian (Ukraine)

zh Chinese

zh_CN Chinese (China)

zh_HK Chinese (Hong Kong)

zh_TW Chinese (Taiwan)

======System property======== 

-- listing properties --

java.runtime.name=Java(TM) 2 Runtime Environment, Stand...

sun.boot.library.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vm.version=1.3.0_02

java.vm.vendor=Sun Microsystems Inc.

java.vendor.url=http://java.sun.com/

path.separator=;

java.vm.name=Java HotSpot(TM) Client VM

file.encoding.pkg=sun.io

java.vm.specification.name=Java Virtual Machine Specification

user.dir=D:\java\src\char_test

java.runtime.version=1.3.0_02

java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment

os.arch=x86

java.io.tmpdir=D:\TEMP\

line.separator=



java.vm.specification.vendor=Sun Microsystems Inc.

java.awt.fonts=

os.name=Windows 98

java.library.path=C:\WINDOWS;.;C:\WINDOWS\SYSTEM;C:\WIN...

java.specification.name=Java Platform API Specification

java.class.version=47.0

os.version=4.90

user.home=C:\WINDOWS

user.timezone=Asia/Shanghai

java.awt.printerjob=sun.awt.windows.WPrinterJob

file.encoding=Cp1252

java.specification.version=1.3

user.name=Sicci

java.class.path=d:\java\classes

java.vm.specification.version=1.0

java.home=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_02

user.language=en

java.specification.vendor=Sun Microsystems Inc.

awt.toolkit=sun.awt.windows.WToolkit

java.vm.info=mixed mode

java.version=1.3.0_02

java.ext.dirs=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

sun.boot.class.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0...

java.vendor=Sun Microsystems Inc.

file.separator=\

java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport...

sun.cpu.endian=little

sun.io.unicode.encoding=UnicodeLittle

user.region=GB

sun.cpu.isalist=pentium i486 i386



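Conclusion:


The JVM's default encoding is determined by the system locale, so with the same locale there is no difference between the default encodings on Linux and on Windows. (Cp1252 and ISO-8859-1 can be treated here as the same Western single-byte encoding, covering only Latin characters with code values up to 255; strictly speaking, they differ only in the byte range 0x80 to 0x9F.) For test 2 I therefore list only the Linux output, with the locale set to en_US and to zh_CN; the output on Windows under the corresponding regional settings is the same.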
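How close Cp1252 and ISO-8859-1 really are can be checked byte by byte. The sketch below is only an illustration (the class WesternEncodings and its methods are hypothetical names, and it uses the java.nio.charset API from newer JDKs); it decodes every single-byte value with both charsets and reports where they disagree.

```java
import java.nio.charset.Charset;

public class WesternEncodings {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes a single byte value (0..255) with the given charset. */
    static String decode(int byteValue, Charset cs) {
        return new String(new byte[] { (byte) byteValue }, cs);
    }

    /** True if both charsets decode this byte value to the same text. */
    static boolean sameDecoding(int byteValue) {
        return decode(byteValue, LATIN1).equals(decode(byteValue, CP1252));
    }

    public static void main(String[] args) {
        for (int b = 0; b < 256; b++) {
            if (!sameDecoding(b)) {
                System.out.println("0x" + Integer.toHexString(b) + " decodes differently");
            }
        }
    }
}
```

For the tests in this article the difference does not matter, because neither encoding can represent Chinese characters.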


Test Program 2

==========


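The HelloUnicode.java program shows how the string "Hello world 世界你好" (16 characters) is handled under different system default encodings. After each encoding or decoding step it prints, for every character of the resulting string, the byte value, the short value, and the Unicode block the character belongs to.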
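The source of HelloUnicode.java is not reproduced here, so the following is only a guess at what its per-character dump might look like. The class CharDump, its method names, and the exact casts are assumptions, chosen so that the line format matches the listings in this section.

```java
public class CharDump {
    /** Formats one character in the style of the listings in this section. */
    static String describe(int index, char c) {
        // Assumption: "byte" is the low 8 bits as a signed value and
        // "short" is the char value cast to short.
        return "char[" + index + "]='" + c + "' byte=" + (byte) c
                + " short=" + (short) c + " " + Character.UnicodeBlock.of(c);
    }

    /** Prints the whole string followed by one line per character. */
    static void dump(String s) {
        System.out.println("string=" + s + " length=" + s.length());
        for (int i = 0; i < s.length(); i++) {
            System.out.println(describe(i, s.charAt(i)));
        }
    }

    public static void main(String[] args) {
        // The test string, with the four Chinese characters as escapes.
        dump("Hello world \u4e16\u754c\u4f60\u597d");
    }
}
```

The Unicode block is the most reliable column: it reflects the char value held in memory and does not depend on how the console renders the character.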











The two runs below appear in this order:

1. Linux (J2SE 1.3.1), LANG=en_US LC_ALL=en_US
2. Linux (J2SE 1.3.1), LANG=zh_CN LC_ALL=zh_CN.GBK

Run 1 (Linux, J2SE 1.3.1, LANG=en_US LC_ALL=en_US):

====write hello world to files======

[test 1-1]: with system default encoding=ISO-8859-1

string=Hello world 世界你好 length=20

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='? byte=-54 short=202 LATIN_1_SUPPLEMENT

char[13]='? byte=-64 short=192 LATIN_1_SUPPLEMENT

char[14]='? byte=-67 short=189 LATIN_1_SUPPLEMENT

char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT

char[16]='? byte=-60 short=196 LATIN_1_SUPPLEMENT

char[17]='? byte=-29 short=227 LATIN_1_SUPPLEMENT

char[18]='? byte=-70 short=186 LATIN_1_SUPPLEMENT

char[19]='? byte=-61 short=195 LATIN_1_SUPPLEMENT



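Step 1: in the English locale the Chinese text appears correctly on screen, but what is actually printed is "half" Chinese characters. Each GBK byte has become a separate char, which is why the length is 20 rather than 16. The result is written to the first file, hello.orig.html.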
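The length of 20 can be reproduced without changing the locale. The sketch below (HalfCharacters is a hypothetical name, and it simulates the effect rather than repeating the original compile step) encodes the string as GBK and then decodes the bytes as ISO-8859-1, one char per byte.

```java
import java.nio.charset.Charset;

public class HalfCharacters {
    // "Hello world " followed by the four Chinese characters, as escapes.
    static final String HELLO = "Hello world \u4e16\u754c\u4f60\u597d";

    /** Encodes text as GBK, then decodes the bytes one byte per char. */
    static String gbkBytesAsLatin1(String text) {
        byte[] gbk = text.getBytes(Charset.forName("GBK"));
        return new String(gbk, Charset.forName("ISO-8859-1"));
    }

    public static void main(String[] args) {
        String halves = gbkBytesAsLatin1(HELLO);
        System.out.println("real length   = " + HELLO.length());
        System.out.println("halves length = " + halves.length());
        // Each Chinese character has turned into two chars below 256.
        for (int i = 12; i < halves.length(); i++) {
            System.out.println("char[" + i + "] short=" + (int) halves.charAt(i));
        }
    }
}
```

The values printed for positions 12 to 19 should match the short column of test 1-1 above (202, 192, 189, and so on).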

[test 1-2]: getBytes with platform default encoding and decoding as gb2312:

string=Hello world ???? length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='?' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='?' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='?' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='?' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



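The string is turned back into bytes with the system default encoding and then decoded as GB2312. Question marks are printed here (in this environment every character above 255 is displayed as '?'), but the Unicode block and the short values show that the characters are the correct Chinese ones.

The next step, however, writes the second file, hello.gb2312.html, without specifying an encoding, so the system default ISO-8859-1 is used. That is why test 2-2 below reads back real '?' characters.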
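Both halves of this step can be sketched in a few lines. This is an illustration under assumptions, not the original code: RecoverAndLose and its methods are invented names, and in-memory byte arrays stand in for the file.

```java
import java.nio.charset.Charset;

public class RecoverAndLose {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset GB2312 = Charset.forName("GB2312");

    /** Undoes a wrong ISO-8859-1 decoding: back to bytes, then decode as GB2312. */
    static String recover(String halves) {
        return new String(halves.getBytes(LATIN1), GB2312);
    }

    /** Writes real Chinese chars as ISO-8859-1 and reads the bytes back as GB2312. */
    static String writeAsLatin1AndReadBack(String text) {
        byte[] written = text.getBytes(LATIN1); // unmappable chars become '?'
        return new String(written, GB2312);
    }

    public static void main(String[] args) {
        String original = "\u4e16\u754c\u4f60\u597d";
        String halves = new String(original.getBytes(GB2312), LATIN1);
        System.out.println("recovered intact: " + recover(halves).equals(original));
        System.out.println("after lossy write: " + writeAsLatin1AndReadBack(original));
    }
}
```

The first conversion is reversible because ISO-8859-1 maps every byte to exactly one char. The second is not, because the encoder has already replaced each Chinese character with a single '?' byte.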


[test 1-3]: convert string to UTF8

string=Hello world 涓???浣?濂 length=24

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='? byte=-28 short=228 LATIN_1_SUPPLEMENT

char[13]='? byte=-72 short=184 LATIN_1_SUPPLEMENT

char[14]='? byte=-106 short=150 LATIN_1_SUPPLEMENT

char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT

char[16]='? byte=-107 short=149 LATIN_1_SUPPLEMENT

char[17]='? byte=-116 short=140 LATIN_1_SUPPLEMENT

char[18]='? byte=-28 short=228 LATIN_1_SUPPLEMENT

char[19]='? byte=-67 short=189 LATIN_1_SUPPLEMENT

char[20]='? byte=-96 short=160 LATIN_1_SUPPLEMENT

char[21]='? byte=-27 short=229 LATIN_1_SUPPLEMENT

char[22]='? byte=-91 short=165 LATIN_1_SUPPLEMENT

char[23]='? byte=-67 short=189 LATIN_1_SUPPLEMENT



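In the third test the character stream is encoded as UTF-8 and written to the third test file, hello.utf8.html. UTF-8 leaves the English text unchanged but uses three bytes for each of these Chinese characters, so the Chinese part takes 50% more storage than in GB2312 (12 bytes instead of 8).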
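The size difference is easy to verify. The following sketch (Utf8Size and byteCount are illustrative names, not from the original program) counts the encoded bytes of the English and the Chinese part separately.

```java
import java.nio.charset.Charset;

public class Utf8Size {
    /** Number of bytes the text occupies in the named charset. */
    static int byteCount(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String english = "Hello world ";
        String chinese = "\u4e16\u754c\u4f60\u597d"; // the four Chinese characters
        System.out.println("English: UTF-8=" + byteCount(english, "UTF-8")
                + " GB2312=" + byteCount(english, "GB2312"));
        System.out.println("Chinese: UTF-8=" + byteCount(chinese, "UTF-8")
                + " GB2312=" + byteCount(chinese, "GB2312"));
    }
}
```

The whole string therefore takes 24 bytes in UTF-8 and 20 in GB2312, which matches the lengths 24 and 20 reported in tests 1-3 and 1-1 of the English-locale run.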


====reading and decoding from files======

[test 2-1]: read hello.orig.html: decoding with system default encoding

string=Hello world 世界你好 length=20

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='? byte=-54 short=202 LATIN_1_SUPPLEMENT

char[13]='? byte=-64 short=192 LATIN_1_SUPPLEMENT

char[14]='? byte=-67 short=189 LATIN_1_SUPPLEMENT

char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT

char[16]='? byte=-60 short=196 LATIN_1_SUPPLEMENT

char[17]='? byte=-29 short=227 LATIN_1_SUPPLEMENT

char[18]='? byte=-70 short=186 LATIN_1_SUPPLEMENT

char[19]='? byte=-61 short=195 LATIN_1_SUPPLEMENT



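Reading hello.orig.html back with the system default encoding again yields "half" characters, but because the bytes are restored exactly, the output shows no errors.

This is in fact why applications such as PHP seldom run into character-set problems: everything is handled as a byte stream from start to finish, which reproduces the input faithfully but also gives up control over individual characters.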
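What "giving up control over characters" means in practice can be illustrated with truncation. In this sketch (ByteVersusChar is a hypothetical class, and truncation is just one example chosen for illustration) cutting by chars always lands on a character boundary, while cutting the GBK bytes at an odd position splits a Chinese character in half.

```java
import java.nio.charset.Charset;

public class ByteVersusChar {
    static final Charset GBK = Charset.forName("GBK");
    static final String HELLO = "Hello world \u4e16\u754c\u4f60\u597d";

    /** Keeps the first n chars: always cuts between characters. */
    static String firstChars(String text, int n) {
        return text.substring(0, n);
    }

    /** Keeps the first n bytes of the GBK form: may cut a character in half. */
    static String firstBytes(String text, int n) {
        byte[] all = text.getBytes(GBK);
        byte[] head = new byte[n];
        System.arraycopy(all, 0, head, 0, n);
        return new String(head, GBK);
    }

    public static void main(String[] args) {
        String byChars = firstChars(HELLO, 13);
        String byBytes = firstBytes(HELLO, 13);
        System.out.println("cut by chars keeps a whole character: "
                + byChars.endsWith("\u4e16"));
        System.out.println("cut by bytes keeps a whole character: "
                + byBytes.endsWith("\u4e16"));
    }
}
```

A byte-oriented application has no way to know that position 13 falls in the middle of a character; a character-oriented one never sees the problem.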


[test 2-2]: read hello.gb2312.html: decoding as GB2312

string=Hello world ???? length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='?' byte=63 short=63 BASIC_LATIN

char[13]='?' byte=63 short=63 BASIC_LATIN

char[14]='?' byte=63 short=63 BASIC_LATIN

char[15]='?' byte=63 short=63 BASIC_LATIN


This '?' really is a question mark, char(63). A great deal of data has been damaged beyond recovery in exactly this way.



[test 2-3]: read hello.utf8.html: decoding as UTF8

string=Hello world ???? length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='?' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='?' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='?' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='?' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



Great! Although the characters are displayed as '?', they were decoded correctly, as the Unicode block shows.
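The difference between a character that is merely displayed as '?' and one that has really become '?' can be sketched as follows. DisplayVersusData and printed are invented names, and a PrintStream over a byte array stands in for a console that uses ISO-8859-1.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class DisplayVersusData {
    /** Returns the bytes a console with the given encoding would receive. */
    static byte[] printed(String text, String consoleEncoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, consoleEncoding);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String shi = "\u4e16"; // first character of the Chinese test text
        System.out.println("char value in memory = " + (int) shi.charAt(0));
        System.out.println("byte sent to console = " + printed(shi, "ISO-8859-1")[0]);
    }
}
```

The string in memory is untouched; only the copy sent to the output stream is degraded. That is why the short values and Unicode blocks are a better guide than what the terminal shows.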



Run 2 (Linux, J2SE 1.3.1, LANG=zh_CN LC_ALL=zh_CN.GBK):

====write hello world to files======

[test 1-1]: with system default encoding=GBK

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS

Note: under a new locale the source program must be recompiled, because the first decoding from bytes to characters already happens in javac.
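One way to make the compiled string independent of the compile-time locale is to keep the source file pure ASCII by using Unicode escapes; another is to tell the compiler the source encoding with javac's -encoding option. The sketch below shows the first approach; the class name EscapedLiteral is made up for illustration.

```java
public class EscapedLiteral {
    // The test string with its four Chinese characters written as escapes,
    // so the compiled constant does not depend on how javac decodes the file.
    static final String HELLO = "Hello world \u4e16\u754c\u4f60\u597d";

    public static void main(String[] args) {
        System.out.println("length = " + HELLO.length());
        for (int i = 12; i < HELLO.length(); i++) {
            System.out.println("char[" + i + "] short=" + (int) HELLO.charAt(i));
        }
    }
}
```

Compiled in any locale, this should give a length of 16 and the same char values as the Chinese-locale listings in this section.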



[test 1-2]: getBytes with platform default encoding and decoding as gb2312:

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



In the Chinese locale the result is identical to the default encoding and decoding result above.


[test 1-3]: convert string to UTF8

string=Hello world 涓???浣?濂 length=18

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='涓' byte=-109 short=28051 CJK_UNIFIED_IDEOGRAPHS

char[13]='?? byte=43 short=26667 CJK_UNIFIED_IDEOGRAPHS

char[14]='??' byte=107 short=26219 CJK_UNIFIED_IDEOGRAPHS

char[15]='浣' byte=99 short=28003 CJK_UNIFIED_IDEOGRAPHS

char[16]='?? byte=-78 short=29362 CJK_UNIFIED_IDEOGRAPHS

char[17]='ソ' byte=-67 short=12477 KATAKANA





====reading and decoding from files======

[test 2-1]: read hello.orig.html: decoding with system default encoding

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS





[test 2-2]: read hello.gb2312.html: decoding as GB2312

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS





[test 2-3]: read hello.utf8.html: decoding as UTF8

string=Hello world 世界你好 length=16

char[0]='H' byte=72 short=72 BASIC_LATIN

char[1]='e' byte=101 short=101 BASIC_LATIN

char[2]='l' byte=108 short=108 BASIC_LATIN

char[3]='l' byte=108 short=108 BASIC_LATIN

char[4]='o' byte=111 short=111 BASIC_LATIN

char[5]=' ' byte=32 short=32 BASIC_LATIN

char[6]='w' byte=119 short=119 BASIC_LATIN

char[7]='o' byte=111 short=111 BASIC_LATIN

char[8]='r' byte=114 short=114 BASIC_LATIN

char[9]='l' byte=108 short=108 BASIC_LATIN

char[10]='d' byte=100 short=100 BASIC_LATIN

char[11]=' ' byte=32 short=32 BASIC_LATIN

char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS

char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS

char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS

char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS



Storage in a Unicode encoding is almost unaffected by the character-set settings of the environment.
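The reason is that both the writer and the reader name the encoding explicitly, so the default never comes into play. A minimal sketch, assuming in-memory byte arrays in place of hello.utf8.html and using the invented class name ExplicitUtf8RoundTrip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

public class ExplicitUtf8RoundTrip {
    static final Charset UTF8 = Charset.forName("UTF-8");
    static final String HELLO = "Hello world \u4e16\u754c\u4f60\u597d";

    /** Character stream to byte stream, with the encoding named explicitly. */
    static byte[] write(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            Writer w = new OutputStreamWriter(sink, UTF8);
            w.write(text);
            w.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    /** Byte stream back to character stream, again with an explicit encoding. */
    static String read(byte[] stored) {
        StringBuilder sb = new StringBuilder();
        try {
            Reader r = new InputStreamReader(new ByteArrayInputStream(stored), UTF8);
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            r.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] stored = write(HELLO);
        System.out.println("stored bytes = " + stored.length);
        System.out.println("round trip intact = " + read(stored).equals(HELLO));
    }
}
```

Replacing UTF8 with another charset that covers Chinese, such as GBK, would work the same way; what matters is that the writer and the reader agree and that neither relies on file.encoding.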


 

Some conclusions from test 2:



  1. All applications process data as byte stream => character stream => byte stream:

    byte_stream



http://coctec.com/docs/linux/show-post-67752.html