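Author: Che Dong (車東) chedong@bigfoot.com
Last updated: 2002-12-30 13:20:57
Copyright notice: may be freely reproduced; when reproducing, please be sure to credit the original source and the author
Keywords: linux java multibyte encoding locale i18n l10n chinese
Abstract: Two test programs show how the system default encoding and the application's own encoding strategy affect character handling, as a basis for choosing a suitable encoding strategy and building general-purpose applications that conform better to internationalization conventions.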
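Test Program 1
==============
To understand how a Java application handles encodings, we first need to understand how the operating system influences the JVM's default encoding. I therefore wrote Env.java, which prints the JVM properties and the locales supported on each system. The program is very simple: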
/*
 * Copyright (c) 2002 chedong@bigfoot.com
 * $Id: Env.java,v 1.1 2002/07/30 09:48:12 chedong Exp $
 */
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

/**
 * Purpose:
 *   display the environment variables and the JVM's default properties
 * Input: none
 * Output:
 *   1. the supported locales
 *   2. the JVM's default properties
 */
public class Env {
    /**
     * Main entry point.
     */
    public static void main(String[] args) {
        System.out.println("Hello, it's: " + new Date());

        // Print the available locales: locale code, a tab, then the display name
        Locale[] list = DateFormat.getAvailableLocales();
        System.out.println("======System available locales:======== ");
        for (int i = 0; i < list.length; i++) {
            System.out.println(list[i].toString() + "\t" + list[i].getDisplayName());
        }

        // Print the JVM default properties
        System.out.println("======System property======== ");
        System.getProperties().list(System.out);
    }
}
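The property that deserves the most attention is the JVM's file.encoding. It determines the JVM's default encoding and decoding, and so affects every conversion in the application that does not name an encoding explicitly: decoding from a byte stream to a character stream (byte stream ==> character stream) and encoding from a character stream to a byte stream (character stream ==> byte stream).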
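The effect of the default encoding can be seen by comparing a conversion that relies on it with one that names its charset. The following is only a minimal sketch: the class name DefaultEncodingSketch and its helper methods are hypothetical and are not part of the author's test programs. It assumes a Java version that provides java.nio.charset.StandardCharsets (Java 7 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; class and method names are hypothetical.
final class DefaultEncodingSketch {

    /** Encodes with an explicitly named charset, so file.encoding plays no part. */
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes with an explicitly named charset. */
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u escapes keep this source file pure ASCII, whatever encoding javac assumes.
        String text = "\u4e16\u754c";

        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // Relies on the platform default: the byte count can change with the locale.
        System.out.println("getBytes() length      = " + text.getBytes().length);

        // Names the charset: the byte count is the same on every platform.
        System.out.println("getBytes(UTF-8) length = " + encodeUtf8(text).length);
    }
}
```

Running it under different locale settings should change only the lines that depend on the default. Note that JDK 18 and later use UTF-8 as the default charset regardless of the locale unless it is overridden, so the difference is easiest to observe on older JVMs such as the ones tested here.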
On Linux, the locale can be set with LANG=zh_CN; LC_ALL=zh_CN.GBK; export LANG LC_ALL. The locale command displays the system's current settings.
On Windows, the locale is set through Control Panel ==> Regional Settings.
Linux (J2SE 1.3.1) LANG=en_US LC_ALL=en_US | Linux (J2SE 1.3.1) LANG=zh_CN LC_ALL=zh_CN.GBK | Windows (J2SE 1.3.0) Regional Settings: Chinese (China) | Windows (J2SE 1.3.0) Regional Settings: English (United Kingdom) |
Hello, it's: Tue Jul 30 11:05:44 CST 2002 ======System available locales:======== en English en_US English (United States) ar Arabic ar_AE Arabic (United Arab Emirates) ar_BH Arabic (Bahrain) ar_DZ Arabic (Algeria) ar_EG Arabic (Egypt) ar_IQ Arabic (Iraq) ar_JO Arabic (Jordan) ar_KW Arabic (Kuwait) ar_LB Arabic (Lebanon) ar_LY Arabic (Libya) ar_MA Arabic (Morocco) ar_OM Arabic (Oman) ar_QA Arabic (Qatar) ar_SA Arabic (Saudi Arabia) ar_SD Arabic (Sudan) ar_SY Arabic (Syria) ar_TN Arabic (Tunisia) ar_YE Arabic (Yemen) be Byelorussian be_BY Byelorussian (Belarus) bg Bulgarian bg_BG Bulgarian (Bulgaria) ca Catalan ca_ES Catalan (Spain) ca_ES_EURO Catalan (Spain,Euro) cs Czech cs_CZ Czech (Czech Republic) da Danish da_DK Danish (Denmark) de German de_AT German (Austria) de_AT_EURO German (Austria,Euro) de_CH German (Switzerland) de_DE German (Germany) de_DE_EURO German (Germany,Euro) de_LU German (Luxembourg) de_LU_EURO German (Luxembourg,Euro) el Greek el_GR Greek (Greece) en_AU English (Australia) en_CA English (Canada) en_GB English (United Kingdom) en_IE English (Ireland) en_IE_EURO English (Ireland,Euro) en_NZ English (New Zealand) en_ZA English (South Africa) es Spanish es_BO Spanish (Bolivia) es_AR Spanish (Argentina) es_CL Spanish (Chile) es_CO Spanish (Colombia) es_CR Spanish (Costa Rica) es_DO Spanish (Dominican Republic) es_EC Spanish (Ecuador) es_ES Spanish (Spain) es_ES_EURO Spanish (Spain,Euro) es_GT Spanish (Guatemala) es_HN Spanish (Honduras) es_MX Spanish (Mexico) es_NI Spanish (Nicaragua) et Estonian es_PA Spanish (Panama) es_PE Spanish (Peru) es_PR Spanish (Puerto Rico) es_PY Spanish (Paraguay) es_SV Spanish (El Salvador) es_UY Spanish (Uruguay) es_VE Spanish (Venezuela) et_EE Estonian (Estonia) fi Finnish fi_FI Finnish (Finland) fi_FI_EURO Finnish (Finland,Euro) fr French fr_BE French (Belgium) fr_BE_EURO French (Belgium,Euro) fr_CA French (Canada) fr_CH French (Switzerland) fr_FR French (France) fr_FR_EURO French (France,Euro) fr_LU French 
(Luxembourg) fr_LU_EURO French (Luxembourg,Euro) hr Croatian hr_HR Croatian (Croatia) hu Hungarian hu_HU Hungarian (Hungary) is Icelandic is_IS Icelandic (Iceland) it Italian it_CH Italian (Switzerland) it_IT Italian (Italy) it_IT_EURO Italian (Italy,Euro) iw Hebrew iw_IL Hebrew (Israel) ja Japanese ja_JP Japanese (Japan) ko Korean ko_KR Korean (South Korea) lt Lithuanian lt_LT Lithuanian (Lithuania) lv Latvian (Lettish) lv_LV Latvian (Lettish) (Latvia) mk Macedonian mk_MK Macedonian (Macedonia) nl Dutch nl_BE Dutch (Belgium) nl_BE_EURO Dutch (Belgium,Euro) nl_NL Dutch (Netherlands) nl_NL_EURO Dutch (Netherlands,Euro) no Norwegian no_NO Norwegian (Norway) no_NO_NY Norwegian (Norway,Nynorsk) pl Polish pl_PL Polish (Poland) pt Portuguese pt_BR Portuguese (Brazil) pt_PT Portuguese (Portugal) pt_PT_EURO Portuguese (Portugal,Euro) ro Romanian ro_RO Romanian (Romania) ru Russian ru_RU Russian (Russia) sh Serbo-Croatian sh_YU Serbo-Croatian (Yugoslavia) sk Slovak sk_SK Slovak (Slovakia) sl Slovenian sl_SI Slovenian (Slovenia) sq Albanian sq_AL Albanian (Albania) sr Serbian sr_YU Serbian (Yugoslavia) sv Swedish sv_SE Swedish (Sweden) th Thai th_TH Thai (Thailand) tr Turkish tr_TR Turkish (Turkey) uk Ukrainian uk_UA Ukrainian (Ukraine) zh Chinese zh_CN Chinese (China) zh_HK Chinese (Hong Kong) zh_TW Chinese (Taiwan) ======System property======== -- listing properties -- java.runtime.name=Java(TM) 2 Runtime Environment, Stand... sun.boot.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386 java.vm.version=1.3.1_04-b02 java.vm.vendor=Sun Microsystems Inc. java.vendor.url=http://java.sun.com/ path.separator=: java.vm.name=Java HotSpot(TM) Client VM file.encoding.pkg=sun.io java.vm.specification.name=Java Virtual Machine Specification user.dir=/home/chedong/src/char_test java.runtime.version=1.3.1_04-b02 java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment os.arch=i386 java.io.tmpdir=/tmp line.separator= java.vm.specification.vendor=Sun Microsystems Inc. 
java.awt.fonts= os.name=Linux java.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386:/u... java.specification.name=Java Platform API Specification java.class.version=47.0 os.version=2.4.7-10 user.home=/home/chedong user.timezone=Asia/Shanghai java.awt.printerjob=sun.awt.motif.PSPrinterJob file.encoding=ISO-8859-1 java.specification.version=1.3 user.name=chedong java.class.path=/home/chedong/classes java.vm.specification.version=1.0 java.home=/usr/java/jdk1.3.1_04/jre user.language=en java.specification.vendor=Sun Microsystems Inc. java.vm.info=mixed mode java.version=1.3.1_04 java.ext.dirs=/usr/java/jdk1.3.1_04/jre/lib/ext sun.boot.class.path=/usr/java/jdk1.3.1_04/jre/lib/rt.jar:... java.vendor=Sun Microsystems Inc. file.separator=/ java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport... sun.cpu.endian=little sun.io.unicode.encoding=UnicodeLittle user.region=US sun.cpu.isalist= | Hello, it's: Tue Jul 30 11:07:34 CST 2002 ======System available locales:======== en 英文 en_US 英文 (美國) ar 阿拉伯文 ar_AE 阿拉伯文 (阿拉伯聯合大公國) ar_BH 阿拉伯文 (巴林) ar_DZ 阿拉伯文 (阿爾及利亞) ar_EG 阿拉伯文 (埃及) ar_IQ 阿拉伯文 (伊拉克) ar_JO 阿拉伯文 (約旦) ar_KW 阿拉伯文 (科威特) ar_LB 阿拉伯文 (黎巴嫩) ar_LY 阿拉伯文 (利比亞) ar_MA 阿拉伯文 (摩洛哥) ar_OM 阿拉伯文 (阿曼) ar_QA 阿拉伯文 (卡達) ar_SA 阿拉伯文 (沙烏地阿拉伯) ar_SD 阿拉伯文 (蘇丹) ar_SY 阿拉伯文 (敘利亞) ar_TN 阿拉伯文 (突尼西亞) ar_YE 阿拉伯文 (葉門) be 白俄羅斯文 be_BY 白俄羅斯文 (白俄羅斯) bg 保加利亞文 bg_BG 保加利亞文 (保加利亞) ca 加泰羅尼亞文 ca_ES 加泰羅尼亞文 (西班牙) ca_ES_EURO 加泰羅尼亞文 (西班牙,Euro) cs 捷克文 cs_CZ 捷克文 (捷克共和國) da 丹麥文 da_DK 丹麥文 (丹麥) de 德文 de_AT 德文 (奧地利) de_AT_EURO 德文 (奧地利,Euro) de_CH 德文 (瑞士) de_DE 德文 (德國) de_DE_EURO 德文 (德國,Euro) de_LU 德文 (盧森堡) de_LU_EURO 德文 (盧森堡,Euro) el 希臘文 el_GR 希臘文 (希臘) en_AU 英文 (澳大利亞) en_CA 英文 (加拿大) en_GB 英文 (英國) en_IE 英文 (愛爾蘭) en_IE_EURO 英文 (愛爾蘭,Euro) en_NZ 英文 (紐西蘭) en_ZA 英文 (南非) es 西班牙文 es_BO 西班牙文 (玻利維亞) es_AR 西班牙文 (阿根廷) es_CL 西班牙文 (智利) es_CO 西班牙文 (哥倫比亞) es_CR 西班牙文 (哥斯大黎加) es_DO 西班牙文 (多明尼加) es_EC 西班牙文 (厄瓜多) es_ES 西班牙文 (西班牙) es_ES_EURO 西班牙文 (西班牙,Euro) es_GT 西班牙文 (瓜地馬拉) es_HN 西班牙文 (宏都拉斯) es_MX 西班牙文 (墨西哥) es_NI 西班牙文 (尼加拉瓜) et 愛沙尼亞文 es_PA 西班牙文 
(巴拿馬) es_PE 西班牙文 (秘魯) es_PR 西班牙文 (波多黎哥) es_PY 西班牙文 (巴拉圭) es_SV 西班牙文 (薩爾瓦多) es_UY 西班牙文 (烏拉圭) es_VE 西班牙文 (委內瑞拉) et_EE 愛沙尼亞文 (愛沙尼亞) fi 芬蘭文 fi_FI 芬蘭文 (芬蘭) fi_FI_EURO 芬蘭文 (芬蘭,Euro) fr 法文 fr_BE 法文 (比利時) fr_BE_EURO 法文 (比利時,Euro) fr_CA 法文 (加拿大) fr_CH 法文 (瑞士) fr_FR 法文 (法國) fr_FR_EURO 法文 (法國,Euro) fr_LU 法文 (盧森堡) fr_LU_EURO 法文 (盧森堡,Euro) hr 克羅埃西亞文 hr_HR 克羅埃西亞文 (克羅埃西亞) hu 匈牙利文 hu_HU 匈牙利文 (匈牙利) is 冰島文 is_IS 冰島文 (冰島) it 義大利文 it_CH 義大利文 (瑞士) it_IT 義大利文 (義大利) it_IT_EURO 義大利文 (義大利,Euro) iw 希伯來文 iw_IL 希伯來文 (以色列) ja 日文 ja_JP 日文 (日本) ko 朝鮮文 ko_KR 朝鮮文 (南朝鮮) lt 立陶宛文 lt_LT 立陶宛文 (立陶宛) lv 拉托維亞文(列托) lv_LV 拉托維亞文(列托) (拉脫維亞) mk 馬其頓文 mk_MK 馬其頓文 (馬其頓王國) nl 荷蘭文 nl_BE 荷蘭文 (比利時) nl_BE_EURO 荷蘭文 (比利時,Euro) nl_NL 荷蘭文 (荷蘭) nl_NL_EURO 荷蘭文 (荷蘭,Euro) no 挪威文 no_NO 挪威文 (挪威) no_NO_NY 挪威文 (挪威,Nynorsk) pl 波蘭文 pl_PL 波蘭文 (波蘭) pt 葡萄牙文 pt_BR 葡萄牙文 (巴西) pt_PT 葡萄牙文 (葡萄牙) pt_PT_EURO 葡萄牙文 (葡萄牙,Euro) ro 羅馬尼亞文 ro_RO 羅馬尼亞文 (羅馬尼亞) ru 俄文 ru_RU 俄文 (俄羅斯) sh 塞波尼斯-克羅埃西亞文 sh_YU 塞波尼斯-克羅埃西亞文 (南斯拉夫) sk 斯洛伐克文 sk_SK 斯洛伐克文 (斯洛伐克) sl 斯洛維尼亞文 sl_SI 斯洛維尼亞文 (斯洛維尼亞) sq 阿爾巴尼亞文 sq_AL 阿爾巴尼亞文 (阿爾巴尼亞) sr 塞爾維亞文 sr_YU 塞爾維亞文 (南斯拉夫) sv 瑞典文 sv_SE 瑞典文 (瑞典) th 泰文 th_TH 泰文 (泰國) tr 土耳其文 tr_TR 土耳其文 (土耳其) uk 烏克蘭文 uk_UA 烏克蘭文 (烏克蘭) zh 中文 zh_CN 中文 (中國) zh_HK 中文 (香港) zh_TW 中文 (台灣) ======System property======== -- listing properties -- java.runtime.name=Java(TM) 2 Runtime Environment, Stand... sun.boot.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386 java.vm.version=1.3.1_04-b02 java.vm.vendor=Sun Microsystems Inc. java.vendor.url=http://java.sun.com/ path.separator=: java.vm.name=Java HotSpot(TM) Client VM file.encoding.pkg=sun.io java.vm.specification.name=Java Virtual Machine Specification user.dir=/home/chedong/src/char_test java.runtime.version=1.3.1_04-b02 java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment os.arch=i386 java.io.tmpdir=/tmp line.separator= java.vm.specification.vendor=Sun Microsystems Inc. java.awt.fonts= os.name=Linux java.library.path=/usr/java/jdk1.3.1_04/jre/lib/i386:/u... 
java.specification.name=Java Platform API Specification java.class.version=47.0 os.version=2.4.7-10 user.home=/home/chedong user.timezone=Asia/Shanghai java.awt.printerjob=sun.awt.motif.PSPrinterJob file.encoding=GBK java.specification.version=1.3 user.name=chedong java.class.path=/home/chedong/classes java.vm.specification.version=1.0 java.home=/usr/java/jdk1.3.1_04/jre user.language=zh java.specification.vendor=Sun Microsystems Inc. java.vm.info=mixed mode java.version=1.3.1_04 java.ext.dirs=/usr/java/jdk1.3.1_04/jre/lib/ext sun.boot.class.path=/usr/java/jdk1.3.1_04/jre/lib/rt.jar:... java.vendor=Sun Microsystems Inc. file.separator=/ java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport... sun.cpu.endian=little sun.io.unicode.encoding=UnicodeLittle user.region=CN sun.cpu.isalist= | Hello, it's: Tue Jul 30 11:49:36 CST 2002 ======System available locales:======== en English en_US English (United States) ar Arabic ar_AE Arabic (United Arab Emirates) ar_BH Arabic (Bahrain) ar_DZ Arabic (Algeria) ar_EG Arabic (Egypt) ar_IQ Arabic (Iraq) ar_JO Arabic (Jordan) ar_KW Arabic (Kuwait) ar_LB Arabic (Lebanon) ar_LY Arabic (Libya) ar_MA Arabic (Morocco) ar_OM Arabic (Oman) ar_QA Arabic (Qatar) ar_SA Arabic (Saudi Arabia) ar_SD Arabic (Sudan) ar_SY Arabic (Syria) ar_TN Arabic (Tunisia) ar_YE Arabic (Yemen) be Byelorussian be_BY Byelorussian (Belarus) bg Bulgarian bg_BG Bulgarian (Bulgaria) ca Catalan ca_ES Catalan (Spain) ca_ES_EURO Catalan (Spain,Euro) cs Czech cs_CZ Czech (Czech Republic) da Danish da_DK Danish (Denmark) de German de_AT German (Austria) de_AT_EURO German (Austria,Euro) de_CH German (Switzerland) de_DE German (Germany) de_DE_EURO German (Germany,Euro) de_LU German (Luxembourg) de_LU_EURO German (Luxembourg,Euro) el Greek el_GR Greek (Greece) en_AU English (Australia) en_CA English (Canada) en_GB English (United Kingdom) en_IE English (Ireland) en_IE_EURO English (Ireland,Euro) en_NZ English (New Zealand) en_ZA English (South Africa) es Spanish es_AR 
Spanish (Argentina) es_BO Spanish (Bolivia) es_CL Spanish (Chile) es_CO Spanish (Colombia) es_CR Spanish (Costa Rica) es_DO Spanish (Dominican Republic) es_EC Spanish (Ecuador) es_ES Spanish (Spain) es_ES_EURO Spanish (Spain,Euro) es_GT Spanish (Guatemala) es_HN Spanish (Honduras) es_MX Spanish (Mexico) es_NI Spanish (Nicaragua) es_PA Spanish (Panama) es_PE Spanish (Peru) es_PR Spanish (Puerto Rico) es_PY Spanish (Paraguay) es_SV Spanish (El Salvador) es_UY Spanish (Uruguay) es_VE Spanish (Venezuela) et Estonian et_EE Estonian (Estonia) fi Finnish fi_FI Finnish (Finland) fi_FI_EURO Finnish (Finland,Euro) fr French fr_BE French (Belgium) fr_BE_EURO French (Belgium,Euro) fr_CA French (Canada) fr_CH French (Switzerland) fr_FR French (France) fr_FR_EURO French (France,Euro) fr_LU French (Luxembourg) fr_LU_EURO French (Luxembourg,Euro) hr Croatian hr_HR Croatian (Croatia) hu Hungarian hu_HU Hungarian (Hungary) is Icelandic is_IS Icelandic (Iceland) it Italian it_CH Italian (Switzerland) it_IT Italian (Italy) it_IT_EURO Italian (Italy,Euro) iw Hebrew iw_IL Hebrew (Israel) ja Japanese ja_JP Japanese (Japan) ko 韓文 ko_KR 韓文 (大韓民國) lt Lithuanian lt_LT Lithuanian (Lithuania) lv Latvian (Lettish) lv_LV Latvian (Lettish) (Latvia) mk Macedonian mk_MK Macedonian (Macedonia) nl Dutch nl_BE Dutch (Belgium) nl_BE_EURO Dutch (Belgium,Euro) nl_NL Dutch (Netherlands) nl_NL_EURO Dutch (Netherlands,Euro) no Norwegian no_NO Norwegian (Norway) no_NO_NY Norwegian (Norway,Nynorsk) pl Polish pl_PL Polish (Poland) pt Portuguese pt_BR Portuguese (Brazil) pt_PT Portuguese (Portugal) pt_PT_EURO Portuguese (Portugal,Euro) ro Romanian ro_RO Romanian (Romania) ru Russian ru_RU Russian (Russia) sh Serbo-Croatian sh_YU Serbo-Croatian (Yugoslavia) sk Slovak sk_SK Slovak (Slovakia) sl Slovenian sl_SI Slovenian (Slovenia) sq Albanian sq_AL Albanian (Albania) sr Serbian sr_YU Serbian (Yugoslavia) sv Swedish sv_SE Swedish (Sweden) th Thai th_TH Thai (Thailand) tr Turkish tr_TR Turkish (Turkey) uk Ukrainian 
uk_UA Ukrainian (Ukraine) zh 中文 zh_CN 中文 (中華人民共和國) zh_HK 中文 (香港) zh_TW 中文 (台灣) ======System property======== -- listing properties -- java.runtime.name=Java(TM) 2 Runtime Environment, Stand... sun.boot.library.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0... java.vm.version=1.3.0_02 java.vm.vendor=Sun Microsystems Inc. java.vendor.url=http://java.sun.com/ path.separator=; java.vm.name=Java HotSpot(TM) Client VM file.encoding.pkg=sun.io java.vm.specification.name=Java Virtual Machine Specification user.dir=D:\java\src\char_test java.runtime.version=1.3.0_02 java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment os.arch=x86 java.io.tmpdir=D:\TEMP\ line.separator= java.vm.specification.vendor=Sun Microsystems Inc. java.awt.fonts= os.name=Windows 98 java.library.path=C:\WINDOWS;.;C:\WINDOWS\SYSTEM;C:\WIN... java.specification.name=Java Platform API Specification java.class.version=47.0 os.version=4.90 user.home=C:\WINDOWS user.timezone=Asia/Shanghai java.awt.printerjob=sun.awt.windows.WPrinterJob file.encoding=GBK java.specification.version=1.3 user.name=Sicci java.class.path=d:\java\classes java.vm.specification.version=1.0 java.home=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_02 user.language=zh java.specification.vendor=Sun Microsystems Inc. awt.toolkit=sun.awt.windows.WToolkit java.vm.info=mixed mode java.version=1.3.0_02 java.ext.dirs=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0... sun.boot.class.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0... java.vendor=Sun Microsystems Inc. file.separator=\ java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport... 
sun.cpu.endian=little sun.io.unicode.encoding=UnicodeLittle user.region=CN sun.cpu.isalist=pentium i486 i386 | Hello, it's: Tue Jul 30 11:53:27 CST 2002 ======System available locales:======== en English en_US English (United States) ar Arabic ar_AE Arabic (United Arab Emirates) ar_BH Arabic (Bahrain) ar_DZ Arabic (Algeria) ar_EG Arabic (Egypt) ar_IQ Arabic (Iraq) ar_JO Arabic (Jordan) ar_KW Arabic (Kuwait) ar_LB Arabic (Lebanon) ar_LY Arabic (Libya) ar_MA Arabic (Morocco) ar_OM Arabic (Oman) ar_QA Arabic (Qatar) ar_SA Arabic (Saudi Arabia) ar_SD Arabic (Sudan) ar_SY Arabic (Syria) ar_TN Arabic (Tunisia) ar_YE Arabic (Yemen) be Byelorussian be_BY Byelorussian (Belarus) bg Bulgarian bg_BG Bulgarian (Bulgaria) ca Catalan ca_ES Catalan (Spain) ca_ES_EURO Catalan (Spain,Euro) cs Czech cs_CZ Czech (Czech Republic) da Danish da_DK Danish (Denmark) de German de_AT German (Austria) de_AT_EURO German (Austria,Euro) de_CH German (Switzerland) de_DE German (Germany) de_DE_EURO German (Germany,Euro) de_LU German (Luxembourg) de_LU_EURO German (Luxembourg,Euro) el Greek el_GR Greek (Greece) en_AU English (Australia) en_CA English (Canada) en_GB English (United Kingdom) en_IE English (Ireland) en_IE_EURO English (Ireland,Euro) en_NZ English (New Zealand) en_ZA English (South Africa) es Spanish es_AR Spanish (Argentina) es_BO Spanish (Bolivia) es_CL Spanish (Chile) es_CO Spanish (Colombia) es_CR Spanish (Costa Rica) es_DO Spanish (Dominican Republic) es_EC Spanish (Ecuador) es_ES Spanish (Spain) es_ES_EURO Spanish (Spain,Euro) es_GT Spanish (Guatemala) es_HN Spanish (Honduras) es_MX Spanish (Mexico) es_NI Spanish (Nicaragua) es_PA Spanish (Panama) es_PE Spanish (Peru) es_PR Spanish (Puerto Rico) es_PY Spanish (Paraguay) es_SV Spanish (El Salvador) es_UY Spanish (Uruguay) es_VE Spanish (Venezuela) et Estonian et_EE Estonian (Estonia) fi Finnish fi_FI Finnish (Finland) fi_FI_EURO Finnish (Finland,Euro) fr French fr_BE French (Belgium) fr_BE_EURO French (Belgium,Euro) fr_CA French 
(Canada) fr_CH French (Switzerland) fr_FR French (France) fr_FR_EURO French (France,Euro) fr_LU French (Luxembourg) fr_LU_EURO French (Luxembourg,Euro) hr Croatian hr_HR Croatian (Croatia) hu Hungarian hu_HU Hungarian (Hungary) is Icelandic is_IS Icelandic (Iceland) it Italian it_CH Italian (Switzerland) it_IT Italian (Italy) it_IT_EURO Italian (Italy,Euro) iw Hebrew iw_IL Hebrew (Israel) ja Japanese ja_JP Japanese (Japan) ko Korean ko_KR Korean (South Korea) lt Lithuanian lt_LT Lithuanian (Lithuania) lv Latvian (Lettish) lv_LV Latvian (Lettish) (Latvia) mk Macedonian mk_MK Macedonian (Macedonia) nl Dutch nl_BE Dutch (Belgium) nl_BE_EURO Dutch (Belgium,Euro) nl_NL Dutch (Netherlands) nl_NL_EURO Dutch (Netherlands,Euro) no Norwegian no_NO Norwegian (Norway) no_NO_NY Norwegian (Norway,Nynorsk) pl Polish pl_PL Polish (Poland) pt Portuguese pt_BR Portuguese (Brazil) pt_PT Portuguese (Portugal) pt_PT_EURO Portuguese (Portugal,Euro) ro Romanian ro_RO Romanian (Romania) ru Russian ru_RU Russian (Russia) sh Serbo-Croatian sh_YU Serbo-Croatian (Yugoslavia) sk Slovak sk_SK Slovak (Slovakia) sl Slovenian sl_SI Slovenian (Slovenia) sq Albanian sq_AL Albanian (Albania) sr Serbian sr_YU Serbian (Yugoslavia) sv Swedish sv_SE Swedish (Sweden) th Thai th_TH Thai (Thailand) tr Turkish tr_TR Turkish (Turkey) uk Ukrainian uk_UA Ukrainian (Ukraine) zh Chinese zh_CN Chinese (China) zh_HK Chinese (Hong Kong) zh_TW Chinese (Taiwan) ======System property======== -- listing properties -- java.runtime.name=Java(TM) 2 Runtime Environment, Stand... sun.boot.library.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0... java.vm.version=1.3.0_02 java.vm.vendor=Sun Microsystems Inc. 
java.vendor.url=http://java.sun.com/ path.separator=; java.vm.name=Java HotSpot(TM) Client VM file.encoding.pkg=sun.io java.vm.specification.name=Java Virtual Machine Specification user.dir=D:\java\src\char_test java.runtime.version=1.3.0_02 java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment os.arch=x86 java.io.tmpdir=D:\TEMP\ line.separator= java.vm.specification.vendor=Sun Microsystems Inc. java.awt.fonts= os.name=Windows 98 java.library.path=C:\WINDOWS;.;C:\WINDOWS\SYSTEM;C:\WIN... java.specification.name=Java Platform API Specification java.class.version=47.0 os.version=4.90 user.home=C:\WINDOWS user.timezone=Asia/Shanghai java.awt.printerjob=sun.awt.windows.WPrinterJob file.encoding=Cp1252 java.specification.version=1.3 user.name=Sicci java.class.path=d:\java\classes java.vm.specification.version=1.0 java.home=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_02 user.language=en java.specification.vendor=Sun Microsystems Inc. awt.toolkit=sun.awt.windows.WToolkit java.vm.info=mixed mode java.version=1.3.0_02 java.ext.dirs=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0... sun.boot.class.path=C:\PROGRAM FILES\JAVASOFT\JRE\1.3.0_0... java.vendor=Sun Microsystems Inc. file.separator=\ java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport... sun.cpu.endian=little sun.io.unicode.encoding=UnicodeLittle user.region=GB sun.cpu.isalist=pentium i486 i386 |
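Conclusion:
The JVM's default encoding is determined by the system locale, so when the same locale is set, the default encoding on Linux and on Windows is effectively the same. (For this purpose Cp1252 and ISO-8859-1 can be treated as the same Western encoding: both are single-byte encodings whose byte values run only up to 255. Strictly speaking they differ in the 0x80-0x9F range.) For test 2 I therefore list only the Linux output with the locale set to zh_CN and to en_US; the output on Windows under the corresponding regional settings is the same.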
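In the four columns above, the locale-dependent differences come down to three properties: file.encoding (ISO-8859-1, GBK, GBK, Cp1252), user.language (en, zh, zh, en) and user.region (US, CN, CN, GB). A small sketch that prints just these values may make the comparison easier to repeat. The class name LocaleSummarySketch is hypothetical, and reading user.country alongside user.region is an assumption about newer JVMs, where user.region may be absent.

```java
import java.util.Locale;

// Illustrative sketch only; the class name is hypothetical.
final class LocaleSummarySketch {

    /** Collects the locale-related properties on one line; missing ones show as "null". */
    static String summary() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + " user.language=" + System.getProperty("user.language")
                // user.region appears in the J2SE 1.3 output above; it may be
                // null on newer JVMs, so user.country is read as well (assumption).
                + " user.region=" + System.getProperty("user.region")
                + " user.country=" + System.getProperty("user.country")
                + " defaultLocale=" + Locale.getDefault();
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```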
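Test Program 2
==============
The HelloUnicode.java program demonstrates how the string "Hello world 世界你好" (16 characters) is processed under different system default encodings. After each encoding or decoding step it prints, for every character of the resulting string, the byte value, the short value and the Unicode block the character belongs to.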
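HelloUnicode.java itself is not reproduced here. The per-character dump it produces could look roughly like the sketch below, which is an assumption reconstructed from the output format and not the author's code; the class name CharDumpSketch and the method describe are hypothetical.

```java
// Illustrative sketch only; not the original HelloUnicode.java.
final class CharDumpSketch {

    /** Formats one character the way the test output lists it. */
    static String describe(int index, char c) {
        return "char[" + index + "]='" + c + "'"
                + " byte=" + (byte) c      // low 8 bits, shown as a signed value
                + " short=" + (short) c    // the 16-bit code unit, shown as a signed value
                + " " + Character.UnicodeBlock.of(c);
    }

    public static void main(String[] args) {
        // \u escapes stand for the four Chinese characters, so the result
        // does not depend on the encoding javac assumes for this file.
        String text = "Hello world \u4e16\u754c\u4f60\u597d";
        System.out.println("string=" + text + " length=" + text.length());
        for (int i = 0; i < text.length(); i++) {
            System.out.println(describe(i, text.charAt(i)));
        }
    }
}
```

The cast to byte keeps only the low eight bits, which is why a correctly decoded Chinese character shows a byte value unrelated to its GBK bytes, while a character in the Latin-1 range shows the original byte as a negative number.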
Linux (J2SE 1.3.1) LANG=en_US LC_ALL=en_US:

====write hello world to files======
[test 1-1]: with system default encoding=ISO-8859-1
string=Hello world 世界你好 length=20
char[0]='H' byte=72 short=72 BASIC_LATIN
char[1]='e' byte=101 short=101 BASIC_LATIN
char[2]='l' byte=108 short=108 BASIC_LATIN
char[3]='l' byte=108 short=108 BASIC_LATIN
char[4]='o' byte=111 short=111 BASIC_LATIN
char[5]=' ' byte=32 short=32 BASIC_LATIN
char[6]='w' byte=119 short=119 BASIC_LATIN
char[7]='o' byte=111 short=111 BASIC_LATIN
char[8]='r' byte=114 short=114 BASIC_LATIN
char[9]='l' byte=108 short=108 BASIC_LATIN
char[10]='d' byte=100 short=100 BASIC_LATIN
char[11]=' ' byte=32 short=32 BASIC_LATIN
char[12]='? byte=-54 short=202 LATIN_1_SUPPLEMENT
char[13]='? byte=-64 short=192 LATIN_1_SUPPLEMENT
char[14]='? byte=-67 short=189 LATIN_1_SUPPLEMENT
char[15]='? byte=-25 short=231 LATIN_1_SUPPLEMENT
char[16]='? byte=-60 short=196 LATIN_1_SUPPLEMENT
char[17]='? byte=-29 short=227 LATIN_1_SUPPLEMENT
char[18]='? byte=-70 short=186 LATIN_1_SUPPLEMENT
char[19]='? byte=-61 short=195 LATIN_1_SUPPLEMENT
Step 1: in the English encoding environment the Chinese text is displayed correctly on screen, but what is actually printed is "half" Chinese characters, one char per byte. The result is written to the first file, hello.orig.html.
[test 1-2]: getBytes with platform default encoding and decoding as gb2312:
[test 1-3]: convert string to UTF8
====reading and decoding from files======
[test 2-2]: read hello.gb2312.html: decoding as GB2312
Here the '?' really is a question mark, char(63). This is how a lot of data ends up beyond recovery.

Linux (J2SE 1.3.1) LANG=zh_CN LC_ALL=zh_CN.GBK:

====write hello world to files======
[test 1-1]: with system default encoding=GBK
string=Hello world 世界你好 length=16
char[0]='H' byte=72 short=72 BASIC_LATIN
char[1]='e' byte=101 short=101 BASIC_LATIN
char[2]='l' byte=108 short=108 BASIC_LATIN
char[3]='l' byte=108 short=108 BASIC_LATIN
char[4]='o' byte=111 short=111 BASIC_LATIN
char[5]=' ' byte=32 short=32 BASIC_LATIN
char[6]='w' byte=119 short=119 BASIC_LATIN
char[7]='o' byte=111 short=111 BASIC_LATIN
char[8]='r' byte=114 short=114 BASIC_LATIN
char[9]='l' byte=108 short=108 BASIC_LATIN
char[10]='d' byte=100 short=100 BASIC_LATIN
char[11]=' ' byte=32 short=32 BASIC_LATIN
char[12]='世' byte=22 short=19990 CJK_UNIFIED_IDEOGRAPHS
char[13]='界' byte=76 short=30028 CJK_UNIFIED_IDEOGRAPHS
char[14]='你' byte=96 short=20320 CJK_UNIFIED_IDEOGRAPHS
char[15]='好' byte=125 short=22909 CJK_UNIFIED_IDEOGRAPHS
Note: under a new locale the source must be recompiled, because the first decoding from byte stream to character stream already happens in javac.
[test 1-3]: convert string to UTF8
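The "half character" effect and the unrecoverable '?' can both be reproduced without changing the locale. The sketch below is an illustration under stated assumptions, not the author's program: the class name HalfCharacterSketch and its methods are hypothetical, and the eight byte values are the GBK bytes of the four Chinese characters as they appear in the en_US output above (-54, -64, -67, -25, -60, -29, -70, -61).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; class and method names are hypothetical.
final class HalfCharacterSketch {

    // GBK bytes of the four Chinese characters, copied from the en_US output.
    static final byte[] GBK_BYTES = {
        (byte) 0xCA, (byte) 0xC0, (byte) 0xBD, (byte) 0xE7,
        (byte) 0xC4, (byte) 0xE3, (byte) 0xBA, (byte) 0xC3
    };

    /** Decodes one char per byte, as a JVM with an ISO-8859-1 default would. */
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** Encodes to ISO-8859-1; chars above 255 cannot be mapped and become '?'. */
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Four Chinese characters turn into eight "half" characters.
        String halves = decodeLatin1(GBK_BYTES);
        System.out.println("chars after ISO-8859-1 decoding: " + halves.length());

        // The halves still carry the original bytes, so nothing is lost yet.
        byte[] restored = encodeLatin1(halves);
        System.out.println("bytes restored intact: " + Arrays.equals(GBK_BYTES, restored));

        // Properly decoded characters do not fit in ISO-8859-1: each becomes byte 63.
        byte[] lost = encodeLatin1("\u4e16\u754c\u4f60\u597d");
        System.out.println("Chinese chars encoded as ISO-8859-1: " + Arrays.toString(lost));
    }
}
```

The asymmetry is the point: a wrong decoding that keeps one char per byte can be undone by re-encoding with the same charset, whereas encoding real Chinese characters into a charset that lacks them replaces each one with a question mark and discards the original value.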
Some conclusions from test 2: