Definition
In computer hardware, coding refers to using codes to represent groups of data, turning them into information that a computer can process and analyze. A code is a symbol used to represent things. It can be made up of numbers, letters, special symbols, or a combination of them.
Coding converts data into codes or coded characters that can be translated back into the original form of the data. In programming, coding is the process of writing computer instructions, as part of program design. In automatic mapping (computer cartography), coding is the process of expressing the content of a map with numbers and letters according to certain rules, which enables the computer to identify the geographical elements of the map.
An n-bit binary number can form 2^n different combinations, so it can represent 2^n different pieces of information, and a specific code group is assigned to each piece of information. This process is also called encoding.
Two types of codes are commonly used in digital systems: binary code and decimal code.
Coding knowledge
Introduction
GB2312 and GBK are the commonly used GB (national standard) encodings. GB2312 is a subset of GBK, and the GB2312 code range is 0xA1A1-0xFEFE. Pure GB2312 text is very simple to handle, but there are a few subtleties when dealing with the GBK character set. Let us first look at the GBK encoding standard:
GBK uses two bytes per character. The overall encoding range is 8140-FEFE: the first (lead) byte is between 81 and FE, the last (trail) byte is between 40 and FE, and the xx7F column is excluded. That gives 23,940 code points in total, of which 21,886 are assigned to Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
Encoding classification
1. Chinese character area, including:
a. GB2312 Chinese character area, i.e. GBK/2: B0A1-F7FE. It contains the 6,763 Chinese characters of GB2312 in their original order.
b. GB13000.1 extended Chinese character area, including:
(1) GBK/3: 8140-A0FE. It contains 6,080 CJK Chinese characters from GB13000.1.
(2) GBK/4: AA40-FEA0. It contains 8,160 CJK Chinese characters and supplementary Chinese characters.
The CJK Chinese characters come first, ordered by UCS code value; the supplementary Chinese characters (including radicals and components) follow, ordered by page number and character position in the "Kangxi Dictionary".
2. Graphic symbol area, including:
a. GB2312 non-Chinese symbol area, i.e. GBK/1: A1A1-A9FE. In addition to the GB2312 symbols, it contains 10 lowercase Roman numerals and the GB12345 supplementary symbols, 717 symbols in total.
b. GB13000.1 extended non-Chinese symbol area, i.e. GBK/5: A840-A9A0. The BIG-5 non-Chinese symbols, Chinese character structure symbols, and "○" are placed here, 166 symbols in total.
3. User-defined area, divided into three districts:
(1) AAA1-AFFE, with 564 code points.
(2) F8A1-FEFE, with 658 code points.
(3) A140-A7A0, with 672 code points.
Although district (3) is open to users, its use is restricted because new characters may be added to this district in the future.
Here are a few tips:
1. In PHP, strings keep whatever encoding they were sent in, so text entered by the user is not converted automatically. In ASP, however, the default internal encoding is Unicode, so with a GBK-to-Unicode mapping table it is easy to convert from GBK to UTF-8 even without a conversion library.
2. The lowest byte value that can appear in a GBK double-byte character is 0x40 (64), the minimum trail byte. So when building strings that contain Chinese text, it is best to use ASCII characters below 64 as separators; then replacing or splitting never produces garbled characters under any circumstances. Commonly used ones are ",", ";", and ":"; these characters never clash with GB encodings.
System
ASCII
The files we deal with every day are divided into ASCII files and binary files. ASCII is the abbreviation of "American Standard Code for Information Interchange", called the "American Standard" for short below. The American Standard specifies 128 codes, 0 to 127, to represent information: 33 control codes, one space code, and 94 graphic (printable) codes. The graphic codes include uppercase and lowercase English letters, Arabic numerals, punctuation marks, and so on. The English text we usually read on computers is transmitted and stored as graphic codes. The American Standard is the common code of most large and small computers in the world.
However, a character in a computer is usually stored as an 8-bit binary number, which has 256 different values and can therefore represent 256 different characters. Since the American Standard only defines 128 codes, the remaining 128 are unregulated and are used differently by different systems. In addition, manufacturers do not use the 33 control codes of the American Standard consistently. Therefore, when exchanging files between different computers, we must distinguish two types of files. In the first type, every byte is an American Standard graphic code or the space code. Such files are called "ASCII text files", or simply "text files", and they can usually be exchanged directly between different computer systems. The second type, files containing control codes or non-ASCII codes, usually cannot be exchanged directly between different computer systems. Such files are generically called "binary files".
National Standard
"National Standard" (GB) is short for the "Chinese Character Coded Character Set for Information Interchange", a national standard of the People's Republic of China. The national standard table (basic set) arranges more than 7,000 Chinese characters, punctuation marks, foreign letters, and so on into a square matrix of 94 rows and 94 columns. Each row of the matrix is called a "zone", and each zone has 94 "positions". The coordinates of a Chinese character in the matrix are called its "zone-position code". For example, the character "中" is at position 48 in zone 54 of the matrix, so its zone-position code is 5448.
In fact, 94 is exactly the number of graphic codes in the American Standard. The national standard table reuses this number, presumably with the original intention of representing each Chinese character with two American Standard graphic characters. Since the American Standard graphic codes run from 33 to 126, adding 32 to a Chinese character's zone and position numbers maps them onto the American Standard graphic code range. In the example above, adding 32 to the zone and position numbers of "中" gives 86 and 80. Putting the hexadecimal forms of these two numbers together gives 5650, which is called the "national standard code" of the character, and the two corresponding American Standard characters, "VP", are the "national standard symbols" of "中".
This raises the problem of distinguishing between the national standard and the American Standard. In a document that mixes Chinese and English, does "VP" represent the character "中" or two English letters? When the Sixth Research Institute of the Ministry of Electronics Industry developed CCDOS, it adopted a simple solution: add 128 to each of the two bytes of the national standard code, lifting them into the non-ASCII range. (The modified code is still called "national standard".)
Although this solution solved the original problem, it created new ones. Chinese files became "binary files", which can neither be exchanged reliably between different computer systems nor are compatible with most software on the market designed for the American Standard.
To distinguish the two "national standards" above, we call the national standard code that overlaps with the American Standard graphic codes the "pure national standard", and the CCDOS code with 128 added the "quasi-national standard".
GBK
GBK is an extended character encoding of the GB code. It encodes more than 20,000 simplified and traditional Chinese characters. The simplified Chinese versions of Windows 95 and Windows 98 both use GBK as the system's internal code.
From a practical point of view, Microsoft has adopted GBK since the simplified Chinese version of Windows 95. It includes TrueType Song and Hei GBK fonts (provided by Beijing Zhongyi Electronics) for display and printing, and provides four input methods for GBK Chinese characters. In addition, the simplified and traditional Chinese versions of Internet Explorer 4.0 provide two-way GBK-BIG5 conversion. The Song and Hei fonts in the Simplified Chinese Language Support Kit that Microsoft provides for IE are also GBK fonts (provided by Zhuhai Sitong Computer Typesetting System Development Company). Some other Chinese font vendors have also begun to provide TrueType or PostScript GBK fonts.
Many add-on Chinese platforms, such as NJStar and Richwin, support GBK, including font libraries, input methods, and converters between GBK and other Chinese encodings.
On the Internet, many web pages use GBK.
However, most search engines do not support searching for GBK Chinese characters well, and some search engines in mainland China cannot fully support GBK Chinese character search.
In fact, GBK is another Chinese character coding standard. Its full name is the "Chinese Internal Code Specification", and it was promulgated in 1995. "GB" stands for national standard, and "K" is the first letter of the pinyin for "extension" (kuozhan).
GBK is backward compatible with GB2312 and forward supports the ISO 10646.1 international standard. It is a transitional standard bridging the former and the latter.
The GBK specification covers all CJK Chinese characters and symbols in ISO 10646.1, plus supplements. Specifically, it includes: all Chinese characters and non-Chinese symbols in GB2312; the other CJK Chinese characters in GB13000.1, for a total of 20,902 Chinese characters in GB; 52 Chinese characters from the "Simplified Characters General Table" that are not in GB13000.1; 28 radicals and important components from the "Kangxi Dictionary" and "Cihai" that are not in GB13000.1; 13 Chinese character structure symbols; 139 graphic symbols in BIG-5 that are not in GB2312 but exist in GB13000.1; 6 pinyin symbols supplemented by GB12345; 19 vertical punctuation symbols supplemented by GB12345 (GB12345 adds 29 punctuation marks to GB2312, 10 of which are not in GB13000.1, so GBK does not include them); 21 Chinese characters selected from the CJK compatibility area of GB13000.1; and 31 IBM OS/2 special symbols from GB13000.1. GBK also uses a double-byte representation. The overall encoding range is 0x8140 to 0xFEFE, with the lead byte between 0x81 and 0xFE and the trail byte between 0x40 and 0xFE, excluding the 0x××7F column. That gives 23,940 code points in total, of which 21,886 are Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
BIG5
BIG5 is a Chinese character encoding for traditional Chinese characters, widely used in computer systems in Taiwan and Hong Kong.
HZ code
HZ code was devised by Chinese students so that Chinese text could be transmitted directly over the Internet. Because most (Western) network systems at that time were 7-bit and masked off the highest bit, GB codes could not be transmitted directly. HZ was standardized for the purpose of transmitting Chinese text directly over 7-bit network systems.
The characteristic of the HZ scheme is that it mixes "pure national standard" Chinese codes with American Standard codes. So how does HZ distinguish the national standard from the American Standard? The answer is simple: when national standard codes are inserted into a string of American Standard codes, "~{" is added before them and "~}" after them. These additional codes are called the "escape-in code" and the "escape-out code" respectively. Since these additional codes are themselves American Standard graphic codes, the entire file looks like an American Standard text file: it can be transmitted safely over computer networks and is compatible with most English text-processing software.
CJK code
ISO-2022 is a coding standard established by the International Organization for Standardization (ISO) for the characters of various languages. It uses two-byte encoding for these scripts. The Chinese variant is called ISO-2022-CN, and the Japanese and Korean variants are called ISO-2022-JP and ISO-2022-KR respectively. The three are generally referred to collectively as CJK codes, which are mainly used on the Internet.
ISO
In 1993, the international standard ISO 10646 defined the Universal Character Set (UCS). UCS is a superset of all other character set standards and guarantees round-trip compatibility with them: if you convert any text string to UCS and then back to the original encoding, no information is lost.
UCS contains the characters needed to represent all known languages. This includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic, Armenian, and Georgian scripts, but also Chinese, Japanese, and Korean ideographs, as well as Hiragana, Katakana, Bengali, Gurmukhi (Punjabi), Tamil, Kannada, Malayalam, Thai, Lao, Bopomofo, Hangul, Devanagari, Gujarati, Oriya, Telugu, and other scripts. Scripts that have not yet been added are still being studied to determine how best to encode them, and all of them will eventually be added. These include Tibetan, Khmer, Runic (an ancient Norse script), Ethiopic, other hieroglyphic scripts, and various Indo-European languages, as well as selected artistic scripts such as Tengwar, Cirth, and Klingon. UCS also includes a large number of graphical, typographical, mathematical, and scientific symbols, including all the characters provided by TeX, PostScript, MS-DOS, MS-Windows, Macintosh, OCR fonts, and many other word processing and publishing systems.
ISO 10646 defines a 31-bit character set. However, of this huge coding space, only the first 65,534 code positions (0x0000 to 0xFFFD) have been allocated so far. This 16-bit subset of UCS is called the Basic Multilingual Plane (BMP). Characters outside the 16-bit BMP are very special characters (such as hieroglyphs) used only by specialists in history and science. Under the current plan, no characters will ever be assigned outside the 21-bit code space from 0x000000 to 0x10FFFF, which still has room for more than a million potential future characters. The ISO 10646-1 standard was first published in 1993 and defines the architecture of the character set and the contents of the BMP. A second part, ISO 10646-2, which defines characters outside the BMP, is in preparation but may take several years to complete. New characters are still being added to the BMP, but existing characters are stable and will not change.
UCS not only assigns a code to each character but also gives it an official name. A hexadecimal number representing a UCS or Unicode value is usually prefixed with "U+"; for example, U+0041 represents the character "Latin capital letter A". UCS characters U+0000 to U+007F are identical to US-ASCII (ISO 646), and U+0000 to U+00FF are also identical to ISO 8859-1 (Latin-1). The range U+E000 to U+F8FF, along with larger ranges outside the BMP, is reserved for private use.
In 1993, the UCS-4 (Universal Character Set) form defined in ISO 10646 used a width of 4 bytes to provide a very large code space. Even for the 21st century, however, this bloated character standard was somewhat unrealistic: it would consume excessive storage space and reduce the efficiency of information transmission. Meanwhile, the Unicode organization had begun developing a 16-bit character standard, built on being Universal, Unique, and Uniform, about 10 years ago. To avoid competition between two 16-bit encodings, the two organizations began negotiating in 1992 to find a compromise. The result is today's UCS-2 (BMP, Basic Multilingual Plane, 16-bit) and Unicode, although they remain different solutions.
Unicode
To understand Unicode, we need to trace its origins.
When computers spread to East Asia, they encountered countries such as China, Japan, and South Korea that use ideographic characters rather than alphabetic scripts. These languages have thousands of commonly used characters, but the original character sets used single-byte encoding, and a code page can hold at most 2^8 = 256 characters. That is hopelessly inadequate for languages that use ideographic characters. Since one byte was not enough, people naturally turned to two bytes, which gave rise to the double-byte character set (DBCS). However, although the ideographic characters in a DBCS are encoded in two bytes, ASCII and Japanese katakana are still encoded in a single byte. This causes programmers a lot of trouble: whenever they process a DBCS string, they must determine whether each byte is a whole character or half of one, and if it is half, whether it is the first or the second half. This shows that DBCS is not a very good solution.
People kept looking for a better character encoding scheme, and the end result was Unicode. In its original design, Unicode was a wide character set: it represented every character with two bytes (16 bits), so when processing characters you did not have to worry about handling only half of a character.
Unicode is used in networks, Windows systems, and many large software packages.
Types of encoding
Encoding is a cognitive process that interprets the basic perception of incoming stimuli. Technically, it is a complex, multi-stage conversion from relatively objective sensory input (such as light and sound) to subjectively meaningful experience.
Character encoding is a set of rules for pairing a set of natural-language characters (such as an alphabet or a syllabary) with a set of other things (such as numbers or electrical pulses).
Text encoding
Text encoding uses a markup language to mark the structure and other features of a text so that a computer can process it.
Semantic encoding
Semantic encoding uses a formal language B to encode a formal language A semantically; that is, it is a way of using language B to express everything in language A (such as programs or descriptions).
Electronic encoding
Electronic encoding converts a signal into a code optimized for transmission or storage. The conversion is usually performed by a codec.
PCM coding
PCM is the abbreviation of Pulse Code Modulation, one of the coding methods of digital communication. The main process is to sample voice, image, and other analog signals at regular intervals so that they become discrete. Each sampled value is then rounded and quantized to one of a set of graded levels, and finally the amplitude of each sampled pulse is represented by a group of binary code bits.
Neural encoding
Neural encoding refers to how information is represented in neurons.
Memory encoding
Memory encoding is the process of converting perceptions into memories.
Encryption
Encryption is the process of transforming information to keep it confidential.
Transcoding
Transcoding is the process of converting data from one encoding format to another.
Character set editing
Easy ways to implement code conversion
First, use the iconv function family for code conversion
When converting encodings on Linux, you can either program with the iconv function family or use the iconv command. The command works on files: it converts a specified file from one encoding to another.
The header file for the iconv function family is iconv.h, which must be included before use:
#include <iconv.h>
The iconv function family has three functions, with the following prototypes:
(1) iconv_t iconv_open(const char *tocode, const char *fromcode);
This function specifies the two encodings to convert between: tocode is the target encoding and fromcode is the source encoding. It returns a conversion descriptor for use by the following two functions, or (iconv_t)-1 on failure.
(2) size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);
This function reads characters from *inbuf, converts them, and writes the result to *outbuf. inbytesleft records the number of input bytes not yet converted, and outbytesleft records the remaining space (in bytes) in the output buffer. It returns (size_t)-1 on error.
(3) int iconv_close(iconv_t cd);
This function closes the conversion descriptor and releases its resources.
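Example 1: Conversion example program in C
/* fc.c: code conversion example C program */
#include <iconv.h>
#include <stdio.h>
#include <string.h>

#define OUTLEN 255

int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen);
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen);

int main(void)
{
    // The Chinese text for "Installing", encoded as UTF-8 and as GB2312
    char in_utf8[] = "\xE6\xAD\xA3\xE5\x9C\xA8\xE5\xAE\x89\xE8\xA3\x85";
    char in_gb2312[] = "\xD5\xFD\xD4\xDA\xB0\xB2\xD7\xB0";
    char out[OUTLEN];
    int rc;

    // Unicode (UTF-8) code to GB2312 code
    rc = u2g(in_utf8, strlen(in_utf8), out, OUTLEN);
    printf("unicode-->gb2312 rc=%d out=%s\n", rc, out);

    // GB2312 code converted to Unicode (UTF-8) code
    rc = g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    printf("gb2312-->unicode rc=%d out=%s\n", rc, out);
    return 0;
}

// Code conversion: from one encoding to another; returns 0 on success, -1 on error
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    iconv_t cd;
    char **pin = &inbuf;
    char **pout = &outbuf;
    size_t rc;

    cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1)     // iconv_open signals failure with (iconv_t)-1, not 0
        return -1;
    memset(outbuf, 0, outlen);
    outlen--;                  // reserve the last byte for the terminating NUL
    rc = iconv(cd, pin, &inlen, pout, &outlen);
    iconv_close(cd);           // close the descriptor on success and on failure
    return rc == (size_t)-1 ? -1 : 0;
}

// Convert Unicode (UTF-8) code to GB2312 code
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("utf-8", "gb2312", inbuf, inlen, outbuf, outlen);
}

// GB2312 code converted to Unicode (UTF-8) code
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("gb2312", "utf-8", inbuf, inlen, outbuf, outlen);
}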
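Example 2: Conversion example program in C++
/* f.cpp: code conversion example C++ program */
#include <iconv.h>
#include <cstring>
#include <iostream>

#define OUTLEN 255

using namespace std;

// Code conversion class: wraps one iconv conversion descriptor
class CodeConverter {
private:
    iconv_t cd;
public:
    // Construction: open a descriptor converting from_charset --> to_charset
    CodeConverter(const char *from_charset, const char *to_charset) {
        cd = iconv_open(to_charset, from_charset);
    }
    // Destruction: release the descriptor if it was opened successfully
    ~CodeConverter() {
        if (cd != (iconv_t)-1)
            iconv_close(cd);
    }
    // The descriptor must not be closed twice, so copying is forbidden
    CodeConverter(const CodeConverter &) = delete;
    CodeConverter &operator=(const CodeConverter &) = delete;

    // Convert output; returns 0 on success, -1 on failure
    int convert(char *inbuf, size_t inlen, char *outbuf, size_t outlen) {
        if (cd == (iconv_t)-1)
            return -1;
        char **pin = &inbuf;
        char **pout = &outbuf;
        memset(outbuf, 0, outlen);
        outlen--;  // leave room for the terminating NUL
        return iconv(cd, pin, &inlen, pout, &outlen) == (size_t)-1 ? -1 : 0;
    }
};

int main()
{
    // The Chinese text for "Installing", encoded as UTF-8 and as GB2312
    char in_utf8[] = "\xE6\xAD\xA3\xE5\x9C\xA8\xE5\xAE\x89\xE8\xA3\x85";
    char in_gb2312[] = "\xD5\xFD\xD4\xDA\xB0\xB2\xD7\xB0";
    char out[OUTLEN];

    // utf-8 --> gb2312
    CodeConverter cc("utf-8", "gb2312");
    cc.convert(in_utf8, strlen(in_utf8), out, OUTLEN);
    cout << "utf-8-->gb2312 in=" << in_utf8 << ", out=" << out << endl;

    // gb2312 --> utf-8
    CodeConverter cc2("gb2312", "utf-8");
    cc2.convert(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    cout << "gb2312-->utf-8 in=" << in_gb2312 << ", out=" << out << endl;
    return 0;
}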
Second, use the iconv command for encoding conversion
The iconv command converts the encoding of a specified file. By default the output goes to standard output, but an output file can also be specified.
Usage: iconv [OPTION...] [FILE...]
The following options are available:
Input/output format specification:
 -f, --from-code=NAME    encoding of the original text
 -t, --to-code=NAME      encoding for output
Information:
 -l, --list              list all known coded character sets
Output control:
 -c                      omit invalid characters from output
 -o, --output=FILE       output file
 -s, --silent            suppress warnings
     --verbose           print progress information
 -?, --help              give this help list
     --usage             give a short usage message
 -V, --version           print program version
Example:
iconv -f utf-8 -t gb2312 aaa.txt > bbb.txt
This command reads aaa.txt, converts it from UTF-8 to GB2312, and redirects the output to bbb.txt.
Summary: Linux provides a powerful encoding conversion tool that makes such conversions convenient.