coding



Definition

In computer hardware, coding refers to the use of codes to represent groups of data, turning them into information that a computer can process and analyze. A code is a symbol used to represent something; it can consist of numbers, letters, special symbols, or a combination of them.

Coding converts data into codes or coded characters that can be translated back into the original data form. It also refers to the process of writing computer instructions, which is part of program design. In automated cartography, coding is the process of expressing the content of a map with numbers and letters according to certain rules, so that the computer can identify the geographic features of the map.

An n-bit binary number has 2^n different combinations, so it can distinguish 2^n different pieces of information, and a specific code group is assigned to each piece of information. This process is also called encoding.

Two types of codes are commonly used in digital systems: binary codes and decimal codes.

Coding knowledge

Introduction

GB2312 and GBK are the commonly used GB coding standards. GB2312 is a subset of GBK, and the GB2312 coding range is 0xA1A1-0xFEFE. Pure GB2312 encoding is very simple to deal with, but there are a few tricks to handling the GBK character set. First, the GBK encoding standard itself:

GBK uses a double-byte representation. The overall encoding range is 8140-FEFE: the first byte is between 81 and FE, the last byte is between 40 and FE, and the xx7F line is excluded. This gives a total of 23,940 code points, of which 21,886 are assigned to Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.
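The byte ranges above can be checked mechanically. The following is a minimal sketch in C; the function names is_gbk_pair and count_gbk_code_points are illustrative and not part of any standard library. It tests whether a byte pair falls inside the GBK grid and counts all valid pairs by brute force.

```c
/* Returns 1 if (lead, trail) is a position in the GBK double-byte grid:
   lead byte 0x81-0xFE, trail byte 0x40-0xFE, with the xx7F line excluded. */
int is_gbk_pair(unsigned char lead, unsigned char trail)
{
    if (lead < 0x81 || lead > 0xFE) return 0;
    if (trail < 0x40 || trail > 0xFE) return 0;
    return trail != 0x7F;
}

/* Counts every valid (lead, trail) pair by trying all 65536 combinations. */
int count_gbk_code_points(void)
{
    int count = 0;
    for (int lead = 0; lead <= 0xFF; lead++)
        for (int trail = 0; trail <= 0xFF; trail++)
            count += is_gbk_pair((unsigned char)lead, (unsigned char)trail);
    return count;
}
```

There are 126 possible lead bytes and 190 possible trail bytes, so the count comes to 126 x 190 = 23,940, which agrees with the total stated above.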

Encoding classification

1. Chinese character area. Including:

a. GB2312 Chinese character area, that is, GBK/2: B0A1-F7FE. The 6763 Chinese characters of GB2312 are included in their original order.

b. GB13000.1 extended Chinese character area. Including:

(1) GBK/3: 8140-A0FE. It contains 6080 CJK Chinese characters in GB13000.1.

(2) GBK/4: AA40-FEA0. It contains 8160 CJK Chinese characters and supplementary Chinese characters.

CJK Chinese characters come first, ordered by UCS code value; supplementary Chinese characters (including radicals and components) follow, ordered by page number and character position in the "Kangxi Dictionary".

2. Graphic symbol area. Including:

a. GB2312 non-Chinese character symbol area, that is, GBK/1: A1A1-A9FE. In addition to the symbols of GB2312, there are also 10 lowercase Roman numerals and the GB12345 supplementary symbols, for 717 symbols in total.

b. GB13000.1 extended non-Chinese character area, that is, GBK/5: A840-A9A0. BIG-5 non-Chinese character symbols, structure characters and "○" are arranged in this area, for 166 symbols in total.

3. User-defined area: it is divided into three areas, (1), (2) and (3).

(1) AAA1-AFFE, with 564 code points.

(2) F8A1-FEFE, with 658 code points.

(3) A140-A7A0, with 672 code points.

Although area (3) is open to users, its use is restricted because new characters may be added to this area in the future.
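Each of the areas listed above is a rectangle in the (first byte, last byte) grid, so the classification can be written as a small lookup table. The sketch below assumes exactly the ranges given in this section; the struct and function names are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* One rectangular area of the GBK code table (illustrative type). */
struct gbk_area {
    const char *name;
    unsigned char lead_lo, lead_hi;    /* range of the first byte */
    unsigned char trail_lo, trail_hi;  /* range of the last byte  */
};

static const struct gbk_area GBK_AREAS[] = {
    {"GBK/1",    0xA1, 0xA9, 0xA1, 0xFE},
    {"GBK/2",    0xB0, 0xF7, 0xA1, 0xFE},
    {"GBK/3",    0x81, 0xA0, 0x40, 0xFE},
    {"GBK/4",    0xAA, 0xFE, 0x40, 0xA0},
    {"GBK/5",    0xA8, 0xA9, 0x40, 0xA0},
    {"user (1)", 0xAA, 0xAF, 0xA1, 0xFE},
    {"user (2)", 0xF8, 0xFE, 0xA1, 0xFE},
    {"user (3)", 0xA1, 0xA7, 0x40, 0xA0},
};
#define GBK_AREA_COUNT (sizeof GBK_AREAS / sizeof GBK_AREAS[0])

/* Returns the name of the area containing the pair, or NULL if none does. */
const char *gbk_area_of(unsigned char lead, unsigned char trail)
{
    if (trail == 0x7F) return NULL;    /* the xx7F line is excluded */
    for (size_t i = 0; i < GBK_AREA_COUNT; i++) {
        const struct gbk_area *a = &GBK_AREAS[i];
        if (lead >= a->lead_lo && lead <= a->lead_hi &&
            trail >= a->trail_lo && trail <= a->trail_hi)
            return a->name;
    }
    return NULL;
}

/* Number of code points in an area, skipping 0x7F when the range covers it. */
int gbk_area_size(const struct gbk_area *a)
{
    int leads = a->lead_hi - a->lead_lo + 1;
    int trails = a->trail_hi - a->trail_lo + 1;
    if (a->trail_lo <= 0x7F && a->trail_hi >= 0x7F) trails--;
    return leads * trails;
}
```

With these ranges, gbk_area_size gives 6080 for GBK/3, 8160 for GBK/4, and 564, 658 and 672 for the three user-defined areas, and the eight areas together cover 23,940 code points. Note that this counts grid positions, not assigned characters, so GBK/1, GBK/2 and GBK/5 come out larger than the character counts quoted in the text.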

Here are a few tips:

1. In PHP, strings keep the encoding in which they were sent, so the encoding entered by the user is not changed automatically. In ASP, however, the default encoding is Unicode, so we can easily obtain a GBK -> Unicode mapping table, which makes it easy to convert from GBK to UTF-8 even when no conversion library is available.

2. Because the lowest value of the second byte in GBK is 0x40, which is 64, when organizing strings that contain Chinese it is best to use ASCII characters below 64 as delimiters, so that replacement or splitting never produces garbled characters under any circumstances. Commonly used ones include ",", ";" and ":"; these characters never collide with GB encoding.
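The second tip can be illustrated with a byte-wise splitter. This is a minimal sketch assuming the input is GBK-encoded bytes; the function names are made up for the example. Because every byte of a GBK double-byte character is at least 0x40, a delimiter below 0x40 can be searched for byte by byte without ever landing inside a Chinese character.

```c
#include <stddef.h>

/* Returns 1 if byte c can never appear inside a GBK double-byte character. */
int is_safe_delimiter(unsigned char c)
{
    return c < 0x40;
}

/* Splits buf in place at each occurrence of delim and stores pointers to
   the fields. Returns the number of fields, or 0 if delim is not safe.
   Once max_fields is reached, the rest of the string stays in the last field. */
size_t split_fields(char *buf, char delim, char **fields, size_t max_fields)
{
    size_t n = 0;
    if (!is_safe_delimiter((unsigned char)delim) || max_fields == 0) return 0;
    fields[n++] = buf;
    for (char *p = buf; *p != '\0'; p++) {
        if (*p == delim && n < max_fields) {
            *p = '\0';             /* terminate the current field */
            fields[n++] = p + 1;   /* the next field starts after the delimiter */
        }
    }
    return n;
}
```

A delimiter such as "@" (0x40) or any letter is rejected by this sketch, since those byte values can occur as the second byte of a GBK character.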

System

ASCII

The files we come into contact with every day are divided into ASCII files and binary files. ASCII is the abbreviation of "American Standard Code for Information Interchange", which can be called the "American Standard" for short. The American Standard stipulates that the 128 numbers from 0 to 127 are used as the standard codes for information, including 33 control codes, one space code, and 94 image codes (printable graphic characters). The image codes include English uppercase and lowercase letters, Arabic numerals, punctuation marks, and so on. The English computer texts we usually read are transmitted and stored in the form of image codes. The American Standard is the common code for most computers in the world, large and small.

However, a character in a computer is usually represented by an eight-bit binary number. This gives 256 different values, which can represent 256 different characters. Since the American Standard only defines 128 codes, the remaining 128 codes are not regulated, and their usage varies. In addition, the usage of the 33 control codes in the American Standard is not consistent across manufacturers. As a result, when we exchange files between different computers, it is necessary to distinguish between two types of files. In the first type, every character is an American Standard image code or a space code. Such files are called "ASCII text files", or simply "text files", and they can usually be exchanged directly between different computer systems. Files of the second type, that is, files containing control codes or codes outside the American Standard, usually cannot be exchanged directly between different computer systems. This type of file has a generic name: "binary files".

National Standard

"National Standard" is the abbreviation of "Chinese Character Code for National Standard Information Exchange of the People's Republic of China". The national standard table (basic table) arranges more than 7,000 Chinese characters, punctuation marks, foreign letters, and so on into a square matrix with 94 rows and 94 columns. Each horizontal row in the matrix is called a "zone", and each zone has ninety-four "positions". The coordinates of a Chinese character in the matrix are called the "location code" of the character. For example, the character "中" is in the 48th position of the 54th zone of the matrix, so its location code is 5448.

The number 94 is in fact the total number of image codes in the American Standard. The national standard table adopts this number, and the original intention was probably to use two American Standard symbols to represent one Chinese character. Since the American Standard image codes run from 33 to 126, adding 32 to the zone and position codes of a Chinese character makes them coincide with the range of the American Standard image codes. In the example above, adding 32 to the zone and position codes of "中" gives 86 and 80. Writing these two numbers in hexadecimal and putting them together gives 5650, which is called the "national standard code" of the character, and the two corresponding American Standard symbols, VP, are the "national standard symbol" of the character "中".

This raises the problem of how to distinguish between the national standard and the American Standard. In a document that mixes Chinese and English, does "VP" represent the character "中", or is it an English abbreviation? When the Sixth Research Institute of the Ministry of Electronics Industry developed CCDOS, it used a simple solution: add 128 to each of the two bytes of the national standard code, moving them into the range not used by the American Standard. (The modified national standard code is still called "national standard".)

Although this solution solved the original problem, new problems arose. Chinese files became "binary files", which can neither be reliably exchanged between different computer systems nor are compatible with most software on the market designed around American Standard symbols.

To distinguish between these two "national standards", we call the national standard code that overlaps with the American Standard image codes the "pure national standard", and the CCDOS national standard code with 128 added the "quasi-national standard".
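The arithmetic described in this section can be sketched in a few lines of C. The function names below are illustrative only. The first function turns a location code into the "pure national standard" code by adding 32 to each part, and the second turns that into the "quasi-national standard" code by adding 128 to each byte.

```c
/* Converts a location code (zone and position, each 1..94) into the two
   bytes of the "pure national standard" code. Returns 0 on success and
   -1 if the location code is out of range. */
int location_to_gb(int zone, int pos, unsigned char gb[2])
{
    if (zone < 1 || zone > 94 || pos < 1 || pos > 94) return -1;
    gb[0] = (unsigned char)(zone + 32);
    gb[1] = (unsigned char)(pos + 32);
    return 0;
}

/* Converts a "pure national standard" code into the CCDOS
   "quasi-national standard" code by adding 128 to each byte. */
void gb_to_quasi(const unsigned char gb[2], unsigned char quasi[2])
{
    quasi[0] = (unsigned char)(gb[0] + 128);
    quasi[1] = (unsigned char)(gb[1] + 128);
}
```

For "中", with location code 5448, location_to_gb(54, 48, gb) yields the bytes 0x56 and 0x50 (the ASCII characters V and P), and gb_to_quasi then yields 0xD6 and 0xD0.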

GBK

GBK is an extended character encoding of the GB code. It encodes more than 20,000 simplified and traditional Chinese characters. The simplified Chinese versions of both Win95 and Win98 use GBK as the system internal code.

From a practical point of view, Microsoft has adopted GBK since the simplified Chinese version of Win95. That system includes TrueType Songti and Heiti GBK fonts (provided by Beijing Zhongyi Electronics), which can be used for display and printing, and provides four input methods for GBK Chinese characters. In addition, the browser IE 4.0 provides two-way conversion between GBK and BIG5 in its simplified and traditional Chinese versions. In the language pack provided by Microsoft for IE, the two font libraries of the Simplified Chinese Language Support Kit, Song and Hei, are also GBK fonts (provided by Zhuhai Sitong Computer Typesetting System Development Company). Some other Chinese font manufacturers have also begun to provide TrueType or PostScript GBK fonts.

Many add-on Chinese platforms, such as NJStar and Richwin, provide GBK support, including font libraries, input methods, and converters between GBK and other Chinese codes.

On the Internet, many web pages use GBK.

However, most search engines do not support searching GBK Chinese characters well, and even some search engines in mainland China do not fully support it.

GBK is in fact another Chinese character coding standard. Its full name is "Chinese Internal Code Specification", and it was promulgated in 1995. GB stands for the national standard, and K is the first letter of the pinyin of the Chinese word for "extension" (kuozhan).

GBK is backward compatible with the GB2312 encoding and supports the ISO 10646.1 international standard going forward. It is a bridging standard for the transition from the former to the latter.

The GBK specification includes all CJK Chinese characters and symbols in ISO 10646.1, with some supplements. Specifically, it includes: all Chinese characters and non-Chinese symbols in GB2312; the other CJK Chinese characters in GB13000.1 (together with the above, a total of 20,902 Chinese characters); 52 Chinese characters from the "Simplified General Table" that are not included in GB13000.1; 28 radicals and important components from the "Kangxi Dictionary" and "Cihai" that are not included in GB13000.1; 13 Chinese character structure symbols; 139 graphic symbols in BIG-5 that are not included in GB2312 but exist in GB13000.1; 6 pinyin symbols supplemented by GB12345; 19 vertical graphic symbols supplemented by GB12345 (GB12345 adds 29 punctuation marks to GB2312, 10 of which are not included in GB13000.1, so GBK does not include them); 21 Chinese characters selected from the CJK compatibility area of GB13000.1; and 31 IBM OS/2 special symbols from GB13000.1.

GBK also uses a double-byte representation. The overall encoding range is 0x8140 to 0xFEFE: the first byte is between 0x81 and 0xFE, and the last byte is between 0x40 and 0xFE. The 0x××7F line is excluded, giving a total of 23,940 code points, which include 21,886 Chinese characters and graphic symbols: 21,003 Chinese characters (including radicals and components) and 883 graphic symbols.

BIG5

BIG5 is a Chinese character encoding for traditional Chinese characters, which is widely used in computer systems in Taiwan and Hong Kong. See below for the coding range of BIG5 codes.

HZ code

The HZ code was created by Chinese students so that Chinese character information could be transmitted directly over the Internet. Since most (Western) network systems at the time were 7-bit and masked the highest bit, GB codes could not be transmitted directly. The HZ code was standardized for the purpose of transmitting Chinese character information directly through 7-bit network systems.

The characteristic of the "HZ" scheme is that "pure national standard" Chinese codes and American Standard codes are mixed. So how does "HZ" distinguish between the national standard and the American Standard? The answer is actually very simple: when national standard codes are inserted in the middle of a string of American Standard codes, we add ~{ in front of the national standard codes and ~} at the end. These additional codes are escape sequences: ~{ switches into national standard mode and ~} switches back. Since these additional codes are themselves American Standard image codes, the entire file looks like an American Standard text file, which can be safely transmitted over computer networks and is also compatible with most English text processing software.
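The escape scheme can be made concrete with a small encoder. This is a minimal sketch, not a complete implementation of the HZ specification: the function name hz_encode is made up for the example, the input is assumed to be a mix of ASCII and "quasi-national standard" byte pairs (each byte in the range 0xA1-0xFE), and line-length handling is left out. It also writes a literal ~ as ~~, which is how HZ keeps the tilde unambiguous.

```c
#include <stddef.h>

/* Encodes ASCII mixed with quasi-national standard byte pairs as HZ text.
   Returns the output length (excluding the terminating NUL), or -1 on
   malformed input or when the output buffer is too small. */
int hz_encode(const unsigned char *in, size_t inlen, char *out, size_t outcap)
{
    size_t i = 0, o = 0;
    int in_gb = 0;                        /* 1 while inside a ~{ ... ~} run */

    if (outcap < 8) return -1;
    while (i < inlen) {
        /* Conservative bound: one step writes at most 4 bytes and the
           closing "~}" plus NUL needs at most 3 more. */
        if (o + 8 > outcap) return -1;
        if (in[i] >= 0xA1 && in[i] <= 0xFE) {
            if (i + 1 >= inlen || in[i + 1] < 0xA1 || in[i + 1] > 0xFE)
                return -1;                /* truncated or invalid pair */
            if (!in_gb) { out[o++] = '~'; out[o++] = '{'; in_gb = 1; }
            out[o++] = (char)(in[i] - 128);      /* back to pure national standard */
            out[o++] = (char)(in[i + 1] - 128);
            i += 2;
        } else if (in[i] < 0x80) {
            if (in_gb) { out[o++] = '~'; out[o++] = '}'; in_gb = 0; }
            if (in[i] == '~') out[o++] = '~';    /* a literal ~ becomes ~~ */
            out[o++] = (char)in[i];
            i += 1;
        } else {
            return -1;                    /* byte outside both ranges */
        }
    }
    if (in_gb) { out[o++] = '~'; out[o++] = '}'; }
    out[o] = '\0';
    return (int)o;
}
```

Given the two bytes 0xD6 0xD0 for "中", this sketch produces the seven-bit text ~{VP~}, matching the "VP" example from the National Standard section.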

CJK code

ISO-2022 is a coding standard established by the International Organization for Standardization (ISO) for characters in various languages. It adopts two-byte encoding: the Chinese encoding is called ISO-2022-CN, and the Japanese and Korean encodings are called ISO-2022-JP and ISO-2022-KR respectively. The three are generally referred to collectively as the CJK code. The CJK code is mainly used on the Internet.

ISO

In 1993, the international standard ISO 10646 defined the Universal Character Set (UCS). UCS is a superset of all other character set standards. It guarantees two-way compatibility with other character sets. That is to say, if you translate any text string into UCS format and then translate it back to the original encoding, you will not lose any information.

UCS contains the characters needed to write all known languages. It includes not only the Latin, Greek, Cyrillic, Hebrew, Arabic, Armenian and Georgian scripts, but also ideographs such as Chinese, Japanese and Korean, as well as Hiragana, Katakana, Bengali, Gurmukhi (used for Punjabi), Tamil, Kannada, Malayalam, Thai, Lao, Bopomofo (Chinese phonetic symbols), Hangul, Devanagari, Gujarati, Oriya, Telugu and other scripts. Scripts that have not yet been added will eventually be added once the best way to encode them in a computer has been worked out. These include Tibetan, Khmer, Runic (the ancient Norse script), Ethiopic, other hieroglyphs, and various Indo-European languages, as well as selected artistic scripts such as Tengwar, Cirth and Klingon. UCS also includes a large number of graphic, typographic, mathematical and scientific symbols, including all the characters provided by TeX, PostScript, MS-DOS, MS-Windows, Macintosh, OCR fonts, and many other word processing and publishing systems.

ISO 10646 defines a 31-bit character set. However, in this huge coding space, only the first 65,534 code positions (0x0000 to 0xFFFD) have been allocated so far. This 16-bit subset of UCS is called the Basic Multilingual Plane (BMP). Characters outside the 16-bit BMP are very special characters (such as hieroglyphs) that are only used by experts in fields such as history and science. According to current plans, no characters will ever be assigned outside the 21-bit code space from 0x000000 to 0x10FFFF, which covers more than 1 million potential future characters. The ISO 10646-1 standard was first published in 1993 and defines the structure of the character set and the content of the BMP. The second part, ISO 10646-2, which defines the characters outside the BMP, is in preparation, but it may take several years to complete. New characters are still being added to the BMP, but the existing characters are stable and will not change.

UCS not only assigns a code to each character, but also gives it an official name. A hexadecimal number representing a UCS or Unicode value is usually prefixed with "U+"; for example, U+0041 represents the character "Latin capital letter A". UCS characters U+0000 to U+007F are identical to US-ASCII (ISO 646), and U+0000 to U+00FF are identical to ISO 8859-1 (Latin-1). The range U+E000 to U+F8FF, as well as larger ranges outside the BMP, is reserved for private use.

In 1993, the UCS-4 (Universal Character Set) defined in ISO 10646 used a width of 4 bytes to provide a very large code space, but this bloated character standard was impractical at the time, and remains so in some respects even in the 21st century: it occupies excessive storage space and reduces the efficiency of information transmission. Meanwhile, the Unicode organization had begun, about 10 years ago, to develop a 16-bit character standard that was Universal, Unique, and Uniform. To avoid competition between two 16-bit encodings, the two organizations began negotiating in 1992 to find a compromise. The result is today's UCS-2 (BMP, Basic Multilingual Plane, 16-bit) and Unicode, although they are still different schemes.

Unicode

To understand Unicode, we need to trace its origin.

When computers spread to East Asia, they encountered countries such as China, Japan, and South Korea that use ideographic characters rather than alphabetic scripts. The languages used in these countries have thousands of commonly used characters, but the encodings originally in use were single-byte, so a code page could hold at most 2^8 = 256 characters. This is far from enough for languages that use ideographic characters. Since one byte is not enough, people naturally used two bytes, and so the double-byte character set (DBCS) came into being. However, although ideographic characters in a double-byte character set use two-byte encoding, ASCII characters and Japanese katakana are still expressed as single bytes. This causes a lot of trouble for programmers, because whenever a DBCS string is processed, it is always necessary to judge whether a given byte represents a whole character or half a character, and if it is half a character, whether it is the first half or the second half. This shows that DBCS is not a very good solution.
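The difficulty described above can be seen in code. The following sketch assumes GBK as the double-byte character set, with first bytes in the range 0x81-0xFE, and uses made-up function names. The point it illustrates is that the role of a byte cannot be decided from its value alone; the string has to be scanned from the beginning.

```c
#include <stddef.h>

enum byte_role { ROLE_SINGLE, ROLE_LEAD, ROLE_TRAIL };

/* Returns 1 if s[i] starts a double-byte character (assuming GBK ranges). */
static int is_lead_at(const unsigned char *s, size_t len, size_t i)
{
    return s[i] >= 0x81 && s[i] <= 0xFE && i + 1 < len;
}

/* Determines whether s[index] is a single-byte character, the first half
   or the second half of a double-byte character, by scanning from the start.
   An out-of-range index is reported as ROLE_SINGLE. */
enum byte_role dbcs_byte_role(const unsigned char *s, size_t len, size_t index)
{
    size_t i = 0;
    while (i < len) {
        int lead = is_lead_at(s, len, i);
        if (i == index) return lead ? ROLE_LEAD : ROLE_SINGLE;
        if (lead) {
            if (i + 1 == index) return ROLE_TRAIL;
            i += 2;
        } else {
            i += 1;
        }
    }
    return ROLE_SINGLE;
}

/* Counts characters (not bytes) in a DBCS string. */
size_t dbcs_char_count(const unsigned char *s, size_t len)
{
    size_t i = 0, count = 0;
    while (i < len) {
        i += is_lead_at(s, len, i) ? 2 : 1;
        count++;
    }
    return count;
}
```

In the byte sequence 0x81 0x41 0x41, for instance, the first 0x41 is the second half of a double-byte character while the second 0x41 is the ASCII letter A, even though the two bytes are identical.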

People kept looking for a better character encoding scheme, and the final result was the birth of Unicode. Unicode is a wide-character set. As originally designed, it uses two bytes, or 16 bits, to represent each character, so when processing characters there is no need to worry about handling only half of a character.

Unicode is used in networks, Windows systems and many large software packages.

Types of encoding

Encoding is a cognitive process of interpreting the basic perception of incoming stimuli. Technically speaking, it is a complex, multi-stage conversion process from relatively objective sensory input (such as light and sound) to a subjectively meaningful experience.

Character encoding is a set of rules that pairs a set of natural language characters (such as an alphabet or syllabary) with a set of other things (such as numbers or electrical pulses).

Text encoding

Text encoding uses a markup language to mark the structure and other characteristics of a text to facilitate computer processing.

Semantic encoding

Semantic encoding uses formal language B to semantically encode formal language A, that is, it is a method of using language B to express all the vocabulary (such as programs or descriptions) of language A.

Electronic encoding

Electronic encoding converts a signal into a code that is optimized for transmission or storage. The conversion is usually done by a codec.

PCM coding

PCM is the abbreviation of Pulse Code Modulation, one of the coding methods of digital communication. The main process is to sample voice, image and other analog signals at regular intervals to make them discrete. At the same time, each sampled value is rounded to the nearest quantization level, and the amplitude of the sampled pulse is then represented by a group of binary codes.
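The quantization step can be sketched as follows. This is a minimal illustration rather than any particular PCM standard: it assumes samples have already been taken at regular intervals and normalized to the range -1.0 to 1.0, it uses uniform (linear) quantization levels, and the function names are made up for the example.

```c
#include <stddef.h>

/* Quantizes one sample in [-1.0, 1.0] to an unsigned code of the given
   number of bits (assumed to be 1..16), rounding to the nearest level.
   Samples outside the range are clamped. */
unsigned pcm_quantize(double sample, unsigned bits)
{
    unsigned levels = 1u << bits;              /* 2^bits quantization levels */
    if (sample < -1.0) sample = -1.0;
    if (sample > 1.0) sample = 1.0;
    double scaled = (sample + 1.0) / 2.0 * (double)(levels - 1);
    return (unsigned)(scaled + 0.5);           /* round to nearest */
}

/* Encodes a block of samples into binary codes, one code per sample. */
void pcm_encode(const double *samples, size_t n, unsigned bits, unsigned *codes)
{
    for (size_t i = 0; i < n; i++)
        codes[i] = pcm_quantize(samples[i], bits);
}
```

With 8 bits, the lowest amplitude maps to code 0 and the highest to code 255, so each sample is carried as one byte.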

Neural encoding

Neural encoding refers to how information is represented in neurons.

Memory encoding

Memory encoding is the process of converting sensations into memories.

Encryption

Encryption is the process of transforming information for confidentiality.

Transcoding

Transcoding is the process of converting an encoding from one format to another.

Character set editing

Implementing code conversion easily

First, use the iconv function family for code conversion

On Linux, encoding conversion can be done programmatically with the iconv function family or with the iconv command; the latter works on files, that is, it converts a specified file from one encoding to another.

The header file of the iconv function family is iconv.h, which must be included before use.

#include <iconv.h>

The iconv function family has three functions, with the following prototypes:

(1) iconv_t iconv_open(const char *tocode, const char *fromcode);

This function specifies which two encodings will be converted between: tocode is the target encoding, and fromcode is the original encoding. It returns a conversion handle for use by the following two functions.

(2) size_t iconv(iconv_t cd, char **inbuf, size_t *inbytesleft, char **outbuf, size_t *outbytesleft);

This function reads characters from inbuf, converts them, and writes them to outbuf. inbytesleft records the number of input bytes that have not yet been converted, and outbytesleft records the remaining space in the output buffer.

(3) int iconv_close(iconv_t cd);

This function closes the conversion handle and releases its resources.

Example 1: Conversion example program implemented in C

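/* fc.c: code conversion example C program */
#include <stdio.h>
#include <string.h>
#include <iconv.h>

#define OUTLEN 255

int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen);
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen);
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen);

int main(void)
{
    // "正在安装" ("Installing") in UTF-8; assumes this source file is saved as UTF-8
    char in_utf8[] = "正在安装";
    // The same text as GB2312 bytes
    char in_gb2312[] = "\xD5\xFD\xD4\xDA\xB0\xB2\xD7\xB0";
    char out[OUTLEN];
    int rc;

    // Unicode (UTF-8) code to GB2312 code
    rc = u2g(in_utf8, strlen(in_utf8), out, OUTLEN);
    printf("unicode-->gb2312 rc=%d out=%s\n", rc, out);

    // GB2312 code converted to Unicode (UTF-8) code
    rc = g2u(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    printf("gb2312-->unicode rc=%d out=%s\n", rc, out);
    return 0;
}

// Code conversion: from one encoding to another; returns 0 on success, -1 on failure
int code_convert(const char *from_charset, const char *to_charset,
                 char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    iconv_t cd;
    size_t rc;

    if (outlen == 0) return -1;
    cd = iconv_open(to_charset, from_charset);
    if (cd == (iconv_t)-1) return -1;   // iconv_open reports failure as (iconv_t)-1
    memset(outbuf, 0, outlen);
    outlen--;                           // keep the last byte as the terminating NUL
    rc = iconv(cd, &inbuf, &inlen, &outbuf, &outlen);
    iconv_close(cd);                    // release the handle on every path
    return rc == (size_t)-1 ? -1 : 0;
}

// Convert UNICODE (UTF-8) code to GB2312 code
int u2g(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("utf-8", "gb2312", inbuf, inlen, outbuf, outlen);
}

// GB2312 code converted to UNICODE (UTF-8) code
int g2u(char *inbuf, size_t inlen, char *outbuf, size_t outlen)
{
    return code_convert("gb2312", "utf-8", inbuf, inlen, outbuf, outlen);
}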

Example 2: Conversion example program implemented in C++

/* f.cpp: code conversion example C++ program */
#include <iconv.h>
#include <cstring>
#include <iostream>

#define OUTLEN 255

using namespace std;

// Code conversion operation class
class CodeConverter {
private:
    iconv_t cd;

public:
    // Constructor
    CodeConverter(const char *from_charset, const char *to_charset) {
        cd = iconv_open(to_charset, from_charset);
    }

    // Destructor
    ~CodeConverter() {
        if (cd != (iconv_t)-1) iconv_close(cd);
    }

    // Convert inbuf into outbuf; returns 0 on success, -1 on failure
    int convert(char *inbuf, size_t inlen, char *outbuf, size_t outlen) {
        if (cd == (iconv_t)-1 || outlen == 0) return -1;
        memset(outbuf, 0, outlen);
        outlen--;   // keep the last byte as the terminating NUL
        size_t rc = iconv(cd, &inbuf, &inlen, &outbuf, &outlen);
        return rc == (size_t)-1 ? -1 : 0;
    }
};

int main(int argc, char **argv)
{
    // "正在安装" ("Installing") in UTF-8; assumes this source file is saved as UTF-8
    char in_utf8[] = "正在安装";
    // The same text as GB2312 bytes
    char in_gb2312[] = "\xD5\xFD\xD4\xDA\xB0\xB2\xD7\xB0";
    char out[OUTLEN];

    // utf-8 --> gb2312
    CodeConverter cc("utf-8", "gb2312");
    cc.convert(in_utf8, strlen(in_utf8), out, OUTLEN);
    cout << "utf-8-->gb2312 in=" << in_utf8 << ",out=" << out << endl;

    // gb2312 --> utf-8
    CodeConverter cc2("gb2312", "utf-8");
    cc2.convert(in_gb2312, strlen(in_gb2312), out, OUTLEN);
    cout << "gb2312-->utf-8 in=" << in_gb2312 << ",out=" << out << endl;
    return 0;
}

Second, use the iconv command for encoding conversion

The iconv command converts the encoding of a specified file. By default the output goes to the standard output device; an output file can also be specified.

Usage: iconv [OPTION...] [FILE...]

The following options are available:

Input/output format specification:

  -f, --from-code=NAME    encoding of the original text
  -t, --to-code=NAME      encoding for output

Information:

  -l, --list              list all known character sets

Output control:

  -c                      omit invalid characters from the output
  -o, --output=FILE       output file
  -s, --silent            turn off warnings
      --verbose           print progress information

  -?, --help              give this help list
      --usage             give brief usage information
  -V, --version           print the program version number

Example:

iconv -f utf-8 -t gb2312 aaa.txt > bbb.txt

This command reads the aaa.txt file, converts it from UTF-8 encoding to GB2312 encoding, and redirects the output to the bbb.txt file.

Summary: Linux provides a powerful encoding conversion tool, which is very convenient.
