Summary
As with any form of communication, compressed data communication works only when both the sender and the receiver understand the encoding mechanism. For example, this article is meaningful only if the reader knows that it is to be interpreted as English text. Similarly, compressed data can be understood only if the recipient knows the encoding method. Some compression tools take advantage of this property to encrypt data during compression, for example with a password, so that only authorized parties can recover the data correctly.
Data compression is possible because most real-world data has statistical redundancy. For example, the letter "e" is more common in English than the letter "z", and it is very unlikely that the letter "q" will be followed by "z". Lossless compression algorithms usually exploit statistical redundancy so that they can represent the sender's data more concisely, but still completely.
If a certain loss of fidelity is allowed, further compression can be achieved. For example, when people look at pictures or TV screens, they may not notice that some details are imperfect. Similarly, two sequences of audio samples may sound the same without being exactly identical. Lossy compression algorithms use fewer bits to represent images, video, or audio, at the cost of minor differences from the original.
Compression is important because it helps reduce the consumption of expensive resources such as hard disk space and connection bandwidth. However, compression itself requires processing resources, which may also be expensive. The design of a data compression scheme therefore involves a trade-off among compression capability, distortion, required computing resources, and other factors.
Some mechanisms are reversible, so that the original data can be restored exactly; this is called lossless data compression. Other mechanisms allow a certain degree of data loss in order to achieve a higher compression ratio; this is called lossy data compression.
However, any lossless data compression algorithm has some files that it cannot compress. In fact, no compression algorithm can compress data that contains no discernible patterns. Attempting to compress data that has already been compressed usually makes it larger, and attempting to compress encrypted data usually has the same result.
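The effect can be observed with a small sketch. It assumes Python's standard zlib module as the lossless compressor and uses seeded pseudo-random bytes as a stand-in for pattern-free data such as encrypted or already-compressed content; the variable names are illustrative only.

```python
import random
import zlib

# Assumption: seeded pseudo-random bytes stand in for data without patterns.
rng = random.Random(0)  # fixed seed so the run is repeatable
noise = bytes(rng.getrandbits(8) for _ in range(10_000))

# Highly repetitive text, for contrast.
text = b"the quick brown fox jumps over the lazy dog. " * 200

packed_noise = zlib.compress(noise)
packed_text = zlib.compress(text)

print("noise:", len(noise), "->", len(packed_noise))  # grows slightly
print("text: ", len(text), "->", len(packed_text))    # shrinks a lot
```

The repetitive text shrinks, while the pattern-free input comes out slightly larger because the container format still adds its own header and checksum.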
Lossy data compression, too, eventually reaches a point where it no longer works. Take an extreme example: a compression algorithm that removes the last byte of the file each time it is applied. Once it has been applied repeatedly until the file is empty, it cannot compress any further.
Classification
There are many approaches to data compression, and different compression methods (that is, encoding methods) suit data with different characteristics. They can be classified along several dimensions, as follows.
(1) Real-time compression and non-real-time compression
For example, making an IP phone call involves converting the voice signal into a digital signal, compressing it at the same time, and then transmitting it over the Internet. This data compression process happens in real time. Real-time compression is generally used in the transmission of video and audio data, and is commonly performed by specialized hardware devices such as compression cards.
Non-real-time compression is the kind computer users most often perform. It is carried out when needed and has no real-time constraint, for example when compressing a picture, an article, or a piece of music. Non-real-time compression generally requires no special equipment; it is enough to install and run the corresponding compression software on the computer.
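The real-time case can be sketched in software as compressing each chunk as soon as it arrives, so the receiver can decode it immediately. This is only an illustration under stated assumptions: Python's lossless zlib module stands in for a real voice codec, and the function names stream_compress and stream_decompress are hypothetical.

```python
import zlib

def stream_compress(chunks):
    """Yield one compressed packet per input chunk, as each chunk arrives."""
    compressor = zlib.compressobj()
    for chunk in chunks:
        # Z_SYNC_FLUSH forces out everything buffered so far, so the receiver
        # can decode this packet without waiting for later ones.
        yield compressor.compress(chunk) + compressor.flush(zlib.Z_SYNC_FLUSH)

def stream_decompress(packets):
    """Decode packets one at a time, in the order they were sent."""
    decompressor = zlib.decompressobj()
    for packet in packets:
        yield decompressor.decompress(packet)

chunks = [b"hello hello hello ", b"are you there? ", b"hello hello hello "]
received = list(stream_decompress(stream_compress(chunks)))
print(received == chunks)
```

Non-real-time compression corresponds to the simpler case of handing the whole input to the compressor in a single call once it is available.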
(2) Data compression and file compression
Strictly speaking, data compression includes file compression. Data originally refers to any digital information, including the various files used on computers. Sometimes, however, data refers specifically to time-sensitive data that is collected, processed, or transmitted in real time. File compression refers specifically to the compression of data that will be stored on physical media such as disks, for example an article, a piece of music, or a piece of program code.
(3) Lossless compression and lossy compression
Lossless compression exploits the statistical redundancy of data. Because it is bounded by the amount of statistical redundancy present, the achievable ratio is typically about 2:1 to 5:1, so the compression ratio of lossless compression is generally lower. This type of method is widely used for text data, programs, and image data in special applications that require exact storage.
Lossy compression takes advantage of the insensitivity of human vision and hearing to certain frequency components in images and sounds, and allows a certain amount of information to be lost during compression. Although the original data cannot be completely restored, the lost part has little impact on the understanding of the original image or sound, and in exchange a much larger compression ratio is obtained. Lossy compression is widely used for voice, image, and video data.
Principle
Multimedia information contains a great deal of data redundancy. For example, in an image, many pixels in a static building background, a blue sky, or a green lawn are identical; storing them point by point wastes a lot of space. This is called spatial redundancy. As another example, in adjacent frames of a TV or animation sequence, only the moving objects change slightly, so only the differences need to be stored. This is called temporal redundancy. In addition, there are structural redundancy, visual redundancy, and other kinds, all of which make data compression possible.
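Temporal redundancy can be illustrated with a minimal sketch that stores only the pixels that differ between two adjacent frames. The flat-list frame representation and the helper names frame_delta and apply_delta are assumptions made for this example; real video codecs use far more elaborate prediction.

```python
def frame_delta(prev, curr):
    """Return (index, new_value) pairs for the pixels that changed."""
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]

def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(prev)
    for index, value in delta:
        frame[index] = value
    return frame

# Hypothetical 16-pixel frames of equal length: a static background
# in which a single pixel changes.
frame1 = [7] * 16
frame2 = list(frame1)
frame2[5] = 9

delta = frame_delta(frame1, frame2)
print(delta)  # [(5, 9)]
```

Here one pair is stored instead of sixteen pixel values, and the second frame is still recovered exactly from the first frame and the delta.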
In short, the theoretical basis of compression is information theory. From the perspective of information, compression removes the redundancy in the information: it removes what is certain or can be inferred and retains what is uncertain, replacing the original, redundant description with one that is closer to the essence of the information. That essence is the amount of information.
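One common way to quantify the amount of information is Shannon entropy. The sketch below estimates it from character frequencies alone, which is a simplifying assumption (it ignores context between characters); the function name entropy_bits_per_symbol and the sample sentence are illustrative.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Empirical Shannon entropy of the characters in text, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

sample = "the letter e appears far more often than the letter z"
actual = entropy_bits_per_symbol(sample)
uniform = math.log2(len(set(sample)))  # cost if every character were equally likely

print(f"{actual:.2f} bits/symbol versus {uniform:.2f} for a uniform alphabet")
```

The gap between the two figures is the redundancy that a code based on symbol frequencies could remove.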
Application
Run-length encoding is a very simple compression method: it replaces a run of identical consecutive data with a short code, such as the data value and the run length. It is an instance of lossless data compression. Lossless compression is often used on office computers to make better use of disk space, or on computer networks to make better use of bandwidth. For symbolic data such as spreadsheets, text, and executable files, losslessness is a critical requirement, because except in a few limited circumstances even a single changed bit is unacceptable.
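A minimal sketch of run-length encoding follows, assuming the input is a string and each run is stored as a (value, length) pair. The function names rle_encode and rle_decode are illustrative, not taken from any particular tool.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of identical characters into a (char, length) pair."""
    return [(ch, len(list(run))) for ch, run in groupby(data)]

def rle_decode(pairs):
    """Expand (char, length) pairs back into the original string."""
    return "".join(ch * n for ch, n in pairs)

encoded = rle_encode("WWWWBBBWW")
print(encoded)              # [('W', 4), ('B', 3), ('W', 2)]
print(rle_decode(encoded))  # WWWWBBBWW
```

Decoding reproduces the input exactly, which is what makes the method lossless. On data without long runs the pair list can be larger than the input.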
For video and audio data, a certain degree of quality degradation is acceptable as long as the important part of the data is not lost. By exploiting the limitations of the human perceptual system, a great deal of storage space can be saved while the quality of the result is not noticeably different from that of the original data. These lossy data compression methods usually require a trade-off among compression speed, compressed data size, and quality loss.
Lossy image compression is used in digital cameras to greatly increase storage capacity with almost no reduction in image quality. The lossy MPEG-2 video compression used for DVDs serves a similar purpose.
In lossy audio compression, psychoacoustic methods are used to remove inaudible or barely audible components from the signal. The compression of human speech often uses more specialized techniques, so "speech compression" or "speech coding" is sometimes distinguished from "audio compression" as a separate research field. The various audio and speech compression standards are grouped under the category of audio codecs. For example, speech compression is used for Internet telephony, while audio compression is used for CD ripping and is decoded by MP3 players.
Theory
The theoretical basis of compression is information theory (which is closely related to algorithmic information theory) and rate-distortion theory. The foundations of this field were laid mainly by Claude Shannon, who published fundamental papers in this area in the late 1940s and early 1950s. Doyle and Carlson wrote in 2000 that data compression "has one of the simplest and most beautiful design theories in all engineering fields." Cryptography and coding theory are closely related disciplines, and the idea of data compression is deeply connected with statistical inference.
Many lossless data compression systems can be regarded as four-step models. Lossy data compression systems usually contain more steps, such as prediction, frequency transformation, and quantization.
Popular algorithms
The Lempel-Ziv (LZ) compression methods are among the most popular lossless storage algorithms. DEFLATE is a variant of LZ that is optimized for decompression speed and compression ratio, although its compression speed may be slow; PKZIP, gzip, and PNG all use DEFLATE. LZW (Lempel-Ziv-Welch) was covered by a Unisys patent until the patent expired in June 2003; this method is used for GIF images. Also worth mentioning is the LZR (LZ-Renau) method, which is the basis of the Zip method. LZ methods use a table-based compression model in which table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (for example, SHRI, LZX). A well-performing LZ-based coding scheme is LZX, which is used in Microsoft's CAB format.
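The table-based idea can be sketched with a textbook form of LZW. This is a simplified illustration, not the exact variant used by GIF or any other format: it assumes the input contains only characters with code points below 256, emits plain integer codes rather than packed bits, and lets the table grow without limit. The function names are illustrative.

```python
def lzw_compress(data):
    """Replace repeated strings with codes from a table built on the fly."""
    table = {chr(i): i for i in range(256)}  # start with all single characters
    next_code = 256
    current = ""
    out = []
    for ch in data:
        candidate = current + ch
        if candidate in table:
            current = candidate          # keep extending the known string
        else:
            out.append(table[current])   # emit code for the longest known string
            table[candidate] = next_code # learn the new, longer string
            next_code += 1
            current = ch
    if current:
        out.append(table[current])
    return out

def lzw_decompress(codes):
    """Rebuild the same table while decoding, so it never has to be sent."""
    if not codes:
        return ""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:          # code refers to the entry being built
            entry = previous + previous[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(out)

message = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "characters ->", len(codes), "codes")
```

Codes of 256 and above stand for multi-character strings that appeared earlier in the input, which is where the saving comes from.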
Arithmetic coding
The best compression tools use the predictions of a probabilistic model for arithmetic coding. Arithmetic coding was invented by Jorma Rissanen and turned into a practical method by Witten, Neal, and Cleary. It can achieve better compression than the well-known Huffman algorithm and is very well suited to adaptive data compression, in which the predictions depend strongly on context. Arithmetic coding is used in the bi-level image compression standard JBIG and the document compression standard DjVu. The text entry system Dasher is an inverse arithmetic coder.
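The core idea of arithmetic coding, narrowing an interval in proportion to each symbol's probability, can be shown with a toy sketch. It rests on several simplifying assumptions: a fixed (non-adaptive) probability table, exact rational arithmetic via Python's fractions module, and a message length known to the decoder. Practical coders instead work with fixed-precision integers and emit bits incrementally. All names here are illustrative.

```python
from fractions import Fraction

def build_intervals(probs):
    """Give each symbol its own [low, high) slice of [0, 1)."""
    intervals, low = {}, Fraction(0)
    for sym, p in probs.items():
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def arith_encode(message, probs):
    """Narrow [0, 1) once per symbol and return a number inside the result."""
    intervals = build_intervals(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = intervals[sym]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2  # midpoint of the final interval

def arith_decode(code, length, probs):
    """Recover `length` symbols by repeatedly locating and rescaling the code."""
    intervals = build_intervals(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Hypothetical fixed model: "a" is twice as likely as "b" or "c".
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
code = arith_encode("abac", probs)
print(code)  # 39/128
```

Likely symbols shrink the interval less, so they cost fewer bits; an adaptive coder would update the probability table as it goes, based on the symbols already seen.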
Type
Data compression can be divided into two types: lossless compression and lossy compression.
Lossless compression means that when the compressed data is used for reconstruction (also called restoration or decompression), the reconstructed data is exactly the same as the original data. It is used where the reconstructed signal must be identical to the original signal. A very common example is the compression of disk files. Lossless compression algorithms can generally compress ordinary files to 1/2 to 1/4 of their original size. Commonly used lossless compression algorithms include the Huffman algorithm and the LZW (Lempel-Ziv-Welch) compression algorithm.
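As an illustration of the Huffman algorithm mentioned above, the following sketch builds a prefix code from character counts, giving shorter bit strings to more frequent characters. It is a minimal version under stated assumptions: codes are returned as strings of "0" and "1" rather than packed bits, and the function name huffman_codes is illustrative.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a dict mapping each character to its Huffman bit string."""
    # Heap entries are (weight, tie_breaker, {char: code_so_far}); the unique
    # tie_breaker keeps Python from ever comparing the dicts.
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single distinct character still needs one bit
        return {ch: "0" for ch in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # the two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

text = "aaaabbc"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
print(codes, len(bits), "bits instead of", 8 * len(text))
```

Because no code is a prefix of another, the bit string can be decoded unambiguously, so the original text is recovered exactly.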
Lossy compression means that the data reconstructed from the compressed data differs from the original data, but not in a way that causes people to misunderstand the information the original data expresses. Lossy compression is suitable where the reconstructed signal does not have to be identical to the original signal. For example, it can be used for image and sound compression, because images and sounds often contain more data than our visual and auditory systems can perceive; discarding some of that data does not cause the meaning of the sound or image to be misunderstood, yet it can greatly improve the compression ratio.
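Quantization is one simple way to discard data in this manner. The sketch below assumes 8-bit sample values and an arbitrarily chosen step size of 16; the names quantize and dequantize are illustrative, and real codecs choose what to discard using perceptual models rather than a single uniform step.

```python
def quantize(samples, step):
    """Map each sample to the index of its nearest multiple of step."""
    return [round(s / step) for s in samples]

def dequantize(levels, step):
    """Reconstruct approximate samples from the stored level indices."""
    return [q * step for q in levels]

samples = [0, 3, 8, 12, 200, 203, 207, 255]  # hypothetical 8-bit samples
step = 16                                     # assumed step size

levels = quantize(samples, step)
restored = dequantize(levels, step)
print(levels)
print(restored)
```

Fewer distinct values remain to be stored, so fewer bits are needed, but the reconstruction is only approximate: each restored sample is within half a step of the original.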