Introduction
A distributed storage system stores data on multiple independent devices. A traditional network storage system uses a centralized storage server to store all data. That server becomes the bottleneck of system performance and the focal point of reliability and security concerns, so it cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture, uses multiple storage servers to share the storage load, and uses location servers to locate stored information. This not only improves the reliability, availability, and access efficiency of the system but also makes the system easy to expand.
Key technology
Metadata management
In a big data environment, the volume of metadata is also very large, and metadata access performance is key to the performance of the entire distributed file system. Common metadata management can be divided into centralized and distributed architectures. The centralized metadata management architecture uses a single metadata server, which is simple to implement but suffers from problems such as a single point of failure. The distributed metadata management architecture disperses metadata across multiple nodes. This resolves the performance bottleneck of the metadata server and improves the scalability of the metadata management architecture, but the implementation is more complicated and the problem of metadata consistency is introduced. In addition, there is a distributed architecture without a metadata server, which organizes data through online algorithms and does not require a dedicated metadata server. However, with this architecture it is very difficult to guarantee data consistency, the implementation is more complicated, file directory traversal is inefficient, and the system lacks global monitoring and management functions for the file system.
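As a rough illustration of the architecture without a metadata server, the sketch below computes a file's location from its path with a hash function instead of looking it up. The node names and the `locate` function are hypothetical, and simple modulo hashing is only an assumption standing in for whatever placement algorithm a real system uses.

```python
import hashlib

# Hypothetical storage nodes; a real deployment would discover these dynamically.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def locate(path, nodes=NODES):
    """Map a file path to a storage node by hashing the path.

    Any client can compute the location on its own, so no dedicated
    metadata server has to be consulted.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# The same path always maps to the same node.
print(locate("/logs/2024/app.log"))
```

The sketch also hints at the drawback noted above: files of one directory are scattered across nodes, so listing a directory means asking every node.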
System elastic expansion technology
In a big data environment, data scale and complexity increase very rapidly, which demands high scalability from the system. To achieve high scalability in a storage system, two important issues must be solved first: the distribution of metadata and the transparent migration of data. The former is mainly realized through static subtree partitioning; the latter focuses on the optimization of data migration algorithms. In addition, big data storage systems are huge and the node failure rate is high, so certain adaptive management functions are needed. The system must be able to estimate the number of nodes required based on the amount of data and the computational workload, and dynamically move data between nodes to achieve load balancing. At the same time, when a node fails, the data must be recoverable through a mechanism such as replication, without affecting upper-layer applications.
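Static subtree partitioning can be pictured as a fixed table that assigns directory subtrees to metadata servers. The following is a minimal sketch under that assumption; the table contents, the server names, and the `owner_of` function are all hypothetical.

```python
# Hypothetical static assignment of directory subtrees to metadata servers.
SUBTREE_TABLE = {
    "/": "mds-0",
    "/home": "mds-1",
    "/home/alice": "mds-2",
    "/var": "mds-3",
}


def owner_of(path, table=SUBTREE_TABLE):
    """Return the metadata server that owns the deepest subtree containing path."""
    parts = [p for p in path.split("/") if p]
    # Walk from the full path up toward the root and stop at the first
    # prefix that has an assigned server.
    for depth in range(len(parts), -1, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in table:
            return table[prefix]
    raise KeyError("no subtree assigned for " + path)


print(owner_of("/home/alice/notes.txt"))
print(owner_of("/home/bob/todo.txt"))
print(owner_of("/etc/hosts"))
```

Because the table is fixed, a subtree that becomes very busy stays on the same server, which is one reason load balancing and data migration are treated as separate problems.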
Optimization technology in the storage hierarchy
When building a storage system, both cost and performance must be considered. Storage systems therefore usually use multiple layers of storage devices with different cost-performance ratios to form a storage hierarchy. Because the scale of big data is large, building an efficient and reasonable storage hierarchy that exploits the principle of data access locality can reduce system energy consumption and construction costs while ensuring system performance. The storage hierarchy can be optimized from two aspects. From the perspective of improving performance, one can analyze application characteristics, identify hot data and cache or prefetch it, and improve access performance through efficient cache prefetch algorithms and reasonable cache capacity ratios. From the perspective of cost reduction, using information lifecycle management methods to migrate cold data with low access frequency to low-speed, inexpensive storage devices can greatly reduce system construction costs and energy consumption, at some expense of overall system performance.
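To make the hot/cold distinction concrete, here is a small sketch that classifies objects by how often they appear in an access log. The `plan_tiers` function, the threshold value, and the sample log are illustrative assumptions; real systems typically also weigh recency, object size, and device capacity.

```python
from collections import Counter


def plan_tiers(access_log, hot_threshold=3):
    """Split object names into hot and cold sets by access count.

    Objects accessed at least hot_threshold times are candidates for the
    fast tier (cache or prefetch); the rest are candidates for migration
    to slower, cheaper devices.
    """
    counts = Counter(access_log)
    hot = sorted(name for name, n in counts.items() if n >= hot_threshold)
    cold = sorted(name for name, n in counts.items() if n < hot_threshold)
    return hot, cold


# Hypothetical access log: each entry is the name of the object that was read.
log = ["a", "b", "a", "c", "a", "b"]
hot, cold = plan_tiers(log)
print("hot:", hot)
print("cold:", cold)
```

Objects that never appear in the log are not classified at all in this sketch; a fuller version would start from the complete object inventory.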
Storage optimization technology for applications and loads
The traditional data storage model needs to support as many applications as possible, so it needs good versatility. Big data is characterized by large scale, high dynamics, and fast processing, and a general-purpose data storage model is usually not the one that improves application performance the most. A big data storage system pays much more attention to the performance of upper-layer applications than to the pursuit of versatility. Optimizing storage for applications and loads means coupling data storage with applications: simplifying or extending the functions of the distributed file system, and customizing and deeply optimizing the file system for specific applications, specific loads, and specific computing models, so that applications can achieve the best performance. This type of optimization technology manages data exceeding petabytes on the internal storage systems of Internet companies such as Google and Facebook, and can achieve very high performance.
Factors to consider
Consistency
A distributed storage system needs to use multiple servers to store data together. As the number of servers increases, the probability that a server fails also increases. To ensure that the system remains available in the event of a server failure, the general practice is to replicate a piece of data into multiple copies and store them on different servers. However, due to failures and parallel storage, there may be inconsistencies between multiple copies of the same data. The property of ensuring that the data in multiple copies is completely consistent is referred to here as consistency.
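The following sketch shows what an inconsistency between copies can look like and one simple way to resolve it. It assumes each copy carries a version number and that the highest version wins; the function names, server names, and that resolution rule are assumptions made for illustration, not a description of any particular system.

```python
def is_consistent(replicas):
    """Return True if every replica holds the same (version, value) pair."""
    return len(set(replicas.values())) <= 1


def repair(replicas):
    """Return a new replica map in which every server holds the newest copy.

    Assumes a non-empty map and that a higher version number means newer data.
    """
    newest = max(replicas.values(), key=lambda pair: pair[0])
    return {server: newest for server in replicas}


# Hypothetical state after a failed write: server s2 missed version 2.
replicas = {"s1": (2, "new"), "s2": (1, "old"), "s3": (2, "new")}
print(is_consistent(replicas))
print(is_consistent(repair(replicas)))
```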
Availability
A distributed storage system requires multiple servers to work at the same time. When the number of servers increases, it is inevitable that some of them will fail. Ideally, such failures should not have much impact on the entire system. The property that, after some of the nodes in the system fail, the system as a whole can still serve read/write requests from clients is called availability.
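A minimal sketch of this idea for reads: the client tries each replica in turn and skips servers that are down. Modeling a failed server as `None` and each live server as a dictionary is purely an assumption for illustration, as is the `read_with_failover` name.

```python
def read_with_failover(key, servers):
    """Return the value for key from the first live server that has it.

    servers is a list in which each entry is either a dict (a live server's
    data) or None (a failed server).
    """
    for store in servers:
        if store is None:
            continue  # server is down, try the next replica
        if key in store:
            return store[key]
    raise KeyError(key)


# Hypothetical cluster of three replicas where the first one has failed.
servers = [None, {"k": "v"}, {"k": "v"}]
print(read_with_failover("k", servers))
```

The read succeeds as long as at least one replica holding the key is alive; only when every replica is down does the request fail.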
Partition fault tolerance
Multiple servers in a distributed storage system are connected through the network. However, we cannot guarantee that the network is always unobstructed, so distributed systems need to be fault-tolerant in order to deal with problems caused by network failures. A satisfactory outcome is that when the network is split into multiple parts due to a failure, the distributed storage system can still work.
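One common way to keep working during a partition, offered here only as an illustrative assumption and not as something the text above prescribes, is to let the side that still holds a majority of the servers continue accepting writes. The function name below is hypothetical.

```python
def can_accept_writes(partition_size, cluster_size):
    """Return True if this side of a partition holds a strict majority of servers."""
    return partition_size > cluster_size // 2


# A hypothetical 5-server cluster split into groups of 3 and 2.
print(can_accept_writes(3, 5))
print(can_accept_writes(2, 5))
```

Under this rule at most one side can ever hold a majority, so two isolated groups cannot both accept conflicting writes; the cost is that the minority side stops serving writes until the network heals.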