Introduction
A distributed storage system stores data across multiple independent devices. A traditional network storage system uses a centralized storage server to store all data. That server becomes the bottleneck of system performance and the focal point of reliability and security concerns, so it cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture: it uses multiple storage servers to share the storage load and location servers to locate stored information. This not only improves the reliability, availability, and access efficiency of the system but also makes it easy to expand.
Key Technologies
Metadata Management
In a big data environment, the volume of metadata is also very large, and metadata access performance is key to the performance of the entire distributed file system. Common metadata management architectures can be divided into centralized and distributed ones. The centralized architecture uses a single metadata server and is simple to implement, but it suffers from problems such as a single point of failure. The distributed architecture disperses metadata across multiple nodes, which removes the performance bottleneck of a single metadata server and improves the scalability of metadata management; however, its implementation is more complicated and it introduces the problem of metadata consistency. In addition, there is a distributed architecture without a metadata server, which organizes data through online algorithms and does not require a dedicated metadata server. With this architecture, however, data consistency is very difficult to guarantee, the implementation is more complicated, file directory traversal is inefficient, and the file system lacks global monitoring and management functions.
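To illustrate how an architecture without a metadata server can locate data purely by computation, the sketch below uses consistent hashing, one well-known algorithm of this kind. The text above does not name a specific algorithm, so this choice is an assumption; the class name `ConsistentHashRing`, the `vnodes` parameter, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 used only as a stable hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Locate the node for an object by computation alone, with no metadata server.
    Hypothetical illustration, not a specific system's placement algorithm."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes   # virtual points per node, to even out the load
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def locate(self, key: str) -> str:
        """Return the owner of the first ring position clockwise from the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, ring_hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.locate("/logs/2021/app.log"))  # one of node-a, node-b, node-c
```

Every client holding the same node list computes the same placement, so no lookup service is needed. When a node is added, only the keys on the ring arcs it takes over move to it, which keeps migration small.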
Elastic System Expansion Technology
In a big data environment, the scale and complexity of data grow very rapidly, which demands strong system scalability. To make a storage system highly scalable, two important issues must be solved first: the distribution of metadata and the transparent migration of data. The former is mainly realized through static subtree partitioning, while the latter focuses on optimizing data migration algorithms. In addition, big data storage systems are very large and their node failure rate is high, so they need certain adaptive management functions. The system must be able to estimate the number of nodes required from the amount of data and the computational workload, and dynamically move data between nodes to achieve load balancing. At the same time, when a node fails, its data must be restorable through a mechanism such as replicas, without affecting upper-layer applications.
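A minimal sketch of static subtree partitioning, assuming a fixed, administrator-supplied table that maps directory prefixes to metadata servers. The class name, table contents, and server names are hypothetical. A path is served by the server that owns its nearest configured ancestor directory, matched on whole path components.

```python
class StaticSubtreePartitioner:
    """Assign whole directory subtrees to metadata servers (MDS) via a fixed table.
    Hypothetical illustration of static subtree partitioning."""

    def __init__(self, table: dict, default: str):
        # table maps a directory prefix (e.g. "/home") to an MDS name
        self.table = {self._norm(p): mds for p, mds in table.items()}
        self.default = default  # MDS used when no configured prefix matches

    @staticmethod
    def _norm(path: str) -> str:
        """Collapse duplicate and trailing slashes: '/home//a/' -> '/home/a'."""
        return "/" + "/".join(part for part in path.split("/") if part)

    def mds_for(self, path: str) -> str:
        """Walk up one directory at a time until a configured prefix matches."""
        path = self._norm(path)
        while True:
            if path in self.table:
                return self.table[path]
            if path == "/":
                return self.default
            path = path.rsplit("/", 1)[0] or "/"


parts = StaticSubtreePartitioner({"/home": "mds-1", "/data/logs": "mds-2"}, default="mds-0")
print(parts.mds_for("/home/alice/notes.txt"))  # mds-1
```

Because the table is static, rebalancing means reassigning a prefix to another server and migrating that subtree's metadata, which is where the data migration algorithms mentioned above come in.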
Storage Hierarchy Optimization Technology
When building a storage system, both cost and performance must be considered. Storage systems therefore usually combine multiple layers of storage devices with different cost-performance ratios into a storage hierarchy. Because big data is large in scale, an efficient, well-designed storage hierarchy that exploits the principle of data access locality can reduce energy consumption and construction costs while maintaining system performance. The storage hierarchy can be optimized from two angles. To improve performance, one can analyze application characteristics, identify hot data, and cache or prefetch it, improving access performance through efficient cache and prefetch algorithms and appropriate cache capacity ratios. To reduce cost, information lifecycle management can migrate cold data with low access frequency to slower, cheaper storage devices, which greatly reduces construction costs and energy consumption at some expense of overall system performance.
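The two optimization angles can be illustrated with a simple frequency-based policy that counts accesses per object over one observation window, then marks frequently accessed objects as cache or prefetch candidates and rarely accessed ones for migration to the cheap tier. This is a sketch under assumed thresholds and an assumed fixed window; the class and parameter names are hypothetical, and real systems typically use richer signals.

```python
from collections import Counter


class TieringPolicy:
    """Classify objects as hot or cold from access counts in one window.
    Hypothetical illustration; thresholds are assumed, not prescribed."""

    def __init__(self, hot_threshold: int, cold_threshold: int):
        if cold_threshold > hot_threshold:
            raise ValueError("cold_threshold must not exceed hot_threshold")
        self.hot_threshold = hot_threshold    # accesses per window to count as hot
        self.cold_threshold = cold_threshold  # fewer accesses than this means cold
        self.counts = Counter()

    def record_access(self, key: str) -> None:
        self.counts[key] += 1

    def plan(self, all_keys):
        """Return (keys to cache on the fast tier, keys to move to the cheap tier),
        then clear the counters to start a new observation window."""
        hot = sorted(k for k in all_keys if self.counts[k] >= self.hot_threshold)
        cold = sorted(k for k in all_keys if self.counts[k] < self.cold_threshold)
        self.counts.clear()
        return hot, cold


policy = TieringPolicy(hot_threshold=3, cold_threshold=1)
for key in ["a", "a", "a", "b"]:
    policy.record_access(key)
print(policy.plan(["a", "b", "c"]))  # (['a'], ['c'])
```

Objects between the two thresholds stay where they are, which avoids moving data back and forth when its access frequency hovers around a single cut-off.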
Storage Optimization Technology for Applications and Workloads
A traditional data storage model needs to support as many applications as possible, so it must be highly general-purpose. Big data, however, is characterized by large scale, high dynamics, and fast processing, and a general-purpose storage model is usually not the one that maximizes application performance. Big data storage systems therefore care much more about the performance of upper-layer applications than about generality. Optimizing storage for applications and workloads means coupling data storage with applications: simplifying or extending the functions of the distributed file system, and customizing and deeply optimizing it for specific applications, workloads, and computing models so that applications achieve the best possible performance. This type of optimization is used in the internal storage systems of Internet companies such as Google and Facebook, which manage data beyond the petabyte scale and achieve very high performance.
Factors to Consider
Consistency
A distributed storage system uses multiple servers to store data jointly. As the number of servers increases, so does the probability that some server fails. To keep the system available when a server fails, the usual practice is to store multiple copies of each piece of data on different servers. However, because of failures and concurrent writes, the copies of the same data may become inconsistent with one another. The property that all copies of the same data are identical is referred to here as consistency.
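The text does not prescribe how copies are kept consistent. One widely used technique, swapped in here for illustration, is quorum replication: with N replicas, a write must reach W of them and a read consults R of them, and choosing R + W > N guarantees that every read set overlaps the most recent write set, so the highest version seen is the latest. The sketch below is in-memory, assumes a single writer that issues version numbers, and omits replica repair; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    up: bool = True
    store: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """N replicas; writes go to W live replicas, reads consult R live replicas.
    Hypothetical sketch: single writer, no repair of stale replicas."""

    def __init__(self, n: int, r: int, w: int):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w
        self.version = 0  # simplification: one writer issues increasing versions

    def put(self, key, value) -> bool:
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.w:
            return False  # not enough replicas to form a write quorum
        self.version += 1
        for rep in live[: self.w]:  # write exactly W replicas; others may lag
            rep.store[key] = (self.version, value)
        return True

    def get(self, key):
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < self.r:
            raise RuntimeError("read quorum unavailable")
        answers = [rep.store.get(key, (0, None)) for rep in live[: self.r]]
        return max(answers, key=lambda pair: pair[0])[1]  # newest version wins


store = QuorumStore(n=3, r=2, w=2)
store.put("x", "v1")
store.replicas[0].up = False  # replica 0 misses the next write
store.put("x", "v2")
store.replicas[0].up = True   # stale replica 0 is back
print(store.get("x"))         # v2
```

In the usage above, the read contacts the stale replica 0 and replica 1; because the sets overlap, replica 1 carries version 2 and the stale copy is ignored.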
Availability
A distributed storage system needs multiple servers working at the same time. As the number of servers increases, it becomes inevitable that some of them will fail. Ideally, such failures should not have too great an impact on the system as a whole, which should keep serving requests.
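A short calculation shows why failures become inevitable at scale and why keeping several copies still makes data available. Assume, purely for illustration, that each server is down independently with probability p, the cluster has n servers, and each piece of data has k copies on distinct servers:

```latex
\begin{aligned}
P(\text{at least one server is down}) &= 1 - (1 - p)^{n}, \\
P(\text{a given piece of data is unavailable}) &= p^{k}.
\end{aligned}
```

For example, with p = 0.01 and n = 1000, the first probability is 1 - 0.99^1000, approximately 0.99996, so some server is almost certainly down at any moment; with k = 3 copies, a given piece of data is unavailable with probability only 10^-6. Real failures are often correlated (shared racks, power, or switches), so these figures are optimistic.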
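Partition Tolerance
The servers of a distributed storage system are connected through a network, but we cannot guarantee that the network always works. Partition tolerance means the system keeps operating even when network failures split the servers into groups that cannot communicate with one another.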
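One common way to behave safely during a partition, shown here as an illustrative assumption rather than a method the text prescribes, is a majority rule: only the side of a split that can still reach a strict majority of the replicas keeps accepting writes, so two sides can never accept conflicting updates. The function names are hypothetical.

```python
def can_accept_writes(reachable_replicas: int, total_replicas: int) -> bool:
    """A partition side may accept writes only if it holds a strict majority,
    so at most one side of any network split can keep writing."""
    return 2 * reachable_replicas > total_replicas


def sides_that_accept_writes(partition_sizes):
    """Given how many replicas each side of a split can reach, return the
    indices of the sides allowed to accept writes (at most one)."""
    total = sum(partition_sizes)
    return [i for i, size in enumerate(partition_sizes) if can_accept_writes(size, total)]


print(sides_that_accept_writes([3, 2]))  # [0]
print(sides_that_accept_writes([2, 2]))  # []
```

The minority side gives up write availability in exchange for consistency. This is the trade-off between consistency and availability that a network partition forces on a distributed storage system; an even split, as in the second call, leaves no side able to write.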