GraphLab
Transcript of GraphLab
A presentation by Tushar Sudhakar Jee
A Distributed Framework for Machine Learning and Data Mining in the Cloud
Bulk Synchronous Parallel (BSP)
• A bridging model for designing parallel algorithms (e.g., message relaying).
• Implemented by Google Pregel, 2010.
• The model consists of:
1. Concurrent computation: every participating processor may perform local computations.
2. Communication: the processes exchange data between themselves to facilitate remote data storage capabilities.
3. Barrier synchronisation: when a process reaches this point (the barrier), it waits until all other processes have reached the same barrier.
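The three BSP steps above can be sketched in a few lines. This is a minimal illustration, not the Pregel or GraphLab API; the worker count, the "relay to the right neighbor" message pattern, and the per-worker inboxes are all made-up assumptions.

```python
# A minimal sketch of one BSP superstep: each worker computes locally,
# relays a message to its right neighbor, and waits at a barrier before
# reading its own inbox.
from threading import Barrier, Thread

N = 4
barrier = Barrier(N)
inbox = [[] for _ in range(N)]   # one message queue per worker
result = [0] * N

def worker(wid: int) -> None:
    value = wid + 1                      # 1. concurrent computation (local state)
    inbox[(wid + 1) % N].append(value)   # 2. communication (message relaying)
    barrier.wait()                       # 3. barrier synchronisation
    result[wid] = sum(inbox[wid])        # safe: all sends finished at the barrier

threads = [Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)   # each slot holds the value relayed by the left neighbor
```

Because every read happens after the barrier there are no race conditions, but the superstep lasts as long as the slowest worker, which is exactly the performance penalty noted below.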
Bulk Synchronous Parallel (BSP)
• Advantages:
1. No worries about race conditions.
2. The barrier guarantees data consistency.
3. Simpler to make fault tolerant: save data at the barrier.
• Disadvantages:
1. Costly performance penalties, since the runtime of each phase is decided by the slowest machine.
2. Fails to support the asynchronous, graph-parallel, and dynamic computation properties critical to the Machine Learning and Data Mining community.
Asynchronous processing
• Implemented by GraphLab, 2010 and 2012.
• Advantages:
1. Directly targets the asynchronous, graph-parallel, and dynamic computation properties critical to the Machine Learning and Data Mining community.
2. Updates parameters using the most recent values as input, which most closely matches sequential execution.
• Disadvantages:
1. Race conditions can happen all the time.
Why GraphLab?
• Implementing Machine Learning and Data Mining algorithms in parallel on current systems like Hadoop, MPI, and MapReduce is prohibitively complex and costly.
• GraphLab targets asynchronous, dynamic, graph-parallel computation in the shared-memory setting, as needed by the MLDM community.
MLDM Algorithm Properties
• Graph Structured Computation
• Asynchronous Iterative Computation
• Dynamic Computation
• Serializability
Graph Structured Computation
• Recent advances in MLDM focus on modeling the dependencies between data, as this allows extracting more signal from noisy data. For example, modeling the dependencies between similar shoppers allows us to make better product recommendations than treating them in isolation.
• Consequently, there has been a recent shift toward graph-parallel abstractions like Pregel and GraphLab that naturally express computational dependencies.
Asynchronous Iterative Computation
• Synchronous systems update all parameters simultaneously (in parallel) using parameter values from the previous time step as input.
• Asynchronous systems update parameters using the most recent parameter values as input.
• Many MLDM algorithms benefit from asynchronous systems.
Dynamic Computation
• Dynamic computation saves time because the algorithm only recomputes vertices with recently updated neighbors.
• Static computation requires the algorithm to update all vertices equally often, which wastes time recomputing vertices that have already converged.
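The contrast above can be sketched with a worklist: a vertex is rescheduled only when a neighbor's value actually changed, so converged vertices are never touched again. The path graph and the minimum-label update rule are made-up illustrations, not GraphLab code.

```python
# Dynamic computation sketch: a worklist reschedules a vertex only when a
# neighbor changed, so converged vertices are skipped entirely.
from collections import deque

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # undirected path 0-1-2-3
label = {v: v for v in adj}                    # propagate the minimum label

work = deque(adj)                              # initially schedule every vertex
updates = 0
while work:
    v = work.popleft()
    updates += 1
    new = min([label[v]] + [label[u] for u in adj[v]])
    if new != label[v]:                        # value changed:
        label[v] = new
        work.extend(adj[v])                    # reschedule only the neighbors
print(label, updates)
# A static schedule would sweep all |V| vertices every round until nothing
# changes, recomputing already-converged vertices each time.
```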
Serializability
• Serializability ensures that all parallel executions have an equivalent sequential execution, thereby eliminating race conditions.
• Many MLDM algorithms converge faster if serializability is ensured; Gibbs sampling, for example, requires serializability for correctness.
Distributed GraphLab Abstraction
• Data Graph
• Update Function
• Sync Operation
• GraphLab Execution Model
• Ensuring Serializability
Data Graph
• The GraphLab abstraction stores the program state as a directed graph called the data graph, G = (V, E, D), where D is the user-defined data.
• Data here represents model parameters, algorithm state, and statistical data.
Data graph
Data Graph (PageRank example):
Update Function
• An update function is a stateless procedure that modifies the data within the scope of a vertex and schedules the future execution of update functions on other vertices.
• The function takes a vertex v and its scope S_v and returns the new versions of the data in the scope as well as a set of vertices T:
Update: f(v, S_v) -> (S_v, T)
Update Function (PageRank example):
• The update function for PageRank computes a weighted sum of the current ranks of neighboring vertices and assigns it as the rank of the current vertex.
• The neighbors are scheduled for update only if the value of the current vertex changes by more than a threshold.
Update Function (PageRank example):
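A GraphLab-style PageRank update f(v, S_v) -> (S_v, T) can be sketched as below: it recomputes v's rank from its in-neighbors and returns the set T of vertices to reschedule only when the rank moved by more than a tolerance. The toy 3-cycle graph, the tolerance, and the sequential driver loop are illustrative assumptions, not the GraphLab API.

```python
# Sketch of a threshold-scheduled PageRank update function and driver.
from collections import deque

DAMPING, TOL = 0.85, 1e-4

def pagerank_update(v, rank, in_nbrs, out_nbrs, out_degree):
    new_rank = (1 - DAMPING) + DAMPING * sum(
        rank[u] / out_degree[u] for u in in_nbrs[v])
    changed = abs(new_rank - rank[v]) > TOL
    rank[v] = new_rank                        # modify data in the scope S_v
    return out_nbrs[v] if changed else []     # T: vertices scheduled next

# Toy 3-cycle 0 -> 1 -> 2 -> 0; the fixed point of this (non-normalized)
# recurrence is rank 1.0 at every vertex.
in_nbrs = {0: [2], 1: [0], 2: [1]}
out_nbrs = {0: [1], 1: [2], 2: [0]}
out_degree = {v: 1 for v in out_nbrs}
rank = {v: 0.0 for v in out_nbrs}

work = deque(rank)                            # dynamic schedule (worklist)
while work:
    v = work.popleft()
    work.extend(pagerank_update(v, rank, in_nbrs, out_nbrs, out_degree))
print(rank)
```

Once a vertex's change falls below the tolerance it stops rescheduling its neighbors, so the computation quiesces on its own.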
Sync Operation
• The sync operation is an associative, commutative sum defined over all scopes in the graph.
• It supports normalization, common in MLDM algorithms.
• It runs continuously in the background to maintain updated estimates of the global value.
• Ensuring serializability of the sync operation is costly and requires synchronization.
The GraphLab Execution Model
• The model allows the GraphLab runtime engine to determine the best order in which to run vertices.
• Since many MLDM algorithms benefit from prioritization, the GraphLab abstraction allows users to assign priorities to the vertices in T.
Ensuring Serializability
• Serializability implies that for every parallel execution, there exists a sequential execution of update functions that would give the same result.
• To ensure serializability, the scopes of concurrently executing update functions must not overlap.
• The greater the consistency, the lower the parallelism.
Ensuring Serializability (Full Consistency):
• This model ensures that the scopes of concurrently executing update functions do not overlap.
• The update function has complete read/write access to its entire scope.
• Concurrently executing update functions must be at least two vertices apart, limiting parallelism.
Ensuring Serializability (Edge Consistency):
• This model ensures each update function has exclusive read/write access to its vertex and adjacent edges, but read-only access to adjacent vertices.
• It increases parallelism by allowing update functions with slightly overlapping scopes to run in parallel.
Ensuring Serializability (Vertex Consistency):
• This model provides write access only to the central vertex data.
• It allows all update functions to run in parallel, providing maximum parallelism.
• It is the least consistent model.
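The three models can be summarized as the locks an update function at vertex v would need; the helper below is a hypothetical illustration (not the GraphLab API) showing that stronger models claim a larger exclusive region, leaving less parallelism.

```python
# Sketch: each consistency model expressed as (write-locked, read-locked) sets.
def lock_set(v, nbrs, model):
    if model == "vertex":                    # write only the central vertex
        return {v}, set()
    if model == "edge":                      # write v, read adjacent vertices
        return {v}, set(nbrs[v])
    if model == "full":                      # write the entire scope
        return {v} | set(nbrs[v]), set()
    raise ValueError(model)

nbrs = {0: [1, 2], 1: [0], 2: [0]}           # toy star graph centered at 0
write, read = lock_set(0, nbrs, "edge")
print("write:", write, "read:", read)
```

Two updates may run concurrently exactly when neither's write set touches the other's read or write set: under vertex consistency that is almost always true, while under full consistency the two central vertices must be at least two hops apart.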
Using GraphLab (K-means):
• Using GraphLab Create, K-means is run on a dataset from the June 2014 Kaggle competition to classify schizophrenic subjects based on MRI scans.
• The original data consists of two sets of features: functional network connectivity (FNC) features and source-based morphometry (SBM) features, incorporated into a single SFrame with SFrame.join.
• The data is downloaded from a public AWS S3 bucket.
Distributed GraphLab Design
• Distributed Data Graph
• Distributed GraphLab Engines
• Fault Tolerance
• System Design
Distributed Data Graph
• The graph is partitioned into k parts, where k is much greater than the number of machines.
• Each part, called an atom, is stored as a separate file on a distributed storage system (e.g., Amazon S3).
Distributed GraphLab Engines:
1. An engine emulates the GraphLab execution model and is responsible for:
• Executing update functions.
• Executing sync operations.
• Maintaining the set of scheduled vertices T.
• Ensuring serializability with respect to the appropriate consistency model.
2. Types:
• Chromatic Engine
• Distributed Locking Engine
Chromatic Engine:
• It uses vertex coloring to satisfy the edge consistency model, executing all vertices of the same color in the vertex set T synchronously before proceeding to the next color.
• The full consistency model is satisfied by ensuring that no vertex shares the same color as any of its distance-two neighbors.
• The vertex consistency model is satisfied by assigning all vertices the same color.
• It executes the set of scheduled vertices T partially asynchronously.
Edge consistency model using the Chromatic Engine.
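The chromatic engine idea can be sketched as follows: color the graph so that no edge joins two same-colored vertices, then execute each color class in parallel between barriers, which yields edge consistency. The toy graph is made up, and greedy coloring is just one simple heuristic, not GraphLab's own implementation.

```python
# Sketch: greedy vertex coloring, then grouping vertices into color phases.
def greedy_color(adj):
    color = {}
    for v in sorted(adj):                       # visit vertices in fixed order
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:                       # pick the smallest free color
            c += 1
        color[v] = c
    return color

adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}   # triangle plus a pendant
color = greedy_color(adj)
# No edge joins two same-colored vertices, so each color class can run
# concurrently under the edge consistency model.
assert all(color[u] != color[v] for v in adj for u in adj[v])

phases = {}                                     # color class -> vertices
for v, c in color.items():
    phases.setdefault(c, []).append(v)
print(phases)
```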
Distributed Locking Engine
1. Why use it?
• The chromatic engine does not provide sufficient scheduling flexibility.
• The chromatic engine presupposes the availability of a graph coloring, which might not always be readily available.
2. The Distributed Locking Engine enforces mutual exclusion by associating a readers-writer lock with each vertex.
3. Vertex consistency is achieved by acquiring a write-lock on the central vertex of each requested scope.
4. Full consistency is achieved by acquiring write-locks on the central vertex and all adjacent vertices.
5. Edge consistency is achieved by acquiring a write-lock on the central vertex and read-locks on adjacent vertices.
Distributed Locking Engine (Pipelined Locking)
• Each machine maintains a pipeline of vertices for which locks have been requested but not yet fulfilled.
• The pipelining system uses callbacks instead of readers/writer locks, since the latter would halt the pipeline.
• Pipelining reduces latency by synchronizing locked data immediately after a machine completes its local lock.
Chandy-Lamport Snapshot Algorithm
Fault Tolerance
• Fault tolerance is introduced in GraphLab using a distributed checkpoint mechanism called Snapshot Update.
• Snapshot Update can be deployed synchronously or asynchronously.
• Asynchronous snapshots are more efficient and guarantee a consistent snapshot under the following conditions:
a) Edge consistency is used on all update functions.
b) Schedule completes before the scope is unlocked.
c) Snapshot Update is prioritized over other updates.
• Synchronous snapshots have a "flatline" during which computation halts, whereas asynchronous snapshots allow computation to proceed.
System Design
• In the initialization phase, the atom file representation of the data graph is constructed.
• In the GraphLab execution phase, atom files are assigned to the individual execution engines from the DFS.
System Design (Locking Engine Design)
• The partition of the distributed graph is managed within local graph storage.
• A cache is used to provide access to remote data.
• The scheduler manages the vertices in T assigned to the process.
• Each block makes use of the block below it.
Applications
• Netflix Movie Recommendation
• Video Co-segmentation (CoSeg)
• Named Entity Recognition (NER)
Netflix Movie Recommendation
• It makes use of collaborative filtering to predict the movie ratings for each user based on the ratings of similar users.
• The alternating least squares (ALS) algorithm is used to iteratively compute a low-rank matrix factorization.
• The sparse matrix R defines a bipartite graph connecting each user with the movies that they rated.
• Vertices are users (rows of U) and movies (columns of V), and edges contain the ratings for a user-movie pair.
• The GraphLab update function predicts the ratings (edge values).
• ALS alternates between fixing one of the unknowns u_i or v_j; when one is fixed, the other can be computed by solving a least-squares problem.
• R = {r_ij}, an n_u × n_v user-movie matrix, where each entry r_ij represents the rating score of item j by user i, with r_ij = <u_i, v_j> for all i, j.
• U represents the user feature matrix and V represents the movie feature matrix.
• The dimension of the feature space, d, is a system parameter determined by a hold-out dataset or cross-validation.
• The low-rank approximation problem is thus formulated as follows to learn the factor vectors (u_i, v_j):

min_{U,V} Σ_{(i,j)∈K} (r_ij - p_ij)² + λ (Σ_i ||u_i||² + Σ_j ||v_j||²)

where p_ij = <u_i, v_j> is the predicted rating, λ is the regularization coefficient, and K is the set of known ratings from the sparse matrix R.
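The alternating scheme above can be sketched with numpy on the observed entries K of a tiny ratings matrix; the data, d = 2, and λ = 0.1 are made-up illustrations. Fixing one factor matrix reduces each row of the other to a small regularized least-squares solve, so every full sweep can only lower the objective.

```python
# Sketch of alternating least squares on a toy set of known ratings K.
import numpy as np

rng = np.random.default_rng(0)
K = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
     (1, 2): 2.0, (2, 1): 1.0, (2, 2): 4.0}    # known ratings r_ij
nu, nv, d, lam = 3, 3, 2, 0.1
U = rng.normal(size=(nu, d))                    # user feature matrix
V = rng.normal(size=(nv, d))                    # movie feature matrix

def objective():
    fit = sum((U[i] @ V[j] - r) ** 2 for (i, j), r in K.items())
    return fit + lam * ((U ** 2).sum() + (V ** 2).sum())

def solve_users():                              # fix V, minimize over each u_i
    for i in range(nu):
        A, b = lam * np.eye(d), np.zeros(d)
        for (a, j), r in K.items():
            if a == i:
                A += np.outer(V[j], V[j])
                b += r * V[j]
        U[i] = np.linalg.solve(A, b)            # normal equations

def solve_movies():                             # fix U, minimize over each v_j
    for j in range(nv):
        A, b = lam * np.eye(d), np.zeros(d)
        for (i, c), r in K.items():
            if c == j:
                A += np.outer(U[i], U[i])
                b += r * U[i]
        V[j] = np.linalg.solve(A, b)

start = objective()
for _ in range(20):
    solve_users()
    solve_movies()
final = objective()
print(round(final, 3))
```

Each half-step solves its subproblem exactly, which is why the objective is non-increasing across sweeps.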
Netflix Scaling with Intensity
• Plotted is the speedup achieved for varying values of the dimensionality d.
• Extrapolating to obtain the theoretically optimal runtime, the estimated overhead of Distributed GraphLab at 64 machines is 12x for d = 5 and 4.9x for d = 100.
Netflix Comparisons
• The GraphLab implementation was compared against Hadoop and MPI using between 4 and 64 machines.
• GraphLab performs between 40-60 times faster than Hadoop.
• It also slightly outperformed the optimized MPI implementation.
Video Co-segmentation (CoSeg)
• Video co-segmentation automatically identifies and clusters spatio-temporal segments of video that share similar texture and color characteristics.
• Frames of high-resolution video are pre-processed by coarsening each frame to a regular grid of rectangular super-pixels.
• The CoSeg algorithm predicts the best label (e.g., sky, building, grass, etc.) for each super-pixel using a Gaussian Mixture Model (GMM) in conjunction with Loopy Belief Propagation (LBP).
• The two algorithms are combined to form an Expectation-Maximization problem, alternating between LBP to compute the label for each super-pixel given the GMM, and updating the GMM given the labels from LBP.
• The GraphLab update function executes the LBP local iterative update, in which updates that are expected to change values significantly are prioritized.
• The locking engine provides nearly optimal weak scaling: the runtime does not increase significantly as the size of the graph increases proportionately with the number of machines.
• It was also observed that increasing the pipeline length increased performance significantly and compensated for poor partitioning.
Named Entity Recognition (NER)
• Named Entity Recognition is the task of determining the type (e.g., Person, Place, or Thing) of a noun phrase (e.g., Obama, Chicago, or Car) from its context (e.g., "President ..", "Lives near ..", or "bought a ..").
• The data graph for NER is bipartite, with one set of vertices corresponding to noun phrases and the other to contexts.
• The CoEM vertex program updates the estimated distribution for a vertex (either noun phrase or context) based on the current distributions of its neighboring vertices.
• In the CoEM algorithm, adjacent vertices are rescheduled if the type distribution at a vertex changes by more than some predefined threshold.
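The CoEM-style vertex program can be sketched on a tiny bipartite graph: a vertex averages its neighbors' type distributions and reschedules the neighbors only when its own distribution moves by more than a threshold. The graph, the single seed, and EPS are made-up illustrations, not the paper's data.

```python
# Sketch of threshold-rescheduled CoEM on a toy noun-phrase/context graph.
from collections import deque

nbrs = {"Obama": ["President X"],            # noun phrases on one side,
        "Biden": ["President X"],            # contexts on the other
        "President X": ["Obama", "Biden"]}
dist = {v: {"Person": 0.5, "Place": 0.5} for v in nbrs}
dist["Obama"] = {"Person": 1.0, "Place": 0.0}   # seed noun phrase, held fixed
EPS = 1e-3

work = deque(v for v in nbrs if v != "Obama")
while work:
    v = work.popleft()
    if v == "Obama":
        continue                             # seeds are never re-estimated
    new = {t: sum(dist[u][t] for u in nbrs[v]) / len(nbrs[v])
           for t in ("Person", "Place")}
    if max(abs(new[t] - dist[v][t]) for t in new) > EPS:
        dist[v] = new
        work.extend(nbrs[v])                 # reschedule adjacent vertices
print(dist["Biden"])   # "Biden" inherits Person-ness via the shared context
```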
NER Comparisons
• The GraphLab implementation of NER achieved a 20-30x speedup over Hadoop and was comparable to the optimized MPI implementation.
• GraphLab scaled poorly, achieving only a 3x improvement using 16x more machines, largely due to the large vertex data size, dense connectivity, and poor partitioning.
Comparison (Netflix / CoSeg / NER)
• Overall network utilization: Netflix and CoSeg have very low bandwidth requirements, while NER appears to saturate when the number of machines exceeds 24.
• Snapshot overhead: the overhead of performing a complete snapshot of the graph every |V| updates is highest for CoSeg when running on a 64-machine cluster.
EC2 Cost Evaluation
• The price-runtime curves (log-log scale) for GraphLab and Hadoop illustrate the cost of deploying either system.
• For the Netflix application, GraphLab is about two orders of magnitude more cost-effective than Hadoop.
Conclusion
• In the paper we talked about:
• The requirements of MLDM algorithms.
• How GraphLab was extended to the distributed setting by:
• Relaxing the schedule requirements.
• Introducing a new distributed data graph.
• Introducing new execution engines.
• Introducing fault tolerance.
• Distributed GraphLab outperforms Hadoop by 20-60x and is competitive with tailored MPI implementations.
Future Work
• Extending the abstraction and runtime to support dynamically evolving graphs and external storage in graph databases.
• Further research into the theory and application of dynamic asynchronous graph-parallel computation, thus helping to define the emerging field of big learning.
References
• Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Parallel Framework for Machine Learning. In UAI, pages 340-349, 2010.
• Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud.
• Thesis: Parallel and Distributed Systems for Probabilistic Reasoning, Joseph Gonzalez, CMU-ML-12-111, December 21, 2012.
• GraphLab ppt by Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin.
• Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In AAIM, pages 337-348, 2008.
• Christopher R. Aberger. Recommender: An Analysis of Collaborative Filtering Techniques.
• GraphLab Create: http://graphlab.org.
• CS-425 Distributed Systems ppt: Chandy-Lamport Snapshot Algorithm and Multicast Communication, by Klara Nahrstedt.
• CSE 547 Graph Parallel Problems: Synchronous vs. Asynchronous Computation pdf, by Emily Fox.