GraphLab


A presentation by Tushar Sudhakar Jee

A Distributed Framework for Machine Learning and Data Mining in the Cloud

Bulk Synchronous Parallel (BSP)

• A bridging model for designing parallel algorithms (e.g., message relaying).
• Implemented by Google Pregel (2010).
• The model consists of three phases (sketched in code below):

1. Concurrent computation: every participating processor may perform local computations.
2. Communication: the processes exchange data between themselves to facilitate remote data storage capabilities.
3. Barrier synchronisation: when a process reaches this point (the barrier), it waits until all other processes have reached the same barrier.
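A minimal sketch of one worker running the three BSP phases; the `compute` and `deliver` hooks are hypothetical stand-ins for application and runtime code, not Pregel's actual API:

```python
import threading

# One BSP worker: compute locally, exchange messages, then wait at the
# barrier so no worker starts superstep t+1 before all finish superstep t.
NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)

def bsp_worker(rank, compute, deliver, num_supersteps):
    for step in range(num_supersteps):
        outbox = compute(rank, step)  # 1. concurrent local computation
        deliver(outbox)               # 2. communication (visible next superstep)
        barrier.wait()                # 3. barrier synchronisation
```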

Bulk Synchronous Parallel (BSP)

• Advantages:
1. No worries about race conditions.
2. The barrier guarantees data consistency.
3. Simpler to make fault tolerant: save data at the barrier.

• Disadvantages:
1. Costly performance penalties, since the runtime of each phase is decided by the slowest machine.
2. Fails to support the asynchronous, graph-parallel and dynamic computation critical to the Machine Learning and Data Mining community.

Asynchronous Processing

• Implemented by GraphLab (2010, 2012).
• Advantages:
1. Directly targets the asynchronous, graph-parallel and dynamic computation critical to the Machine Learning and Data Mining community.
2. Updates parameters using the most recent values as input, which most closely mirrors sequential execution.

• Disadvantages:
1. Race conditions can happen all the time.

Why GraphLab?

• Implementing Machine Learning and Data Mining algorithms in parallel on current systems like Hadoop, MPI and MapReduce is prohibitively complex and costly.

• GraphLab targets asynchronous, dynamic, graph-parallel computation in the shared-memory setting, as needed by the MLDM community.

MLDM Algorithm Properties

• Graph Structured Computation
• Asynchronous Iterative Computation
• Dynamic Computation
• Serializability

Graph Structured Computation

• Recent advances in MLDM focus on modeling the dependencies between data, as this allows extracting more signal from noisy data. For example, modeling the dependencies between similar shoppers allows us to make better product recommendations than treating them in isolation.

• Consequently, there has been a recent shift toward graph-parallel abstractions like Pregel and GraphLab that naturally express computational dependencies.

Asynchronous Iterative Computation

• Synchronous systems update all parameters simultaneously (in parallel) using parameter values from the previous time step as input.
• Asynchronous systems update parameters using the most recent parameter values as input.

Many MLDM algorithms benefit from asynchronous systems; the toy contrast below illustrates the difference.
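A toy contrast of the two schedules over a parameter vector x, where `f` is a hypothetical per-parameter update rule:

```python
# f(i, values) -> new value of x[i]; f is a stand-in for the algorithm's rule.
def synchronous_sweep(x, f):
    old = list(x)  # every update reads only previous-step values
    return [f(i, old) for i in range(len(x))]

def asynchronous_sweep(x, f):
    for i in range(len(x)):
        x[i] = f(i, x)  # each update sees the freshest values (Gauss-Seidel style)
    return x
```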

Dynamic Computation

• Dynamic computation allows the algorithm to save time, since it only recomputes vertices with recently updated neighbors.

• Static computation requires the algorithm to update all vertices equally often. This wastes time recomputing vertices that have already converged.

Serializability

• Serializability ensures that all parallel executions have an equivalent sequential execution, thereby eliminating race conditions.
• MLDM algorithms converge faster if serializability is ensured. Gibbs sampling, for example, requires serializability for correctness.

Distributed GraphLab Abstraction

• Data Graph
• Update Function
• Sync Operation
• GraphLab Execution Model
• Ensuring Serializability

Data Graph

• The GraphLab abstraction stores the program state as a directed graph called the data graph, G = (V, E, D), where D is the user-defined data.

• Data here represents model parameters, algorithm state, and statistical data.

Data graph

Data Graph (PageRank example):

Update Function

• An update function is a stateless procedure that modifies the data within the scope of a vertex and schedules the future execution of update functions on other vertices.

• The function takes a vertex v and its scope S_v and returns the new versions of the data in the scope as well as a set of vertices T:

Update: f(v, S_v) -> (S_v, T)

Update Function (PageRank example):

• The update function for PageRank computes a weighted sum of the current ranks of neighboring vertices and assigns it as the rank of the current vertex.

• The neighbors are scheduled for update only if the value of the current vertex changes by more than a predefined threshold.

Update Function (PageRank example):
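The original slide showed this update as pseudocode; below is a hedged Python rendering in the f(v, S_v) -> (S_v, T) style. The `scope` accessor methods and the constants are assumptions, not GraphLab's actual API:

```python
DAMPING = 0.85   # standard PageRank damping factor
EPSILON = 1e-5   # scheduling threshold (assumed)

def pagerank_update(v, scope):
    old_rank = scope.vertex_data(v)
    # Weighted sum of the current ranks of the in-neighbors.
    new_rank = (1.0 - DAMPING) + DAMPING * sum(
        scope.edge_data(u, v) * scope.vertex_data(u)
        for u in scope.in_neighbors(v))
    scope.set_vertex_data(v, new_rank)
    # Dynamic scheduling: reschedule neighbors only on a significant change.
    if abs(new_rank - old_rank) > EPSILON:
        return list(scope.out_neighbors(v))  # the returned set T
    return []
```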

Sync Operation

• It is an associative, commutative sum defined over all scopes in the graph.
• Supports normalization, which is common in MLDM algorithms.
• Runs continuously in the background to maintain updated estimates of the global value.
• Ensuring serializability of the sync operation is costly and requires synchronization.
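For concreteness, a minimal sketch of a sync operation as an associative, commutative fold plus a finalize step; `graph.vertices()` and the callables are assumptions:

```python
def sync(graph, map_fn, zero, combine, finalize):
    acc = zero
    for v in graph.vertices():
        acc = combine(acc, map_fn(v))  # associative/commutative: order-free
    return finalize(acc)               # e.g., divide by |V| for normalization

# Usage sketch: a running estimate of the global average rank.
# avg = sync(g, lambda v: v.rank, 0.0, lambda a, b: a + b,
#            lambda s: s / g.num_vertices())
```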

The GraphLab Execution Model

• The model allows the GraphLab runtime engine to determine the best order in which to run vertices.
• Since many MLDM algorithms benefit from prioritization, the GraphLab abstraction allows users to assign priorities to the vertices in T.

Ensuring Serializability

• Serializability implies that for every parallel execution, there exists a sequential execution of update functions that would give the same results.

• To ensure serializability, the scopes of concurrently executing update functions must not overlap.

• The greater the consistency, the lower the parallelism.

Ensuring Serializability (Full Consistency):

• This model ensures that the scopes of concurrently executing update functions do not overlap.

• The update function has complete read/write access to its entire scope.
• Concurrently executing update functions must be at least two vertices apart, limiting parallelism.

Ensuring Serializability (Edge Consistency):

• This model ensures each update function has exclusive read/write access to its vertex and adjacent edges, but read-only access to adjacent vertices.

• Increases parallelism by allowing update functions with slightly overlapping scopes to run in parallel.

Ensuring Serializability (Vertex Consistency):

• This model provides write access to the central vertex data only.
• It allows all update functions to be run in parallel, providing maximum parallelism.

• It is the least consistent model.

Using GraphLab (K-means):

• Using GraphLab Create, K-means was run on a dataset from the June 2014 Kaggle competition to classify schizophrenic subjects based on MRI scans.

• The original data consists of two sets of features: functional network connectivity (FNC) features and source-based morphometry (SBM) features, incorporated into a single SFrame with SFrame.join.
• The data was downloaded from a public AWS S3 bucket, roughly as sketched below.
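A sketch of that workflow in GraphLab Create; the bucket paths, the join key ('Id'), and the exact kmeans.create parameters are assumptions about the dataset layout and API version:

```python
import graphlab as gl

# Load the two feature sets from the public bucket (paths are placeholders).
fnc = gl.SFrame('s3://<public-bucket>/train_FNC.csv')  # FNC features
sbm = gl.SFrame('s3://<public-bucket>/train_SBM.csv')  # SBM features
features = fnc.join(sbm, on='Id')                      # one combined SFrame

# Cluster on the feature columns, excluding the subject id.
feature_cols = [c for c in features.column_names() if c != 'Id']
model = gl.kmeans.create(features, features=feature_cols, num_clusters=2)
model.summary()
```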

Distributed GraphLab Design

• Distributed Data Graph
• Distributed GraphLab Engines
• Fault Tolerance
• System Design

Distributed Data Graph

• The graph is partitioned into k parts, where k is much greater than the number of machines.

• Each part, called an atom, is stored as a separate file on a distributed storage system (Amazon S3).

Distributed GraphLab Engines:

1. An engine emulates the GraphLab execution model and is responsible for:

• Executing update functions.
• Executing sync operations.
• Maintaining the set of scheduled vertices T.
• Ensuring serializability with respect to the appropriate consistency model.

2. Types:

• Chromatic Engine
• Distributed Locking Engine

Chromatic Engine:

• It uses vertex coloring to satisfy the edge consistency model, by synchronously executing all vertices of the same color in the vertex set T before proceeding to the next color.
• The full consistency model is satisfied by ensuring that no vertex shares the same color as any of its distance-two neighbors.
• The vertex consistency model is satisfied by assigning all vertices the same color.
• It executes the set of scheduled vertices T partially asynchronously; the color-stepped schedule is sketched below.

Edge consistency model using the Chromatic Engine.
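A sketch of the color-stepped schedule; `run_parallel` (execute updates concurrently and return the newly scheduled vertices) and the precomputed coloring are assumptions. Because same-color vertices are never adjacent, running one color at a time yields edge consistency:

```python
def chromatic_engine(coloring, scheduled, update, run_parallel):
    while scheduled:
        next_round = set()
        for color in sorted(set(coloring.values())):
            batch = [v for v in scheduled if coloring[v] == color]
            # Safe to run concurrently within a color; barrier between colors.
            next_round.update(run_parallel(update, batch))
        scheduled = next_round
```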

Distributed Locking Engine

1. Why use it?

• The chromatic engine does not provide sufficient scheduling flexibility.
• The chromatic engine presupposes the availability of a graph coloring, which might not always be readily available.

2. The Distributed Locking Engine enforces mutual exclusion by associating a readers-writer lock with each vertex.
3. Vertex consistency is achieved by acquiring a write-lock on the central vertex of each requested scope.

4. Full consistency is achieved by acquiring write-locks on the central vertex and all adjacent vertices.
5. Edge consistency is achieved by acquiring a write-lock on the central vertex and read-locks on adjacent vertices, as sketched below.
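A minimal sketch of how the three lock patterns map onto per-vertex readers-writer locks; the `RWLock` interface (rlock/wlock) is an assumption. Taking locks in a canonical (sorted) order avoids deadlock:

```python
def acquire_scope(v, neighbors, locks, model):
    for u in sorted(set(neighbors) | {v}):
        if model == 'vertex':
            if u == v:
                locks[u].wlock()   # write-lock the central vertex only
        elif model == 'edge':
            if u == v:
                locks[u].wlock()   # exclusive access to vertex and its edges
            else:
                locks[u].rlock()   # read-only access to adjacent vertices
        elif model == 'full':
            locks[u].wlock()       # write-lock the entire scope
```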

Distributed Locking Engine (Pipelined Locking)

• Each machine maintains a pipeline of vertices for which locks have been requested but not yet fulfilled.
• The pipelining system uses callbacks instead of blocking readers/writer locks, since the latter would halt the pipeline.

• Pipelining reduces latency by synchronizing locked data immediately after a machine completes its local lock.

Chandy-Lamport Snapshot Algorithm

Fault Tolerance

• Fault tolerance is introduced in GraphLab using a distributed checkpoint mechanism called Snapshot Update.

• Snapshot Update can be deployed synchronously or asynchronously.

• Asynchronous snapshots are more efficient and guarantee a consistent snapshot under the following conditions:
a) Edge consistency is used on all update functions.
b) Scheduling completes before the scope is unlocked.
c) Snapshot Update is prioritized over other updates.

• Synchronous snapshots exhibit a "flatline" in throughput, whereas asynchronous snapshots allow computation to proceed. A sketch of the Snapshot Update follows.
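A hedged sketch of the Snapshot Update, a Chandy-Lamport-style marker pass written as a regular vertex update; the `scope` accessors, `persist`, and the shared `snapshotted` set are assumptions:

```python
def snapshot_update(v, scope, snapshotted, persist):
    if v in snapshotted:
        return []                                   # already captured
    persist('vertex', v, scope.vertex_data(v))      # save this vertex's data
    to_schedule = []
    for u in scope.neighbors(v):
        if u not in snapshotted:
            persist('edge', (v, u), scope.edge_data(v, u))
            to_schedule.append(u)                   # spread the snapshot marker
    snapshotted.add(v)
    return to_schedule  # run these before regular updates (prioritized)
```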

System Design

• In the initialization phase, the atom file representation of the data graph is constructed.
• In the GraphLab execution phase, atom files are assigned to individual execution engines from the DFS.

System Design (Locking Engine Design)

• The partition of the distributed graph is managed within local graph storage.
• A cache is used to provide access to remote data.
• The scheduler manages the vertices in T assigned to the process.
• Each block makes use of the block below it.

Applications

• Netflix Movie Recommendation
• Video Co-segmentation (CoSeg)
• Named Entity Recognition (NER)

Netflix Movie Recommendation

• It makes use of collaborative filtering to predict the movie ratings for each user based on the ratings of similar users.

• The alternating least squares (ALS) algorithm is used to iteratively compute a low-rank matrix factorization.

• The sparse matrix R defines a bipartite graph connecting each user with the movies that they rated.

• Vertices are users (rows U) and movies (columns V), and edges contain the ratings for a user-movie pair.

• The GraphLab update function predicts the ratings (edge values).

• ALS alternates between fixing one of the unknowns u_i or v_j; when one is fixed, the other can be computed by solving a least-squares problem. The ALS setup is as follows:

• R = {r_ij} is the n_u × n_v user-movie matrix, where each entry r_ij represents the rating score of movie j by user i, modeled as r_ij = <u_i, v_j> ∀ i, j.

• U represents the user feature matrix and V represents the movie feature matrix.
• The dimension of the feature space, d, is a system parameter determined by a hold-out dataset or cross-validation.
• The low-rank approximation problem is thus formulated as follows to learn the factor vectors (u_i, v_j):
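The slide's formula did not survive extraction; a plausible reconstruction of the standard ALS objective, consistent with the definitions here and with Zhou et al. (see References), is:

```latex
\min_{U,V} \sum_{(i,j)\in K} \left( r_{ij} - p_{ij} \right)^2
  + \lambda \left( \sum_{i} \lVert u_i \rVert^2 + \sum_{j} \lVert v_j \rVert^2 \right)
```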

where p_ij = <u_i, v_j> is the predicted rating, λ is the regularization coefficient, and K is the set of known ratings from the sparse matrix R.
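A sketch of this ALS step as a GraphLab vertex update: with all neighbor factors fixed, the center's d-dimensional factor is the closed-form ridge (regularized least-squares) solution. The `scope` accessors are assumptions:

```python
import numpy as np

def als_update(v, scope, d, lam):
    nbrs = list(scope.neighbors(v))
    A = np.array([scope.vertex_data(u) for u in nbrs])   # neighbor factors
    r = np.array([scope.edge_data(v, u) for u in nbrs])  # observed ratings
    # Solve (A^T A + lam * I) x = A^T r  -- ridge regression in closed form.
    x = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ r)
    scope.set_vertex_data(v, x)
```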

Netflix Scaling with Intensity

• Plotted is the speedup achieved for varying values of the dimensionality d.
• Extrapolating to obtain the theoretically optimal runtime, the estimated overhead of Distributed GraphLab at 64 machines is 12x for d = 5 and 4.9x for d = 100.

Netflix Comparisons

• The GraphLab implementation was compared against Hadoop and MPI using 4 to 64 machines.

• GraphLab performs between 40 and 60 times faster than Hadoop.
• It also slightly outperformed the optimized MPI implementation.

Video Co-segmentation (CoSeg)

• Video co-segmentation automatically identifies and clusters spatio-temporal segments of video that share similar texture and color characteristics.

• Frames of high-resolution video are pre-processed by coarsening each frame to a regular grid of rectangular super-pixels.

• The CoSeg algorithm predicts the best label (e.g., sky, building, grass) for each super-pixel using a Gaussian Mixture Model (GMM) in conjunction with Loopy Belief Propagation (LBP).

• The two algorithms are combined to form an Expectation-Maximization problem, alternating between LBP to compute the label for each super-pixel given the GMM, and then updating the GMM given the labels from LBP.

• The GraphLab update function executes the local iterative LBP update, where updates expected to change values significantly are prioritized. The BP update function is sketched below:
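The original slide showed the BP update as a figure; below is a hedged Python sketch of an LBP vertex update in GraphLab style, where a neighbor is rescheduled only if its message changed enough (the dynamic, prioritized behavior described above). `unary`, `edge_potential`, `messages`, `neighbors`, and THRESHOLD are assumptions:

```python
import numpy as np

THRESHOLD = 1e-3  # residual scheduling threshold (assumed)

def lbp_update(v, neighbors, unary, edge_potential, messages):
    schedule = []
    incoming = [messages[(u, v)] for u in neighbors[v]]
    belief = unary[v] * np.prod(incoming, axis=0)   # combine incoming messages
    for u in neighbors[v]:
        cavity = belief / messages[(u, v)]          # divide out u's own message
        new_msg = edge_potential @ cavity
        new_msg = new_msg / new_msg.sum()           # normalize the message
        # Reschedule u only if the message to it moved significantly.
        if np.abs(new_msg - messages[(v, u)]).sum() > THRESHOLD:
            schedule.append(u)
        messages[(v, u)] = new_msg
    return schedule
```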

• The locking engine provides nearly optimal weak scaling: the runtime does not increase significantly as the size of the graph increases proportionately with the number of machines.

• It was also observed that increasing the pipeline length increased performance significantly and compensated for poor partitioning.

Named Entity Recognition (NER)

• Named Entity Recognition is the task of determining the type (e.g., Person, Place, or Thing) of a noun phrase (e.g., Obama, Chicago, or Car) from its context (e.g., "President ...", "Lives near ...", or "bought a ...").
• The data graph for NER is bipartite, with one set of vertices corresponding to noun phrases and the other to contexts.
• The CoEM vertex program updates the estimated distribution for a vertex (either a noun phrase or a context) based on the current distributions of neighboring vertices.

• Below is a sketch of the CoEM update, in which adjacent vertices are rescheduled if the type distribution at a vertex changes by more than some predefined threshold.
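The slide's algorithm figure was lost in extraction; a hedged Python sketch of the CoEM update on the bipartite NER graph, where a vertex's type distribution becomes the edge-weighted average of its neighbors' distributions. The `scope` accessors and THRESHOLD are assumptions:

```python
import numpy as np

THRESHOLD = 1e-3  # rescheduling threshold (assumed)

def coem_update(v, scope):
    old = scope.vertex_data(v)
    nbrs = list(scope.neighbors(v))
    w = np.array([scope.edge_data(v, u) for u in nbrs])     # edge weights
    dists = np.array([scope.vertex_data(u) for u in nbrs])  # neighbor type dists
    new = (w @ dists) / w.sum()                             # weighted average
    scope.set_vertex_data(v, new)
    if np.abs(new - old).sum() > THRESHOLD:
        return nbrs            # reschedule adjacent vertices
    return []
```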

NER Comparisons

• The GraphLab implementation of NER achieved a 20-30x speedup over Hadoop and was comparable to the optimized MPI implementation.
• GraphLab scaled poorly, achieving only a 3x improvement using 16x more machines, largely due to the large vertex data size, dense connectivity, and poor partitioning.

Comparison (Netflix / CoSeg / NER)

• Overall network utilization: Netflix and CoSeg have very low bandwidth requirements, while NER appears to saturate the network when the number of machines exceeds 24.
• Snapshot overhead: the overhead of performing a complete snapshot of the graph every |V| updates is highest for CoSeg when running on a 64-machine cluster.

EC2 Cost Evaluation

• The price-runtime curves (log-log scale) for GraphLab and Hadoop illustrate the cost of deploying either system.

• For the Netflix application, GraphLab is about two orders of magnitude more cost-effective than Hadoop.

Conclusion

• In this presentation we covered:

• The requirements of MLDM algorithms.
• How GraphLab was extended to the distributed setting by:
  • relaxing the schedule requirements,
  • introducing a new distributed data graph,
  • introducing new execution engines, and
  • introducing fault tolerance.

• Distributed GraphLab outperforms Hadoop by 20-60x and is competitive with tailored MPI implementations.

Future Work

• Extending the abstraction and runtime to support dynamically evolving graphs and external storage in graph databases.
• Further research into the theory and application of dynamic asynchronous graph-parallel computation, thus helping to define the emerging field of big learning.

References

• Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A New Parallel Framework for Machine Learning. In UAI, pages 340-349, 2010.

• Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud.

• Thesis: Parallel and Distributed Systems for Probabilistic Reasoning, Joseph Gonzalez, CMU-ML-12-111, December 21, 2012.

• GraphLab slides by Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin.

• Y. Zhou, D. Wilkinson, R. Schreiber, and R. Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In AAIM, pages 337-348, 2008.

• Christopher R. Aberger. Recommender: An Analysis of Collaborative Filtering Techniques.

• GraphLab Create: http://graphlab.org.

• CS-425 Distributed Systems slides: Chandy-Lamport Snapshot Algorithm and Multicast Communication, by Klara Nahrstedt.

• CSE 547 Graph Parallel Problems: Synchronous vs. Asynchronous Computation slides, by Emily Fox.