Fcv poster parikh

1
Image Binary descrip0ons Rela0ve descrip0ons not natural, not open, perspec0ve more natural than tallbuilding; less natural than forest; more open than tallbuilding; less open than coast; more perspec0ve than tallbuilding; not natural, not open, perspec0ve more natural than insidecity; less natural than highway; more open than street; less open than coast; more perspec0ve than highway; less perspec0ve than insidecity natural, open, perspec0ve more natural than tallbuilding; less natural than mountain; more open than mountain; less perspec0ve than opencountry; White, not Smiling, VisibleForehead more White than AlexRodriguez; more Smiling than JaredLeto; less Smiling than ZacEfron; more VisibleForehead than JaredLeto; less VisibleForehead than MileyCyrus White, not Smiling, not VisibleForehead more White than AlexRodriguez; less White than MileyCyrus; less Smiling than HughLaurie; more VisibleForehead than ZacEfron; less VisibleForehead than MileyCyrus not Young, BushyEyebrows, RoundFace more Young than CliveOwen; less Young than ScarleMJohansson; more BushyEyebrows than ZacEfron; less BushyEyebrows than AlexRodriguez; more RoundFace than CliveOwen; less RoundFace than ZacEfron 2. Learning Rela-ve A0ributes Categorical (binary) aMributes are restric0ve and can be unnatural Rela-ve A0ributes Devi Parikh (TTIC) and Kristen Grauman (UT Aus0n) 6. Datasets Proposed idea: Rela-ve A0ributes Richer communica0on between humans and machines Describe images or categories rela0vely e.g. “dogs are furrier than giraffes”, “find less congested downtown Chicago scene than Learn a ranking func0on for each aMribute 1. Main Idea 4. Rela-ve Zeroshot Learning Mo-va-on: Natural Not Natural ? Smiling Not Smiling ? Enables new applica-ons Novel zeroshot learning from aMribute comparisons Precise automa0cally generated textual descrip0ons of images 5. Describing Images Rela-vely { } { } , ... }, ( ) { , ... For each aMribute r m (x i )= w T m x i Learn a scoring func0on (i, j ) O m : w T m x i > w T m x j (i, j ) S m : w T m x i = w T m x j that best sa0sfies constraints: Supervision is a m , S m : O m : Maxmargin learning to rank formula-on Young: Smiling: Learnt rela-ve a0ributes Rela-ve a0ributes space Training: Images from S seen and descrip0ons of U unseen categories Tes0ng: Categorize image into N (=S+U) categories Young: H C S Z M Unseen categories Smiling: Z M Need not use all aMributes Need not relate to all S Smiling Youth S H Z C M Auto generate textual descrip-on of: Density: Learnt rela-ve a0ributes 3. Ranking Func-on vs. Binary Classifier Score min 1 2 ||w T m || 2 2 + C ξ 2 ij + γ 2 ij s.t w T m (x i x j ) 1 ξ ij , (i, j ) O m ; ξ 0; γ 0 % correctly ordered pairs Classifier Ranker Outdoor scenes 80% 89% Celebrity faces 67% 82% Rela-ve a0ributes space Density 1/8 dataset “more dense than , less dense than CCH H H C F HH M F F IF “more dense than Highways, less dense than ForestsBinary Relative OSR TI S HC OMF natural 00001111 TISHCOMF open 00011110 TFISMHCO perspective 11110000 OCMFHIST large-objects 11100000 FOMISHCT diagonal-plane 11110000 FOMCISHT close-depth 11110001 CMOTISHF PubFig ACHJ MS V Z Masculine-looking 11110011 SMZVJAHC White 01111111 ACHZJSMV Young 00001101 VHCJASZM Smiling 11101101 JVHACSZM Chubby 10000000 VJHCZMSA Visible-forehead 11101110 JZMSACHV Bushy-eyebrows 01010000 MSZVHACJ Narrow-eyes 01100011 MJSAHCVZ Pointy-nose 00100001 ACJMVSZH Big-lips 10001100 HJVZCMAS Round-face 10001100 HVJCZASM 0 1 2 3 4 5 0 20 40 60 Accuracy # unseen categories 0 1 2 3 4 5 0 20 40 60 Accuracy # unseen categories DAP SRA Proposed 1 2 5 15 0 20 40 60 Accuracy # labeled pairs 1 2 5 15 0 20 40 60 Accuracy # labeled pairs DAP SRA Proposed 6 5 4 3 2 1 0 20 40 60 Accuracy # att to describe unseen 11 10 9 8 7 6 5 4 3 2 1 0 20 40 60 Accuracy # att to describe unseen DAP SRA Proposed 1 2 3 0 20 40 60 Accuracy Looseness of constraints 1 2 3 0 20 40 60 Accuracy Looseness of constraints DAP SRA Proposed 1 2 3 0 20 40 60 80 100 # top choices % correct image in top choices Binary Relative Outdoor Scene Recogni-on (OSR): 2688 images, 8 categories: coast (C), forest (F), highway (H), insidecity (I), mountain (M), opencountry (O), street (S) and tallbuilding (T), gist features; Public Figure Face (PubFig): 800 images, 8 categories: Alex Rodriguez (A), Clive Owen (C), Hugh Laurie (H), Jared Leto (J), Miley Cyrus (M), ScarleM Johansson (S), Viggo Mortensen (V) and Zac Efron (Z), gist and color features 8. Zeroshot Learning Results Whereas conven0onal Binary descrip-on: “not dense” Not dense: Dense: Rela-ve descrip-on: 7. Image Descrip-on Results w b w m Number of unseen categories Amt. of labeled data to learn a0ributes Amount of descrip-on Quality of descrip-on Baselines: Direct AMribute Predic0on (DAP) [Lampert et al. 2009] (binary) Classifier instead of ranker (SRA) ξ ij 0; γ ij 0 ; |w T m (x i x j )| γ ij , (i, j ) S m ; Adapted objec0ve from [Joachims, 2002] How do learned ranking func0ons differ from classifier outputs? Infer image category using maxlikelihood More smiling than Less smiling than More VisFHead than Less VisFHead than More chubby than Less chubby than Human subject experiment: Which image is? OSR PubFig ˆ c(x) = argmax m P (a c m |x) classical recogni0on problem binary ~ rela0ve supervision << baseline supervision can give unique ordering on all classes An aMribute is more discrimina0ve when used rela0vely Rela0ve aMributes jointly carve out space for unseen category ? ? ? Example descrip-ons

Transcript of Fcv poster parikh

Page 1: Fcv poster parikh

Image   Binary  descrip0ons   Rela0ve  descrip0ons  

not  natural,  not  open,  perspec0ve    

more  natural  than  tallbuilding;  less  natural  than  forest;  more  open  than  tallbuilding;  less  open  than  coast;  more  perspec0ve  than  tallbuilding;  

not  natural,  not  open,  perspec0ve  

more  natural  than  insidecity;  less  natural  than  highway;  more  open  than  street;  less  open  than  coast;  more  perspec0ve  than  highway;  less  

perspec0ve  than  insidecity  natural,  open,  perspec0ve  

more  natural  than  tallbuilding;  less  natural  than  mountain;  more  open  than  mountain;  less  perspec0ve  than  opencountry;    

White,  not  Smiling,  VisibleForehead  

more  White  than  AlexRodriguez;  more  Smiling  than  JaredLeto;  less  Smiling  than  ZacEfron;  more  VisibleForehead  than  JaredLeto;  less  

VisibleForehead  than  MileyCyrus  

White,  not  Smiling,  not  VisibleForehead  

more  White  than  AlexRodriguez;  less  White  than  MileyCyrus;  less  Smiling  than  HughLaurie;  more  VisibleForehead  than  ZacEfron;  less  

VisibleForehead  than  MileyCyrus  not  Young,  

BushyEyebrows,  RoundFace  

more  Young  than  CliveOwen;  less  Young  than  ScarleMJohansson;  more  BushyEyebrows  than  ZacEfron;  less  BushyEyebrows  than  AlexRodriguez;  

more  RoundFace  than  CliveOwen;  less  RoundFace  than  ZacEfron  

2.  Learning  Rela-ve  A0ributes  

Categorical  (binary)  aMributes  are  restric0ve  and  can  be  unnatural  

Rela-ve  A0ributes  Devi  Parikh  (TTIC)  and  Kristen  Grauman  (UT  Aus0n)  

6.  Datasets  Proposed  idea:  Rela-ve  A0ributes   Richer  communica0on  between  

humans  and  machines   Describe  images  or  categories  rela0vely  

e.g.  “dogs  are  furrier  than  giraffes”,  “find  less  congested  downtown  Chicago  scene  than  

 Learn  a  ranking  func0on  for  each  aMribute  

1.  Main  Idea   4.  Rela-ve  Zero-­‐shot  Learning  Mo-va-on:  

Natural   Not  Natural  ?  

Smiling   Not  Smiling  ?  

Enables  new  applica-ons   Novel  zero-­‐shot  learning  from  aMribute  

comparisons   Precise  automa0cally  generated  textual  

descrip0ons  of  images  

5.  Describing  Images  Rela-vely  { }{ },. . .∼},( ){ ,

. . .�For  each  aMribute  

rm(xi) = wTmxiLearn  a  scoring  func0on    

∀(i, j) ∈ Om : wTmxi > wT

mxj ∀(i, j) ∈ Sm : wTmxi = wT

mxj

that  best  sa0sfies  constraints:  

Supervision  is  am, Sm:Om:

Max-­‐margin  learning  to  rank  formula-on  

� ∼ �Young:   Smiling:  …  

Learnt  rela-ve  a0ributes  

Rela-ve  a0ributes  space  

 Training:  Images  from  S  seen  and  descrip0ons  of  U  unseen  categories  

 Tes0ng:  Categorize  image  into  N  (=S+U)  categories  

Young:   H  C  S  � � Z  M  �

Unseen  categories  

Smiling:   Z  M  � Need  not  use  all  aMributes   Need  not  relate  to  all  S  

Smiling  

Youth  

S  

H   Z  C  

M  

Auto  -­‐  generate  textual  descrip-on  of:  

�Density:   …  Learnt  rela-ve  a0ributes  

3.  Ranking  Func-on  vs.  Binary  Classifier  Score  

min

�1

2||wT

m||22 + C

��ξ2ij +

�γ2ij

��

s.t wTm(xi − xj) ≥ 1− ξij , ∀(i, j) ∈ Om; |wT

m(xi − xj)| ≤ γij , ∀(i, j) ∈ Sm;

ξij ≥ 0; γij ≥ 0

%  correctly  ordered  pairs   Classifier   Ranker  

Outdoor  scenes   80%   89%  

Celebrity  faces   67%   82%  

Rela-ve  a0ributes  space  

Density  

1/8  dataset  

“more  dense  than    ,  less  dense  than  

C   C   H   H   H   C   F   H   H   M   F   F   I   F  

“more  dense  than  Highways,  less  dense  than  Forests”  

”  

Binary RelativeOSR TI S HCOMF

natural 0 0 0 0 1 1 1 1 T≺I∼S≺H≺C∼O∼M∼Fopen 0 0 0 1 1 1 1 0 T∼F≺I∼S≺M≺H∼C∼O

perspective 1 1 1 1 0 0 0 0 O≺C≺M∼F≺H≺I≺S≺Tlarge-objects 1 1 1 0 0 0 0 0 F≺O∼M≺I∼S≺H∼C≺Tdiagonal-plane 1 1 1 1 0 0 0 0 F≺O∼M≺C≺I∼S≺H≺Tclose-depth 1 1 1 1 0 0 0 1 C≺M≺O≺T∼I∼S∼H∼FPubFig ACHJ MS VZ

Masculine-looking 1 1 1 1 0 0 1 1 S≺M≺Z≺V≺J≺A≺H≺CWhite 0 1 1 1 1 1 1 1 A≺C≺H≺Z≺J≺S≺M≺VYoung 0 0 0 0 1 1 0 1 V≺H≺C≺J≺A≺S≺Z≺MSmiling 1 1 1 0 1 1 0 1 J≺V≺H≺A∼C≺S∼Z≺MChubby 1 0 0 0 0 0 0 0 V≺J≺H≺C≺Z≺M≺S≺A

Visible-forehead 1 1 1 0 1 1 1 0 J≺Z≺M≺S≺A∼C∼H∼VBushy-eyebrows 0 1 0 1 0 0 0 0 M≺S≺Z≺V≺H≺A≺C≺JNarrow-eyes 0 1 1 0 0 0 1 1 M≺J≺S≺A≺H≺C≺V≺ZPointy-nose 0 0 1 0 0 0 0 1 A≺C≺J∼M∼V≺S≺Z≺HBig-lips 1 0 0 0 1 1 0 0 H≺J≺V≺Z≺C≺M≺A≺S

Round-face 1 0 0 0 1 1 0 0 H≺V≺J≺C≺Z≺A≺S≺M

0 1 2 3 4 50

20

40

60

80

Accu

racy

# unseen categories

OSR

0 1 2 3 4 50

20

40

60

Accu

racy

# unseen categories

PubFig

DAP SRA Proposed

1 2 5 150

20

40

60

Accu

racy

# labeled pairs

OSR

1 2 5 150

20

40

60

Accu

racy

# labeled pairs

PubFig

DAP SRA Proposed

6 5 4 3 2 10

20

40

60

Accu

racy

# att to describe unseen

OSR

11109 8 7 6 5 4 3 2 10

20

40

60

Accu

racy

# att to describe unseen

PubFig

DAP SRA Proposed

1 2 30

20

40

60

Accu

racy

Looseness of constraints

OSR

1 2 30

20

40

60

Accu

racy

Looseness of constraints

PubFig

DAP SRA Proposed

1 2 30

20

40

60

80

100

# top choices

% c

orre

ct im

age

in to

p ch

oice

s

BinaryRelative

  Outdoor  Scene  Recogni-on  (OSR):  2688  images,  8  categories:  coast  (C),  forest  (F),  highway  (H),  inside-­‐city  (I),  mountain  (M),  open-­‐country  (O),  street  (S)  and  tall-­‐building  (T),  gist  features;    

  Public  Figure  Face  (PubFig):  800  images,  8  categories:  Alex  Rodriguez  (A),  Clive  Owen  (C),  Hugh  Laurie  (H),  Jared  Leto  (J),  Miley  Cyrus  (M),  ScarleM  Johansson  (S),  Viggo  Mortensen  (V)  and  Zac  Efron  (Z),  gist  and  color  features  

8.  Zero-­‐shot  Learning  Results  

Whereas  conven0onal  Binary  descrip-on:   “not  dense”  

Not  dense:  Dense:  

Rela-ve  descrip-on:  

7.  Image  Descrip-on  Results  

wb wm

Number  of  unseen  categories  

Amt.  of  labeled  data  to  learn  a0ributes  

Amount  of  descrip-on  

Quality  of  descrip-on  

Baselines:     Direct  AMribute  Predic0on  (DAP)              [Lampert  et  al.  2009]  (binary)  

   Classifier  instead  of  ranker  (SRA)  

min

�1

2||wT

m||22 + C

��ξ2ij +

�γ2ij

��

s.t wTm(xi − xj) ≥ 1− ξij , ∀(i, j) ∈ Om; |wT

m(xi − xj)| ≤ γij , ∀(i, j) ∈ Sm;

ξij ≥ 0; γij ≥ 0

min

�1

2||wT

m||22 + C

��ξ2ij +

�γ2ij

��

s.t wTm(xi − xj) ≥ 1− ξij , ∀(i, j) ∈ Om; |wT

m(xi − xj)| ≤ γij , ∀(i, j) ∈ Sm;

ξij ≥ 0; γij ≥ 0

Adapted  objec0ve  from  [Joachims,  2002]  

How  do  learned  ranking  func0ons  

differ  from  classifier  outputs?  

Infer  image  category  using  max-­‐likelihood  

More  smiling  than  

Less  smiling  than  

More  VisFHead  than  

Less  VisFHead  than  

More  chubby  than  

Less  chubby  than  

Human  subject  experiment:  Which  image  is?  

OSR   PubFig  

c(x) = argmax�

m

P (acm|x)

classical  recogni0on  problem   binary  ~  rela0ve  supervision  

<<  baseline  supervision   can  give  unique  ordering  on  all  classes  

An  aMribute  is  more  discrimina0ve  when  used  rela0vely  

Rela0ve  aMributes  jointly  carve  out  space  for  unseen  category  

”  

?   ?   ?  Example  descrip-ons