Computational Construction Grammar for Visual Question Answering

determined-noun-phrase-unit-42

resulting structure

the-11

super-nominal-unit-14

Meaning:

Example 2

The first example is still rather small: its meaning contains only five predicates. In this second example, we take a more complex question: "What number of red cubes are the same size as the blue ball?".

Comprehending "what number of red cubes are the same size as the blue ball"

Applying FCG CONSTRUCTION SET (170) in comprehension

initial structure

application process

* ball-morph-cxn (morph 0.50), cubes-morph-cxn (morph 0.50)

* size-lex-cxn (lex 0.50), red-lex-cxn (lex 0.50 red *color), cube-lex-cxn (lex 0.50 cube *shape), blue-lex-cxn (lex 0.50 blue *color), sphere-lex-cxn (lex 0.50 sphere *shape)

9, 8.00: base-nominal-cxn (nom 0.50)

* base-nominal-cxn (nom 0.50), nominal-cxn (nom 0.50), base-nominal-cxn (nom 0.50), nominal-cxn (nom 0.50)

23, 12.00: unique-determined-cxn (cxn 0.50)

* the-same-t-as-relate-cxn (cxn 0.50), what-number-of-x-are-cxn (cxn 0.50)

28, 15.00: same-relate-count-cxn (cxn 0.50 t)

18, 12.00: the-same-t-as-compare-cxn (cxn 0.50)

19, 13.00: unique-determined-cxn (cxn 0.50)

20, 14.00: what-number-of-x-are-cxn (cxn 0.50)

13, 12.00: the-same-t-cxn (cxn 0.50)

14, 13.00: unique-determined-cxn (cxn 0.50)

15, 14.00: what-number-of-x-are-cxn (cxn 0.50)

applied constructions

ball-morph-cxn (morph 0.50)

cubes-morph-cxn (morph 0.50)

size-lex-cxn (lex 0.50)

blue-lex-cxn (lex 0.50 blue *color)

sphere-lex-cxn (lex 0.50 sphere *shape)

* things-morph-cxn (morph 0.50), balls-morph-cxn (morph 0.50)

... and 5 more

resulting structure

Meaning:

Example 3

In this third example, we show that the meaning representation does not have to be a sequence of predicates. It can also be a tree structure, as demonstrated by the question "Are there an equal number of blue things and green balls?".
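The difference between a sequential and a tree-shaped meaning can be sketched in a few lines of Python. This is only an illustration: the `Node` class, the helpers `chain`, `is_chain` and `size`, and all predicate names are assumptions modelled loosely on CLEVR-style programs, not the predicates that the CLEVR Grammar actually produces. The five-predicate chain for "What material is the red cube?" is likewise an assumed decomposition.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One predicate of a meaning representation (hypothetical structure)."""
    predicate: str
    argument: str = ""
    inputs: list = field(default_factory=list)  # predicates feeding into this one


def chain(*steps):
    """Build a linear meaning: each predicate takes the previous one as input."""
    node = None
    for predicate, argument in steps:
        node = Node(predicate, argument, [] if node is None else [node])
    return node


def is_chain(node):
    """True if no predicate has more than one input, i.e. the meaning is a sequence."""
    while node.inputs:
        if len(node.inputs) > 1:
            return False
        node = node.inputs[0]
    return True


def size(node):
    """Number of predicates in the meaning."""
    return 1 + sum(size(child) for child in node.inputs)


# Assumed chain for "What material is the red cube?": five predicates in a row.
sequence_meaning = chain(
    ("get-context", ""), ("filter", "cube"), ("filter", "red"),
    ("unique", ""), ("query", "material"),
)

# Assumed tree for "Are there an equal number of blue things and green balls?":
# two counting branches feed a single comparison predicate.
blue_things = chain(("get-context", ""), ("filter", "thing"),
                    ("filter", "blue"), ("count", ""))
green_balls = chain(("get-context", ""), ("filter", "sphere"),
                    ("filter", "green"), ("count", ""))
tree_meaning = Node("equal", "", [blue_things, green_balls])

print(is_chain(sequence_meaning), size(sequence_meaning))  # True 5
print(is_chain(tree_meaning), size(tree_meaning))          # False 9
```

The comparison predicate is the only node with two inputs, which is what turns the meaning into a tree rather than a sequence.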

Comprehending "are there an equal number of blue things and green balls"

Applying FCG CONSTRUCTION SET (170) in comprehension

initial structure

application process

* sphere-lex-cxn (lex 0.50 sphere *shape), green-lex-cxn (lex 0.50 green *color), blue-lex-cxn (lex 0.50 blue *color), thing-lex-cxn (lex 0.50 thing *thing)

8, 7.00: base-nominal-cxn (nom 0.50)

* base-nominal-cxn (nom 0.50), nominal-cxn (nom 0.50), base-nominal-cxn (nom 0.50), nominal-cxn (nom 0.50)

12, 11.00: an-equal-number-cxn (cxn 0.50)

13, 12.00: compare-equal-countable-cxn (cxn 0.50 t)

applied constructions

things-morph-cxn (morph 0.50)

balls-morph-cxn (morph 0.50)

sphere-lex-cxn (lex 0.50 sphere *shape)

green-lex-cxn (lex 0.50 green *color)

blue-lex-cxn (lex 0.50 blue *color)

thing-lex-cxn (lex 0.50 thing *thing)

* large-morph-cxn (morph 0.50), left-of-morph-cxn (morph 0.50), thing-morph-cxn (morph 0.50), small-morph-cxn (morph 0.50), cylinder-morph-cxn (morph 0.50), cube-morph-cxn (morph 0.50), metal-morph-cxn (morph 0.50)

resulting structure

super-nominal-unit-17

nominal-unit-21

balls-2

green-2

super-nominal-unit-18

Meaning:

Example 4

There are many different question types in the CLEVR dataset. In this fourth and final example, we demonstrate yet another type of question: "Do the large metal cube left of the red thing and the small cylinder have the same color?".

Comprehending "do the large metal cube left of the red thing and the small cylinder have the same color"

Applying FCG CONSTRUCTION SET (170) in comprehension

initial structure

application process

* small-lex-cxn (lex 0.50 small *size), metal-lex-cxn (lex 0.50 metal *material), cylinder-lex-cxn (lex 0.50 cylinder *shape), color-lex-cxn (lex 0.50), large-lex-cxn (lex 0.50 large *size), thing-lex-cxn (lex 0.50 thing *thing), red-lex-cxn (lex 0.50 red *color), cube-lex-cxn (lex 0.50 cube *shape), left-lex-cxn (lex 0.50 left *relation)

19, 17.00: base-nominal-cxn (nom 0.50)

18, 17.00: base-nominal-cxn (nom 0.50)

17, 17.00: base-nominal-cxn (nom 0.50)

* base-nominal-cxn (nom 0.50), base-nominal-cxn (nom 0.50)

25, 20.00: nominal-cxn (nom 0.50)

24, 20.00: nominal-cxn (nom 0.50)

23, 20.00: nominal-cxn (nom 0.50)

26, 21.00: nominal-cxn (nom 0.50)

* nominal-cxn (nom 0.50), nominal-cxn (nom 0.50)

32, 24.00: the-same-t-cxn (cxn 0.50)

33, 25.00: unique-determined-cxn (cxn 0.50)

* unique-determined-cxn (cxn 0.50), unique-determined-cxn (cxn 0.50), base-relate-cxn (cxn 0.50)

40, 29.00: compare-do-cxn (cxn 0.50 t)

37, 26.00: unique-determined-cxn (cxn 0.50)

34, 25.00: unique-determined-cxn (cxn 0.50)

35, 25.00: unique-determined-cxn (cxn 0.50)

30, 22.00: nominal-cxn (nom 0.50)

27, 21.00: nominal-cxn (nom 0.50)

28, 21.00: nominal-cxn (nom 0.50)

21, 18.00: base-nominal-cxn (nom 0.50)

applied constructions

large-morph-cxn (morph 0.50)

left-of-morph-cxn (morph 0.50)

thing-morph-cxn (morph 0.50)

small-morph-cxn (morph 0.50)

cylinder-morph-cxn (morph 0.50)

metal-morph-cxn (morph 0.50)

small-lex-cxn (lex 0.50 small *size)

metal-lex-cxn (lex 0.50 metal *material)

cylinder-lex-cxn (lex 0.50 cylinder *shape)

... and 19 more

resulting structure

Meaning:

III. Formulation

Fluid Construction Grammar is a bidirectional formalism: it maps utterances to meanings, but also meanings to utterances. The same holds for the CLEVR Grammar. In this section, we demonstrate the formulation process. Notably, because of the design of the CLEVR dataset, multiple questions map to the same semantic representation, e.g. the questions "What material is the red cube?" and "There is a red cube; what is its material?". Using FCG's 'formulate-all' operation, which explores the entire search space instead of stopping at the first solution, we can show all the different questions obtained from a single input meaning. We demonstrate this using the semantic representation of the example sentence "What material is the red cube?".
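The contrast between stopping at the first solution and exploring the entire search space can be sketched with a small depth-first search in Python. This is a toy model, not FCG's search engine: the `search`, `expand`, `is_solution` and `render` functions, the `REFERENT` and `QUESTIONS` tables, and the two-step decomposition (first choose how to introduce the referent, then choose a question pattern) are all assumptions made for illustration, and the toy ignores the cube/block variation.

```python
# Hypothetical choice points for the meaning of "What material is the red cube?".
REFERENT = {
    "determined": "the red cube",
    "declared": "there is a red cube ;",
}
QUESTIONS = {
    "determined": [
        "what material is {x} made of",
        "what material is {x}",
        "what is the material of {x}",
        "{x} is made of what material",
    ],
    "declared": [
        "{x} what is its material",
        "{x} what material is it",
    ],
}


def expand(state):
    """Return the successor states of a partial solution."""
    if len(state) == 0:
        return [(kind,) for kind in REFERENT]
    kind = state[0]
    return [(kind, template) for template in QUESTIONS[kind]]


def is_solution(state):
    """A state is complete once both choices have been made."""
    return len(state) == 2


def render(state):
    """Turn a complete state into an utterance."""
    kind, template = state
    return template.format(x=REFERENT[kind])


def search(initial, expand, is_solution, find_all=False):
    """Depth-first search; stops at the first solution unless find_all is set."""
    solutions = []
    stack = [initial]
    while stack:
        state = stack.pop()
        if is_solution(state):
            solutions.append(state)
            if not find_all:
                break
        else:
            stack.extend(expand(state))
    return solutions


first_only = [render(s) for s in search((), expand, is_solution)]
all_found = [render(s) for s in search((), expand, is_solution, find_all=True)]

print(len(first_only), len(all_found))
for utterance in sorted(all_found):
    print(utterance)
```

With `find_all=False` the loop exits as soon as one complete state is found; with `find_all=True` it keeps popping states until the stack is empty, so every branch of the toy search space is visited.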

Formulating (all solutions)

Computing all solutions for application of FCG CONSTRUCTION SET (170) in formulation

initial structure

application process

* red-lex-cxn (lex 0.50 red *color), material-lex-cxn (lex 0.50), cube-lex-cxn (lex 0.50 cube *shape)

* base-nominal-cxn (nom 0.50), nominal-cxn (nom 0.50)

41, 6.00: unique-determined-cxn (cxn 0.50)

* what-is-the-t-of-cxn (cxn 0.50), hop-query-property-cxn (cxn 0.50 t)

48, 9.00: block-morph-cxn (morph 0.50)

49, 7.00: has-what-t-cxn (cxn 0.50)

50, 8.00: cube-morph-cxn (morph 0.50)

51, 7.00: hop-x-is-made-of-what-cxn (cxn 0.50 t)

52, 8.00: cube-morph-cxn (morph 0.50)

34, 6.00: what-t-is-cxn (cxn 0.50)

36, 7.00: unique-determined-cxn (cxn 0.50)

37, 8.00: hop-what-material-is-x-made-of-cxn (cxn 0.50 t)

38, 9.00: block-morph-cxn (morph 0.50)

39, 8.00: hop-query-property-cxn (cxn 0.50 t)

40, 9.00: block-morph-cxn (morph 0.50)

30, 6.00: what-t-is-it-cxn (cxn 0.50)

32, 7.00: unique-determined-cxn (cxn 0.50)

33, 8.00: cube-morph-cxn (morph 0.50)

18, 6.00: unique-declared-cxn (cxn 0.50)

* what-t-is-it-cxn (cxn 0.50), hop-query-property-anaphoric-cxn (cxn 0.50 t)

23, 9.00: block-morph-cxn (morph 0.50)

24, 7.00: what-t-is-cxn (cxn 0.50)

25, 8.00: block-morph-cxn (morph 0.50)

26, 7.00: what-is-the-t-of-cxn (cxn 0.50)

27, 8.00: block-morph-cxn (morph 0.50)

28, 7.00: has-what-t-cxn (cxn 0.50)

29, 8.00: block-morph-cxn (morph 0.50)

13, 6.00: is-what-t-cxn (cxn 0.50)

14, 7.00: unique-declared-cxn (cxn 0.50)

15, 8.00: cube-morph-cxn (morph 0.50)

16, 7.00: unique-determined-cxn (cxn 0.50)

17, 8.00: cube-morph-cxn (morph 0.50)

7, 6.00: what-is-its-t-cxn (cxn 0.50)

* unique-declared-cxn (cxn 0.50), hop-query-property-anaphoric-cxn (cxn 0.50 t)

10, 9.00: block-morph-cxn (morph 0.50)

11, 7.00: unique-determined-cxn (cxn 0.50)

12, 8.00: cube-morph-cxn (morph 0.50)

solution 1

applied constructions

what-is-its-t-cxn (cxn 0.50)

unique-declared-cxn (cxn 0.50)

hop-query-property-anaphoric-cxn (cxn 0.50 t)

determined-noun-phrase-unit-54

resulting structure

declaration-unit-15

solution 2

applied constructions

unique-declared-cxn (cxn 0.50)

what-t-is-it-cxn (cxn 0.50)

hop-query-property-anaphoric-cxn (cxn 0.50 t)

determined-noun-phrase-unit-58

resulting structure

solution 3

applied constructions

hop-what-material-is-x-made-of-cxn (cxn 0.50 t)

determined-noun-phrase-unit-62

resulting structure

determiner-unit-18

solution 4

applied constructions

determined-noun-phrase-unit-62

resulting structure

determiner-unit-18

nominal-unit-30

cube-unit-2

red-unit-2

solution 5

applied constructions

what-is-the-t-of-cxn (cxn 0.50)

determined-noun-phrase-unit-63

resulting structure

solution 6

applied constructions

hop-x-is-made-of-what-cxn (cxn 0.50 t)

determined-noun-phrase-unit-63

resulting structure

determiner-unit-19

1, 1.00: cube-morph-cxn (morph 0.50)

Utterances:

"there is a red block ; what is its material"
"there is a red block ; what material is it"
"what material is the red block made of"
"what material is the red block"
"what is the material of the red block"
"the red cube is made of what material"

The 'formulate-all' operation for this example returns six possible questions. In theory, there are many more possible solutions because of the large number of synonyms for the different nouns and adjectives: a cube can also be called a block, metal can also be called shiny, and so on. To avoid an enormous search space, the CLEVR Grammar was configured to explore only the grammatical variation and to ignore this lexical variation.
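How quickly lexical variation multiplies the number of solutions can be illustrated with a short Python sketch. The `SYNONYMS` table and the `lexical_variants` helper are hypothetical; only the pairs cube/block and metal/shiny come from the text above, and a real synonym list would be longer.

```python
from itertools import product

# Hypothetical synonym table; words that are not listed have a single form.
SYNONYMS = {
    "cube": ["cube", "block"],
    "metal": ["metal", "shiny"],
}


def lexical_variants(words):
    """Enumerate every way of realising a word sequence with the known synonyms."""
    options = [SYNONYMS.get(word, [word]) for word in words]
    return [" ".join(choice) for choice in product(*options)]


variants = lexical_variants(["the", "metal", "cube"])
print(len(variants))  # 4
for variant in variants:
    print(variant)
```

Each word with n synonyms multiplies the number of realisations by n, and that factor applies again to every grammatical variant, which is why restricting the search to grammatical variation keeps it manageable.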

IV. Operational VQA system

To obtain a fully operational VQA system, the semantic representation that results from FCG's comprehension operation needs to be executed on a specific scene of objects. For this, we use a procedural semantics framework called Incremental Recruitment Language (IRL). In this section, we demonstrate both the comprehension process of an input question and the subsequent execution of the resulting semantic representation on a scene of objects. The scene of objects is shown in the image below.
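The idea of executing a semantic representation on a scene can be sketched as a tiny interpreter in Python. This is a strong simplification and not IRL: the `execute` function, the operation names, the linear program format and the three-object scene are all assumptions made for illustration, and the scene is not the one from the image.

```python
# Hypothetical scene: each object is a dictionary of attribute values.
scene = [
    {"shape": "cube", "color": "red", "material": "metal", "size": "large"},
    {"shape": "sphere", "color": "blue", "material": "rubber", "size": "small"},
    {"shape": "cube", "color": "green", "material": "rubber", "size": "small"},
]

# Assumed meaning of "what material is the red cube" as a list of (operation, argument).
program = [
    ("get-context", None),
    ("filter", "cube"),
    ("filter", "red"),
    ("unique", None),
    ("query", "material"),
]


def execute(program, scene):
    """Run the operations in order, passing each result on to the next one."""
    value = None
    for operation, argument in program:
        if operation == "get-context":
            value = list(scene)  # start from all objects in the scene
        elif operation == "filter":
            # keep the objects that have the given attribute value
            value = [obj for obj in value if argument in obj.values()]
        elif operation == "unique":
            if len(value) != 1:
                raise ValueError(f"expected exactly one object, found {len(value)}")
            value = value[0]
        elif operation == "query":
            value = value[argument]  # read one attribute of the single object
        else:
            raise ValueError(f"unknown operation: {operation}")
    return value


print(execute(program, scene))  # metal
```

The `unique` step fails loudly when the filters leave zero or several objects, which mirrors the intuition that a definite description such as "the red cube" presupposes exactly one matching object in the scene.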

Comprehending "what material is the red cube"

Applying FCG CONSTRUCTION SET (170) in comprehension

initial structure

application process

* cube-lex-cxn (lex 0.50 cube *shape), red-lex-cxn (lex 0.50 red *color), material-lex-cxn (lex 0.50)

* base-nominal-cxn (nom 0.50), nominal-cxn (nom 0.50)

* unique-determined-cxn (cxn 0.50), what-t-is-cxn (cxn 0.50)

9, 9.00: hop-query-property-cxn (cxn 0.50 t)

applied constructions