This article describes a strategy to produce artificial general intelligence (AGI) with what might be called weak semantic correlation with image-text pairs. Traditional models seek strong correlation, and are therefore based on feature-detection mechanisms whose outputs can be organized into (say) sentences. This requires a lot of overhead for limited capacity ("only millions of image-caption pairs are collected by years of human annotation"). Instead, "since there do not necessarily exist fine-grained region-word matches between image and text modalities, we drop the time-consuming object detectors." Without this overhead, the system is able to find patterns for vague concepts (e.g., 'nature' or 'time') and to find non-obvious and non-explicit relationships between things, like metaphors and analogies.
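To make the distinction concrete, here is a minimal sketch (my illustration, not code from the paper): under the weak-correlation approach, no object detector aligns image regions with individual words; each image and each caption is reduced to a single global embedding, and a contrastive objective pulls matching pairs together. The InfoNCE-style loss and the toy encoders are assumptions for illustration only.

```python
# Illustrative sketch only -- not the paper's actual implementation.
# "Strong" correlation models run an object detector to match image regions
# to words; the "weak" correlation approach described above compares just one
# global embedding per image and per caption.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of global image/text embeddings.

    No region-word alignment is needed: row i is a whole image, column j a
    whole caption, and the matching pairs sit on the diagonal.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature    # (batch, batch) similarities
    targets = torch.arange(logits.size(0))          # i-th image matches i-th text
    # Symmetric loss: image-to-text and text-to-image retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: in practice the embeddings would come from an image encoder and
# a text encoder; random vectors stand in for them here.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```

Because the loss only ever sees whole-image and whole-caption embeddings, nothing forces a word to line up with a detected region, which is what lets the model pick up diffuse concepts and implicit associations rather than explicit object labels.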