Downes.ca ~ Stephen's Web ~ Towards artificial general intelligence via a multimodal foundation model

Stephen Downes

Knowledge, Learning, Community

This article describes a strategy for producing artificial general intelligence (AGI) using what might be called weak semantic correlation between the images and texts in image-text pairs. Traditional models seek strong correlation, and are therefore based on feature detection mechanisms that can be organized into (say) sentences. This requires a lot of overhead for limited capacity ("only millions of image-caption pairs are collected by years of human annotation"). Instead, "since there do not necessarily exist fine-grained region-word matches between image and text modalities, we drop the time-consuming object detectors." Without this overhead, the system is able to find patterns for vague concepts (e.g., 'nature' or 'time') and to find non-obvious and non-explicit connections between things, like metaphors and analogies.



Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: May 16, 2024 10:21 p.m.

Creative Commons License.
