This article describes a strategy to produce artificial general intelligence (AGI) with what might be called weak semantic correlation with image-text pairs. Traditional models seek strong correlation, and are therefore based on feature-detection mechanisms whose outputs can be organized into (say) sentences. This requires a lot of overhead for limited capacity ("only millions of image-caption pairs are collected by years of human annotation"). Instead, "since there do not necessarily exist fine-grained region-word matches between image and text modalities, we drop the time-consuming object detectors." Without this overhead, the system is able to find patterns for vague concepts (e.g., 'nature' or 'time') and to find non-obvious and non-explicit relationships between things, like metaphors and analogies.
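To make the distinction concrete, here is a minimal sketch (my illustration, not code from the paper): under the weak-correlation approach, no object detector aligns image regions with individual words; each image and each caption is reduced to a single global embedding, and a contrastive objective pulls matching pairs together. The InfoNCE-style loss and the toy encoders are assumptions for illustration only.

```python
# Illustrative sketch only -- not the paper's actual implementation.
# "Strong" correlation models run an object detector to match image regions
# to words; the "weak" correlation approach described above compares just one
# global embedding per image and per caption.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of global image/text embeddings.

    No region-word alignment is needed: row i is a whole image, column j a
    whole caption, and the matching pairs sit on the diagonal.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature    # (batch, batch) similarities
    targets = torch.arange(logits.size(0))          # i-th image matches i-th text
    # Symmetric loss: image-to-text and text-to-image retrieval directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy usage: in practice the embeddings would come from an image encoder and
# a text encoder; random vectors stand in for them here.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```

Because the loss only ever sees whole-image and whole-caption embeddings, nothing forces a word to line up with a detected region, which is what lets the model pick up diffuse concepts and implicit associations rather than explicit object labels.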