Most of the AI that people have been talking about recently has been based on language learning. These models are notoriously bad at common-sense concepts such as the basic laws of physics. This article discusses an AI model trained not on language but on video, to see whether it develops a common-sense understanding of what might be called intuitive physics: "our basic grasp of how the physical world works. We expect objects to behave predictably—they don't suddenly appear or disappear, move through solid barriers, or arbitrarily change their shape or color." It discusses two main approaches: structured models "suggesting humans have innate 'core knowledge' systems [and] pixel-based generative models." It proposes a 'middle ground' model, V-JEPA, which "consistently and accurately distinguished between physically plausible and implausible videos."