Information is hierarchical in nature. Humans naturally see the world in terms of objects, made of objects, made of objects, but ML algorithms do not operate like that, and it is difficult for them to properly recognize objects, especially in a complex scene. GPT4V changes all that and can produce an exhaustive list of beliefs about the objects in an image, their relationship, but also the objective and conditions in which such relationship happens. E.g. a woman uses sunglasses to protect her eyes in bright daylight. GPT4 is then used to extract accurate fields of information from GPT4V-produced beliefs, such as the subject, object, action, objective and condition in which such action takes place. The results are quite impressive. The information is then sent to Neo4J to visualize it as a knowledge graph.
Category tags: