The system infers emotions from images. It is primarily intended for game NPCs, so that they can react emotionally to the game environment.
The sentiment analysis model is an LSTM-based dense neural network that is fed Word2Vec embeddings. The model was trained on data generated with cohere.ai using the prompt:
"I felt <emotion> when I saw <img2txt output>"
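As a rough illustration of how the training prompt above could be filled in, the sketch below builds one prompt per emotion label for a single image description. The helper name, the emotion labels, and the idea of pairing each prompt with its emotion as a training label are assumptions made for the example, not details confirmed by the project.

```python
# Hypothetical helper: names and emotion labels are illustrative assumptions.
TRAINING_TEMPLATE = "I felt {emotion} when I saw {description}"


def build_training_prompts(description, emotions):
    """Return one (emotion, prompt) pair per emotion for one image description."""
    # Drop surrounding whitespace and a trailing period so the sentence reads cleanly.
    cleaned = description.strip().rstrip(".")
    return [
        (emotion, TRAINING_TEMPLATE.format(emotion=emotion, description=cleaned))
        for emotion in emotions
    ]


prompts = build_training_prompts("a burning village at night.", ["afraid", "sad"])
for label, prompt in prompts:
    print(label, "->", prompt)
```

Each prompt would then be sent to the text generator, and the generated text could be stored together with its emotion label.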
System Flow: img2txt -> cohere.ai generator -> text2emote -> LSTM+DenseNN
Images are passed to CLIP Interrogator (BLIP + CLIP ViT-B/32) to generate text descriptions. The cohere.ai generator then elaborates these descriptions into emotional responses using the prompt:
"When I saw <img2txt output>, I felt emotions such as"
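The flow described above can be sketched as a chain of three stages. This is a minimal sketch under stated assumptions: the stage functions are passed in as plain callables, and the stand-ins used here are hypothetical stubs, not the real CLIP Interrogator, cohere.ai, or LSTM model calls. Whether the classifier receives the full generated text or only the completion is also an assumption.

```python
ELICIT_TEMPLATE = "When I saw {description}, I felt emotions such as"


def build_elicitation_prompt(description):
    """Fill the inference prompt with an image description."""
    return ELICIT_TEMPLATE.format(description=description.strip().rstrip("."))


def run_pipeline(image, img2txt, generate, classify):
    """img2txt -> text generator -> emotion classifier."""
    description = img2txt(image)          # image -> text description
    prompt = build_elicitation_prompt(description)
    response = generate(prompt)           # prompt -> emotional response text
    return classify(response)             # text -> emotion label


# Hypothetical stubs standing in for the real components.
def fake_img2txt(image):
    return "a dragon circling a ruined tower."


def fake_generate(prompt):
    return prompt + " fear and awe."


def fake_classify(text):
    return "fear" if "fear" in text else "neutral"


result = run_pipeline(b"", fake_img2txt, fake_generate, fake_classify)
print(result)
```

Passing the stages as arguments keeps the sketch independent of any particular library, so each stub can be swapped for a real component without changing run_pipeline.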