Current content moderators can process either text or images, but not both, so moderating multiple content types requires separate applications. Their output is also binary: it flags content without explaining why it is inappropriate. Our solution is the multimodal moderator, a single application that checks whether text or an image is appropriate. It can “understand” the message and explain why certain content is not appropriate. We built it with Zapier, using Discord as the front end. Zapier is a no-code tool that connects Discord to the AI model without requiring a constantly running program. The engine is GPT-4 Vision hosted on the Clarifai platform, which checks whether the text or image is appropriate and sends back a narrative explaining its reasoning.
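The project itself is assembled in Zapier without code, so the following Python is only an illustrative sketch of the data flow described above: one request type that carries text, an image, or both, and a result that pairs the verdict with the model's explanation. Every name here (`ModerationRequest`, `ModerationResult`, `moderate`, `ask_model`, the prompt wording) is a hypothetical stand-in, and `ask_model` represents whatever step forwards the prompt and image to GPT-4 Vision; it is not a Clarifai or Zapier API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ModerationRequest:
    """A message to moderate: text, an image URL, or both (hypothetical type)."""
    text: Optional[str] = None
    image_url: Optional[str] = None


@dataclass
class ModerationResult:
    """Verdict plus the narrative explanation, instead of a bare yes/no."""
    appropriate: bool
    explanation: str


# Assumed prompt wording; the real prompt used in the project is not given.
INSTRUCTIONS = (
    "You are a content moderator. Decide whether the message is appropriate. "
    "Reply with APPROPRIATE or INAPPROPRIATE on the first line, "
    "then explain your reasoning on the following lines."
)


def build_prompt(request: ModerationRequest) -> str:
    """Combine the instructions with the text part of the message, if any."""
    if request.text is None and request.image_url is None:
        raise ValueError("nothing to moderate: provide text, an image, or both")
    if request.text is None:
        return INSTRUCTIONS
    return f"{INSTRUCTIONS}\n\nMessage text: {request.text}"


def parse_reply(reply: str) -> ModerationResult:
    """Split the reply into a verdict line and an explanation.

    Anything other than an exact APPROPRIATE verdict is treated as
    inappropriate, so an unparseable reply fails closed.
    """
    verdict, _, explanation = reply.strip().partition("\n")
    appropriate = verdict.strip().upper() == "APPROPRIATE"
    return ModerationResult(appropriate, explanation.strip())


def moderate(
    request: ModerationRequest,
    ask_model: Callable[[str, Optional[str]], str],
) -> ModerationResult:
    """Send the prompt and optional image to the model and parse its reply."""
    reply = ask_model(build_prompt(request), request.image_url)
    return parse_reply(reply)


def fake_model(prompt: str, image_url: Optional[str]) -> str:
    """Canned reply used only to demonstrate the flow without a network call."""
    return "INAPPROPRIATE\nThe message contains a direct threat."


result = moderate(ModerationRequest(text="example message"), fake_model)
print(result.appropriate, "-", result.explanation)
```

Passing the model call in as a function keeps the sketch independent of any particular platform: the same `moderate` function works whether the request holds text, an image URL, or both, which mirrors the single-application design described above.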
Category tags: