After ChatGPT, Microsoft working on AI model that takes images as cues
As the war over artificial intelligence (AI) chatbots heats up, Microsoft has unveiled Kosmos-1, a new AI model that can respond to images and other visual cues as well as text prompts or messages.
The multimodal large language model (MLLM) can help with an array of new tasks, including image captioning, visual question answering and more.
Kosmos-1 can pave the way for the next stage beyond ChatGPT’s text prompts.
“A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context and follow instructions,” said Microsoft’s AI researchers in a paper.