ChatGPT Evolves: Generative AI Chatbot Now Understands Voice and Images

OpenAI's ChatGPT has recently received a significant upgrade that makes it more versatile. The large language model chatbot can now take voice input, reply aloud, and answer questions about images that users share, making interactions more natural and intuitive.

These capabilities are part of OpenAI's effort to give users a more versatile interface. However, they are not available to free users. OpenAI is rolling out the voice and image features in ChatGPT only to Plus and Enterprise users over the next two weeks, and developers can expect access soon.

With voice input, users can hold back-and-forth conversations with ChatGPT simply by speaking. Whether it's a casual chat on the go, a bedtime story for the family, or settling a debate at the dinner table, voice makes interactions more convenient.

To use the image feature, users tap the photo button to capture or select an image; on iOS or Android, they tap the plus button first. Users can discuss multiple images at once or use the drawing tool to guide the conversation. ChatGPT's image understanding is powered by multimodal versions of GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide range of visual content, including photos, screenshots, and documents containing both text and images.

To get started with voice, users go to Settings > Features in the mobile app and opt into voice conversations. They can then tap the headphone button in the top-right corner of the home screen and choose from five voices. These voices are generated by a new text-to-speech model that can produce human-like audio from text and a short sample of speech, and OpenAI created them in collaboration with professional voice actors. Spoken input is transcribed into text by Whisper, OpenAI's open-source speech recognition system.

In short, image input adds a visual dimension to conversations: users can capture or select images, discuss what they show, and use the drawing tool to guide ChatGPT, with multimodal models combining language reasoning and visual understanding.

Voice conversations, meanwhile, are opt-in through the mobile app's settings and offer a choice of distinct voices from the new text-to-speech model, with Whisper handling speech recognition. Together, these features mark a significant step in ChatGPT's evolution toward a more natural, multimodal experience. As ChatGPT continues to evolve, it could become useful for an even wider range of applications.