As part of Google’s vision to build a new generation of Artificial Intelligence (AI) models, inspired by the way people understand and interact with the world, Google on Wednesday introduced Gemini, which the company describes as the most capable and general model it has ever built.
Gemini, according to the tech giant, is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods for testing the knowledge and problem-solving abilities of AI models.
According to Google documents, Gemini comes in three sizes: Ultra (which Google claims is its most capable and largest model, built for highly complex tasks), Pro (which Google says is its best model for scaling across a wide range of tasks) and Nano (which Google positions as its most efficient model for on-device tasks).
Known for its innovation, Google invests heavily in emerging technologies such as artificial intelligence, machine learning, and quantum computing. The company's influence extends beyond technology into areas like digital advertising, where it plays a pivotal role in shaping online experiences.
“Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video,” says Demis Hassabis, CEO and co-founder of Google DeepMind, a subsidiary of Alphabet Inc., Google's parent company, which is known for its cutting-edge work in the field of machine learning and AI research.
“We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models — and its capabilities are state of the art in nearly every domain,” Hassabis added.