
Reliance and Meta Forge ~$97M AI Alliance to Accelerate India’s Enterprise Intelligence


In a bold move to reshape India’s AI landscape, Reliance Industries Ltd (RIL) has announced a ₹855 crore (about $97 million) joint venture with Meta Platforms Inc., aimed at delivering scalable, enterprise-grade artificial intelligence solutions across sectors.

The announcement was made during the 48th Annual General Meeting of Reliance, where Chairman Mukesh Ambani unveiled a suite of AI initiatives designed to democratize intelligence for every Indian business.

Strategic Partnership: Meta’s Llama Meets Reliance’s Scale

The joint venture will leverage Meta’s open-source Llama models and combine them with Reliance’s deep domain expertise in telecom, retail, energy, and manufacturing. Planned offerings include:
  • Full-stack Platform-as-a-Service (PaaS) for Indian enterprises
  • Pre-configured AI tools for sales, customer engagement, IT operations, and finance
  • Sector-specific solutions for retail, telecom, energy, and manufacturing
“We will democratize AI for every Indian organization — from ambitious SMBs to corporates.”
Mukesh Ambani

“With Reliance’s reach and scale, we can bring AI to every corner of India.”
Mark Zuckerberg

Investment & Ownership

  • Total Investment: ₹855 crore (~$97 million)
  • Ownership Split:
    • Reliance: 70%
    • Meta: 30%
  • Independent operation with a mandate to build sovereign AI infrastructure

AI Infrastructure: Jamnagar Cloud Region

The JV complements Reliance’s newly announced Google Cloud region in Jamnagar:
  • Hosts Google’s AI hypercomputer
  • Runs entirely on Reliance’s green energy
  • Provides secure, scalable environments for generative AI development

National Impact

This partnership aligns with India’s broader AI ambitions, including the ₹10,370 crore IndiaAI Mission:
  • Empowers startups, SMBs, and corporates with affordable AI tools
  • Accelerates digital transformation across key industries
  • Supports India’s push for sovereign, ethical AI infrastructure

What’s Next
  • First suite of AI services expected by early 2026
  • Pilot programs underway in retail and telecom
  • Analysts see this as a pivotal moment in India’s tech evolution

Why LLMs Work Better with RAG—and What That Means for Enterprises


LLMs have transformed how we interact with information and technology. From chatbots and content creation tools to coding assistants and research aids, these models have shown impressive capabilities across domains. However, they are not without limitations. One of the most promising solutions to these limitations is Retrieval-Augmented Generation, or RAG. When combined, LLMs and RAG offer a powerful, more accurate, and enterprise-ready AI experience.

In the article below, Soham Dutta, Principal Technologist & Founding Member at DaveAI, explains why LLMs work better with RAG.

Why LLMs Work Better with RAG—and What That Means for Enterprises
Soham Dutta – DaveAI

The Limitations of Standalone LLMs

LLMs are trained on large amounts of data from the internet, books, academic papers, and more. During training, they learn to predict words and generate human-like text based on statistical patterns. But despite their language skills, these models do not truly understand facts. They cannot browse the internet, access live databases, or pull in real-time updates. Their knowledge is frozen at the time of training.

This can lead to a problem called hallucination, where the model generates incorrect or fictional information. Even when it sounds confident, it might be wrong. For example, if a user asks a financial LLM about the latest stock prices, the model cannot give an accurate answer unless it is connected to current data.

Another issue is that LLMs do not know anything specific about your organization unless that information was included in the training data. If you are a business leader hoping to use an LLM to answer questions about internal documents, customer data, or product catalogs, a standard LLM simply cannot help unless that information is added through other means.

What is Retrieval-Augmented Generation (RAG)?

RAG is a method that helps LLMs provide better, more reliable answers by adding a retrieval step before generating a response. When a user asks a question, the system first searches a connected knowledge base, like internal company documents or a web database. It then retrieves the most relevant pieces of information and feeds them to the LLM, along with the original query.

This combination allows the LLM to generate a response that is both fluent and accurate. Instead of guessing, the model uses real, retrieved content as its base. This method greatly reduces hallucination and helps the model stay grounded in the latest available facts.

For example, if a company uses RAG to connect its LLM to a database of technical manuals, the AI assistant can provide accurate support based on those manuals. If the company updates a policy document, the LLM can reflect those updates immediately because it fetches the content at the time of the query, not from a static memory.
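The retrieve-then-generate flow described above can be shown in a minimal, self-contained sketch. The bag-of-words similarity here is a stand-in for a real embedding model, and in a production system the assembled prompt would be sent to an LLM rather than printed:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real RAG systems use dense vector models.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Rank the knowledge base by similarity to the query; return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # Ground the model: retrieved context is prepended to the user question.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday, 9am to 6pm.",
]
print(build_prompt("How long do refunds take?", docs))
```

In practice, `retrieve` would query a vector database, and because the context is fetched at query time, updating a source document changes the model’s answers immediately, with no retraining.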

How RAG Enhances LLMs for Business Use

Enterprises are quickly realizing that the combination of RAG and LLMs creates smarter, more practical solutions for real-world use cases. With this pairing, businesses can offer AI assistants that understand natural language and also access company-specific knowledge.

In customer service, a RAG-enabled chatbot can answer questions by searching up-to-date FAQs, support tickets, or policy documents. This allows the company to offer detailed responses without training the model on every possible question. In marketing, a content generation tool can pull from brand guidelines or campaign briefs to generate on-brand content every time.

Sales teams can benefit as well. Instead of digging through scattered CRM records or pricing sheets, they can ask a smart assistant to retrieve the latest client data and generate a tailored email. Legal teams can scan contracts or compliance documents through natural queries. Engineers can find product specs or configuration settings without reading long manuals.

Enterprise-focused platforms like DaveAI are already demonstrating how LLMs paired with real-time data retrieval can transform product discovery and guided selling across digital channels.

By making enterprise data accessible through natural language, LLMs with RAG reduce the time spent searching for information and increase the accuracy of business decisions.

Benefits for Enterprise Adoption

The biggest benefit of RAG is that it makes AI systems more trustworthy. Enterprises cannot rely on hallucinated or out-of-date information. With RAG, they can control the source of truth. This improves user trust and opens the door for adoption across departments. RAG also supports real-time updates. If an organization adds new documents or changes an internal process, the system reflects those changes immediately. There is no need to retrain the LLM or wait for future versions. This creates a dynamic, living knowledge environment.

Scalability is another key advantage. RAG allows companies to use one central model while connecting it to different data sources for various use cases. Whether it is HR, finance, or operations, each department can maintain its own knowledge base, while the model serves as a unified language interface. In terms of security, RAG systems can be designed to respect internal access controls. Only authorized users can query sensitive information, and audit logs can track who accessed what. This level of control is important for industries like finance, healthcare, and law, where compliance matters.
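The access-control pattern mentioned above can be sketched simply: filter the candidate documents by the user’s roles before any ranking happens, so nothing a user cannot see ever reaches the prompt. This is an illustrative toy; the document schema and `acl` field are assumptions, not any specific product’s API:

```python
def retrieve_for_user(query, docs, user_roles):
    # Filter BEFORE retrieval: a user only ever searches documents
    # their roles permit, so sensitive text cannot leak into a prompt.
    visible = [d for d in docs if d["acl"] & user_roles]
    # ...rank `visible` by similarity to `query` here...
    return visible

docs = [
    {"text": "Payroll bands for 2025",  "acl": {"hr"}},
    {"text": "Public holiday calendar", "acl": {"hr", "staff"}},
]
print([d["text"] for d in retrieve_for_user("holidays", docs, {"staff"})])
```

Audit logging would then record the query, the user, and which filtered documents were returned.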

Finally, RAG improves personalization. A model can retrieve user-specific documents, emails, or records to tailor responses. This leads to more helpful interactions and a smoother user experience.

Implementation Challenges and Future Outlook

While the benefits are significant, setting up a RAG system is not without effort. First, businesses need to prepare their data. This includes converting documents into machine-readable formats and splitting them into smaller chunks that the model can process. Organizing this data into a searchable vector database is essential. Next comes integration. The retrieval engine, LLM, and user interface must be connected in a seamless pipeline. Tools like LangChain, Haystack, and commercial platforms like OpenAI’s API or Google’s Vertex AI are making this easier, but it still requires technical planning.
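The chunking step described above is often implemented as fixed-size windows with overlap, so that a fact straddling a boundary still appears intact in at least one chunk. A minimal sketch, where the size and overlap values are arbitrary illustrative choices, not recommendations:

```python
def chunk(text, size=200, overlap=40):
    # Fixed-size chunks with overlap, so content that straddles a
    # boundary still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "a" * 500  # stand-in for an extracted document
pieces = chunk(doc)
print(len(pieces), [len(p) for p in pieces])  # 3 chunks: 200, 200, 180 chars
```

Each chunk would then be embedded and stored in the vector database alongside its source-document metadata.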

Performance is another consideration. Retrieving documents and generating a response takes time, so systems need to be optimized for low latency. Techniques like caching frequent queries and indexing relevant documents can help improve speed. Despite these challenges, the trend is clear. More and more companies are investing in RAG-based solutions because the payoff is strong. As generative AI continues to grow, RAG will be a key part of making it usable, safe, and valuable in enterprise environments.
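Caching frequent queries, as suggested above, can be as simple as memoizing the retrieval call. A toy sketch using Python’s standard-library `lru_cache`; the `slow_search` function is a stand-in for a real vector-database lookup:

```python
from functools import lru_cache

CALLS = {"n": 0}  # counts how often the expensive search actually runs

def slow_search(query):
    # Stand-in for a vector-database lookup.
    CALLS["n"] += 1
    return (f"doc matching '{query}'",)

@lru_cache(maxsize=1024)
def cached_retrieve(query):
    # Identical queries are served from memory after the first lookup.
    return slow_search(query)

cached_retrieve("refund policy")
cached_retrieve("refund policy")  # cache hit: no second lookup
print(CALLS["n"])  # 1
```

A real deployment would also need cache invalidation when the underlying documents change, which is one reason caching is paired with freshness checks.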

Conclusion

LLMs are a powerful step forward in language technology, but they reach their full potential when paired with Retrieval-Augmented Generation. RAG gives LLMs the ability to access live, reliable, and domain-specific information. For enterprises, this means better accuracy, real-time relevance, and smarter decision-making across functions. While implementation takes planning, the combination of LLM and RAG is quickly becoming a cornerstone of modern AI strategy. Businesses that adopt this approach early will be better positioned to lead in the AI-driven future.

Meta & Cerebras Unleash AI Speed—18x Faster Than GPU-based Solutions


Meta has officially teamed up with Cerebras Systems to supercharge its Llama API, delivering inference speeds up to 18 times faster than traditional GPU-based solutions. This move positions Meta to compete directly with OpenAI, Anthropic, and Google in the AI inference market, where developers purchase tokens to power their applications.

Cerebras Systems is a cutting-edge AI hardware company specializing in wafer-scale computing, designed to accelerate deep learning and AI inference. Their Wafer-Scale Engine (WSE) is the largest semiconductor chip ever built, offering unprecedented speed and efficiency compared to traditional GPUs.

The Cerebras system enables over 2,600 tokens per second for Llama 4 Scout, compared to 130 tokens per second for ChatGPT and 25 tokens per second for DeepSeek. This speed boost unlocks real-time AI applications, including low-latency voice systems, interactive code generation, and instant multi-step reasoning.

This collaboration positions Cerebras as a major player in AI infrastructure, challenging Nvidia's dominance in AI hardware.

Meta’s shift from just providing open-source models to offering a full-service AI infrastructure marks a significant strategic evolution.

Meta’s partnership with Cerebras Systems could significantly reshape AI development. For instance, with over 2,600 tokens per second, this collaboration enables real-time AI applications that were previously impractical. Developers can now build low-latency voice assistants, interactive code generation tools, and instant multi-step reasoning systems.

Traditional AI inference relies heavily on GPUs, but Cerebras’ Wafer-Scale Engine offers an alternative that could challenge Nvidia’s dominance in AI hardware. This shift might encourage more companies to explore custom AI chips for efficiency gains.

For the uninitiated, AI inference is the process where a trained AI model applies its learned knowledge to make predictions or decisions on new data. It’s essentially the "thinking" phase of AI—where it takes what it learned during training and uses it in real-world applications.

By integrating Cerebras’ speed into the Llama API, Meta is making high-performance AI more accessible to developers worldwide. This could accelerate innovation across industries, from quick commerce automation to climate modeling.

Andrew Feldman, CEO and co-founder of Cerebras, said, “Cerebras is proud to make Llama API the fastest inference API in the world. Developers building agentic and real-time apps need speed. With Cerebras on Llama API, they can build AI systems that are fundamentally out of reach for leading GPU-based inference clouds.”

Cerebras is the fastest AI inference solution as measured by the third-party benchmarking site Artificial Analysis, reaching over 2,600 tokens/s for Llama 4 Scout, compared to ChatGPT at ~130 tokens/s and DeepSeek at ~25 tokens/s.

Foxconn Launches Its First AI Large Model 'FoxBrain'


Hon Hai Research Institute announced today the launch of the first Traditional Chinese Large Language Model (LLM), setting another milestone in the development of Taiwan’s AI technology with a more efficient and lower-cost model training method completed in just four weeks.

The institute, which is backed by Hon Hai Technology Group (“Foxconn”) (TWSE:2317), the world’s largest electronics manufacturer and leading technological solutions provider, said the LLM – code named FoxBrain – will be open sourced and shared publicly in the future. It was originally designed for applications used in the Group’s internal systems, covering functions such as data analysis, decision support, document collaboration, mathematics, reasoning and problem solving, and code generation.

FoxBrain not only demonstrates powerful comprehension and reasoning capabilities but is also optimized for Taiwanese users' language style, showing excellent performance in mathematical and logical reasoning tests.

"In recent months, the deepening of reasoning capabilities and the efficient use of GPUs have gradually become the mainstream development in the field of AI. Our FoxBrain model adopted a very efficient training strategy, focusing on optimizing the training process rather than blindly accumulating computing power,” said Dr. Yung-Hui Li, Director of the Artificial Intelligence Research Center at Hon Hai Research Institute. “Through carefully designed training methods and resource optimization, we have successfully built a local AI model with powerful reasoning capabilities."

The FoxBrain training process was powered by 120 NVIDIA H100 GPUs, scaled with NVIDIA Quantum-2 InfiniBand networking, and finished in just about four weeks. Compared with inference models recently launched in the market, the more efficient and lower-cost model training method sets a new milestone for the development of Taiwan's AI technology.

FoxBrain is based on the Meta Llama 3.1 architecture with 70B parameters. In most categories among TMMLU+ test dataset, it outperforms Llama-3-Taiwan-70B of the same scale, particularly exceling in mathematics and logical reasoning (For TMMLU+ benchmark of FoxBrain, please refer to Fig.1). The following are the technical specifications and training strategies for FoxBrain:
  • Established data augmentation methods and quality assessment for 24 topic categories through proprietary technology, generating 98B tokens of high-quality pre-training data for Traditional Chinese
  • Context window length: 128K tokens
  • Utilized 120 NVIDIA H100 GPUs for training, with total computational cost of 2,688 GPU days
  • Employed multi-node parallel training architecture to ensure high performance and stability
  • Used a unique Adaptive Reasoning Reflection technique to train the model in autonomous reasoning

In test results, FoxBrain showed comprehensive improvements in mathematics compared to the base Meta Llama 3.1 model. It achieved significant progress in mathematical tests compared to Taiwan Llama, currently the best Traditional Chinese large model, and surpassed Meta's current models of the same class in mathematical reasoning ability. While there is still a slight gap with DeepSeek's distillation model, its performance is already very close to world-leading standards.

FoxBrain's development – from data collection, cleaning and augmentation, to Continual Pre-Training, Supervised Finetuning, RLAIF, and Adaptive Reasoning Reflection – was accomplished step by step through independent research, ultimately achieving benefits approaching world-class AI models despite limited computational resources. This large language model research demonstrates that Taiwan's technology talent can compete with international counterparts in the AI model field.

Although FoxBrain was originally designed for internal group applications, in the future, the Group will continue to collaborate with technology partners to expand FoxBrain's applications, share its open-source information, and promote AI in manufacturing, supply chain management, and intelligent decision-making.

During model training, NVIDIA provided support through the Taipei-1 Supercomputer and technical consultation, enabling Hon Hai Research Institute to successfully complete the model pre-training with NVIDIA NeMo. FoxBrain will also become an important engine to drive the upgrade of Foxconn’s three major platforms: Smart Manufacturing, Smart EV, and Smart City.

The results of FoxBrain are scheduled to be shared for the first time during the NVIDIA GTC 2025 session talk “From Open Source to Frontier AI: Build, Customize, and Extend Foundation Models” on March 20.

IBM Unveils Its Latest AI Models - Granite 3.0 & The Bee Agent Framework


Technology giant IBM has recently introduced its latest AI models, Granite 3.0 and the Bee Agent Framework.

Granite 3.0 is a compact and efficient AI model designed to run on devices as accessible as a Mac, making it suitable for businesses of all sizes. It's trained on over 12 trillion tokens across 12 human languages and 116 programming languages, and it's available on platforms like GitHub, Hugging Face, and IBM WatsonX. This model is particularly useful for tasks like coding, summarization, and entity extraction.

The Bee Agent Framework allows developers to create versatile AI agents with minimal changes to existing models. It supports models like Llama 3.1 and can be customized using languages such as JavaScript or Python.

IBM is also emphasizing transparency in AI development by publicly sharing Granite's training datasets. This move aims to make AI more accessible and trustworthy for businesses.

IBM's focus on transparency and open-source innovation is evident in these new offerings, aiming to make AI more accessible and foster collaboration across industries.

Microsoft-owned Inflection AI and Intel Launch Enterprise AI System


Inflection AI, acqui-hired by Microsoft in June this year, and Intel have recently launched a new enterprise AI system called Inflection for Enterprise. It removes development barriers to accelerate hardware testing and model building.

This system is designed to provide businesses with powerful AI capabilities, including large language models (LLMs), to help them build custom, secure, and employee-friendly AI applications.

Essentially, Inflection for Enterprise is an AI system built around a multi-billion-parameter LLM that allows enterprises to own their intelligence in its entirety. Its foundational model is fine-tuned to each business and offers an empathetic, human-centric approach to enterprise AI.

The system is powered by Intel's Gaudi 3 AI accelerators, which are designed to deliver high performance and efficiency.

The service is available on Intel's Tiber AI Cloud, providing a managed cloud infrastructure for developing, accelerating, and deploying AI applications at scale.

Inflection AI's platform, Inflection 3.0, focuses on fine-tuning models using proprietary datasets to build enterprise-specific AI applications.

The system will be available as an industry-first AI appliance powered by Gaudi 3, expected to ship to customers in Q1 2025.

This collaboration aims to set a new standard for Al solutions that deliver immediate, high-impact results for enterprises.

Inflection AI and Intel will also enable developers to build enterprise applications for Inflection for Enterprise, leveraging the robust and human-centric Inflection 3.0 system, to generate critical software tools.

Inflection AI COO, Ted Shelton, said, "Every CEO and CTO we speak to is frustrated that existing AI tools on the market aren’t truly enterprise-grade. Enterprise organizations need more than generic off-the-shelf AI, but they don’t have the expertise to fine-tune a model themselves. We’re proud to offer an AI system that solves these problems, and with the performance gains we see from running on Intel Gaudi, we know it can scale to meet the needs of any enterprise.”

Inflection AI was founded by entrepreneurs Reid Hoffman (co-founder and executive chairman of LinkedIn), Mustafa Suleyman (CEO of Microsoft AI, and co-founder and former head of applied AI at DeepMind) and Karén Simonyan in 2022.

In June this year, Inflection AI was acquired by Microsoft for $650 million. Inflection AI co-founders Suleyman and Simonyan announced their departure from the company in order to start Microsoft AI, with Microsoft acqui-hiring nearly the entirety of Inflection AI's 70-person workforce.

Inflection AI has also collaborated with NVIDIA to develop hardware for generative artificial intelligence.

India Launches BharatGen, the World’s 1st Govt-Funded Multimodal LLM Project


India has just launched the BharatGen project, a pioneering initiative aimed at developing generative AI in Indian languages. This state-funded project is spearheaded by IIT Bombay under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS).

BharatGen is notable for being the world's first government-funded multimodal large language model project. It aims to create high-quality text and multimodal content in various Indian languages, making AI more accessible and inclusive. The project will benefit government, private, educational, and research institutions, and is expected to be completed in two years.

AtmaNirbhar Bharat, Promoting Indian Languages & Social Equity

By leveraging generative AI, the BharatGen project can help preserve and promote the rich linguistic diversity of India. This initiative not only supports cultural heritage but also ensures that technological advancements are inclusive and accessible to a broader population.

BharatGen aligns with the vision of Atmanirbhar Bharat by creating foundational AI models specifically tailored for India. By developing AI technologies within India, BharatGen reduces reliance on foreign technologies and strengthens the domestic AI ecosystem for startups, industries, and government agencies.

By democratizing access to AI through foundational models and detailed technical recipes, BharatGen allows innovators, researchers, and startups to build AI applications quickly and affordably. A core feature of BharatGen is its focus on data-efficient learning, particularly for Indian languages with limited digital presence. Through fundamental research and collaboration with academic institutions, the initiative will develop models that are effective with minimal data, a critical need for languages underserved by global AI initiatives. BharatGen will also foster a vibrant AI research community through training programs, hackathons, and collaborations with global experts.

One of the primary goals of BharatGen is to deliver generative AI models and applications as a public good. This means prioritizing India’s socio-cultural and linguistic diversity while ensuring that the benefits of AI reach all segments of society.

This initiative also aligns with India's broader goals of promoting social equity, cultural preservation, and linguistic diversity through advanced AI technologies.

Technical Aspects of BharatGen

The BharatGen project is being developed by a consortium led by IIT Bombay under the National Mission on Interdisciplinary Cyber-Physical Systems (NM-ICPS). The project is managed by the TIH Foundation for IoT and IOE at IIT Bombay.


Several premier academic institutions are involved in this initiative, including IIIT Hyderabad, IIT Mandi, IIT Kanpur, IIT Hyderabad, IIM Indore and IIT Madras.

This collaborative effort aims to create generative AI systems that can produce high-quality text and multimodal content in various Indian languages.

BharatGen focuses on developing multimodal large language models that can handle text, speech, and computer vision tasks. This means the models will be capable of understanding and generating content across different types of media. BharatGen will be developed as an open-source platform. This approach encourages collaboration and innovation, allowing researchers and developers to contribute to and benefit from the project.

The models will be built and trained using datasets that are specifically curated to represent Indian languages and contexts. This ensures that the AI is culturally and contextually relevant.

BharatGen’s roadmap outlines key milestones up to July 2026. These include extensive AI model development, experimentation, and the establishment of AI benchmarks tailored to India’s needs. BharatGen will also focus on scaling AI adoption across industries and public initiatives.

Accenture Invests in AI Startup Martian, Known for Patent-Pending LLM Router


Accenture has invested in Martian, a technology company known for its patent-pending Large Language Model (LLM) router. This router dynamically routes prompts to the most suitable LLM, optimizing for performance, cost, and reliability. 

Accenture plans to integrate Martian's technology into its "switchboard" services, which allow clients to customize and deploy LLMs tailored to specific data sources and use cases. This integration aims to enhance the effectiveness and efficiency of AI systems for enterprises.

This investment is part of Accenture Ventures' Project Spotlight, which focuses on working with companies that create disruptive enterprise technologies.

Martian was founded by Shriyash Upadhyay and Etan Ginsberg in 2022. The company is headquartered in San Francisco, California.

Regarding funding, Martian has raised a total of $9 million in seed funding. The investment came from notable venture capital firms including General Catalyst, New Enterprise Associates, Prosus Ventures, and Web3.com Ventures.

Martian’s Large Language Model (LLM) router is an innovative tool designed to optimize the use of various LLMs based on performance and cost. The router dynamically routes each query to the most suitable LLM in real-time. This ensures that the best-performing model is used for each specific task. By selecting the most cost-effective model for each query, the router can significantly reduce AI costs. Martian claims it can cut costs by up to 98%.

The router can outperform individual models, including GPT-4, by leveraging the strengths of multiple models. This results in higher overall performance.

If a model experiences an outage or high latency, the router automatically reroutes queries to other available models, ensuring continuous service.
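Martian’s actual router is proprietary, but the behavior described above (cost-aware selection plus automatic fallback on outage) can be illustrated with a toy sketch. The model table, costs, and capability tags below are invented for the example:

```python
# Hypothetical model table; names, costs, and capability tags are invented.
MODELS = [
    {"name": "small-fast",  "cost": 0.1, "healthy": True, "good_for": {"chat"}},
    {"name": "large-smart", "cost": 1.0, "healthy": True, "good_for": {"chat", "reasoning"}},
]

def route(task, models=MODELS):
    # Pick the cheapest healthy model capable of the task; if the usual
    # choice is down, the filter naturally falls back to the remaining
    # healthy candidates.
    candidates = [m for m in models if task in m["good_for"] and m["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy model can serve this task")
    return min(candidates, key=lambda m: m["cost"])["name"]

print(route("chat"))       # small-fast (cheapest capable model)
print(route("reasoning"))  # large-smart (only capable model)

MODELS[0]["healthy"] = False  # simulate an outage
print(route("chat"))          # large-smart (automatic fallback)
```

A production router would also score models on measured quality and latency per query type, rather than static tags.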

Martian’s API is designed to be simple to use, requiring minimal changes to existing codebases. This makes it easy for developers to integrate the router into their applications. As new models are developed, the router can incorporate them, ensuring that applications always use the latest and best-performing models.

Zoho Making SLMs With Up to 20 Billion Parameters: CEO Sridhar Vembu


Zoho, the Software-as-a-Service (SaaS) giant, is developing smaller artificial intelligence (AI) models, better known as Small Language Models (SLMs). According to founder-CEO Sridhar Vembu, these models are based on 7 billion to 20 billion parameters. The Chennai-based company has found that smaller models are better suited for specific domain problems. Additionally, Zoho aims to have its own graphics processing units (GPUs) infrastructure, which is more cost-effective in the long term.

"We are working on models that are based on 7 billion to 20 billion parameters…we are not doing the 500 parameter models as of now. We also want to have our very own graphics processing units (GPUs) infrastructure as that is cheaper in the long term,” Vembu said at CNBC TV18-Moneycontrol's Global AI Conclave.

Notably, Zoho integrates a range of large language models (LLMs) within its workflows. These LLMs are used to improve AI output by infusing them with customer and industry-specific data. Zoho essentially plays one LLM against another to achieve better results.

Zoho takes full advantage of Small(er) Language Models (SLMs) to control AI operating costs while maintaining high-quality outputs. By using smaller models with 7 billion to 20 billion parameters, Zoho aims to solve domain-specific problems for its customers.

SLMs like Llama, Mistral, Qwen, Gemma, or Phi3 are designed to be more efficient at focused tasks such as conversation, translation, summarization, and categorization. They offer tailored solutions that are not only cost-effective but also more accessible, allowing for a broader range of applications and innovations.

Zoho's Chief Evangelist, Raju Vegesna, emphasizes that the best AI implementation is when customers don't even notice they're using AI. In other words, the AI seamlessly enhances their experience without being intrusive.

Additionally, Zoho has revealed plans to develop its own large language model (LLM), similar to OpenAI's GPT model and Google's PaLM 2. Furthermore, the company is venturing into chipmaking and seeking incentives from the Indian government for this endeavor.

Accenture Launches Accenture AI Refinery™ Framework Built on Nvidia AI Foundry, Also Introduces Llama 3.1 Collection of Openly Available Models


Accenture has just introduced the Accenture AI Refinery™ framework, which leverages NVIDIA AI Foundry. This framework empowers clients to create custom Llama 3.1 language models. These models can be trained on enterprise-specific data and tailored to address unique business needs. By using generative AI, organizations can drive reinvention and transform their industry.

The Llama 3.1 collection is a set of openly available language models from Meta, now offered through the recently launched Accenture AI Refinery™ framework. Organizations can use these models as a foundation to build custom language models tailored to their specific needs. By refining and training these prebuilt models with proprietary data, businesses can create powerful AI solutions that address unique challenges in their industry.

The AI Refinery framework includes four key elements:

1. Domain Model Customization and Training: Refining prebuilt foundation models with proprietary data and processes.

2. Switchboard Platform: Allows users to select model combinations based on context, cost, or accuracy.

3. Enterprise Cognitive Brain: Indexes corporate data and knowledge for gen-AI applications.

4. Agentic Architecture: Enables autonomous AI systems to reason, plan, and propose tasks.

These services will be available to all customers using Llama in Accenture AI Refinery, which is built on the NVIDIA AI Foundry service, comprising foundation models, NVIDIA NeMo and other enterprise software, accelerated computing, expert support, and a broad partner ecosystem. Models created with AI Refinery can be deployed across all hyperscaler clouds with a variety of commercial options.

This development marks a significant step forward in enterprise generative AI adoption, allowing businesses to create and deploy custom models that align with their unique priorities and industry requirements.

Julie Sweet, chair and CEO of Accenture, said, “The world’s leading enterprises are looking to reinvent with tech, data and AI. They see how generative AI is transforming every industry and are eager to deploy applications powered by custom models. Accenture has been working with NVIDIA technology to reinvent enterprise functions and now can help clients quickly create and deploy their own custom Llama models to power transformative AI applications for their own business priorities.”

Jensen Huang, founder and CEO of NVIDIA, said, “The introduction of Meta’s openly available Llama models marks a pivotal moment for enterprise generative AI adoption, and many are seeking expert guidance and resources to create their own custom Llama LLMs. Powered by NVIDIA AI Foundry, Accenture’s AI Refinery will help fuel business growth with end-to-end generative AI services for developing and deploying custom models.”

Accenture is also using this AI Refinery framework to reinvent its enterprise functions, initially with marketing and communications and then extending to other functions. The solution is enabling Accenture to quickly create gen AI applications that are trained for its unique business needs.

Salesforce Announces Its Groundbreaking 1st Fully Autonomous AI Agent


Salesforce has just unveiled the Einstein Service Agent, their groundbreaking fully autonomous AI agent designed to revolutionize customer service interactions. The CRM software company claims that Einstein Service Agent makes conventional chatbots obsolete with its ability to understand and take action on a broad range of service issues without preprogrammed scenarios, helping make customer service far more efficient. 

1. Dynamic Understanding:

Unlike traditional chatbots — which can only handle specific queries that have been explicitly programmed into their system and don’t understand context or nuance — Einstein Service Agent is intelligent and dynamic.

Built on the Einstein 1 Platform, Einstein Service Agent uses large language models (LLMs) to analyze the full context of a customer's message, understanding nuance, and then autonomously determines the next actions to take.

2. Generative AI Responses:

It generates conversational responses using generative AI. These responses are grounded in a company's trusted business data, including Salesforce CRM data, and can be tailored to a company's brand voice, tone, and guidelines with a few clicks.

For service organizations, this means they can offload a large number of tedious inquiries that bog down their productivity so they can focus on tasks that require a human touch.

3. Efficiency and Availability:

Einstein Service Agent operates 24/7 across self-service portals and messaging channels. It offloads tedious inquiries, allowing human agents to focus on tasks requiring a personal touch.

4. Clear Guardrails:

Companies can define privacy and security guardrails using the Einstein 1 Platform, which leverages the Einstein Trust Layer to perform functions like masking personally identifiable information (PII) and defining clear parameters and guardrails for Einstein Service Agent to follow.

5. Cross-channel and multimodal innovation:

Einstein Service Agent can assist customers anytime across self-service portals and messaging channels, like WhatsApp, Apple Messages for Business, Facebook Messenger, and SMS. Because Einstein Service Agent understands text, images, video, and audio, customers can send photos when their issue is too difficult to explain in words.

For complex issues, the new AI agent seamlessly hands off to human agents.
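To make the guardrails point concrete, here is a deliberately simplistic Python sketch of PII masking of the kind a trust layer might perform before a message reaches an LLM; the patterns are illustrative only, not Salesforce's implementation:

```python
import re

# Illustrative-only PII masking: replace emails and phone numbers with labels
# before the text is sent to a language model. Real trust layers use far more
# robust detection than these toy regular expressions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def mask_pii(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(mask_pii("Reach me at jane.doe@example.com or +1 415-555-0100."))
# Reach me at <EMAIL> or <PHONE>.
```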

Currently in pilot, Einstein Service Agent will be generally available later this year, transforming how service teams operate and enhancing customer experience. 

NTT Launches Tsuzumi LLM through Microsoft Azure AI MaaS Offering


NTT DATA, a global digital business and IT services leader, recently launched Tsuzumi through the Microsoft Azure AI Models-as-a-Service (MaaS) offering. This development represents a significant milestone in their 25-year collaboration, dedicated to pioneering technological solutions that drive sustainability and innovation.

Tsuzumi is a Large Language Model (LLM) with robust capabilities in both Japanese and English. It's designed to address environmental and financial challenges typically associated with LLMs. By adjusting model size without compromising performance, Tsuzumi makes advanced AI technologies, including Generative AI, accessible to a wider range of users and applications. One of its key features is operational adaptability, allowing it to quickly adjust to specific use-case requirements and reduce service provisioning costs. Tsuzumi is initially available in Japan on the Azure MaaS platform, with plans underway to expand its availability to other regions.

Advancements are also planned in multimodality, which will further enhance Tsuzumi's sophisticated capabilities and ensure it meets the evolving needs of businesses across the globe.

Tsuzumi is available in two versions: an ultra-lightweight version with 600 million parameters (0.6B) and a lightweight version with 7 billion parameters (7B), roughly 1/300th and 1/25th the size, respectively, of OpenAI's 175-billion-parameter (175B) GPT-3. The lightweight version is designed to perform high-speed inference on a single GPU, while the ultra-lightweight version can do so on a CPU. This design significantly reduces the costs required for training, inference, and tuning.
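A quick sanity check of those size ratios (parameter counts in billions):

```python
# Verify the quoted ratios between GPT-3 and the two Tsuzumi versions.
gpt3 = 175.0
tsuzumi_light = 7.0   # lightweight version, runs on a single GPU
tsuzumi_ultra = 0.6   # ultra-lightweight version, runs on a CPU

print(round(gpt3 / tsuzumi_light))  # 25  -> "1/25th the size"
print(round(gpt3 / tsuzumi_ultra))  # 292 -> roughly "1/300th the size"
```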

The word tsuzumi refers to a Japanese hand drum with an hourglass-shaped body and two drum heads. The heads are taut, with cords that can be tightened or loosened to adjust the tension. The tsuzumi is used in traditional Japanese music, including Noh, Nagauta, geza, and folk music. Types of tsuzumi include the ko-tsuzumi and the san-no-tsuzumi, the latter played with a wooden stick.

"tsuzumi" is currently in the process of trademark application. Focusing on the processing performance of the Japanese language, it represents the expectation for language model technology that drives industrial development, similar to how the tsuzumi drum initiates the start of a Gagaku (ancient Japanese court music and dance) ensemble.

NTT DATA remains committed to continuous innovation, ensuring that Tsuzumi stays at the forefront of AI technology.

The launch of Tsuzumi comes at a time when Indian IT giant Tech Mahindra has also announced an Indian-languages-focused LLM, 'Project Indus', starting with the Hindi language and its dialects.

Tech Mahindra Launches 'Project Indus' LLM, Phase 1 Designed for Hindi and Its 37+ Dialects


Tech Mahindra has just introduced Project Indus, a large language model (LLM) designed to converse in a multitude of Indian languages and dialects.

Project Indus stands out due to its focus on Indic languages and dialects, making it a unique and valuable addition to the language model landscape.

To give a comparative perspective, well-known multilingual models like BERT and XLM are trained on a mix of languages, including English, but may not perform optimally for specific Indic languages. Project Indus, on the other hand, is tailored specifically for Indic languages, ensuring better accuracy and understanding.

Similarly, powerful LLMs like GPT-3 and GPT-4 are primarily trained on English and other major languages. Project Indus focuses on Indic linguistic diversity, addressing nuances and dialects that these models might miss.

Existing Indic-specific models such as TALNet and IndicBERT are valuable but may lack the scale and versatility of Project Indus.

In summary, Project Indus bridges the gap by offering a robust, scalable, and context-aware solution for Indian languages. Its focus on dialects and industry applications makes it a promising addition to the AI landscape.

Moreover, Tech Mahindra is collaborating with Dell Technologies and Intel to implement the project's 'GenAI in a box' framework globally. As part of this collaboration, Tech Mahindra will leverage Intel® Gaudi® AI Accelerators and AI training assets to train future generations of Indus models, and will skill up its employees on the Intel product portfolio (hardware and software) to provide GenAI expertise to its wide network of global customers across industries.

1. Foundational Model for Indic Languages:

  • Project Indus is an indigenous LLM developed by Tech Mahindra.
  • The first phase of Indus LLM focuses on the Hindi language and its 37+ dialects.
  • It aims to provide advanced AI solutions that enable enterprises to scale rapidly.

2. Innovative Deployment Framework: GenAI in a Box:

  • The Indus LLM will be implemented using an innovative framework called 'GenAI in a box'.
  • This solution simplifies the deployment of advanced AI models for enterprises.
  • It leverages Dell Technologies' high-performance computing solutions, storage, and networking capabilities.

3. Intel Collaboration:

  • The LLM also adopts Intel-based infrastructure solutions, including Intel® Xeon® Processors and oneAPI software.
  • Future-generation models will leverage CPU features such as Intel® Advanced Matrix Extensions (AMX).
  • Tech Mahindra is collaborating with Intel to train future generations of Indus models and to skill up its employees on the Intel product portfolio.

4. Industry Applications:

  • Project Indus aims to redefine AI-driven solutions across various industries.
  • Use cases include customer support, experience, content creation, and more in sectors like healthcare, rural education, banking, finance, agriculture, and telecom.

5. Dell Technologies' Perspective:

Denise Millard, Chief Partner Officer at Dell Technologies, emphasizes the importance of accessibility and scalability for organizations adopting AI.

The Dell AI Factory supports LLMs like Project Indus, promoting growth, productivity, and innovation.

Tech Mahindra has been making significant strides in offering next-gen solutions to enterprises worldwide. The company recently announced that it is building an LLM to preserve Bahasa Indonesia, the official and national language of Indonesia and its dialects. This collaboration further demonstrates Tech Mahindra's commitment to enabling enterprises to scale rapidly with technological advancements, building a future where AI solutions are accessible, scalable, and responsible.

In A Breakthrough, NTT Develops AI that Can Answer All Kinds of Questions Based on Document Images


Realize LLM-based visual machine reading comprehension technology

Towards "tsuzumi" that can read and understand visual documents

NTT Corporation has made significant progress in the field of artificial intelligence (AI) with their LLM-based visual machine reading comprehension technology. This breakthrough aims to enable AI systems to answer a wide range of questions based on document images, which is crucial for digital transformation (DX).

Real-world documents often contain both text and visual elements (such as icons, diagrams, etc.). However, existing AI models, including large language models (LLMs), primarily focus on understanding text information.

To address this limitation, NTT proposed visual machine reading comprehension technology. The goal was to create an AI system that can read and understand visual documents and information, similar to how humans do.

Comparison of Text-based and Visual Machine Reading Comprehension.

Previous visual machine reading comprehension techniques struggled with arbitrary tasks, such as information extraction from invoices. Achieving high performance without extensive training data was challenging.

NTT aimed to develop a visual machine reading comprehension model with high instruction-following ability, akin to LLMs.

NTT successfully developed a new visual machine reading comprehension technology that leverages the reasoning ability of LLMs.

The model visually understands documents by analyzing both text and visual information. It can answer complex questions involving diagrams, such as understanding pie charts or other visual representations.

The research results were presented at the 38th Annual AAAI Conference on Artificial Intelligence and received the Outstanding Paper Award at the 30th Annual Conference of the Association for Natural Language Processing.

Notably, this paper is the first to propose a specific methodology for LLM-based visual document understanding.

Tsuzumi


NTT's large language model, 'Tsuzumi', plays a central role in this technology. Tsuzumi is designed to address the energy-consumption challenges associated with large-scale LLMs, aiming to reduce learning and inference costs while maintaining high performance.

The name "Tsuzumi" symbolizes the start of a Gagaku (ancient Japanese court music and dance) ensemble, emphasizing its role in driving industrial development.


Technology

NTT's visual machine reading comprehension technology visually understands documents by utilizing the high reasoning ability of LLMs (figure below). To achieve this, NTT researchers (1) developed a new adapter technology that converts document images into the LLM's representations, and (2) constructed the first large-scale visual instruction tuning datasets for diverse visual document understanding tasks. These enable LLMs to understand the content of documents by combining vision and language information, and to perform arbitrary tasks without additional training.

Overview of LLM-based Visual Machine Reading Comprehension Technology.

LLMs with NTT's technology can be used for office work and daily-life situations that require human cognition, such as searching and screening documents and assisting in reading specialized literature.
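Conceptually, an adapter of the kind described above is, at its simplest, a learned projection from image-encoder features into the LLM's embedding space, so that visual tokens can be consumed alongside text tokens. The toy Python sketch below uses made-up dimensions and random weights purely to illustrate the shape of the operation, not NTT's actual method:

```python
import random

# Toy "adapter": a linear map from image-feature space to LLM embedding space.
# Dimensions and weights are invented; a real adapter's weights are learned.
random.seed(0)

IMG_DIM, LLM_DIM = 16, 32  # hypothetical feature sizes

# Random weights stand in for learned adapter parameters.
W = [[random.gauss(0, 0.02) for _ in range(LLM_DIM)] for _ in range(IMG_DIM)]

def adapt(image_features):
    """Project each image patch feature (IMG_DIM floats) to an LLM_DIM vector."""
    return [[sum(f[i] * W[i][j] for i in range(IMG_DIM)) for j in range(LLM_DIM)]
            for f in image_features]

patches = [[1.0] * IMG_DIM for _ in range(4)]  # 4 dummy image patches
tokens = adapt(patches)
print(len(tokens), len(tokens[0]))  # 4 32
```

The projected vectors would then be interleaved with text embeddings as input to the frozen LLM.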

In conclusion, NTT's breakthrough in LLM-based visual machine reading comprehension technology brings us closer to AI systems capable of understanding and answering questions based on visual documents, a critical step in the digital transformation journey.

This result is the outcome of joint research conducted with Professor Jun Suzuki of the Center for Data-driven Science and Artificial Intelligence, Tohoku University, in FY2023.

This technology will contribute to the development of important industrial services such as web search and question answering based on real-world visual documents. We aim to establish the technology to realize AI that creates new values by collaborating with humans, including work automation.

Apple Researchers Reveal New AI Model 'ReALM' Claiming That It Outperforms GPT-4



It appears that Apple has made significant strides in AI with their new model named ReALM. According to recent reports, ReALM is designed to be smaller and faster than GPT-4, particularly when parsing contextual data. This could make interactions with Siri more efficient, as ReALM is capable of converting context into text for easier processing by large language models.

In a research paper published on March 29, Apple researchers explain how the new AI system, called ReALM (Reference Resolution As Language Modeling), can look at what's on your screen and what you're doing to figure out what you need. This means Siri could understand the context of your questions much better than before, such as knowing what's on your screen or what music is playing.

On top of that, Apple researchers claim that the larger models of ReALM outperform GPT-4. If the claims come true, Siri could become much more helpful than ever. The report notes that Apple's ReALM language model purportedly surpasses GPT-4 in "reference resolution," understanding contextual references like onscreen elements, conversational topics, and background entities.

Apple's research suggests that even the smallest ReALM models perform comparably to GPT-4 with fewer parameters, making it well-suited for on-device use. With increased parameters, ReALM substantially outperforms GPT-4. 

Summary of the key findings from Apple's ReALM research paper:

Efficiency: ReALM is designed to be smaller and faster than large language models like GPT-4, making it well-suited for on-device use.

Reference Resolution: The model excels in reference resolution, which is the ability to understand context and ambiguous references within text. This is crucial for interpreting user commands in a more natural way.

Performance: Even the smallest ReALM models performed similarly to GPT-4 with far fewer parameters. When the number of parameters was increased, ReALM substantially outperformed GPT-4.

Image Parsing: Unlike GPT-4, which relies on image parsing to understand on-screen information, ReALM converts images into text, bypassing the need for advanced image recognition parameters. This contributes to its smaller size and efficiency.

Decoding Constraints: ReALM includes the ability to constrain decoding or use simple post-processing to avoid issues like hallucination, enhancing its reliability. 

Practical Applications: The paper illustrates practical applications of ReALM, such as enabling Siri to parse commands like "call the business" by understanding the context, like a phone number displayed on the screen.
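A toy Python sketch of the idea (not Apple's method): serialize on-screen entities into text and resolve a command like "call the business" against them. The screen contents and resolver logic are invented for illustration:

```python
# Invented screen state: a list of entities a device might expose to a model.
screen = [
    {"type": "text",  "value": "Giuseppe's Pizzeria"},
    {"type": "phone", "value": "415-555-0134"},
    {"type": "text",  "value": "Open until 10pm"},
]

def screen_to_text(entities):
    """Serialize screen entities into a numbered textual context for an LLM."""
    return "\n".join(f"[{i}] {e['type']}: {e['value']}"
                     for i, e in enumerate(entities))

def resolve(command, entities):
    """Naive resolver: a 'call ...' command binds to the first phone entity."""
    if command.startswith("call"):
        for e in entities:
            if e["type"] == "phone":
                return e["value"]
    return None

print(screen_to_text(screen))
print(resolve("call the business", screen))  # 415-555-0134
```

In ReALM the resolution itself is done by the language model over the serialized context; the hand-written resolver here only stands in for that step.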

Apple's research indicates that ReALM could significantly improve the speed and accuracy of Siri, making interactions with the voice assistant more intuitive and efficient. The company is expected to reveal more about its AI strategy during WWDC 2024.
 
This development is quite exciting as it indicates progress towards more responsive and intuitive AI systems that can better understand and process user commands. It's also a step forward in the integration of AI in everyday devices, potentially enhancing user experience significantly. Apple plans to unveil more about its AI initiatives in June, which could include further applications of ReALM.

Softbank Corp with SKT, Deutsche Telekom and Others Forming JV Company To Develop LLMs Specifically for Telcos


SKT, Deutsche Telekom, e&, Singtel, and SoftBank Corp. Announce Plan to Establish a Joint Venture
  • Joint Venture Company to develop Large Language Models (LLM) specifically for telecommunications companies
  • Joint Venture Company to be established within this year
  • Initial fine-tuning of models takes place
SK Telecom (“SKT”), Deutsche Telekom, e& Group, Singtel and SoftBank Corp., on Monday, held the inaugural meeting of the Global Telco AI Alliance (GTAA) at MWC Barcelona 2024 and announced their plans to establish a joint venture.

The meeting was attended by SK’s Chairman Chey Tae-won, SKT’s CEO Ryu Young-sang, Deutsche Telekom’s CEO Tim Höttges and DT’s Board Member for Technology & Innovation, Claudia Nemat, e& Group’s Group CEO Hatem Dowidar, Singtel Group’s CEO Yuen Kuan Moon, and SoftBank’s CISO Tadashi Iida.

Through the Joint Venture Company, the five companies plan to develop Large Language Models (LLMs) specifically tailored to the needs of telecommunications companies (telcos). The LLMs will be designed to help telcos improve their customer interactions via digital assistants and chatbots.

The goal is to develop multilingual LLMs optimized for languages including Korean, English, German, Arabic and Japanese, with plans for additional languages to be agreed among the founding members.

The joint venture plans to focus on deploying innovative AI applications tailored to the needs of the Global Telco AI Alliance members in their respective markets, enabling them to reach a global customer base of approximately 1.3 billion across 50 countries. Deutsche Telekom has about 250 million subscribers across 12 countries, including Germany and the U.S. The e& Group has 169 million subscribers in 16 countries across the Middle East, Asia, and Africa, while the Singtel Group has 770 million subscribers in 21 countries, including Australia, India, and Indonesia.

The Joint Venture Company will be established within this year.

Earlier this month, the GSMA and IBM announced a collaboration to support the adoption of generative AI, and the skills it requires, in the telecom industry through the launch of the GSMA Advance AI Training program and the GSMA Foundry Generative AI program.

Compared to general LLMs, telco-specific LLMs are more attuned to the telecommunications domain and better at understanding user intent. By making it easier for telcos to deploy high-quality generative AI models swiftly and efficiently, telco-specific LLMs are expected to help accelerate AI transformation of various telco business and services, including customer service.

The LLMs are currently being optimized: telcos' customer-service data is used to fine-tune the models for telco-specific questions. Details such as tariff and contract models, or information on special hardware like routers (e.g., "How do I do a reset?"), are rarely found in the general training data of large models, yet this is exactly the content a telco bot needs in order to understand, summarize, and respond to such concerns.

This targeted training ensures the LLM understands the unique language and needs of telecom operators, paving the way for enhanced, personalized, and efficient customer experiences.
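As an illustration of what such fine-tuning data might look like, the sketch below packages invented telco Q&A pairs as JSONL, a common format for supervised fine-tuning; the examples and format choice are assumptions, not the alliance's actual pipeline:

```python
import json

# Invented telco customer-service Q&A pairs, serialized one JSON object per
# line (JSONL), a format widely used for supervised fine-tuning datasets.
examples = [
    {"prompt": "How do I reset my router?",
     "completion": "Hold the reset button for 10 seconds until the lights blink."},
    {"prompt": "What does the roaming day pass cost?",
     "completion": "Day-pass pricing varies by plan; please check your tariff details."},
]

jsonl = "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)
print(jsonl.count("\n") + 1)  # 2 records
```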

“We as telcos need to develop tailored LLM for the telco industry to make telco operations more efficient, which is a low-hanging fruit. Our ultimate goal is to discover new business models by redefining relationships with customers. The Global Telco AI Alliance brings synergy to its members by allowing them to achieve more by working as a team,” said Ryu Young-sang, CEO of SKT.

"We want our customers to experience the best possible service. AI helps us do that. Already today, more than 100,000 customer service dialogs a month in Germany are handled by Generative AI. By integrating telco-specific large language models, our 'Frag Magenta' chatbot becomes even more human-centric: AI personalizes conversations between customers and chatbots. And our joint venture brings Europe and Asia closer together,” said Claudia Nemat, Board Member Deutsche Telekom for Technology and Innovation.

“This is a monumental step for e& and for the Telco industry at large. From streamlining customer support interactions to enabling personalised recommendations, this multi-lingual LLM will revolutionise how businesses engage with customers”, said Dena Almansoori, Group Chief AI and Data Officer, e& group. “In collaboration with our Global AI Telco Alliance partners, we look forward to shaping both the present and future of customer engagement and setting new standards for efficiency and innovation across the telecommunications landscape to better serve our customers and create meaningful impact."

"This promises to be a game changer not just for us at Singtel but for any telecom company out there looking to lift their customer experience beyond limited automated responses and generic chatbot interactions. This multi-lingual LLM tailored for telcos will greatly expand chatbot capabilities with relevant responses to customers’ technical queries, freeing up service agents to deal with more complex customer issues and we intend to deploy this across the Singtel Group. With leading telcos from three different continents working on this innovative model, this unprecedented effort to scale AI development for the telecom industry would not have been possible had we all decided to go it alone,” said Yuen Kuan Moon, Group Chief Executive Officer, Singtel.

“Through a powerful alliance with industry leaders, we embark on a mission to revolutionize global communication, elevate service quality, and ignite a new era of technological innovation powered by AI. Together, we have the power to shape the future of telecommunications, empowering communities worldwide with seamless connectivity and boundless opportunities,” said Hideyuki Tsukuda, Executive Vice President & CTO of SoftBank Corp.

Google’s Open Source Models Gemma Optimized to Run on NVIDIA GPUs


NVIDIA, in collaboration with Google, on Wednesday launched optimizations across all NVIDIA AI platforms for Gemma, Google's state-of-the-art new lightweight 2 billion- and 7 billion-parameter open language models that can be run anywhere, reducing costs and speeding innovative work for domain-specific use cases.

Teams from Google and NVIDIA closely worked together to accelerate the performance of Gemma — built from the same research and technology used to create the Gemini models — with NVIDIA TensorRT-LLM, an open-source library for optimizing large language model inference, when running on NVIDIA GPUs in the data center, in the cloud and on PCs with NVIDIA RTX GPUs.

This allows developers to target the installed base of over 100 million NVIDIA RTX GPUs available in high-performance AI PCs globally.

Developers can also run Gemma on NVIDIA GPUs in the cloud, including on Google Cloud’s A3 instances based on the H100 Tensor Core GPU and soon, NVIDIA’s H200 Tensor Core GPUs — featuring 141GB of HBM3e memory at 4.8 terabytes per second — which Google will deploy this year.

Enterprise developers can additionally take advantage of NVIDIA’s rich ecosystem of tools — including NVIDIA AI Enterprise with the NeMo framework and TensorRT-LLM — to fine-tune Gemma and deploy the optimized model in their production application.

Learn more about how TensorRT-LLM is revving up inference for Gemma, along with additional information for developers. This includes several model checkpoints of Gemma and the FP8-quantized version of the model, all optimized with TensorRT-LLM.
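To see why quantized checkpoints shrink memory use and speed up inference, here is a toy round-trip through 8-bit symmetric integer quantization. Real FP8 is a floating-point format, so this integer sketch only illustrates the general idea of trading precision for size:

```python
# Toy 8-bit symmetric quantization: map floats into [-127, 127] integers via a
# single scale factor, then reconstruct and measure the worst-case error.
weights = [0.12, -0.5, 0.33, 0.99, -0.07]

scale = max(abs(w) for w in weights) / 127   # map the largest |w| to 127
q = [round(w / scale) for w in weights]      # 8-bit integer codes
deq = [v * scale for v in q]                 # reconstructed floats

max_err = max(abs(a - b) for a, b in zip(weights, deq))
print(all(-128 <= v <= 127 for v in q), max_err < scale)  # True True
```

Each weight now needs one byte instead of four (for FP32), which is the memory saving quantized model checkpoints exploit.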

Interested developers and users can experience Gemma 2B and Gemma 7B directly from their browser on the NVIDIA AI Playground.

Gemma Coming to Chat With RTX


Adding support for Gemma soon is Chat with RTX, an NVIDIA tech demo that uses retrieval-augmented generation and TensorRT-LLM software to give users generative AI capabilities on their local, RTX-powered Windows PCs.

Chat with RTX lets users personalize a chatbot with their own data by easily connecting local files on a PC to a large language model.

Since the model runs locally, it provides results fast, and user data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.
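The retrieval step of retrieval-augmented generation can be sketched minimally as ranking local documents against a query; the scoring below is deliberately naive word overlap, not what Chat with RTX actually uses:

```python
# Toy local document store; in a real RAG setup these would be files on disk
# and scoring would use embeddings rather than raw word overlap.
docs = {
    "notes.txt":   "quarterly sales figures and revenue projections",
    "recipe.txt":  "tomato soup recipe with basil and garlic",
    "minutes.txt": "meeting minutes about the revenue forecast review",
}

def retrieve(query, documents):
    """Return filenames sorted by descending word overlap with the query."""
    q = set(query.lower().split())
    score = lambda text: len(q & set(text.lower().split()))
    return sorted(documents, key=lambda name: score(documents[name]), reverse=True)

print(retrieve("revenue projections", docs)[0])  # notes.txt
```

A local LLM would then answer the query using the top-ranked document as context, which is why the data never needs to leave the machine.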


Google Introduces Gemma, a Family of Open-Source Lightweight GenAI Models That Can Be Run Anywhere


Google has just introduced a new generation of open models to assist developers and researchers in building AI responsibly: Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology used to create Google's Gemini models.

Google says that Gemma models achieved exceptional benchmark results at its 2B and 7B sizes, even outperforming some larger open models.

Developed by Google DeepMind and other teams across Google, Gemma is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.”

Google is also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models.

Gemma is available worldwide, starting today.

In addition, Google has also released a new Responsible Generative AI Toolkit together with Gemma to help developers and researchers prioritize building safe and responsible AI applications. The toolkit includes:
  • Safety classification: a novel methodology for building robust safety classifiers with minimal examples.
  • Debugging: a model debugging tool that helps developers investigate Gemma's behavior and address potential issues.
  • Guidance: best practices for model builders based on Google's experience in developing and deploying large language models.

Free credits for research and development

Gemma is built for the open community of developers and researchers powering AI innovation. You can start working with Gemma today using free access in Kaggle, a free tier for Colab notebooks, and $300 in credits for first-time Google Cloud users. Researchers can also apply for Google Cloud credits of up to $500,000 to accelerate their projects.

Sarvam AI Collaborates with Microsoft to Bring Indian Languages to Azure


  • Sarvam AI collaborates with Microsoft to bring its Indic voice large language model (LLM) to Azure
  • The collaboration aims to enable Sarvam AI to leverage Azure AI and Azure Infrastructure to build and deploy its voice LLM stack
  • It also aims to enable developers to build generative AI apps at scale and empower enterprises to adopt generative AI quickly and responsibly

Indian generative AI startup Sarvam AI has announced that it is working with Microsoft to make its Indic voice large language model (LLM) available on Azure. The collaboration reinforces Microsoft’s commitment to enabling AI-driven growth and innovation in India.

Sarvam AI is building generative AI models targeting Indic languages and contexts. The startup aims to make the development and deployment of generative AI apps in India more accurate and cost effective.

Sarvam AI’s Indic voice LLM, which intends to provide a natural voice-based interface to LLMs, will initially be available in Hindi. Sarvam AI is working to expand coverage to more Indian languages while ensuring support for colloquial language use.

Pratyush Kumar, co-founder of Sarvam AI, said, “We are very excited to collaborate with Microsoft to make advanced AI technology accessible and impactful for everyone in India. This partnership embodies our vision of ‘Sarvam’, meaning ‘all’, by enhancing AI’s reach across various Indian languages and sectors.”

Voice is one of the most natural interfaces for generative AI applications in Indian languages and can be applied in sectors such as education, finance, healthcare, and customer service. By working to make Sarvam AI’s Indic voice LLM available on Azure, Microsoft is laying the foundations for more India-focused developers to build real-time, voice-based generative AI apps at scale.

The companies will also collaborate to help enterprises adopt generative AI quickly and responsibly. Vivek Raghavan, co-founder of Sarvam AI, said, “We see great synergy in combining the deep-tech expertise at Sarvam AI with Microsoft’s leadership in frontier models and the Copilot stack to empower enterprises to be successful in Generative AI”.

Sarvam AI will use Microsoft’s cutting-edge cloud and AI infrastructure – including Azure OpenAI Service and Azure Machine Learning – to train, host, and scale its LLMs quickly and efficiently. As part of the collaboration, Sarvam AI and Microsoft are also researching ways to better support Indian languages in Microsoft’s Generative AI frontier language models and to deploy them at scale.

“At Microsoft, we are committed to enabling AI for everyone, empowering India’s transformation into an AI-first nation,” said Puneet Chandok, president of Microsoft India & South Asia. “Through our collaboration with Sarvam AI, we are not just supporting homegrown innovation – we are fostering a future where every individual, regardless of their language or background, can benefit from the power of voice-driven AI solutions. Together, we are taking a significant step toward enabling India’s people, communities, and organizations to achieve more.”


About Sarvam AI

Sarvam AI is on a mission to lead transformative research in AI to make the development, deployment, and distribution of generative AI applications in India more robust, better performing, and cheaper. The company aims to develop a full stack of generative AI solutions, including efficient large-scale Indic language models and an enterprise-grade platform for building generative AI apps. Sarvam AI is also committed to fostering a generative AI ecosystem through open-source contributions and large-scale data curation for the public good.


Bengaluru-based Sarvam AI Releases 1st Hindi LLM 'OpenHathi' with GPT-3.5 Like Performance


Bengaluru-based Sarvam AI has announced the release of OpenHathi-Hi-v0.1, the first Hindi LLM in its OpenHathi series of models. Trained under compute and data constraints, this generative AI model achieves GPT-3.5-like performance on Indic languages on a frugal budget, claims the five-month-old Sarvam AI, which recently raised funds from Lightspeed Ventures, Peak XV and Khosla Ventures.

Through the OpenHathi series, Sarvam AI aims to contribute open models and datasets to the ecosystem and encourage innovation in Indian-language AI.

OpenHathi is developed by Sarvam AI in partnership with AI4Bharat, a research lab at IIT Madras that develops open-source datasets, tools, models and applications for Indian languages. AI4Bharat contributed language resources and cross-lingual benchmarks.

Built on top of Meta AI's Llama2-7B, the model extends Llama2-7B’s tokenizer to 48,000 tokens and undergoes a two-phase training process:

1) Embedding alignment: aligns the randomly initialised Hindi embeddings

2) Bilingual language modeling: teaches the model to attend cross-lingually across tokens.
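Sarvam AI has not published its training code, so as a rough illustration of what the first phase involves, the NumPy sketch below extends a toy embedding table with randomly initialised rows for new Hindi tokens and applies a gradient update only to those new rows, leaving the original vocabulary's embeddings frozen. All sizes and names here are illustrative, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: the real model has a 32K base vocabulary,
# 16K added Hindi tokens, and 4096-dimensional embeddings.
base_vocab, hidden = 32, 8
new_tokens = 16

base_emb = rng.normal(size=(base_vocab, hidden))
hindi_emb = rng.normal(size=(new_tokens, hidden))  # randomly initialised

# Extended table: original rows followed by the new Hindi rows.
emb = np.vstack([base_emb, hindi_emb])

# Phase 1 (embedding alignment): only the new Hindi rows are trainable.
trainable = np.zeros(emb.shape[0], dtype=bool)
trainable[base_vocab:] = True

def apply_grad(emb: np.ndarray, grad: np.ndarray, lr: float = 0.1) -> np.ndarray:
    # Apply a gradient step, masking out the frozen (original) rows.
    out = emb.copy()
    out[trainable] -= lr * grad[trainable]
    return out

grad = rng.normal(size=emb.shape)
updated = apply_grad(emb, grad)

# The original vocabulary's embeddings are untouched.
assert np.allclose(updated[:base_vocab], base_emb)
```

In the second phase, all parameters would be unfrozen and the model trained on bilingual text so it learns to attend across the old and new token embeddings.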



The Tokenization of Hindi

Tokenization refers to splitting text into smaller parts for easier machine analysis, helping machines understand human language.

To add Hindi skills to Llama2, Sarvam AI first reduced the fertility score (the average number of tokens a word is split into) of its tokeniser on Hindi text, making both training and inference faster and more efficient. The team trained a sentence-piece tokeniser with a vocabulary size of 16K on a subsample of 100K documents from the Sangraha corpus, created at AI4Bharat, then merged it with the Llama2 tokeniser to create a new tokeniser with a 48K vocabulary (the original 32K plus the added 16K).
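The fertility metric itself is simple to compute. The sketch below uses a toy fixed-size-chunk tokenizer as a stand-in for a real subword tokenizer such as SentencePiece; only the metric follows the definition above, and all function names are illustrative:

```python
def chunk_tokenize(word: str, max_piece: int = 3) -> list[str]:
    # Toy stand-in for a subword tokenizer: split a word into
    # fixed-size character pieces.
    return [word[i:i + max_piece] for i in range(0, len(word), max_piece)]

def fertility(text: str, tokenize=chunk_tokenize) -> float:
    # Average number of tokens per whitespace-separated word.
    words = text.split()
    return sum(len(tokenize(w)) for w in words) / len(words)

print(fertility("namaste duniya"))  # (3 + 2 pieces) / 2 words = 2.5
```

A tokenizer with more Hindi subwords in its vocabulary splits Hindi words into fewer pieces, driving this number toward 1.0 and cutting the sequence lengths the model must process.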

For the datasets behind OpenHathi's base model, Sarvam AI partnered with VerSe, the Hindi social media platform Koo, and Kissan AI (previously known as KissanGPT), a multilingual AI chatbot built to help Indian farmers.

This open-sourced base model of OpenHathi has been trained with bilingual language modelling and therefore needs task-specific fine-tuning before it can be used as an instruction-following model. It has also not been alignment-tuned, so it can occasionally generate inappropriate content picked up during its original pretraining.

The company is inviting people to innovate on top of this latest release in the OpenHathi series by building fine-tuned models for different use cases. Sarvam AI will additionally release enterprise-grade models on its full-stack GenAI platform, which will launch soon. The base model is available on Hugging Face.

The startup claims that the latest OpenHathi release works as well as, if not better than, GPT-3.5 on various Hindi tasks while maintaining its English performance. Along with standard NLG tasks, the company also evaluated the model on a set of non-academic, real-world tasks.

