Here’s another entry in our journal about our AI experiments with Glific. If you missed our last post, check it out:
https://glific.org/from-experiments-to-integration-in-glific-llm-journey-so-far/
In April, OpenAI introduced File Search. File Search lets the Assistant access information from outside its model, like proprietary product data or user-provided documents. OpenAI takes care of parsing, chunking, embedding, and retrieving this content using vector and keyword search.
In short, this feature allows us to create a vector database on the fly, upload documents to act as a knowledge base, link the Assistant to this knowledge base, and then ask questions. It’s similar to our custom solution but managed entirely by OpenAI.
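To make the setup concrete, here is a minimal sketch of the request bodies involved, following OpenAI's public Assistants REST API (a vector store is created, documents are uploaded into it, and an assistant is linked to it via the `file_search` tool). The knowledge-base name, model, and instructions are illustrative placeholders, not our production configuration.

```python
import json

def vector_store_payload(name):
    # Body for POST /v1/vector_stores -- the "vector database on the fly"
    return {"name": name}

def assistant_payload(vector_store_id):
    # Body for POST /v1/assistants -- links the Assistant to the knowledge base
    return {
        "model": "gpt-4o",
        "instructions": "Answer only from the uploaded documents.",
        "tools": [{"type": "file_search"}],
        "tool_resources": {
            "file_search": {"vector_store_ids": [vector_store_id]}
        },
    }

# Example: inspect the assistant creation body for a hypothetical store id
print(json.dumps(assistant_payload("vs_example_id"), indent=2))
```

Once the assistant is created this way, questions are asked through the usual threads/runs endpoints, and OpenAI handles the retrieval behind the scenes.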
File Search offers capabilities similar to Jugalbandi and our recent LLM4Dev experiment. So, we ran some tests to compare these three tools and see how they stack up.

- Major Takeaway: Where OpenAI File Search responses are at least related to the topic of the question, LLM4Dev’s responses are only in the ballpark.
Thus, we decided to create a dedicated webhook for OpenAI File Search and integrate this feature into Glific.
Another recent development is the addition of text-to-speech support.
We already have speech-to-text in Glific through Google ASR, Navanatech, and Bhashini. You can check out how different NGOs are using these solutions here:
Next, we want to add text-to-speech capabilities. We’re considering Bhashini and OpenAI’s Whisper model. Before making a decision, we’re running some experiments to evaluate their quality.
From our results, Bhashini emerged as the better option, especially since we’re focused on Indic languages, which most of our NGO partners in India use.

- Major Takeaway: Whisper’s speech-to-text is not as good as Bhashini’s for transcribing Indic voice notes.
We’ve started integrating Bhashini into Glific so that all of its features can be used at the flow level, helping to bridge language gaps. For more details on Bhashini’s offerings, check out their documentation.
In addition to text-to-speech, Bhashini also offers translation, allowing us to convert responses into users’ preferred languages.
With this integration, we can use existing webhooks in various ways, and several NGOs have already started using these new features.
Generating Voice Notes from Text with Glific
To use Text-to-Speech in Glific, just add a call-a-webhook node with the text you want to convert and the user’s preferred language. The text will be turned into an audio note in their language and sent back to them in the flow.
Similarly, one can use Text-to-Speech with translation by passing two additional parameters: the source and destination languages. The text is translated first, and then the audio note is generated.
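The webhook body for both variants can be sketched as below. The field names (`text`, `language`, `source_language`, `target_language`) are illustrative assumptions, not Glific’s actual webhook contract; the idea is simply that the translation pair is optional.

```python
def tts_payload(text, language, source_language=None, target_language=None):
    """Build a hypothetical text-to-speech webhook body.

    Without a translation pair, the text is synthesized as-is in `language`;
    with both source_language and target_language set, the text is translated
    first and the audio note is generated from the translation.
    """
    payload = {"text": text, "language": language}
    if source_language and target_language:
        # Translate-then-synthesize variant
        payload["source_language"] = source_language
        payload["target_language"] = target_language
    return payload

# Plain synthesis in Hindi
plain = tts_payload("Namaste", "hi")
# English text translated to Hindi, then synthesized
translated = tts_payload("Hello, how are you?", "hi",
                         source_language="en", target_language="hi")
```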
Talk with GPT: Turning Text Responses into Voice Notes
Since Text-to-Speech takes text input along with the source and destination languages, one can use text from any source, such as responses generated by GPT, to create audio replies to user queries.
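The chaining itself is trivial: the GPT node’s text output becomes the `text` field of the next webhook call. A minimal sketch, with placeholder field names and a stand-in answer string (in a real flow the answer comes from the GPT node):

```python
def voice_note_from_gpt(gpt_text, source_language, target_language):
    # Feed a GPT-generated answer into the (hypothetical) TTS-with-translation
    # webhook body: translate from source_language, synthesize in target_language.
    return {
        "text": gpt_text,
        "source_language": source_language,
        "target_language": target_language,
    }

gpt_answer = "This is the assistant's answer to the student's question."
webhook_body = voice_note_from_gpt(gpt_answer, "en", "hi")
```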

For example, in Udhyam‘s case, a school student can ask questions about their SBIC (Societal and Business Innovation Challenge) program, which focuses on cultivating an entrepreneurial mindset among students.
Vision to Voice: Turning Image Analysis into Audio Feedback
Since Text-to-Speech can take text generated from any source, it can be combined with the GPTVision node, allowing users to share images, have those images analyzed, and receive an audio response.
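A sketch of the two stages: the image-analysis request follows OpenAI’s chat-completions format for image input (`image_url` content parts), and its text output then feeds the TTS step. The URL, prompt, and TTS field names are placeholders.

```python
def vision_request(image_url, prompt):
    # Chat-completions body with an image input, per OpenAI's vision format
    return {
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

def tts_request(feedback_text, language):
    # The vision model's text feedback becomes the TTS input
    return {"text": feedback_text, "language": language}

req = vision_request("https://example.com/activity.jpg",
                     "Give feedback on this weekly activity submission.")
```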
For instance, KEF has shown interest in this approach. Their use case involves parents submitting activities via WhatsApp on a weekly basis. By integrating GPTVision with text-to-speech, we can add an extra layer of quality control and offer detailed feedback on these submissions.

Seamless voice conversations with AI
We can combine three nodes: speech-to-text, GPT file search, and text-to-speech with translation. This flow is particularly useful for Sneha, where ASHA workers’ training materials can be uploaded as a knowledge base, and mothers can ask questions in Hindi or Hinglish, in text or voice; the bot generates responses and sends them back as text or voice notes. This makes it easier for mothers to access vital information in a language they’re comfortable with, enhancing their understanding and confidence.
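The three-node pipeline above can be sketched end to end. Each stage here is a stub standing in for the corresponding webhook (ASR, OpenAI File Search, Bhashini TTS); the function names, return shapes, and default language are illustrative only.

```python
def speech_to_text(audio_url, language):
    # Stub for the ASR webhook: would transcribe the voice note at audio_url
    return "transcribed question"

def file_search_answer(question):
    # Stub for the File Search assistant: would retrieve from the uploaded
    # training materials and generate a grounded answer
    return f"answer grounded in the knowledge base for: {question}"

def text_to_speech(text, language):
    # Stub for the TTS webhook body: would return a voice note in `language`
    return {"text": text, "language": language, "format": "audio"}

def voice_pipeline(audio_url, language="hi"):
    # speech-to-text -> file search -> text-to-speech, chained in flow order
    question = speech_to_text(audio_url, language)
    answer = file_search_answer(question)
    return text_to_speech(answer, language)
```

In the real flow, each stub is a call-a-webhook node, and the user can enter (and receive) either text or voice at each end of the chain.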
Read more about their initial pilot in this blog: Sneha Didi: Helping Embrace Motherhood with Confidence and Care.
As we keep exploring AI with Glific, adding features like OpenAI’s File Search and Bhashini’s text-to-speech is a big step forward. These tools not only make Glific better but also help us serve NGOs and their communities more effectively. By breaking down language barriers and improving access through audio responses, we can ensure that everyone gets the information they need, regardless of literacy levels. Over the next few months, we’ll work closely with NGOs, helping them adopt these new capabilities, gathering feedback, and making improvements step by step. Stay tuned for more updates as we strive to make a meaningful impact.