Introduction
In the spirit of embracing innovation, iterative building, and learning by doing, the Glific team launched v2 of the AI support bot. The goal is to leverage the power of LLMs to dip into the Glific platform’s documentation and generate a quick response to the queries shared by NGOs (the platform’s clients) on the support channel. Read more on the context, prompts, and model used here. A quick shoutout to Discord and Jugalbandi’s API for making this automation possible (for those interested in diving into the code, go here).
Following is a review of the responses generated after one month of implementation.
Context
To evaluate, the following was done:
- All the data is exported to a Google Sheet in real time. We capture the NGO query, the LLM response, and other metadata needed for a drill-down at a later point.
- We came up with a rubric to rate the responses generated by the LLM (responses are rated on a scale of 1-4, with 1 being the lowest rating and 4 the highest):
  1. Irrelevant response
  2. Relevant but incomplete or inaccurate response
  3. Accurate but not actionable response
  4. Accurate and actionable response
- Reviewed each response on the 1-4 scale at the discretion of the support team (subject matter experts).
- Checked each link generated in the responses to make sure it pointed to a relevant documentation page.
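The rubric above can be sketched in a few lines of code. This is a hypothetical illustration; the names (`RUBRIC`, `label_rating`, `summarize`) are not from the actual Glific review pipeline.

```python
# The 1-4 review rubric, as described in the evaluation setup.
RUBRIC = {
    1: "Irrelevant response",
    2: "Relevant but incomplete or inaccurate response",
    3: "Accurate but not actionable response",
    4: "Accurate and actionable response",
}

def label_rating(score: int) -> str:
    """Map a reviewer's 1-4 score to its rubric description."""
    if score not in RUBRIC:
        raise ValueError(f"score must be 1-4, got {score}")
    return RUBRIC[score]

def summarize(scores: list[int]) -> dict[str, float]:
    """Share of reviewed responses at each rubric level, as percentages."""
    total = len(scores)
    return {
        RUBRIC[level]: round(100 * sum(s == level for s in scores) / total, 1)
        for level in RUBRIC
    }
```

A `summarize` call over a month's worth of reviewer scores would produce the kind of percentage breakdown reported in the observations below.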
Observations
- 36 queries were raised in a span of 30 days.
- About 30% of the responses were actionable; this means the support team just had to confirm to the NGO that the automated answer was accurate and to follow those steps. This is a good win in reducing the dependency on typing out steps for simple troubleshooting or looking up the documentation and resharing it.
- About 66% of the responses needed more action from the support team; a majority of this is attributed to the complexity of the query or the lack of specific information provided in the query.
- The 8% of inaccurate responses were attributed to a technical glitch in the servers, not to inaccurate answers by the LLM.
- About 55% of the links shared from the Glific documentation were broken, which is a problem statement to solve moving forward.
Next Steps:
- Going deeper into the responses rated 2 and 3 to understand the patterns and reasons why those responses were of lower quality.
- Going deeper into the links shared, to verify whether they were hallucinations, a URL-formatting issue, or something else altogether.
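One rough way to separate URL-formatting problems from hallucinated pages is to first validate the link's shape offline, and only then probe it over HTTP. The sketch below assumes the documentation lives under a single known host (`DOCS_HOST` is a placeholder, not the actual Glific docs domain); a malformed link points to a formatting bug, while a well-formed link that 404s is a likely hallucination.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Placeholder docs host; swap in wherever the Glific documentation is served.
DOCS_HOST = "glific.github.io"

def is_well_formed(url: str) -> bool:
    """Offline check: does the link even parse as an https docs URL?"""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.netloc == DOCS_HOST and bool(parts.path)

def link_status(url: str, timeout: float = 5.0) -> str:
    """Classify a link: 'malformed' (formatting issue), 'missing' (plausible
    URL that does not exist, i.e. a likely hallucination), 'ok', or
    'unreachable' (network problem, not the link's fault)."""
    if not is_well_formed(url):
        return "malformed"
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout):
            return "ok"
    except HTTPError:
        return "missing"
    except URLError:
        return "unreachable"
```

Running every LLM-shared link through `link_status` would turn the 55%-broken observation into a breakdown by failure type, which is what the drill-down above is after.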
