Generally speaking, the lack of specificity in how users communicate with GenAI increases the likelihood of the agent producing an incorrect output. This is a recognized problem across many (if not all) GenAI products, but it is also an opportunity to stand out, because the issue is solvable, or at least significantly controllable.
There are a number of strategies GenAI teams can adopt to progressively identify these situations of indetermination, and therefore ambiguity, which, coupled with the right user-centered approach, could reduce this inherent setback. The good news is that there are promising approaches proposed by the NN Group, and “Co-pilot” features are already experimenting with ways to proactively guide users toward resolving the issue.
In this article I focus on one specific use case (of many) that I believe could be low-hanging fruit for most GenAIs: polysemy.
In simple terms, polysemy refers to the phenomenon where a word has multiple meanings depending on context: ‘light’, for example, as a measure of weight or as the perception of electromagnetic radiation, as in “This bag is light” versus “turn off the light”. This applies to nouns, adjectives, verbs and even acronyms.
On the positive side, polysemic words are easy to identify and formally classified in most languages, which makes them suitable candidates for models to recognize and to proactively narrow the risk by “asking the customer” for disambiguation. But is this really an issue?
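As a rough illustration of how easy such candidates are to flag, here is a minimal sketch, assuming Python with NLTK and its WordNet corpus installed. It simply counts the noun senses WordNet lists for a word; anything with more than one sense is a potential ambiguity trigger. The word list and threshold are illustrative only, not part of the experiment described below.

```python
# Minimal sketch: flag potentially polysemic nouns via WordNet sense counts.
# Assumes `pip install nltk` and that nltk.download("wordnet") has been run.
from nltk.corpus import wordnet as wn

CANDIDATES = ["mail", "matches", "bug", "ball", "shows", "server"]

def noun_senses(word: str) -> int:
    """Return how many distinct noun senses WordNet lists for `word`."""
    return len(wn.synsets(word, pos=wn.NOUN))

for word in CANDIDATES:
    senses = noun_senses(word)
    if senses > 1:
        print(f"'{word}' is a disambiguation candidate ({senses} noun senses)")
```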
In my small experiment to understand whether this was a widespread problem, I used a rather small and simple sample of polysemic nouns and tested them across different public (and premium) LLMs, expecting them to identify a potential case of ambiguity and anticipate the risk of a wrong output (a sketch of how such a probe could be scripted follows the list).
>What's the best format for a mail?
>What are the best kind of matches?
>How can I plant a bug?
>How do I serve a ball?
>How do I become a PM?
>What are the most recommended shows?
>What's the average cost of a server?
None of the tested GenAI models succeeded in correctly interpreting more than half of the examples.
Some readers may argue that, given some inherent degree of ambiguity in user input, we should at some point be risk takers: adding layers of friction for all users in order to secure the right “context” is only acceptable when the distribution of intended meanings is fairly even, so based on the data the “most likely meaning” should prevail for the sake of the general user experience.
While I do believe this could be a valid argument, I wonder: how can we ensure inclusivity in models? How can we make sure that the minority of users expecting the least likely interpretation are not overlooked? How do we ensure that this UX-maximization approach doesn’t end up creating a model that discriminates against those users?
Perhaps the solution isn’t binary; perhaps it should be up to the users to decide. For those of you who know the “I’m Feeling Lucky” feature of the legendary Google search engine, this could be an interesting experiment to look into. For those of you who don’t, I invite you to try it.