New
research from HiddenLayer reveals inherent vulnerabilities that impact Google's
new Gemini LLM. Users of Gemini Advanced with Google Workspace as well as the
LLM API could face serious issues. Researchers discovered that attackers could
manipulate another users' queries and output. The vulnerability is the latest
problem to emerge on the Gemini platform. In late February, Google removed
Gemini's image generator from service due to content bias, and yesterday Google
placed new restrictions on the platform that prevent it from responding to
global election queries, out of concerns about biased or manipulated content.
The
HiddenLayer findings bring new context to growing public concerns about AI
platform susceptibility to content manipulation and the potential for
misinformation, which have prompted legislative initiatives to regulate
Generative AI technology.
In
the report "New
Google Gemini Content Manipulation Vulns Found - Attackers Can Gain Control of
Users' Queries and LLM Data Output - Enabling Profound Misuse,"
HiddenLayer disclosed multiple prompt hacking vulnerabilities that enable
attackers to conduct activities that allow for misuse and manipulation.
Examples of potential misuse cited included the ability to output
misinformation about global events, multiple avenues that enabled system prompt
leakage, and the ability to inject a model indirectly with a delayed payload
via Google Drive.
Vulnerabilities
HiddenLayer discovered include:
- System
prompt leakage,
which can cause the LLM to disclose the exact instructions it has been given.
- Prompted
Jailbreak,
which enables the generation of misinformation through clever rewording of
prompts to bypass guardrails requiring truthfulness of content.
- Reset
Simulation,
in which uncommon words and phrases prompted Gemini to repeat previous
instructions it received, even if those instructions (such as a password) were
specified as secret.
- Indirect
Injection: a
prompt injection attack in which external data that the AI system ingests is
manipulated or malicious instructions are fed to the platform by an external
resource, such as a website or API call, potentially resulting in poisoned
outputs.
HiddenLayer
researchers warned that the ramifications of the vulnerabilities could be
widespread, and that with the accelerating adoption of LLM AI, companies must
be vigilantly aware of risks and abuse methods that Gen AI and Large Language
Models may suffer, and shore up their policies and defenses.