Threats in AI-generated code

Our “fellow” from Slovakia, Andrej Karpathy, former director of AI at Tesla where he led the Autopilot vision team, recently sparked the trend of vibe coding: accepting all the code the AI spits out and just riding the vibe. While this approach can have some success with smaller projects and codebases, it inevitably falls apart on security, because AI-generated code comes with no guardrails.

A study analyzing code from GitHub Copilot and other AI tools found that nearly 30% of Python snippets and 24% of JavaScript snippets had security weaknesses, including insufficient randomness and cross-site scripting. Another report indicated that almost half of the code snippets produced by five different AI models contained bugs that could potentially be exploited. These vulnerabilities often stem from the models' lack of contextual awareness, which leads them to generate code that doesn't follow security best practices. And there are other risks, too.
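To make the "insufficient randomness" class of weakness concrete, here is a minimal Python sketch of the pattern such studies flag: a password-reset token built with the predictable random module next to a safer variant using secrets. The function names are illustrative, not taken from any of the studies above.

```python
import random
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def reset_token_insecure(length: int = 32) -> str:
    """Pattern often seen in AI-generated snippets: random.choices() draws
    from a predictable PRNG (Mersenne Twister), so tokens can potentially
    be reconstructed by an attacker (insufficient randomness)."""
    return "".join(random.choices(ALPHABET, k=length))


def reset_token_secure(length: int = 32) -> str:
    """Safer variant: the secrets module draws from the OS-level CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print("insecure:", reset_token_insecure())
    print("secure:  ", reset_token_secure())
```

The insecure variant typically passes a quick human review, which is exactly why security linters such as Bandit flag uses of the random module in security-sensitive code.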

Slopsquatting and Package Hallucination

AI models sometimes suggest packages that don't exist. Attackers can exploit this by registering malicious packages under those hallucinated names, a practice termed "slopsquatting," which leads developers to inadvertently pull harmful code into their projects.

A notable example involves security researcher Bar Lanyado, who uploaded an empty package named "huggingface-cli"—a name hallucinated by AI models—to a public repository. Within three months, this package was downloaded over 30,000 times, demonstrating how attackers can exploit AI-generated code suggestions to distribute malicious or unwanted packages.
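One pragmatic defence is to verify that a dependency actually exists and has some history before installing it. The sketch below is a hypothetical helper, not anything Lanyado published: it queries the public PyPI JSON API and flags packages that are missing or suspiciously new, with the 90-day threshold as an arbitrary assumption.

```python
import json
import sys
import urllib.error
import urllib.request
from datetime import datetime, timezone


def check_pypi_package(name: str, min_age_days: int = 90) -> None:
    """Look up a package on PyPI and flag names that do not exist or were
    first published only very recently (a common trait of slopsquatting
    payloads registered against hallucinated names)."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            print(f"'{name}' does not exist on PyPI - possible hallucination.")
            return
        raise

    # Collect upload timestamps of every released file.
    uploads = [
        datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
        for files in data["releases"].values()
        for f in files
    ]
    if not uploads:
        print(f"'{name}' has no uploaded files - treat with suspicion.")
        return

    age_days = (datetime.now(timezone.utc) - min(uploads)).days
    verdict = "looks established" if age_days >= min_age_days else "suspiciously new"
    print(f"'{name}': first upload {age_days} days ago -> {verdict}")


if __name__ == "__main__":
    for pkg in sys.argv[1:]:
        check_pypi_package(pkg)
```

Run against a few names an AI assistant suggests (e.g. `python check_pkg.py requests huggingface-cli`) before adding them to a requirements file; age alone proves nothing, but a missing or days-old package is a strong signal to stop and investigate.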

This is closely linked to another trend in AI code generation: data poisoning, which involves injecting malicious code into the training data of AI models so that they generate vulnerable or biased outputs. Even small amounts of poisoned data can significantly degrade the security of AI-generated code, and such attacks are hard to detect.

A concerning instance of this is the "Pravda" disinformation network, which has been linked to Russian efforts to manipulate AI training data. This network floods the internet with misleading articles designed more for AI web crawlers than for human readers. Research reveals that a third of chatbot responses on targeted topics, like supposed U.S. bioweapons in Ukraine, repeated these falsehoods. Such operations exploit the data-hungry nature of AI models, leading to the propagation of disinformation.

Attackers are finding novel ways to exploit AI systems. It is imperative for developers and organizations to remain vigilant, conduct thorough code reviews, and implement robust security measures to mitigate these emerging threats.


Author: Oldřich Příklenk

Picture: chatgpt.com

