A recent study finds that software engineers who use code-generating AI systems are more likely to cause security vulnerabilities in the apps they develop. The paper, co-authored by a team of researchers affiliated with Stanford, highlights the potential pitfalls of code-generating systems as vendors like GitHub start marketing them in earnest.

“Code-generating systems are currently not a replacement for human developers,” Neil Perry, a PhD candidate at Stanford and the lead co-author on the study, told TechCrunch in an email interview. “Developers using them to complete tasks outside of their own areas of expertise should be concerned, and those using them to speed up tasks that they are already skilled at should carefully double-check the outputs and the context that they are used in in the overall project.”

The Stanford study looked specifically at Codex, the AI code-generating system developed by San Francisco-based research lab OpenAI. (Codex powers Copilot.) The researchers recruited 47 developers — ranging from undergraduate students to industry professionals with decades of programming experience — to use Codex to complete security-related problems across programming languages including Python, JavaScript and C.

Codex was trained on billions of lines of public code to suggest additional lines of code and functions given the context of existing code. The system surfaces a programming approach or solution in response to a description of what a developer wants to accomplish (e.g. “Say hello world”), drawing on both its knowledge base and the current context.

According to the researchers, the study participants who had access to Codex were more likely to write incorrect and “insecure” (in the cybersecurity sense) solutions to programming problems compared to a control group. Even more concerningly, they were more likely to say that their insecure answers were secure compared to the people in the control.