arXiv: Separating Secrets from Placeholders: A Hybrid CNN-CodeBERT Framework for Three-Class Credential Leakage Detection

AI_SAFETY AI Security & Safety · 29 May 2026 · arxiv_cscr

AI Analysis

This publication introduces a hybrid machine learning framework that combines CNN and CodeBERT architectures to detect credential leakage in source code. It uses three-class classification, distinguishing real secrets from placeholder tokens and benign code. The research addresses a gap in existing detection tools, which often produce high false-positive rates because they misclassify placeholders as genuine credentials. The paper is not a regulatory mandate, but it points to an emerging technical approach for AI-driven security compliance, which is particularly relevant under the EU AI Act’s requirements for robust risk management in high-risk AI systems.
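To make the three classes concrete, here is a minimal rule-based sketch in Python. It is not the paper's CNN-CodeBERT model, whose details this summary does not give. It is a deliberately naive baseline, and the regular expressions, the entropy threshold, and the names `Label`, `classify_line`, and `shannon_entropy` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from enum import Enum


class Label(Enum):
    SECRET = "secret"
    PLACEHOLDER = "placeholder"
    BENIGN = "benign"


# Hypothetical pattern: a credential-like variable name assigned a quoted value.
ASSIGNMENT = re.compile(
    r"""\b\w*(?:password|passwd|secret|token|api_?key)\w*\s*[:=]\s*["']([^"']+)["']""",
    re.IGNORECASE,
)

# Hypothetical hints that a value is a stand-in rather than a live credential.
PLACEHOLDER_HINTS = re.compile(
    r"your[_-]|example|changeme|placeholder|dummy|x{4,}|<[^>]+>|\$\{[^}]+\}",
    re.IGNORECASE,
)


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of the string in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_line(line: str, entropy_threshold: float = 3.5) -> Label:
    """Assign one of the three classes to a single line of source code."""
    match = ASSIGNMENT.search(line)
    if match is None:
        return Label.BENIGN
    value = match.group(1)
    if PLACEHOLDER_HINTS.search(value):
        return Label.PLACEHOLDER
    # Long, high-entropy values are treated as likely real secrets (assumed cutoffs).
    if len(value) >= 16 and shannon_entropy(value) >= entropy_threshold:
        return Label.SECRET
    return Label.BENIGN


if __name__ == "__main__":
    for sample in (
        'api_key = "YOUR_API_KEY_HERE"',
        'token = "q7Rk2ZpL9vXw4TbN8sYd3HfG6jMc1UeA"',  # made-up value
        "total = price * quantity",
    ):
        print(sample, "->", classify_line(sample).value)
```

Hand-written rules like these break easily: short real passwords fall through to the benign class, and unusual placeholders slip past the hint list. Closing gaps of this kind is the stated motivation for a learned three-class model.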

Organizations most affected include software development firms, cloud service providers, financial institutions, and any other entity handling sensitive credentials in code repositories. Sectors subject to strict data protection regulation, such as finance, healthcare, and critical infrastructure, should take note: more accurate detection can reduce the risk of non-compliance with GDPR, NIS2, and sector-specific data breach notification obligations.

Compliance teams should immediately review the false-positive rates of their current credential scanning tools and consider piloting this hybrid approach to improve detection accuracy. They should also update internal AI risk assessments to reflect this technical capability, document any reliance on such models for compliance controls, and prepare for potential regulatory scrutiny under the EU AI Act’s transparency and accuracy requirements for AI-based security systems.
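A review of false-positive rates could start from a small hand-labeled sample of scanner findings. The Python sketch below assumes each record pairs a reviewer-assigned label (`"secret"`, `"placeholder"`, or `"benign"`) with whether the existing scanner flagged it. The function name `false_positive_summary` and the record layout are hypothetical, and the sample data is invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def false_positive_summary(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Summarize a binary scanner's errors against three-class ground truth.

    Each record is (true_label, was_flagged), where true_label is one of
    "secret", "placeholder", or "benign".
    """
    totals: Counter = Counter()
    flagged: Counter = Counter()
    for true_label, was_flagged in records:
        totals[true_label] += 1
        if was_flagged:
            flagged[true_label] += 1

    negatives = totals["placeholder"] + totals["benign"]
    false_positives = flagged["placeholder"] + flagged["benign"]
    all_flagged = sum(flagged.values())

    return {
        # Share of non-secrets that the scanner flagged anyway.
        "false_positive_rate": false_positives / negatives if negatives else 0.0,
        # Share of flagged findings that were real secrets.
        "precision": flagged["secret"] / all_flagged if all_flagged else 0.0,
        # Share of false positives that were placeholders rather than benign code.
        "placeholder_share_of_false_positives": (
            flagged["placeholder"] / false_positives if false_positives else 0.0
        ),
    }


if __name__ == "__main__":
    sample = [
        ("secret", True), ("secret", True),
        ("placeholder", True), ("placeholder", True), ("placeholder", False),
        ("benign", True), ("benign", False), ("benign", False),
        ("benign", False), ("benign", False),
    ]
    for name, value in false_positive_summary(sample).items():
        print(f"{name}: {value:.3f}")
```

Reporting the placeholder share separately shows how much of a tool's noise a three-class detector could plausibly remove, which is useful evidence when deciding whether a pilot is worthwhile.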

