My work pursues a single thesis: that the security of modern AI systems is governed, at the deepest level, by the geometry of their representation space. Most state-of-the-art models — face recognizers, speaker recognizers, vision–language models — embed their inputs into a high-dimensional hypersphere; the way that hypersphere is partitioned determines what an attacker can extract and what a defender can hide. My research treats this geometry as the primary object of study, rather than as background, and uses it to organize three research threads.
Modern face-recognition pipelines map an input to an embedding on a high-dimensional hypersphere $\mathbb{S}^{d-1}$. Within that hypersphere sit attribute subspaces — low-dimensional regions along which identity-relevant attributes vary. Once that structure is understood, surprisingly little information is needed to invert a template or impersonate an identity.
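This geometry can be sketched in a few lines. The following is a minimal numpy stand-in (the `embed` function, dimensions, and threshold are illustrative assumptions, not any particular recognizer): once embeddings are L2-normalized onto $\mathbb{S}^{d-1}$, the match decision is a threshold on the inner product, i.e. on the angle between embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # a typical face-embedding dimension

def embed(x):
    # Stand-in for a real recognizer: any map into R^d followed by
    # L2 normalization places the embedding on the hypersphere S^{d-1}.
    return x / np.linalg.norm(x)

# Two noisy measurements of the same underlying identity, and one impostor
identity = rng.standard_normal(d)
probe_a = embed(identity + 0.1 * rng.standard_normal(d))
probe_b = embed(identity + 0.1 * rng.standard_normal(d))
impostor = embed(rng.standard_normal(d))

# On the unit sphere, cosine similarity is just the inner product,
# so the accept/reject decision is a threshold on the angle.
def match(u, v, threshold=0.8):
    return float(u @ v) >= threshold

print(match(probe_a, probe_b))   # same identity: embeddings stay close
print(match(probe_a, impostor))  # independent identity: near-orthogonal
```

Everything in the threads below, attack and defense alike, is ultimately a statement about how this thresholded inner product partitions the sphere.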
Most prior black-box attacks on face recognition rely on adaptive query strategies: tens of thousands of queries, each one chosen in response to the last. My work introduced the first non-adaptive counterpart — attacks that fix all queries up front. Using only a handful of images and a single batch query, the attack reconstructs a target identity or generates a perturbation-free impersonation image, and it succeeds against deployed commercial APIs (Amazon Rekognition, Tencent, among others). This line of work appeared at IEEE S&P 2024 and NeurIPS 2025.
The non-adaptive framework extends naturally beyond faces. Palmprint recognition, speaker recognition, and the identity-verification stacks increasingly used in finance and authentication all live on similar hyperspherical embedding spaces, and all inherit the same structural weaknesses. A current direction analyzes how the source–target embedding distance controls the difficulty of a targeted attack, and frames query-budget splitting as an optimization problem.
So far my published attacks have stayed within biometrics — faces and speech. The natural next step is to push the same machinery onto models that are not biometric at all but inherit the same geometry. CLIP and the vision–language models built on top of it use the same hyperspherical embedding primitives, and an ongoing line of work uses high-level prompting to iteratively refine images so that their embeddings drift across the CLIP decision boundary, defeating modern deepfake detectors without any pixel-level perturbation. Strong attacks also yield the highest-purity adversarial training data we can produce, so the endgame is a closed loop in the spirit of GANs: use mathematically grounded attacks to generate worst-case data, then use that data to train recognizers and detectors that are demonstrably more robust.
Passwords match exactly. Biometric embeddings — and, more broadly, the feature vectors inside any modern recognizer — do not: every measurement carries a different noise realization, and the resulting embedding drifts. The whole point of using a biometric is to spare the user the burden of carrying a secret, so the basic challenge of biometric template protection (BTP) is to authenticate fuzzy data without a secret key while keeping the underlying biometric private. Standard cryptography is not directly applicable, and the right repair depends on the deployment setting.
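The gap between exact and fuzzy matching is easy to make concrete. In this toy sketch (noise level, dimension, and threshold are illustrative assumptions), the password-style approach of hashing the stored bytes fails on the very first re-measurement, while a threshold on the angle succeeds:

```python
import hashlib
import numpy as np

rng = np.random.default_rng(1)
d = 128

identity = rng.standard_normal(d)
identity /= np.linalg.norm(identity)

def measure(v, noise=0.02):
    # Each presentation carries fresh sensor noise, so no two
    # measurements of the same biometric yield identical embeddings.
    noisy = v + noise * rng.standard_normal(d)
    return noisy / np.linalg.norm(noisy)

enroll, probe = measure(identity), measure(identity)

# Password-style exact matching: hash the bytes and compare.
h = lambda v: hashlib.sha256(v.tobytes()).hexdigest()
print(h(enroll) == h(probe))        # the drift breaks exact matching

# Biometric-style matching: tolerate the drift with an angular threshold.
print(float(enroll @ probe) > 0.85) # same identity, close on the sphere
```

This is precisely why hashing a template the way one hashes a password is not an option, and why BTP needs noise-tolerant cryptographic machinery.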
In verification (1:1 matching), the absence of a secret key is exactly the design constraint. Storing the template in plaintext leaks the underlying biometric the moment the database is breached, and encrypting it under a user-held key reintroduces the burden biometrics was meant to remove. What is needed instead is a public-key-only transformation that keeps the embedding matchable but irreversible — so that the protected template can be released without leaking the source biometric. My work on the verification side combines real-valued error-correcting codes with hash functions to do exactly this: IronMask (CVPR 2021), Deep Face Template Protection in the Wild (Pattern Recognition 2025), and SilverMask (IEEE Access 2026), the last introducing a fine-grained noise-correction mechanism.
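The general ECC-plus-hash pattern can be illustrated with the classical fuzzy-commitment construction over binary strings. This is a deliberately simplified sketch — a repetition code and a binarized template, not the real-valued codes of IronMask or the noise-correction mechanism of SilverMask — but it shows the shape of the idea: only public helper data and a hash are stored, and the code absorbs measurement noise before the exact hash comparison.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(2)
k, r = 16, 9            # 16 key bits, each repeated 9 times (repetition code)
n = k * r

def rep_encode(bits):   # encode: repeat each key bit r times
    return np.repeat(bits, r)

def rep_decode(bits):   # decode: majority vote within each block of r bits
    return (bits.reshape(k, r).sum(axis=1) > r // 2).astype(np.uint8)

H = lambda bits: hashlib.sha256(bits.tobytes()).hexdigest()

# --- Enrollment: bind a random codeword to the binarized template ---
template = rng.integers(0, 2, n).astype(np.uint8)  # stand-in binary template
key = rng.integers(0, 2, k).astype(np.uint8)
codeword = rep_encode(key)
helper = codeword ^ template    # public helper data; hides both parts
stored = H(key)                 # only a hash of the key is stored

# --- Verification: a noisy re-measurement of the same biometric ---
flips = (rng.random(n) < 0.05).astype(np.uint8)    # ~5% of bits flip
probe = template ^ flips
recovered = rep_decode(helper ^ probe)  # = decode(codeword ^ flips)
print(H(recovered) == stored)   # the code corrects the noise; hashes agree
```

The verification side sees only `helper` and `stored`, both of which can be made public; matchability survives because the error-correcting code eats the noise before the exact comparison.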
In identification (1:N search) the operational picture is different. At airport-scale or building-access-control scale, a single long-lived secret key held by the deciding party is a reasonable assumption — the convenience constraint is on the user side, not the operator. That assumption opens the door to homomorphic encryption: the server can match a probe against millions of enrolled templates without ever decrypting them. The catch is cost — a naïve homomorphic distance over every template is far too slow in practice. IDFace (ICCV 2025) closes that gap with an efficient encoding paired with an almost-isometric transformation, pushing large-scale homomorphic identification into a practical regime.
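The structure that makes this tractable is worth spelling out. For unit-norm templates, cosine similarity is a plain inner product, so 1:N identification is a single matrix–vector product over the gallery — exactly the low-multiplicative-depth operation one wants to evaluate under homomorphic encryption, where packing many templates per ciphertext amortizes the cost. The plaintext numpy sketch below shows only this algebraic reduction (gallery size, noise level, and the probe index are illustrative assumptions; it is not IDFace's encoding or transformation):

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 512, 10_000                       # embedding dimension, gallery size

gallery = rng.standard_normal((N, d))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

probe = gallery[42] + 0.05 * rng.standard_normal(d)  # noisy re-measurement
probe /= np.linalg.norm(probe)

# For unit vectors, cosine similarity is an inner product, so scoring
# the whole gallery is one matrix-vector product -- the operation a
# homomorphic scheme must evaluate, with templates packed so that many
# of the N comparisons ride in each ciphertext operation.
scores = gallery @ probe
print(int(np.argmax(scores)))            # index of the best match
```

The engineering question IDFace answers is how to encode this product so the homomorphic evaluation stays within a practical latency budget at million-template scale.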
Cutting across both is the question of which transformations are actually irreversible. Many published BTP schemes — locality-sensitive hashing, random projection, learned transformations — turn out to be invertible under modern decoders. IEEE TDSC 2025 gives a unified account of when LSH-based protections leak the underlying template, and when they do not.
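A small experiment conveys why sign-based projections are dangerous. This is a generic one-bit reconstruction argument in the spirit of that analysis, not the TDSC paper's actual decoder, and the dimensions are illustrative: for Gaussian hyperplanes, $\mathbb{E}[\mathrm{sign}(w^\top x)\,w] \propto x$, so simply averaging the hyperplanes weighted by their stored sign bits recovers the template's direction.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 64, 4096          # template dimension, number of LSH hyperplanes

x = rng.standard_normal(d)
x /= np.linalg.norm(x)   # the "protected" template lives on S^{d-1}

W = rng.standard_normal((m, d))
bits = np.sign(W @ x)    # the stored sketch: m sign bits

# Generic decoder: average the Gaussian hyperplanes weighted by their
# sign bits; the expectation is proportional to x, so with enough bits
# the estimate converges to the template's direction.
x_hat = W.T @ bits
x_hat /= np.linalg.norm(x_hat)

print(float(x @ x_hat))  # close to 1: the sketch leaks the template
```

Since recognition on the sphere only ever uses direction, recovering the direction is recovering the biometric — which is why "irreversibility" claims need proofs, not intuition.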
Many adjacent problems reduce to the same primitive: securely computing cosine similarity between high-dimensional vectors. Privacy-preserving retrieval for RAG, fuzzy private set intersection, and other high-dimensional similarity-search settings all share the geometry studied above and connect naturally to this line of work.
Empirical robustness is fragile — a recognizer that resists today's attacks can fall to tomorrow's. Certified robustness sidesteps the arms race by giving a mathematical guarantee: no adversarial perturbation within a specified radius can flip the decision. Adapting this framework to recognition is non-trivial: the standard randomized-smoothing toolbox assumes classification, while recognition lives on a metric space.
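The classification-style certificate this work starts from is compact enough to state in code. Below is the standard randomized-smoothing radius in the style of Cohen et al., using the common two-class bound $p_B \le 1 - p_A$ so the certified $\ell_2$ radius reduces to $\sigma\,\Phi^{-1}(p_A)$; the noise level and probabilities are illustrative, and this is the classification setting, not yet the metric-space adaptation the papers develop:

```python
from statistics import NormalDist

ppf = NormalDist().inv_cdf   # standard normal inverse CDF

def certified_radius(p_a: float, sigma: float) -> float:
    # If the smoothed model returns the top class with probability p_a
    # under N(0, sigma^2 I) noise, no L2 perturbation smaller than this
    # radius can flip the decision (two-class bound: p_b <= 1 - p_a).
    return sigma * ppf(p_a)

# The guarantee strengthens with the model's confidence under noise:
for p_a in (0.6, 0.9, 0.99):
    print(f"p_a={p_a:.2f} -> certified L2 radius "
          f"{certified_radius(p_a, 0.5):.3f}")
```

The obstruction for recognition is visible already here: the certificate is phrased in terms of class probabilities, while a recognizer outputs a distance to a threshold, so the quantity being bounded has to be rebuilt on the metric side.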
My work at ECCV 2024 and IEEE T-BIOM 2025 studies how certified-robustness analyses transfer to recognition geometry, identifies where the assumptions break, and proposes targeted fixes.