Itamar Zimerman

I am a PhD candidate in the Blavatnik School of Computer Science at Tel Aviv University, advised by Prof. Lior Wolf. Additionally, I work as an AI research scientist at IBM Research.

Email  /  Scholar  /  Twitter  /  LinkedIn

profile photo

Research

My research focuses on modern deep learning architectures, with a particular emphasis on improving their capabilities in language modeling, computer vision, and cryptographic applications. I am passionate about developing new architectures and understanding their design principles, especially transformers, state-space layers, and modern RNNs.

Selected Publications

Representative papers are highlighted. '*' indicates equal contribution.
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
A. Ben-Kish, I. Zimerman, S. Abu-Hussein, N. Cohen, A. Globerson, L. Wolf, R. Giryes

We investigate the length-extrapolation capabilities of Mamba models, revealing their limited effective receptive field. To address these limitations, we introduce DeciMamba, the first context-extension method specifically designed for Mamba. Experiments on real-world long-range NLP tasks demonstrate that DeciMamba can extrapolate to context lengths up to 25 times longer than those encountered during training.

ICLR, 2025
Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation
I. Zimerman*, A. Ali*, L. Wolf

We introduce a unified representation of attention-free layers, including Mamba, RWKV, RetNet, and Griffin, using an implicit self-attention formula. Our approach is more comprehensive and precise than previously proposed frameworks, and it significantly enhances the interpretability and explainability of these models.

ICLR, 2025
Viewing Transformers Through the Lens of Long Convolutions Layers
I. Zimerman, L. Wolf

We investigate the design principles that enable long-range layers, such as state-space models and Hyena, to outperform transformers on long-range tasks.

ICML, 2024
Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
I. Zimerman, M. Baruch, N. Drucker, G. Ezov, O. Soceanu, L. Wolf

We introduce the first polynomial transformer, which is based on a novel alternative to softmax and layer normalization. This enables the application of transformers to encrypted data for the first time.

ICML, 2024
The Hidden Attention of Mamba Models
A. Ali*, I. Zimerman*, L. Wolf

We establish the connection between S6 layers and causal self-attention, demonstrating that both rely on a data-controlled linear operator that can be represented through attention matrices. Using these matrices, we introduce the first explainability tools for Mamba models.

arXiv, 2024
A 2-Dimensional State Space Layer for Spatial Inductive Bias
E. Baron*, I. Zimerman*, L. Wolf

2D-SSM is a spatial layer that extends S4 to multi-axis data and is parameterized by a two-dimensional recursion. Empirically, it enhances the performance of various ViT and ConvNeXt backbones. Theoretically, our layer is more expressive than previous extensions.

ICLR, 2024
Multi-Dimensional Hyena for Spatial Inductive Bias
I. Zimerman, L. Wolf

We present Hyena N-D, a theoretically grounded extension of the Hyena layer to multi-axis sequence modeling. We show that, when incorporated into a ViT in place of the attention mechanism, it yields a data- and memory-efficient model.

AISTATS, 2024
Focus Your Attention (with Adaptive IIR Filters)
S. Lutati, I. Zimerman, L. Wolf

Focus is a new architecture for LMs built on adaptive IIR filters. We show that second-order IIR filters are theoretically more expressive than state-space models while remaining stable. We further boost the performance of these filters by making them adaptive, using the first global-convolution-based hypernetwork.

EMNLP, 2023 (Oral presentation)
Decision S4: Efficient Sequence-Based RL via State Spaces Layers
S. Bar-David*, I. Zimerman*, E. Nachmani, L. Wolf

We explore the application of state-space layers in offline and online RL, highlighting their superiority in handling long-range dependencies and their improved efficiency over transformers.

ICLR, 2023
Recovering AES Keys with a Deep Cold Boot Attack
I. Zimerman*, E. Nachmani*, L. Wolf

We present a deep-learning-based cold boot attack for recovering AES keys. Our method combines a SAT solver with neural tools from the field of error correction coding to enhance attack effectiveness.

ICML, 2021

This page design is based on a template by Jon Barron.