This project explores Joint Embedding Predictive Architectures (JEPA) for news recommendation on the MIND dataset, focusing on learning user-intent representations from sequential interaction data rather than relying heavily on supervised click prediction. The system is built around the idea that recommendation is fundamentally a latent-state prediction problem: instead of directly predicting the next clicked article, the model predicts a latent representation of the user's future interests from their historical behavior.

The dataset is Microsoft MIND, which contains news articles (news.tsv) and user interaction logs (behaviors.tsv). Article representations are built by encoding the concatenation of each article's title and abstract with MiniLM-L6, producing dense semantic embeddings that are projected into a 128-dimensional space. These embeddings are augmented with lightweight metadata features such as article category information and entity-presence indicators extracted from Wikidata links.

User histories are constructed by sorting interactions chronologically and forming fixed-length click sequences. Users with extremely sparse histories are filtered out to ensure meaningful sequential context.

The core architecture consists of three components: a context encoder, a predictor, and a target encoder. The context encoder is a lightweight Transformer that processes user click histories and produces latent user-state representations. The predictor maps these representations into the target latent space, while the target encoder is an exponential moving average (EMA) copy of the context encoder. Unlike in traditional supervised recommenders, the target encoder is never updated directly through gradients.

Evaluation uses the official MIND metrics: AUC, MRR, nDCG@5, and nDCG@10, with special emphasis on cold-start users and long-tail content retrieval.
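The article-encoding step can be sketched as follows. This is a minimal, runnable illustration, not the project's actual code: the real pipeline would call a MiniLM-L6 sentence encoder (which emits 384-dimensional vectors), so `encode_article` here is a hypothetical stand-in that returns a deterministic pseudo-random vector, and the 384-to-128 projection matrix is fixed rather than learned.

```python
import random

EMB_DIM = 384   # MiniLM-L6 output dimensionality
PROJ_DIM = 128  # projected dimensionality from the text

def encode_article(title: str, abstract: str) -> list:
    """Hypothetical stand-in for the MiniLM-L6 encoder. The real pipeline
    would encode the concatenated title and abstract with the model; here
    we derive a deterministic pseudo-random vector from the text instead."""
    seed = sum(ord(c) for c in title + " " + abstract)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]

# Fixed (untrained) projection matrix standing in for the learned 384 -> 128 layer.
_rng = random.Random(0)
W = [[_rng.gauss(0.0, 1.0 / EMB_DIM ** 0.5) for _ in range(EMB_DIM)]
     for _ in range(PROJ_DIM)]

def project(vec: list) -> list:
    """Linear projection into the 128-dimensional article space."""
    return [sum(w_i * v_i for w_i, v_i in zip(row, vec)) for row in W]

article_vec = project(encode_article("Example title", "Example abstract"))
print(len(article_vec))  # 128
```

In the full system these projected vectors would then be concatenated with the category and entity-presence features described above.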
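The history-construction step (chronological sorting, fixed-length sequences, sparse-user filtering) can be sketched like this. The sequence length and sparsity threshold are illustrative values, not ones stated in the text, and the tuple format is an assumption about what would be parsed from behaviors.tsv.

```python
from collections import defaultdict

MAX_HISTORY = 5   # fixed sequence length (illustrative; the text gives no value)
MIN_CLICKS = 2    # drop users with extremely sparse histories (also illustrative)

def build_histories(interactions):
    """interactions: iterable of (user_id, timestamp, article_id) tuples,
    e.g. parsed from behaviors.tsv. Returns {user_id: [article_id, ...]}
    with clicks sorted chronologically, truncated to the most recent
    MAX_HISTORY, and sparse users removed."""
    by_user = defaultdict(list)
    for user, ts, article in interactions:
        by_user[user].append((ts, article))
    histories = {}
    for user, clicks in by_user.items():
        if len(clicks) < MIN_CLICKS:
            continue  # too little sequential context to be meaningful
        clicks.sort(key=lambda c: c[0])  # chronological order
        histories[user] = [a for _, a in clicks][-MAX_HISTORY:]
    return histories

logs = [("u1", 3, "N3"), ("u1", 1, "N1"), ("u1", 2, "N2"), ("u2", 5, "N9")]
print(build_histories(logs))  # {'u1': ['N1', 'N2', 'N3']}  (u2 filtered out)
```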
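The target encoder's gradient-free update is the standard EMA rule used in JEPA-style training. A minimal sketch over flat parameter lists (the momentum value is a typical choice, not one given in the text):

```python
def ema_update(target_params, context_params, tau=0.996):
    """Target-encoder update: the target weights track an exponential
    moving average of the context encoder's weights and never receive
    gradients directly. tau close to 1 means the target moves slowly."""
    return [tau * t + (1.0 - tau) * c
            for t, c in zip(target_params, context_params)]

# Toy example with tau=0.5 so the arithmetic is exact.
target = [0.0, 0.0]
context = [1.0, -1.0]
target = ema_update(target, context, tau=0.5)
print(target)  # [0.5, -0.5]
```

In a real framework this loop would run over the two encoders' parameter tensors after each optimizer step, with gradients disabled for the target.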
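The ranking metrics mentioned above (MRR and nDCG@k) are straightforward to compute per impression; a self-contained sketch, where `labels` marks clicked candidates with 1 and `scores` are model relevance scores:

```python
import math

def mrr(labels, scores):
    """Reciprocal rank of the first clicked item under the model ranking."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(labels, scores, k):
    """Normalized discounted cumulative gain over the top k ranked items."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    dcg = sum(labels[i] / math.log2(rank + 1)
              for rank, i in enumerate(order[:k], start=1))
    ideal = sorted(labels, reverse=True)
    idcg = sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(ideal[:k], start=1))
    return dcg / idcg if idcg > 0 else 0.0

labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.7]
print(mrr(labels, scores))           # first click at rank 2 -> 0.5
print(ndcg_at_k(labels, scores, 5))
```

The official MIND evaluation averages these per-impression values across the test set; AUC is computed over the same per-impression label/score pairs.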
10 May 2026