📈 T-ECD: The Largest Open-Source E-commerce Cross-Domain Dataset
T-Tech researchers have unveiled T-ECD, a dataset of 135 billion interactions from 44 million users.
The dataset is fully synthetic: it was generated artificially from real data. This lets researchers build and test production-grade recommender systems while preserving complete user anonymity.
T-ECD is designed as a comprehensive and flexible benchmark that supports a wide range of recommender system models and paradigms, including next-item, next-basket, session-based, and top-N recommendation tasks.
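To illustrate two of these task framings, here is a minimal sketch of a leave-last-out next-item split plus a top-N hit-rate metric. It assumes a generic interaction log of `(user_id, item_id, timestamp)` tuples; that schema and the function names `leave_last_out` and `hit_rate_at_n` are hypothetical and do not reflect T-ECD's actual file layout or any official evaluation protocol.

```python
from collections import defaultdict

# Hypothetical event schema: (user_id, item_id, timestamp).
# T-ECD's real column names and file layout may differ.
Event = tuple[str, str, int]


def leave_last_out(events: list[Event]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Split each user's chronological item sequence into a training
    prefix and a held-out last item (the next-item target)."""
    by_user: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for user_id, item_id, ts in events:
        by_user[user_id].append((ts, item_id))

    train: dict[str, list[str]] = {}
    target: dict[str, str] = {}
    for user_id, rows in by_user.items():
        items = [item for _, item in sorted(rows)]  # order by timestamp
        if len(items) < 2:
            continue  # need at least one history item and one target
        train[user_id] = items[:-1]
        target[user_id] = items[-1]
    return train, target


def hit_rate_at_n(recs: dict[str, list[str]], target: dict[str, str], n: int) -> float:
    """Top-N metric: share of users whose held-out item appears in their top-n list."""
    if not target:
        return 0.0
    hits = sum(1 for user, item in target.items() if item in recs.get(user, [])[:n])
    return hits / len(target)


if __name__ == "__main__":
    log = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3),
           ("u2", "x", 5), ("u2", "y", 4), ("u3", "z", 1)]
    train, target = leave_last_out(log)
    print(train, target)
    print(hit_rate_at_n({"u1": ["c", "a"], "u2": ["z", "q"]}, target, n=2))
```

Users with a single interaction are skipped because they have no history to condition on; a next-basket variant would hold out the last basket (a set of items sharing a timestamp or order ID) instead of a single item.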
🍴 The dataset covers marketplace and retail services, offers, reviews, and payments. Many users, items, and brands appear across multiple domains.
Thanks to this cross-domain structure, a model trained on T-ECD can predict whether a user will leave a product review and personalize recommendations based on their purchase history, spending patterns, and interests. It also makes it easier to launch new ecosystem products without "stitching" together separate datasets.
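A minimal sketch of why shared identifiers matter: if the same `user_id` is used across domains, per-domain logs can be merged into one chronological timeline per user, from which cross-domain features (for example, per-domain activity counts feeding a review-prediction model) follow directly. The domain labels, the `(user_id, item_id, timestamp)` row format, and the helpers `unified_timelines` and `domain_profile` are hypothetical illustrations, not part of T-ECD's published tooling.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Event:
    timestamp: int  # first field, so sorting orders events chronologically
    user_id: str
    domain: str     # hypothetical labels, e.g. "marketplace", "retail", "reviews"
    item_id: str


def unified_timelines(domain_logs: dict[str, list[tuple[str, str, int]]]) -> dict[str, list[Event]]:
    """Merge per-domain (user_id, item_id, timestamp) rows into one
    time-ordered event list per user, assuming user IDs are shared across domains."""
    timelines: dict[str, list[Event]] = defaultdict(list)
    for domain, rows in domain_logs.items():
        for user_id, item_id, ts in rows:
            timelines[user_id].append(Event(ts, user_id, domain, item_id))
    for events in timelines.values():
        events.sort()
    return dict(timelines)


def domain_profile(timeline: list[Event]) -> dict[str, int]:
    """Count a user's events per domain: a simple cross-domain feature."""
    return dict(Counter(e.domain for e in timeline))


if __name__ == "__main__":
    logs = {
        "marketplace": [("u1", "m1", 3), ("u2", "m2", 1)],
        "retail": [("u1", "r1", 1)],
        "reviews": [("u1", "m1", 5)],
    }
    timelines = unified_timelines(logs)
    print([e.domain for e in timelines["u1"]])
    print(domain_profile(timelines["u1"]))
```

With separate single-domain datasets, the same merge would first require an identity-matching step across sources, which is the "stitching" the shared identifiers avoid.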
⚙ Besides the main dataset, the team released a smaller version with 1 billion interactions, so you can test core hypotheses without spending resources on processing the full corpus.
➡️ Both sets are available on Hugging Face.


