SuperAnimal 🐭 models are getting into the DLC Model Zoo 🐍🐘🐿️🔥

DeepLabCut Blog
4 min read · Mar 23, 2023

Animal pose estimation is often a critical step in the analysis of behavior, and although DeepLabCut has led the charge in making it as easy as possible for users to build their own customized networks, globally there is a lot of redundant labeling effort. Therefore, we developed the SuperAnimal method to build plug-and-play deep learning models that can be used immediately on common animals.

Practically, this means that for many users, no additional model training 🏋️‍♀️ is required. If you do want to fine-tune a model on your data, it requires 10× less data and is 2× better than the original DeepLabCut. Yep, it's pretty awesome.

Read 📖 the pre-print HERE (1)!

✨ We also put SuperAnimal models into the DeepLabCut Model Zoo!
Test the models on Google Colab ∞ or HuggingFace 🤗
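
If you would rather run the models locally than in the browser, zero-shot inference is a single call in DeepLabCut. Here is a minimal sketch, assuming the Model Zoo API as of DeepLabCut 2.3 (function and argument names may differ in other versions, and the video path is made up):

```python
import deeplabcut

# A video to analyze -- hypothetical path.
videos = ["/data/mouse_openfield.mp4"]

# Zero-shot inference with a SuperAnimal model: no labeling, no training.
# "superanimal_topviewmouse" covers top-view mice (27 keypoints);
# "superanimal_quadruped" covers side-view quadrupeds (39 keypoints).
# scale_list drives the spatial-pyramid search described below: inference
# is repeated at several image heights (in pixels) to compensate for the
# unknown apparent size of the animal in the frame.
deeplabcut.video_inference_superanimal(
    videos,
    superanimal_name="superanimal_topviewmouse",
    scale_list=[200, 300, 400],
    videotype=".mp4",
)
```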

The SuperAnimal method and contributions include 🛠️:

  • We propose panoptic pose estimation to merge diverse, differently labeled datasets and train on them jointly.
  • With our SuperAnimal method, we built two broad pose models that cover over 45 species of mammals, with 27–39 keypoints.
  • We show excellent zero-shot performance (i.e., no additional training, tested on new data).
  • Our SuperAnimal method outperforms ImageNet-pretraining (the current best standard in the field) on three benchmarks.
  • If fine-tuning is required, our models are over 10× more data-efficient, with a 2× boost in performance.
  • We developed an optimal keypoint matching algorithm to automatically align out-of-distribution datasets with our models (a toy sketch of the matching idea follows this list).
  • We developed a rapid, unsupervised video-adaptation method that allows users to fine-tune models without any data labeling (also sketched below).
  • To minimize domain shifts, we developed a spatial-pyramid search method to account for changes in video input size, and pseudo-labeling to minimize temporal jitter in videos.
  • We also show that new transformers (AnimalTokenPose), trained with the SuperAnimal method, outperform state-of-the-art convolutional neural networks.
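
To give a feel for the keypoint matching bullet above: the core idea is to treat alignment as an assignment problem between two keypoint vocabularies. Below is a toy sketch using the Hungarian algorithm (SciPy's linear_sum_assignment). This illustrates the general idea, not the exact algorithm from the preprint, and all coordinates are made up:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_keypoints(new_kpts, super_kpts):
    """Toy matching of a dataset's keypoints to a SuperAnimal vocabulary.

    new_kpts:   (N, 2) average positions of the new dataset's keypoints
    super_kpts: (M, 2) average positions of the SuperAnimal keypoints,
                both expressed in a shared, normalized coordinate frame.
    Returns (new_idx, super_idx) pairs minimizing the total distance.
    """
    # Cost matrix of pairwise Euclidean distances between vocabularies.
    cost = np.linalg.norm(new_kpts[:, None, :] - super_kpts[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))

# Made-up example: 3 dataset keypoints vs. a 4-keypoint SuperAnimal vocabulary.
new_kpts = np.array([[0.5, 0.1], [0.4, 0.9], [0.6, 0.9]])
super_kpts = np.array([[0.5, 0.0], [0.5, 0.5], [0.4, 1.0], [0.6, 1.0]])
print(match_keypoints(new_kpts, super_kpts))  # -> [(0, 0), (1, 2), (2, 3)]
```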
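
Similarly, the unsupervised video adaptation relies on pseudo-labeling: run the zero-shot model on a video, keep only high-confidence predictions, and briefly fine-tune on them to reduce jitter. Here is a minimal PyTorch-style sketch of that general technique; the model interface, confidence threshold, and training loop are illustrative assumptions, not the released implementation:

```python
import torch

CONF_THRESHOLD = 0.9  # assumed cutoff: keep only confident predictions

def video_adapt(model, frames, optimizer, loss_fn, epochs=4):
    """Toy unsupervised video adaptation via pseudo-labeling.

    Assumes `model(frames)` returns (keypoints, confidences) with shapes
    (T, K, 2) and (T, K) for T frames and K keypoints.
    """
    # 1) Zero-shot pass: harvest confident predictions as pseudo-labels.
    model.eval()
    with torch.no_grad():
        pseudo, conf = model(frames)
    mask = conf > CONF_THRESHOLD  # (T, K) -- which keypoints to trust

    # 2) Brief fine-tuning supervised only by the trusted pseudo-labels,
    #    pulling predictions toward temporal consistency.
    model.train()
    for _ in range(epochs):
        pred, _ = model(frames)
        loss = loss_fn(pred[mask], pseudo[mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```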

Models and Datasets

The DLC team collected in-house and publicly available datasets to build the SuperAnimal models. In sum 🧮, we constructed two super datasets, namely:

1๏ธโƒฃ TopViewMouse-5k ๐Ÿญ that contains around 5 thousand top-view mouse images. It has 27 key-points ๐Ÿ”‘.

2๏ธโƒฃ Quadruped-40K ๐Ÿฆฌ which consists of 40 thousand side-view images of various animals with four legs. It has 39 key-points ๐Ÿ”‘.

A glimpse 👀 into the Quadruped-40K super dataset

We built two SuperAnimal models that cover over 45 mammalian species with 27 to 39 keypoints 🔑. These models are trained in the TensorFlow and PyTorch 🔥 frameworks and currently support single-animal 🦓 inference. Soon 🚀, we plan to expand the framework to cover multi-animal scenarios.

✨ More Data — More Models — More Results

โžก๏ธ You can help us improve our SuperAnimal models by sharing your data with us! See Contrib.deeplabcut.org or please get in touch ๐Ÿ™!Please get in touch ๐Ÿ™!!

The project was started in early 2020 by Mackenzie & Alexander Mathis, when we all needed to work from home. Building models for dogs 🐕 and cats 🐈 was both fun and a bit relaxing, but it bloomed into something much more significant. It now includes an amazing team of software engineers, PhD students, master's students, and research assistants 🌸 at EPFL:

Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Maxime Vidal, Steffen Schneider, Tian Qiu, Alexander Mathis, Mackenzie Weygandt Mathis.

  • From lead author, Shaokai Ye:

Fun fact: recent innovations in the project ✨ are partly inspired by the Foundation Models paradigm, which is making a big impact in AI.

What is a foundation model? Foundation models are a general class of models trained on vast, rich data that can be adapted to a broad range of downstream tasks, typically with fine-tuning. They are built on deep neural networks and self-supervised learning 📚, and lately their development has been growing rapidly 📈. To learn more about the opportunities 👍 and risks ⚠️ of foundation models, check out this thorough report HERE (2).

An overview of foundation models, as conceptualised in (2)

Two illustrative examples are DALL-E and GPT-3. GPT-3, for instance, is an autoregressive language model developed by OpenAI with 175 billion parameters that achieves robust performance on many natural language processing tasks, such as translation, question answering, and fill-in-the-blank tasks, including ones that require reasoning. More importantly, GPT-3 can be applied to many of these tasks without fine-tuning (3). You can check out 🧐 both GPT-3 and DALL-E HERE.

References 📚

  1. Ye, S. et al. (2023). SuperAnimal models pretrained for plug-and-play analysis of animal behavior. arXiv. Available at: https://arxiv.org/abs/2203.07436
  2. Bommasani, R. et al. (2022). On the opportunities and risks of foundation models. arXiv. Available at: https://arxiv.org/abs/2108.07258
  3. Brown, T.B. et al. (2020). Language models are few-shot learners. arXiv. Available at: https://arxiv.org/abs/2005.14165

