Super Animal 🐭 models are getting into the DLC Model Zoo 🐘🐿🔥

DeepLabCut Blog
4 min read · Mar 23, 2023

Animal pose estimation is often a critical step in the analysis of behavior. Although DeepLabCut has led the charge in making it as easy as possible for users to build their own customized networks, a lot of that effort is redundant across the community. Therefore, we decided to develop the SuperAnimal method to build plug-and-play deep learning models that can be used immediately on common animals.

Practically, this means that for many users no additional model training 🏋️‍♀️ is required. If you do want to fine-tune the model on your data, it requires 10X less data and is 2X better than the original DeepLabCut. Yep, it's pretty awesome.

Read 📖 the pre-print HERE (1)!

✨ We also put SuperAnimal models into the DeepLabCut Model Zoo!
Test the models on Google Colab ∞ or HuggingFace 🤗
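
If you would rather run a Model Zoo model from your own Python session, here is a minimal sketch of what that could look like. It assumes a DeepLabCut 2.3-era install, where `deeplabcut.video_inference_superanimal` takes a list of videos, a `superanimal_name`, a `scale_list`, and a `video_adapt` flag; the signature may differ in other versions, so please check the docs for the version you have. The helper name `run_superanimal` and the default scale range are our own illustrative choices.

```python
# Model names as used by the DeepLabCut 2.3-era Model Zoo (assumption: check your version).
SUPERANIMAL_NAMES = ("superanimal_topviewmouse", "superanimal_quadruped")


def run_superanimal(video_path, superanimal_name="superanimal_topviewmouse",
                    scale_list=None, video_adapt=False):
    """Hypothetical helper: zero-shot SuperAnimal inference on a single video."""
    if superanimal_name not in SUPERANIMAL_NAMES:
        raise ValueError(f"unknown SuperAnimal model: {superanimal_name!r}")

    # Imported lazily so the helper can be defined without DeepLabCut installed.
    import deeplabcut

    # Candidate image sizes for the spatial-pyramid search; this range is an
    # illustrative default, not a recommendation for every video.
    if scale_list is None:
        scale_list = list(range(200, 600, 50))

    return deeplabcut.video_inference_superanimal(
        [video_path],
        superanimal_name,
        scale_list=scale_list,
        video_adapt=video_adapt,
    )
```

A call such as `run_superanimal("my_video.mp4", "superanimal_quadruped")` would then analyze the video with no training. Under the same version assumption, `video_adapt=True` switches on the unsupervised video adaptation at the cost of extra compute.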

The SuperAnimal Method and contributions include 🛠:

  • We propose panoptic pose estimation to merge and train diverse, differently labeled datasets.
  • With our SuperAnimal method we make two broad pose models that cover over 45 species of mammals, with 27–39 keypoints.
  • We show excellent zero-shot performance (i.e., no additional training, tested on new data).
  • Our SuperAnimal method outperforms ImageNet-pretraining (the current best standard in the field) on three benchmarks.
  • If fine-tuning is required, our models are over 10× more data efficient for a 2× boost in performance.
  • We developed an optimal keypoint matching algorithm to automatically align out-of-distribution datasets with our models.
  • We developed a rapid, unsupervised video-adaptation method that allows users to fine-tune models without any data labeling.
  • To minimize domain shifts, we developed a spatial-pyramid search method to account for changes in video input size, and pseudo-labeling to minimize temporal jitter in videos.
  • We also show that new transformers (AnimalTokenPose), trained with the SuperAnimal method, outperform state-of-the-art convolutional neural networks.
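
To give a feel for the keypoint matching idea in the list above, here is a toy sketch. It is not the algorithm from the pre-print: it simply assumes we already have a cost table (for example, the mean pixel distance between a new dataset's annotations and the model's zero-shot predictions) and then brute-forces the one-to-one assignment with the lowest total cost. All keypoint names and numbers are made up, and for real keypoint counts you would swap the brute force for a proper assignment solver such as the Hungarian algorithm.

```python
from itertools import permutations


def match_keypoints(cost, model_names, dataset_names):
    """Map each dataset keypoint to a distinct model keypoint with minimal total cost.

    cost[i][j] is a (hypothetical) dissimilarity between dataset keypoint i
    and model keypoint j. Brute force over permutations: toy sizes only.
    """
    n_data, n_model = len(dataset_names), len(model_names)
    if n_data > n_model:
        raise ValueError("need at least as many model keypoints as dataset keypoints")

    best_total, best_assignment = float("inf"), None
    for chosen in permutations(range(n_model), n_data):
        total = sum(cost[i][j] for i, j in enumerate(chosen))
        if total < best_total:
            best_total, best_assignment = total, chosen

    return {dataset_names[i]: model_names[j] for i, j in enumerate(best_assignment)}


# Made-up example: a new dataset that names its keypoints differently.
model_names = ["nose", "left_ear", "right_ear", "tail_base"]
dataset_names = ["snout", "tailbase", "ear_L"]
cost = [
    [1.0, 20.0, 21.0, 60.0],   # snout vs. each model keypoint
    [58.0, 40.0, 41.0, 2.0],   # tailbase
    [19.0, 3.0, 9.0, 45.0],    # ear_L
]
mapping = match_keypoints(cost, model_names, dataset_names)
print(mapping)  # {'snout': 'nose', 'tailbase': 'tail_base', 'ear_L': 'left_ear'}
```

Once such a mapping exists, a dataset with its own naming scheme can be evaluated against (or fine-tuned from) the model without relabeling anything by hand.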

Models and Datasets

The DLC team collected in-house and publicly available datasets to build the SuperAnimal models. In sum 🧮, we constructed two super datasets, namely:

1️⃣ TopViewMouse-5k 🐭 that contains around 5 thousand top-view mouse images. It has 27 key-points 🔑.

2️⃣ Quadruped-40K 🦬 which consists of 40 thousand side-view images of various animals with four legs. It has 39 key-points 🔑.
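
How can datasets that were labeled with different keypoints live in one super dataset? Here is a minimal sketch of one common way to do it, not necessarily the exact procedure in the pre-print: take the union of all keypoint names, and give every image a mask that says which of those keypoints were actually labeled, so that missing ones are ignored during training. The lab names, keypoint names, and coordinates below are invented for illustration.

```python
def build_keypoint_space(datasets):
    """Ordered union of keypoint names across all datasets."""
    names = []
    for ds in datasets:
        for name in ds["keypoints"]:
            if name not in names:
                names.append(name)
    return names


def to_unified(annotation, dataset_keypoints, unified_names):
    """Project one annotation into the unified keypoint space.

    Returns (coords, mask): coords[k] is an (x, y) tuple or None,
    mask[k] is 1 if keypoint k was labeled in the source dataset, else 0.
    """
    by_name = dict(zip(dataset_keypoints, annotation))
    coords = [by_name.get(name) for name in unified_names]
    mask = [0 if c is None else 1 for c in coords]
    return coords, mask


def masked_mse(pred, target, mask):
    """Mean squared error computed over labeled keypoints only."""
    squared_error, n_labeled = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            squared_error += (p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2
            n_labeled += 1
    return squared_error / n_labeled if n_labeled else 0.0


# Two hypothetical labs that labeled different body parts.
lab_a = {"keypoints": ["nose", "left_ear", "right_ear"]}
lab_b = {"keypoints": ["nose", "tail_base"]}
unified = build_keypoint_space([lab_a, lab_b])

# One image from lab_b: only nose and tail_base are annotated.
coords, mask = to_unified([(10.0, 12.0), (40.0, 50.0)], lab_b["keypoints"], unified)

# A model always predicts all unified keypoints; unlabeled ones do not count.
pred = [(11.0, 12.0), (0.0, 0.0), (0.0, 0.0), (40.0, 52.0)]
loss = masked_mse(pred, coords, mask)
print(unified, mask, loss)
```

The design choice here is that a missing label is treated as "unknown" rather than "absent", so one model can learn from every dataset without being penalized for keypoints nobody annotated.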

A glimpse 👀 into the Quadruped-40K super dataset

We made two SuperAnimal models, which together cover over 45 mammalian species with 27 to 39 key-points 🔑. These models are trained in the TensorFlow or PyTorch 🔥 framework and currently support single-animal 🦓 inference. Shortly 🚀, we plan to expand the framework to cover multi-animal scenarios.

✨ More Data, More Models, More Results

➡️ You can help us improve our SuperAnimal models by sharing your data with us! See Contrib.deeplabcut.org or please get in touch 🙏!

The project was started in early 2020 by Mackenzie & Alexander Mathis, when we all needed to work from home. Building models for dogs 🐕 and cats 🐈 was both fun and a bit relaxing, but it bloomed into something much more significant. Now it includes an amazing team of software engineers, PhD students, master's students and research assistants 🌸 at EPFL:

Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Maxime Vidal, Steffen Schneider, Tian Qiu, Alexander Mathis, Mackenzie Weygandt Mathis.

  • From lead author, Shaokai Ye:

Fun facts: Recent innovations in the project ✨ are partly inspired by the Foundation Models paradigm, which is making a big impact in AI.

What is a foundation model? Foundation models represent a general class of models trained on vast, rich data which can be used to complete a broad range of downstream tasks, typically with fine-tuning. Their basis is deep neural networks and self-supervised learning 📚. Lately, their development has been growing rapidly 📈. To learn more about the opportunities 👍 and risks ⚠️ of foundation models, check out this thorough report HERE.

An overview of Foundation Models as conceptualized in (2)

Two illustrative examples are DALL-E and GPT3. For instance, GPT3 is an autoregressive language model developed by OpenAI with 175 billion parameters that achieves strong performance on many natural language processing tasks, such as translation, question-answering, and fill-in-the-blank (cloze) tasks, as well as tasks that require reasoning. More importantly, GPT3 can be applied to many of these tasks without fine-tuning (3). You can check out 🧐 both GPT3 and DALL-E HERE.

References 📚

  1. Ye, S. et al. (2023) SuperAnimal models pretrained for plug-and-play analysis of animal behavior, arXiv.org. Available at: https://arxiv.org/abs/2203.07436 .
  2. Bommasani, R. et al. (2022) On the opportunities and risks of Foundation models, arXiv.org. Available at: https://arxiv.org/abs/2108.07258 .
  3. Brown, T.B. et al. (2020) Language models are few-shot learners, arXiv.org. Available at: https://arxiv.org/abs/2005.14165 .

DeepLabCut Blog

bringing you top performing markerless pose estimation for animals: deeplabcut.org