By Next SolutionLab on 2024-10-23 01:05:39
PersonVision is a real-time computer vision system that detects people and predicts key demographic attributes such as age, gender, and ethnicity from live video. Built on state-of-the-art deep learning techniques, PersonVision performs multi-task recognition in a single pass over the input, making it suitable for applications such as security systems, retail analytics, and personalized user interactions.
This project aims to make real-time demographic recognition accessible and fast by using advanced neural network architectures and a well-curated training dataset. It supports both image and video input for prediction.
The objective of PersonVision is to develop a robust, real-time facial recognition system that accurately predicts key demographic information, including age, gender, and ethnicity. By leveraging deep learning models trained on diverse datasets, the system aims to:
PersonVision/
│
├── assets
├── ckpts/ # Model checkpoints
│ ├── model.pt # Checkpoint of the Gender&age detector model
│ └── yolo11n.pt # Checkpoint of the Person_recognition model
├── screenshots/ # Directory for application screenshots
│ ├── 1.png # Screenshot of the app interface
│ ├── 2.png # Screenshot of single image prediction
│ └── 3.png # Screenshot of live camera prediction
│
├── network/ # Directory for model files
│ └── models # models for Gender&age detection
├── input/ # Input videos
│ └── video.mp4 # example video
├── app.py # Streamlit application file
├── environment.yml # Environment file
├── main.py # Main file to start program
└── README.md # Project description and associated information
We have created an environment.yml file that lists all the necessary modules and packages. To set up the environment, run:
conda env create -f environment.yml
conda activate od
PersonVision uses two pre-trained models. The first, Person_recognition, is an object detection model built on YOLO11. The second, the Gender&age detector, is a transformer-based model built on FaceXFormer. The high-level architecture of the models is as follows:
YOLO, a state-of-the-art object detection model, is trained on Common Objects in Context (COCO), a large-scale object detection, segmentation, and captioning dataset with 80 object categories. The Gender&age models are trained on UTKFace [1] and FairFace [2].
We use pre-trained versions of both models, but they can be customized or fine-tuned on a custom dataset. To fine-tune the first model, i.e. the object detection model, you can follow:
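The two models naturally form a two-stage flow: the detector finds people, and each person crop is passed to the attribute model. The sketch below illustrates that flow only; it is not the code in main.py. The `Detection` class, `analyze_frame`, and the `detect` and `predict_attributes` callables are hypothetical names, and the frame is a plain nested list where a real application would more likely use an image array.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# "person" is class index 0 in the 80-class COCO label set used by YOLO models.
PERSON_CLASS_ID = 0

Frame = List[List[int]]  # assumption: rows of pixel values, for illustration only
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    """One detector output: corner coordinates, score, and class index."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int


def clamp_box(det: Detection, width: int, height: int) -> Box:
    """Convert a detection to integer pixel bounds that lie inside the frame."""
    x1 = max(0, min(int(det.x1), width))
    y1 = max(0, min(int(det.y1), height))
    x2 = max(0, min(int(det.x2), width))
    y2 = max(0, min(int(det.y2), height))
    return x1, y1, x2, y2


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region described by box out of the frame."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def analyze_frame(
    frame: Frame,
    detect: Callable[[Frame], List[Detection]],
    predict_attributes: Callable[[Frame], Dict[str, object]],
    min_confidence: float = 0.5,
) -> List[Dict[str, object]]:
    """Stage 1: keep confident person boxes. Stage 2: predict attributes per crop."""
    if not frame:
        return []
    height, width = len(frame), len(frame[0])
    results = []
    for det in detect(frame):
        if det.class_id != PERSON_CLASS_ID or det.confidence < min_confidence:
            continue
        box = clamp_box(det, width, height)
        person = crop(frame, box)
        if not person or not person[0]:
            continue  # box collapsed to zero area after clamping
        results.append({"box": box, **predict_attributes(person)})
    return results
```

In a real setup, `detect` would wrap the YOLO checkpoint and `predict_attributes` would wrap the Gender&age model. Keeping them as plain callables lets the flow be tested without loading either checkpoint.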
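As a rough sketch, fine-tuning the detector with the Ultralytics Python package could look like the following. The dataset file name `custom_dataset.yaml`, the helper names, and the default epoch count are assumptions for illustration; only the checkpoint path comes from the project layout above.

```python
from typing import Any, Dict


def build_train_args(data_yaml: str, epochs: int = 50, image_size: int = 640) -> Dict[str, Any]:
    """Collect the keyword arguments passed to the Ultralytics train() call."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return {"data": data_yaml, "epochs": epochs, "imgsz": image_size}


def finetune_detector(weights: str, data_yaml: str, **overrides: Any):
    """Fine-tune a YOLO checkpoint on a custom dataset (hypothetical helper)."""
    # Imported lazily so the argument helper works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # start from the pre-trained weights
    return model.train(**build_train_args(data_yaml, **overrides))


if __name__ == "__main__":
    # custom_dataset.yaml is a placeholder for a dataset description in YOLO format.
    finetune_detector("ckpts/yolo11n.pt", "custom_dataset.yaml", epochs=50)
```

The dataset YAML is expected to point at the training and validation images and list the class names. If only people need to be detected, a single-class dataset is enough.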
To make the trained model accessible, we developed a web application using Streamlit. The application lets users upload an image of a person or open a live camera, and returns the person's bounding box along with age, gender, and race/ethnicity.
To run our UI, type:
streamlit run app.py
The PersonVision App is a robust and interactive tool designed for real-time age, gender, and ethnicity detection using advanced computer vision techniques. By leveraging the power of YOLO for object detection and a deep learning model for demographic analysis, this application offers users a seamless experience in processing both uploaded media and live webcam feeds. We encourage users to explore the application, provide feedback, and consider its implications in their respective domains.
At present, when multiple people appear in the same frame, the system cannot compute attributes such as gender, age, and race for all of them; it works only for a single person. We plan to solve this issue in the future.
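Until multi-person support is available, one possible workaround is to choose a single person per frame before running attribute prediction. The sketch below picks the largest bounding box, on the assumption that the person closest to the camera is the subject of interest. This is an illustrative heuristic, not the behavior of the current application, and `pick_primary_person` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def pick_primary_person(boxes: List[Box]) -> Optional[Box]:
    """Return the largest person box, or None when nobody was detected."""
    if not boxes:
        return None
    return max(boxes, key=box_area)
```

Other selection rules, such as the highest detection confidence or the box nearest the frame center, would fit the same function signature.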
Our user interface
Single image prediction
Live camera prediction
@article{narayan2024facexformer,
title={FaceXFormer: A Unified Transformer for Facial Analysis},
author={Narayan, Kartik and VS, Vibashan and Chellappa, Rama and Patel, Vishal M},
journal={arXiv preprint arXiv:2403.12960},
year={2024}
}
[1] Zhang, Z., Song, Y., Qi, H.: Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5810–5818 (2017)
[2] Karkkainen, K., Joo, J.: FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1548–1558 (2021)