PersonVision: Real-time Age, Gender, and Ethnicity Recognition System




By Next SolutionLab on 2024-10-23 01:05:39


 

Introduction

PersonVision is a real-time computer vision system designed to identify individuals and predict key demographic attributes such as age, gender, and ethnicity from live video of a person. Leveraging state-of-the-art deep learning techniques, PersonVision provides accurate and efficient multi-task recognition, making it suitable for applications such as security systems, retail analytics, and personalized user interactions.

This project aims to make real-time demographic recognition accessible and fast by utilizing advanced neural network architectures and a well-curated dataset for training. It supports both image and video input for prediction.

 

Objectives

The objective of PersonVision is to develop a robust, real-time facial recognition system that accurately predicts key demographic information, including age, gender, and ethnicity. By leveraging deep learning models trained on diverse datasets, the system aims to:

  • Identify persons with high accuracy.
  • Predict age with high accuracy, offering an estimate within a reasonable range.
  • Classify gender into distinct categories, such as male and female.
  • Determine ethnicity by analyzing facial structures and features.
  • Provide real-time performance suitable for live video streams and interactive applications.

 

Benefits

  1. Real-time Analysis: It provides real-time age, gender, and ethnicity recognition, making it ideal for applications that require instant feedback, such as surveillance systems, retail analytics, or interactive user experiences.
  2. High Accuracy and Efficiency: The system delivers high accuracy in demographic predictions by using cutting-edge deep learning techniques and well-curated datasets, reducing the chances of misclassification while maintaining efficiency.
  3. Scalability: The system is designed to handle both image and video inputs, making it scalable for a variety of use cases, whether it’s processing large amounts of stored data or performing live predictions in dynamic environments.
  4. Versatile Use Cases: From security and surveillance (to identify individuals and analyze demographics) to retail and marketing (to understand customer demographics), it serves a broad range of sectors.
  5. Automation and Personalization: It can enhance human-computer interaction by automatically adjusting to user demographics and providing tailored experiences such as personalized advertisements or product recommendations based on age, gender, and ethnicity.

 

Directory Structure

 
PersonVision/
  │
  ├── assets                         
  ├── ckpts/                          # Checkpoint of our model
  │   ├── model.pt                    # Gender&age detector models checkpoint
  │   └── yolo11n.pt                  # Person_recognition models checkpoint
  ├── screenshots/                    # Directory for application screenshots
  │   ├── 1.png                       # Screenshot of the app interface
  │   ├── 2.png                       # Screenshot of single image prediction
  │   └── 3.png                       # Screenshot of live camera prediction
  │
  ├── network/                         # Directory for model files
  │   └── models                       # models for Gender&age detection
  ├── input/                           # Input videos
  │   └── video.mp4                    # example video
  ├── app.py                           # Streamlit application file
  ├── environment.yml                  # Environment file
  ├── main.py                          # Main file to start program
  └── README.md                        # Project description and associated information
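
Before launching the app, it can help to confirm that both checkpoint files from the tree above are in place. The following is a minimal sketch, not part of the project: the helper name `missing_checkpoints` is hypothetical, and it assumes the script runs from the `PersonVision/` root.

```python
from pathlib import Path

# Checkpoint paths copied from the directory tree above.
EXPECTED_CHECKPOINTS = ("ckpts/model.pt", "ckpts/yolo11n.pt")


def missing_checkpoints(project_root="."):
    """Return the expected checkpoint paths that are absent under project_root.

    Hypothetical helper for illustration; the project itself may load
    its checkpoints differently.
    """
    root = Path(project_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]
```

Calling `missing_checkpoints()` from the project root returns an empty list when both files are present, so a non-empty result can be reported to the user before any model is loaded.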

 

Environment Setup

We have created an environment.yml file that lists all the necessary modules and packages. To set up the environment, run:

 
conda env create -f environment.yml
conda activate od
 

 

Model Architecture

We use two pre-trained models in PersonVision. The first, Person_recognition, is a state-of-the-art object detection model built on YOLO11. The second, the Gender&age detector, is a transformer-based model built on FaceXFormer. The high-level architecture of the models is as follows:
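
One plausible way to chain the two models, offered as a minimal sketch rather than the project's actual code, is to detect people first and then run the attribute model on the most confident detection. Everything named here is an assumption for illustration: `Detection`, `PersonResult`, `analyze_frame`, the `(x1, y1, x2, y2)` box layout, and the `min_confidence` default of 0.5. The two models are passed in as plain callables so the sketch does not depend on any particular library.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    confidence: float


@dataclass
class PersonResult:
    box: Box
    attributes: dict  # e.g. {"age": ..., "gender": ..., "ethnicity": ...}


def analyze_frame(
    frame,
    detect_persons: Callable[[object], Sequence[Detection]],
    predict_attributes: Callable[[object, Box], dict],
    min_confidence: float = 0.5,  # hypothetical threshold, not from the project
) -> Optional[PersonResult]:
    """Stage 1: find people. Stage 2: describe the most confident detection."""
    candidates = [d for d in detect_persons(frame) if d.confidence >= min_confidence]
    if not candidates:
        return None  # nobody detected with enough confidence
    best = max(candidates, key=lambda d: d.confidence)
    return PersonResult(box=best.box, attributes=predict_attributes(frame, best.box))
```

In a real setup, `detect_persons` would wrap the YOLO11 checkpoint and `predict_attributes` would wrap the FaceXFormer-based checkpoint; keeping them as injected callables makes the control flow easy to test with stand-ins.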

 

 

Dataset

YOLO, a state-of-the-art object detection model, is trained on Common Objects in Context (COCO), a large-scale object detection, segmentation, and captioning dataset with 80 object categories. The Gender&age models, on the other hand, are trained on UTKFace [1] and FairFace [2].

 

Training the Model

We use pre-trained versions of both models, but they can be customized or fine-tuned on a custom dataset. To fine-tune the first model, i.e., the object detection model, you can follow: here
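
For orientation, fine-tuning a YOLO checkpoint with the Ultralytics Python package usually comes down to loading the weights and calling `train`. The sketch below is an assumption about how that could look for this project, not the authors' procedure: the dataset file `custom_person.yaml` is hypothetical, and the `epochs`, `imgsz`, and `batch` values are placeholders to adjust.

```python
def build_train_args(data_yaml, epochs=50, imgsz=640, batch=16):
    """Collect keyword arguments for Ultralytics' YOLO.train().

    The default values are placeholders, not tuned settings.
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz, "batch": batch}


def finetune_person_detector(weights="ckpts/yolo11n.pt", data_yaml="custom_person.yaml"):
    """Fine-tune the detector; requires the `ultralytics` package.

    `custom_person.yaml` is a hypothetical dataset description file.
    """
    # Imported lazily so build_train_args stays usable without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))
```

Separating the argument dictionary from the training call keeps the settings easy to inspect or log before a long training run starts.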

 

Deployment

To make the trained model accessible, we developed a web application using Streamlit. The application allows users to upload an image of a person or open a live camera, and returns the person's bounding box along with age, gender, and race/ethnicity.

To run the UI, type:

 
streamlit run app.py
 

 

Streamlit Application Features

  • Image Upload: Users can upload images in formats like JPG, JPEG, or PNG.
  • Live Webcam: Users can use their webcam to obtain the same predictions.
  • Real-Time Prediction: The application displays the predicted bounding box, age, gender, and ethnicity on the image or live video in real time.
  • User-Friendly Interface: A simple and intuitive UI to enhance user experience.
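
The features above could be wired together roughly as follows. This is a minimal sketch under stated assumptions, not the contents of the project's `app.py`: `render_app`, `is_supported_image`, and the `predict` callable (taking raw image bytes and returning a result to display) are hypothetical names, and `st.camera_input` captures a single snapshot rather than a continuous stream.

```python
from pathlib import Path

# Upload formats listed in the features above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(filename):
    """Check a file name against the supported upload formats (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def render_app(predict):
    """Render a simple page; `predict` is a hypothetical callable taking image bytes."""
    # Imported lazily so the helper above works without Streamlit installed.
    import streamlit as st

    st.title("PersonVision")
    source = st.radio("Input source", ["Upload image", "Live webcam"])
    if source == "Upload image":
        image_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    else:
        # Captures one still frame from the webcam, not a live stream.
        image_file = st.camera_input("Take a picture")
    if image_file is None:
        return  # nothing to show until the user provides an image
    st.image(image_file)
    st.write(predict(image_file.getvalue()))
```

Calling `render_app(...)` at the end of a script launched with `streamlit run` would draw the page; continuous live-video overlays would need a different mechanism than the snapshot widget used here.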

Conclusion

The PersonVision App is a robust and interactive tool designed for real-time age, gender, and ethnicity detection using advanced computer vision techniques. By leveraging the power of YOLO for object detection and a deep learning model for demographic analysis, this application offers users a seamless experience in processing both uploaded media and live webcam feeds. We encourage users to explore the application, provide feedback, and consider its implications in their respective domains.

 

Limitations

At present, when multiple people appear in the same frame, the system cannot compute characteristics such as gender, age, and race for all of them; it works only for a single person. We plan to address this in the future.
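
One possible direction for lifting this limitation, sketched here purely as an illustration and not as the planned fix, is to crop each detected person and run the attribute model once per crop. The names `clamp_box` and `analyze_all_persons`, the `(x1, y1, x2, y2)` box layout, and the `predict_attributes` callable are all assumptions; the frame is treated as a row-major grid (a list of rows or a NumPy array).

```python
def clamp_box(box, width, height):
    """Clip an (x1, y1, x2, y2) box to the frame bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))


def analyze_all_persons(frame, boxes, predict_attributes):
    """Run the (hypothetical) attribute predictor once per detected person."""
    height, width = len(frame), len(frame[0])
    results = []
    for box in boxes:
        x1, y1, x2, y2 = clamp_box(box, width, height)
        if x2 <= x1 or y2 <= y1:
            continue  # skip boxes that fall entirely outside the frame
        # Slice rows, then columns, so this works on nested lists as well.
        crop = [row[x1:x2] for row in frame[y1:y2]]
        results.append({"box": (x1, y1, x2, y2), "attributes": predict_attributes(crop)})
    return results
```

The cost of this approach grows with the number of people in the frame, since the attribute model runs once per crop, which matters for real-time use.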

 

Screenshots

Our user interface 

 

Single image prediction

 

 

Live camera prediction

 

 

Citation

 

@article{narayan2024facexformer,
  title={FaceXFormer: A Unified Transformer for Facial Analysis},
  author={Narayan, Kartik and VS, Vibashan and Chellappa, Rama and Patel, Vishal M},
  journal={arXiv preprint arXiv:2403.12960},
  year={2024}
}

[1] Zhang, Z., Song, Y., Qi, H.: Age progression/regression by conditional adversarial autoencoder. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 5810–5818 (2017)

[2] Karkkainen, K., Joo, J.: FairFace: Face attribute dataset for balanced race, gender, and age for bias measurement and mitigation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 1548–1558 (2021)

 

Let us know your interest

At Next Solution Lab, we are dedicated to transforming experiences through innovative solutions. If you are interested in learning more about how our projects can benefit your organization, contact us.