Catch The AI is our graduation project: an intelligent system that detects AI-generated content using our deep learning models. It distinguishes between AI-generated and human-authored media in images, text, and audio. We deployed the models in a web application to make them easy for everyone to use. You can visit the website: Catch The AI
The full project is described in detail in the documentation.
✔️ Detect AI-generated content in images, text, and audio.
✔️ User-friendly web application.
✔️ Full User authentication and authorization system.
✔️ Detection history for each user.
✔️ Admin panel to manage users and their data.
✔️ Subscription system to unlock more features (coming soon).
This project utilizes a range of datasets to train and test the AI detection models. The datasets are categorized by the type of media they contain, including images, audio, and text. The datasets are sourced from various repositories and research projects, providing a diverse and comprehensive collection of AI-generated content for model training and evaluation.
- Fake-or-Real Dataset (FoR): Baseline detection with genuine and fake audio samples.
- Scenefake Dataset: Diverse deepfake audio clips from various techniques.
- In the Wild Dataset: Real and fake audio from diverse internet sources.
- ASVspoof 2019 Dataset: Authentic and spoofed audio for ASV challenges.
- ASVspoof 2021 Dataset: Updated spoofed audio reflecting advancements in deepfake technology.
- 140k Real and Fake Faces: 70,000 real faces from Flickr and 70,000 StyleGAN-generated faces, resized to 256x256.
- CelebA-HQ (256x256): 30,000 high-quality celebrity faces for model training.
- Synthetic Faces High Quality (SFHQ) Part 2: 91,361 curated faces at 1024x1024, enhanced by StyleGAN2.
- Face Dataset Using Stable Diffusion v1.4: Real and fake faces, resized to 256x256, using Stable Diffusion models.
- Stable Diffusion Face Dataset: AI-generated faces at 512x512, 768x768, and 1024x1024 resolutions using Stable Diffusion checkpoints.
- Synthetic Faces High Quality (SFHQ) Part 3: 118,358 faces at 1024x1024, generated by StyleGAN2 with advanced techniques.
- Synthetic Human Faces for 3D Reconstruction: High-quality 512x512 faces generated using the EG3D model for 3D reconstruction.
- LLM Generated Essays for the Detect AI Comp: 700 essays, including 500 generated with GPT-3.5-turbo and 200 with GPT-4.
- DAIGT Data - Llama 70b and Falcon 180b:
  - Llama Falcon v3: 7,000 LLM-generated essays.
  - Llama 70b v2: 1,172 LLM-generated essays.
  - Llama 70b v1: 1,172 LLM-generated essays.
  - Falcon 180b v1: 1,055 LLM-generated essays.
- Persuade Corpus 2.0: Over 25,000 argumentative essays by U.S. students in grades 6-12.
- DAIGT External Dataset: 2,421 student-generated texts and 2,421 AI-generated texts for balanced training data.
- BERT: Achieved 90% accuracy but showed signs of overfitting on smaller datasets.
- RoBERTa: Outperformed BERT with 99% accuracy and demonstrated better generalization.
- DeBERTa: Also reached 99% accuracy, with superior handling of complex text patterns.
- Wav2Vec2: Excelled with a word error rate of 7% and robust anomaly detection.
- Mel-spectrogram + CNN: Delivered reasonable accuracy but was less effective in detecting subtle anomalies compared to Wav2Vec2.
- ResNet-based Model: Provided good results but was more computationally intensive.
- EfficientNet: Balanced accuracy and computational efficiency, achieving 99% accuracy.
- ResNet: Reached 99% accuracy but required more computational resources.
- Xception: Offered detailed feature extraction but was less efficient compared to EfficientNet.
- Ensemble of RoBERTa and DeBERTa: Combines the outputs of both models and integrates them through a final linear layer to enhance overall classification performance.
- Architecture:
  - RoBERTa Output: Captures robust language patterns.
  - DeBERTa Output: Provides nuanced language understanding.
  - Final Linear Layer: Integrates the concatenated outputs to improve classification.
🔗 For more details about the text models, please check DAIGT-Catch-the-AI.
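The fusion step of the text ensemble described above (two model outputs concatenated, then passed through a final linear layer) can be sketched in plain Python. This is a minimal illustration and not the project's implementation: the function name `fuse_and_classify`, the toy vector sizes, the weights, and the sigmoid on the output are all assumptions. The source does not say whether pooled embeddings or logits are concatenated, and the real models presumably run inside a deep learning framework.

```python
import math
from typing import Sequence


def fuse_and_classify(
    roberta_out: Sequence[float],
    deberta_out: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Return an illustrative probability that a text is AI-generated.

    The two model outputs are concatenated and passed through a single
    linear unit followed by a sigmoid (the sigmoid is an assumption).
    """
    features = list(roberta_out) + list(deberta_out)  # concatenation step
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable sigmoid for both signs of the logit.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


# Toy two-dimensional outputs standing in for each model's features.
score = fuse_and_classify([0.2, -0.1], [0.4, 0.3], weights=[1.0, 1.0, 1.0, 1.0])
print(f"P(AI-generated) = {score:.3f}")
```

In a trained ensemble the weights and bias would be learned jointly with, or on top of, the two encoders; here they are fixed by hand only to keep the sketch self-contained.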
- Architecture:
- Wav2Vec2: Selected for its state-of-the-art performance in audio anomaly detection.
- EfficientNet: Chosen for its efficiency and high accuracy in distinguishing real from AI-generated images.
The ensemble model for text was validated on additional test datasets, confirming its robustness and ability to generalize across various scenarios. This ensemble approach demonstrated significant improvements over individual models, providing a more comprehensive understanding and classification of text inputs.
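One way to check generalization of this kind is to report accuracy separately on each held-out dataset instead of pooling them. The sketch below assumes a hypothetical `predict` callable and made-up toy data; the dataset names and the keyword-based classifier are placeholders, not the project's evaluation code or results.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label) with 1 = AI-generated, 0 = human


def accuracy_per_dataset(
    predict: Callable[[str], int],
    datasets: Dict[str, List[Example]],
) -> Dict[str, float]:
    """Compute the accuracy of `predict` separately on each named dataset."""
    results: Dict[str, float] = {}
    for name, examples in datasets.items():
        if not examples:
            continue  # skip empty sets rather than divide by zero
        correct = sum(1 for text, label in examples if predict(text) == label)
        results[name] = correct / len(examples)
    return results


def toy_predict(text: str) -> int:
    """Stand-in classifier used only to make the example runnable."""
    return 1 if "as an ai" in text.lower() else 0


toy_sets = {
    "held_out_a": [("As an AI model, I think...", 1), ("My summer trip", 0)],
    "held_out_b": [("As an AI, I cannot", 1), ("An essay on voting", 1)],
}
per_set = accuracy_per_dataset(toy_predict, toy_sets)
print(per_set)
```

A large gap between datasets in such a report would suggest the model has fit one source's quirks and does not generalize well.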
First, clone the repo:
# Clone this project
$ git clone https://github.com/romanyn36/Catch-The-AI.git
# go to the project folder
$ cd Catch-The-AI
# create a virtual environment
$ python -m venv myenv
# activate the virtual environment
$ source myenv/bin/activate # for linux
$ myenv\Scripts\activate # for windows
# install the requirements
$ pip install -r requirements.txt
# great! now you can run the project
# now to run the Django server
# go to the Backend Django folder
$ cd Backend
# run the server
$ python manage.py runserver
# now to run the React app
# go to the Frontend React folder
$ cd ../Frontend
$ cd catch-the-ai
# install the dependencies
$ npm install
# run the app
$ npm start
The app will run on [http://localhost:3000/](http://localhost:3000/)
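After both servers are started, a small script can confirm that they respond. This is an optional sketch and not part of the project: `is_up` and `check_services` are hypothetical helper names, and the backend URL assumes Django's default `runserver` address (`http://127.0.0.1:8000/`), which may differ if the project configures another port.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server replied, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # nothing is listening, or the request timed out


def check_services(
    urls: Iterable[str],
    probe: Callable[[str], bool] = is_up,
) -> Dict[str, bool]:
    """Probe each URL and map it to whether it responded."""
    return {url: probe(url) for url in urls}


if __name__ == "__main__":
    # 8000 is Django's default runserver port (assumed unchanged here);
    # 3000 is the React port stated above.
    targets = ["http://127.0.0.1:8000/", "http://localhost:3000/"]
    for url, ok in check_services(targets).items():
        print(f"{url}: {'up' if ok else 'not reachable'}")
```

The `probe` parameter exists only so the check can be exercised without a running server.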
We did not just work as a team; we were a family. These people are truly skilled and creative. Follow them and look out for their wonderful projects, which teach them a lot and benefit many people.❤️
Worked on the image detection model; responsible for the backend and model deployment.
You can contact the team leader with any questions or requests for help via the following links.
This project is licensed under the Apache License 2.0. For more details, see the LICENSE file. Any breach of the license will be prosecuted under the law.