Fake Voice Detection Service
Detect Fake Voices with Scientific Precision.
BR SYSTEMS' VoiceGuard Analytics combines machine learning and deep learning in a multi-stage approach to detect AI-synthesized voices with high accuracy. Simply send us your audio files by email to receive a detailed analysis report.
Threats from AI Voice Synthesis
Advanced TTS systems such as XTTS v2 and VALL-E enable anyone to create highly realistic synthetic voices that convincingly imitate real humans.
Voice Impersonation
Attacks using synthesized voices of specific individuals to bypass identity verification and authentication systems are becoming a real threat.
Audio Deepfakes
Fake audio of politicians and public figures is spreading, causing serious damage to social credibility and reputation.
Phone & Business Fraud
Fraudulent calls impersonating family members or supervisors are increasing, and the synthesized voices are difficult to distinguish from genuine ones.
Content Authenticity
Verifying the authenticity of interviews, testimonies, and recordings has become increasingly difficult, undermining legal and journalistic trust.
Two Analysis Services
Choose between a universal model and a speaker-specific model based on your use case and accuracy requirements.
Universal Fake Voice Detection
No speaker information required — immediate analysis available. A general-purpose model supporting multiple speakers and TTS engines analyzes whether submitted audio is synthesized.
- No speaker registration — same-day analysis available
- 197-dimensional acoustic features + GradientBoosting
- Dual judgment with RawNet2 deep learning model
- Detailed report including ROC-AUC, EER, and confidence scores
- Batch analysis (multiple files at once) supported
Personalized Fake Voice Detection
Pre-register voice samples of the target speaker to build a speaker-specific high-accuracy model. Particularly effective for impersonation detection.
- Pre-register authentic voice samples (Real)
- Achieve extremely high accuracy with speaker-specific model
- Speaker verification via ECAPA-TDNN embeddings
- Includes a detailed Threshold Analysis report
- Continuous model update option available
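As a rough illustration of the speaker verification idea, the sketch below compares a test embedding against the average of the pre-registered embeddings using cosine similarity. It assumes the embeddings have already been extracted (for example by an ECAPA-TDNN model). The function names, the tiny 3-dimensional vectors, and the 0.5 threshold are illustrative assumptions, not the service's actual parameters.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_embedding(embeddings):
    """Average several enrollment embeddings into one speaker profile."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def matches_enrolled_speaker(enrolled, test, threshold=0.5):
    """True if the test embedding is close enough to the speaker profile.

    The threshold is a hypothetical value chosen for this example.
    """
    profile = mean_embedding(enrolled)
    return cosine_similarity(profile, test) >= threshold

# Toy embeddings (real speaker embeddings have far more dimensions).
enrolled = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.1]]
same_speaker = [0.95, 0.05, 0.15]
other_speaker = [-0.1, 1.0, -0.3]

print(matches_enrolled_speaker(enrolled, same_speaker))
print(matches_enrolled_speaker(enrolled, other_speaker))
```

In practice the threshold would be tuned on held-out recordings of the registered speaker rather than fixed in advance.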
Completed in 4 Steps
Simply send your audio files by email to receive a comprehensive analysis report.
Inquiry
Tell us about the audio to be analyzed, quantity, and purpose. We will provide a quote the same day.
Send Audio Files
Send the target audio files (WAV recommended) by email.
Multi-Stage Analysis
Precision analysis using 197-dimensional acoustic features and RawNet2 deep learning.
Report Delivery
Comprehensive report including ROC curves, AUC, feature importance, and judgment rationale.
Analysis Technology Overview
A multi-stage approach combining machine learning and deep learning achieves high-accuracy judgments.
Feature-Based Analysis
Extracts 197-dimensional acoustic features and classifies using GradientBoosting. Multi-dimensional feature engineering combining MFCC, LFCC, CQCC, Group Delay, and Mel statistics achieves high explainability.
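A classifier such as GradientBoosting needs a fixed-length input, while recordings vary in length. One common way to bridge that gap is to pool frame-level coefficients into per-coefficient statistics. The sketch below shows that pooling step only. It assumes the frame-level features (MFCC, LFCC, and so on) have already been computed; the helper names and the choice of mean and standard deviation are assumptions, and the actual composition of the 197 dimensions is not specified here.

```python
import statistics

def pool_frames(frames):
    """Collapse per-frame coefficient vectors into [means..., stds...]."""
    columns = list(zip(*frames))  # one tuple per coefficient index
    means = [statistics.fmean(column) for column in columns]
    stds = [statistics.pstdev(column) for column in columns]
    return means + stds

def build_feature_vector(feature_sets):
    """Concatenate pooled statistics of every feature family.

    Families are visited in sorted name order so the layout of the
    resulting vector is the same for every file.
    """
    vector = []
    for name in sorted(feature_sets):
        vector.extend(pool_frames(feature_sets[name]))
    return vector

# Toy input: 2 frames of 2 "mfcc" coefficients and 2 frames of 1 "lfcc" coefficient.
feature_sets = {
    "mfcc": [[1.0, 2.0], [3.0, 4.0]],
    "lfcc": [[0.0], [2.0]],
}
vector = build_feature_vector(feature_sets)
print(len(vector), vector)
```

Because each position in the vector corresponds to a named statistic, a tree-based model's feature importance can be traced back to a specific acoustic feature.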
Deep Learning Model (RawNet2 Official)
An end-to-end neural network that takes raw waveforms directly as input. The SincConv + Channel Attention + ResBlocks + GRU architecture learns subtle voice characteristics that feature-based approaches cannot capture. Occlusion Sensitivity visualization shows which time-frequency regions contributed to the judgment.
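The idea behind Occlusion Sensitivity can be sketched in a few lines: mask one region of the input, re-score it, and record how much the score drops. The example below is a simplified, time-only version operating on a list of samples. The `toy_score` function is a stand-in for a real model, and all names and the window size are assumptions made for illustration.

```python
def occlusion_sensitivity(signal, score_fn, window, fill=0.0):
    """Return the score drop caused by masking each consecutive window."""
    baseline = score_fn(signal)
    drops = []
    for start in range(0, len(signal), window):
        end = min(start + window, len(signal))
        occluded = list(signal)
        occluded[start:end] = [fill] * (end - start)  # mask this region
        drops.append(baseline - score_fn(occluded))
    return drops

def toy_score(signal):
    """Stand-in for a detector: mean absolute amplitude of the input."""
    return sum(abs(x) for x in signal) / len(signal)

signal = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
drops = occlusion_sensitivity(signal, toy_score, window=2)
print(drops)  # the middle window causes the largest drop
```

Regions with a large drop are the ones the score depends on most. A time-frequency version would mask patches of a spectrogram instead of spans of samples.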
Technology Validated by World-Standard Benchmarks
Detection Accuracy — Proven by the Numbers
Our system has been rigorously validated against ASVspoof 2019 LA (71,237 samples), the world-standard evaluation framework for fake voice detection. Using the official RawNet2 implementation (Tak et al., ICASSP 2021), we achieved EER (Equal Error Rate) = 4.487% and min t-DCF = 0.12352, significantly surpassing the official baseline (LFCC+GMM, EER≈8%) and delivering production-ready detection accuracy.
For Japanese speech, we have achieved ROC-AUC = 1.000 and EER = 0.3% in a 7-speaker validation, fully meeting commercial service standards.
To further strengthen confidence in our technology, we are also pursuing independent third-party validation through submission to an international peer-reviewed journal.
Types of Fake Voice Covered
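Audio deepfakes are broadly classified into five categories: TTS, VC, Emotion Fake, Scene Fake, and Partially Fake. Below is an honest overview of what BR-FVD currently covers in each category, plus telephone channel audio as an additional condition, and our roadmap for future development.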
TTS (Text-to-Speech)
Supported: Synthesized voice generated by neural TTS systems such as XTTS v2, VALL-E, and StyleBERT. Validated on ASVspoof 2019 LA (A01–A19). EER = 4.487% achieved.
VC (Voice Conversion)
Supported: Voice converted from speaker A to speaker B. Trained on ASVspoof VC attacks (A01, A02, A17–A19). VAE-based VC is the most challenging attack type in the benchmark.
Emotion Fake
Partial: Voice with artificially altered emotion or tone from the same speaker. Detectable via Jitter, Shimmer, and related features. Model enhancement with dedicated training data is under consideration.
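For readers unfamiliar with these measures, the sketch below computes jitter and shimmer in their common "local" form: the mean absolute difference between consecutive cycles, divided by the mean value. Jitter applies this to pitch periods, shimmer to peak amplitudes. It assumes the per-cycle periods and amplitudes have already been extracted. Which variant the service uses is not stated, so treat this as one standard definition rather than the actual implementation.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value

def local_jitter(periods):
    """Cycle-to-cycle variation of pitch period (periods in seconds)."""
    return relative_perturbation(periods)

def local_shimmer(amplitudes):
    """Cycle-to-cycle variation of peak amplitude."""
    return relative_perturbation(amplitudes)

# Toy values: a slightly irregular voice versus a perfectly regular one.
irregular_periods = [0.010, 0.011, 0.010, 0.011]
regular_periods = [0.010, 0.010, 0.010, 0.010]

print(local_jitter(irregular_periods))
print(local_jitter(regular_periods))
```

Natural speech shows small but nonzero cycle-to-cycle variation, so values that are unusually low or unusually high can both be informative.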
Scene Fake
Partial: Voice with manipulated background noise, reverberation, or environmental audio. Detectable via Spectral Flatness and related features. Validation with dedicated datasets is a future goal.
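Spectral flatness is the ratio of the geometric mean to the arithmetic mean of a power spectrum: close to 1 for noise-like frames and close to 0 for frames dominated by a few frequencies. The sketch below computes it for a single frame, assuming the power spectrum has already been obtained. The function name and the small `eps` guard are choices made for this example.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of one power spectrum."""
    n = len(power_spectrum)
    # Average the logs to get the geometric mean without overflow.
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(power_spectrum) / n
    return geometric_mean / (arithmetic_mean + eps)

flat_frame = [1.0] * 8               # equal energy in every bin (noise-like)
peaky_frame = [8.0] + [1e-6] * 7     # nearly all energy in one bin (tonal)

print(spectral_flatness(flat_frame))
print(spectral_flatness(peaky_frame))
```

Tracking this value frame by frame is one way to notice abrupt changes in the background, such as noise that was added or removed.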
Partially Fake
Planned: Voice where only a portion of an utterance is replaced. Difficult to detect with an overall score; segment-level judgment is required. Implementation including Occlusion Sensitivity is under consideration.
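To show why an overall score can miss a partial replacement, the sketch below scores fixed-length windows separately and flags those above a threshold. This is a generic sliding-window illustration, not the planned implementation. The scoring function, window length, hop, and threshold are all hypothetical.

```python
def segment_scores(signal, score_fn, window, hop):
    """Score each window of the signal; returns (start_index, score) pairs."""
    last_start = max(len(signal) - window, 0)
    results = []
    for start in range(0, last_start + 1, hop):
        segment = signal[start:start + window]
        results.append((start, score_fn(segment)))
    return results

def flag_segments(scored, threshold):
    """Start indices of windows whose score reaches the threshold."""
    return [start for start, score in scored if score >= threshold]

def toy_score(segment):
    """Stand-in for a detector: here simply the mean of the values."""
    return sum(segment) / len(segment)

# Toy per-sample "synthetic-ness": only the middle third is suspicious.
signal = [0.1] * 4 + [0.9] * 4 + [0.1] * 4
scored = segment_scores(signal, toy_score, window=4, hop=4)
flagged = flag_segments(scored, threshold=0.5)

print(toy_score(signal))  # overall score stays below 0.5
print(flagged)            # but the window starting at index 4 is flagged
```

The overall mean hides the manipulated region, while the per-window view localizes it.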
Telephone Channel Audio
Planned: Voice transmitted over PSTN or VoIP with bandwidth limiting and codec compression applied. Implementation is under consideration, including dedicated data augmentation and pre-processing.
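One simple form of telephone-style augmentation is to pass clean samples through a companding codec and back. The sketch below uses the continuous μ-law formula with 8-bit quantization, which is the idea behind G.711 μ-law but not a bit-exact codec, and it does not model bandwidth limiting. The function names are assumptions, and nothing here describes the service's planned pipeline.

```python
import math

MU = 255  # standard value for 8-bit mu-law companding

def mu_law_encode(x, mu=MU):
    """Compress a sample in [-1, 1] to an integer code in 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))

def mu_law_decode(code, mu=MU):
    """Expand an integer code in 0..255 back to a sample in [-1, 1]."""
    y = 2 * code / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def simulate_codec(samples):
    """Round-trip samples through the codec to add quantization distortion."""
    return [mu_law_decode(mu_law_encode(s)) for s in samples]

clean = [-1.0, -0.5, 0.0, 0.5, 1.0]
degraded = simulate_codec(clean)
print(degraded)
```

Training on both clean and degraded copies is a common way to make a detector less sensitive to channel effects.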
Analysis Report Contents
We provide detailed reports that scientifically visualize the basis for judgments, not just a simple yes/no answer.
ROC Curve & AUC
Receiver Operating Characteristic curve. Quantifies model discrimination performance using AUC and EER.
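For reference, both metrics can be computed directly from two lists of scores. In the sketch below a higher score means "more likely synthetic". AUC is the probability that a randomly chosen synthetic file scores higher than a randomly chosen real one, and EER is the error rate at the threshold where false alarms and misses are (approximately) equal. The function names and the toy scores are illustrative, and production tools usually interpolate the EER more finely.

```python
def error_rates(real_scores, fake_scores, threshold):
    """False alarm rate (real flagged as fake) and miss rate (fake passed)."""
    false_alarm = sum(s >= threshold for s in real_scores) / len(real_scores)
    miss = sum(s < threshold for s in fake_scores) / len(fake_scores)
    return false_alarm, miss

def equal_error_rate(real_scores, fake_scores):
    """Approximate EER by scanning every observed score as a threshold."""
    best = None
    for threshold in sorted(set(real_scores) | set(fake_scores)):
        false_alarm, miss = error_rates(real_scores, fake_scores, threshold)
        gap = abs(false_alarm - miss)
        if best is None or gap < best[0]:
            best = (gap, (false_alarm + miss) / 2, threshold)
    return best[1], best[2]  # (EER, threshold where it occurs)

def roc_auc(real_scores, fake_scores):
    """Fraction of (fake, real) pairs ranked correctly; ties count half."""
    wins = 0.0
    for fake in fake_scores:
        for real in real_scores:
            if fake > real:
                wins += 1.0
            elif fake == real:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))

real_scores = [0.1, 0.2, 0.3, 0.6]
fake_scores = [0.4, 0.7, 0.8, 0.9]

auc = roc_auc(real_scores, fake_scores)
eer, eer_threshold = equal_error_rate(real_scores, fake_scores)
print(auc, eer, eer_threshold)
```

With these toy scores one real file (0.6) outranks one synthetic file (0.4), which is what keeps the AUC below 1 and the EER above 0.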
Score Distribution
Visualization of Real/Synthetic score distributions. Intuitively shows the degree of separation between the two classes.
Feature Importance
Importance ranking of 197-dimensional features. Shows which acoustic features served as the basis for the judgment.
Threshold Analysis
Detailed threshold-based classification. Includes individual indicator evaluations for Jitter, Shimmer, Spectral features, etc.
RawNet2 Deep Score + Occlusion
Synthetic probability score from the deep learning model, with a 4-panel Mel Spectrogram × Occlusion Sensitivity visualization showing which time-frequency regions contributed to the judgment.
Summary CSV
A CSV report listing synthetic probability scores, predicted labels, and confidence levels for each file.
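As an illustration of what such a file might look like, the sketch below turns per-file synthetic probabilities into rows and writes them as CSV. The column names, the 0.5 decision threshold, and the confidence bands are hypothetical choices for this example and may differ from the delivered report.

```python
import csv
import io

FIELDS = ["file", "synthetic_probability", "predicted_label", "confidence"]

def summarize(scores, threshold=0.5):
    """Build one report row per file from its synthetic probability."""
    rows = []
    for filename, probability in scores.items():
        label = "synthetic" if probability >= threshold else "real"
        margin = abs(probability - threshold)  # distance from the decision point
        if margin >= 0.4:
            confidence = "high"
        elif margin >= 0.2:
            confidence = "medium"
        else:
            confidence = "low"
        rows.append({
            "file": filename,
            "synthetic_probability": f"{probability:.3f}",
            "predicted_label": label,
            "confidence": confidence,
        })
    return rows

def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

scores = {"a.wav": 0.97, "b.wav": 0.12, "c.wav": 0.55}
rows = summarize(scores)
report = to_csv(rows)
print(report)
```

A file such as `c.wav`, which sits close to the threshold, is labeled with low confidence so that it can be reviewed alongside the detailed plots.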
Simple Pricing
We offer flexible plans tailored to the number of files and use case. Please feel free to contact us for a consultation.
- Universal FVD analysis
- ROC & Score Distribution report
- Summary CSV
- Delivery: within 3 business days
- Universal FVD batch analysis
- RawNet2 deep learning judgment
- Full report set (6 types)
- Feature Importance analysis
- Delivery: within 5 business days
- Speaker-specific model construction
- Personalized FVD analysis
- Full report set (6 types)
- Continuous model update option
- Delivery: upon consultation
Contact Us
For service inquiries or quote requests, please feel free to reach out. We typically respond within one business day.
info@brsystems.jp