VoiceGuard Analytics — Fake Voice Detection Service

Detect Fake Voices with Scientific Precision.

BR SYSTEMS' VoiceGuard Analytics combines machine learning and deep learning in a multi-stage approach to detect AI-synthesized voices with high accuracy. Simply send us your audio files by email to receive a detailed analysis report.

AUC (Japanese): 1.000
EER (Japanese): 0.3%
EER (English/RawNet2): 4.5%
Feature Dimensions: 197
Validated Speakers (Japanese): 7


Background & Risk

Threats from AI Voice Synthesis

Advanced TTS systems such as XTTS v2 and VALL-E enable anyone to create highly realistic synthetic voices that convincingly imitate real humans.

Voice Impersonation

Attacks using synthesized voices of specific individuals to bypass identity verification and authentication systems are becoming a real threat.

Audio Deepfakes

Fake audio of politicians and public figures is spreading, causing serious damage to social credibility and reputation.

Phone & Business Fraud

Fraudulent calls impersonating family members or supervisors are increasing, making it difficult to distinguish them from genuine voices.

Content Authenticity

Verifying the authenticity of interviews, testimonies, and recordings has become increasingly difficult, undermining legal and journalistic trust.

Services

Two Analysis Services

Choose between a universal model and a speaker-specific model based on your use case and accuracy requirements.

Universal FVD

Universal Fake Voice Detection

No speaker registration required

No speaker information is required, so analysis can begin immediately. A general-purpose model covering multiple speakers and TTS engines determines whether the submitted audio is synthesized.

No speaker registration — same-day analysis available
197-dimensional acoustic features + GradientBoosting
Dual judgment with RawNet2 deep learning model
Detailed report including ROC-AUC, EER, and confidence scores
Batch analysis (multiple files at once) supported

Personalized FVD

Personalized Fake Voice Detection

Speaker-specific high-accuracy model

Pre-register voice samples of the target speaker to build a speaker-specific high-accuracy model. Particularly effective for impersonation detection.

Pre-register authentic voice samples (Real)
Achieve extremely high accuracy with speaker-specific model
Speaker verification via ECAPA-TDNN embeddings
Includes Threshold Analysis detailed report
Continuous model update option available
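As a rough illustration of how embedding-based speaker verification generally works, the sketch below averages enrollment embeddings into a template and compares a test embedding by cosine similarity. The 3-dimensional vectors, the `enroll` and `verify` helpers, and the 0.5 threshold are hypothetical stand-ins; real ECAPA-TDNN embeddings are much larger, and the scoring actually used by the service is not described on this page.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enroll(embeddings):
    """Average several enrollment embeddings into one speaker template."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

def verify(template, test_embedding, threshold=0.5):
    """Return (score, accepted). The threshold is an illustrative assumption."""
    score = cosine_similarity(template, test_embedding)
    return score, score >= threshold

# Toy embeddings standing in for the output of a speaker encoder.
template = enroll([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])
same_score, same_ok = verify(template, [0.9, 0.1, 0.0])
other_score, other_ok = verify(template, [0.0, 0.0, 1.0])
```

In practice the threshold would be tuned on held-out recordings of the registered speaker rather than fixed in advance.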

Process

Completed in 4 Steps

Simply send your audio files by email to receive a comprehensive analysis report.

Inquiry

Tell us about the audio to be analyzed, quantity, and purpose. We will provide a quote the same day.

Send Audio Files

Send the target audio files (WAV recommended) by email.

Multi-Stage Analysis

Precision analysis using 197-dimensional acoustic features and RawNet2 deep learning.

Report Delivery

Comprehensive report including ROC curves, AUC, feature importance, and judgment rationale.

Technology

Analysis Technology Overview

A multi-stage approach combining machine learning and deep learning achieves high-accuracy judgments.

Feature-Based Analysis

Extracts 197-dimensional acoustic features and classifies using GradientBoosting. Multi-dimensional feature engineering combining MFCC, LFCC, CQCC, Group Delay, and Mel statistics achieves high explainability.
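To make the classification step more concrete, here is a minimal gradient-boosting sketch written from scratch with decision stumps and a logistic loss. It is not the service's model: the toy 3-dimensional vectors stand in for the 197-dimensional features, and the function names, round count, and learning rate are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Find the one-feature split that best fits the residuals (least squares)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]

def fit_gradient_boosting(X, y, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the negative gradient of the log loss."""
    prior = sum(y) / len(y)
    f0 = math.log(prior / (1.0 - prior))
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        scores = [s + learning_rate * (lm if row[j] <= thr else rm)
                  for row, s in zip(X, scores)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    """Probability that a feature vector belongs to the synthetic class."""
    f0, lr, stumps = model
    score = f0 + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)
    return sigmoid(score)

# Toy data: label 1 = synthetic, 0 = real. Only feature 0 is informative.
X = [[0.1, 5.0, 1.0], [0.2, 4.0, 0.9], [0.15, 6.0, 1.1], [0.3, 5.5, 1.0],
     [0.8, 5.0, 1.0], [0.9, 4.5, 0.95], [0.7, 6.0, 1.05], [0.85, 5.2, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gradient_boosting(X, y)
p_fake = predict_proba(model, [0.75, 5.0, 1.0])
p_real = predict_proba(model, [0.2, 5.0, 1.0])
```

Because every stump splits on a single named feature, the sequence of splits can be inspected directly, which is one reason boosted trees are often regarded as explainable.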

Deep Learning Model (RawNet2 Official)

An end-to-end neural network that takes raw waveforms directly as input. The SincConv + Channel Attention + ResBlocks + GRU architecture learns subtle voice characteristics that feature-based approaches cannot capture. Occlusion Sensitivity visualization shows which time-frequency regions contributed to the judgment.
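The general idea behind occlusion sensitivity can be sketched as follows: blank out one time-frequency patch at a time and record how much the model's score drops. The `occlusion_map` helper and the `toy_score` function are hypothetical; a real analysis would call the trained detector instead, and the patch sizes and masking value used by the service are not stated here.

```python
def occlusion_map(spec, score_fn, patch_f, patch_t):
    """Return a grid of score drops, one per occluded patch.

    spec is a list of frequency rows, each a list of time bins.
    A large drop means the patch mattered to the score.
    """
    base = score_fn(spec)
    n_f, n_t = len(spec), len(spec[0])
    heat = []
    for f0 in range(0, n_f, patch_f):
        row = []
        for t0 in range(0, n_t, patch_t):
            occluded = [list(r) for r in spec]  # copy, then zero out one patch
            for f in range(f0, min(f0 + patch_f, n_f)):
                for t in range(t0, min(t0 + patch_t, n_t)):
                    occluded[f][t] = 0.0
            row.append(base - score_fn(occluded))
        heat.append(row)
    return heat

def toy_score(spec):
    """Stand-in for a detector: responds only to rows 2-3, columns 3-5."""
    region = [spec[f][t] for f in range(2, 4) for t in range(3, 6)]
    return sum(region) / len(region)

spectrogram = [[1.0] * 6 for _ in range(4)]  # 4 frequency bins x 6 time bins
heat = occlusion_map(spectrogram, toy_score, patch_f=2, patch_t=3)
```

Only the patch that overlaps the region the toy scorer looks at produces a non-zero drop, which is the pattern a heat-map visualization would highlight.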

ASVspoof 2019 LA Benchmark: EER Comparison

System                        Description                                          EER      AUC
LFCC + GMM                    ASVspoof 2019 Official Baseline                      8.0%     n/a
GradientBoosting + 197-dim    BR-FVD Feature Engineering                           13.4%    0.944
RawNet2 Official + SWA        BR-FVD Deep Learning (English / ASVspoof 2019 LA)    4.5%     n/a
GradientBoosting + 197-dim    BR-FVD (Japanese / 7 speakers)                       0.3%     1.000
AASIST                        World State-of-the-Art (reference)                   0.8%     n/a

Verified Performance

Technology Validated by World-Standard Benchmarks

Detection Accuracy — Proven by the Numbers

Our system has been rigorously validated against ASVspoof 2019 LA (71,237 samples), the world-standard evaluation framework for fake voice detection. Using the official RawNet2 implementation (Tak et al., ICASSP 2021), we achieved EER = 4.487%, min t-DCF = 0.12352, significantly surpassing the official baseline (LFCC+GMM EER≈8%) and delivering production-ready detection accuracy.

For Japanese speech, we achieved ROC-AUC = 1.000 and EER = 0.3% in a 7-speaker validation, a level we consider suitable for commercial service. To further strengthen confidence in our technology, we are also pursuing independent third-party validation through submission to an international peer-reviewed journal.

ROC-AUC for Japanese speech (7-speaker validation): 1.000
EER (Equal Error Rate) for Japanese speech: 0.3%
EER for English (ASVspoof 2019 LA): 4.49%, below the official LFCC+GMM baseline of about 8% (lower is better)
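For readers unfamiliar with EER, the sketch below shows how it is generally obtained: sweep a decision threshold and find the point where the rate of genuine speech flagged as synthetic equals the rate of synthetic speech that slips through. The scores are made-up toy values, not results from the service, and the function name is an assumption.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Return (eer, threshold). Higher score means more likely synthetic."""
    best = None
    for thr in sorted(set(genuine_scores) | set(spoof_scores)):
        # False alarm: genuine speech scored at or above the threshold.
        false_alarm = sum(s >= thr for s in genuine_scores) / len(genuine_scores)
        # Miss: synthetic speech scored below the threshold.
        miss = sum(s < thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(false_alarm - miss)
        if best is None or gap < best[0]:
            best = (gap, (false_alarm + miss) / 2.0, thr)
    return best[1], best[2]

# Toy scores for illustration only.
genuine = [0.1, 0.2, 0.3, 0.6]
spoof = [0.4, 0.7, 0.8, 0.9]
eer, threshold = equal_error_rate(genuine, spoof)
```

With these toy scores, one genuine sample and one synthetic sample fall on the wrong side of the 0.6 threshold, so both error rates are 25%.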

Coverage

Types of Fake Voice Covered

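Audio deepfakes are broadly classified into five categories: TTS, voice conversion, emotion fake, scene fake, and partially fake. Below is an honest overview of what BR-FVD currently covers across these five, plus telephone-channel audio, together with our roadmap for future development.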

✓ TTS (Text-to-Speech): Supported

Synthesized voice generated by neural TTS systems such as XTTS v2, VALL-E, and StyleBERT. Validated on ASVspoof 2019 LA (A01–A19). EER = 4.487% achieved.

✓ VC (Voice Conversion): Supported

Voice converted from speaker A to speaker B. Covered through the ASVspoof 2019 LA voice-conversion attacks (A05, A06, A17–A19). VAE-based VC is the most challenging attack type in the benchmark.

△ Emotion Fake: Partial

Voice with artificially altered emotion or tone from the same speaker. Detectable via Jitter, Shimmer, and related features. Model enhancement with dedicated training data is under consideration.
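Jitter and shimmer are commonly defined as the average cycle-to-cycle change in pitch period and peak amplitude, each divided by its mean. The sketch below follows that common "local" definition; the period and amplitude values are toy numbers, and the exact variants used in BR-FVD's feature set are not specified on this page.

```python
def relative_perturbation(values):
    """Mean absolute difference between neighbours, divided by the mean value."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def local_jitter(periods_ms):
    """Cycle-to-cycle variation of the pitch period."""
    return relative_perturbation(periods_ms)

def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude."""
    return relative_perturbation(peak_amplitudes)

# Toy glottal cycles: periods in milliseconds and peak amplitudes.
steady_jitter = local_jitter([10.0, 10.0, 10.0, 10.0])
wobbly_jitter = local_jitter([10.0, 11.0, 10.0, 11.0])
shimmer = local_shimmer([0.50, 0.45, 0.50, 0.45])
```

A perfectly regular sequence gives 0, while natural speech normally shows a small non-zero value, so unusually low or high values can hint at manipulation.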

△ Scene Fake: Partial

Voice with manipulated background noise, reverberation, or environmental audio. Detectable via Spectral Flatness and related features. Validation with dedicated datasets is a future goal.
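Spectral flatness is usually computed as the geometric mean of the power spectrum divided by its arithmetic mean: close to 1 for noise-like frames and close to 0 for tonal ones. The sketch below uses a naive DFT so that it needs no external libraries; the frame length and the `eps` floor are illustrative assumptions.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum."""
    power = [p + eps for p in power_spectrum(frame)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic

n = 64
tone = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # single sinusoid
impulse = [1.0] + [0.0] * (n - 1)                              # flat spectrum
tone_flatness = spectral_flatness(tone)
impulse_flatness = spectral_flatness(impulse)
```

A change in background noise or reverberation shifts this value from frame to frame, which is why it can serve as a cue for scene manipulation.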

◯ Partially Fake: Planned

Voice where only a portion of an utterance is replaced. Difficult to detect with an overall score — segment-level judgment is required. Implementation including Occlusion Sensitivity is under consideration.
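One way to picture segment-level judgment is a sliding window that is scored independently at each position, so that a short replaced passage is not averaged away. The sketch below is only an illustration of that idea: `segment_scores`, the 1.5-second window, the 0.5-second hop, and the toy scorer are all assumptions, not the planned implementation.

```python
def segment_scores(samples, sample_rate, score_fn, window_s=1.5, hop_s=0.5):
    """Score overlapping windows; returns (start_s, end_s, score) tuples."""
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    results = []
    for start in range(0, max(len(samples) - window, 0) + 1, hop):
        segment = samples[start:start + window]
        results.append((start / sample_rate,
                        (start + len(segment)) / sample_rate,
                        score_fn(segment)))
    return results

def toy_score(segment):
    """Stand-in detector: fraction of samples marked as manipulated (1.0)."""
    return sum(segment) / len(segment)

# Toy 6-second signal at 10 samples per second; seconds 3.0-4.5 are "fake".
rate = 10
signal = [0.0] * 30 + [1.0] * 15 + [0.0] * 15
whole_file_score = toy_score(signal)
segments = segment_scores(signal, rate, toy_score)
worst = max(segments, key=lambda s: s[2])
```

The whole-file score is only 0.25 here, while the window covering the replaced passage scores 1.0, which shows why an overall score can miss a partial edit.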

◯ Telephone Channel Audio: Planned

Voice transmitted over PSTN or VoIP with bandwidth limiting and codec compression applied. Implementation is under consideration, including dedicated data augmentation and pre-processing.
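A simple form of telephone-channel augmentation is to reduce the sampling rate to 8 kHz and pass the samples through 8-bit mu-law companding. The sketch below uses the continuous mu-law formula with uniform 8-bit quantization, which approximates but does not reproduce the exact G.711 bit layout, and a crude block-average decimator in place of a proper low-pass filter. All function names are hypothetical.

```python
import math

MU = 255.0

def mu_law_encode(x):
    """Compress a sample in [-1, 1] to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255.0))

def mu_law_decode(code):
    """Expand an 8-bit code back to a sample in [-1, 1]."""
    y = code / 255.0 * 2.0 - 1.0
    return math.copysign((math.pow(1.0 + MU, abs(y)) - 1.0) / MU, y)

def simulate_telephone(samples, sample_rate, target_rate=8000):
    """Decimate to target_rate by block averaging, then apply mu-law round trip."""
    if sample_rate % target_rate != 0:
        raise ValueError("sample_rate must be an integer multiple of target_rate")
    step = sample_rate // target_rate
    narrow = [sum(samples[i:i + step]) / step
              for i in range(0, len(samples) - step + 1, step)]
    return [mu_law_decode(mu_law_encode(v)) for v in narrow]

# 10 ms of a constant toy signal at 16 kHz.
degraded = simulate_telephone([0.5] * 160, 16000)
```

Training on audio degraded this way is a common approach to making a detector less sensitive to codec artifacts.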

Deliverables

Analysis Report Contents

We provide detailed reports that scientifically visualize the basis for judgments, not just a simple yes/no answer.

ROC Curve & AUC

Receiver Operating Characteristic curve. Quantifies model discrimination performance using AUC and EER.
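AUC can be read as the probability that a randomly chosen synthetic sample receives a higher score than a randomly chosen genuine one. The sketch below computes it directly from that definition on made-up toy scores; the function name is an assumption.

```python
def roc_auc(genuine_scores, spoof_scores):
    """Pairwise AUC: fraction of (spoof, genuine) pairs ranked correctly.

    Ties count as half. Higher score means more likely synthetic.
    """
    wins = 0.0
    for s in spoof_scores:
        for g in genuine_scores:
            if s > g:
                wins += 1.0
            elif s == g:
                wins += 0.5
    return wins / (len(spoof_scores) * len(genuine_scores))

# Toy scores for illustration only.
overlapping = roc_auc([0.1, 0.2, 0.3, 0.6], [0.4, 0.7, 0.8, 0.9])
separated = roc_auc([0.1, 0.2, 0.3], [0.7, 0.8, 0.9])
```

An AUC of 1.000 means every synthetic sample in the test set scored above every genuine one.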

Score Distribution

Visualization of Real/Synthetic score distributions. Intuitively shows the degree of separation between the two classes.

Feature Importance

Importance ranking of 197-dimensional features. Shows which acoustic features served as the basis for the judgment.
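How the report's ranking is computed is not stated here. As one model-agnostic way to obtain such a ranking, the sketch below uses permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The toy model, data, and helper names are assumptions for illustration only.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(base - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Stand-in classifier that looks only at feature 0."""
    return int(row[0] > 0.5)

X = [[0.1, 5.0], [0.2, 4.0], [0.15, 6.0], [0.3, 5.5],
     [0.8, 5.0], [0.9, 4.5], [0.7, 6.0], [0.85, 5.2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
importances = permutation_importance(toy_model, X, y)
ranking = sorted(range(len(importances)), key=lambda j: -importances[j])
```

Features whose shuffling leaves accuracy unchanged get an importance of zero and fall to the bottom of the ranking.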

Threshold Analysis

Detailed threshold-based classification. Includes individual indicator evaluations for Jitter, Shimmer, Spectral features, etc.

RawNet2 Deep Score + Occlusion

The deep learning model's synthetic-probability score, with a 4-panel Mel spectrogram and Occlusion Sensitivity visualization showing which time-frequency regions contributed to the judgment.

Summary CSV

A CSV report listing synthetic probability scores, predicted labels, and confidence levels for each file.
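A per-file summary of this kind could look like the output of the sketch below. The column names, the 0.5 decision threshold, and the definition of confidence as the probability of the predicted class are assumptions for illustration, not the delivered report format.

```python
import csv
import io

def write_summary(rows, stream, threshold=0.5):
    """Write one CSV line per (file_name, synthetic_probability) pair."""
    writer = csv.writer(stream)
    writer.writerow(["file", "synthetic_probability", "predicted_label", "confidence"])
    for name, prob in rows:
        label = "synthetic" if prob >= threshold else "real"
        confidence = max(prob, 1.0 - prob)  # probability of the predicted class
        writer.writerow([name, f"{prob:.3f}", label, f"{confidence:.3f}"])

# Hypothetical file names and scores.
buffer = io.StringIO()
write_summary([("a.wav", 0.92), ("b.wav", 0.10)], buffer)
lines = buffer.getvalue().splitlines()
```

Writing to any file-like object keeps the function usable both for files on disk and for in-memory streams.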

Pricing

Simple Pricing

We offer flexible plans tailored to the number of files and use case. Please feel free to contact us for a consultation.

Spot Analysis

Inquiry

For one-time audio file verification. Ideal for a small number of files.

Universal FVD analysis
ROC & Score Distribution report
Summary CSV
Delivery: within 3 business days

Standard

Inquiry

For regular audio audits and media use. Includes batch analysis and detailed reports.

Universal FVD batch analysis
RawNet2 deep learning judgment
Full report set (6 types)
Feature Importance analysis
Delivery: within 5 business days

Personalized

Inquiry

Speaker-specific analysis. From voice sample submission to individual model construction.

Speaker-specific model construction
Personalized FVD analysis
Full report set (6 types)
Continuous model update option
Delivery: upon consultation

FAQ

Frequently Asked Questions

What kind of audio files should I send?

We recommend WAV format (44,100 Hz, mono, 16-bit). MP3 and M4A are also accepted, but they are converted before analysis. Each clip should be at least 1.5 seconds long; 3 to 8 seconds of natural speech is recommended.
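Before sending files, the recommended format can be checked locally. The sketch below uses Python's standard `wave` module to compare a file against the values given above; the `check_wav` helper and its messages are hypothetical, and it reads uncompressed PCM WAV files only.

```python
import wave

def check_wav(path, min_seconds=1.5):
    """Return a list of issues; an empty list means the file matches the recommendation."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        seconds = wav.getnframes() / rate
    issues = []
    if rate != 44100:
        issues.append(f"sample rate is {rate} Hz, recommended 44100 Hz")
    if channels != 1:
        issues.append(f"{channels} channels, recommended mono")
    if bits != 16:
        issues.append(f"{bits}-bit samples, recommended 16-bit")
    if seconds < min_seconds:
        issues.append(f"duration {seconds:.2f} s is below the {min_seconds} s minimum")
    return issues
```

For example, `check_wav("interview.wav")` (a hypothetical file name) returns an empty list when the recording already meets all four conditions.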

Which service should I choose — Universal FVD or Personalized FVD?

We recommend Universal FVD for verifying audio from unknown or anonymous speakers, and Personalized FVD for impersonation detection or voice protection of specific individuals. Please feel free to contact us if you are unsure.

How accurate is the detection?

For Japanese speech, we have achieved ROC-AUC=1.000 and EER=0.3%. For English (ASVspoof 2019 LA evaluation set, 71,237 samples), our official RawNet2 implementation achieved EER=4.487% and min t-DCF=0.12352, significantly outperforming the world-standard baseline (EER≈8%). Please note that accuracy may vary for unknown TTS engines or post-processed audio.

How is the confidentiality of submitted audio data protected?

Audio data submitted for analysis will be securely deleted upon completion. We also support NDA (Non-Disclosure Agreement) signing for highly confidential data.

When will the online service launch?

We are currently building a web application server. We plan to launch an online file upload and instant analysis service in the near future. We will announce the launch on this page when it is ready.

Contact Us

For service inquiries or quote requests, please feel free to reach out. We typically respond within one business day.

info@brsystems.jp
