VoiceGuard Analytics — Fake Voice Detection Service

Detect Fake Voices with Scientific Precision.

Powered by Deep Learning & Acoustic Analysis

BR SYSTEMS' VoiceGuard Analytics combines machine learning and deep learning in a multi-stage approach to detect AI-synthesized voices with high accuracy. Simply send us your audio files by email to receive a detailed analysis report.

AUC (Japanese): 1.000
EER (Japanese): 0.3%
EER (English / RawNet2): 4.5%
Feature Dimensions: 197
Validated Speakers: 7+
Background & Risk

Threats from AI Voice Synthesis

Advanced TTS systems such as XTTS v2 and VALL-E enable anyone to create highly realistic synthetic voices that convincingly imitate real humans.

01

Voice Impersonation

Attacks using synthesized voices of specific individuals to bypass identity verification and authentication systems are becoming a real threat.

02

Audio Deepfakes

Fake audio of politicians and public figures is spreading, causing serious damage to social credibility and reputation.

03

Phone & Business Fraud

Fraudulent calls impersonating family members or supervisors are increasing, making it difficult to distinguish them from genuine voices.

04

Content Authenticity

Verifying the authenticity of interviews, testimonies, and recordings has become increasingly difficult, undermining legal and journalistic trust.

Services

Two Analysis Services

Choose between a universal model and a speaker-specific model based on your use case and accuracy requirements.

Universal FVD

Universal Fake Voice Detection

No speaker registration required

No speaker information required — immediate analysis available. A general-purpose model supporting multiple speakers and TTS engines analyzes whether submitted audio is synthesized.

  • No speaker registration — same-day analysis available
  • 197-dimensional acoustic features + GradientBoosting
  • Dual judgment with RawNet2 deep learning model
  • Detailed report including ROC-AUC, EER, and confidence scores
  • Batch analysis (multiple files at once) supported
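The dual judgment above, a feature-based probability combined with a RawNet2 deep score, can be sketched as a simple weighted fusion. The equal weighting and the 0.5 decision threshold here are illustrative assumptions, not the service's actual rule:

```python
def fuse_scores(feature_prob: float, deep_prob: float, w: float = 0.5) -> float:
    """Blend the feature-based (GradientBoosting) synthetic probability
    with the deep-model (RawNet2) probability. w=0.5 is an assumption."""
    return w * feature_prob + (1.0 - w) * deep_prob

def judge(prob: float, threshold: float = 0.5) -> str:
    """Map a fused synthetic probability to a label; the 0.5 cutoff
    is illustrative, not the production threshold."""
    return "Synthetic" if prob >= threshold else "Real"

fused = fuse_scores(0.9, 0.7)  # both models lean toward synthetic
```

In practice the two model outputs are also reported separately, so disagreement between them is visible in the final report rather than hidden by the fusion.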
Personalized FVD

Personalized Fake Voice Detection

Speaker-specific high-accuracy model

Pre-register voice samples of the target speaker to build a speaker-specific high-accuracy model. Particularly effective for impersonation detection.

  • Pre-register authentic voice samples (Real)
  • Achieve extremely high accuracy with speaker-specific model
  • Speaker verification via ECAPA-TDNN embeddings
  • Includes Threshold Analysis detailed report
  • Continuous model update option available
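At decision time, speaker verification with ECAPA-TDNN embeddings reduces to comparing a fixed-length embedding of the probe audio against the enrolled speaker's embedding. A minimal numpy sketch of that comparison; the embeddings here are plain vectors rather than real ECAPA-TDNN outputs, and the 0.6 acceptance threshold is an assumed value:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(enrolled: np.ndarray, probe: np.ndarray,
                 threshold: float = 0.6) -> bool:
    """Accept the probe as the enrolled speaker when similarity exceeds
    the threshold. 0.6 is illustrative; real deployments calibrate it."""
    return cosine_similarity(enrolled, probe) >= threshold
```

The threshold trades false accepts against false rejects, which is exactly what the Threshold Analysis report item quantifies.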
Process

Completed in 4 Steps

Simply send your audio files by email to receive a comprehensive analysis report.

1

Inquiry

Tell us about the audio to be analyzed, the number of files, and the purpose. We will provide a quote the same day.

2

Send Audio Files

Send the target audio files (WAV recommended) by email.

3

Multi-Stage Analysis

Precision analysis using 197-dimensional acoustic features and RawNet2 deep learning.

4

Report Delivery

Comprehensive report including ROC curves, AUC, feature importance, and judgment rationale.

Technology

Analysis Technology Overview

A multi-stage approach combining machine learning and deep learning achieves high-accuracy judgments.

Feature-Based Analysis

Extracts 197-dimensional acoustic features and classifies them using GradientBoosting. Multi-dimensional feature engineering combining MFCC, LFCC, CQCC, Group Delay, and Mel statistics achieves high explainability.

Features: MFCC (39-dim), LFCC (60-dim), CQCC (60-dim), Group Delay, Mel statistics, and Jitter/Shimmer, classified with GradientBoosting.
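Jitter and shimmer in the feature set measure cycle-to-cycle instability of pitch period and amplitude; natural voices show small but nonzero values, while some synthetic voices are unnaturally stable. A rough numpy sketch using naive peak picking (production systems use dedicated pitch trackers, so treat this as an illustration of the definitions only):

```python
import numpy as np

def jitter_shimmer(wave, sr, f0_approx):
    """Estimate jitter (relative period perturbation) and shimmer
    (relative amplitude perturbation) via simple peak picking."""
    min_dist = int(sr / f0_approx * 0.5)  # peaks at least half a period apart
    peaks, last = [], -min_dist
    for i in range(1, len(wave) - 1):
        if wave[i] > wave[i - 1] and wave[i] >= wave[i + 1] and i - last >= min_dist:
            peaks.append(i)
            last = i
    peaks = np.array(peaks)
    periods = np.diff(peaks) / sr          # seconds per glottal cycle
    amps = wave[peaks]
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    shimmer = np.mean(np.abs(np.diff(amps))) / np.mean(amps)
    return jitter, shimmer

sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 120 * t)        # perfectly periodic "voice"
j, s = jitter_shimmer(clean, sr, 120)      # both near zero for a pure tone
```

A perfectly periodic signal yields near-zero jitter and shimmer; healthy human speech typically sits a little above zero on both.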

Deep Learning Model (RawNet2 Official)

An end-to-end neural network that takes raw waveforms directly as input. The SincConv + Channel Attention + ResBlocks + GRU architecture learns subtle voice characteristics that feature-based approaches cannot capture. Occlusion Sensitivity visualization shows which time-frequency regions contributed to the judgment.

Stack: RawNet2 Official, SincConv + Channel Attention, ResBlocks x6, GRU x3, SWA, Occlusion Sensitivity, CUDA / RTX 5070.
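SincConv's key idea is that each front-end filter is not a free convolution kernel but a band-pass sinc function parameterized only by its two cutoff frequencies, which are what the network learns. A numpy sketch of one such kernel with fixed (non-learned) cutoffs, assuming an odd kernel size:

```python
import numpy as np

def sinc_bandpass(low_hz: float, high_hz: float, sr: int,
                  kernel_size: int = 251) -> np.ndarray:
    """Band-pass FIR kernel in the SincConv style: the difference of two
    low-pass sinc filters, tapered by a Hamming window. In SincConv the
    cutoffs low_hz/high_hz are the learnable parameters; kernel_size
    should be odd so the kernel is symmetric."""
    n = np.arange(-(kernel_size // 2), kernel_size // 2 + 1)

    def lowpass(fc):
        # Ideal low-pass impulse response, np.sinc(x) = sin(pi x)/(pi x)
        return 2.0 * fc / sr * np.sinc(2.0 * fc * n / sr)

    band = (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(kernel_size)
    return band / np.abs(band).max()

kernel = sinc_bandpass(300.0, 3000.0, sr=16000)
```

Because each filter is described by just two numbers, the front end stays interpretable: you can read off which frequency bands the trained model attends to.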
// ASVspoof 2019 LA Benchmark — EER Comparison

Method                                                      EER     AUC     Lang
LFCC + GMM (ASVspoof 2019 official baseline)                8.0%    n/a     EN
GradientBoosting + 197-dim (BR-FVD feature engineering)     13.4%   0.944   EN
RawNet2 Official + SWA (BR-FVD, ASVspoof 2019 LA)           4.5%    n/a     EN
GradientBoosting + 197-dim (BR-FVD, Japanese, 7 speakers)   0.3%    1.000   JA
AASIST (world state of the art, reference)                  0.8%    n/a     EN
Research

Research & Publication

Peer-Reviewed Paper in Preparation

This system has been developed as academic research, with continuous performance validation on the world-standard benchmark ASVspoof 2019 LA. Using the official RawNet2 implementation (Tak et al., ICASSP 2021), we achieved EER = 4.487%, min t-DCF = 0.12352 (ASVspoof 2019 LA evaluation set, 71,237 samples), significantly outperforming the official baseline (LFCC+GMM EER≈8%). We are currently preparing a manuscript for submission to an international peer-reviewed journal (IEEE Access). This is a unique research program jointly developing a Japanese neural TTS system (BR-TTS NNW) and its corresponding fake voice detector (BR-FVD) within a unified framework.

ROC-AUC, Japanese speech (7-speaker validation): 1.000
EER, Japanese speech (Equal Error Rate): 0.3%
EER, English (ASVspoof 2019 LA, RawNet2 official implementation): 4.49%
Deliverables

Analysis Report Contents

We provide detailed reports that scientifically visualize the basis for judgments, not just a simple yes/no answer.

ROC Curve & AUC

Receiver Operating Characteristic curve. Quantifies model discrimination performance using AUC and EER.
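As a concrete reading of these two numbers: AUC is the probability that a randomly chosen real sample scores higher than a randomly chosen synthetic one, and EER is the error rate at the threshold where the false-accept and false-reject rates meet. A small numpy sketch computing both from raw scores, assuming the convention that a higher score means more likely real:

```python
import numpy as np

def auc_eer(real_scores: np.ndarray, fake_scores: np.ndarray):
    """AUC as the pairwise win rate of real over fake scores (ties count
    half), and EER taken at the threshold where FAR is closest to FRR."""
    wins = (real_scores[:, None] > fake_scores[None, :]).mean()
    ties = (real_scores[:, None] == fake_scores[None, :]).mean()
    auc = wins + 0.5 * ties

    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([real_scores, fake_scores])):
        far = (fake_scores >= t).mean()  # fakes accepted as real
        frr = (real_scores < t).mean()   # real samples rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(auc), float(eer)
```

Perfectly separated score distributions give AUC = 1.0 and EER = 0.0, which is what the Japanese 7-speaker validation figures above report.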

Score Distribution

Visualization of Real/Synthetic score distributions. Intuitively shows the degree of separation between the two classes.

Feature Importance

Importance ranking of 197-dimensional features. Shows which acoustic features served as the basis for the judgment.

Threshold Analysis

Detailed threshold-based classification. Includes individual indicator evaluations for Jitter, Shimmer, Spectral features, etc.

RawNet2 Deep Score + Occlusion

Deep learning model synthetic probability score with Mel Spectrogram x Occlusion Sensitivity visualization in 4 panels, showing which time-frequency regions contributed to the judgment.
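The occlusion idea behind this report item can be sketched directly: slide a mask over patches of the (mel) spectrogram, re-score each masked copy, and record the score drop; large drops mark time-frequency regions the model depended on. The scorer below is a stand-in for the actual RawNet2 model:

```python
import numpy as np

def occlusion_map(spec: np.ndarray, score_fn, patch=(4, 4)) -> np.ndarray:
    """Occlusion sensitivity: zero out each time-frequency patch and
    record how much the model's score drops relative to the unmasked
    input. `score_fn` stands in for the detector's scoring function."""
    base = score_fn(spec)
    heat = np.zeros_like(spec)
    ph, pw = patch
    for i in range(0, spec.shape[0], ph):
        for j in range(0, spec.shape[1], pw):
            masked = spec.copy()
            masked[i:i + ph, j:j + pw] = 0.0
            heat[i:i + ph, j:j + pw] = base - score_fn(masked)
    return heat
```

Overlaying this heatmap on the mel spectrogram yields the 4-panel visualization described above: regions with large drops are the ones that drove the judgment.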

Summary CSV

A CSV report listing synthetic probability scores, predicted labels, and confidence levels for each file.

Pricing

Simple Pricing

We offer flexible plans tailored to the number of files and use case. Please feel free to contact us for a consultation.

Spot Analysis
Pricing upon inquiry
For one-time audio file verification. Ideal for a small number of files.
  • Universal FVD analysis
  • ROC & Score Distribution report
  • Summary CSV
  • Delivery: within 3 business days
Contact Us
Personalized
Pricing upon inquiry
Speaker-specific analysis, from voice sample submission to individual model construction.
  • Speaker-specific model construction
  • Personalized FVD analysis
  • Full report set (6 types)
  • Continuous model update option
  • Delivery: upon consultation
Contact Us
FAQ

Frequently Asked Questions

What kind of audio files should I send?
We recommend WAV format (44,100 Hz, mono, 16-bit). MP3 and m4a are also supported, but conversion processing is required. Audio length should be at least 1.5 seconds, with 3–8 seconds of natural speech recommended.
Which service should I choose — Universal FVD or Personalized FVD?
We recommend Universal FVD for verifying audio from unknown or anonymous speakers, and Personalized FVD for impersonation detection or voice protection of specific individuals. Please feel free to contact us if you are unsure.
How accurate is the detection?
For Japanese speech, we have achieved ROC-AUC=1.000 and EER=0.3%. For English (ASVspoof 2019 LA evaluation set, 71,237 samples), our official RawNet2 implementation achieved EER=4.487% and min t-DCF=0.12352, significantly outperforming the world-standard baseline (EER≈8%). Please note that accuracy may vary for unknown TTS engines or post-processed audio.
How is the confidentiality of submitted audio data protected?
Audio data submitted for analysis will be securely deleted upon completion. We also support NDA (Non-Disclosure Agreement) signing for highly confidential data.
When will the online service launch?
We are currently building a web application server. We plan to launch an online file upload and instant analysis service in the near future. We will announce the launch on this page when it is ready.

Contact Us

For service inquiries or quote requests, please feel free to reach out. We typically respond within one business day.

info@brsystems.jp