DOI

10.5703/1288284317867

Description

Digital investigators struggle to search large volumes of data when coded language evades keyword matching. Our research shows how vector embeddings can strengthen digital forensics by detecting implicit drug references in social media posts. Using LLM2Vec with instruction-based embeddings, our approach significantly outperformed traditional keyword search on slang, letting investigators retrieve content by meaning rather than by exact word match.
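
The contrast between exact matching and meaning-based retrieval can be illustrated with a minimal sketch. It is not the study's implementation: the posts, the three-dimensional vectors, the query vector, and the 0.8 threshold are all hypothetical values chosen for illustration, and no embedding model such as LLM2Vec is called. In a real pipeline the vectors would come from the embedding model.

```python
import math

# Hypothetical posts with hand-made 3-d vectors (illustrative only).
# A real system would obtain these vectors from an embedding model.
POSTS = {
    "selling cocaine tonight": [0.9, 0.1, 0.0],
    "got that snow, hmu": [0.8, 0.2, 0.1],
    "snow day, school is closed": [0.1, 0.1, 0.9],
}

# Hypothetical stand-in for an embedded query about drug sales.
QUERY_VECTOR = [1.0, 0.0, 0.0]


def keyword_search(posts, keyword):
    """Return posts whose text contains the exact keyword."""
    return [text for text in posts if keyword in text.lower()]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(posts, query_vector, threshold=0.8):
    """Return posts ranked by similarity to the query, above a threshold."""
    scored = [(cosine(vec, query_vector), text) for text, vec in posts.items()]
    return [text for score, text in sorted(scored, reverse=True) if score >= threshold]


keyword_hits = keyword_search(POSTS, "cocaine")
vector_hits = vector_search(POSTS, QUERY_VECTOR)
```

With these toy values, the keyword search finds only the post that names the drug explicitly, while the similarity ranking also returns the slang post and leaves out the weather post, even though both contain the word "snow".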


Finding What Keywords Miss: Vector Search for Digital Forensics
