DOI
10.5703/1288284317867
Description
Digital investigators struggle to search massive datasets when coded language evades keyword matching. Our research shows how vector embeddings can enhance digital forensics by detecting implicit drug references in social media posts. Using LLM2Vec with instruction-based embeddings, we significantly outperformed traditional keyword search on slang-heavy content, enabling investigators to find content by meaning rather than by exact word matches.
Finding What Keywords Miss: Vector Search for Digital Forensics
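To make the contrast between keyword matching and meaning-based retrieval concrete, here is a minimal sketch in Python. It is not the authors' pipeline and does not call LLM2Vec: the posts are invented, and the three-dimensional vectors are hand-assigned stand-ins for what an embedding model would produce. The names `POSTS`, `QUERY_VEC`, `keyword_search`, and `vector_search` are hypothetical and exist only for this illustration.

```python
import math

# Hypothetical posts with hand-assigned toy vectors (NOT real model output).
# The three dimensions loosely stand for: [drug-related, weather, food].
POSTS = {
    "got that snow for the weekend, hmu": [0.9, 0.2, 0.0],
    "heavy snow expected this weekend": [0.05, 0.95, 0.0],
    "selling cocaine, message me": [0.95, 0.0, 0.05],
    "best pizza in town tonight": [0.0, 0.05, 0.95],
}

# Toy stand-in for the embedding of a query such as "drug sale".
QUERY_VEC = [1.0, 0.0, 0.0]


def keyword_search(term, posts):
    """Return posts that contain the literal term (case-insensitive)."""
    return [text for text in posts if term.lower() in text.lower()]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, posts, top_k=2):
    """Rank posts by cosine similarity to the query vector, best first."""
    ranked = sorted(posts, key=lambda t: cosine(query_vec, posts[t]), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print("keyword:", keyword_search("cocaine", POSTS))
    print("vector: ", vector_search(QUERY_VEC, POSTS))
```

In this toy setup the keyword search finds only the post that names the drug explicitly, while the vector ranking also surfaces the slang post and leaves the literal weather post near the bottom. In a real system the vectors would come from an embedding model, and the quality of the ranking would depend on that model and on the instruction given to it.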