The goal of this phase is to gather all relevant data (artifacts) from the suspected phishing message before deeper analysis begins. These artifacts are the forensic breadcrumbs that help SOC analysts determine if the message is malicious, how it was crafted, what it's trying to do, and how they should respond.
Think of this step as digital crime scene preservation: collect first, analyze second.
What Are Artifacts?
Artifacts are pieces of information or evidence that originate from or are embedded in the phishing communication. These include:
Message metadata (headers, timestamps)
Sender/recipient information
Message content (text, HTML, language patterns)
URLs, domains, and redirection chains
File attachments and their properties
Embedded scripts or code
Sender IPs and hosting infrastructure
Phone numbers, audio recordings (for voice-based phishing)
Screenshots (especially for mobile or web-based phishing)
Handling Artifacts Safely
Never click on live phishing links directly
Open files and URLs in isolated environments (VMs, Sandboxes)
Store artifacts securely in a dedicated case folder structure
Maintain a chain of custody when exporting forensics or logs
Label and index artifacts in the incident tracking/ticketing system
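The storage and chain-of-custody points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subfolder names, the `custody_log.jsonl` file name, and the record fields are hypothetical and should be replaced with whatever your team's case management process prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical subfolder names; adapt to your own conventions.
SUBFOLDERS = ("headers", "body", "urls", "attachments", "screenshots")


def create_case_folder(root: Path, case_id: str) -> Path:
    """Create <root>/<case_id>/ with one subfolder per artifact type."""
    case_dir = root / case_id
    for name in SUBFOLDERS:
        (case_dir / name).mkdir(parents=True, exist_ok=True)
    return case_dir


def log_custody(case_dir: Path, artifact: Path, analyst: str, action: str) -> dict:
    """Append one chain-of-custody record (who, what, when, hash) to a JSONL log."""
    record = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "analyst": analyst,
        "action": action,
        "artifact": artifact.relative_to(case_dir).as_posix(),
        # The hash lets a later reviewer confirm the artifact was not altered.
        "sha256": hashlib.sha256(artifact.read_bytes()).hexdigest(),
    }
    with (case_dir / "custody_log.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Each call to `log_custody` appends a line rather than rewriting the file, so earlier records are never touched by later ones.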
Email Artifacts
Full Raw Headers
Use “View Original” or “Message Source” in a mail client (Gmail, Outlook, Thunderbird).
From, To, Reply-To, Return-Path, Received, Message-ID
SPF, DKIM, DMARC results
Sender Details
Display name vs actual email
Reply address discrepancies
Timestamps
Look for odd sending times and for dates that are implausibly far in the future or the past
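As a rough illustration of the header checks above, the following sketch uses Python's standard `email` package to pull out the fields an analyst usually records first. The function name, the returned keys, and the mismatch heuristics are assumptions made for this example, and it expects the raw message source to have been saved as text already.

```python
import re
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Matches "spf=pass", "dkim=fail", "dmarc=none" inside Authentication-Results.
AUTH_RESULT = re.compile(r"\b(spf|dkim|dmarc)=(\w+)", re.IGNORECASE)


def _domain(address: str) -> str:
    return address.rpartition("@")[2].lower()


def summarize_headers(raw_message: str) -> dict:
    """Extract sender details, authentication results, and the Date header."""
    msg = message_from_string(raw_message)
    display_name, from_addr = parseaddr(msg.get("From", ""))
    _, reply_to = parseaddr(msg.get("Reply-To", ""))
    _, return_path = parseaddr(msg.get("Return-Path", ""))

    try:
        sent_at = parsedate_to_datetime(msg.get("Date", ""))
    except (TypeError, ValueError):
        sent_at = None  # missing or malformed Date header

    auth = {}
    for value in msg.get_all("Authentication-Results", []):
        for mechanism, result in AUTH_RESULT.findall(value):
            auth.setdefault(mechanism.lower(), result.lower())

    return {
        "display_name": display_name,
        "from_address": from_addr,
        "reply_to": reply_to,
        "return_path": return_path,
        "message_id": msg.get("Message-ID", ""),
        "received_hops": len(msg.get_all("Received", [])),
        "sent_at": sent_at,
        "auth": auth,
        # Simple domain comparisons; a mismatch is a lead, not proof.
        "reply_to_mismatch": bool(reply_to) and _domain(reply_to) != _domain(from_addr),
        "return_path_mismatch": bool(return_path) and _domain(return_path) != _domain(from_addr),
    }
```

A mismatch flag here only says the domains differ. Legitimate bulk mailers often use a different Return-Path domain, so treat the result as something to investigate rather than a verdict.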
Body Content
Text version and HTML version
Look for hidden links, base64 encoding, and suspicious iframes
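One way to surface hidden links and iframes without rendering the HTML is to parse it statically. The sketch below uses the standard library's `html.parser`. The class and function names are made up for this example, and the "visible text looks like a URL" heuristic is a deliberately crude assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for anchors and src values for iframes."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.iframes = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []
        elif tag == "iframe":
            self.iframes.append(attrs.get("src") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _host(value: str) -> str:
    # urlparse only fills in hostname when a scheme or "//" is present.
    if "//" not in value:
        value = "//" + value
    try:
        return (urlparse(value).hostname or "").lower()
    except ValueError:
        return ""


def find_mismatched_links(html: str):
    """Return anchors whose displayed URL differs from the real target, plus iframe sources."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    mismatched = []
    for href, text in parser.links:
        looks_like_url = "." in text and " " not in text
        if looks_like_url and _host(text) != _host(href):
            mismatched.append((href, text))
    return mismatched, parser.iframes
```

Because nothing is fetched or rendered, this can run on the raw HTML part of a message outside a sandbox. It will not catch links assembled by JavaScript at runtime.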
Embedded URLs
Extract and defang (e.g., hxxp://malicious[.]com)
Note redirection, obfuscation, URL shorteners
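Extraction and defanging are easy to automate. The following is a minimal sketch: the regular expression is intentionally simple and will miss obfuscated or scheme-less URLs, and the function names are chosen only for this example. The defanged form follows the `hxxp://malicious[.]com` convention shown above.

```python
import re

# Deliberately simple pattern: http(s) URLs up to whitespace or common delimiters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(text: str) -> list:
    """Return unique URLs in order of first appearance, minus trailing punctuation."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?")
        if url not in found:
            found.append(url)
    return found


def defang(url: str) -> str:
    """Rewrite a URL so it cannot be clicked or auto-linked in tickets and reports."""
    scheme, sep, rest = url.partition("://")
    if not sep:  # no scheme present, treat the whole string as host/path
        scheme, rest = "", url
    host, slash, path = rest.partition("/")
    scheme = re.sub(r"^http", "hxxp", scheme, flags=re.IGNORECASE)
    return f"{scheme}{sep}{host.replace('.', '[.]')}{slash}{path}"
```

For example, `defang("http://malicious.com")` returns `hxxp://malicious[.]com`. Store the original URL in the case folder and use the defanged form anywhere a person might click it.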
Attachments
File name, extension, size
Compute hashes (SHA256, MD5)
Save a copy for sandboxing or reverse engineering
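The attachment properties listed above can be recorded without ever opening the file in its native application. This sketch reads the file as raw bytes and computes both hashes in one pass. The function name and the returned keys are assumptions for illustration, and `hashlib.md5` may be unavailable on FIPS-restricted systems.

```python
import hashlib
from pathlib import Path


def describe_attachment(path, chunk_size: int = 65536) -> dict:
    """Return name, extension(s), size, and SHA256/MD5 hashes for a saved attachment."""
    path = Path(path)
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    # Read in chunks so large attachments do not have to fit in memory.
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return {
        "name": path.name,
        "extension": path.suffix.lower(),
        # All suffixes, so double extensions such as ".pdf.exe" stand out.
        "all_suffixes": [s.lower() for s in path.suffixes],
        "size_bytes": path.stat().st_size,
        "sha256": sha256.hexdigest(),
        "md5": md5.hexdigest(),
    }
```

The resulting hashes can be searched in services such as VirusTotal without uploading the file itself, which avoids disclosing a possibly sensitive attachment.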
SMS Artifacts
Sender number or shortcode
Can be spoofed, so check the format (international prefix, random digits, etc.)
Full message text
Look for spelling/grammar issues, urgency, impersonation
Any links
Often shortened (e.g., bit.ly, tinyurl)
Document where they point (using urlscan.io, VirusTotal)
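Flagging shortened links during triage can be done offline, before submitting anything to urlscan.io or VirusTotal. The sketch below checks the host against a small set of shortener domains. Only bit.ly and tinyurl come from the text above; the other entries and the function name are assumptions, and a real list would need to be maintained.

```python
from urllib.parse import urlsplit

# Illustrative list only; extend it from your own threat intel.
KNOWN_SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "is.gd"}


def is_shortened(url: str) -> bool:
    """Return True if the URL's host is a known link shortener."""
    # urlsplit only fills in hostname when a scheme or "//" is present.
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in KNOWN_SHORTENERS
```

This only labels the link. Resolving where it points should still happen through a scanning service or an isolated environment, never from the analyst's own browser.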
Screenshots
Mobile screenshots help preserve delivery format
Timestamp
Voice Call Artifacts
Caller ID/Phone Number
Use reverse lookup tools
Voicemail/Call Recording
If possible, transcribe or summarize
Date and time of call
Contextual Details
What did the caller ask for: credentials, payment, or information?
Was it part of a known pretext?
Social Media Artifacts
Profile Information
Username, display name, profile link, creation date
Message Text
Capture the full conversation
Any attached media or files
Download with metadata
URLs
Trace any redirects and document the URLs in defanged form
Screenshots
Useful for documentation and reporting
Timestamps and message delivery metadata
Web Page Artifacts
Full URL
Including parameters, subdomains, etc.
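Recording the full URL is more useful when it is also broken into its parts, since parameters often carry victim identifiers or tracking tokens. The sketch below uses `urllib.parse` from the standard library. The function name and output keys are illustrative, and it does not try to separate the registered domain from subdomains, which would require a public suffix list.

```python
from urllib.parse import parse_qs, urlsplit


def break_down_url(url: str) -> dict:
    """Split a URL into the components worth recording in the case notes."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return {
        "scheme": parts.scheme,
        "host": host,
        # Host labels from left to right, e.g. ["secure", "login", "example", "com"].
        "labels": host.split(".") if host else [],
        "port": parts.port,
        "path": parts.path,
        # Decoded query parameters; each value is a list because keys can repeat.
        "params": parse_qs(parts.query, keep_blank_values=True),
        "fragment": parts.fragment,
    }
```

This parses the string only and makes no network request, so it is safe to run on a live phishing URL.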
Redirect chain
Capture it with tools like urlscan.io or the browser dev tools (Network tab)
Page source
HTML, JavaScript, embedded forms
Screenshots
Login pages, fake portals, misleading branding
SSL Certificate Info
Use crt.sh or browser certificate inspector
WHOIS/Domain Registration Info
Date registered, registrar, contact email
Questions to Ask When Collecting Artifacts
What would help another analyst reproduce this analysis?
What would help the security team block or detect similar threats in the future?
Can I correlate these artifacts with other incidents or threat intel?