YARA rules are structured, code-level pattern-matching definitions used to identify and classify malware by describing the low-level characteristics that appear in malicious files or memory. A rule encodes these characteristics as text strings, hexadecimal byte sequences, or regular expressions, then applies Boolean logic to determine when a match should occur. Unlike hash-based detection, which breaks as soon as a single byte changes, YARA rules capture the reusable genetic markers of malware families, such as shared code fragments, configuration strings, function names, or structural traits, allowing analysts to detect variants and related samples at scale.
Because YARA operates on files, memory dumps, and data streams, it is widely used across malware research, DFIR, threat hunting, and backup validation workflows to detect known threats, uncover modified versions, and classify previously unseen samples based on their internal composition.
In this article, we break down how YARA works, what YARA rules are, and how to build high-quality rules for threat intelligence and malware detection. You can also refer to our blog about Sigma Rules, another widely used generic signature format.
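A YARA rule follows the structure rule identifier : tags { ... } and is built from these core components: optional import statements, the rule name, optional tags, and the meta, strings, and condition sections.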
These elements form a compact, expressive language for describing malware traits with precision.
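As a rough orientation, the skeleton below shows where each component sits in a rule file. It is a minimal sketch, not a real detection: the rule name, tags, metadata values, and all three patterns are placeholders invented for illustration.

```yara
// Optional: modules are imported at the top of the file, before any rule.
// The import is unused here and is shown only to illustrate its position.
import "pe"

// Rule name, followed by optional tags after the colon
rule Example_Rule_Skeleton : example_tag another_tag
{
    meta:
        // Descriptive only; never affects matching
        description = "Illustrative skeleton only; not a real detection"
        author = "example"

    strings:
        $text  = "example_marker" nocase   // text string
        $bytes = { 4D 5A 90 00 }           // hex byte sequence
        $regex = /example_[0-9]{4}/        // regular expression

    condition:
        // Boolean expression that decides whether the rule matches
        any of them
}
```

Only the condition section is required; meta, strings, tags, and imports can be omitted when a rule does not need them.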
Below is an example YARA rule created for the ESXi variant of Play ransomware. To walk through YARA structure and syntax, we will take a closer look at this real-world example, referenced in the June 2025 update to the CISA AA23-352A advisory.
The ESXi variant of Play ransomware uses hypervisor-specific shell commands to shut down virtual machines, enumerate datastore contents, and modify the ESXi interface to display its ransom note.
This YARA rule identifies the ESXi-focused Play binary by matching its distinct operational strings, targeted VM file extensions, and characteristic ESXi management commands.
|
rule PlayForESXi |
Let’s start with the rule name.
The rule name is the primary identifier that appears immediately after the rule keyword. It is the handle YARA uses when compiling rules, executing scans, generating logs, and debugging results. A clear and descriptive rule name improves maintainability and makes the rule easier to reference in detection workflows, threat hunting, and DFIR operations.
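A rule name must start with a letter or underscore (never a digit), contain only alphanumeric characters and underscores, stay within 128 characters, and not collide with a YARA reserved keyword. Rule names are case-sensitive.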
Examples of valid rule names:
|
rule RansomwareDetection |
In our example, the name of the YARA rule is “PlayForESXi”.
|
rule PlayForESXi |
This name clearly indicates the rule’s purpose: detecting the PLAY ransomware variant targeting VMware ESXi systems.
Tags in a YARA rule are optional labels placed after the rule name. They act as classifiers that help you group, filter, and organize rules without affecting how the rule matches. Tags do not influence detection logic; they simply provide a way to categorize rules for operational use.
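In practice, tags are used to group related rules, filter scan output to the rules you care about, and keep large rule sets organized.
Here is an example of the syntax.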
|
rule PlayForESXi : ransomware esxi play { |
For instance, to report only matches from rules tagged ransomware, we would run the following command.
|
yara -t ransomware rules.yar /path/to/files |
Or, to report only matches from YARA rules tagged play:
|
yara -t play rules.yar /path/to/files |
This is extremely useful when you have large rule sets and want results from only a subset, such as ESXi ransomware rules, Linux ransomware rules, or rules for a specific threat family.
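To make the filtering concrete, here is a minimal sketch of a rules file containing two differently tagged rules. The rule names, the tags on the second rule, and its marker string are hypothetical and exist only to show how tags separate rules.

```yara
// Tagged "ransomware" and "esxi"
rule Sketch_Ransomware_ESXi : ransomware esxi
{
    strings:
        $path = "/vmfs/volumes"
    condition:
        $path
}

// Tagged "infostealer" and "windows"; placeholder marker string
rule Sketch_Stealer_Windows : infostealer windows
{
    strings:
        $s = "example_stealer_marker"
    condition:
        $s
}
```

With this file saved as rules.yar, yara -t ransomware rules.yar /path/to/files would report only matches from Sketch_Ransomware_ESXi. Adding -g (--print-tags) should also print each matching rule's tags in the output.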
The metadata section in a YARA rule (meta:) provides descriptive information about the rule. It does not influence detection logic. Its purpose is to give context for analysts, documentation systems, and automated tools.
Metadata usually includes fields such as a description, a date, the target file type, and the malware type.
These fields support rule management, threat hunting workflows, and knowledge organization.
Example metadata block:
|
meta:
    description = "Detects PLAY ransomware targeting ESXi Hypervisors"
    date = "2025-01"
    filetype = "elf"
    maltype = "ransomware" |
Strings are the core detection elements used to match patterns inside files, memory, or data streams. They represent observable artifacts of malware: text, byte sequences, or behavior signatures. Each string is assigned an identifier (e.g., $s1) and is later referenced in the condition section to determine when a rule should trigger.
Strings can be text, hexadecimal byte patterns, or regular expressions.
YARA evaluates these patterns efficiently using the Aho–Corasick algorithm for multi-string scanning.
Types of Strings in YARA
| String Type | Example | Purpose / Typical Use Cases |
| --- | --- | --- |
| Text | $s = "encrypt:" nocase | Matches readable text in ASCII or UTF-16; used for malware keywords, commands, URLs, config markers, mutex names, strings seen in plaintext or debug output. Works with modifiers like nocase, wide, fullword. |
| Hex | $h = { E8 ?? ?? ?? ?? 83 C4 } | Matches raw bytes, opcodes, binary signatures, shellcode, PE/ELF headers, unpacker stubs. Supports wildcards, byte-range “jumps”, and alternatives to tolerate minor differences across variants. Very useful when malware is obfuscated or binary-only. |
| Regex | $r = /[A-Za-z0-9]{32}/ or /cmd\.exe/i | Matches variable or partially obfuscated patterns: hashes, encoded strings, dynamic URLs, variable file paths, config lines. Allows flexible detection when exact text or byte patterns are too brittle. Supports regex modifiers like nocase, ascii, wide, fullword (also regex-specific flags). |
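The three string types can sit side by side in one strings block. The sketch below reuses the example patterns from the table; the rule name and the choice of modifiers are illustrative assumptions rather than part of any published rule.

```yara
rule Sketch_String_Types
{
    strings:
        // Text: case-insensitive, searched as both ASCII and UTF-16LE
        $text = "encrypt:" nocase ascii wide

        // Hex: E8, four wildcard bytes, then 83 C4
        // (on x86 this is typically a CALL followed by ADD ESP)
        $hex = { E8 ?? ?? ?? ?? 83 C4 }

        // Regex: any run of 32 alphanumeric characters.
        // Broad patterns like this can slow scanning and match benign data.
        $re_hash = /[A-Za-z0-9]{32}/

        // Regex with the case-insensitive flag
        $re_cmd = /cmd\.exe/i

    condition:
        any of them
}
```

Using any of them keeps the sketch short; a production rule would normally combine such patterns more strictly.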
The strings block in our YARA rule example:
In our example, the strings section appears between strings: and condition:.
Each string defines an artifact commonly observed in Play ransomware activity on ESXi environments. These artifacts include status messages printed by the ransomware, ESXi-specific filesystem paths, virtual-machine file extensions, and shell commands issued during the attack.
|
strings: |
What they represent:
Ransomware execution messages
"encrypt:", "First step is done.", "Complete." → messages logged during encryption, useful for identifying Play’s workflow.
ESXi filesystem and entropy sources
Paths such as "/vmfs/volumes" and "/dev/urandom" indicate interactions with ESXi storage locations and pseudo-random generation.
Targeted virtual machine file extensions
The list of .vmdk, .vmx, .vswp, .nvram, etc. reflects Play ransomware’s focus on core VMware VM artifacts that must be encrypted or disrupted.
ESXi shell commands used during attacks
Strings such as vim-cmd vmsvc/power.off and esxcli storage filesystem list match operational steps Play performs, such as powering off running virtual machines and listing datastore filesystems.
Campaign-specific ransomware extension
The .PLAY extension is a direct indicator of Play ransomware activity.
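For readers who want to see how these categories translate into syntax, the following is an illustrative sketch built only from the strings named above. It is not the rule published in the CISA advisory: the rule name, string identifiers, and grouping are assumptions, and the advisory rule may contain different or additional strings.

```yara
rule Sketch_Play_ESXi_Indicators
{
    meta:
        description = "Illustrative sketch only; not the rule from the CISA advisory"

    strings:
        // Ransomware execution messages
        $msg1 = "encrypt:"
        $msg2 = "First step is done."
        $msg3 = "Complete."

        // ESXi filesystem and entropy sources
        $path1 = "/vmfs/volumes"
        $path2 = "/dev/urandom"

        // Targeted virtual machine file extensions
        $ext1 = ".vmdk"
        $ext2 = ".vmx"
        $ext3 = ".vswp"
        $ext4 = ".nvram"

        // ESXi shell commands
        $cmd1 = "vim-cmd vmsvc/power.off"
        $cmd2 = "esxcli storage filesystem list"

        // Campaign-specific extension
        $play = ".PLAY"

    condition:
        // Every string above must be present
        all of them
}
```

Prefixing identifiers by category ($msg, $path, $ext, $cmd) also makes it possible to reference a whole group later, for example with ($cmd*).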
The condition section defines the logical expression that determines when the rule matches a file, process, memory region, or data stream.
It is the only mandatory section of a rule; without a valid condition:, the rule will not compile.
The condition can check for things like:
| Element | What It Does |
| --- | --- |
| Boolean operators (and, or, not) | Combine multiple conditions logically (all must match, any, or negate) |
| String references ($string_name) and counts (#string_name) | $string_name: true if the pattern appears at least once. #string_name: returns the number of matches, useful for hit-count based logic. |
| File properties / Metadata checks (e.g. filesize, header checks) | Enables context-aware logic: skip too small files, check file type (PE, ELF, etc.), or limit by size to reduce false positives. |
| Positional / offset operators (at, in) | Force patterns to appear at specific offsets (e.g. header at 0) or within defined byte ranges, useful for structural checks. |
| Arithmetic / relational expressions (==, >, <, >=, <=, mathematical operators) | Combine counts or file property values to create thresholds or ratios (e.g. “if string appears > N times” or “if size < X”). Example: condition: $PLAY_ext_str and filesize < 300KB |
| Grouping / nested logic / “any of” / “all of” shorthand | Build complex detection logic combining many indicators, e.g. require one of several suspicious strings and a certain file size and structural header checks. |
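The elements in the table can be combined in a single condition. The rule below is a minimal sketch that uses one of each; every string value, threshold, and offset range is a placeholder chosen for illustration.

```yara
rule Sketch_Condition_Elements
{
    strings:
        $mz     = "MZ"
        $marker = "example_marker_v1"
        $susp1  = "example_suspicious_one"
        $susp2  = "example_suspicious_two"
        $susp3  = "example_suspicious_three"
        $benign = "example_known_good_vendor"

    condition:
        $mz at 0 and                // positional: "MZ" exactly at offset 0
        filesize < 300KB and        // file property check
        #marker >= 3 and            // count: marker occurs at least 3 times
        $marker in (0..4096) and    // range: an occurrence within the first 4 KB
        2 of ($susp*) and           // set shorthand: any two of the $susp strings
        not $benign                 // negation: exclude files with this string
}
```

Because the clauses are joined with and, the rule matches only when all six hold.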
Here is another YARA rule example.
The condition triggers only if the file is a valid BEAM binary under 1MB, contains the ssh_connection.erl reference, and does not contain any of the fix-related strings.
In short: it matches vulnerable Erlang/OTP SSH modules and excludes patched ones.
|
rule VULN_Erlang_OTP_SSH_CVE_2025_32433_Apr25 { |
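The "match the vulnerable component but exclude patched builds" logic described above can be sketched as follows. This is not the published CVE-2025-32433 rule: the rule name and both fix markers are hypothetical placeholders, and the header check assumes the usual BEAM layout of "FOR1" at offset 0 followed by "BEAM" at offset 8.

```yara
rule Sketch_Vulnerable_Unless_Patched
{
    strings:
        // Assumed BEAM container markers
        $for1 = "FOR1"
        $beam = "BEAM"

        // Module reference named in the description above
        $ref = "ssh_connection.erl"

        // Hypothetical strings that only appear in patched builds
        $fix1 = "example_fix_marker_one"
        $fix2 = "example_fix_marker_two"

    condition:
        $for1 at 0 and $beam at 8 and   // looks like a BEAM binary
        filesize < 1MB and              // size limit from the description
        $ref and                        // references the SSH connection module
        not any of ($fix*)              // none of the fix markers are present
}
```

The not any of ($fix*) clause is what turns a plain presence check into a vulnerable-versus-patched distinction.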
In our example, the condition section is rather simple.
|
condition: |
"all of them" is a YARA shorthand that refers to every string defined in the strings: block. For this rule to trigger, every one of those patterns must be present in the scanned file.
This creates a highly specific match requirement, ensuring the rule only detects files that exhibit the full set of indicators associated with Play ransomware on ESXi systems.
Import statements allow a rule to load additional modules that extend YARA’s core functionality. These modules provide specialized functions, constants, and file-format metadata that cannot be accessed through basic strings or hex patterns alone.
Here is an example.
|
import "pe" |
Here, import "pe" brings in the PE module.
In the condition, pe.is_pe checks that the file is a valid Portable Executable.
pe.number_of_sections > 10 lets the rule flag binaries with more than 10 sections, which may indicate packing or obfuscation.
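Put together, the two checks described above would look roughly like this. The rule name is a placeholder, and the threshold of 10 is simply the value from the text, not a tuned recommendation.

```yara
import "pe"

rule Sketch_PE_Many_Sections
{
    condition:
        // Valid PE file with an unusually high number of sections
        pe.is_pe and pe.number_of_sections > 10
}
```

Note that the rule needs no strings section at all: module fields can carry the whole condition.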
Here is an example.
Figure 1. YARA and Suricata Rules to Detect the Infostealer GRXBA Version 1.1.3.0, CISA AA23-352A
|
rule Identify_ZIP_Structure |
This YARA rule detects ZIP files, even if renamed, or ZIP-based formats like DOCX or APK.
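A header check of this kind is typically written with YARA's integer-reading functions. The sketch below is one possible form, not necessarily how the rule above is implemented; it tests only for the ZIP local file header signature, the bytes 50 4B 03 04 ("PK" followed by 03 04).

```yara
rule Sketch_ZIP_Magic
{
    condition:
        // uint32() reads little-endian, so bytes 50 4B 03 04 compare as 0x04034B50
        uint32(0) == 0x04034B50
}
```

Because the check reads the content rather than the file name, a renamed archive still matches. An empty archive, which starts with a different PK signature, would not be caught by this simplified check.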
Here’s a simplified workflow, based on how YARA repos and vendors operate.
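A high-quality YARA rule should be specific enough to avoid false positives, resilient to superficial changes between variants, and documented well enough to be maintained over time.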
Here are concrete practices many in the community follow when authoring YARA rules.
A naive YARA rule might rely on a generic string such as "config" or "password". These strings appear in thousands of legitimate binaries, Office documents, browser caches, and configuration utilities. In a production SOC environment, such a rule would trigger constantly and provide no actionable value.
A stronger YARA rule uses malware-specific indicators, for example:
A rule that combines something unique (e.g., "ph0b0s_cfg_v3") plus a magic header check avoids scanning text files, ZIP archives, and other benign content.
This dramatically reduces false positives while preserving detection across Phobos variants that reuse the same config layout.
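The contrast between the two approaches can be sketched as a pair of rules. Both are illustrative: the rule names are placeholders, the marker string is the example from the text, and the header check assumes the target is a Windows PE file starting with "MZ".

```yara
// Naive: generic strings that appear in countless benign files
rule Sketch_Naive_Generic_Strings
{
    strings:
        $a = "config"
        $b = "password"
    condition:
        any of them
}

// Stronger: a family-specific marker plus a magic header check
rule Sketch_Specific_Marker_With_Header
{
    strings:
        $cfg = "ph0b0s_cfg_v3"
    condition:
        // Bytes 4D 5A ("MZ") read as a little-endian 16-bit integer
        uint16(0) == 0x5A4D and $cfg
}
```

Placing the cheap header check first also lets YARA skip non-executable content quickly.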
A rule built by hashing one malware sample or matching a single string (e.g., "BuildID:20220911") will break the moment the attacker recompiles the sample or changes that string.
Real attackers regularly change such indicators.
A resilient rule uses structural or behavioral markers that persist across variants, for example:
Combining these creates a signature that continues to match new variants as long as they maintain the same logical workflow, even if strings or superficial features change.
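One way to express this kind of tolerance in YARA is to use wildcards, jumps, and alternatives in hex strings and to require only a subset of markers. The sketch below is hypothetical throughout: the byte pattern, the config regex, and the key name are invented to show the syntax and do not describe any real family.

```yara
rule Sketch_Resilient_Markers
{
    strings:
        // Code pattern: wildcards absorb changing addresses and operands,
        // [0-4] tolerates up to four inserted bytes, and ( 74 | 75 )
        // accepts either of two opcode bytes
        $code = { E8 ?? ?? ?? ?? 83 C4 ?? [0-4] 85 C0 ( 74 | 75 ) }

        // Config layout: matches the shape of a value, not one fixed value
        $cfg_layout = /cfg_v[0-9]{1,2}=[A-Za-z0-9]{16,64}/

        // A stable text marker
        $cfg_key = "example_cfg_layout_key"

    condition:
        // Any two markers suffice, so one can change without breaking the rule
        2 of them
}
```

The 2 of them threshold is the design choice that matters here: it trades a little specificity for survival when a single indicator is altered.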
In a large SOC or MDR environment, hundreds of YARA rules may be deployed across:
Without proper metadata and versioning, analysts cannot tell:
Example: A rule written in 2021 for TrickBot loaders might rely on old string artifacts like "clientID=" or "systeminfo.exe" execution chains. But TrickBot evolved dramatically in 2024–2025, changing config structures and infection flow.
If a SOC has versioned rules, documentation, and a change history, analysts can:
This prevents stale or broken rules from silently reducing visibility.
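Much of this change history can live in the rule's own meta section. The field names below are a suggested convention rather than anything YARA enforces, and all values are placeholders.

```yara
rule Sketch_Versioned_Rule
{
    meta:
        description = "Illustrative placeholder rule showing change-tracking fields"
        author      = "example-analyst"
        version     = 3                 // integer, bumped on every change
        created     = "2021-05-01"
        modified    = "2025-01-15"
        reference   = "internal ticket or report identifier goes here"
        deprecated  = false             // boolean flag for retired rules

    strings:
        $s = "example_marker_v3"

    condition:
        $s
}
```

Meta values can be strings, integers, or booleans, so fields like version and deprecated can be read reliably by automated tooling.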
| Aspect | YARA Rules | Sigma Rules |
| --- | --- | --- |
| Primary Focus | Static or memory-based pattern matching, identifying malicious files, malware, unwanted binaries or suspicious artifacts on disk or in memory. | Log-based event detection, identifying suspicious behavior by analyzing system, application, or network logs. |
| What they Detect | File content (binaries, documents), memory dumps, file structure (headers, byte sequences, strings, etc.) | Events/activities recorded in logs, e.g. process creation, login attempts, user activity, network events, command executions. |
| Rule Format | Custom rule language with C-like syntax: define strings / hex-patterns / regex / module imports (PE, ELF, etc.), and Boolean conditions. | YAML-based format. Rules consist of metadata (title, id, description, tags), logsource definition, detection criteria (field conditions), and condition logic. |
| Typical Use Scenarios | Malware analysis, forensic investigation, threat-hunting on file systems or backups, scanning unknown binaries for malicious patterns, classification of malware families, cleaning malware from backups before restore. | Detecting suspicious or malicious behaviors, brute-force attempts, anomalous user activities, lateral movements, privilege misuse, across systems that produce logs (endpoints, servers, network devices, etc.). |
| Strengths | Very precise detection based on file/memory content. Detects malicious files even before execution. Good for polymorphic malware if rule authors use robust patterns (byte-sequences, headers, structure) rather than superficial strings. | SIEM-agnostic: same rule works across different log-management or SIEM platforms. Well suited for behavior-based detection (when malicious activity happens, not just file presence). Works across diversified log sources (endpoints, network devices, auth servers, etc.), providing broad coverage. |
| Limitations | If malware is packed, encrypted, obfuscated, or heavily polymorphic, static pattern-matching may fail. Only sees the file or memory: no context of behavior (network activity, process execution, user actions). Requires file to be present or loaded, cannot detect threats solely via logs or network events. | Relies on sufficient logging, if events are not logged, Sigma rules cannot detect anything. No insight into file payloads or memory content, cannot detect malicious files just by logs. Log-based detection can produce high false positives if not carefully tuned (normal admin actions may appear suspicious). |
| Typical Deployment | Anti-malware engines, EDR tools (file / memory scanning), backup-scan workflows (e.g. before restore), threat-hunting or forensic pipelines. | SIEM systems, log-analysis tools, centralized logging platforms, SOC monitoring dashboards, event-correlation pipelines. |
| Rule Sharing | Community and vendor rule repositories, often used by malware researchers, DFIR teams; rules are portable across systems that support YARA scanning. | Designed to be SIEM-agnostic: same Sigma rule can be converted into platform-specific queries (Splunk SPL, Elastic, Chronicle, etc.) without rewriting logic. |
| Detection Level | Content-based (static/memory), looks at the actual bytes, strings, structure of files or memory dumps. | Behavior-based (event/activity), looks at what happens on the system: log events, user and process activity, network events, etc. |
There are several well-known YARA rule repositories and communities where researchers, SOC teams, and threat analysts can publish, share, and collaborate on YARA rules:
YARA GitHub Repository: The central hub for the YARA project. This repository hosts the YARA source code, latest releases, issue tracking, and contributions from the core maintainers. It is the authoritative location for obtaining the most up-to-date YARA engine.
YARA Documentation: Hosted on ReadTheDocs, this documentation provides an in-depth reference for the YARA language, including syntax, rule structure, available modules (PE, ELF, hash, etc.), and examples used for malware classification and threat detection.
YARA Rules and Signatures Repository: Multiple open-source repositories maintain curated, community-contributed YARA rule sets. These collections aggregate signatures created by malware researchers, DFIR analysts, detection engineers, and threat intelligence practitioners. Users can download existing rules or contribute their own for broader community use.
YARA rules by ransomware group: a community of sample YARA rules specific to ransomware.
The Picus Mitigation Library is a comprehensive repository of validated prevention and detection content that helps organizations strengthen and maintain their security posture.
Figure 2. Picus Platform Provides Both Prevention and Detection Content
It delivers tailored mitigation coverage for a wide range of security technologies, enabling teams to close exposure gaps quickly and effectively.
The library includes:
Figure 3. Picus Platform Detection Content Library