Hooked on Mnemonics Worked for Me

Stressing LLMs - Local Model Complexity Attacks Progress

Hello, this is an ongoing personal learning series on Large Language Models (LLMs) and automated reverse engineering. In a previous blog post, I described a type of complexity attack against LLMs. I am using "attack" in the practical reverse-engineering sense: intentionally increasing the amount of interdependent code and state the model has to reason about. My hypothesis is that increasing the number of interdependent round functions increases the chance that a model will make a small but fatal translation error, even when it identifies the correct high-level algorithm. Here is an excerpt from that blog post that describes the methodology I'm using.

The complexity attack increases computational complexity by generating binaries with a large number of interdependent functions. Instead of hiding the logic, the goal is to make the amount of state and code too large for practical static reasoning. The executable contains a toy XOR cipher with a keystream derived from a set of N functions, where N is the number of generated rounds. A Python script generates C source code with an embedded encrypted string and decryption loop. GCC is then used to compile the C source into an executable. At runtime, the decrypted string is printed to the console. To make this concrete, we can walk through generating the code and compiling it.

What is useful about this approach is that I can measure the complexity of the generated binaries and then test them against an LLM. At first glance, this sounds easy, but there are a lot of nuances to running models locally. For example, one model started to fail after I updated Ollama. In this blog post, I'm going to describe the results, failure themes, and lessons learned. Before diving into those, I think it's worth mentioning the hardware and tooling.

I'm running all of the local models on a DGX Spark and using OpenAI's Codex as an agent. On the Spark, I use Python and Bash for all of the scripting and tool execution, clearbluejar's pyghidra-mcp within Docker to interact with Ghidra, and Ollama to interact with models. I'm using the following models for evaluation:

  • gemma4:26b
  • qwen3.6:35b
  • qwen3-coder:30b
  • mistral-small3.2:24b
  • command-r:35b
  • cogito:32b
  • hermes3:8b

I attempted to download and use a number of other models, but they did not work because they were not accessible via MCP, timed out, or had other similar issues. Before I dig too deep into the evaluation process and data, I want to share the results of the most recent evaluation. In these runs, gemma4:31b is clearly the best local baseline. gemma4:26b was historically successful, but it became unstable after I updated Ollama. qwen3.6:35b was able to solve fixtures (i.e., binary test cases) 0001 and 0002 after I updated the prompt. Below is a table of the results from 102 runs. Please note that the results improved over time with updates to the prompt, so this should be read as a practical lab notebook rather than a perfectly controlled benchmark.

| model | rows | verified | completed false | timeouts | other inconclusive | successful fixtures |
|---|---|---|---|---|---|---|
| gemma4:31b | 15 | 11 | 2 | 2 | 0 | 0001, 0002, 0003, 0004 across later runs |
| gemma4:26b | 23 | 10 | 8 | 3 | 2 | mostly 0001, plus 0002 and 0003 |
| qwen3.6:35b | 11 | 2 | 6 | 1 | 2 | 0001, 0002 |
| qwen3-coder:30b | 9 | 0 | 8 | 0 | 1 | none |
| qwen3.5:35b | 4 | 0 | 2 | 1 | 1 | none |
| cogito:32b | 9 | 0 | 8 | 0 | 1 | none |
| command-r:35b | 9 | 0 | 8 | 0 | 1 | none |
| mistral-small3.2:24b | 9 | 0 | 8 | 0 | 1 | none |
| hermes3:8b | 9 | 0 | 8 | 0 | 1 | none |
| lfm2:24b | 4 | 0 | 4 | 0 | 0 | none |

The following table is from the last run in which gemma4:31b was able to successfully write a decryptor for binaries with 1, 2, and 3 rounds of XOR functions.

| model | fixture | rounds | final grade | completed | decryptor correct | py blocks | status | elapsed sec | MCP calls | error |
|---|---|---|---|---|---|---|---|---|---|---|
| gemma4:31b | /fx_r0001_sl0016_sp0000_dwarf.exe-2f588b | 0001 | true | True | True | 1 | completed | 958.73 | 8 | |
| gemma4:31b | /fx_r0002_sl0016_sp0000_dwarf.exe-861fc4 | 0002 | true | True | True | 1 | completed | 1101.385 | 11 | |
| gemma4:31b | /fx_r0003_sl0016_sp0000_dwarf.exe-135f91 | 0003 | true | True | True | 1 | completed | 918.604 | 9 | |
| gemma4:31b | /fx_r0004_sl0016_sp0000_dwarf.exe-9f4750 | 0004 | inconclusive_timeout | False | False | 0 | ollama_timeout | 2287.903 | 10 | TimeoutError: timed out |
| qwen3.6:35b | /fx_r0001_sl0016_sp0000_dwarf.exe-2f588b | 0001 | true | True | True | 2 | completed | 414.498 | 23 | |
| qwen3.6:35b | /fx_r0002_sl0016_sp0000_dwarf.exe-861fc4 | 0002 | true | True | True | 4 | completed | 281.356 | 13 | |
| qwen3.6:35b | /fx_r0003_sl0016_sp0000_dwarf.exe-135f91 | 0003 | false | True | False | 1 | completed | 304.465 | 29 | |

The analysis of the 4-round binary timed out after 1800 seconds (30 minutes). Previous runs were able to decrypt 4-round binaries, but not 5-round binaries. There is evidence in this setup that the models start to degrade as the binary becomes more complex, but the results are still inconclusive.

Failures

Disclaimer: The failure report was generated using AI. I personally find the failures fascinating.

The failures cluster around a small set of reverse-engineering translation hazards:

  • uint32_t + uint32_t wrapping before a later uint64_t cast
  • C unsigned literal widths, especially constants ending in u
  • C cast timing, such as (uint64_t)(expr) where expr was already evaluated at 32 bits
  • CONCAT44(a,b) high/low argument interpretation
  • C operator precedence involving +, ^, |, <<, and >>
  • 64-bit rotate idioms such as (x << 21) | (x >> 43)
  • final derive_state accumulator behavior and final xorshift32(...)
  • preserving byte order from mixed character/hex array initializers
  • implementing every generated round function in order
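
A few of these hazards are easier to see as code. The sketch below is illustrative only: rotl64, concat44, and buf are hypothetical names that do not come from the harness or the prompt, and the CONCAT44 helper assumes Ghidra's convention that the first argument supplies the upper 32 bits.

```python
M32 = 0xFFFFFFFF
M64 = 0xFFFFFFFFFFFFFFFF


def rotl64(x, n):
    """64-bit rotate left, matching the C idiom (x << n) | (x >> (64 - n))."""
    x &= M64
    # Python ints never overflow, so the result must be masked back to 64 bits.
    return ((x << n) | (x >> (64 - n))) & M64


def concat44(hi, lo):
    """Assumed Ghidra CONCAT44(hi, lo): hi is the upper 32 bits, lo the lower."""
    return ((hi & M32) << 32) | (lo & M32)


# Mixed character/hex initializer, e.g. C: uint8_t buf[] = { 'H', 0x69, '!', 0x00 };
# Each element is one byte, kept in source order.
buf = bytes([ord("H"), 0x69, ord("!"), 0x00])

print(hex(rotl64(0x8000000000000001, 1)))
print(hex(concat44(0xDEADBEEF, 0x12345678)))
print(buf)
```

Swapping the two concat44 arguments, or dropping the mask in rotl64, still produces a script that runs without errors, which is why these mistakes are easy to miss.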

The most interesting lesson so far is that the models often do the hard-looking part first. They find the right functions, recover the encrypted bytes, and understand the XOR loop. The thing that breaks them is often much smaller: one C integer-width rule, one cast at the wrong time, or one generated round translated almost-but-not-quite correctly.

1. The Hardest Failures Are Now Translation Errors, Not Tool Access

The strongest models usually find the right region of the binary: main, derive_state, xorshift32, encrypted bytes, and the generated round functions.

The most important remaining failure class is translating C/Ghidra semantics into Python exactly. This shows up as scripts that look plausible, run successfully, but print non-plaintext bytes.

Note: 20260611-** is the ID of the run.

Confirmed examples:

  • gemma4:31b, 20260611-aa, fixture 0004

    • The model recovered the right functions and wrote a full decryptor.
    • Failure was in R3: it translated (uint64_t)(0xc8e57b40u + m) as a wide Python addition instead of u64(u32(0xc8e57b40 + m)).
    • One-line fix made the script print Hello, World.
  • gemma4:31b, 20260611-cc, fixture 0005

    • Same root cause in R3.
    • Correct C: p->b ^= (uint64_t)(0xc8e57b40u + m);
    • Model Python missed the 32-bit wrap before widening.
    • One-line fix made the script print Hello, World.

This is best classified as incorrect-reasoning with a bad-type-recovery or cast-timing secondary cause.

2. Prompt Updates Improved gemma4:31b, but Did Not Fully Solve Cast Timing

gemma4:31b improved materially after prompt changes. In 20260611-cc, it solved fixtures 0001 through 0004 and failed on 0005. Earlier, in 20260611-aa, it solved 0001 through 0003 and failed 0004.

The remaining 0005 failure shows that telling the model to use ctypes is not enough. The model imported or mentioned fixed-width behavior but still used raw Python arithmetic for a critical intermediate expression.

The prompt now needs to force helper usage, not just mention ctypes:

import ctypes

def u32(x): return ctypes.c_uint32(x).value
def u64(x): return ctypes.c_uint64(x).value

Critical translation rule:

(uint64_t)(a_uint32 + b_uint32) -> u64(u32(a + b))
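
To make the rule concrete, here is a small sketch of the R3 expression evaluated both ways. The constant comes from the fixture; the value of m is made up for illustration and was not taken from any run.

```python
import ctypes


def u32(x):
    return ctypes.c_uint32(x).value


def u64(x):
    return ctypes.c_uint64(x).value


K = 0xC8E57B40
m = 0x40000000  # illustrative value only, chosen so the 32-bit sum overflows

# Wrong: the addition happens at Python's unbounded width, so the carry
# out of bit 31 survives into the 64-bit result.
wide = u64(K + m)

# Matches C: (uint64_t)(0xc8e57b40u + m) wraps at 32 bits, then widens.
wrapped = u64(u32(K + m))

print(hex(wide), hex(wrapped))
```

The two results differ only in bit 32, but that single bit is XORed into p->b and then flows through every later round.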

This is the kind of bug that makes the benchmark useful to me. The model is not completely lost, but it is also not correct. That middle zone is where a lot of reverse-engineering automation gets interesting.

3. gemma4:31b Is the Current Best Local Baseline

gemma4:31b has the strongest completed results:

  • 20260611-aa: solved 0001, 0002, 0003; failed 0004.
  • 20260611-cc: solved 0001, 0002, 0003, 0004; failed 0005.
  • 20260611-dd: solved 0001, 0002, 0003; timed out on 0004.

It is slow, but its failures are now narrow and mechanically diagnosable.

4. gemma4:26b Is Historically Useful but Unstable

gemma4:26b has repeated successes, mostly on fixture 0001, and solved 0001 through 0003 in 20260611-bb.

However, it also regressed repeatedly:

  • wrong decryptor output on fixture 0001 in some runs
  • timeouts on early fixtures
  • empty or pseudo-tool response on fixture 0002
  • inconsistent ability to proceed past fixture 0001

It remains useful as a historical baseline, but it is not as reliable as gemma4:31b.

5. qwen3.6:35b Became Interesting in Later Runs

Earlier qwen3.6:35b runs mostly produced invalid or non-printing Python, timed out, or failed to converge.

In 20260611-dd, it solved fixtures 0001 and 0002, then failed fixture 0003 with invalid/malformed Python. That suggests it is not merely MCP-compatible; it can solve the simpler generated fixtures under some settings. It still degrades as round complexity increases.

6. Many Models Are Tool-Compatible but Do Not Converge

Several models can call MCP tools but fail to produce a usable decryptor:

  • command-r:35b
  • lfm2:24b
  • some qwen3.6:35b and qwen3.5:35b runs

These failures usually are not MCP server failures. They tend to be one of the following:

  • tool use without a final algorithm
  • excessive tool loops
  • target drift
  • loss of the objective after accumulating tool output

lfm2:24b is the clearest example: it used many MCP calls in some runs but did not produce a Python decryptor.

7. Smaller Models Often Stop Too Early

The most common pattern for mistral-small3.2:24b and hermes3:8b is minimal MCP usage followed by no decryptor.

These are mostly incomplete-xrefs failures:

  • did not inspect enough of main
  • did not inspect derive_state
  • did not inspect all generated round functions
  • did not recover the encrypted bytes and loop bounds

8. Invalid Python and Fence Extraction Remain Separate Problems

Some failures are model output quality problems:

  • invalid Python syntax
  • Markdown/prose inside extracted code
  • several fenced blocks, none of which are a clean decryptor
  • claimed plaintext inconsistent with script output

There is also a harness extraction issue observed in 20260611-cc for gemma4:31b fixture 0005: the saved block_0.py was not the explicit Python block. It captured Markdown around the recovered-output prose because earlier c fences confused the plain-fence extractor. The actual Python block in answer.md ran, but printed wrong bytes until the R3 cast-timing bug was fixed.

This means two separate checks are needed:

  1. Did the model write a correct Python decryptor?
  2. Did the extraction harness capture the intended Python block?
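
The second check can be sketched as a stricter extractor that only accepts blocks explicitly tagged as Python. This is a minimal sketch, not the actual harness code: extract_python_blocks is a hypothetical name, and the regex assumes well-formed, non-nested fences.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a fence itself

# An opening fence with an optional language tag, a lazy body, then a closing
# fence alone on its own line.
_BLOCK = re.compile(
    r"^" + FENCE + r"[ \t]*(\w*)[ \t]*\n(.*?)^" + FENCE + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_python_blocks(markdown):
    """Return the bodies of fenced blocks tagged python or py, in order."""
    return [
        body
        for lang, body in _BLOCK.findall(markdown)
        if lang.lower() in ("python", "py")
    ]


answer = "\n".join([
    "Recovered C:",
    FENCE + "c",
    "p->b ^= (uint64_t)(0xc8e57b40u + m);",
    FENCE,
    "Recovered output: Hello, World",
    FENCE + "python",
    "print('Hello, World')",
    FENCE,
    "",
])

blocks = extract_python_blocks(answer)
print(len(blocks))
```

Because the c block is consumed as a complete fenced unit, the prose between the two blocks is never captured as code.
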
Back to non-AI-ish text.

Limitations

There are a few caveats worth calling out before the conclusion. The prompt changed during the study, and some of the later improvements came from manually comparing generated Python against the original C and feeding that back into the prompt. Ollama updates may also have changed model behavior, especially for gemma4:26b. There was also at least one harness extraction issue where the saved Python block was not the intended final decryptor. Finally, timeouts are inconclusive. They show that the run did not finish inside the configured timeout, not that the model could never solve the fixture.

That means the results are best read as evidence from one local setup, not a universal ranking of models or a final statement about LLM reverse-engineering ability.

Conclusion

More rounds do not make solving impossible, but they substantially increase the chance of failure once a model must preserve a longer state mutation chain.

The dominant complexity effect is not discovering the high-level XOR scheme. Models often find:

  • main
  • encrypted bytes
  • seed
  • derive_state
  • xorshift32
  • generated round functions

The failure emerges when translating every round exactly:

  • more generated functions to implement
  • more mutable state updates to preserve in order
  • more C integer width boundaries
  • more unsigned literal/cast timing traps
  • more rotate and precedence opportunities for one-bit state errors

For this fixture generator, prompt, harness, and local model setup, the practical threshold appears to be:

rounds 1-3: solvable by gemma4:31b with good reliability
round 4: boundary where failures/timeouts start
round 5: current failure point for gemma4:31b

That conclusion should be treated as provisional because there are few high-round samples.

Next Steps

  • Automated prompt improvement using failure analysis

    • This process was done manually by comparing the generated Python code against the original C code, then having the harness recommend upgrades to the prompt.
    • This substantially improved the results. qwen3.6 started working after the first iteration of this approach.
  • Compare against frontier models.

  • Keep learning and keep reading.

Any recommendations? I would love to hear feedback, hints, or thoughts from others. Feel free to send me an email at Alexander dot Hanel at gmail dot com or leave a comment. Cheers.

Frost64

At my day job, I run an RE mentorship. It’s a simple process for me. I have a link to the task, and the user emails me the answers to the questions in the task. If their answers are correct, I send them the next task. If the answers are incorrect, I let them know their mistakes and send them references to help. Sometimes people get tripped up with learning Assembly, or they think it’s boring (which it kind of is). One tool I have always wanted was a way to gamify the learning of Assembly. So, my vibe-coding weekend project was a SoftIce-like interface that teaches the basics of Assembly. It uses the Godot gaming engine, Unicorn-Engine for emulation, and iced. I hit my quota for the next couple of hours, so I figured I’d post a screenshot:



Overall I’m very happy with it. It’s a complete emulator, has correct addresses, bytes, validation and reminds me of SoftIce. I still need to audit the text in the lesson notes because they are duplicated in the command window text. Once it’s complete I’ll post the code to GitHub or maybe post the game on Steam.  

I’m still working on my Stressing LLMs project. Some local models that I tested failed miserably at writing basic Python decryptors. I have been digging into proper prompts, skills.md, and settings for Ghidra. That project should be out in a couple of weeks.


Stressing LLMs - Triage Stage

Packers, cryptors, and code obfuscation are all methods used to bypass signature-based scanners in AV/EDR or to slow down the reverse engineering process. Many people are now using Large Language Models (LLMs) to reverse engineer or thwart these protections. It is increasingly common to see examples of frontier models solving CTF challenges or being used to port old video games to modern code. It is somewhat morbidly fascinating to consider how LLMs could drive an arms race with DRM systems.

When thinking about LLMs for reverse engineering, I keep asking: at what point does randomization degrade tokenization or code comprehension? This is a reasonable question in the context of compiled executables.

In my view, there are two types of potential attacks against LLMs in the context of static analysis of compiled binaries. The first is making the code so complex that the context size and token cost are no longer practical. The second, which I call “Tokenization Inflation,” attempts to inflate or fragment tokens to increase processing cost or reduce coherence. These “attacks” may not even be effective against LLMs, especially for trivial tasks, but they are still worth exploring. This is the first of two posts: this one outlines the approaches and code; the second tests the hypothesis.

Complexity Attack

The complexity attack increases computational complexity by generating binaries with a large number of interdependent functions. Instead of hiding the logic, the goal is to make the amount of state and code too large for practical static reasoning. The executable contains a toy XOR cipher with a keystream derived from a set of N functions, where N is the number of generated rounds. A Python script generates C source code with an embedded encrypted string and decryption loop. GCC is then used to compile the C source into an executable. At runtime, the decrypted string is printed to the console. To make this concrete, we can walk through generating the code and compiling it.

python gen_fixture.py generate --seed 0xdeadbeef --rounds 5 --symbol-len 16 --symbol-pad 0 --message "Hello, World" --out fixture.c
Wrote fixture.c
Generated symbol prefix length: 16
Compile with:
gcc -O0 -g3 -gdwarf-5 -fno-omit-frame-pointer -fno-inline -std=c11 fixture.c -o fixture.exe

Here is fixture.c. There are five generated round functions, TokenizerBench___R0 through TokenizerBench___R4, which matches the number of rounds specified on the command line. If this were increased to 16,397 rounds, the generated binary would contain 16,397 round functions.

// Generated CTF-style static-analysis fixture
//
// Suggested build:
//   gcc -O0 -g3 -gdwarf-5 -fno-omit-frame-pointer -fno-inline -std=c11 fixture.c -o fixture.exe
//
// seed=0xdeadbeef
// rounds=5
// const_mode=rand
// const_seed=0xc001d00d
// symbol_len=16
// symbol_pad=0
// generated_prefix_length=16
//
// Notes:
// - Per-function constants are baked into each generated function.
// - Function bodies vary by generated round variant.
// - The plaintext is stored encrypted in the binary and decrypted at runtime.

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

typedef struct TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens {
    uint64_t a;
    uint64_t b;
    uint64_t c;
} TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens;

static uint32_t xorshift32(uint32_t x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

__attribute__((used, noinline))
uint32_t TokenizerBench___R0(TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens *p) {
    uint32_t m1 = xorshift32(0x9336956du ^ 0x31bbf978u ^ (uint32_t)p->a);
    uint32_t m2 = xorshift32(0xcd6f55fcu ^ (uint32_t)p->b);
    p->a ^= ((uint64_t)m1 << 32) | (uint64_t)m2;
    p->b += (uint64_t)(0x9336956du ^ m2);
    p->b = (p->b << 10) | (p->b >> 54);
    p->c = (p->c + p->a) ^ (uint64_t)(0x31bbf978u ^ 0xcd6f55fcu);
    uint64_t r = p->a ^ p->b ^ p->c ^ (uint64_t)0x9336956du ^ (uint64_t)0x31bbf978u ^ (uint64_t)0xcd6f55fcu;
    return (uint32_t)(r ^ (r >> 32));
}

__attribute__((used, noinline))
uint32_t TokenizerBench___R1(TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens *p) {
    uint32_t m = xorshift32(0x366856bbu ^ (uint32_t)p->a);
    p->a ^= ((uint64_t)0x366856bbu << 32) | (uint64_t)m;
    p->b += p->a ^ (p->c + (uint64_t)0x72fcd409u);
    p->c = ((p->c ^ (uint64_t)0x3afd4cabu) << 24) | ((p->c ^ (uint64_t)0x3afd4cabu) >> 40);
    uint64_t r = p->a ^ p->b ^ p->c ^ (uint64_t)0x366856bbu ^ (uint64_t)0x72fcd409u ^ (uint64_t)0x3afd4cabu;
    return (uint32_t)(r ^ (r >> 32));
}

__attribute__((used, noinline))
uint32_t TokenizerBench___R2(TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens *p) {
    uint32_t m = xorshift32(0x046d6ad2u ^ (uint32_t)p->b);
    p->b ^= ((uint64_t)m << 32) | (uint64_t)0xc719f452u;
    p->c += p->b ^ (uint64_t)0x0fc1bdd9u;
    p->a = (p->a + (uint64_t)0x046d6ad2u);
    p->a = (p->a >> 21) | (p->a << 43);
    uint64_t r = p->a ^ p->b ^ p->c ^ (uint64_t)0xc719f452u ^ (uint64_t)0x046d6ad2u ^ (uint64_t)0x0fc1bdd9u;
    return (uint32_t)(r ^ (r >> 32));
}

__attribute__((used, noinline))
uint32_t TokenizerBench___R3(TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens *p) {
    uint32_t m = xorshift32(0xc55b15eeu + (uint32_t)p->c);
    p->a += ((uint64_t)m << 32) | (uint64_t)0x0d11e683u;
    p->c ^= p->a;
    p->c = (p->c >> 16) | (p->c << 48);
    p->b ^= (uint64_t)(0xc8e57b40u + m);
    uint64_t r = p->a ^ p->b ^ p->c ^ (uint64_t)0xc8e57b40u ^ (uint64_t)0x0d11e683u ^ (uint64_t)0xc55b15eeu;
    return (uint32_t)(r ^ (r >> 32));
}

__attribute__((used, noinline))
uint32_t TokenizerBench___R4(TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens *p) {
    uint32_t m1 = xorshift32(0xdaf09eaeu ^ 0xf6f1f787u ^ (uint32_t)p->a);
    uint32_t m2 = xorshift32(0xe0cf500du ^ (uint32_t)p->b);
    p->a ^= ((uint64_t)m1 << 32) | (uint64_t)m2;
    p->b += (uint64_t)(0xdaf09eaeu ^ m2);
    p->b = (p->b << 21) | (p->b >> 43);
    p->c = (p->c + p->a) ^ (uint64_t)(0xf6f1f787u ^ 0xe0cf500du);
    uint64_t r = p->a ^ p->b ^ p->c ^ (uint64_t)0xdaf09eaeu ^ (uint64_t)0xf6f1f787u ^ (uint64_t)0xe0cf500du;
    return (uint32_t)(r ^ (r >> 32));
}

__attribute__((used, noinline))
uint32_t derive_state(uint32_t seed) {
    TokenizerBench___Type__LongRecord__With__Lots__Of__Nested__Like__Tokens x = {
        seed,
        seed ^ 0x12345678ULL,
        seed + 0x9ULL
    };

    uint32_t s = seed;
    s ^= TokenizerBench___R0(&x);
    s ^= TokenizerBench___R1(&x);
    s ^= TokenizerBench___R2(&x);
    s ^= TokenizerBench___R3(&x);
    s ^= TokenizerBench___R4(&x);

    s = xorshift32(s);
    return s;
}

int main(void) {
    uint8_t encrypted[] = { 0xcf, 0x7a, 0xe5, 0x10, 0x3c, 0x49, 0xe6, 0x0b, 0x79, 0xcb, 0xf9, 0x3d, 0x00 };
    uint32_t s = derive_state(0xdeadbeef);

    for (size_t i = 0; i < sizeof(encrypted) - 1; i++) {
        s = xorshift32(s + 0xA5A5A5A5u);
        encrypted[i] ^= (uint8_t)(s & 0xffu);
    }

    puts((const char *)encrypted);
    return 0;
}
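
For comparison, here is a hand translation of the fixture above into Python. It is a sketch written against this 5-round example, not output from any of the model runs. Helper names such as round_mix and fold are hypothetical, and bit masks stand in for the C integer widths. If the translation is faithful, it should reproduce the Hello, World message that the fixture was generated with.

```python
M32 = 0xFFFFFFFF
M64 = 0xFFFFFFFFFFFFFFFF

def u32(x): return x & M32
def u64(x): return x & M64
def rotl64(x, n): return u64((x << n) | (x >> (64 - n)))
def rotr64(x, n): return u64((x >> n) | (x << (64 - n)))

def xorshift32(x):
    x = u32(x)
    x ^= u32(x << 13)
    x ^= x >> 17
    x ^= u32(x << 5)
    return x

def fold(p, k0, k1, k2):
    # Shared tail of every round: XOR the state and constants, fold to 32 bits.
    r = p["a"] ^ p["b"] ^ p["c"] ^ k0 ^ k1 ^ k2
    return u32(r ^ (r >> 32))

def round_mix(p, k0, k1, k2, rot):
    # Shape shared by R0 and R4; only the constants and rotate count differ.
    m1 = xorshift32(k0 ^ k1 ^ u32(p["a"]))
    m2 = xorshift32(k2 ^ u32(p["b"]))
    p["a"] ^= (m1 << 32) | m2
    p["b"] = rotl64(u64(p["b"] + (k0 ^ m2)), rot)
    p["c"] = u64(p["c"] + p["a"]) ^ (k1 ^ k2)
    return fold(p, k0, k1, k2)

def r1(p):
    k0, k1, k2 = 0x366856BB, 0x72FCD409, 0x3AFD4CAB
    m = xorshift32(k0 ^ u32(p["a"]))
    p["a"] ^= (k0 << 32) | m
    p["b"] = u64(p["b"] + (p["a"] ^ u64(p["c"] + k1)))
    p["c"] = rotl64(p["c"] ^ k2, 24)
    return fold(p, k0, k1, k2)

def r2(p):
    k0, k1, k2 = 0x046D6AD2, 0xC719F452, 0x0FC1BDD9
    m = xorshift32(k0 ^ u32(p["b"]))
    p["b"] ^= (m << 32) | k1
    p["c"] = u64(p["c"] + (p["b"] ^ k2))
    p["a"] = rotr64(u64(p["a"] + k0), 21)
    return fold(p, k0, k1, k2)

def r3(p):
    k0, k1, k2 = 0xC55B15EE, 0x0D11E683, 0xC8E57B40
    m = xorshift32(u32(k0 + u32(p["c"])))  # 32-bit add before xorshift32
    p["a"] = u64(p["a"] + ((m << 32) | k1))
    p["c"] = rotr64(p["c"] ^ p["a"], 16)
    p["b"] ^= u32(k2 + m)  # wrap at 32 bits, then widen (the cast-timing trap)
    return fold(p, k0, k1, k2)

def derive_state(seed):
    p = {"a": seed, "b": seed ^ 0x12345678, "c": u64(seed + 0x9)}
    s = seed
    s ^= round_mix(p, 0x9336956D, 0x31BBF978, 0xCD6F55FC, 10)  # R0
    s ^= r1(p)
    s ^= r2(p)
    s ^= r3(p)
    s ^= round_mix(p, 0xDAF09EAE, 0xF6F1F787, 0xE0CF500D, 21)  # R4
    return xorshift32(s)

def decrypt():
    # Encrypted bytes from main(), without the trailing 0x00 terminator.
    enc = bytes([0xCF, 0x7A, 0xE5, 0x10, 0x3C, 0x49,
                 0xE6, 0x0B, 0x79, 0xCB, 0xF9, 0x3D])
    s = derive_state(0xDEADBEEF)
    out = bytearray()
    for byte in enc:
        s = xorshift32(u32(s + 0xA5A5A5A5))
        out.append(byte ^ (s & 0xFF))
    return bytes(out)

print(decrypt())
```

Every u32 and u64 call marks a place where C truncates implicitly, so each one is a spot where a translation can go wrong without raising an error.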

Below is the creation and execution of a 100,000-round binary.

python gen_fixture.py generate --seed 0xdeadbeef --rounds 100000 --message "Hello, World" --out fixture-100k.c --symbol-len 16 --symbol-pad 0
Wrote fixture-100k.c
Generated symbol prefix length: 16
Compile with:
gcc -O0 -g3 -gdwarf-5 -fno-omit-frame-pointer -fno-inline -std=c11 fixture-100k.c -o fixture-100k.exe

gcc -O0 -g3 -gdwarf-5 -fno-omit-frame-pointer -fno-inline -std=c11 fixture-100k.c -o fixture-100k.exe
.\fixture-100k.exe
Hello, World

The 100k-function binary was over 55 MB. Dynamic analysis could bypass this obfuscation with a single breakpoint, but the focus here is static analysis. The interesting part is that the number of functions scales easily for testing, and each function contributes to the final state. If the analysis is incomplete or incorrect, the derived decryption key will also be incorrect.
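
The last point can be shown with just the XOR loop from main. In this sketch both state values are made up: good_state stands in for whatever derive_state returns, and bad_state stands in for a state derived with one small mistake.

```python
M32 = 0xFFFFFFFF


def xorshift32(x):
    x &= M32
    x ^= (x << 13) & M32
    x ^= x >> 17
    x ^= (x << 5) & M32
    return x


def keystream_xor(data, state):
    """Apply the fixture's XOR loop; the same call encrypts and decrypts."""
    out = bytearray()
    for byte in data:
        state = xorshift32((state + 0xA5A5A5A5) & M32)
        out.append(byte ^ (state & 0xFF))
    return bytes(out)


good_state = 0x12345679      # made-up stand-in for the correct derived state
bad_state = good_state + 1   # made-up stand-in for an off-by-one derivation

ciphertext = keystream_xor(b"Hello, World", good_state)
print(keystream_xor(ciphertext, good_state))  # round-trips to the plaintext
print(keystream_xor(ciphertext, bad_state))   # does not recover the plaintext
```

There is no partial credit in this design: a state that is off by one still runs cleanly and simply prints the wrong bytes.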

Tokenization Inflation

Once a prompt is sent to an LLM, it is tokenized into integers. A simple way to think about this is mapping chunks of text to IDs. These IDs are then used to index into the model’s embedding table. This may seem similar to compression algorithms, since both map variable-length sequences to codes. The difference is that tokenization uses a fixed vocabulary optimized for model performance, while compression builds or applies dictionaries to reduce size by exploiting repetition in the data.

A potential weakness in both compression and tokenization is that long inputs increase computational cost. Repetitive or structured data can also affect how efficiently it is represented as tokens. Most modern implementations handle this reasonably well, but there is still a cost.

In an executable, one of the most common ways to introduce large amounts of data is through strings. However, not all strings are surfaced or prioritized during analysis. One type of string that is often preserved and exposed is debug information. With GCC, DWARF debug metadata can be used to store extremely long function names. We can generate function names of arbitrary length using the Python script. By passing -g3 -gdwarf-5, GCC emits DWARF metadata. Disassemblers such as Binary Ninja, Ghidra, and IDA can read this metadata, recover the names, and in some workflows pass them along to an LLM, which then tokenizes the text. The following command generates 5 rounds with a function name length of 7,331 characters.

python gen_fixture.py generate --seed 0xdeadbeef --rounds 5 --message "Hello, World" --out fixture-p.c --symbol-len 7331 --symbol-pad 1337
Wrote fixture-p.c
Generated symbol prefix length: 8672
Compile with:
gcc -O0 -g3 -gdwarf-5 -fno-omit-frame-pointer -fno-inline -std=c11 fixture-p.c -o fixture-p.exe

gcc -O0 -g3 -gdwarf-5 -fno-omit-frame-pointer -fno-inline -std=c11 fixture-p.c -o fixture-p.exe

Below is a screenshot of a graph view in IDA. It gives a sense of how long the function names are, although they are truncated after 1024 characters in IDA.

Here is an example of a complete function name.

uint32_t __cdecl TokenizerBench__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char___0(TokenizerBench__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__in
t__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_stri
ng__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basi
c_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair
__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__DemangleLike__std__basic_string__char__std__char_traits__char__std__allocator__char__vector__pair__basic_string__int__PadXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX_Type__LongRecord__With__Lots__Of__Nested__Like__Tokens_0 *p) 
Combining the code complexity option with the large function names means a large corpus of similar strings is generated in a single function, which might be taxing on a tokenizer. This attack can be easily defeated by not loading the debug/DWARF strings in the disassembler.
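For cases where the debug strings are already loaded, a related idea is to shrink the names before any text reaches the model. The following is only a minimal sketch of that idea, not something I tested against the generated binaries. The 64-character threshold, the `sym_` alias prefix, and the `shorten_identifiers` helper are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical threshold: identifiers longer than this are treated as inflated.
MAX_NAME_LEN = 64

# Matches C-style identifiers in decompiler or disassembler text output.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def shorten_identifiers(text, max_len=MAX_NAME_LEN):
    """Replace over-long identifiers with short, stable aliases.

    Returns the rewritten text and a mapping of original name -> alias so the
    analyst can translate the model's answer back to the real symbols.
    """
    aliases = {}

    def _swap(match):
        name = match.group(0)
        if len(name) <= max_len:
            return name
        if name not in aliases:
            digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]
            aliases[name] = f"sym_{digest}"
        return aliases[name]

    return IDENTIFIER.sub(_swap, text), aliases


# Illustrative input only: a made-up name in the style of the generated ones.
long_name = "c_string__" + "DemangleLike__" * 50 + "Pad" + "X" * 500
shortened, mapping = shorten_identifiers(f"void {long_name}(int *p);")
print(len(long_name), "->", len(mapping[long_name]))
```

The alias is derived from a hash of the original name, so the same symbol gets the same alias across functions and the cross-references stay readable.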

Summary

The goal is not to make binaries impossible to reverse, but to push LLM-based analysis into inefficient paths. One approach scales interdependent functions to force complexity. The other inflates token-heavy inputs through debug metadata to stress context limits and attention costs. These are better understood as attempts to trigger worst-case behavior in the analysis pipeline, not attacks on tokenization itself. This highlights a shift in where the pressure points are within LLMs. Context windows, token budgets, and attention scaling become part of the attack surface. If LLMs are used in reverse engineering workflows, understanding where they degrade may matter as much as improving their capability.

The next step is validating whether these ideas actually hold up in practice. That means testing them in a way that doesn't leave me broke from token costs or banned by Anthropic or OpenAI. Odds are my first attempts will be run locally, using the resources referenced in this gist.

Feel free to send me an email if you have any ideas at alexander dot hanel at gmail dot com.

Here is the source code: https://github.com/alexander-hanel/StressingLLMs

Codex’s Model Interaction & Inter-process Communication

Over the weekend I explored OpenAI’s Codex source code using Codex with the goal of understanding how it sends, receives, and processes responses from the API. Here is a link to the report. While going through it, I started thinking about inter-process communication (IPC) between Codex and other processes. In the coding agents I’m familiar with (Anthropic’s Claude Code and OpenAI’s Codex), there isn’t much support for receiving input from external processes, which raises a question I’ve had for a while: how does a third-party process communicate with the agent in a meaningful way? For example, if a command is blocked by a security provider like an EDR, the agent could simply generate a slightly modified version of that command and try again. But how would it know the block was due to a security event and shouldn’t be retried?

To explore this, I had Cursor modify Codex’s source to add IPC. The forked, vibe-coded version compiles and connects to OpenAI’s API just like a normal Codex install. It has a local file-based IPC surface so secondary processes can discover active sessions and submit feedback, which gets added to the model’s context. While testing with multiple Codex instances, I accidentally ran a command to update README.md in the wrong terminal. Earlier, I had injected an “external security control” signal via a Python script, and the session responded with:

“No. README.md was not updated. The edit attempt on README.md was blocked by an external security control, and the runtime indicated not to retry until that condition is cleared.” 

At first it didn’t make sense. Then it registered that the previous tests had worked and the session context had been updated through the IPC channel. I thought it was fascinating because it opened an almost philosophical question: how would an AI-Agent safely evaluate prompts from remote processes in the context of its current task? It shows how much trust matters in the context of AI-Agents.
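To make the idea of a file-based IPC surface concrete, here is a minimal sketch of how session discovery and feedback submission could work. This is not the format used by the fork. The directory layout, the JSON field names, and every function name below are assumptions made up for illustration.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def register_session(ipc_root, session_id):
    """Agent side: advertise a session by creating its inbox directory."""
    inbox = Path(ipc_root) / "sessions" / session_id / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    return inbox


def discover_sessions(ipc_root):
    """External process side: list the session ids currently advertised."""
    sessions_dir = Path(ipc_root) / "sessions"
    if not sessions_dir.is_dir():
        return []
    return sorted(p.name for p in sessions_dir.iterdir() if p.is_dir())


def submit_feedback(ipc_root, session_id, source, message, retry_allowed):
    """External process side: drop one JSON message into a session inbox."""
    inbox = Path(ipc_root) / "sessions" / session_id / "inbox"
    payload = {
        "source": source,                # hypothetical field: who is speaking
        "message": message,              # text that would be added to context
        "retry_allowed": retry_allowed,  # hypothetical hint for the agent
    }
    token = uuid.uuid4().hex
    tmp_path = inbox / f".{token}.tmp"
    final_path = inbox / f"{time.time_ns()}-{token}.json"
    # Write to a temporary name, then rename, so the agent never reads a
    # half-written message.
    tmp_path.write_text(json.dumps(payload), encoding="utf-8")
    tmp_path.replace(final_path)
    return final_path


def drain_inbox(inbox):
    """Agent side: read and delete pending messages in filename order."""
    messages = []
    for path in sorted(Path(inbox).glob("*.json")):
        messages.append(json.loads(path.read_text(encoding="utf-8")))
        path.unlink()
    return messages


# Demo in a throwaway directory: one session, one injected security signal.
root = tempfile.mkdtemp()
inbox = register_session(root, "session-a")
submit_feedback(root, "session-a", "edr-stub",
                "Edit of README.md blocked by an external security control.",
                retry_allowed=False)
print(discover_sessions(root), drain_inbox(inbox))
```

Nothing in this sketch authenticates the sender. Any process that can write to the directory can put text into the session, which is exactly the trust problem described above.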

The README.md of the project is constructed as a learning guide. Enjoy. 

Agentic AI Security: Reviewing the Past to Predict the Future

OpenAI recently posted a role for a Cybersecurity Landscape Analyst within their Intelligence and Investigation team. One line stood out:

“Develop forward-looking assessments of how cyber threats may evolve over 6–24 months.”

To predict the future of Agentic AI, we only need to look to the past. Agentic AI security is not emerging from nothing. It is replaying the same history as traditional computing security, but within a compressed timeline.

As of this writing, prompt injection is a commonly discussed attack vector against LLM-based systems. At its core, prompt injection exists because LLMs are sequence predictors with no native separation between trusted control instructions (system prompts) and untrusted input (user data). This is not a new problem. This is basically Intel x86 in Real Mode.

In Real Mode, code, data, the stack, and even the interrupt vector table all share the same memory space. There is no privilege separation. Any instruction can jump anywhere, overwrite anything, and execute without restriction. The fundamental issue is identical: no boundary between control and data. Detection strategies in that era relied on pattern matching, heuristics, checksums, and runtime hooking. Modern defenses against prompt injection, such as guardrails, input filtering, and heuristic detection, are not that different. They are variations of the same reactive strategies used before architectural fixes existed.

What about forward-looking cyber threats like the first Agentic-AI worm? For this example, we could consider the Morris Worm in 1988. Its success was not due to a single vulnerability, but an environment characterized by high trust between systems, widespread exposure of network services, weak authentication mechanisms, and a highly connected user base.

Now map this to Agentic AI. Instead of network services like sendmail, finger, or rsh, we have tool-enabled agents such as OpenClaw. Instead of academic researchers, we have early adopters rapidly integrating these systems into real workflows. Instead of BSD Unix systems in academic environments, we have Mac Minis showing up in homes and offices because people want to run OpenClaw locally. Instead of executable payloads, we have prompts. The conditions for a worm are the same: trust, connectivity, and execution capability. What is currently missing is density. There are not yet enough interconnected, tool-enabled systems for large-scale, worm-like propagation comparable to the Morris Worm or Slammer.

My theory is that the same threats, along with the security mitigations developed to address them since the 60s and 70s, will replay themselves within the microcosm of Agentic AI. We are currently in DOS Mode for Agentic AI. 

Update: A colleague shared the following link 

https://arxiv.org/abs/2403.02817

LLMs != Security Products

Cybersecurity stocks took a dive after Anthropic released a blog post titled “Making frontier cybersecurity capabilities available to defenders.” What stood out was not the post itself, but the market reaction. Companies tied to endpoint protection, cloud security, and other traditional cybersecurity products were affected, even though the post had little direct relevance to those companies.

That reaction highlights a disconnect between the perceived capabilities of “AI” and its actual impact on cybersecurity products, a disconnect that likely extends beyond the market. To make sense of that gap, it helps to start with what is actually meant by ‘AI’ in this context. Usage of the term AI (short for Artificial Intelligence) has increased sharply since the release of ChatGPT in November of 2022. In practice, much of what is labeled “AI” today is better described as large language models (LLMs). For readers unfamiliar with LLMs, a common definition is:

“A large language model (LLM) is a type of artificial intelligence that can understand and create human language. These models learn by studying huge amounts of text from books, websites, and other sources.”

What makes LLMs fascinating and applicable to our modern life is how they solved (on a surface level) a field of AI called Natural Language Processing (NLP). For readers not familiar with NLP, autocomplete, email spam filters and auto-correct are all examples of NLP. Here is a definition of NLP.  

“A field in Artificial Intelligence, and also related to linguistics, focused on enabling computers to understand and generate human language.”

Long-time readers of this blog may recall that I previously used a sub-field of NLP, Natural Language Generation (NLG), to automatically create descriptions of disassembled functions via API calls. On their own, LLMs require text for both training and inference. They are not autonomous systems; without prompts, they do not function. This distinction is important when discussing AI and cybersecurity, because evaluating or classifying security events requires context that does not already exist as text that can be passed to a prompt. That context has to be generated by additional software.

Generating that context requires an understanding of, and access to, the complete lifecycle of the security event being described. Walking through this lifecycle matters because it highlights how much logic exists before an event ever becomes text.

A classic example of a security event is a process initiating an outbound network connection directly to an IP address. How that event is handled varies widely depending on the type of security product and where it operates in the OSI model. For this example, assume the product operates at Layer 7, the application layer. The event pipeline in this case includes several distinct steps. A kernel-mode driver or user-mode component monitors process creation and relevant networking APIs. The destination IP address is evaluated to ensure it is not local, then serialized into text and logged. That log data is subsequently forwarded to a file-based or cloud-based centralized logging system. Even this simplified path omits important actions such as blocking the connection or terminating the process. Writing code is not the same as building a security product, and LLMs do not possess the authority or signal access required to determine whether an IP address is benign or malicious. An LLM can describe an alert very well; it cannot, on its own, determine whether that alert represents malicious behavior without pre-existing detection logic, telemetry, or intelligence-derived indicators of compromise.
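The "evaluate, then serialize" steps of that pipeline can be sketched in a few lines. This is a simplified illustration, not how any particular product works. The event field names and the `serialize_event` helper are assumptions, and the monitoring component that would supply the process and connection details is left out entirely.

```python
import ipaddress
import json


def is_external(ip_text):
    """Return True when the destination is not a local or special address."""
    addr = ipaddress.ip_address(ip_text)
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_unspecified
    )


def serialize_event(process_name, pid, dest_ip, dest_port):
    """Turn one outbound connection into a line of log text.

    Returns None for local destinations, which this sketch does not log.
    All field names are hypothetical.
    """
    if not is_external(dest_ip):
        return None
    event = {
        "event": "outbound_connection",
        "process": process_name,
        "pid": pid,
        "dest_ip": dest_ip,
        "dest_port": dest_port,
    }
    return json.dumps(event, sort_keys=True)


# Values a driver or user-mode hook would normally supply.
print(serialize_event("updater.exe", 4242, "8.8.8.8", 443))
print(serialize_event("updater.exe", 4242, "192.168.1.10", 445))
```

Note what the sketch does not do: it produces text about the connection, but nothing in it says whether the destination is benign or malicious. That verdict still has to come from detection logic or intelligence outside the model.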

In practice, an agent is an LLM placed inside a loop, where it can inspect the current state of a system, run tools or commands, observe the results, and decide what to do next until it reaches some stopping point. Without the output of those tools and commands, the LLM provides no value; it has nothing to reason over. The surrounding software is what produces the text that gives the model context.
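That loop is small enough to sketch directly. In this minimal illustration the model is replaced by a scripted stand-in function so the code runs without any API, and the action format (`type`, `tool`, `argument`, `answer`) is an assumption invented for the example, not the protocol of any real agent.

```python
def run_agent(model, tools, task, max_steps=5):
    """Run a model in a loop: ask for an action, run the tool, record the result."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action["type"] == "finish":
            transcript.append(f"answer: {action['answer']}")
            return action["answer"], transcript
        tool = tools[action["tool"]]
        result = tool(action["argument"])
        # The tool output becomes text, which is all the model ever sees.
        transcript.append(
            f"tool {action['tool']}({action['argument']!r}) -> {result}"
        )
    return None, transcript  # stopping point: step budget exhausted


def scripted_model(transcript):
    """Stand-in for an LLM: call one tool, then finish using its output."""
    observations = [line for line in transcript if line.startswith("tool ")]
    if not observations:
        return {"type": "call", "tool": "count_lines", "argument": "a\nb\nc"}
    return {"type": "finish", "answer": observations[-1].split(" -> ", 1)[1]}


tools = {"count_lines": lambda text: str(len(text.splitlines()))}
answer, transcript = run_agent(scripted_model, tools, "how many lines?")
print(answer)
print("\n".join(transcript))
```

Remove the `tools` dictionary and the stand-in model has nothing to finish with, which is the point: the transcript built by the surrounding software is the only context the model has.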

As of this publication date, LLMs are not going to replace cybersecurity products. These systems are large, long-lived codebases, and their value is not defined by code generation alone. What matters is the telemetry collected and the logic built on top of that telemetry to determine whether the text describing an event represents something benign or something malicious. Large language models can help explain security events, but they don’t replace the systems that detect them, and confusing the two is how markets end up reacting to the wrong things.

msdocsviewer

Hello, 

I forgot to post a recent IDAPython plugin that I created for viewing Microsoft SDK documentation in IDA. Here is an example screenshot of msdocsviewer.


The repository for the plugin can be found here.