Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

boborjan2 · 2024-10-22T17:13:07Z

Using the dev branch head, I am facing an issue where I see that an execution flow that has been recorded in the past differs from it's recorded value. This binary snipplet includes self modifying code. I can get the old behavior only if I register an empty UC_HOOK_MEM_READ or WRITE hook (parameters do not matter, they can be zero). I checked the code and it seems branches regarding the existance of these hooks have been added in 8816883 and 6cc7e1d. And all these seem to be in connection with d010357.
(I made a check: If I remove the appropriate calls to tlb_set_dirty(), it works ok without hooks.)
I do not really understand the connection between tlb_set_dirty() and read/write hooks.
Unfortunately I do not have an example that easily fits here.

The text was updated successfully, but these errors were encountered:

wtdcode · 2024-10-23T01:22:06Z

I assume this is #2031 , could you have a check?

boborjan2 · 2024-10-24T13:05:39Z

Unfortunately no. In my case, #2031 fails (differs) even with mem hooks registered. Also, it min 2 times slower. I gonna try to figure out when (i.e which commit) it went wrong. It did work in september.

boborjan2 · 2024-10-24T13:23:40Z

It did work on 379791a (end of day 4th Sept). With and without mem hooks.
However if I cherry pick d010357 on top, I get the same behavior where execution depends on mem hooks being registered. Sadly, this very commit is the one responsible for the speed improvement as well.

PhilippTakacs · 2024-10-24T16:58:41Z

Can you provide a sample?

boborjan2 · 2024-10-24T19:33:28Z

It is not trivial to extract a shellcode size sample. What I see with #2031 is that the phenomenon is consistent with a correct execution flow with missing write mem hook. (i.e it is different from what I saw without #2031: there I experienced a change in execution flow when I enabled an empty read or write hook. Here the execution flow is correct with or without hook registered but when it is registered, some callbacks are missing) This is my current understanding, I keep looking.

PhilippTakacs · 2024-10-25T08:39:10Z

Can you check what happend when you apply following patch:

diff --git a/qemu/accel/tcg/cputlb.c b/qemu/accel/tcg/cputlb.c
index 211d0ec1..3de68e67 100644
--- a/qemu/accel/tcg/cputlb.c
+++ b/qemu/accel/tcg/cputlb.c
@@ -1188,23 +1188,23 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
 
 static bool uc_mem_hook_installed(struct uc_struct *uc, target_ulong paddr)
 {
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_PROT))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_PROT))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_AFTER, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_AFTER))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_PROT))
         return true;
     return false;
 }

Also, it min 2 times slower

There is not much I can do about this. The problem is by using tlb_set_dirty most of the unicorn code will be skipped for mem access. This allows faster execution but prevents self modifying code and memory hooks.

wtdcode · 2024-10-25T08:43:30Z

What happens to self modifying code? Does the optimization break it? From: PhilippTakacs ***@***.***> Date: Friday, 25 October 2024 at 4:39 PM To: unicorn-engine/unicorn ***@***.***> Cc: lazymio ***@***.***>, Comment ***@***.***> Subject: Re: [unicorn-engine/unicorn] Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered (Issue #2041) Can you check what happend when you apply following patch: diff --git a/qemu/accel/tcg/cputlb.c b/qemu/accel/tcg/cputlb.c index 211d0ec1..3de68e67 100644

…

--- a/qemu/accel/tcg/cputlb.c +++ b/qemu/accel/tcg/cputlb.c @@ -1188,23 +1188,23 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr) static bool uc_mem_hook_installed(struct uc_struct *uc, target_ulong paddr) { - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_UNMAPPED, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_UNMAPPED)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_UNMAPPED, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_UNMAPPED)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_PROT, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_PROT)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_PROT, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_PROT)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_AFTER, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_AFTER)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_UNMAPPED, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_UNMAPPED)) return true; - if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_PROT, paddr)) + if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_PROT)) return true; return false; } Also, it min 2 times slower There is not much I can do about this. The problem is by using tlb_set_dirty most of the unicorn code will be skipped for mem access. This allows faster execution but prevents self modifying code and memory hooks. — Reply to this email directly, view it on GitHub<#2041 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AHJULO26TSEQSI4AD2MKGODZ5H7UJAVCNFSM6AAAAABQM7NDKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZXGIZDCNJYGU>. You are receiving this because you commented.Message ID: ***@***.***>

PhilippTakacs · 2024-10-25T08:53:12Z

What happens to self modifying code? Does the optimization break it?

As far as I see yes. I assume the problem is when one tb writes two times itself for the first one the tb-cache is cleared and not dirty is set, but for the second write the fast path is taken so no rebuild of the tb. I'm currently still at debugging this to understand what exactly happen and to enable the optimization whenever possible.

boborjan2 · 2024-10-25T09:17:14Z

Without Optimize Notdirty write #2031: in d010357, it is indeed tlb_set_dirty() that causes the bad behavior in notdirty_write(). Previously it had been never called (cpu_physical_memory_is_clean() always evaluated to true), after the patch it is always called. After the patch, the code in i386/tcg-target.inc, in tcg_out_tlb_load(), makes the difference:

    // Unicorn: fast path if hookmem is not enable
    if (!HOOK_EXISTS(s->uc, UC_HOOK_MEM_READ) && !HOOK_EXISTS(s->uc, UC_HOOK_MEM_WRITE))
        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
    else
        /* slow_path, so data access will go via load_helper() */
        tcg_out_opc(s, OPC_JMP_long, 0, 0, 0);

Taking the slow path always gets rid of the dependency of mem hooks being registered.

With Optimize Notdirty write #2031: in my case, execution does not touch to uc_mem_hook_installed() (verified via debugger), so any patch there won't help. I am not using snaphots. Debugging shows that tlbe->addr_code never gets equal to -1, so uc_mem_hook_installed() is never called. This means that tlb_set_dirty() is never called. This is similar to the behavior before d010357 (no dependence on mem hooks) with the difference that some mem hooks are missed.
Thanks for the quick response btw.

PhilippTakacs · 2024-10-25T09:51:50Z

Can you install your memhook in a version before d010357 and check if all mem hooks get called?

It confuses me, because tlb_set_dirty() is not called like before but now some mem hooks are missing.

boborjan2 · 2024-10-25T09:52:15Z

It seems that the mem callbacks are missing because uc->size_recur_mem is not zero in store_helper(). The stores do happen, as i said, execution flow seems ok.

boborjan2 · 2024-10-25T09:54:50Z

Can you install your memhook in a version before d010357 and check if all mem hooks get called?

It confuses me, because tlb_set_dirty() is not called like before but now some mem hooks are missing.

before d010357 , mem hooks do happen and executing does not depend on hooks being installed. I made this check under 1) above.

so

before d010357: hooks do happen and are ok
after d010357: hooks do happen but execution depends on whether I install mem hooks
after Optimize Notdirty write #2031 : execution does not depend on mem hooks but if i register them, some are missing because in store_helper(),
uc->size_recur_mem is not zero

PhilippTakacs · 2024-10-25T10:11:37Z

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

wtdcode · 2024-10-25T10:19:01Z

uc->size_recur_mem is a pretty old hack when prototyping Unicorn2. I failed to understand why it was there a few months ago, too.

boborjan2 · 2024-10-25T10:59:43Z

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

This makes it fine. (it is "cpu->uc->size_recur_mem = 0;" in fact, I just want to make sure we see the same sourcetree. ?)

wtdcode · 2024-10-25T11:25:02Z

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

This makes it fine. (it is "cpu->uc->size_recur_mem = 0;" in fact, I just want to make sure we see the same sourcetree. ?)

Cool, this seems the root cause. I will check if this hack is really necessary.

Btw, does this change fail any unit test?

PhilippTakacs · 2024-10-25T12:41:52Z

Btw, does this change fail any unit test?

Not now, but I can check if I can adjust the test of #2031 for this issue.

PhilippTakacs · 2024-10-25T12:44:02Z

it is "cpu->uc->size_recur_mem = 0;" in fact

Yes, my bad.

wtdcode · 2024-10-25T12:44:06Z

Btw, does this change fail any unit test?

Not now, but I can check if I can adjust the test of #2031 for this issue.

Thanks. Please go ahead.

PhilippTakacs · 2024-10-28T12:20:25Z

So now I could reproduce the missing hooks, but I have now double hooks for each write which modifies the current tb. This is the asm:

mov QWORD PTR [rip+0x29], rax
mov WORD PTR [rip+0x0], 0x548
mov eax, DWORD PTR [rax+0x12345678]
nop
nop
nop
mov QWORD PTR [rip-0x08], rax
mov WORD PTR [rip+0x0], 0x548
mov eax, DWORD PTR [rax+0x12345678]
hlt

I have debugged this a bit and found that the cpu_loop_exec_tb() is called with the old value of rip. For example after the second mov the execution is restarted with rip as 0x1007 (code starts at 0x1000). This causes the self modifying code to be executed twice. Is this expected behavior?

boborjan2 · 2024-10-28T13:25:17Z

I confirm I experience two hooks per hit too. I wonder if it's connected to d010357 as well.

PhilippTakacs · 2024-10-29T10:19:51Z

No, the double execution (with smc) also happens before d010357.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

boborjan2 commented Oct 22, 2024

wtdcode commented Oct 23, 2024

boborjan2 commented Oct 24, 2024

boborjan2 commented Oct 24, 2024 •

edited

Loading

PhilippTakacs commented Oct 24, 2024

boborjan2 commented Oct 24, 2024

PhilippTakacs commented Oct 25, 2024

wtdcode commented Oct 25, 2024 via email

PhilippTakacs commented Oct 25, 2024

boborjan2 commented Oct 25, 2024 •

edited

Loading

PhilippTakacs commented Oct 25, 2024

boborjan2 commented Oct 25, 2024

boborjan2 commented Oct 25, 2024 •

edited

Loading

PhilippTakacs commented Oct 25, 2024

wtdcode commented Oct 25, 2024

boborjan2 commented Oct 25, 2024

wtdcode commented Oct 25, 2024

PhilippTakacs commented Oct 25, 2024

PhilippTakacs commented Oct 25, 2024

wtdcode commented Oct 25, 2024

PhilippTakacs commented Oct 28, 2024

boborjan2 commented Oct 28, 2024

PhilippTakacs commented Oct 29, 2024

Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

Comments

boborjan2 commented Oct 22, 2024

wtdcode commented Oct 23, 2024

boborjan2 commented Oct 24, 2024

boborjan2 commented Oct 24, 2024 • edited Loading

PhilippTakacs commented Oct 24, 2024

boborjan2 commented Oct 24, 2024

PhilippTakacs commented Oct 25, 2024

wtdcode commented Oct 25, 2024 via email

PhilippTakacs commented Oct 25, 2024

boborjan2 commented Oct 25, 2024 • edited Loading

PhilippTakacs commented Oct 25, 2024

boborjan2 commented Oct 25, 2024

boborjan2 commented Oct 25, 2024 • edited Loading

PhilippTakacs commented Oct 25, 2024

wtdcode commented Oct 25, 2024

boborjan2 commented Oct 25, 2024

wtdcode commented Oct 25, 2024

PhilippTakacs commented Oct 25, 2024

PhilippTakacs commented Oct 25, 2024

wtdcode commented Oct 25, 2024

PhilippTakacs commented Oct 28, 2024

boborjan2 commented Oct 28, 2024

PhilippTakacs commented Oct 29, 2024

boborjan2 commented Oct 24, 2024 •

edited

Loading

boborjan2 commented Oct 25, 2024 •

edited

Loading

boborjan2 commented Oct 25, 2024 •

edited

Loading