Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

Open
boborjan2 opened this issue Oct 22, 2024 · 22 comments
Open

Execution depends on whether UC_HOOK_MEM_READ/_WRITE is registered #2041

boborjan2 opened this issue Oct 22, 2024 · 22 comments

Comments

@boborjan2
Copy link

Using the dev branch head, I am facing an issue where I see that an execution flow that has been recorded in the past differs from it's recorded value. This binary snipplet includes self modifying code. I can get the old behavior only if I register an empty UC_HOOK_MEM_READ or WRITE hook (parameters do not matter, they can be zero). I checked the code and it seems branches regarding the existance of these hooks have been added in 8816883 and 6cc7e1d. And all these seem to be in connection with d010357.
(I made a check: If I remove the appropriate calls to tlb_set_dirty(), it works ok without hooks.)
I do not really understand the connection between tlb_set_dirty() and read/write hooks.
Unfortunately I do not have an example that easily fits here.

@wtdcode
Copy link
Member

wtdcode commented Oct 23, 2024

I assume this is #2031 , could you have a check?

@boborjan2
Copy link
Author

Unfortunately no. In my case, #2031 fails (differs) even with mem hooks registered. Also, it min 2 times slower. I gonna try to figure out when (i.e which commit) it went wrong. It did work in september.

@boborjan2
Copy link
Author

boborjan2 commented Oct 24, 2024

It did work on 379791a (end of day 4th Sept). With and without mem hooks.
However if I cherry pick d010357 on top, I get the same behavior where execution depends on mem hooks being registered. Sadly, this very commit is the one responsible for the speed improvement as well.

@PhilippTakacs
Copy link
Contributor

Can you provide a sample?

@boborjan2
Copy link
Author

It is not trivial to extract a shellcode size sample. What I see with #2031 is that the phenomenon is consistent with a correct execution flow with missing write mem hook. (i.e it is different from what I saw without #2031: there I experienced a change in execution flow when I enabled an empty read or write hook. Here the execution flow is correct with or without hook registered but when it is registered, some callbacks are missing) This is my current understanding, I keep looking.

@PhilippTakacs
Copy link
Contributor

Can you check what happend when you apply following patch:

diff --git a/qemu/accel/tcg/cputlb.c b/qemu/accel/tcg/cputlb.c
index 211d0ec1..3de68e67 100644
--- a/qemu/accel/tcg/cputlb.c
+++ b/qemu/accel/tcg/cputlb.c
@@ -1188,23 +1188,23 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env, target_ulong addr)
 
 static bool uc_mem_hook_installed(struct uc_struct *uc, target_ulong paddr)
 {
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_PROT))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_FETCH_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_FETCH_PROT))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_READ_AFTER, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_READ_AFTER))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_UNMAPPED, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_UNMAPPED))
         return true;
-    if (HOOK_EXISTS_BOUNDED(uc, UC_HOOK_MEM_WRITE_PROT, paddr))
+    if (HOOK_EXISTS(uc, UC_HOOK_MEM_WRITE_PROT))
         return true;
     return false;
 }

Also, it min 2 times slower

There is not much I can do about this. The problem is by using tlb_set_dirty most of the unicorn code will be skipped for mem access. This allows faster execution but prevents self modifying code and memory hooks.

@wtdcode
Copy link
Member

wtdcode commented Oct 25, 2024 via email

@PhilippTakacs
Copy link
Contributor

What happens to self modifying code? Does the optimization break it?

As far as I see yes. I assume the problem is when one tb writes two times itself for the first one the tb-cache is cleared and not dirty is set, but for the second write the fast path is taken so no rebuild of the tb. I'm currently still at debugging this to understand what exactly happen and to enable the optimization whenever possible.

@boborjan2
Copy link
Author

boborjan2 commented Oct 25, 2024

  1. Without Optimize Notdirty write #2031: in d010357, it is indeed tlb_set_dirty() that causes the bad behavior in notdirty_write(). Previously it had been never called (cpu_physical_memory_is_clean() always evaluated to true), after the patch it is always called. After the patch, the code in i386/tcg-target.inc, in tcg_out_tlb_load(), makes the difference:
    // Unicorn: fast path if hookmem is not enable
    if (!HOOK_EXISTS(s->uc, UC_HOOK_MEM_READ) && !HOOK_EXISTS(s->uc, UC_HOOK_MEM_WRITE))
        tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
    else
        /* slow_path, so data access will go via load_helper() */
        tcg_out_opc(s, OPC_JMP_long, 0, 0, 0);

Taking the slow path always gets rid of the dependency of mem hooks being registered.

  1. With Optimize Notdirty write #2031: in my case, execution does not touch to uc_mem_hook_installed() (verified via debugger), so any patch there won't help. I am not using snaphots. Debugging shows that tlbe->addr_code never gets equal to -1, so uc_mem_hook_installed() is never called. This means that tlb_set_dirty() is never called. This is similar to the behavior before d010357 (no dependence on mem hooks) with the difference that some mem hooks are missed.
    Thanks for the quick response btw.

@PhilippTakacs
Copy link
Contributor

Can you install your memhook in a version before d010357 and check if all mem hooks get called?

It confuses me, because tlb_set_dirty() is not called like before but now some mem hooks are missing.

@boborjan2
Copy link
Author

It seems that the mem callbacks are missing because uc->size_recur_mem is not zero in store_helper(). The stores do happen, as i said, execution flow seems ok.

@boborjan2
Copy link
Author

boborjan2 commented Oct 25, 2024

Can you install your memhook in a version before d010357 and check if all mem hooks get called?

It confuses me, because tlb_set_dirty() is not called like before but now some mem hooks are missing.

before d010357 , mem hooks do happen and executing does not depend on hooks being installed. I made this check under 1) above.

so

  • before d010357: hooks do happen and are ok
  • after d010357: hooks do happen but execution depends on whether I install mem hooks
  • after Optimize Notdirty write #2031 : execution does not depend on mem hooks but if i register them, some are missing because in store_helper(),
    uc->size_recur_mem is not zero

@PhilippTakacs
Copy link
Contributor

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

@wtdcode
Copy link
Member

wtdcode commented Oct 25, 2024

uc->size_recur_mem is a pretty old hack when prototyping Unicorn2. I failed to understand why it was there a few months ago, too.

@boborjan2
Copy link
Author

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

This makes it fine. (it is "cpu->uc->size_recur_mem = 0;" in fact, I just want to make sure we see the same sourcetree. ?)

@wtdcode
Copy link
Member

wtdcode commented Oct 25, 2024

Can you try following patch:

diff --git a/qemu/accel/tcg/translate-all.c b/qemu/accel/tcg/translate-all.c
index 217522d0..d7608b74 100644
--- a/qemu/accel/tcg/translate-all.c
+++ b/qemu/accel/tcg/translate-all.c
@@ -1843,6 +1843,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
         tlb_reset_dirty_by_vaddr(cpu, pc & TARGET_PAGE_MASK,
                                 (pc & ~TARGET_PAGE_MASK) + tb->size);
     }
+    uc->size_recur_mem = 0;
 
     /*
      * No explicit memory barrier is required -- tb_link_page() makes the

This makes it fine. (it is "cpu->uc->size_recur_mem = 0;" in fact, I just want to make sure we see the same sourcetree. ?)

Cool, this seems the root cause. I will check if this hack is really necessary.

Btw, does this change fail any unit test?

@PhilippTakacs
Copy link
Contributor

Btw, does this change fail any unit test?

Not now, but I can check if I can adjust the test of #2031 for this issue.

@PhilippTakacs
Copy link
Contributor

it is "cpu->uc->size_recur_mem = 0;" in fact

Yes, my bad.

@wtdcode
Copy link
Member

wtdcode commented Oct 25, 2024

Btw, does this change fail any unit test?

Not now, but I can check if I can adjust the test of #2031 for this issue.

Thanks. Please go ahead.

@PhilippTakacs
Copy link
Contributor

So now I could reproduce the missing hooks, but I have now double hooks for each write which modifies the current tb. This is the asm:

mov QWORD PTR [rip+0x29], rax
mov WORD PTR [rip+0x0], 0x548
mov eax, DWORD PTR [rax+0x12345678]
nop
nop
nop
mov QWORD PTR [rip-0x08], rax
mov WORD PTR [rip+0x0], 0x548
mov eax, DWORD PTR [rax+0x12345678]
hlt

I have debugged this a bit and found that the cpu_loop_exec_tb() is called with the old value of rip. For example after the second mov the execution is restarted with rip as 0x1007 (code starts at 0x1000). This causes the self modifying code to be executed twice. Is this expected behavior?

@boborjan2
Copy link
Author

I confirm I experience two hooks per hit too. I wonder if it's connected to d010357 as well.

@PhilippTakacs
Copy link
Contributor

No, the double execution (with smc) also happens before d010357.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants