[FEATURE REQUEST] Add JS link extraction #961
Comments
Do you have a specific example of the current link extraction feature failing? That would probably be helpful. Otherwise, this is a very non-specific request (like, "do link extraction better") 😀
For example, consider https://ictcloud.zj-huawei.com/ as the target. I defined a short file, test.txt, with the following content:
Many JS links were extracted, but none of the actual directories. What I mean is that many interfaces (API endpoints) are present in the JS responses.
Great info, thanks. One thing you might try is increasing the logging to see exactly what is happening. If you attach the detailed logs here, I could take a look. If I understand correctly, you're saying that the endpoints are valid and return some success or other distinct status code, but requests to each of the levels of the parent directory tree behave as 404s? Like:
From what I recall, the "deep"/direct link to the API endpoint should be tested even if the directories above it don't appear to exist, because extracted links get direct requests regardless of whether the parent directory structure returns a positive or negative code.

It's possible that the paths are represented in the JS file in an unusual way and are not being extracted at all by the JS extraction. That extraction is done (at least partially) with regular expressions. Because it's JavaScript, paths can take many forms, and I believe there are some documented limitations in the current implementation.

Because you mentioned another tool is able to identify them, it may be reasonable to make enhancements there; if we're able to determine a precise-ish cause, @epi052 can decide if it's a bug or a feature, and if it should be accounted for with a code change. Note that this is his project, I'm just a bystander trying to help vet the issue 😀

It would be helpful to capture the debug/verbose output to see where exactly the problem is; I think the first step would be to determine if:
If you can attach or paste the relevant or full contents of one of the JS files, it would probably be helpful for a quick glance.

If you can attach or paste debug/verbose logs from the session, they should be authoritative and all that's needed to solve the issue, unless it's very subtle and/or complex. I'm guessing it should be possible to see if any of the three items I mentioned is part of the problem by checking the debug logs.

tl;dr: Can you attach full debug logs and a relevant sample of one or two of the JS files?
For reference, the regex used by JSFinder is (in Python):
I realize it's not likely that this will be directly applicable to the feroxbuster regex, and that there's probably different logic around its use, but maybe it's helpful.
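To make the regex discussion concrete, here is a minimal sketch of regex-based path extraction from JavaScript source. It is neither JSFinder's regex nor feroxbuster's implementation; the pattern and the `extract_quoted_paths` name are assumptions made purely for illustration.

```python
import re

# Hypothetical, simplified pattern: a quoted string that starts with "/",
# "./" or "../" and contains only typical URL-path characters.
# Real extractors use much broader patterns.
QUOTED_PATH = re.compile(r"""["'](\.{0,2}/[A-Za-z0-9_\-./]+)["']""")


def extract_quoted_paths(js_source: str) -> list[str]:
    """Return unique quoted path-like strings in order of first appearance."""
    seen = {}
    for match in QUOTED_PATH.finditer(js_source):
        seen.setdefault(match.group(1), None)
    return list(seen)


sample = 'fetch("/api/v1/users"); var a = "/static/js/app.js"; var b = base + "/login";'
print(extract_quoted_paths(sample))
```

A pattern like this only sees string literals. A path assembled at runtime from several variables never appears as a single literal, which is one way an endpoint can be missed.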
@JaveleyQAQ I forgot to ask: are you using the latest build? Either master from source, or a package from this repository? If you're using a package manager from a Linux distribution, the issue could be that the build is too old. This was the cause of a similar issue some time ago, in #519.
howdy @JaveleyQAQ, thanks for submitting this! Also, thanks @mzpqnxow for helping flesh out the issue!

Apologies up front, I haven't been as responsive lately as I have been in the past, have just been busy with non-ferox things for the past few months.

The expected behavior for ferox is to
for example: a response contains a link fragment
If that's not the behavior you're seeing, we should dig a bit deeper and find out what's going on. Looking forward to your response @JaveleyQAQ
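The behavior described earlier in the thread (a direct request to the extracted link, plus requests to each level of its parent directory tree) can be sketched roughly as follows. This is an illustration under that assumption, not feroxbuster's code; `candidate_urls` and the example host are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def candidate_urls(base_url: str, extracted_link: str) -> list[str]:
    """Return each parent directory of the resolved link, then the link itself."""
    full = urljoin(base_url, extracted_link)
    parts = urlsplit(full)
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]

    urls = []
    # Every proper prefix of the path is treated as a directory.
    for i in range(1, len(segments)):
        urls.append(origin + "/" + "/".join(segments[:i]) + "/")
    # The "deep" link is requested directly (query string dropped for brevity).
    urls.append(origin + parts.path)
    return urls


print(candidate_urls("http://example.com/", "/api/v1/users"))
```

The point of the sketch is that the deep link is a candidate in its own right, independent of how the parent directories respond.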
Sorry, I don't remember whether I installed it with apt on Kali or used the v2.10.0 release archive: https://github.com/epi052/feroxbuster/releases/download/v2.10.0/x86_64-linux-feroxbuster.zip
I will compile the latest version and try again.
You won't be able to access the website I wanted to share; only Chinese IP addresses can reach it. This kind of website is built with a tool called "webpack". The frontend renders its functional interfaces through endpoints defined in JS files, so backend APIs show up in the frontend JS files, and extracting them is very necessary. Below is a comparison of scanning the official webpack website with feroxbuster versus using only a JS endpoint extraction script. I also put two JS files containing APIs in my own GitHub repository. If you can access this website, you can open the developer tools and use global search for keywords like `href:`, `to:`, `url:`, and `path:`; these are all characteristic of endpoints in webpack bundles.
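A minimal sketch of searching a bundle for the keywords mentioned above might look like this. The pattern, the `extract_keyed_endpoints` name, and the sample bundle text are all assumptions for illustration; real bundles vary a lot.

```python
import re

# Hypothetical pattern: one of the property names mentioned above, a colon,
# then a quoted value. Minified bundles usually omit spaces, but allow them.
KEYED_VALUE = re.compile(r"""\b(href|to|url|path)\s*:\s*["']([^"']+)["']""")


def extract_keyed_endpoints(js_source: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs for object properties that look like endpoints."""
    return [(m.group(1), m.group(2)) for m in KEYED_VALUE.finditer(js_source)]


# Made-up fragment in the style of a minified route/API table.
sample = '[{path:"/dashboard",name:"Home"},{url:"/api/user/list",method:"post"}]'
print(extract_keyed_endpoints(sample))
```

Keying on property names tends to produce fewer false positives than matching every quoted string, at the cost of missing endpoints stored under other names.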
I understand your idea that requiring the base URL during extraction will reduce many false positives, but it will also miss many endpoints. If the JS contains `var url = "http://demo.com/"; location = url.concat("/api/add")`, the `/api/add` endpoint will be missed.
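The trade-off can be shown with a small sketch that runs two hypothetical strategies over that JS snippet. Neither pattern is taken from feroxbuster or JSFinder; both are assumptions for illustration.

```python
import re
from urllib.parse import urljoin

BASE = "http://demo.com/"
js = 'var url = "http://demo.com/"; location = url.concat("/api/add")'

# Strategy 1 (assumed): keep only quoted strings that start with the base URL.
absolute = re.findall(r"""["'](%s[^"']*)["']""" % re.escape(BASE), js)

# Strategy 2 (assumed): also keep quoted strings that look like rooted paths.
relative = re.findall(r"""["'](/[A-Za-z0-9_\-./]+)["']""", js)

print(absolute)  # only the bare base URL is found
print(relative)  # the path passed to concat() is found

# A rooted path can be resolved against the scan target to get a request URL.
print([urljoin(BASE, path) for path in relative])
```

Strategy 1 never sees `/api/add` because the full URL only exists after `concat` runs; strategy 2 finds it but will also pick up unrelated path-like strings.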
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
😠
sorry about that, thought i had replied. Stalebot helps me remember to do things, lol. so, based on this comment:
you're saying that if we used the command
and found
you would want to request
Note the two different domains.
What I mean is that it does not recognize the directories in JS files. Setting other domain names aside for now, it should extract as many directories from JS files as possible.
... within reason, and perhaps with any sort of changes being "opt-in" and/or experimental, to avoid the burden of testing. Let the users test! 😀

Related to this: I've noticed some pages where links are not scraped despite being very obvious to the human eye ;) But... I recall that the scope of the scraping was finite (durr) and that certain forms of "links" (those concatenated from multiple variables) were acknowledged to be unsupported at that time.

Continuing the "But...", and sorry to hijack the thread @JaveleyQAQ: maybe if I gather some specific examples I've encountered frequently, especially those seen in popular frameworks that are generated programmatically and are therefore predictable, it might be worth considering under a separate issue.

Or, most likely, I'll have no time and/or will forget 😆
ive got some thoughts on how to improve ferox's link extraction, i'll play around with those and see if any yield better results than what they do now
@epi052 I know you had previously asked me, but now I ask you: what address can I reach you at? I have a few specific example URLs for this that aren't suitable for sharing via this issue (sites that I'd prefer random feroxbuster users not start hammering!). Thanks!
not a problem at all: epibar052@gmail.com
Link extraction already exists, but links inside JS files still cannot be extracted. Something like JSFinder should be added to collect them for dictionary requests.