Add integration with turnitin/plagiabot/EranBot #24
Add a checkbox to allow searching the EranBot/plagiabot database for Turnitin results, and display them in a similar form to the on-wiki EranBot reports if they exist. Add a new module (copyvios/turnitin.py) to handle fetching and parsing the EranBot results.

Bug: https://phabricator.wikimedia.org/T110144

TODO:
* tweak display HTML/CSS
* refactor/clean up turnitin.py
* improve dev set-up so it doesn't always default to testwiki and can test without hardcoding page title
I'm still working on two front-end bits: figuring out how to keep the new checkbox and its label together when they wrap, and making the report output tabular and more easily readable.
* Add a wiki timestamp parser to copyvios/misc.py
* Refactor copyvios/turnitin.py for more sensible structure
* Update templates/index.mako to incorporate diff link/timestamp and make it clearer that Turnitin is revision-based checking
This should be good to go.
    # extract percent match, words, and URL for each source in the report
    extract_info_pattern = re.compile(
        r'\n\* \w\s+(\d*)\% (\d*) words at \[(.*?) ')
Are you sure you don't need to escape the backslash before the n? i.e. '\\n' instead of '\n'. Maybe Python's regex engine is smarter than PHP's :)
Python's raw string notation is pretty awesome.
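For anyone following along, here is a minimal sketch of why the pattern works unescaped. The pattern is copied from the diff; the sample report text is invented for this example and is only an assumption about what an EranBot report line looks like.

```python
import re

# Pattern copied from the diff under review.
extract_info_pattern = re.compile(
    r'\n\* \w\s+(\d*)\% (\d*) words at \[(.*?) ')

# In a raw string the backslash is kept literally, so r'\n' is two
# characters; the regex engine itself then reads \n as "newline".
assert len(r'\n') == 2 and len('\n') == 1

# Hypothetical report text, invented for this example (not real EranBot output).
sample_report = (
    "== Report ==\n"
    "* w 74% 1027 words at [http://example.com/source example]\n"
)

# Each match is a (percent, words, url) tuple of strings.
matches = extract_info_pattern.findall(sample_report)
print(matches)
```

Without the `r` prefix, `'\\n'` would hand the regex engine the same two characters, so the raw string mainly saves doubling every backslash in the pattern.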
I'm not really qualified to review Python code, but it looks OK to me. Just added a couple of minor suggestions.
I'm a bit disappointed in myself – I really should have looked at this by now. I'll try to do it by tonight, though.
I am going to merge this in, make a few tweaks, and then deploy. I'm describing any changes in edits below:
That's most of it. I'm a bit upset we can't integrate the URL results with the internal comparison tool. Thought: compare them, and if they don't yield a significant percentage, also show Turnitin's evaluation? I'll work on it...
Add integration with turnitin/plagiabot/EranBot
!!!!!!!! https://developer.yahoo.com/boss/search/ !!!!!!!! !?!??!?!???
Bug: https://phabricator.wikimedia.org/T110144
Useful test pages: