Add integration with turnitin/plagiabot/EranBot #24

Merged
merged 8 commits into from
Jan 20, 2016

@fhocutt fhocutt commented Dec 19, 2015

Bug: https://phabricator.wikimedia.org/T110144

  • Add the copyvios/turnitin.py module to query the plagiabot API and parse the results
  • Modify app.py and copyvios/checker.py to do the query and provide the results to the page template
  • Add checkbox and results div to templates/index.mako and associated CSS styling

Useful test pages:

Frances Hocutt added 5 commits December 16, 2015 20:48
Add a checkbox to allow searching the EranBot/plagiabot database for
Turnitin results, and display them in a similar form to the on-wiki
EranBot reports if they exist.

Add a new module (copyvios/turnitin.py) to handle fetching and parsing
the EranBot results.

Bug: https://phabricator.wikimedia.org/T110144

TODO: tweak display HTML/CSS; refactor/clean up turnitin.py;
      improve dev set-up so it doesn't always default to testwiki
      and can test without hardcoding page title

fhocutt commented Dec 19, 2015

I'm still working on two front-end bits: figuring out how to keep the new checkbox and its label together when they wrap, and making the report output tabular and more easily readable.

Frances Hocutt added 2 commits December 18, 2015 18:38
* Add a wiki timestamp parser to copyvios/misc.py
* Refactor copyvios/turnitin.py for more sensible structure
* Update templates/index.mako to incorporate diff link/timestamp and
  make it clearer that Turnitin is revision-based checking
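
For readers unfamiliar with wiki timestamps, a parser of this kind might look roughly like the sketch below. It is not the code added to copyvios/misc.py: the function name `parse_wiki_timestamp` is hypothetical, and the sketch assumes the 14-digit MediaWiki form (YYYYMMDDHHMMSS), which may differ from the format the PR actually handles.

```python
from datetime import datetime


def parse_wiki_timestamp(timestamp):
    """Parse a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS).

    Hypothetical helper for illustration; the real parser in
    copyvios/misc.py may accept a different format.
    """
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


# Example with a made-up timestamp value.
parsed = parse_wiki_timestamp("20151218183800")
print(parsed.isoformat())
```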

fhocutt commented Dec 23, 2015

This should be good to go.


# extract percent match, words, and URL for each source in the report
extract_info_pattern = re.compile(
    r'\n\* \w\s+(\d*)\% (\d*) words at \[(.*?) ')

Are you sure you don't need to escape the backslash before the n? i.e. '\\n' instead of '\n'. Maybe Python's regex engine is smarter than PHP's :)

Python's raw string notation is pretty awesome.
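
To make the raw-string point concrete, here is a small sketch. The pattern is the one from the diff above; the sample report text is invented for illustration and may not match real EranBot output exactly.

```python
import re

# In a raw string, r'\n' is two characters (backslash, n). The regex
# engine itself interprets that pair as "match a newline", so no extra
# escaping is needed.
assert len(r'\n') == 2
assert re.search(r'\n', 'a\nb') is not None

# Pattern copied from the PR.
extract_info_pattern = re.compile(
    r'\n\* \w\s+(\d*)\% (\d*) words at \[(.*?) ')

# Hypothetical report text; the real report layout is an assumption here.
sample_report = (
    "Report header\n"
    "* w 85% 120 words at [http://example.com/a http://example.com/a]\n"
    "* w 12% 30 words at [http://example.org/b http://example.org/b]\n"
)

# Each match yields (percent, words, url) as strings.
matches = extract_info_pattern.findall(sample_report)
for percent, words, url in matches:
    print(percent, words, url)
```

Without the `r` prefix, the same pattern would need doubled backslashes for sequences such as `\w` and `\d`, which is the escaping the review comment was asking about.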

kaldari commented Jan 13, 2016

I'm not really qualified to review Python code, but it looks OK to me. Just added a couple minor suggestions.

earwig commented Jan 14, 2016

I'm a bit disappointed in myself – I really should have looked at this by now. I'll try to do it by tonight, though.

@earwig earwig self-assigned this Jan 20, 2016

earwig commented Jan 20, 2016

I am going to merge this in, make a few tweaks, and then deploy. I'm describing any changes in edits below:

  • Going to start with turnitin unchecked by default, at least until we've verified there are no breaking bugs. (83f5588)
  • ast.literal_eval -> json.loads; probably won't matter in practice but it's generally better/safer (see StackOverflow). (6d940b5) Just kidding, it's not really JSON. That's strange, because I looked through the code and it claims to be JSON. Oh well.
  • Minor fixup for form rendering—should be cleaner now. (0692227, 77a02b5)
  • Calmed down post-freakout (below), now back to work. Ugh.
  • Bugfix. (1100757)
  • Tweak wording; minor fixes or nitpicky adjustments to template. (cd5b6e4, 21bbbbd, etc.)

That's most of it.
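
A quick sketch of why the `ast.literal_eval` to `json.loads` swap can fail on data that only looks like JSON. The response text here is hypothetical; the assumption is that the API returns something shaped like a Python literal (single-quoted strings), which `json.loads` rejects.

```python
import ast
import json

# Hypothetical response body: a Python-literal-style dict with single
# quotes. The real plagiabot API output may differ.
response_text = "{'page_title': 'Example', 'percent': 85}"

# json.loads requires double-quoted strings, so this raises an error
# (json.JSONDecodeError is a subclass of ValueError).
try:
    json.loads(response_text)
    parsed_as_json = True
except ValueError:
    parsed_as_json = False

# ast.literal_eval accepts Python literals, so the same text parses.
parsed = ast.literal_eval(response_text)

print(parsed_as_json, parsed)
```

`ast.literal_eval` only evaluates literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`), so it is much safer than `eval`, though `json.loads` remains preferable when the payload really is JSON.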

I'm a bit upset we can't integrate the URL results with the internal comparison tool.

Thought: compare them, and if they don't yield a significant percentage, also show turnitin's evaluation?

I'll work on it...

earwig added a commit that referenced this pull request Jan 20, 2016
Add integration with turnitin/plagiabot/EranBot
@earwig earwig merged commit d31e24f into earwig:master Jan 20, 2016

earwig commented Jan 20, 2016

!!!!!!!! https://developer.yahoo.com/boss/search/ !!!!!!!!

!?!??!?!???
