Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/wayback machine #286

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

adaveinthelife
Copy link
Collaborator

This is my first iteration of a new feature for the Google Action: the Wayback Machine. The Wayback Engine function makes requests to metadata on archive.org and alexa rankings to get:

  • the earliest year that archiving began
  • the most recent year of archiving
  • the total number of unique URLs (uses initial URL count and then iterates through each new url count per year)
  • the Alexa World Rank
  • the Alexa US Rank

Still lots of work to be done here. I've attempted to write a basic mechanism for slot filling but it probably needs more helper methods. Hopefully Eugene can help me with that!

@@ -0,0 +1,134 @@
const request = require('request');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not critical, but I'd recommend to reorganize imports:

  1. put apart third part imports and local imports
  2. put improts in the alphabet order by name of lib:
const mustache = require('mustache');
const traverse = require('traverse');
const xml2js = require('xml2js');

You could check such organization in any of my js-files.

Such organization would help to read imports.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, we don't have request libs here, we already use axios (https://github.com/axios/axios). The reason why we use axios (in the single place - interceptors):

  • use single user-agent for all request
  • track request performance

https://github.com/internetarchive/internet-archive-google-action/blob/master/functions/src/setup.js#L31

Actually we could use another lib, but should have real motivation to switch to it.

function handler (app) {
// Check parameter for Wayback qualifier
let wayback = app.params.getByName('wayback');
if (wayback.includes('wayback') === false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look javascript style way. I'd recommend to reformat it to:

if (!wayback.includes('wayback')) {

}

// Get url parameter
var url = app.params.getByName('url');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to replace var with const and let. It would be safer.

var url = app.params.getByName('url');

// Create wayback object
var waybackObject = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't set waybackObject just in place? I'd recommend to rewrite it in:

waybackObject = {
  ulr,
  earliestYear: 0,

so you would need later assignment

waybackObject.url = url;

};

waybackObject.url = url;
var archiveQueryURL = 'http://web.archive.org/__wb/search/metadata?q=' + url;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we store all string constants in src/functions/config.js so it would be much simpler to rewrite all endpoints in once if we decide to use a proxy, new API or something else.

@@ -0,0 +1,134 @@
const request = require('request');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, we don't have request libs here, we already use axios (https://github.com/axios/axios). The reason why we use axios (in the single place - interceptors):

  • use single user-agent for all request
  • track request performance

https://github.com/internetarchive/internet-archive-google-action/blob/master/functions/src/setup.js#L31

Actually we could use another lib, but should have real motivation to switch to it.

var XMLparser = new xml2js.Parser();
XMLparser.parseString(allData[1], function (err, result) {
if (err) {
debug('Uh-oh, the XML parser didn\'t work');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should gracefully continue dialog here, for example, propose to retry or ask something else.

@@ -0,0 +1,134 @@
const request = require('request');
const traverse = require('traverse');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should lodash (https://lodash.com/) instead, it is much more powerful and we already have it in this app

@@ -0,0 +1,134 @@
const request = require('request');
const traverse = require('traverse');
const xml2js = require('xml2js');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all new imports should be installed by:

npm install --save xml2js

when you would need new lib for testing you should use instead:

npm install --save-dev xml2js

// Construct response dialog for action
if (waybackObject.alexaUSRank !== 0) {
waybackObject.speech = mustache.render(waybackStrings.speech, waybackObject);
waybackObject.speech += ' and <say-as interpret-as="ordinal">' + waybackObject.alexaUSRank + '</say-as> in the United States';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

});
}

function archiveEngine (archiveJSON, waybackObject) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just general comments. I'd recommend writing pure functions everywhere it possible.

Motivation:

  1. they are much easy to test
  2. easy to reuse

if (err) {
debug('The XML parser didn\'t work');
waybackObject.speech = waybackStrings.error;
dialog.ask(app, waybackObject);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we would like to change dialog flow here we should embed XMLparser.parseString in Promises pipeline. For example by wrapping it to:

new Promise((resolve, reject) => {
  XMLparser.parseString(allData[1].data, function (err, result) {
  //...
})

Because right now we will reach dialog.close(app, waybackObject); whether result you would get in XMLparser.parseString.

Actually, I'd recommend to create unit tests for that part of the code - it would help a lot. All async scenarios are very tricky.

const _ = require('lodash');
const axios = require('axios');
const mustache = require('mustache');
const xml2js = require('xml2js');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should install xml2js with:

npm install --save xml2js

to add it to package.json

waybackObject.latestYear = yearsArray[yearsArray.length - 1];

// Traverse URL category
const traverse = obj => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls put this function to ./utils directory. Btw it is a good candidate for unit tests :)

@@ -37,7 +37,8 @@
"object.entries": "^1.0.4",
"raven": "^2.6.0",
"replaceall": "^0.1.6",
"supports-color": "^5.4.0"
"supports-color": "^5.4.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, how do you use this lib here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure how that ended up here! I believe that's an atom module I use for formatting in the editor. That can come out.

waybackObject.speech = waybackStrings.error;
dialog.ask(app, waybackObject);
} else {
alexaJSON = JSON.parse(JSON.stringify(result));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that you forgot to run resolve here, and btw, you don't need alexaJSON here, because you can just pass a result to resolve.

if (err) {
debug('The XML parser didn\'t work');
waybackObject.speech = waybackStrings.error;
dialog.ask(app, waybackObject);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a good place for reject?

.catch(function (error) {
debug(error.message);
});
alexaEngine(alexaJSON, waybackObject);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems alexaEngine should be inside of Promise then pipeline

@@ -0,0 +1,31 @@
const _ = require('lodash');
const {debug} = require('../utils/logger')('ia:actions:wayback-machine');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'ia:actions:wayback-machine' should be 'ia:actions:utils:traverse'

dialog = mockDialog();
action.__set__('dialog', dialog);
});
it('check to see that a promise is returned with network requests', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please give some space here ;)

resolve(result);
}
});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to store convertXML promise. You are already inside of Promise.then pipeline so you can return new Promise. And btw you need to pass archiveJSON down the promises chain. So you could use something like:

return Promise.all([archiveJSON, new Promise((resolve, reject) => {
  //...
}])

and btw it would need it because part of code from // Construct response dialog for action will be called before: alexaEngine(JSON.parse(JSON.stringify(fulfilled)), waybackObject);

Btw I'd recommend to write unit test of this pipeline, so you would see how promise will fire one after another.

In additional I'd recommend to put:

XMLparser.parseString(allData[1].data, function (err, result) {

in separate function so it would improve readability of this pipeline.

expect(Promise.resolve()).to.be.a('promise');
});

it('check to see that axios request promises are working', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't test axios library.

describe('wayback machine', () => {
let app;
let dialog;
let archiveRequest = axios.get('http://web.archive.org/__wb/search/metadata?q=cnn.com');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't request data for a test from an ourside. If you would need any data you should create fixture data.

Examples: https://github.com/internetarchive/internet-archive-google-action/tree/master/functions/tests/fixtures

let obj = { earliestYear: 0, latestYear: 0, totalUniqueURLs: 0 };
let result;

action.__set__('archiveEngine', function (a, b) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems this test has never come inside of this mock. You never call handler in this test on others. Actually, it is not clear what: check to see that archiveEngine is working means here, could you be more specific?

config.wayback.ALEXA, app, waybackObject
);

return Promise.all([axios.get(archiveQueryURL), axios.get(alexaQueryURL)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use callback nesting here. One of the main feature of Promises is avoiding callback hell. It should be something like:

action.then(a => {
  return Promise.all(/*....*/);
})
.then(b => {
  return Promise.all(/*....*/);
})
.then(c => {
});

// debug('Inside archiveEngine promise...');
// Create array of capture years and then find earliest year
// and most recent year.
let yearsArray = Object.keys(archiveJSON.captures);
Copy link
Member

@hyzhak hyzhak Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, it is better to sort years, to be sure that they are in the order

Copy link
Member

@hyzhak hyzhak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better but still need to make few improvements

return Promise.all([archiveEngine(requestData[0].data, waybackObject), xmlConverter(requestData[1].data)]);
})
.then(response => {
return Promise.all([alexaEngine(response[1], waybackObject)]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to wrap alexaEngine( call to Promise.all here. Because you have single async function here. and btw you don't really need alexaEngine to be async (with Promise) because it has sync logic.

debug('Country not found');
debug(e);
}
if (waybackObject.alexaWorldRank > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to check that alexaWorldRank > 0 here, and what should be in case when alexaWorldRank <= 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend that function alexaEngine would be in the pure-function way: https://en.wikipedia.org/wiki/Pure_function

You don't need to mutate waybackObject here, just return rands:

function getAlexaRanks (alexaJSON) {
  //.... some in logic to extract ranks from alexa json
  return {
    alexaWorldRank: 123456,
    alexsUSRank: 1345,
  };
}

and then you can merge the result inside of handler:

res = {}
// it merges res to empty object and then merge getAlexaRanks
res = Object.assign({}, res, getAlexaRanks (alexaJSON));

});
} // End of handler

function archiveEngine (archiveJSON, waybackObject) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function also should be pure-function and it doesn't need to be async

.then(response => {
return Promise.all([alexaEngine(response[1], waybackObject)]);
})
.then(a => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need argument in arrow function if you don't use it:

.then(() => {
  //... logic
})

Copy link
Member

@hyzhak hyzhak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pure functions! We have significant progress here 👍 But still need to make some improvements

});
} // End of handler

function archiveEngine (archiveJSON) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the inner logic of archiveEngine and alexaEngine are not asynchronous, so you don't need Promises here and there.

function archiveEngine (archiveJSON) {
return new Promise(function (resolve, reject) {
debug('Inside archiveEngine promise...');
let obj = { earliestYear: 0, latestYear: 0, totalUniqueURLs: 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be const and res (result) is a more canonic variable name here

return Promise.all([archiveEngine(requestData[0].data), xmlConverter(requestData[1].data)]);
})
.then(response => {
let res = {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't merge response right to waybackObject?


return Promise.all([axios.get(archiveQueryURL), axios.get(alexaQueryURL)])
.then(function (requestData) {
return Promise.all([archiveEngine(requestData[0].data), xmlConverter(requestData[1].data)]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because archiveEngine and alexaEngine are actually sync functions you don't need to wait for .then before process them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused about this one... Don't I have to wait on Axios requests to complete before feeding them to archiveEngine and alexaEngine? Even if those functions are sync functions, how can they operate before that data has arrived?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, you don't do any async actions inside of archiveEngine or alexaEngine function, in particular, there are no any Axios requests inside of them. So you don't need to wrap them to Promise and return a result, you just can get a result right away. It should simplify all pipeline a lot ;)

Sure you need to get responses from axios.get above.

waybackObject.speech += '.';
}
dialog.close(app, waybackObject);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want gracefully cover .catch case here because you throw reject/error in the pipeline, and request to a server could get an exception


return Promise.all([axios.get(archiveQueryURL), axios.get(alexaQueryURL)])
.then(function (requestData) {
return Promise.all([archiveEngine(requestData[0].data), xmlConverter(requestData[1].data)]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, you don't do any async actions inside of archiveEngine or alexaEngine function, in particular, there are no any Axios requests inside of them. So you don't need to wrap them to Promise and return a result, you just can get a result right away. It should simplify all pipeline a lot ;)

Sure you need to get responses from axios.get above.

);

return Promise.all([axios.get(archiveQueryURL), axios.get(alexaQueryURL)])
.then(function (requestData) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw you can make this part of code more readable by using arguments unpacking:

.then(([archiveRes, alexaRes]) => {
     return Promise.all([archiveEngine(archiveRes.data), xmlConverter(alexaRes.data)])
})

return res;
} else {
let error = new Error('alexaEngine didn\'t work.');
return error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems we should throw an error here. return passes it as a successful response.

})
.then(([archiveEngineRes, xmlRes]) => {
waybackObject = Object.assign(waybackObject, archiveEngineRes);
return alexaEngine(xmlRes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alexaEngine is sync function it doesn't return Promise. So you can just merge its response right to waybackObject object without waiting then.

})
.catch(err => {
debug('Wayback handler has an error: ', err);
waybackObject.speech = waybackObject.error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems waybackObject.error would be undefined here

expect(Promise.resolve()).to.be.a('promise');
});

it('check to see that archiveEngine is working', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adaveinthelife before merging this PR please fix those tests or note that they are not working and should be fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants