-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support high-availability setups for the interactive tools proxy #18481
Conversation
@@ -18,125 +19,129 @@ | |||
) | |||
from galaxy.model.base import transaction | |||
from galaxy.security.idencoding import IdAsLowercaseAlphanumEncodingHelper | |||
from galaxy.util.filelock import FileLock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like I may be missing something regarding the FileLock
. What role was it actually playing? Isn't SQLite already taking care of locks?
The pager module effectively controls access for separate threads, or separate processes, or both. Throughout this document whenever the word "process" is written you may substitute the word "thread" without changing the truth of the statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Galaxy has a long history of concurrency issues with respect to SQLite. I cannot promise this FileLock was needed but experience taught me it was better to be safe than sorry here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some information that may help us understand whether the lock is needed in this case.
However, no matter what locking modes are used, SQLite will still always lock the database file once a transaction is started and DML (e.g. INSERT, UPDATE, DELETE) has at least been emitted, and this will block other transactions at least at the point that they also attempt to emit DML. By default, the length of time on this block is very short before it times out with an error.
This behavior becomes more critical when used in conjunction with the SQLAlchemy ORM. SQLAlchemy’s Session object by default runs within a transaction, and with its autoflush model, may emit DML preceding any SELECT statement. This may lead to a SQLite database that locks more quickly than is expected. The locking mode of SQLite and the pysqlite driver can be manipulated to some degree, however it should be noted that achieving a high degree of write-concurrency with SQLite is a losing battle.
2. Situations Where A Client/Server RDBMS May Work Better
[...]
High-volume Websites
SQLite will normally work fine as the database backend to a website. But if the website is write-intensive or is so busy that it requires multiple servers, then consider using an enterprise-class client/server database engine instead of SQLite.
[...]
High Concurrency
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
From both excerpts I understand that it is likely that the long history of concurrency issues that Galaxy has with SQLite probably has to do with using the SQLAlchemy ORM under high load (maybe you can either support/refute this?).
As far as I know, the interactive tools mapping file is written only by InteractiveToolSqlite
(or InteractiveToolPropagatorSQLAlchemy
in this refactor) and read only by Galaxy (the same class) and the proxy. I understand that issues would appear if interactive tools are in extremely high demand (Galaxy constantly writes keys and tokens to the file). However, in that case, wouldn't it just make better sense, as the SQLite docs claim, that the Galaxy admin uses PostgreSQL rather than SQLite? Another solution would be increasing the SQLite lock timeout.
Maybe if @jdavcs agrees to go forward with a new config property for mappings based on arbitrary SQLAlchemy database URLs, it may even make sense to discourage the use of the original property interactivetools_map
in the Galaxy docs to favour the use of Postgres? After all, SQLite use is already discouraged.
In that case though, I think we should also have a look together at the proxy-side implementation, because it also has it's drawbacks regarding database setup (maybe it should just be done automatically by Galaxy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From both excerpts I understand that it is likely that the long history of concurrency issues that Galaxy has with SQLite probably has to do with using the SQLAlchemy ORM under high load (maybe you can either support/refute this?).
I think the core of the issue here is just SQLite: "SQLite will still always lock the database file once a transaction is started" - it's just suboptimal for write-intensive scenarios. Using the ORM makes it worse: the autoflush model mentioned in the docs ensures that the state of objects loaded from the db is kept in sync with the data in the db by issuing DML statements before retrieving new data (when needed) - so the db file gets locked much more frequently if we rely on the Session. However, we don't use the ORM here.
If we go with this proposed approached, we can still keep SQLite as an option (it may be still a good enough option for smaller instances or dev purposes, etc.; after all we support SQLite as an option for the main database too). We may consider adding a note to Galaxy's documentation and updating the IT tutorial accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note (dddf5ba) to the documentation and opened galaxyproject/training-material#5178 and galaxyproject/training-material#5179
…ing for interactive tools Replace the class `InteractiveToolSqlite` in lib/galaxy/managers/interactivetool.py with a new class `InteractiveToolPropagatorSQLAlchemy`. The new class implements a SQLAlchemy "propagator" for `InteractiveToolManager` (on the same file). This propagator writes the mappings to the table named after the value of `DATABASE_TABLE_NAME`, in the database specified by the SQLAlchemy database url passed to its constructor. Change the constructor of `InteractiveToolManager` so that it uses `InteractiveToolPropagatorSQLAlchemy`. Change the method `_process_config` of `galaxy.config.GalaxyAppConfiguration` so that it converts the value of `interactivetools_map` to a SQLAlchemy database url if it is a path. Update documentation to reflect these changes.
355446f
to
b54bf35
Compare
Sounds good! Just make sure that the info field keeps supporting json, as I opted for storing extra attributes to the proxy that are used for path-based interactive tools as json in the "info" field instead of adding new fields to the schema (the info field is not used for SQL queries, so I don't think there is a need to explicitly type this as JSON, plain text should be fine). Currently the path-based ITs were reverted in the EU instance due to an error. I made a PR to revert this for the 24.1 release, now merged: usegalaxy-eu#242 So just make sure that the path-based ITs still run (first commit of that PR)! |
@sveinugu Regarding the JSON, it's stored as plain text. Seems to work fine (see the psql output below from a Galaxy instance). galaxy=> select * from gxitproxy;
key | key_type | token | host | port | info
---------------+---------------------------+---------------+----------------+-------+------------------------------------------------------------------------
326unmiv5sanl | interactivetoolentrypoint | 2jlcp7mzyjsbm | 192.168.122.64 | 32780 | {"requires_path_in_url": false, "requires_path_in_header_named": null}
(1 row) I'll test the path-based ITs and let you know if they're working. |
@sveinugu path-based ITs cause a redirection loop (I am using gx-it-proxy from galaxyproject/gx-it-proxy#21). galaxy=> select * from gxitproxy;
key | key_type | token | host | port | info
---------------+---------------------------+---------------+----------------+-------+-----------------------------------------------------------------------
1540u2qksgef3 | interactivetoolentrypoint | 2ypyg6q4fsp20 | 192.168.122.64 | 32782 | {"requires_path_in_url": true, "requires_path_in_header_named": null}
(1 row) Should they be working? Have you encountered this before? This may be a misconfiguration of my Galaxy instance, as I did not change what the proxy does with the sessions, just where it stores them. |
Sorry for the late reply, have started my vacation. Have you configured the nginx proxy correctly for path-based ITs (second config here in addition to the first config, which is the wildcard one)? (Note that second config should be part of the config for the main Galaxy domain and not appended to the first one, which is only activated for wildcard subdomains. |
Note that I don't think usegalaxy-eu has been configured to support path-based ITs yet, which is more likely the cause than your change. |
Although I am not testing this in production, that is correct. I had to launch the proxy with the argument In any case, this confirms path-based interactive tools are not affected by this PR, nor galaxyproject/gx-it-proxy#21. Below you have a screenshot of both types of interactive tools (path-based and domain-based) running side-by-side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kysrpex!
LGTM!
Please don't merge yet - I may have some comments. Reviewing now.. |
That is true. Gravity does this by default based on the |
I missed it because I am using our Ansible role usegalaxy-eu/ansible-gie-proxy, which does not provide the option to configure |
A few critical notes:
Overall, yeah, sqlite is far from optimal as a backend here. But, in my opinion, we shouldn't be reusing the main db either. FWIW, beaker was mentioned as a possible approach during today's joint meeting (ref: #15216, beaker). |
Also, whenever |
@jdavcs Thank you for the review, this is the sort of insight one does not easily get on their own. On 1 and 2, although it is possible to use any SQLAlchemy database URL, and thus any PostgreSQL db, code is written under the assumption that it should work even if Your review leads me though to ask you the following question: is supporting a PostgreSQL backend via SQLAlchemy a no-go (i.e. you would absolutely have a beaker-based solution for example), or would you be willing to go forward with this (of course addressing the concerns you have mentioned)? In the latter case I propose the following:
As I stated in my first message
the reason for proposing this change is that the interactive tools proxy has been blocking us from having two fully redundant Galaxy servers for a long time. This is negatively affecting us for a variety of reasons, the most important being delayed os updates. For us, it would not be a problem to have a dedicated PostgreSQL database just for the proxy. |
Enable access via path (e.g. `https://usegalaxy.eu/interactivetool/ep/3qg8nstfnl44r/fsjbqy66kln5`) to interactive tools that support it (those for which `requires_domain="False"` in the tool's xml file). More information: - nginx configuration: https://docs.galaxyproject.org/en/master/admin/special_topics/interactivetools.html#nginx-proxy-server-configuration-in-production - interactive tools proxy configuration: galaxyproject/galaxy#18481 (comment), https://github.com/galaxyproject/gx-it-proxy/blob/v0.0.6/lib/proxy.js#L101-L141, https://github.com/galaxyproject/gx-it-proxy/blob/v0.0.6/lib/main.js#L16
@kysrpex thank you for addressing my comments! I think we can go ahead with your proposed approach (i.e., SQLAlchemy backend to support a Postgres database with with protection against reusing Galaxy's main database, programmatically in code + a note in the docs) + a separate config property for the db url. We may opt for a more generic approach down the road, but given EU's urgent need, I think we can be pragmatic and treat this is a good enough solution. With regard to implementation, my suggestion would be to replace all the raw SQL with SQLAlchemy SQL expressions (see SQLAlchemy Core, not ORM). That way we'll get all the benefits of using SQLAlchemy (we rely on SA to translate our statements into SQL appropriate for the database we're using; we get Python objects instead of raw SQL strings, which is less error-prone, is testable, etc.) |
One more general (and minor) comment. Would you be open to using identifiers that are not sqlalchemy-specific? That is, for the config property, the EDIT: that said, I think it's perfectly OK to mention SQLAlchemy in the config property description, because SQLAlchemy determines what databases are currently supported. |
…chemy` is defined Given that `interactivetools_map` is ignored if `interactivetools_map_sqlalchemy` is set, first check `interactivetools_map_sqlalchemy`. If it's not set, then build the sqlite url; but if it is set, then set `interactivetools_map` to `None` for clarity (and to avoid potential bugs in the future) Co-authored-by: John Davis <jdavcs@gmail.com>
…ools_map_sqlalchemy` Authored-by: John Davis <jdavcs@gmail.com>
…tools_map` in the latter's description Co-authored-by: John Davis <jdavcs@gmail.com>
…sert()` or `delete()`
…tem__` Match @jdavcs' coding style.
…emy.get()` Emit a warning when `get()` returns no results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kysrpex thanks, this is looking good! I'll add a few comments inline; however, here's my main question. I've tested this manually and it didn't work correctly for me when enabling the sqlalchemy option: the database table is populated and the job is launched, but when the tool is supposed to become available, I see
Proxy target missing
on a blank screen. It may be due to some obvious local config options I haven't set correctly. However, given that it worked with sqlite settings, I think we should expect it to work with the other setting as well; if any additional config is required, that should be made clear in the documentation. Does it run for you?
@jdavcs Meet the elephant in the room, galaxyproject/gx-it-proxy#21! You need to use that version of the proxy.
One more general (and minor) comment. Would you be open to using identifiers that are not sqlalchemy-specific? That is, for the config property, the
InteractiveToolPropagatorSQLAlchemy
class and theparse_sqlalchemy_url
import alias in the config module. Here's why: although we very heavily rely on SQLAlchemy, it's just a tool, and the functionality we use here could be, hypothetically, implemented with another similar tool. So, semantically, it's really database-related, not sqlalchemy-related. The database URL format is, for the most part, an open standard (ref). Finally, from a practical standpoint, it's much easier to manage the code base when grepping for sqlalchemy returns only import statements.EDIT: that said, I think it's perfectly OK to mention SQLAlchemy in the config property description, because SQLAlchemy determines what databases are currently supported.
Yes, definitely! I think the same. interactivetools_map_sqlalchemy
is a terrible choice. Considering that it may even make sense to deprecate interactivetools_map
in favor of this one at some point, it would be better to choose a name not tied to SQLAlchemy. Something as simple as interactivetoolsproxymap
might work, but please suggest whatever you have in mind, I think brainstorming it is worth it. Another concern is how long-lived this setting would be. That should have an influence in the name and the decision is on you and the rest of the dev team.
Yes, I agree: ideally, we'd have one config property since both of these specify the same "type of thing". That one property would, likely, require a proper db url, just like the |
@jdavcs I removed dead code that was pending to be removed (0b6ad17) and renamed If I did not miss anything else, then finally all points have been covered and the PR is ready to be merged. |
@jdavcs Sorry, we also discussed privately that the config package should not depend on The documentation states that the the two/three databases should not be mixed, so as discussed, this should be enough. It prevents accidental mistakes such as copying and pasting the same value, although not more advanced mistakes such as accessing the same database using two different drivers. A more sophisticated check involving significant extra coding effort would be overengineering imo, specially considering that the old configuration setting |
…connection` and `install_database_connection` Using `urlparse` rather than `sqlalchemy.engine.make_url` decreases the precision of the comparison, but removes the need to declare `sqlalchemy` as a dependency of the config package. This is a tradeoff between convenience (preventing mistakes from the user) and complexity (number of dependencies of the config package).
This reverts commit b6ad695.
2649f65
to
7510372
Compare
With the force push I just fixed a missing comma (Python linting), nothing of importance. |
@kysrpex Looks great! Thank you for addressing all the comments! |
This PR closes #16138. At the moment, information exchanges between the interactive tools proxy and Galaxy are done via a SQLite database. This makes it impossible to have the high-availability (redundant) setup for the proxy that we would like to have at useGalaxy.eu.
To solve the problem, I opted for a relatively simple solution. Since Galaxy already uses SQLAlchemy, I refactored the existing class
InteractiveToolSqlite
from lib/galaxy/managers/interactivetool.py to use SQLAlchemy instead of Python's built-insqlite3
library. The changes are backwards compatible. Settinginteractivetools_map
to a path makes Galaxy use a SQLite database, just like it did before. Setting it to a SQLAlchemy database url allows to use any database supported by SQLAlchemy (e.g. PostgreSQL). Mappings are written to the table "gxitproxy", just as they did before.Of course, this also needs support on the interactive tools proxy's side, which will be contributed separately.
How to test the changes?
(Select all options that apply)
This is a refactor of the "propagator"
InteractiveToolSqlite
in lib/galaxy/managers/interactivetool.py. Tests are already available and can be run using./run_tests.sh -integration test/integration/test_interactivetools_api.py::TestInteractiveToolsIntegration
. I used standard SQL so that the code is compatible with the widest possible variety of databases.License