Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support precise type inference for element indexing #104

Open
LPTK opened this issue Apr 21, 2022 · 6 comments
Open

Support precise type inference for element indexing #104

LPTK opened this issue Apr 21, 2022 · 6 comments
Assignees
Labels
enhancement New feature or request

Comments

@LPTK
Copy link
Contributor

LPTK commented Apr 21, 2022

The goal is to support precise type inference for the indexing operator a[i].

For instance, if a turns out to have type Array<int>, then a[0] should have type int | undefined; but if a[0] turns out to have a more precise type such as [int, bool, nat], then a[0] should have type int.

(Note: the ... | undefined return type for imprecise indexing type inference is in the process of being implemented in mahzoun99#1.)

The way to do this is as follows.

We should add some meta information in type variables, in addition to upper and lower bounds:

  • A list of "indexed in" info (A, R), which specifies what type the type variable is used to index into A and that the type resulting from this operation is R.

  • A list of "indexed by" info (T, R), which specifies what type the type variable is indexed into by some concrete T and that the type resulting from this operation is R.

For example, if a: ?A and i: ?I , then typing a[i] will result in creating a new type variable ?R and adding (?A, ?R) to the "indexed in" info of type variable ?I.

Then, the job of type inference is simply making sure that all ?I::(?A, ?R) and ?A::(?I, ?R) info remain consistent, by propagating concrete bounds that are added to the relevant type variables.

More specifically: when constraining a type variable with a new concrete lower bound B in the rec function of ConstraintSolver.scala, we should check:

  • its "indexed in" info. For each such (?A, ?R), we should add (B, ?R) to it, and while doing so we should make sure that this (B, ?R) is consistent with the existing lower bounds in ?A;

  • its "indexed by" info. For each such (T, ?R), we should make sure that B is consistent with it.

Example: if add a lower bound 1 | 2 to type variable I0 which has "indexed in" info of (?A0, ?R0), then I add (1 | 2, ?R0) as "indexed by" into in ?A0. Then, if [S, T, U] is a lower bound on?A0, I should look at what the index 1 | 2 yields from that tuple, in this case T and U, and add these as lower bounds in R0.


There is one subtlety to deal with, though: if the (?A, ?R) info we want to register into some ?I has a higher level than ?A, we need to extrude its components to the correct level, using the extrude method.

@LPTK LPTK added the enhancement New feature or request label Apr 21, 2022
@LPTK
Copy link
Contributor Author

LPTK commented Apr 27, 2022

Actually a simpler approach may be possible, although not completely equivalent. But it would be a cleaner first step.

The idea is to just introduce a new form of function-like type, call it IndexType(idx: SimpleType, res: SimpleType), written [idx] ~> res. When typing a[i], just create a fresh ?R and constrain lhs <! [i_ty] ~> ?R. Then solve the constraints in the obvious way.

We get ambiguous constraints when intersected indexed types appear on the LHS, but these will not actually arise from type inference, given the scheme above.

@LPTK LPTK assigned Matheart and unassigned Meowcolm024 Jul 19, 2022
@Matheart
Copy link

can i say for the indexed in case, one of i's upper bounds must be int, and string for the indexed by case?

@Matheart
Copy link

Matheart commented Jul 31, 2022

So the main algorithm is as follows:

  1. We handle the array and record cases separately, where we should write two functions IndexedIn (for array) and IndexedBy (for record) inside the ConstraintSolver class in ConstraintSolver.scala.
  2. These two functions should pass extra meta information to allow more accurate type inference. We should not change this information unless in the rec function where we need to use the info to add ty to TypeProvence. We should also pay attention to levels because incorrect levels may result in unexpected error.

So I only need to write code in the two new functions, also inside the rec function, is my understanding correct?

I still have one question, where should we call the two functions?

@Matheart
Copy link

Matheart commented Aug 1, 2022

For the other algorithm you mentioned, do I still need to write the two functions?

@LPTK
Copy link
Contributor Author

LPTK commented Aug 1, 2022

can i say for the indexed in case, one of i's upper bounds must be int, and string for the indexed by case?

  1. We handle the array and record cases separately, where we should write two functions IndexedIn (for array) and IndexedBy (for record)

No, that's not at all the semantics I proposed above. Please read again:

  • A list of "indexed in" info (A, R), which specifies what type the type variable is used to index into A and that the type resulting from this operation is R.
  • A list of "indexed by" info (T, R), which specifies what type the type variable is indexed into by some concrete T and that the type resulting from this operation is R.

Also for now let's only focus on arrays.

2. These two functions should pass extra meta information to allow more accurate type inference. We should not change this information unless in the rec function where we need to use the info to add ty to TypeProvence. We should also pay attention to levels because incorrect levels may result in unexpected error.

So I only need to write code in the two new functions, also inside the rec function, is my understanding correct?

It seems you have forgotten or you misremember the explanations I gave you in our last meeting. Did you take notes? I proposed op add the "indexed by" and "indexed in" lists as mutable var fields in TypeVariable, and treat them accordingly when constraining type variables (in the case when new upper/lower bounds are registered).

For the other algorithm you mentioned, do I still need to write the two functions?

Let's not consider that one for now.

@Matheart
Copy link

Things to be discussed in the meeting tmr (just want to drop it here so more convenient to take look)

The Remaining Cases:
(1). [1 + 1]: seems it goes into case (_, t: TypeVariable) but idk why implementation is missing at the end

(2). Defined things but not being assigned a value for string, float, etc... , I found (_, TypeRef(TypeName("string") can be only used to match this instead of literal string. To match literal string, I need to write (_, ClassTag(StrLit(_), _). So is it necessary for me to write both of them (which makes code look clumsy)

(3). For the foo case the type inference is still wrong: foo: Array[~undefined] -> nothing. I guess this might be related with my wrong implementation in the extrude and freshen functions. Also I have not done this part yet: if the (?A, ?R) info we want to register into some ?I has a higher level than ?A, we need to extrude its components to the correct level, using the extrude method.

(4). How to match this casedef haha2 : Array[int] haha2[0], and also for string, ... etc

The Report:
take a look at the current draft (finished the part before current problem and solution), and maybe take a look at the complete draft later this week

@LPTK LPTK moved this to 🆕 New in MLscript backlog Aug 31, 2022
@CAG2Mark CAG2Mark moved this from 🆕 New to 💔 Stalled in MLscript backlog Aug 1, 2024
@CAG2Mark CAG2Mark moved this from 💔 Stalled to 🆕 New in MLscript backlog Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: 🆕 New
Development

No branches or pull requests

3 participants