
taskflow-dev team mailing list archive

Thoughts on changes to make things more flexible

 

Hi all,

Angus and I were discussing taskflow on IRC earlier today, and a question he brought up was how a task gets its inputs satisfied.

Right now, if you have the following (as an example):

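# A task that requires a, b, c (inferred from its arguments)
@task
def count(context, a, b, c):
    pass

# A task that provides a, b, c via its returned dictionary
@task(provides=['a', 'b', 'c'])
def provide_count(context):
    return {
        'a': 1,
        'b': 2,
        'c': 3,
    }

flow = linear.Flow()
flow.add(provide_count)
flow.add(count)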

In this simple example, when provide_count runs (as a task) it returns a dictionary. This dictionary can be stored in the 'transaction log'. When count runs, the previous tasks are examined to find the one that provides a, b, and c; that task's dictionary is examined, the values are extracted, and count is called with new kwargs a, b, c holding the values returned by provide_count. So far so good.

This, though, makes the following hard to pull off:
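To make that lookup concrete, here is a minimal sketch in plain Python. It is not taskflow's actual implementation: run_linear and required_names are hypothetical helpers, the 'transaction log' is modeled as a simple list, and count returning a sum is an assumption made only so there is something to observe.

```python
import inspect


def required_names(func):
    """Infer what a task requires from its signature, skipping 'context'."""
    params = list(inspect.signature(func).parameters)
    return [name for name in params if name != "context"]


def run_linear(tasks, context=None):
    """Run tasks in order, satisfying each one's kwargs from earlier results."""
    log = []  # stand-in for the 'transaction log': (task name, result)
    for func in tasks:
        kwargs = {}
        for name in required_names(func):
            # Examine previous tasks, most recent first, for a provider.
            for _, result in reversed(log):
                if isinstance(result, dict) and name in result:
                    kwargs[name] = result[name]
                    break
            else:
                raise LookupError(
                    f"nothing provides {name!r} for {func.__name__}")
        log.append((func.__name__, func(context, **kwargs)))
    return log


def provide_count(context):
    return {"a": 1, "b": 2, "c": 3}


def count(context, a, b, c):
    # Assumption: return the sum so the effect of the inputs is visible.
    return {"total": a + b + c}


log = run_linear([provide_count, count])
```

Note that the lookup is keyed purely on argument names, which is what ties count to providers that happen to use the names a, b, c.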

flow = linear.Flow()
flow.add(provide_count)
flow.add(count)
flow.add(count)

This is more difficult since a developer might expect that the second count will use the first count's result, when in fact that is not what will happen (it will basically duplicate the same action as the first count). This makes it hard to reuse functions with different inputs without recreating the count function with variables not called 'a', 'b', 'c' (since the @task decorator examines the function's args and automatically picks them up as things the function requires to run). So this strongly ties a, b, c as the only way to provide values to count (when some other user may want to count inputs x, y, z instead). It seems relatively straightforward to make this more flexible by turning off the automatic inference of what a task/function requires. So that was one thought that came out of the conversation.
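One way to picture the non-inferred version is an explicit mapping from each argument name to the name of the provided value it should receive. The sketch below is only an illustration under that assumption: the (function, requires, provides) step tuples and run_linear are hypothetical, not an existing taskflow API.

```python
def run_linear(steps, context=None):
    """Run (func, requires, provides) steps with explicit name mappings."""
    results = {}  # flat store of every value provided so far
    for func, requires, provides in steps:
        # requires maps the function's argument name -> stored value name.
        kwargs = {arg: results[source] for arg, source in requires.items()}
        out = func(context, **kwargs)
        if provides is None:
            results.update(out)      # task returned a dict of named values
        else:
            results[provides] = out  # store the single result under one name
    return results


def provide_inputs(context):
    return {"a": 1, "b": 2, "c": 3, "x": 10, "y": 20, "z": 30}


def count(context, a, b, c):
    # Assumption: count sums its inputs, purely for illustration.
    return a + b + c


steps = [
    (provide_inputs, {}, None),
    (count, {"a": "a", "b": "b", "c": "c"}, "first_total"),
    # Same function, reused with x, y, z feeding its a, b, c arguments.
    (count, {"a": "x", "b": "y", "c": "z"}, "second_total"),
]
results = run_linear(steps)
```

Here the same count function runs twice with different inputs, and neither run depends on what the function's parameters happen to be called.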

The other possible solution that comes to mind is to provide each task an 'object' which it can use to fetch its needed requirements. Instead of sending the requirements through kwargs, we can send them via this object and not get tied up in the task's parameters and kwargs. This seems nice in that we can provide an object which lazily fetches the requirements from the provider tasks when requested (instead of having to fetch them all up front and pass them in via kwargs). Thoughts here?
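A rough sketch of what such an object could look like, assuming each requirement can be resolved by a zero-argument callable. The Inputs class, its fetch method, and the fetched list are all hypothetical names used for illustration only.

```python
class Inputs:
    """Hands requirements to a task on demand instead of via kwargs."""

    def __init__(self, providers):
        self._providers = providers  # name -> zero-argument callable
        self._cache = {}
        self.fetched = []  # names actually resolved, kept for illustration

    def fetch(self, name):
        # Resolve the value only on first request, then reuse it.
        if name not in self._cache:
            self._cache[name] = self._providers[name]()
            self.fetched.append(name)
        return self._cache[name]


def count(context, inputs):
    # The task decides which requirements it needs and by what name.
    return inputs.fetch("a") + inputs.fetch("b")


inputs = Inputs({"a": lambda: 1, "b": lambda: 2, "c": lambda: 3})
total = count(None, inputs)
```

Since count never asks for c, its provider is never called, which is the lazy behavior described above.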

Another one was a question on how to establish a more complicated flow.

For example:

A -> B1 ---
           |--- C
Z -> B2 ---

Now the B task here is actually duplicated (but each copy is connected to a different 'provider', A or Z). So a question that came up is how taskflow can accommodate this. One suggestion was that flow.add() returns a uuid for the item added, and then when, say, connecting Z -> B2, one would provide the uuid of Z and the uuid of B2 (thus avoiding connecting Z to B1). This is also related to the kwargs 'issue' above, since task B will likely have the same 'requires' but different 'providers' (A and Z), so we need to be able to ensure we select the right provider when running B1 or B2. Keeping uuids around makes this possible, I think. Kevin, I believe you are working on something like this?

This might look like:

# First add them all
puuid, c1uuid, c2uuid = flow.add_many([provide_count, count, count])

# Now connect them together
flow.connect(puuid, c1uuid)
flow.connect(puuid, c2uuid)

Seems like that will alleviate the problem that Angus was hitting.
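For the A/Z/B1/B2 case, a toy version of the uuid idea might look like the following. This is a sketch under assumptions, not taskflow code: the Flow class here is hypothetical, it runs tasks in the order they were added (so providers must be added before their consumers), and the task bodies are made up.

```python
import uuid


class Flow:
    """Toy flow where every added task gets its own uuid."""

    def __init__(self):
        self._tasks = {}      # uuid -> callable
        self._providers = {}  # consumer uuid -> list of provider uuids
        self._order = []      # uuids in insertion order

    def add(self, func):
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = func
        self._providers[task_id] = []
        self._order.append(task_id)
        return task_id

    def add_many(self, funcs):
        return [self.add(func) for func in funcs]

    def connect(self, provider_id, consumer_id):
        self._providers[consumer_id].append(provider_id)

    def run(self, context=None):
        results = {}
        for task_id in self._order:
            kwargs = {}
            # Only results from explicitly connected providers are used.
            for provider_id in self._providers[task_id]:
                kwargs.update(results[provider_id])
            results[task_id] = self._tasks[task_id](context, **kwargs)
        return results


def a(context):
    return {"value": 1}


def z(context):
    return {"value": 100}


def b(context, value):
    return {"value": value + 1}


flow = Flow()
a_id, z_id, b1_id, b2_id = flow.add_many([a, z, b, b])
flow.connect(a_id, b1_id)  # A -> B1
flow.connect(z_id, b2_id)  # Z -> B2
results = flow.run()
```

The same function b is added twice, but since each copy has its own uuid, B1 only sees A's output and B2 only sees Z's.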

Thoughts welcome :-)

-Josh
