Parallel rendering with an archetypal ECS #1589

ykafia · 2023-01-19T18:25:08Z

Maintainer

Related to #1568

This is a short write-up presenting the ECS implementation I made recently, which I think could greatly benefit Stride; it would also make rendering in parallel simpler. I hope to get your ideas and criticism on it.

Also, read this as "maybe an idea if we want to make a Stride 5 version".

Premise

ECS design

In Bungie's multithreaded renderer, to make synchronization between game logic and rendering easier, they decided to copy carefully chosen component data to a separate storage on which rendering work is performed. This makes the copy/extract phase a hot path to optimize, but allows the renderer to work on its own thread(s) while the game logic runs on the others.
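A minimal sketch of such an extract step. It is written in Rust, like the Bevy example later in this post, rather than C#. Game-side components hold more data than the renderer needs, so once per frame only the render-relevant fields are copied into a separate buffer owned by the renderer. `GameTransform`, `RenderInstance` and `extract` are hypothetical names used only to illustrate the idea.

```rust
/// Game-side component data (hypothetical): more than the renderer needs.
#[derive(Clone, Copy, Debug)]
struct GameTransform {
    position: [f32; 3],
    velocity: [f32; 3],
    mesh_id: u32,
    visible: bool,
}

/// Renderer-side copy (hypothetical): only what drawing requires.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RenderInstance {
    position: [f32; 3],
    mesh_id: u32,
}

/// Extract phase: copy the chosen fields of every visible entity into a
/// buffer owned by the renderer. The buffer is cleared and reused each
/// frame, so once its capacity is large enough, extraction does not allocate.
fn extract(game: &[GameTransform], render: &mut Vec<RenderInstance>) {
    render.clear();
    render.extend(
        game.iter()
            .filter(|t| t.visible)
            .map(|t| RenderInstance { position: t.position, mesh_id: t.mesh_id }),
    );
}

fn main() {
    let game = vec![
        GameTransform { position: [0.0, 0.0, 0.0], velocity: [1.0, 0.0, 0.0], mesh_id: 7, visible: true },
        GameTransform { position: [5.0, 0.0, 0.0], velocity: [0.0, 0.0, 0.0], mesh_id: 3, visible: false },
    ];
    let mut render = Vec::new();
    extract(&game, &mut render);
    // From here on, `render` can be handed to a render thread while the
    // game thread keeps mutating `game`.
    println!("{:?}", render);
}
```

Only this copy has to be synchronized. Everything after it can run on separate threads, because the two sides no longer share data.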

Starting from this idea, I began designing an ECS, but there were shortcomings.

.NET and allocation

In .NET, what impacts performance is data throughput:

If many long-lived objects are allocated, the GC is more likely to be triggered.
If the heap is fragmented, the .NET runtime occasionally compacts it, rearranging objects so they can be iterated over faster than if they stayed fragmented.
Iterating over a collection of objects is slowed down because the objects are not stored contiguously on the heap.

Considering value types:

They are usually allocated on the stack, or inline in the array or object that contains them
They are passed by copy (which is costly for large structs)
Iterating over a collection of structs is faster than over a collection of objects, since structs are stored contiguously

Newer .NET versions let us work with value types by reference (ref locals and ref returns, in parameters), avoiding unnecessary copies, which solves part of the copy problem.
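The same trade-off can be shown in Rust terms. This is an analogy, not .NET code. A `Vec<T>` of plain structs stores the values inline and contiguously. A `Vec<Box<T>>` stores only pointers to separate heap allocations, which is closer to a .NET list of class instances. Iterating with `iter_mut()` updates elements in place through references instead of copying them out and back, which roughly corresponds to C# `ref` locals over a struct array. `Particle` is a hypothetical component.

```rust
use std::mem::size_of;

/// A hypothetical component: plain data, no references.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Particle {
    position: [f32; 3],
    velocity: [f32; 3],
}

/// Updates particles in place: each `p` is a `&mut Particle` pointing into
/// the slice's own buffer, so no struct is copied out and written back.
fn integrate(particles: &mut [Particle], dt: f32) {
    for p in particles.iter_mut() {
        for i in 0..3 {
            p.position[i] += p.velocity[i] * dt;
        }
    }
}

fn main() {
    // Inline storage: the structs themselves, back to back in one buffer.
    let mut inline = vec![Particle { position: [0.0; 3], velocity: [1.0, 0.0, 0.0] }; 4];
    // Boxed storage: the Vec holds pointers, each to its own heap allocation.
    let boxed: Vec<Box<Particle>> = inline.iter().map(|p| Box::new(*p)).collect();

    println!("inline element: {} bytes", size_of::<Particle>()); // 24
    println!("boxed element: {} bytes", size_of::<Box<Particle>>()); // pointer size

    integrate(&mut inline, 0.5);
    println!("{:?}, {} boxed copies untouched", inline[0], boxed.len());
}
```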

The new design

With that in mind, I prototyped an ECS with archetypal storage similar to what legion, bevy-ecs and flecs propose, following a few design principles:

Entities
Entities are just indices
Components
Components are structs, in arrays/lists they are stored contiguously.
Systems
Systems/Processors should be able to process one or multiple entities, depending on one or multiple component types.

Storage

If we want to iterate over entities that have multiple components, we have to define a kind of storage that avoids fragmentation.
Given this set of entities with their associated components (where A, B and C are component types):

Entity 0 : Components [ A , B , C ]
Entity 1 : Components [ A , B , C ]
Entity 2 : Components [ A , B ]
Entity 3 : Components [ B , C ]
Entity 4 : Components [ C ]
Entity 5 : Components [ A , B ]
Entity 6 : Components [ B , C ]

We need a way to allocate arrays while making sure they contain no empty values/padding. For that, we group entities by archetype. An archetype is defined by the set of component types an entity has. In our example, we would separate entities like so:


Archetypes<A,B,C>
[
   Entities : [0,1]
   Storages : 
   [
       List<A> [A0,A1]
       List<B> [B0,B1]
       List<C> [C0,C1]
   ]
]
Archetypes<A,B>
[
   Entities : [2,5]
   Storages : 
   [
       List<A> [A2,A5]
       List<B> [B2,B5]
   ]
]
Archetypes<B,C>
[
   Entities : [3,6]
   Storages : 
   [
       List<B> [B3,B6]
       List<C> [C3,C6]
   ]
]
Archetypes<C>
[
   Entities : [4]
   Storages : 
   [
       List<C> [C4]
   ]
]
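Here is a minimal sketch of this grouping in Rust. The prototype itself is C#. Component types are reduced to hypothetical bit flags `A`, `B` and `C`, so an archetype's signature is a bitmask. Each component value is a `u32` stand-in to keep the sketch short; a real implementation would keep one typed array per component type. `Archetype`, `build` and `entity` are illustrative names, not the library's API.

```rust
use std::collections::BTreeMap;

// Hypothetical component ids, one bit each: a set of component types
// (an archetype signature) is then just a bitmask.
const A: u8 = 0b001;
const B: u8 = 0b010;
const C: u8 = 0b100;

/// Entities sharing exactly the same component set, plus one dense column
/// per component type. Row `i` of every column belongs to `entities[i]`,
/// so no column ever contains holes.
#[derive(Debug, Default)]
struct Archetype {
    entities: Vec<u32>,
    columns: BTreeMap<u8, Vec<u32>>, // component bit -> values (u32 stand-in)
}

/// Groups entities by signature. Each entity is given as its id and a list
/// of (component bit, value) pairs.
fn build(entities: &[(u32, Vec<(u8, u32)>)]) -> BTreeMap<u8, Archetype> {
    let mut archetypes: BTreeMap<u8, Archetype> = BTreeMap::new();
    for (id, components) in entities {
        let signature = components.iter().fold(0u8, |mask, (bit, _)| mask | bit);
        let archetype = archetypes.entry(signature).or_default();
        archetype.entities.push(*id);
        for (bit, value) in components {
            archetype.columns.entry(*bit).or_default().push(*value);
        }
    }
    archetypes
}

/// Convenience for the example: an entity whose component values are its id.
fn entity(id: u32, bits: &[u8]) -> (u32, Vec<(u8, u32)>) {
    (id, bits.iter().map(|&bit| (bit, id)).collect())
}

fn main() {
    // The seven entities from the example above.
    let world = build(&[
        entity(0, &[A, B, C]),
        entity(1, &[A, B, C]),
        entity(2, &[A, B]),
        entity(3, &[B, C]),
        entity(4, &[C]),
        entity(5, &[A, B]),
        entity(6, &[B, C]),
    ]);
    for (signature, archetype) in &world {
        println!("{:03b}: entities {:?}, columns {:?}", signature, archetype.entities, archetype.columns);
    }
}
```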

When processors iterate over entities, they do so with a constraint on the component types present. A processor that deals with components A and B will query every archetype containing those types (i.e. archetypes whose type set is a superset of the queried types). In our example, that means Archetypes<A,B,C> and Archetypes<A,B>.

We can iterate over a subset of entities by just doing a superset/subset comparison.
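With component types encoded as bit flags (a hypothetical encoding chosen for this sketch), the superset test is a single mask comparison. The list of matching archetypes only changes when a new archetype is created, so one option is to compute it once per query and cache it.

```rust
// Hypothetical component ids, one bit each; a signature is a bitmask.
const A: u8 = 0b001;
const B: u8 = 0b010;
const C: u8 = 0b100;

/// An archetype matches a query when its component set is a superset of the
/// queried set, i.e. every queried bit is also set in its signature.
fn matches(signature: u8, query: u8) -> bool {
    (signature & query) == query
}

/// Signatures a processor has to visit for `query`.
fn matching_archetypes(signatures: &[u8], query: u8) -> Vec<u8> {
    signatures.iter().copied().filter(|&s| matches(s, query)).collect()
}

fn main() {
    // The four distinct component sets in the example: <A,B,C>, <A,B>, <B,C>, <C>.
    let signatures = [A | B | C, A | B, B | C, C];
    println!("{:?}", matching_archetypes(&signatures, A | B)); // [7, 3]
    println!("{:?}", matching_archetypes(&signatures, C)); // [7, 6, 4]
}
```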

There's a big drawback to this way of storing data: adding a component to, or removing one from, an entity forces us to move all of that entity's components from one archetype to another.
One way flecs handles this is by building a graph of archetypes where edges link an archetype to each of its direct supersets (A*B -> A*B*C, A*B -> A*B*Z, etc.).
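Below is a simplified sketch of both points. It is a toy version of the idea, not flecs' actual data structure.

* Moving an entity copies its row into the destination archetype's columns and `swap_remove`s the row from the source, which keeps the source dense.
* A cache of "add component X" edges, keyed by `(source signature, component)`, avoids recomputing the destination archetype on every structural change.

Signatures are hypothetical bitmasks, and component values are `u32` stand-ins.

```rust
use std::collections::HashMap;

// Hypothetical component ids (one bit each); a signature is a bitmask.
const A: u8 = 0b01;
const B: u8 = 0b10;

#[derive(Debug, Default)]
struct Archetype {
    entities: Vec<u32>,
    columns: HashMap<u8, Vec<u32>>, // component bit -> values (u32 stand-in)
}

#[derive(Default)]
struct World {
    archetypes: HashMap<u8, Archetype>,
    // Graph edges: (source signature, added component) -> destination.
    add_edges: HashMap<(u8, u8), u8>,
}

impl World {
    /// Adds component `bit` (with `value`) to the entity stored at `row` of
    /// archetype `from`, moving its whole row to the destination archetype.
    fn add_component(&mut self, from: u8, row: usize, bit: u8, value: u32) {
        // Follow the cached edge, or create it the first time it is needed.
        let to = *self.add_edges.entry((from, bit)).or_insert(from | bit);
        assert_ne!(from, to, "entity already has this component");

        // swap_remove keeps every column dense: the last row fills the hole.
        // (A real ECS must also update the stored location of the entity
        // that was swapped into `row`.)
        let src = self.archetypes.get_mut(&from).expect("unknown archetype");
        let entity = src.entities.swap_remove(row);
        let moved: Vec<(u8, u32)> = src
            .columns
            .iter_mut()
            .map(|(&b, column)| (b, column.swap_remove(row)))
            .collect();

        // Append the row, plus the new component, to the destination.
        let dst = self.archetypes.entry(to).or_default();
        dst.entities.push(entity);
        for (b, v) in moved.into_iter().chain(std::iter::once((bit, value))) {
            dst.columns.entry(b).or_default().push(v);
        }
    }
}

fn main() {
    let mut world = World::default();
    // Two entities, 10 and 11, in archetype <A>.
    let a = world.archetypes.entry(A).or_default();
    a.entities.extend([10, 11]);
    a.columns.entry(A).or_default().extend([100, 110]);

    // Entity 10 (row 0) gains a B component and moves to archetype <A,B>.
    world.add_component(A, 0, B, 7);
    println!("<A>:   {:?}", world.archetypes[&A].entities); // [11]
    println!("<A,B>: {:?}", world.archetypes[&(A | B)].entities); // [10]
}
```

The cost of a structural change is proportional to the number of components the entity has, not to the size of the archetypes involved.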

Systems/Processors for game logic

This one was tricky to think about. I wanted to keep the ability to attach scripts to entities, as it feels like the most approachable way for a game developer to add logic to a game. But with the storage defined above that was a complex task, so I settled on a simpler API to build.
My initial idea was to use generics to simplify the creation of systems. A user should be able to define a function and use it as a game system. Separating rendering and game logic for better parallelism allows us to keep systems focused on game logic.

This time, inspired by Bevy, I wanted game logic to be created as simply as this:

use bevy::prelude::*;

#[derive(Component)]
struct Person;

#[derive(Component)]
struct Name(String);

fn add_people(mut commands: Commands) {
    commands.spawn((Person, Name("Elaina Proctor".to_string())));
    commands.spawn((Person, Name("Renzo Hume".to_string())));
    commands.spawn((Person, Name("Zayna Nieves".to_string())));
}
fn greet_people(query: Query<&Name, With<Person>>) {
    for name in query.iter() {
        println!("hello {}!", name.0);
    }
}

fn main() {
    App::new()
        .add_startup_system(add_people)
        .add_system(greet_people)
        .run();
}

C#/.NET is not as performant or ergonomic as Rust in certain areas, especially with tuples and structs, but the biggest shortcoming was C#'s generics, which are fairly limited (there are no variadic generics, for instance).

So I came up with a kind of system/processor where users design their own queries with functions + generics, and the library feeds each function with the data it needs. The code looks like this:

using System;

public struct PersonComponent {}

public struct NameComponent
{
    public string Name {get;set;}
    public NameComponent(string n) { Name = n; }
}

// A static function to be used as a processor/system
public static void GreetingsSystem(Query<NameComponent,PersonComponent> query1)
{
    // query1 is a Query object that is instantiated when the processor for GreetingsSystem is created.
    // It contains a function CreateIterator() which returns a ref struct iterator that can run over the data.
    var iter = query1.CreateIterator();
    while(iter.Next())
    {
        // The iterator deconstructs into the components of the current entity.
        var (name, person) = iter;
        Console.WriteLine($"Hello {name.Name}!");
    }
}

public class CreatePeople : Processor
{
    public override void Update()
    {
        world
            .CreateEntity()
            .With(new NameComponent("Elaine Maddock"))
            .With<PersonComponent>();
        world
            .CreateEntity()
            .With(new NameComponent("John Edvark"))
            .With<PersonComponent>();
        //...
    }
}

var world = new World();

world.AddStartup<CreatePeople>();
world.AddProcessor(
    (Query<NameComponent,PersonComponent> q1) => GreetingsSystem(q1) 
);

// The start function runs StartupProcessors once in the beginning.
world.Start();

// Update runs one iteration.
world.Update();

Of course this is still a work in progress; I'm in the process of adding async systems and sub-worlds. But so far this API keeps game logic allocation-free and avoids as much computation as possible.
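To make the `Start`/`Update` split concrete, here is a minimal sketch in Rust of a world that stores processors as boxed closures. Startup processors run exactly once, and regular processors run on every update. `Storage`, `add_startup`, `add_processor`, `start` and `update` are hypothetical names modeled loosely on the C# API above, not the library's actual implementation. The sketch ignores queries and scheduling entirely.

```rust
/// Stand-in for the archetype storage (hypothetical fields).
#[derive(Default)]
struct Storage {
    names: Vec<String>,
    greetings: Vec<String>,
}

type Processor = Box<dyn FnMut(&mut Storage)>;

#[derive(Default)]
struct World {
    storage: Storage,
    startup: Vec<Processor>,
    processors: Vec<Processor>,
}

impl World {
    fn add_startup(&mut self, p: impl FnMut(&mut Storage) + 'static) {
        self.startup.push(Box::new(p));
    }

    fn add_processor(&mut self, p: impl FnMut(&mut Storage) + 'static) {
        self.processors.push(Box::new(p));
    }

    /// Runs each startup processor once; draining makes a second call a no-op.
    fn start(&mut self) {
        for mut p in self.startup.drain(..) {
            p(&mut self.storage);
        }
    }

    /// Runs one iteration of every regular processor, in registration order.
    fn update(&mut self) {
        for p in self.processors.iter_mut() {
            p(&mut self.storage);
        }
    }
}

fn main() {
    let mut world = World::default();
    world.add_startup(|s| s.names.push("Elaine Maddock".to_string()));
    world.add_processor(|s| {
        for name in &s.names {
            s.greetings.push(format!("Hello {name}!"));
        }
    });
    world.start();
    world.update();
    println!("{:?}", world.storage.greetings); // ["Hello Elaine Maddock!"]
}
```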

Rendering in Parallel

As shown above, in this ECS implementation we create a World (the equivalent of a SceneInstance), add entities with any structs as components, and add processors to implement game logic.

For us to render in parallel, we would have another World object containing its own storage and its own rendering processors. Both worlds would run in parallel.

At some point, the game logic World and the render World need to be synchronized (typically once per frame) so we can extract data from the game logic World and copy it into the render World. Once that's done, both can run in parallel until the next synchronization point.
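A minimal sketch of that frame loop in Rust. It assumes a single render thread and uses `std::thread::scope` (Rust 1.63+), so the render thread is always joined before the next sync point. `GameWorld`, `RenderWorld`, `extract` and `render` are hypothetical stand-ins for the two World objects. Note that the renderer draws the state extracted at the start of the frame, i.e. it stays one game update behind.

```rust
use std::thread;

/// Game-side world (hypothetical): positions moved by game logic.
struct GameWorld {
    positions: Vec<f32>,
    velocities: Vec<f32>,
}

/// Render-side world (hypothetical): its own copy of what it draws.
#[derive(Default)]
struct RenderWorld {
    positions: Vec<f32>,
    frames_drawn: u32,
    last_sum: f32,
}

impl GameWorld {
    fn update(&mut self, dt: f32) {
        for (p, v) in self.positions.iter_mut().zip(&self.velocities) {
            *p += v * dt;
        }
    }
}

impl RenderWorld {
    /// Sync point: copy the data the renderer needs out of the game world.
    fn extract(&mut self, game: &GameWorld) {
        self.positions.clear();
        self.positions.extend_from_slice(&game.positions);
    }

    /// Stand-in for real rendering work on the extracted snapshot.
    fn render(&mut self) {
        self.last_sum = self.positions.iter().sum();
        self.frames_drawn += 1;
    }
}

fn run_frames(game: &mut GameWorld, render: &mut RenderWorld, frames: u32) {
    for _ in 0..frames {
        // 1. Synchronize: neither world is running, so the copy is race-free.
        render.extract(game);
        // 2. Run both worlds in parallel; the scope joins the render thread
        //    before returning, which is the next synchronization point.
        thread::scope(|s| {
            s.spawn(|| render.render());
            game.update(1.0 / 60.0);
        });
    }
}

fn main() {
    let mut game = GameWorld { positions: vec![0.0, 10.0], velocities: vec![60.0, 0.0] };
    let mut render = RenderWorld::default();
    run_frames(&mut game, &mut render, 3);
    println!("game x = {:.2}, frames drawn = {}", game.positions[0], render.frames_drawn);
}
```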

Considerations

Query API

The Query/Processor API is just one way to query the data in the storage. Since the storage is only a way to store data contiguously, there can be many ways to query it: we could even design a script system on top of it, or mix and match many different kinds of query systems.

Serialization

Since the storage is made up of structs, and structs can contain object references, we need to make sure each referenced object is not duplicated during serialization (something Stride might already be doing).
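One way to do this is sketched below in Rust, with `Rc` standing in for .NET object references. While serializing, each distinct referenced object gets an index based on pointer identity. The object is written once, and every component that refers to it stores only that index. `Material`, `MeshComponent`, `Serialized` and `serialize` are hypothetical; a real serializer would also have to restore the shared references on load.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// A shared, reference-type asset (hypothetical) referenced from components.
struct Material {
    name: String,
}

/// A component struct holding an object reference.
struct MeshComponent {
    mesh_id: u32,
    material: Rc<Material>,
}

/// Serialized form: components refer to objects by index into `objects`.
#[derive(Debug, PartialEq)]
struct Serialized {
    objects: Vec<String>,
    components: Vec<(u32, usize)>,
}

/// Writes each referenced object once, keyed by pointer identity, and stores
/// only its index in the components that reference it.
fn serialize(components: &[MeshComponent]) -> Serialized {
    let mut ids: HashMap<*const Material, usize> = HashMap::new();
    let mut out = Serialized { objects: Vec::new(), components: Vec::new() };
    for c in components {
        let ptr = Rc::as_ptr(&c.material);
        let id = *ids.entry(ptr).or_insert_with(|| {
            out.objects.push(c.material.name.clone());
            out.objects.len() - 1
        });
        out.components.push((c.mesh_id, id));
    }
    out
}

fn main() {
    let steel = Rc::new(Material { name: "steel".to_string() });
    let wood = Rc::new(Material { name: "wood".to_string() });
    let components = vec![
        MeshComponent { mesh_id: 1, material: Rc::clone(&steel) },
        MeshComponent { mesh_id: 2, material: Rc::clone(&wood) },
        MeshComponent { mesh_id: 3, material: Rc::clone(&steel) },
    ];
    // "steel" is written once even though two components reference it.
    println!("{:?}", serialize(&components));
}
```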

Bundles/Plug-ins

There's also Bevy's bundle approach, which I found interesting: bundling component types and processors into a single class/struct implementing an interface. It's still WIP in my library but very feasible.
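A minimal sketch of the idea in Rust, loosely modeled on Bevy's `Plugin` trait (which exposes a `build(&self, app: &mut App)` method). A plug-in is a type implementing one trait, and its `build` method registers its component types and processors on a world. `World`, `register_component`, `add_processor`, `PhysicsPlugin` and the `u64` counter that stands in for real component data are all hypothetical.

```rust
/// Hypothetical world: registered component type names, processors, and a
/// counter standing in for real component storage.
#[derive(Default)]
struct World {
    component_types: Vec<&'static str>,
    processors: Vec<(&'static str, fn(&mut u64))>,
    ticks: u64,
}

/// A plug-in bundles component types and processors behind one trait.
trait Plugin {
    fn build(&self, world: &mut World);
}

impl World {
    fn add_plugin(&mut self, plugin: impl Plugin) -> &mut Self {
        plugin.build(self);
        self
    }

    fn register_component<T: 'static>(&mut self) -> &mut Self {
        self.component_types.push(std::any::type_name::<T>());
        self
    }

    fn add_processor(&mut self, name: &'static str, f: fn(&mut u64)) -> &mut Self {
        self.processors.push((name, f));
        self
    }

    fn update(&mut self) {
        for (_, f) in &self.processors {
            f(&mut self.ticks);
        }
    }
}

// Example plug-in with two hypothetical components and one processor.
struct Position(f32);
struct Velocity(f32);
struct PhysicsPlugin;

impl Plugin for PhysicsPlugin {
    fn build(&self, world: &mut World) {
        world
            .register_component::<Position>()
            .register_component::<Velocity>()
            .add_processor("integrate", |ticks| *ticks += 1);
    }
}

fn main() {
    let mut world = World::default();
    world.add_plugin(PhysicsPlugin);
    world.update();
    println!("{:?}, {} processor(s)", world.component_types, world.processors.len());
}
```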

manio143 · 2023-01-20T18:48:22Z

Maintainer

That's a very interesting concept. I would like to understand better how this archetypal data storage would affect single-component processors, e.g. transform updates. Wouldn't there be a higher cost per processor if it needs to perform cross-archetype queries?
I like the idea of using structs that live in contiguous memory, so I'm wondering whether those archetypes could be built on top of the storage as a sort of pointers, but I can see how multi-component processors complicate making linear queries over the storage by component type.

N.B. I'm very much in favor of Stride 5 breaking backwards compatibility; see the latest discussion in the plugin RFC.

4 replies

manio143 Jan 20, 2023
Maintainer

An SQL database comes to mind: using indexes to perform efficient joins.

BenjaBobs Jan 20, 2023

If you were to use a database such as SQLite as the backend, that would also lend itself to easy serialization/deserialization, I think.
Not sure whether in-memory performance would be better than a custom solution, though.

ykafia Jan 20, 2023
Maintainer Author

That's exactly what the iterator is: a bunch of indices that get you to an entity relative to a query! (See the Discord message I made about allocation-free updates, etc.)
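Here is a sketch of such an index-based iterator in Rust, standing in for the C# ref struct iterator. It walks the archetypes that matched the query and the rows inside each one, yields `(archetype index, row)` pairs, and allocates nothing. `Archetype`, `QueryIter` and `query` are hypothetical names. For a single-component processor, the extra work compared with one flat array is advancing to the next matching archetype. Within each archetype, rows are still visited linearly.

```rust
/// Minimal archetype: only its row count matters for iteration here.
struct Archetype {
    len: usize,
}

/// Iterator over every entity matched by a query, expressed purely as
/// indices: which matching archetype we are in and which row inside it.
struct QueryIter<'a> {
    archetypes: &'a [Archetype],
    matching: &'a [usize], // indices of archetypes that satisfy the query
    current: usize,        // position in `matching`
    row: usize,            // row inside the current archetype
}

impl<'a> Iterator for QueryIter<'a> {
    type Item = (usize, usize); // (archetype index, row)

    fn next(&mut self) -> Option<Self::Item> {
        while let Some(&arch) = self.matching.get(self.current) {
            if self.row < self.archetypes[arch].len {
                self.row += 1;
                return Some((arch, self.row - 1));
            }
            // Current archetype exhausted (or empty): move to the next one.
            self.current += 1;
            self.row = 0;
        }
        None
    }
}

fn query<'a>(archetypes: &'a [Archetype], matching: &'a [usize]) -> QueryIter<'a> {
    QueryIter { archetypes, matching, current: 0, row: 0 }
}

fn main() {
    // Four archetypes; suppose the query matched archetypes 0, 2 and 3.
    let archetypes = [Archetype { len: 2 }, Archetype { len: 5 }, Archetype { len: 0 }, Archetype { len: 1 }];
    let matching = [0, 2, 3];
    for (archetype, row) in query(&archetypes, &matching) {
        println!("archetype {archetype}, row {row}");
    }
}
```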

Having multiple queries in a system is not going to leave much room for optimization, but I don't think it will be used much. Having them as an option gives much more freedom for complex processing or for rendering; e.g. when we're rendering meshes, it's convenient to have cameras and meshes in the same processor.

ykafia Jan 20, 2023
Maintainer Author

I'm still implementing a scheduler, btw. Depending on how the renderer will work, I will adjust it for parallelism. My main concern with the API was to have something very close to Stride's processor design.

BenjaBobs · 2023-01-20T21:43:23Z


Here's a list of benchmarks for popular C# ECS libraries:
https://github.com/Doraku/Ecs.CSharp.Benchmark

The ones using archetypal ECS reach some quite impressive numbers.

4 replies

ykafia Jan 20, 2023
Maintainer Author

Surprising to see there are some archetypal ECS libraries in C#; I hadn't seen those. I double-checked their APIs and they all look similar to legion's!

I personally don't like their APIs, but I will surely check their source code to improve my library.

BenjaBobs Jan 20, 2023

Arch has an Extension library to generate queries using source generators.
I imagine that might also benefit performance.

ykafia Jan 20, 2023
Maintainer Author

Could be! I have to investigate that too, have you been using Arch?

BenjaBobs Jan 20, 2023

I have not, I'm just a lurker who likes to keep tabs on things. 😛
