Opened 6 years ago

Closed 2 years ago

#1089 closed defect (fixed)

[PATCH] Properly serialize AI data in saved games

Reported by: Ben Brian Owned by: mimo
Priority: Must Have Milestone: Alpha 19
Component: AI Keywords:
Cc: Yves Patch:

Description

AIs require some serializer enhancements to properly save/load their data, unlike the rest of the simulation. We used a hack in Alpha 8, basically resetting the AIs to their initial state when loading a saved game. This is less than ideal because an AI might be in the middle of planning some task when the game was saved, information it would lose under the current system.

Attachments (3)

qbot_full_serialize.patch (50.5 KB) - added by Ben Brian 4 years ago.
AIFix.patch (8.5 KB) - added by wraitii 4 years ago.
global.diff (1.7 KB) - added by mimo 3 years ago.

Download all attachments as: .zip

Change History (28)

comment:1 Changed 5 years ago by Yves

Cc: Yves added

comment:2 Changed 5 years ago by Ben Brian

Owner: set to Ben Brian

comment:3 Changed 5 years ago by Ben Brian

Keywords: serializer ai removed

I'm afraid this is easier said than done. Implementing #406 and #407 help, but qBot requires significant changes to serialize fully without fatal errors. For example, no functions can be added to its state which breaks entity collection filters and a number of other things. After fixing those, the result is possibly too slow (serialization could easily take over 1s) and the state too large (23 MB uncompressed) to be used without excluding some things. I'm not extremely familiar with the AIs to know what can be left out or how to go about it, so maybe someone else who knows the details can come along and finish whatever is started.

The plan is to have at least the infrastructure in place to supports this for ~A14.

comment:4 Changed 4 years ago by ben

In 13429:

Extends binary serializer to support some standard JS classes: Number, String, Boolean. Fixes #406.
Extends binary serializer to support typed arrays.
Extends binary serializer to support custom JS prototype objects in AIs, fixes #407.
Allows full serialization of AIs (not yet implemented). Refs #1089, #1886
Increases binary serializer script backref arena from 8 MB to 16 MB, refs #1842

comment:5 Changed 4 years ago by Ben Brian

Attaching patch to fully serialize qBot using changes from r13429.

It's a significant patch but mostly straightforward once you're used to it. It helps to enable debug annotations in the binary serializer and add some debug output to print constructor names of objects that aren't using the standard Object prototype and which aren't yet registered serializable prototypes. Functions of course can't be stored in the serializable AI state, so e.g. entity collection filters have to be rethought as prototype objects or something else.

The same approach should work for other AIs, but I chose to start with qBot to not conflict with active development of Aegis. Earlier comment still applies, it's too slow and produces too much data to be usable in practice :(

Last edited 4 years ago by Ben Brian (previous) (diff)

Changed 4 years ago by Ben Brian

Attachment: qbot_full_serialize.patch added

comment:6 Changed 4 years ago by wraitii

Am I right in thinking this does a full through serialization of AI state (ie even prototypes, things like that), rather then extract the useful data and serialize only that?

It's really up for debate but I'd rather use the other approach, completely opt-in, where the AI developer returns an array of data that will be serialized (as a string, then), and deserialized, and from which the AI would recreate all necessary data. This would make it much easier to serialize only the strictly needed.

comment:7 in reply to:  6 Changed 4 years ago by Ben Brian

Replying to wraitii:

Am I right in thinking this does a full through serialization of AI state (ie even prototypes, things like that), rather then extract the useful data and serialize only that?

It's really up for debate but I'd rather use the other approach, completely opt-in, where the AI developer returns an array of data that will be serialized (as a string, then), and deserialized, and from which the AI would recreate all necessary data. This would make it much easier to serialize only the strictly needed.

That's what we have now and it's not working :)

But if the AIs were rewritten from scratch with that in mind, and every future developer understood the implications, then it might be practical. But it's going to be very difficult to ensure the AIs don't have internal state that affects their decisions but isn't serialized manually. The more complex the AIs are, the more difficult it will be, and worse it could take a long time to discover such bugs (they may only occur in few code paths).

Then consider what happens at deserialization. We would need the state at the point when each object was initialized to be available at deserialization. What happens if an object constructed from state at turn 0 later gets deserialized at turn 300? The resulting objects may no longer match and we have the same problems we wanted to avoid (OOS on rejoin, or saving not being deterministic). Also keep in mind that even order of de/serialization is important: if the AI iterates properties of an object, the order could differ between the originally initialized object and the reconstructed deserialized one, leading to subtly different behavior.

What data do AIs use that we shouldn't serialize? I can think of:

  • entity and template objects - the AI should never store them and they should never be reachable from the AI state (accessed only through a getter interface that return copies of data), but in fact they are reachable, I found that in my tests and it's easy to verify, that's why the full AI state produced is so large and slow. I think as AIs developed and serialization was broken anyway, nobody paid much attention to that (see why I'm reluctant for an opt-in solution...)
  • large maps (typed arrays) - more debatable but they can be stored very efficiently by the serializer and I believe the main problem is the template data which contains a huge number of strings.

comment:8 Changed 4 years ago by wraitii

I'm not really getting you... You're saying we have opt-in right now and it's not working? Well sure it's not, nothing has be done to make it work (hey, it's opt-in, after all). I mean we could get it to work, though I do agree that this will take a lot of care.

I do not understand your point about deserialization. We'll jsut deserialize what we serialized. I'm not understanding your "init at turn 0 and deserialize at turn 300" argument.

Order would be important, but if we store everything in a javascript associative array, it becomes much less important.

I'm also not getting your point about entity and template objects being unreachable from AI state. First because it seems to me like copying this all the time will be hilariously slow. Secondly because that's not at all what we do and there's a comment somewhere in CcmpAIManager that says "ideally we should freeze the gamestate but that's too slow so we don't do that". I would agree that we don't need to serialize that, but then we would need to call "GetFullRepresentation?" on deserialization, and we can't because AIManager gets deserialized too early.

Large maps: some can be recreated, given the gamestate at deserialization (resource maps…). Some probably will have to be saved by AIs.

comment:9 in reply to:  8 Changed 4 years ago by Ben Brian

Replying to wraitii:

I'm not really getting you... You're saying we have opt-in right now and it's not working? Well sure it's not, nothing has be done to make it work (hey, it's opt-in, after all). I mean we could get it to work, though I do agree that this will take a lot of care.

A lot of care for every AI we maintain, and every AI of the future :) Doesn't that seem much more prone to error and difficulty than designing the API from the ground up to be serializable? (And it's not just me saying this, Philip will tell you the Testbot API which every AI we have is based on, didn't address this problem at all, which is rather significant)

I don't think it's reasonable to expect/require AI devs to be experts on serialization if it can be avoided, and it can, with a better design of the AIs :) AI devs should spend time writing challenging clever AIs, not debugging serialization errors. (Note how well this works with JS simulation components, many have been added without any thought to serialization, because it just works in most cases - it was part of the design.)

I do not understand your point about deserialization. We'll jsut deserialize what we serialized. I'm not understanding your "init at turn 0 and deserialize at turn 300" argument.

If you don't serialize everything, you can't deserialize everything :)

That means the AI logic is responsible for re-constructing objects at deserialization time, and there's a lot of complex state in the AIs, but what data is used to construct them? It has to be the data available to them at deserialization or later, but the simulation state changes every turn. If the "re-constructed" objects don't use the exact same data and likely in the exact same order, they will differ from the original constructed objects, the AI state will differ and could behave differently. Which is an OOS in multiplayer or an AI that breaks after a save in single player.

Maybe I can't prove it, but I have a feeling from looking at AI serialization that it will prove very very difficult to get the post-serialization AI state to match exactly with an opt-in approach, and it will prove as difficult to troubleshoot any errors, because the only observable difference will be different commands issued by the AI or in the partial state we do serialize (and e.g. only 45 minutes into the game on a particular map, when a player rejoins after hitting a code path that doesn't serialize correctly).

I'm also not getting your point about entity and template objects being unreachable from AI state.

It should be unreachable (i.e. iterating the AI object properties should never lead you to raw template or entity data), but it's not, because the AIs hold references to GameState objects which hold references to that data. I'm afraid without being neck deep in AI serialization, it's difficult to see such indirect references :(

I haven't attempted full serialization of Aegis with its shared state, but at least with qBot, that seemed to be the main obstacle. But I do think it would be conceptually easy to fix, even if it meant changing hundreds of lines of the AI code, by factoring out the raw template and entity data into something only accessed via functions (still stored in JS) and easily repopulated during deserialization. Once accessed, it could be cached, so there wouldn't be much copying.

So I guess what I'm advocating is more of an opt-out approach, where the opt-out is built into the API, rather than an opt-in built into every AI :)

comment:10 Changed 4 years ago by wraitii

Okay, I understand your position better. I seem to recall that there were issues with the whole "serialize everything" part though, mainly because AIs cross-referenced too much, and it was too slow?

It might be interesting to serialize the AIs fully and the Shared component only opt-in, since it's much simpler, which would easily avoid copying all templates data and so on.

About your last point, still: I think I see what you mean. Basically using "for" on an AI object, we get the templates? This is unavoidable if we want the AI to access entity/template data without using global objects or passing the shared script fully (indeed the gamestate holds references to them). I think we could either change this, or change their properties so that they can't be reached in a for loop (I'm fairly sure this is doable). Afaik, we are already "factoring out the raw template and entity data into something only accessed via functions (still stored in JS) and easily repopulated during deserialization" mostly (there's a few parts where this data is accessed directly for lazyness).

Finally if we move the shared component to C++, this needs to be taken into account.

comment:11 in reply to:  10 Changed 4 years ago by Ben Brian

Replying to wraitii:

Okay, I understand your position better. I seem to recall that there were issues with the whole "serialize everything" part though, mainly because AIs cross-referenced too much, and it was too slow?

Of course it was slow and huge, when there's 6 MB of template data with quite a few strings in it, that is terribly inefficient to serialize :) To see where I get that number, apply the attached patch which still works with qBot, save some games and open the binary state in a hex editor and compare addresses of the start and end of the _templates object (the .0adsave is just a renamed zip file).

With a qBot on Acropolis I at the start of a game, the entire state was 9 MB, that means 2/3 of it was useless template data, and if you look at the rest, entity data takes up a lot of space too. Other things might come up that we have to consider, but that clearly has to go :)

It might be interesting to serialize the AIs fully and the Shared component only opt-in, since it's much simpler, which would easily avoid copying all templates data and so on.

Yeah, one of the reasons I started working on qBot was because Aegis and its shared state weren't finished and were changing more frequently. The shared state could help and will certainly need to be considered. For one thing, it could cut down on duplication between AIs, reducing the size of the overall state.

About your last point, still: I think I see what you mean. Basically using "for" on an AI object, we get the templates? This is unavoidable if we want the AI to access entity/template data without using global objects or passing the shared script fully (indeed the gamestate holds references to them). I think we could either change this, or change their properties so that they can't be reached in a for loop (I'm fairly sure this is doable). Afaik, we are already "factoring out the raw template and entity data into something only accessed via functions (still stored in JS) and easily repopulated during deserialization" mostly (there's a few parts where this data is accessed directly for lazyness).

We can define properties so they're not enumerable, this is already done with some properties in BaseAI, but I changed that to test the full serialization approach (no properties skipped, so it should work perfectly). Basically, even if the serializer skips some properties, they will have to be restored exactly as before on deserialization, which means adding special handling for them. I would rather that special handling be part of the API, rather than hiding some data and leaving each AI to handle it separately, as it is now (AIs have serialize and deserialize functions that pass "opted-in" data back and forth, called by CmpAIManager)

Last edited 4 years ago by Ben Brian (previous) (diff)

comment:12 Changed 4 years ago by wraitii

I think we should consider changing the global serialization order for AIs. Imo AIs should serialize last and deserialize last (well mostly deserialize last, but I think that implies serialize last). This way, we can use the gamestate sent by the GUI to reconstruct template and entity data on deserialization without saving them (as you said, this is between 5 and 8Mb of data depending on the map). There's no risk of OOS as it's the simulation data that we would use.

I hadn't realized that if we serialize everything… We need to serialize everything :) . Can you give me a little rundown of issues in serializing svn aegis? I'm assuming to do it this way means to change a few things...

comment:13 Changed 4 years ago by wraitii

Milestone: BacklogAlpha 15

Quick bumping: this patch will make the AI reinitialize without errors when you load a game, so you can play, though it's quite likely to be fairly broken.

It should work properly, requires recompiling. I'd be glad if we could have this in A15.

Best thing is that at last Gameboy will stop spamming everything.

Last edited 4 years ago by wraitii (previous) (diff)

Changed 4 years ago by wraitii

Attachment: AIFix.patch added

comment:14 Changed 4 years ago by wraitii

Milestone: Alpha 15Backlog

Committed above fix in [14329].

comment:15 Changed 4 years ago by Josh

Component: Core engineAI

comment:16 Changed 3 years ago by stanislas69

Priority: Should HaveMust Have

comment:17 Changed 3 years ago by mimo

r14329 can not work because GetFullRepresentation? only deals with entities which have been changed (flagged in AIInterface by this.changedEntities. The problem with deserialization is that, while each deserialized entity is correctly flagged in the AIInterface component, this AIInterface component is destroyed and replaced by a new one with an empty list of changedEntities at the end of the deserialization (I've not understood how and why), and thus no entities are transfered to the ais. An easy fix would be to make the changedEntities list a global variable as in the patch attached.

Changed 3 years ago by mimo

Attachment: global.diff added

comment:18 Changed 3 years ago by stanislas69

Keywords: review patch added
Owner: changed from Ben Brian to mimo

comment:19 Changed 3 years ago by stanislas69

Owner: changed from mimo to Ben Brian
Summary: Properly serialize AI data in saved games[PATCH] Properly serialize AI data in saved games

comment:20 Changed 3 years ago by Ben Brian

Type: enhancementdefect

comment:21 Changed 3 years ago by Ben Brian

Owner: Ben Brian deleted

comment:22 Changed 3 years ago by Ben Brian

See #3060.

comment:23 Changed 3 years ago by Ben Brian

Milestone: BacklogAlpha 19

comment:24 Changed 3 years ago by mimo

Keywords: review patch removed

Remove the "review patch" which have been added in comment 18 as the patch is not in a review state.It was a wip version which solve only part of the problems. There are still some entities not initialized in AIProxy when deserializing, and I've not understood the reason.

comment:25 Changed 2 years ago by mimo

Owner: set to mimo
Resolution: fixed
Status: newclosed

In 16626:

fix entities deserialization in AIInterface and remove duplicates entities and templates in sharedAI, refs #3060 and fixes #1089

Note: See TracTickets for help on using tickets.