Opened 9 years ago

Closed 9 years ago

#611 closed task (fixed)

ChooseCacheSize picks crazy numbers

Reported by: Philip Taylor Owned by: Jan Wassenberg
Priority: Must Have Milestone: Alpha 5
Component: Core engine Keywords:
Cc: Patch:

Description

Re r8319:

I'm on 64-bit Linux with 4GB RAM. In ChooseCacheSize, I get values like total == 3924, available == 303. (I've apparently got around 1.2GB of OS caches/buffers, which are not counted as available - the available figure is always pretty low). That results in ChooseCacheSize choosing its "below min-spec" path (which is silly) and returning 2304 MB (which is also silly, and if this were a 32-bit OS (with PAE) it could be trying to allocate more than the entire virtual address space which sounds problematic).

The calculation seems strange anyway, and lacking in rationale. We don't have 500MB of (public) game data (it's more like 250MB), the 'game' isn't 400MB (it depends on the amount of data and it often looks more like 200MB for me (including file cache)), we don't load all the data at once (the point of biomes was partly to cut the amount of data required per map by only using a well-defined subset). Also we have lots of other caches that reduce the need for everything to be kept in the file cache (I'm assuming it's only useful if a file gets loaded twice) and that contend with the file cache for RAM (it seems better to spend the RAM on higher-level caches to save on loading cost), so it's not clear that a large file cache has any benefit and/or doesn't cause any harm. Also it's contending with the OS disk cache, which will cache the compressed .zip file and therefore make better use of RAM than our own file caching.

Also it looks like OperatingSystemFootprint is going to give a highly uninformative assertion failure message as soon as somebody runs the game on Windows 8 (and even when we update the code, people will still try running old releases), which sounds like a bad idea.

So it seems that this change is adding a lot of complexity and unpredictable non-deterministic system-dependent performance-affecting behaviour, without it being clear that it's any better than the old fixed-size cache. So I'd be much happier if either it was made much simpler again, or if it was made clear to me why I'm missing the point and it's actually a good idea :-)

Change History (21)

comment:1 Changed 9 years ago by Philip Taylor

Via historic_bruno on IRC: On 32-bit XP with Intel 915GM, if ChooseCacheSize returns 1024*MiB, then the game runs with the default software OpenGL 1.1 instead of the proper GL drivers. If it returns 768*MiB (or 512 or 96 etc) then GL works properly.

This seems a bit bizarre, but I suppose it's not implausible that things will break (e.g. fail some allocations during driver initialisation?) when half the entire virtual address space is taken up by a single contiguous allocation.

Probably we should avoid taking up half the entire virtual address space for this.

comment:2 Changed 9 years ago by Jan Wassenberg

(In [8329]) better ChooseCacheSize?: always clamp to 500 MB, remove assertion (refs #611)

comment:3 Changed 9 years ago by Jan Wassenberg

There is a need for a non-constant size because the previous 96 MB is definitely too small (saw a good deal of evictions while debugging the recent assertion in headerless.cpp), but a 512 MB system can't afford the 200..300 MB that would otherwise be a reasonable default. If there were infinite (=64-bit :P) address space, it wouldn't even be a bad idea to choose $LOTS, because physical memory is only committed as needed.

available == 303

Ooh, bummer. Windows reports cache/buffers as available. Any ideas on how to get reasonable numbers on Linux?

500MB of (public) game data (it's more like 250MB)

I'm going by the VFS size, which includes internal. The svn data folder is (including its .base copy) also 1 GB, so that seems in agreement.

the 'game' isn't 400MB

hm, I've seen ~350 in the past, but 300 is probably a more reasonable default.

Also we have lots of other caches that reduce the need for everything to be kept in the file cache (I'm assuming it's only useful if a file gets loaded twice)

Yeah, anything that's decoded (like textures) is better cached at a higher level. However, the block cache is a big win for files in archives (even if only loaded once), because IOs have to start at even block boundaries, which usually requires rounding down the file offsets (so blocks at the edge of files get read twice).

Also it looks like OperatingSystemFootprint is going to give a highly uninformative assertion failure message as soon as somebody runs the game on Windows 8 (and even when we update the code, people will still try running old releases), which sounds like a bad idea.

Good point, removed.

Via historic_bruno on IRC: On 32-bit XP with Intel 915GM, if ChooseCacheSize returns 1024*MiB, then the game runs with the default software OpenGL 1.1 instead of the proper GL drivers. If it returns 768*MiB (or 512 or 96 etc) then GL works properly.

heh, that's funny. Lack of address space would be a problem, it is now clamped to 500.

How's the new version? You should get 500 now.

comment:4 Changed 9 years ago by Kieran P

Not sure I fully understand the changes. Does this mean more memory is used ingamenow? CPU never goes above 40% and memory never goes about 50-60%, so there is plenty left to use, and it would be nice to do so in order to support more units, aka more path finding at once

comment:5 Changed 9 years ago by Jan Wassenberg

Low CPU usage is probably because you have a multicore CPU, and most of our work is done in the main thread. This change is only about how much of memory is used for caching. The unit count isn't limited by memory, but more rather the renderer and pathfinder.

comment:6 Changed 9 years ago by historic_bruno

jan: new version seems to work ok for me now.

Why not just have this be a config setting? Then you could test if the config setting is reasonable, if so, use that, otherwise some smaller value.

comment:7 Changed 9 years ago by Jan Wassenberg

Glad to hear it.

IMO (this is colored by my day job, automated image analysis), user-defined parameters are basically an admission of defeat, passing the buck on to someone else. There are hard facts available (we know the size of our working set, and the available memory), so it seems this would be a feasible problem. The alternative of leaving it up to the user is less than optimal insofar as many wouldn't know about it at all, and others would be correct to think "what, can't you decide that for yourself"?

comment:8 in reply to:  3 Changed 9 years ago by Philip Taylor

Replying to jan:

There is a need for a non-constant size because the previous 96 MB is definitely too small (saw a good deal of evictions while debugging the recent assertion in headerless.cpp)

Huh, that seems weird. If I launch Arcadia, leave, launch Arcadia again, then quit, with a 96MB file cache, I get

Hits: 48 (0.097441 MB); misses 1300 (28.5927 MB); ratio: 3%

It's only loading 30MB of data but it seems to throw an entry out of the cache pretty much every time it loads a new file, so everything misses and the cache is useless. With 512MB I get pretty much the same figures - r8329 makes the cache useless again. With 1024MB it doesn't throw anything away and I get 10MB cache hits (which is all the per-map data for the start of this map).

Looks like this is because it allocates in blocks of 1MB (BLOCK_SIZE), so a 512MB cache is really a 512 file cache, and we load more than 512 files per map. The data size is irrelevant.

So it seems RAM is not an issue at all, the problem is solely virtual address space. Even if we could allocate a 1GB cache without breaking these Intel drivers, we could still only store 1024 files in it, and we'll often load more files than that, so the cache is often completely broken.

Sounds like we need to significantly reduce BLOCK_SIZE so there's not so much internal fragmentation. (Can that be done trivially, or does it need the file cache to be redesigned?). Once that problem is solved, a 96MB cache should be plenty, given that Arcadia only seems to need about 10MB of data files.

Yeah, anything that's decoded (like textures) is better cached at a higher level. However, the block cache is a big win for files in archives (even if only loaded once), because IOs have to start at even block boundaries, which usually requires rounding down the file offsets (so blocks at the edge of files get read twice).

This size issue is just about the file cache, not the block cache - the block cache only stores 16 blocks, as far as I can tell. So the small block cache makes reading mostly-adjacent files from zips faster, and the high-level texture/etc caches can prevent us from reloading files at all, and I'm not sure where the usefulness of the intermediate file cache is.

comment:9 Changed 9 years ago by Jan Wassenberg

Uhoh, good call on BLOCK_SIZE. Just reducing it would impact IO efficiency and the block cache, so what I need to do is decouple that and file cache alignment. The only constraint is that the addresses must be sector-aligned, so 4K might suffice and 64K will definitely be enough.

given that Arcadia only seems to need about 10MB of data files.

hm, that sounds rather doubtful. I suspect underreporting, since a lot of the file_stats seems to be inactive (not hooked up again after VFS redesign).

the block cache only stores 16 blocks, as far as I can tell.

Yep, I saw that - not good. Need to measure its hit rate and possibly tie its size to the file cache size.

I'm not sure where the usefulness of the intermediate file cache is.

Not everything is cached on a higher level as it should be (PMD?). Even if its hit rate is 0, the file cache is used to efficiently allocate aligned IO buffers, so its size mustn't be too small.

comment:10 in reply to:  9 Changed 9 years ago by Philip Taylor

Replying to jan:

given that Arcadia only seems to need about 10MB of data files.

hm, that sounds rather doubtful. I suspect underreporting, since a lot of the file_stats seems to be inactive (not hooked up again after VFS redesign).

Ah - if I use strace to log all open() syscalls then it looks more like 20MB of files that it's loading. Still not a huge amount.

I'm not sure where the usefulness of the intermediate file cache is.

Not everything is cached on a higher level as it should be (PMD?).

They're cached by CMeshManager (but the cache is reset after leaving a map, like most of the other high-level caches - if that's a problem it probably would be best to change their lifetimes and give them proper eviction strategies). I think the only thing that's reloaded frequently is GUI XMB files, since they're reparsed every time you e.g. open a new message box. (Could the VFS API let us tell it that these GUI files are more likely to be reloaded and should be kept in the cache longer than other files?)

comment:11 Changed 9 years ago by philip

(In [8371]) Avoid wasting address space on currently-broken cache (see #611).

comment:12 Changed 9 years ago by Kieran P

Milestone: OS Alpha 2OS Alpha 3

Pushing this back to the next release. The changes were reverted, it's now back to a constant size. Needs more investigation, but worth pursuing.

comment:13 Changed 9 years ago by Kieran P

Priority: majorcritical

comment:14 Changed 9 years ago by Kieran P

Milestone: Alpha 3Alpha 4

comment:15 Changed 9 years ago by Kieran P

Priority: criticalminor

comment:16 Changed 9 years ago by Kieran P

Type: defecttask

comment:17 Changed 9 years ago by Kieran P

Milestone: Alpha 4Alpha 5

comment:18 Changed 9 years ago by Kieran P

Milestone: Alpha 5Backlog

comment:19 Changed 9 years ago by Kieran P

Milestone: BacklogAlpha 5
Priority: Nice to HaveMust Have

comment:20 Changed 9 years ago by Jan Wassenberg

There's finally a good solution, motivated (amusingly enough) by the O_DIRECT rant and by #832. See that ticket for details.

comment:21 Changed 9 years ago by Jan Wassenberg

Resolution: fixed
Status: newclosed

(In [9445]) fix ChooseCacheSize? - more simple, no longer relies on available-memory reporting (because Linux has a weird understanding of "available") closes #611, refs #832

Note: See TracTickets for help on using tickets.