Opened 13 years ago
Closed 13 years ago
#985 closed defect (fixed)
test show: "APIC: all zero"
Reported by: | fabio | Owned by: | Jan Wassenberg |
---|---|---|---|
Priority: | Should Have | Milestone: | Alpha 8 |
Component: | Core engine | Keywords: | |
Cc: | Patch: |
Description
Since some days when I run the tests on my Ubuntu 10.10 I get this output:
Running 253 tests.............................................................APIC: all zero ................................................................................................................................................................................................OK!
Dunno if it's a problem or not...
Attachments (3)
Change History (18)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
Thanks very much for posting this (and Alex for notifying me). So the CPU is not returning sensible APIC IDs, which is what triggered a recent bug (because that code path didn't receive much testing).
I'd love to understand why it is happening. Would you please attach your logs/system_info.txt?
comment:3 by , 13 years ago
I just ran tests and I see such message too. When I ran 0ad there is also following message:
$ ./pyrogenesis Cache: 500 (total: 3960) MiB TIMER| InitVfs: 253.196 ms TIMER| InitScripting: 6.3663 ms TIMER| CONFIG_Init: 234.818 ms APIC: not unique <...>
by , 13 years ago
Attachment: | system_info.txt added |
---|
by , 13 years ago
Attachment: | system_info-Ben.txt added |
---|
comment:4 by , 13 years ago
Uploaded mine too, since I get this error and it may help determine what we all have in common :)
comment:5 by , 13 years ago
Thanks for the info, guys! I see what's happening with Ben's machine - it's just a single processor, not even HT-enabled, so the only APIC ID is 0. heh, haven't seen such a CPU in a loong time. I will add code to fix that later, but it doesn't really matter in a single-processor context anyway.
Alex: your issue is much more worrisome :S Please insert the following after the current line 74 of apic.cpp:
for(size_t i = 0; i < numIds; i++) debug_printf(L"APIC %d: 0x%x\n", i, processorApicIds[i]);
and also a debug_printf immediately after the "generate fake but legitimate APIC IDs" comment to indicate whether that is happening.
comment:6 by , 13 years ago
Changed code:
static Status InitApicIds() { const Status status = GetAndValidateApicIds(); for(size_t i = 0; i < numIds; i++) debug_printf(L"APIC %d: 0x%x\n", i, processorApicIds[i]); if(status < 0) // failed { // generate fake but legitimate APIC IDs debug_printf(L"generate fake APIC IDs code reached\n"); for(size_t processor = 0; processor < numIds; processor++) processorApicIds[processor] = sortedApicIds[processor] = (ApicId)processor; } return status; }
Few consecutive runs of tests (for .pyrogenesis
similar results):
$ ./test Running 253 tests..................................................................................................................................APIC: not unique APIC 0: 0x2 APIC 1: 0x2 APIC 2: 0x2 APIC 3: 0x2 generate fake APIC IDs code reached ...........................................................................................................................OK! $ ./test Running 253 tests..................................................................................................................................APIC: not unique APIC 0: 0x1 APIC 1: 0x1 APIC 2: 0x1 APIC 3: 0x1 generate fake APIC IDs code reached ...........................................................................................................................OK! $ ./test Running 253 tests..................................................................................................................................APIC: not unique APIC 0: 0x3 APIC 1: 0x3 APIC 2: 0x3 APIC 3: 0x3 generate fake APIC IDs code reached ...........................................................................................................................OK! $ ./test Running 253 tests..................................................................................................................................APIC: all zero APIC 0: 0x0 APIC 1: 0x0 APIC 2: 0x0 APIC 3: 0x0 generate fake APIC IDs code reached ...........................................................................................................................OK!
Other values than 0x0, 0x1, 0x2 and 0x3 don't appears.
This explains why there was "all zero" for tests and "not unique" for pyrogenesis yesterday :)
comment:7 by , 13 years ago
Aha! Thanks very much for the output :D It looks like the APIC IDs are legitimate (0..3 for your quad-core CPU). However, the problem is that we disabled affinity support on Linux :( (#547) I've attached a patch that hopes to fix this - it's unfortunately untested since I only run Windows. Would you please give it a try?
Reference as to the functions that were used: http://www.unix.com/man-page/Linux/3/CPU_ISSET_S/
comment:8 by , 13 years ago
It don't builds now:
../../../source/lib/sysdep/os/linux/lcpu.cpp: In function ‘uintptr_t os_cpu_SetThreadAffinityMask(uintptr_t)’: ../../../source/lib/sysdep/os/linux/lcpu.cpp:104:25: error: ‘CONFIG_NR_CPUS’ was not declared in this scope ../../../source/lib/sysdep/os/linux/lcpu.cpp:127:2: error: ‘ret’ was not declared in this scope make[1]: *** [obj/lowlevel_Release/lcpu.o] Error 1 make: *** [lowlevel] Error 2
Second error is just 'int' missing before ret. But I don't know what to do with first one. Where CONFIG_NR_CPUS is defined?
comment:9 by , 13 years ago
I asked Philip about CONFIG_NR_CPUS:
<fcxSanya> Philip`, http://trac.wildfiregames.com/ticket/985#comment:8 - do you know where CONFIG_NR_CPUS is defined (or maybe it should not be defined)? <Philip`> fcxSanya: It's defined in the kernel source code, I believe <Philip`> (not a user-space thing) <fcxSanya> hm, so we can't use it? <Philip`> No <Philip`> hence the comment in the pre-patched code about having to loop to detect it at run-time
by , 13 years ago
Attachment: | lcpu_affinity_547.patch added |
---|
follow-up: 12 comment:10 by , 13 years ago
Thanks for the feedback and asking Philip! Too bad we can't get at that config variable - I had hoped the header used when building Linux was available to apps. I bit the bullet and implemented the looping logic - please give it another go :)
comment:11 by , 13 years ago
Looks working now:
$ ./test Running 253 tests..................................................................................................................................APIC 0: 0x0 APIC 1: 0x1 APIC 2: 0x2 APIC 3: 0x3 ...........................................................................................................................OK! $ ./pyrogenesis Cache: 500 (total: 3960) MiB TIMER| InitVfs: 91.8194 ms TIMER| InitScripting: 6.44537 ms TIMER| CONFIG_Init: 162.187 ms APIC 0: 0x0 APIC 1: 0x1 APIC 2: 0x2 APIC 3: 0x3 TIMER| RunHardwareDetection: 33.2567 ms <...>
comment:12 by , 13 years ago
Replying to jan:
Too bad we can't get at that config variable - I had hoped the header used when building Linux was available to apps.
The problem isn't just that it's inaccessible - people could recompile their kernel with different configuration and then run their old applications, so applications could never safely rely on that value at compile-time.
I don't think there's any reason that CONFIG_NR_CPUS
should be expected to have a maximum of 4096 in DetectMaxCpus
, so that loop probably shouldn't have an upper limit. (If we knew an upper limit then we could just hard-code an allocation of that amount instead of looping.)
It'd perhaps be nice for the patch to retain a reference to #547 since that explains some of the relevant context.
comment:13 by , 13 years ago
comment:14 by , 13 years ago
Alex: thanks for testing! I have just committed the patch after further minor changes to reflect Philip's comments. #547 and the problem with CONFIG_NR_CPUS are now mentioned. I've also lifted the limit to 64K, although even 4K is pretty much a stretch for current shared-memory machines. A (large) static limit might cause problems if Linux decides to reject TOO large cpu_sets, so this loop approach is better. I believe a cutoff is helpful to prevent infinite loops in the case of some other failure within sched_getaffinity.
Leaving this ticket open until the zero problem is also fixed.
comment:15 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
This is debug message from
source/lib/sysdep/arch/x86_x64/apic.cpp:57
, Jan should know what it means. I sent him forum message.