Opened 13 years ago

Closed 13 years ago

#985 closed defect (fixed)

test show: "APIC: all zero"

Reported by: fabio Owned by: Jan Wassenberg
Priority: Should Have Milestone: Alpha 8
Component: Core engine Keywords:
Cc: Patch:

Description

Since some days when I run the tests on my Ubuntu 10.10 I get this output:

Running 253 tests.............................................................APIC: all zero
................................................................................................................................................................................................OK!

Dunno if it's a problem or not...

Attachments (3)

system_info.txt (5.2 KB ) - added by fcxSanya 13 years ago.
system_info-Ben.txt (1.6 KB ) - added by historic_bruno 13 years ago.
lcpu_affinity_547.patch (2.9 KB ) - added by Jan Wassenberg 13 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 by fcxSanya, 13 years ago

This is debug message from source/lib/sysdep/arch/x86_x64/apic.cpp:57, Jan should know what it means. I sent him forum message.

comment:2 by Jan Wassenberg, 13 years ago

Owner: set to Jan Wassenberg
Status: newassigned

Thanks very much for posting this (and Alex for notifying me). So the CPU is not returning sensible APIC IDs, which is what triggered a recent bug (because that code path didn't receive much testing).

I'd love to understand why it is happening. Would you please attach your logs/system_info.txt?

comment:3 by fcxSanya, 13 years ago

I just ran tests and I see such message too. When I ran 0ad there is also following message:

$ ./pyrogenesis
Cache: 500 (total: 3960) MiB
TIMER| InitVfs: 253.196 ms
TIMER| InitScripting: 6.3663 ms
TIMER| CONFIG_Init: 234.818 ms
APIC: not unique
<...>

by fcxSanya, 13 years ago

Attachment: system_info.txt added

by historic_bruno, 13 years ago

Attachment: system_info-Ben.txt added

comment:4 by historic_bruno, 13 years ago

Uploaded mine too, since I get this error and it may help determine what we all have in common :)

comment:5 by Jan Wassenberg, 13 years ago

Thanks for the info, guys! I see what's happening with Ben's machine - it's just a single processor, not even HT-enabled, so the only APIC ID is 0. heh, haven't seen such a CPU in a loong time. I will add code to fix that later, but it doesn't really matter in a single-processor context anyway.

Alex: your issue is much more worrisome :S Please insert the following after the current line 74 of apic.cpp:

	for(size_t i = 0; i < numIds; i++)
		debug_printf(L"APIC %d: 0x%x\n", i, processorApicIds[i]);

and also a debug_printf immediately after the "generate fake but legitimate APIC IDs" comment to indicate whether that is happening.

comment:6 by fcxSanya, 13 years ago

Changed code:

static Status InitApicIds()

{

	const Status status = GetAndValidateApicIds();
	for(size_t i = 0; i < numIds; i++)
		debug_printf(L"APIC %d: 0x%x\n", i, processorApicIds[i]);

	if(status < 0)	// failed

	{

		// generate fake but legitimate APIC IDs
		debug_printf(L"generate fake APIC IDs code reached\n");

		for(size_t processor = 0; processor < numIds; processor++)

			processorApicIds[processor] = sortedApicIds[processor] = (ApicId)processor;

	}



	return status;

}

Few consecutive runs of tests (for .pyrogenesis similar results):

$ ./test
Running 253 tests..................................................................................................................................APIC: not unique
APIC 0: 0x2
APIC 1: 0x2
APIC 2: 0x2
APIC 3: 0x2
generate fake APIC IDs code reached
...........................................................................................................................OK!
$ ./test
Running 253 tests..................................................................................................................................APIC: not unique
APIC 0: 0x1
APIC 1: 0x1
APIC 2: 0x1
APIC 3: 0x1
generate fake APIC IDs code reached
...........................................................................................................................OK!
$ ./test
Running 253 tests..................................................................................................................................APIC: not unique
APIC 0: 0x3
APIC 1: 0x3
APIC 2: 0x3
APIC 3: 0x3
generate fake APIC IDs code reached
...........................................................................................................................OK!
$ ./test
Running 253 tests..................................................................................................................................APIC: all zero
APIC 0: 0x0
APIC 1: 0x0
APIC 2: 0x0
APIC 3: 0x0
generate fake APIC IDs code reached
...........................................................................................................................OK!

Other values than 0x0, 0x1, 0x2 and 0x3 don't appears.

This explains why there was "all zero" for tests and "not unique" for pyrogenesis yesterday :)

comment:7 by Jan Wassenberg, 13 years ago

Aha! Thanks very much for the output :D It looks like the APIC IDs are legitimate (0..3 for your quad-core CPU). However, the problem is that we disabled affinity support on Linux :( (#547) I've attached a patch that hopes to fix this - it's unfortunately untested since I only run Windows. Would you please give it a try?

Reference as to the functions that were used: http://www.unix.com/man-page/Linux/3/CPU_ISSET_S/

comment:8 by fcxSanya, 13 years ago

It don't builds now:

../../../source/lib/sysdep/os/linux/lcpu.cpp: In function ‘uintptr_t os_cpu_SetThreadAffinityMask(uintptr_t)’:
../../../source/lib/sysdep/os/linux/lcpu.cpp:104:25: error: ‘CONFIG_NR_CPUS’ was not declared in this scope
../../../source/lib/sysdep/os/linux/lcpu.cpp:127:2: error: ‘ret’ was not declared in this scope
make[1]: *** [obj/lowlevel_Release/lcpu.o] Error 1
make: *** [lowlevel] Error 2

Second error is just 'int' missing before ret. But I don't know what to do with first one. Where CONFIG_NR_CPUS is defined?

comment:9 by fcxSanya, 13 years ago

I asked Philip about CONFIG_NR_CPUS:

<fcxSanya> Philip`, http://trac.wildfiregames.com/ticket/985#comment:8 - do you know where CONFIG_NR_CPUS is defined (or maybe it should not be defined)?
<Philip`> fcxSanya: It's defined in the kernel source code, I believe
<Philip`> (not a user-space thing)
<fcxSanya> hm, so we can't use it?
<Philip`> No
<Philip`> hence the comment in the pre-patched code about having to loop to detect it at run-time

by Jan Wassenberg, 13 years ago

Attachment: lcpu_affinity_547.patch added

comment:10 by Jan Wassenberg, 13 years ago

Thanks for the feedback and asking Philip! Too bad we can't get at that config variable - I had hoped the header used when building Linux was available to apps. I bit the bullet and implemented the looping logic - please give it another go :)

comment:11 by fcxSanya, 13 years ago

Looks working now:

$ ./test
Running 253 tests..................................................................................................................................APIC 0: 0x0
APIC 1: 0x1
APIC 2: 0x2
APIC 3: 0x3
...........................................................................................................................OK!

$ ./pyrogenesis 
Cache: 500 (total: 3960) MiB
TIMER| InitVfs: 91.8194 ms
TIMER| InitScripting: 6.44537 ms
TIMER| CONFIG_Init: 162.187 ms
APIC 0: 0x0
APIC 1: 0x1
APIC 2: 0x2
APIC 3: 0x3
TIMER| RunHardwareDetection: 33.2567 ms
<...>

in reply to:  10 comment:12 by Philip Taylor, 13 years ago

Replying to jan:

Too bad we can't get at that config variable - I had hoped the header used when building Linux was available to apps.

The problem isn't just that it's inaccessible - people could recompile their kernel with different configuration and then run their old applications, so applications could never safely rely on that value at compile-time.

I don't think there's any reason that CONFIG_NR_CPUS should be expected to have a maximum of 4096 in DetectMaxCpus, so that loop probably shouldn't have an upper limit. (If we knew an upper limit then we could just hard-code an allocation of that amount instead of looping.)

It'd perhaps be nice for the patch to retain a reference to #547 since that explains some of the relevant context.

comment:13 by Jan Wassenberg, 13 years ago

(In [10373]) proper fix for brain-dead Linux affinity API (that code was disabled in #547) refs #985

comment:14 by Jan Wassenberg, 13 years ago

Alex: thanks for testing! I have just committed the patch after further minor changes to reflect Philip's comments. #547 and the problem with CONFIG_NR_CPUS are now mentioned. I've also lifted the limit to 64K, although even 4K is pretty much a stretch for current shared-memory machines. A (large) static limit might cause problems if Linux decides to reject TOO large cpu_sets, so this loop approach is better. I believe a cutoff is helpful to prevent infinite loops in the case of some other failure within sched_getaffinity.

Leaving this ticket open until the zero problem is also fixed.

comment:15 by Jan Wassenberg, 13 years ago

Resolution: fixed
Status: assignedclosed

(In [10374]) apic: avoid warning message for ancient single-core, non-HT processors. fixes #985 wvm: improve diagnostics if allocation failed

Note: See TracTickets for help on using tickets.