Because (1,) is a tuple of constants, the compiler folds it into a single constant at compile time, so no tuple is built at run time. The list, on the other hand, is created every time the expression runs. Look at dis.dis:
>>> import dis
>>> dis.dis('a.extend((1,))')
  1           0 LOAD_NAME                0 (a)
              2 LOAD_METHOD              1 (extend)
              4 LOAD_CONST               0 ((1,))
              6 CALL_METHOD              1
              8 RETURN_VALUE
>>> dis.dis('a.extend([1])')
  1           0 LOAD_NAME                0 (a)
              2 LOAD_METHOD              1 (extend)
              4 LOAD_CONST               0 (1)
              6 BUILD_LIST               1
              8 CALL_METHOD              1
             10 RETURN_VALUE
Notice that it takes fewer bytecode instructions and merely does a LOAD_CONST of (1,). For the list, on the other hand, 1 is loaded with LOAD_CONST and then BUILD_LIST constructs a new list from it.
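If you want to measure the difference yourself, here is a minimal timeit sketch. The list name a, the number and repeat counts, and resetting a in setup are choices made here for illustration; the actual timings will depend on your machine and Python version, so no numbers are claimed.

```python
import timeit

# setup runs once per repeat, so each repeat starts from an empty list `a`.
# number=100_000 is an arbitrary choice that keeps the list from growing huge.
setup = "a = []"
tuple_time = min(timeit.repeat("a.extend((1,))", setup=setup, number=100_000, repeat=5))
list_time = min(timeit.repeat("a.extend([1])", setup=setup, number=100_000, repeat=5))

print(f"extend((1,)): {tuple_time:.4f} s")
print(f"extend([1]):  {list_time:.4f} s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.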
Note that you can access these constants on the code object:
>>> code = compile('a.extend((1,))', '', 'eval')
>>> code
<code object <module> at 0x10e91e0e0, file "", line 1>
>>> code.co_consts
((1,),)
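One visible consequence of the folding: the tuple comes back as the same object on every evaluation, because LOAD_CONST just pushes the stored constant, whereas BUILD_LIST makes a new list each time. The function names below are hypothetical, and the identity check reflects CPython's behavior rather than a language guarantee.

```python
def with_tuple():
    return (1,)  # folded into a single constant on the code object

def with_list():
    return [1]   # BUILD_LIST runs on every call

# The folded tuple is stored in co_consts and reused on every call.
print((1,) in with_tuple.__code__.co_consts)  # True
print(with_tuple() is with_tuple())           # True on CPython

# Each call builds a brand-new list object.
print(with_list() is with_list())             # False
```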
Finally, as to why += is faster than .extend, look at the bytecode again:
>>> dis.dis('a += b')
  1           0 LOAD_NAME                0 (a)
              2 LOAD_NAME                1 (b)
              4 INPLACE_ADD
              6 STORE_NAME               0 (a)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> dis.dis('a.extend(b)')
  1           0 LOAD_NAME                0 (a)
              2 LOAD_METHOD              1 (extend)
              4 LOAD_NAME                2 (b)
              6 CALL_METHOD              1
              8 RETURN_VALUE
You'll notice that .extend first requires resolving the method with LOAD_METHOD (an attribute lookup, which takes extra time) and then calling it. The operator, on the other hand, has its own bytecode, INPLACE_ADD, so everything is pushed down into the C layer (plus, special methods such as __iadd__ skip the instance namespace and a bunch of other hoopla and are looked up directly on the type).
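A small sketch of that last point: += looks up __iadd__ on the type, so an attribute stored on the instance is ignored, while a.extend goes through normal attribute lookup, which checks the instance first. Plain lists have no instance __dict__, so the sketch uses a hypothetical subclass MyList and a hypothetical helper boom purely for demonstration.

```python
class MyList(list):
    """A list subclass whose instances have a __dict__, unlike plain lists."""

def boom(*args):
    # Raises if the instance attribute is ever actually called.
    raise RuntimeError("instance attribute was used")

a = MyList([1, 2])

# Shadow both names on the *instance*, not on the class.
a.__iadd__ = boom
a.extend = boom

# += uses type(a).__iadd__ (inherited from list), so the instance attribute is skipped.
a += [3]
print(a)  # [1, 2, 3]

# a.extend is resolved by normal attribute lookup and finds the instance attribute.
try:
    a.extend([4])
except RuntimeError as exc:
    print("extend:", exc)  # extend: instance attribute was used
```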