; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 -O0 < %s | FileCheck %s

; Check that at -O0, the backend doesn't attempt to canonicalize a vector load
; used by an INSERTPS into a scalar load plus scalar_to_vector.
;
; In order to fold a load into the memory operand of an INSERTPSrm, the backend
; tries to canonicalize a vector load in input to an INSERTPS node into a
; scalar load plus scalar_to_vector. This would allow ISel to match the
; INSERTPSrm variant rather than a load plus INSERTPSrr.
;
; However, ISel can only select an INSERTPSrm if folding a load into the operand
; of an insertps is considered to be profitable.
;
; In the example below:
;
; __m128 test(__m128 a, __m128 *b) {
;   __m128 c = _mm_insert_ps(a, *b, 1 << 6);
;   return c;
; }
;
; At -O0, the backend would attempt to canonicalize the load to 'b' into
; a scalar load in the hope of matching an INSERTPSrm.
; However, ISel would fail to recognize an INSERTPSrm since load folding is
; always considered unprofitable at -O0. This would leave the insertps mask
; in an invalid state.
;
; The problem with the canonicalization rule performed by the backend is that
; it assumes ISel to always be able to match an INSERTPSrm. This assumption is
; not always correct at -O0. In this example, FastISel fails to lower the
; arguments needed by the entry block. This is enough to enable the DAGCombiner
; and eventually trigger the canonicalization on the INSERTPS node.
;
; This test checks that the vector load in input to the insertps is not
; canonicalized into a scalar load plus scalar_to_vector (a movss).
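;
; As a worked reading of the immediate used in this test (a sketch based on the
; documented INSERTPS imm8 layout; the field names COUNT_S, COUNT_D and ZMASK
; come from the instruction reference, not from this file):
;
;   imm8 = 1 << 6 = 64 = 0b01_00_0000
;     bits [7:6] COUNT_S = 1   source element select (register form only)
;     bits [5:4] COUNT_D = 0   destination element
;     bits [3:0] ZMASK   = 0   no elements are zeroed
;
; In the register form, this copies element 1 of the loaded vector into
; element 0 of 'a'. The memory form reads a single 32-bit float and ignores
; COUNT_S, so a rewrite to a scalar load presumably has to read from offset
; COUNT_S * 4 = 4 bytes and pair that with an adjusted immediate. If the
; memory form is then not selected, the register form would see an immediate
; that no longer matches its operand, which is one plausible reading of the
; "invalid state" described above.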

define <4 x float> @test(<4 x float> %a, <4 x float>* %b) {
; CHECK-LABEL: test:
; CHECK: movaps (%rdi), [[REG:%[a-z0-9]+]]
; CHECK-NOT: movss
; CHECK: insertps $64, [[REG]],
; CHECK: ret
entry:
  %0 = load <4 x float>, <4 x float>* %b, align 16
  %1 = call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %a, <4 x float> %0, i32 64)
  %2 = alloca <4 x float>, align 16
  store <4 x float> %1, <4 x float>* %2, align 16
  %3 = load <4 x float>, <4 x float>* %2, align 16
  ret <4 x float> %3
}


declare <4 x float> @llvm.x86.sse41.insertps(<4 x float>, <4 x float>, i32)