GCC上归纳变量的优化.pdf_三一文库31doc.com

资源描述

《GCC上归纳变量的优化.pdf》由会员分享，可在线阅读，更多相关《GCC上归纳变量的优化.pdf（23页珍藏版）》请在三一文库上搜索。

1、Induction-variable Optimizations in GCC 程斌 2013.11 Outline Background Implementation of GCC Learned points Shortcomings Improvements Question for (i = 0; i cand 0: cand 1: cand 1: cand 2: cand 2: cand 3: cand 3: inta100; int *iv1 = a, iv2 = 202; for (; iv2 != 2;) *iv1 = iv2; iv1 += 4; iv2 -= 2; int

2、a100; int *iv1 = a, iv2 = 202; for (; iv2 != 2;) *iv1 = iv2; iv1 += 4; iv2 -= 2; Implementation of GCC Example IR before IVOPT IV uses IV candidates : # i_14 = PHI i.0_4 = (unsigned int) i_14; _5 = i.0_4 * 4; _7 = a_6(D) + _5; _8 = 101 - i_14; _9 = _8 * 2; *_7 = _9; i_11 = i_14 + 1; if (i_11 != 100)

3、 goto ; else goto ; : # i_14 = PHI i.0_4 = (unsigned int) i_14; _5 = i.0_4 * 4; _7 = a_6(D) + _5; _8 = 101 - i_14; _9 = _8 * 2; *_7 = _9; i_11 = i_14 + 1; if (i_11 != 100) goto ; else goto ; use 0: use 1: use 2: use 0: use 1: use 2: cand 0: cand 1: cand 2: cand 3: cand 4: cand 0: cand 1: cand 2: can

4、d 3: cand 4: Implementation of GCC Example Representation Costs use 0use 1use 2 cand 0 880 cand 1 0260 cand 2 NA60 cand 3 NA20 cand 4 NA20 use 0use 1use 2 cand 0 202-2*iv_canda + 4 * iv_candiv_cand != 100 cand 1 iv_canda+2*(202-iv_cand)iv_cand != 2 cand 2 NAMEMiv_candiv_cand != a+400 cand 3 NAMEMiv_

5、candiv_cand != a+400 cand 4 NAMEMiv_candiv_cand != a+400 Implementation of GCC Example Choose optimal iv set Rewrite iv uses and remove dead code : iv.9_2 = (unsigned int) a_6(D); : # i_14 = PHI # iv.6_3 = PHI # iv.9_1 = PHI _9 = (int) iv.6_3; _16 = (void *) iv.9_1; MEMbase: _16, offset: 0B = _9; iv

6、.9_3 = iv.9_1 + 4; iv.6_2 = iv.6_3 - 2; if (iv.6_2 != 2) goto ; else goto ; : iv.9_2 = (unsigned int) a_6(D); : # i_14 = PHI # iv.6_3 = PHI # iv.9_1 = PHI _9 = (int) iv.6_3; _16 = (void *) iv.9_1; MEMbase: _16, offset: 0B = _9; iv.9_3 = iv.9_1 + 4; iv.6_2 = iv.6_3 - 2; if (iv.6_2 != 2) goto ; else g

7、oto ; : # i_14 = PHI i.0_4 = (unsigned int) i_14; _5 = i_14 * 4; _7 = a_6(D) + _5; _8 = 101 - i_14; _9 = _8 * 2; *_7 = _9; i_11 = i_14 + 1; if (i_11 != 100) goto ; else goto ; : # i_14 = PHI i.0_4 = (unsigned int) i_14; _5 = i_14 * 4; _7 = a_6(D) + _5; _8 = 101 - i_14; _9 = _8 * 2; *_7 = _9; i_11 =

8、i_14 + 1; if (i_11 != 100) goto ; else goto ; _5 = i_14 * 4; _7 = a_6(D) + _5; _8 = 101 - i_14; _5 = i_14 * 4; _7 = a_6(D) + _5; _8 = 101 - i_14; i_11 = i_14 + 1;i_11 = i_14 + 1; use:0 cand:0 cand:1 cand:1 use:1 cand:0 cand:0 cand:4 use:2 cand:0 cand:0 cand:1 Implementation of GCC Learned points Inf

9、rastructure Unified algorithm Cost Model rtx cost, like “+,-,*,/,%,.” address cost Context information Loop invariant Register pressure Interact with other optimizations IVOPT ALGORITHM IVOPT ALGORITHM INPUTOUTPUT Cost Model addressing mode reg reg + reg reg + regconst reg + const! reg, #const Short

10、comings implementation fine tuned only for x86/x86_64 scaled addressing mode not supported for ARM, etc. support of auto-increment addressing mode is weak for ARM inta100, i; for (i = 0; i 100; i+) MEMa+i2 = 202 - 2*i inta100, i; for (i = 0; i 100; i+) MEMa+i2 = 202 - 2*i inta100; int *iv1 = a, iv2

11、= 202; for (; iv2 != 2;) MEM(iv1, #4) = iv2; iv2 -= 2; inta100; int *iv1 = a, iv2 = 202; for (; iv2 != 2;) MEM(iv1, #4) = iv2; iv2 -= 2; addressing mode reg reg + reg reg + regconst reg + const! reg, #const inta100; int *iv1 = a, iv2 = 202; for (; iv2 != 2;) *iv1 = iv2; iv1 += 4; iv2 -= 2; inta100;

12、int *iv1 = a, iv2 = 202; for (; iv2 != 2;) *iv1 = iv2; iv1 += 4; iv2 -= 2; inta100, i; for (i = 0; i 100; i+) tmp = a + i*4 MEMtmp = 202 - 2*i inta100, i; for (i = 0; i 100; i+) tmp = a + i*4 MEMtmp = 202 - 2*i Shortcomings implementation iv uses outside of loop are not handled along loop exit edges

13、 induction variables in both branches of IF- statement are not recognized ai2 = 0 i1 = i2 i2 = i2 + 1 if (i2 100) use i1 ai2 = 0 i2 = i2 + 1 if (i2 100) use (i2-1) inta100, i = 0; while (i 100) if (i % 2 = 0) ai = 0; i+; else ai = 1; i+; inta100, i = 0; while (i 100) if (i % 2 = 0) ai = 0; i+; else

14、ai = 1; i+; Shortcomings implementation complex address expression hard to process inaccurate cost x=t+c or t=a+c; x=t+b x=a+b+c t=a+b; x=t+c or t=a+c; x=t+b IVOPT Algorithm IVOPT Algorithm RTL passesRTL passes TREE IR a.S TREE IR RTL IR Cost Model IVOPT ALGORITHM IVOPT ALGORITHM INPUTOUTPUT Cost Mo

15、del Improvements Work & patches Support scaled addressing mode for architectures other than x86/x86_64. http:/gcc.gnu.org/ml/gcc-patches/2013-08/msg01642.html http:/gcc.gnu.org/ml/gcc-patches/2013-09/msg01927.html Simplify address expressions for IVOPT http:/gcc.gnu.org/ml/gcc-patches/2013-11/msg005

16、37.html http:/gcc.gnu.org/ml/gcc-patches/2013-11/msg01075.html Fix wrong address cost for auto-increment addressing mode http:/gcc.gnu.org/ml/gcc-patches/2013-11/msg00156.html Expedite the use of auto-increment addressing mode http:/gcc.gnu.org/ml/gcc-patches/2013-09/msg00034.html more patches comin

17、g. Compute outside loop iv uses along exit edges http:/gcc.gnu.org/ml/gcc-patches/2013-11/msg00535.html Improve the optimal iv set choosing algorithm patches coming. Miscellaneous fixes for IVOPT . Improvements Work & patches Following work Fix address cost mode and RTL passes, e.g. fwprop_addr addr

18、 moderegreg+regreg+regireg+offsetreg, #off cost64320 Improvements Benchmark data Performance Coremark EEMBC_v1 automotive, office, consumer, telecom Spec2000 Code size CSIBE Question and Answer Thank You! References source code & internal & mailing list gcc.gnu.org Advanced compiler design and implementation Symbolic Evaluation of Chains of Recurrences for Loop Optimization, etc.

展开阅读全文