1 Introduction
2 Design of Tuned Numerical Libraries
  2.1 Well-Designed Numerical Software Libraries
    2.1.1 Portability
    2.1.2 Performance
    2.1.3 Scalability
  2.2 Motivation for Automatic Tuning
3 Automatically Tuned Numerical Kernels: the Dense Case
  3.1 AEOS -- Fundamentals of Applying Empirical Techniques to Optimization
    3.1.1 Elements of the AEOS Method.
  3.2 ATLAS Overview
  3.3 Tuning the Level 3 BLAS Using a Simple Kernel
  3.4 Overview of ATLAS GEMM Kernel Search
  3.5 Source Generator Details
  3.6 The Importance of Multiple Implementation
4 Automatically Tuned Numerical Kernels: the Sparse Case
  4.1 Challenges and Surprises in Tuning
    4.1.1 Overheads Due to Sparse Data Structures
    4.1.2 Surprising Performance Behavior in Practice
  4.2 A Hybrid Off-line/Run-time Empirical Search-Based Approach to Tuning
    4.2.1 Example: Using Empirical Models and Search to Select a Register Block Size
    4.2.2 Contrast to a Static Compilation Approach
  4.3 Summary of Optimization Techniques and Speedups
    4.3.1 Techniques for Sparse Matrix-Vector Multiply
    4.3.2 Techniques for Other Sparse Kernels
  4.4 Remaining Challenges and Related Work
5 Statistical determination of numerical algorithms
  5.1 Dynamic Algorithm Determination
  5.2 Statistical analysis
    5.2.1 Feature extraction
    5.2.2 Training stage
  5.3 Numerical test
  5.4 Technical Approach
  5.5 Results
6 Conclusion