BeBOP: pOSKI
v1.0.0
parallel Optimized Sparse Kernel Interface library
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <poski/poski_threadcommon.h>
#include <poski/poski_matrixcommon.h>
#include <poski/poski_tunematcommon.h>
#include <poski/poski_matmulttype.h>
#include <poski/poski_kerneltype.h>
#include <poski/poski_malloc.h>
#include <poski/poski_print.h>
Functions
char * poski_GetMatTransforms (const poski_matstruct_t A_tunable)
    Returns the estimated number of seconds available for tuning.
int poski_TuneMat_run (poski_mat_t A_tunable)
    Tune matrix using the selected threading model.
int poski_TuneHint_Structure_run (poski_mat_t A_tunable, poski_tunehint_t hint, int k, int *r, int *c)
char* poski_GetMatTransforms (const poski_matstruct_t A_tunable)
Returns the estimated number of seconds available for tuning.
Fraction of observed workload available for tuning. Fraction of hint workload available for tuning. The estimate is based on the larger of the following two quantities:
Implement this routine, given that we will need to allocate temporary vectors.
Check that the new data structure really is faster than the old.
Basic outline of this routine's implementation:
    WHILE !IsTuned(A)
          AND tuning_time_left > 0
          AND i_heur <= NUM_HEURISTICS DO

        LET heur = GetHeuristic( i_heur );
        LET results = NULL

        IF GetTotalCostEstimate(heur, A) <= tuning_time_left THEN
            LET t0 = GetTimer();
            results = EvaluateHeuristic( heur, A );
                // results == NULL if heuristic does not apply to A
            LET elapsed_time = GetTimer() - t0;
            tuning_time_left -= elapsed_time;
        ENDIF

        IF results THEN
            LET t0 = GetTimer();
            A_tuned = ApplyHeuristic( heur, results, A );
                // convert A to new data structure
            LET elapsed_time = GetTimer() - t0;
            tuning_time_left -= elapsed_time;

            A = ChooseFastest( A, A_tuned, A->trace );
        ENDIF

        i_heur = i_heur + 1;
    DONE
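The outline above can be sketched in C as a time-budgeted loop over candidate heuristics. This is only an illustration, not the pOSKI implementation: every `sketch_*` name is hypothetical, timing is simulated with fixed per-heuristic costs instead of a real timer, indices are 0-based, and the sketch assumes a matrix counts as tuned once a faster format has been chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tunable matrix (not a pOSKI type). */
typedef struct {
    double time_per_mult; /* modeled cost of one multiply in the current format */
    int is_tuned;         /* assumption: set once a faster format is adopted */
} sketch_mat_t;

/* Hypothetical stand-in for one tuning heuristic. */
typedef struct {
    double est_cost;            /* estimated total cost of trying it, in seconds */
    double eval_time;           /* simulated time spent evaluating it */
    double apply_time;          /* simulated time spent converting the matrix */
    int applies;                /* 0 if the heuristic does not apply to the matrix */
    double tuned_time_per_mult; /* modeled multiply cost after conversion */
} sketch_heur_t;

/*
 * Try heuristics in order until the matrix is tuned, the time budget is
 * exhausted, or no heuristics remain. Returns the remaining budget.
 */
static double sketch_tune(sketch_mat_t *A, const sketch_heur_t *heurs,
                          size_t n, double time_left)
{
    assert(A != NULL && (heurs != NULL || n == 0));

    for (size_t i = 0; !A->is_tuned && time_left > 0.0 && i < n; i++) {
        const sketch_heur_t *h = &heurs[i];
        int have_results = 0;

        /* Evaluate only if the estimated cost fits in the budget. */
        if (h->est_cost <= time_left) {
            time_left -= h->eval_time;
            have_results = h->applies;
        }

        if (have_results) {
            /* Converting to the new data structure also costs time. */
            time_left -= h->apply_time;

            /* Stand-in for ChooseFastest: keep the new format only if it wins. */
            if (h->tuned_time_per_mult < A->time_per_mult) {
                A->time_per_mult = h->tuned_time_per_mult;
                A->is_tuned = 1;
            }
        }
    }
    return time_left;
}
```

Note that, as in the outline, evaluation and conversion time is charged against the budget even when the converted structure turns out to be slower and is discarded.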
[in] A_tunable: Valid matrix handle.
int poski_TuneHint_Structure_run (poski_mat_t A_tunable, poski_tunehint_t hint, int k, int *r, int *c)
Attempt to allocate space for the list of block sizes. If the allocation fails, we still record that a mix of block sizes is present, but do not record the list itself. This behavior is acceptable because the implementation is free to honor or disregard hints as desired.
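A minimal sketch of this best-effort behavior follows. It is not the pOSKI code: the `sketch_blockhint_t` structure, the `sketch_set_block_hint` function, and the injectable allocator are all hypothetical, introduced only so that the failure path can be exercised. The allocator is assumed to return memory that `free` can release.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of a block-structure hint (not a pOSKI type). */
typedef struct {
    int has_mixed_blocks; /* the hint itself: a mix of block sizes is present */
    int num_sizes;        /* number of (r, c) pairs recorded, 0 if none */
    int *r_sizes;         /* row block sizes, or NULL if not recorded */
    int *c_sizes;         /* column block sizes, or NULL if not recorded */
} sketch_blockhint_t;

/* Allocator signature compatible with malloc, so failures can be injected. */
typedef void *(*sketch_alloc_fn)(size_t);

/* Allocator that always fails, used to exercise the fallback path. */
static void *sketch_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/*
 * Record the hint. Returns 1 if the list of k block sizes was stored,
 * or 0 if only the "mixed block sizes" flag could be recorded.
 */
static int sketch_set_block_hint(sketch_blockhint_t *h, int k,
                                 const int *r, const int *c,
                                 sketch_alloc_fn alloc)
{
    assert(h != NULL && alloc != NULL);

    /* The flag is recorded unconditionally. */
    h->has_mixed_blocks = 1;
    h->num_sizes = 0;
    h->r_sizes = NULL;
    h->c_sizes = NULL;

    if (k <= 0 || r == NULL || c == NULL)
        return 0;

    size_t bytes = (size_t)k * sizeof(int);
    int *rs = alloc(bytes);
    int *cs = alloc(bytes);
    if (rs == NULL || cs == NULL) {
        /* Allocation failed: drop the list but keep the flag. */
        free(rs);
        free(cs);
        return 0;
    }

    memcpy(rs, r, bytes);
    memcpy(cs, c, bytes);
    h->r_sizes = rs;
    h->c_sizes = cs;
    h->num_sizes = k;
    return 1;
}
```

Because a hint is advisory, the caller in this sketch does not treat a return value of 0 as an error; it only means that less information is available to the tuner.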