I'd like get to know OpenMP a bit, cause I'd like to have a huge loop parallelized. After some reading (SO, Common OMP mistakes, tutorial, etc), I've taken as a first step the basically working c/mex code given below (which yields different results for the first test case).
- The first test does sum up result values - functions
serial, parallel-, - the second takes values from an input array and writes the processed values to an output array - functions
serial_a, parallel_a.
My questions are:
- Why differ the results of the first test, i. e. the results of the
serialandparallel - Suprisingly the second test succeeds. My concern is about, how to handle memory (array locations) which possibly are read by multiple threads? In the example this should be emulated by
a[i])/cos(a[n-i]. - Are there some easy rules how to determine which variables to declare as private, shared and reduction?
- In both cases
int iis outside thepragma, however the second test appears to yield correct results. So is that okay or hasito be moved into thepragma omp parallelregion, as being said here? - Any other hints on spoted mistakes?
Code
#include "mex.h"
#include <math.h>
#include <omp.h>
#include <time.h>
double serial(int x)
{
double sum=0;
int i;
for(i = 0; i<x; i++){
sum += sin(x*i) / cos(x*i+1.0);
}
return sum;
}
double parallel(int x)
{
double sum=0;
int i;
#pragma omp parallel num_threads(6) shared(sum) //default(none)
{
//printf(" I'm thread no. %d\n", omp_get_thread_num());
#pragma omp for private(i, x) reduction(+: sum)
for(i = 0; i<x; i++){
sum += sin(x*i) / cos(x*i+1.0);
}
}
return sum;
}
void serial_a(double* a, int n, double* y2)
{
int i;
for(i = 0; i<n; i++){
y2[i] = sin(a[i]) / cos(a[n-i]+1.0);
}
}
void parallel_a(double* a, int n, double* y2)
{
int i;
#pragma omp parallel num_threads(6)
{
#pragma omp for private(i)
for(i = 0; i<n; i++){
y2[i] = sin(a[i]) / cos(a[n-i]+1.0);
}
}
}
void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
double sum, *y1, *y2, *a, s, p;
int x, n, *d;
/* Check for proper number of arguments. */
if(nrhs!=2) {
mexErrMsgTxt("Two inputs required.");
} else if(nlhs>2) {
mexErrMsgTxt("Too many output arguments.");
}
/* Get pointer to first input */
x = (int)mxGetScalar(prhs[0]);
/* Get pointer to second input */
a = mxGetPr(prhs[1]);
d = (int*)mxGetDimensions(prhs[1]);
n = (int)d[1]; // row vector
/* Create space for output */
plhs[0] = mxCreateDoubleMatrix(2,1, mxREAL);
plhs[1] = mxCreateDoubleMatrix(n,2, mxREAL);
/* Get pointer to output array */
y1 = mxGetPr(plhs[0]);
y2 = mxGetPr(plhs[1]);
{ /* Do the calculation */
clock_t tic = clock();
y1[0] = serial(x);
s = (double) clock()-tic;
printf("serial....: %.0f ms\n", s);
mexEvalString("drawnow");
tic = clock();
y1[1] = parallel(x);
p = (double) clock()-tic;
printf("parallel..: %.0f ms\n", p);
printf("ratio.....: %.2f \n", p/s);
mexEvalString("drawnow");
tic = clock();
serial_a(a, n, y2);
s = (double) clock()-tic;
printf("serial_a..: %.0f ms\n", s);
mexEvalString("drawnow");
tic = clock();
parallel_a(a, n, &y2[n]);
p = (double) clock()-tic;
printf("parallel_a: %.0f ms\n", p);
printf("ratio.....: %.2f \n", p/s);
}
}
Output
>> mex omp1.c
>> [a, b] = omp1(1e8, 1:1e8);
serial....: 13399 ms
parallel..: 2810 ms
ratio.....: 0.21
serial_a..: 12840 ms
parallel_a: 2740 ms
ratio.....: 0.21
>> a(1) == a(2)
ans =
0
>> all(b(:,1) == b(:,2))
ans =
1
System
MATLAB Version: 8.0.0.783 (R2012b)
Operating System: Microsoft Windows 7 Version 6.1 (Build 7601: Service Pack 1)
Microsoft Visual Studio 2005 Version 8.0.50727.867