– mkdir build
– cd build
– cmake ..
– make -j
– make install
You must follow these instructions to build and install using CMake, otherwise CMake will
not be able to find FFTW3 when you try to build your project.
Reporting Errors
Contact the lecturer directly if you believe that there is a problem with the starting code, or the build process
in general. To make error reporting useful, you should include in your description of the error: error messages,
operating system version, compiler version, and an exact sequence of steps to reproduce the error. Questions
regarding the clarity of the instructions are allowed. Any corrections or clarifications made after the initial
release of this coursework will be written in the errata section and announced.
Assignment: A Universe in a Box
Important: Read all these instructions before starting programming. In particular, some marks are awarded
for a reasonable git history (as described in Part 3), that should show what you were thinking as you
developed.
0.2 Overview: N-body Simulations and Particle-Mesh Methods
This assignment will ask you to produce a simulation of particles subject to gravitational forces. This is
known as an “n-body simulation”. Please read through this background section carefully, even if you are
familiar with the subject area, as this contains the models that you will have to implement and introduces
notation that will be used throughout the assignment text.
N.B. In order to keep diagrams clear, illustrations will use a 2D grid. You should keep in mind however that
this will be a three dimensional simulation.
Don’t worry about the underlying physics, we’ll describe the mathematical process that needs
to be implemented in detail.
The basic idea of an gravitational n-body simulation is that:
• We have some space in which there are many particles
• We start with some intitial distribution of particles; in our case this they will be uniform randomly
distributed in a box (a unit cube).
• We have np particles.
• Every particle has the same mass, mp.
• Particles have individual positions and velocities.
• The time evolution of the system is broken up into equal time steps.
• We perform the following process in a loop, starting at t = 0 and ending at t = tmax:
– Calculate the acceleration of all particles due to gravity
– Update the position and velocity of all particles
– Increase the time t by the time-step t.
Furthermore, we’ll choose our coordinate system so that:
• Particles exist in a box, with sides of size W.
• Coordinates ˛r in the box will be relative to the size of the box, so particle positions will always
been within the unit cube: x, y, z œ [0, 1).
– The actual distance between two particles is therefore W˛r
4
– This allows us to model an expanding universe simply by updating the value of W, without
updating the positions of each particle.
– Particles with a fixed position as W increases are moving apart from each other in line with the
overall expansion of the universe; this is why these coordinates are sometimes known as co-moving
coordinates.
– We only keep track of what’s called the peculiar velocity of each particle: the velocity the particle
has when the expansion is ignored.
We will label particles from 0 to (np≠1); every particle has its own position (˛ri), velocity (˛vi), and acceleration
(˛ai). Position, velocity, and acceleration for a given particle are all three dimensional vectors with an x, y,
and z component.
5
0.3 Notation
We will define notation as we introduce it throughout this assignment, but the notation used is summarised
here for reference.
6
0.3.1 Mathematical notation:
• ˛r is used for a point/vector in real space coordinates
• x, y, z are the x, y, z components of the ˛r vector
• ˛
k is used for a point/vector in frequency space coordinates.
• kx, ky, kz are the x, y, z components of the ˛
k vector
• k2 is used for ˛
k · ˛
k
• f(˛r) is a function in real space coordinates, while ˜f(˛
k) is the frequency space representation.
• FT(f) is used for the Fourier transform from real space (˛r) to frequency space (˛
k).
– ˜f(˛
k) = FT(f(˛r))
• FT≠1( ˜f) is used for the inverse Fourier transform from frequency space (˛
k) to real space (˛r)
– f(˛r) = FT≠1( ˜f(˛
k))
0.3.2 Physical functions:
• fl(˛r): the density
• (˛r): the gravitational potential
• ˛g(˛r): the gravitational field; this is the acceleration felt by a particle at ˛r
0.3.3 Simulation properties:
• W: width of the box
• nc: the size of each dimension of the mesh in cells
– The mesh is an nc ◊ nc ◊ nc grid
• Nc: the total number of cells in the mesh, Nc = n3
c
• wc: the width of a cell
• (i, j, k) is used for indexing the mesh with separate x, y, and z indices (i, j, and k respectively)
• µ is used for indexing the mesh with a single index
– µ = k + nc(j + nci)
• np: the number of particles
• mp: the mass of a particle
• ˛ri: the position of particle i
– xi: the x-position of particle i
– yi: the y-position of particle i
– zi: the z-position of particle i
– xi, yi, zi œ [0, 1)
• ˛vi: the velocity of particle i
• ˛ai: the acceleration of particle i
• t: the time coordinate (begins at 0)
• t: the time-step, the time dierence between one simulation step and the next
• tmax: the finish time of the simulation
• F: the expansion factor per timestep.
0.4 Updating the particles
We can update a particle’s velocity using its current acceleration, and we can update a particle’s position by
using its velocity. We will take a very simple approach to this update known as Euler integration.
˛vi(t) = ˛vi(t ≠ t) + ˛ai(t)t
˛xi(t) = ˛xi(t ≠ t) + ˛vi(t)t
Note that this formulation has implied an ordering: we update the velocity first, and then update the position
using the new velocity values. This order is specified to avoid ambiguity in the implementation rather than
for theoretical reasons.
7
This method is not the most accurate, but it is fast and simple to implement. In order to perform this update
we need to be able to calculate the acceleration at a given time step for every particle, which is dependent on
the relative positions of all of the particles.
We will treat the space as periodic, so that if a particle moves outside of the box it reappears on the other
side.
0.5 Calculating Accelerations: Particle-Particle and Particle-Mesh methods
0.5.1 Particle-Particle
The simplest approach is to calculate the acceleration on each particle due to every other particle directly. If
there are np particles, labeled 0...(np ≠ 1), then the acceleration experienced by particle i due to all other
particles is:
˛ai = ≠q
j”=i
mp
|˛rj≠˛ri|2 rˆji,
where rˆji is the unit vector pointing to particle j from particle i, and mp is the mass of a particle. (We have
also ignored a factor of Newton’s constant, G, which we can set to 1 by a choice of units. This keeps the
mathematical structure as bare as possible.)
This calculation would need to be done for all n particles. This approach is known as the Particle-Particle
(PP) method, since it calculates forces between every pair of particles directly. There are unfortunately a
few problems with this approach:
1. The acceleration function is singular where rij = 0, and thus we can get extreme behaviour (or
even inf/nan issues) when particles are very close. This would need to mitigated by so called “force
softening”.
2. This algorithm rapidly becomes very slow as the number of particles increases (O(n2) due to pair-wise
approach).
3. We would like to simulate an infinite universe, so how do we calculate interactions more distant than
the size of the box?
0.5.2 Particle-Mesh
Another approach, which we shall take in this project, is to use a Particle-Mesh (PM) method. ParticleMesh is so called because it still has particles in a space, but now we also keep track of discretised functions
on the space, which are represented on a mesh. The mesh is just a gridding of our 3D space: our space is
split up into a finite number of cells, and a function can be represented by a value in each cell.
8
The mesh is a 3D uniform grid of cells; the grid is nc cells wide, high, and deep. This means that the width
of a cell, wc, is given by:
wc = W
nc ,
and the volume of a cell is:
Vc = w3
c .
Note that wc is the physical width of the cell, not the coordinate width. The width of a cell in comoving
coordinates ˛r is always 1
nc , since the coordinates are scaled so that the box is a unit cube. We need the
physical width and volume of the cell in order to calculate physical quantities like gradients and densities.
9
To represent a function, we need to assign a value to every cell in the mesh. To avoid doing the pair-wise
acceleration function, we want to convert the distribution of particles into a density function fl. This density
function can then be transformed into a gravitational potential, which can then be used to calculate the
gravitational field (the acceleration that a particle experiences due to gravity at a given point in space).
0.5.2.1 The density function
The density function, fl, in a given cell is calculated by counting the particles within that cell and using the
following formula:
fl = M
Vc = pcmp
Vc ,
where pc is the number of particles in the cell, and M = pcmp is the total mass of inside the cell.
0.5.2.2 The potential function
The potential, , is related to the density by the following formula:
Ò2 = 4fifl,
where again we have ignored G. Calculating the potential involves solving this dierential equation, which
can be converted to a simpler equation using a Fourier transform (FT). Don’t worry about this process: we
will be using a library to perform the FT itself. All you need to know is:
1. Fourier transforms convert our function in real space f(˛r) to a function in a reciprocal frequency space
˜f(˛
k) where k is related to the reciprocal of r. For example they might convert between time and
frequency, or distance and wavenumber.
2. Fourier transforms can be used to solve some kinds of dierential equations.
3. The frequency representation makes functions in finite spaces inherently periodic. This is ideal for our
circumstances as we are treating our box as periodic as an approximation to an infinite space with
infinite particles. We will therefore be solving the above equation with periodic boundary conditions
i.e. f(x + W) = f(x).
4. Our functions have a discrete representation on an nc ◊ nc ◊ nc mesh; the discrete Fourier transform
that we will use will likewise give us a discrete representation on an nc ◊ nc ◊ nc mesh. In real space
each cell in the mesh corresponds to a dierent position (˛r), whereas in frequency space each cell
corresponds to a dierent wavenumber (˛
k).
After applying Fourier transforms to both sides of our equation, we have the following frequency space
expression relating the potential and the density function:
≠k2˜(˛
k)=4fifl˜(˛
k),
where k2 = ˛
k · ˛
k. This can be straightforwardly solved since it is not a dierential equation:
10
˜(˛
k) = ≠4fi
k2 fl˜(˛
k),
which just involves scaling our existing fl˜(˛
k) in each cell in our mesh by a factor of 4fik≠2. (See the section
on FFTW for how to find the value of ˛
k for each cell of the mesh in frequency space.) Once we have obtained
˜(˛
k) we can get the potential in real space (˛r) by performing an inverse Fourier transform.
(˛r) = FT≠1(˜(˛
k)) = FT≠1( ≠4fi
k2 FTfl(˛r))
The sequence of operations computationally is then:
1. Find the density function fl(˛r)
2. Apply a fourier transform to the density function to get fl˜(˛
k)
3. Recale fl˜(˛
k) to get ˜(˛
k)
4. Apply an inverse fourier transform to ˜(˛
k) to get the real space potential (˛r)
0.5.2.3 The gravitational field
In order to calculate the acceleration of a particle we need the gravitational field, which is the negative of the
gradient of the potential.
˛g(˛r) = ≠Ò˛ ˛(r)
The gradient is calculated using a simple finite dierence method, defined by the following formula:
df
dx ¥ f(x+x)≠f(x≠x)
2 .
As with the density and potential, the gravitational field function is defined on the mesh with indices (i, j, k);
unlike the density and potential however the gravitational field at any given point is a 3D vector.
To calculate the gradient in a given cell in the mesh, we take the value of the potential in the neighbouring
cells, and approximate the gradient using the formula:
gx(i, j, k) = ≠ (i+1,j,k)≠(i≠1,j,k)
2wc
gy(i, j, k) = ≠ (i,j+1,k)≠(i,j≠1,k)
2wc
gz(i, j, k) = ≠ (i,j,k+1)≠(i,j,k≠1)
2wc
Remember that the functions in the box are periodic, so indices should wrap around if they go below 0 or
above nc ≠ 1.
11
Figure 1: Examples of finding neighbouring cells in the x direction on a periodic mesh; right shows how
indices wrap around.
To get the acceleration of particle i at position ˛ri we simply need to get the value of the gravitational field at
this point:
˛ai = ˛g(˛ri).
Since ˛g is defined on our mesh like all our other functions, we simply need to find which cell in the mesh the
position ˛ri falls in and then get the value of ˛g there.
0.6 Making the Universe Expand
In reality the universe doesn’t just stay one size, but it expands over time. As the universe expands, our
particles move with it: remember that we are using co-moving coordinates so that positions are always
relative to the size of the box. Co-moving coordinates means:
• Positions and velocities are relative to the size of the box.
• The coordinates of the particle positions don’t change when the box changes size W. This is because
particles move apart as the universe expands.
• The velocities need to scale down when the box expands. The velocities that we keep track of are
the motion of the particles on top of the background expansion, and these shouldn’t increase along with
the size of the box.
To model a universe which expands we need to do two things:
1. Scale W by a factor F, so W(t + t) = W(t) ◊ F
2. Divide the velocities of all particles by F
Remember that other variables can depend on W, like the physical width of a cell wc = W
nc , and calculations
like the density and gradient rely on the physical size of a cell. During each simulation time step
you will need to make sure that you are using an up to date value for W and other variables
derived from it.
12
0.7 Summary: the Simulation Algorithm
The full simulation involves setting an initial states and looping over just five functions. It can be summarised
as follows:
1. Set up np particles with coordinates in the unit box.
• Initial positions are usually uniform random and initial velocities are 0.
2. Define the width of the box W, the expansion rate F, and the width of the mesh in cells nc.
3. Starting at t0, apply the following loop until t>tmax:
• Calculate density from particle positions (counts in cells)
• Calculate the potential:
– Forward FT applied to density
– Solve for potential (frequency space)
– Inverse FT to get (real space) potential
• Calculate gradient to get gravitational field (acceleration)
• Update particles positions and velocities
• Scale the box (scale up W and scale down velocity)
0.8 The FFTW Library
This project will make use of the very common FFTW Library. The documentation for this library can be
confusing, so we will explain how to use the relevant parts of the library. The most important thing that we
need to understand is how data is formatted going into and out of these functions, and how data gets scaled
by these functions.
We will describe the use of the FFTW library only as it applies to the specific functions and data structures
that we use in this assignment. These statements should not be taken as fully general and may not hold for
other methods in the FFTW library which have dierent properties!
13
All of the following types and methods are available in the fftw3.h header file.
0.8.1 Data order and ˛
k values
FFTW can perform multi-dimensional Fourier transforms, but the data is provided as a one-dimensional
array, which means we must adopt some ordering for multi-dimensional data.
We can convert the 3D indices (i, j, k) to a flat index µ using the following expression:
µ = k + nc(j + nci),
where i is indexing along the x axis, j along the y axis, and k along z.
• We will usually write the three dimensional indexing (i, j, k) when describing what needs to be done in
this assignment, as it is easier to understand the geometry of the problem in this way.
• You will need to use this conversion to fill out your FFTW buers in practice and to understand the
arrangement of data in memory.
• This data ordering is used for both input and output buers.
In order to calculate the potential, we need to solve an equation with ˛
k in it,
˜(˛
k) = ≠4fi
k2 fl˜(˛
k).
Or, treating our functions as indexed on the mesh, we could write:
˜(i, j, k) = ≠ 4fi
k2(i,j,k)fl˜(i, j, k)
This means that we need to associate a value of ˛
k with each cell for our frequency space function.
The ˛
k components (kx, ky, kz) all range over the values [0, 1
W , 2
W , ..., nc≠1
W ]. The value of ˛
k at indices (i, j, k)
is:
˛
k(i, j, k)=( i
W , j
W , k
W ).
0.8.2 Scaling
The Fourier transforms computed by FFTW are not normalised. For a normalised FT we would have:
FT≠1(FT(f(x))) = f(x).
However, using the fast Fourier transform as defined in FFTW, we have the following relation instead
FFTW≠1(FFTW(f(x)) = (2nc)3f(x),
for a 3D Fourier transform. (More generally, it is scaled by 2ni where ni is the number of cells in each
dimension.) Since our method involves a forwards and inverse transform, we pick up this extra factor of 8n3
c
along the way, which requires the elements in the buer to be rescaled. We’ll take care of this scaling when
we calculate the potential .
0.8.3 FFTW input / output buers
The FFT method that we will use takes arrays with elements of the type fftw_complex, which is included in
the <fftw3.h> header. This is just an alias for the type double[2], so we can access the real and imaginary
parts using code such as the following:
#include <fftw3.h>
#include <iostream>
int main()
{
fftw_complex z;
14
//set real part
z[0] = 1.5;
//set imaginary part:
z[1] = 0.2;
//print complex number
std::cout << z[0] << "+" << z[1] << "i" << std::endl;
return 0;
}
FFTW has its own methods for allocating a buer in memory, which should be used whenever using the
library.
The following line allocates an array called buffer to contain N_elements values of type fftw_complex. The
type of buffer is fftw_complex *, which is a pointer i.e. it is implemented as a C-style array.
fftw_complex * buffer = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * N_elements);
Note that this does not initialise the elements in the buer.
• The FT has separate input and output buers, and both must be allocated before the FT is called.
• Whatever is in the output buer will be rewritten with the result of the FT.
• The input and output buers are the same size.
Buers which are allocated with fftw_malloc must also be freed using fftw_free, for example:
fftw_complex * buffer = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * N_elements);
fftw_free(buffer);
0.8.4 FFTW plans and executing the Fourier transform
FFTW uses “plans” to describe how the Fourier transform that needs to be performed. Plans determine how
many dimensions the data has (ours is three dimensional), what the input and output buers will be, and
whether the transform is a forward or inverse Fourier transform. The plans do not need to be re-made every
time you want to perform the transform: you can re-use a plan as many times as you like as long as
these properties don’t change.
The plans have type fftw_plan, and are declared, allocated, and deallocated as follows:
Declaration and Allocation:
The forward plan is initialised using the fftw_plan_dft_3d function with the following arguments:
fftw_plan forward_plan = fftw_plan_dft_3d(n_x,
n_y,
n_z,
input_buffer,
output_buffer,
FFTW_FORWARD,
FFTW_MEASURE);
• n_x, n_y, and n_z are the size of the three dimensions. These should all be nc since we are using a box
with an equal number of cells in each direction.
• input_buffer is the function to which you want to apply the forward transform. This is a fftw_complex
* type array, and is a real space function f(˛r).
• output_buffer is where you stored the result of the transform. This is also a fftw_complex * type
array, and is a frequency space function ˜f(˛
k).
15
• FFTW_FORWARD and FFTW_MEASURE are constants defined in fftw3.h. You should keep these as they are!
The inverse plan is initialised similarly:
fftw_plan inverse_plan = fftw_plan_dft_3d(n_x,
n_y,
n_z,
input_buffer,
output_buffer,
FFTW_BACKWARD,
FFTW_MEASURE);
• Note that FFTW_FORWARD has been changed to FFTW_BACKWARD
• The input buer should be a frequency space representation g˜(˛
k)
• The output buer is a real space representation g(˛r)
Deallocation:
Both plans need to be deallocated using the fftw_destroy_plan function:
fftw_destroy_plan(forward_plan);
fftw_destroy_plan(inverse_plan);
1 Simulating Particles in a Box (60 marks)
In the first part of this assignment you’ll create a serial (not parallelised) particle-mesh simulation. We will
specify the representation of some functions due to the need to interface with the FFTW library, but for the
most part selecting data structures will down to you, along with appropriate naming and code organisation.
You should think carefully about your use of functions and classes to structure your code.
You should aim to make your code as performant as you can, but still clear, safe, and well structured. You
may assume that optimisations will be turned on to a high level (-O3) for a release build. (If debugging you
should usually keep the optimisations o, unless you are explicitly using the debugger to see how the compiler
has optimised the code.)
We will explicitly ask you to write a Simulation class and a number of functions, but you can, and should,
add further classes and function where you feel that they are appropriate and useful.
1.1 The Initial Repository
Start by familiarising yourself with the starter repository.
There are multiple folders which contain source files:
1. lib for library code (this is where the bulk of your simulation code will be developed).
• include contains library headers.
2. apps for application code.
3. test for test code.
4. benchmarks for benchmarking (timing) code.
You will find in the source files:
1. A skeleton Simulation class that you will need to complete.
2. Some partially coded tests that you will need to adapt to use your classes / data structures.
3. A method for converting density functions to images (.pbm format)
4. A method for calculating a radial correlation function, which will be used to calculate summary statistics
later on and which you may need to adapt to work with your data structures.
You should pay attention to the CMakeLists.txt files as well.
16
1.2 Representing Particles (8 marks)
It is up to you to decide how to represent your particles. You will need to be able to represent a large number
of them (in the millions).
• Each particle must have its own position (˛r) and velocity (˛v), which are three dimensional.
• Positions are always measured relative to the size of the box. This means that each coordinate of
the position is 0 Æ ri < 1, regardless of the width of the box.
• All particles have the same mass, which is constant throughout the simulation.
The initial collection of particles will need to be passed in to the Simulation class. You need to be able to
create custom initial states as well as random ones.
• You must be able to straightforwardly create a custom distribution of particles, for example one particle
in the middle of the box. This will be important for constructing test cases.
– You must be able to set the position of every particle individually, but you may assume that the
initial velocities are all zero.
• You must also write a function or constructor which creates a random distribution of particles. In this
case:
– The position of each particle must be uniform random (between 0 and 1) for each dimension (x, y,
z).
– The velocity of each particle must be zero in all dimensions.
– The seed for the random number generator must be settable. This is vital for reproducability.
TASKS:
1. Decide on your particle implementation and write any class or function definitions that you need.
2. In Responses.md explain the choices that you have made, and point out where the relevant code
can be found. You should consider both ease of use and performance expectations. (You may wish to
come back to this response later if you decide to try more than one approach.)
1.3 The Simulation Class (7 marks)
In your library code you will find a class called Simulation. This class will be used to represent a single
simulation which can be run, including holding relevant data like the state of the particles in the simulation,
the timestep, the length of the simulation (tmax), and the various functions defined on the mesh.
• You may want to build this class representation as you work through the assignment, adding member
variables as you encounter new things to be represented.
• You may make any functions in this class that you want to test public, in order to make
constructing tests and benchmarks practical. For data, or functions which do not require direct
tests themselves, you should make normal use of access specifiers.
TASKS
1. You should add a constructor to your Simulation class which takes the following information:
• The total time tmax.
• The time step t.
• The initial collection of particles.
• The width of the box W.
• The number of cells wide that the box is, nc.
– Remember that this is just the size of the box in one dimension; the total number of cells in
the 3D space will be Nc = n3
c .
• The expansion factor F.
– This is the factor that the universe will expand by each timestep.
You should use your constructor to set and initialise any member variables that you want to keep in the
class. Below is some advice on what data structures are needed for the Fourier transform part of the
code. You should consult also the section of the Background about the FFTW library and its usage.
17
FFTW Buers and Plans
Your simulation class will internally use FFTW as a library for calculating the fast Fourier transforms
necessary for this approach. These methods require storing both the input and output buers, which are the
discrete representations of our density and potential functions on the mesh, and FFTW “plans”.
1.3.0.1 Buers
You will need three data buers for the FT parts:
• Two for real space functions (density fl(x) and potential (x)).
– In principle we could use just one buer, since we don’t need to use fl and at the same time, but
using two will help clarity as it prevents ambiguity over what is contained in the buer at any
given time. This also allows us to keep both the density and potential functions until the end of
an iteration, which can be useful for outputs and debugging. If you are worried about your total
memory usage however, you can use just one buer.
• One for frequency space values. This will only be used during the calculation of the potential function,
and will be used to represent both fl˜(k) and ˜(k). After the forward transform, this buer will contain fl˜.
This buer then gets modified (in place) to represent ˜, before being input into the inverse transform.
– Using one buer here is less confusing because it only needs to be used within one single function.
If we were interested in using the frequency space representations elsewhere, and memory usage
was not a major concern, then we might consider splitting this into two buers as well.
Each of the three buers will be of the size n3
c .
1.3.0.2 Plans
You will need to use two plans: one for the forward transform and one for the inverse transform.
• The forward transform must take the real space density buer as input and the frequency space buer
as output.
• The inverse transform must take the frequency space buer as input and the real space potential buer
as output.
1.4 Calculating the density function (7 marks)
The density function, as with all functions in our 3D space, is defined on our nc ◊ nc ◊ nc mesh. The density
function is defined by a single value in each cell of this 3D gridding, and is stored in one of our fftw_complex
* buers. This buer is a contiguous piece of memory, i.e. all n3
c values (of type fftw_complex) are stored in
sequential memory addresses.
Our mesh is divided into cells with indices (i, j, k) where:
• i ranges over 0, 1, . . . , nc ≠ 1 and represents cell indices in the x coordinate,
• j ranges over 0, 1, . . . , nc ≠ 1 and represents cell indices in the y coordinate,
• k ranges over 0, 1, . . . , nc ≠ 1 and represents cell indices in the z coordinate.
Since our coordinate system is scaled so that x, y, z are always between 0 and 1, the cell at indices (i, j, k)
covers coordinates in 3D space where:
• 1
nc i Æ x < 1
nc (i + 1),
• 1
nc j Æ y < 1
nc (j + 1),
• 1
nc k Æ z < 1
nc (k + 1).
18
The corresponding index µ of the buer is:
µ = k + nc(j + nci).
The value of in the density function for a given cell with indices (i, j, k) is equal to the number of particles
whose position falls within that cell, multiplied by mp
Vc where Vc = w3
c is the volume of the cell. (N.B. The
volume of the cell Vc is calculated using wc = W
nc i.e. does not use scaled coordinates. As the
size of the box changes, the volume of a cell also changes.)
TASKS
1. Write a function in your Simulation class to calculate the density function and fill in the buer.
• The imaginary part of every value in the density buer should be set to 0 since the density
is a real function.
• Bear in mind that this function will be called every time step.
2. Test your density function. You should check that:
• All imaginary components are 0.
• Density is everywhere 0 if there are no particles.
• Density is correct when a single particle is present.
– Only a single cell should therefore have non-zero density. You should check that the value and
index of this is correct.
• Density is correct when multiple particles are present.
1.5 Calculating the Potential (7 marks)
In order to calculate the potential we need to make use of our Fourier transform.
Calculating the potential involves 3 steps:
1. Apply the forward Fourier transform (input: fl(x), output: fl˜(k))
2. Apply scaling to the frequency space representation (see below).
3. Apply the inverse Fourier transform (input: ˜(k), output: (k))
Applying the forward/inverse transform simply requires calling fftw_execute (defined in fftw3.h) with the
correct plan:
fftw_execute(forward_plan);
// ...
// manipulate the frequency space buffer
// ...
fftw_execute(inverse_plan);
After the FFT is complete the output buer will contain the non-normalised transformed density function
(see the FFTW section in the Background). We can use this to calculate the potential by scaling this buer
by k≠2.
The ordering of elements in the frequency space representation is the same as the real space array. In the
frequency representation, a function is represented as a sum of periodic functions, which oscillate with dierent
frequencies in the x, y, and z directions. How many quickly the function oscillates in each direction is given
by the vector ˛
k. We need to associate a vector ˛
k with every cell in our mesh.
The value of ˛
k for indices (i, j, k) is:
˛
k = (kx, ky, kz) = 1
W (i, j, k)
k2 = ˛
k · ˛
k = 1
W2 (i
2 + j2 + k2)
19
(˛
k) = ≠4fi
k2 fl(˛
k)
• The first value of k2 is 0, which would produce a singular result due to the division. Instead of dividing
the density by 0 here, we should just set this element of the buer to 0.
– This does not aect the result of the simulation since this is the “0 frequency” component of the
potential i.e. it represents a constant function in space with no oscillations at all. The impact that
it has on the real-space representation of the function is therefore just an additive constant; since
we only care about the potential in order to take a gradient to find the acceleration, this constant
does not contribute at all to our dynamics and we can ignore it.
• You must apply this scaling to both the real part and imaginary part of frequency space buer,
since we have used a complex FT.
There is also a multiplicative normalisation factor to apply:
fnorm = (8n3
c )≠1.
This needs to be applied to the real and imaginary part of the entire buer as well.
Finally, calculate the real space potential by transforming back using the inverse Fourier transform.
• Once we have transformed back, we can treat the potential function as being real, and we will ignore
any values in the imaginary part of the buer.
TASKS
1. Add a function to your Simulation class to calculate the potential function.
2. Test your potential function using the partially coded example in test_simulation.cpp.
• You will need to fill out this test to make use of the implementations that you have defined so far.
• This test for the potential is very approximate but should be able to determine whether your calculation
is happening correctly.
• It looks at the potential due to a single point particle in the middle of your box.
• Because the conditions are periodic, this is actually the potential due to an infinite series of particles
spaced W apart from one another!
• The test builds a very approximate expected potential function along the x-axis of the box and compares
it to the potential function that you calculate.
• We only care about the real component of the potential buer; the imaginary component can be ignored.
IMPORTANT: The expected potential function should be negative. If your partially coded example does
not have a - sign in line 47 (double expected_pot = -mass *(1/ dx1 + 1/dx2 + 1/dx3);) then please
add one in.
You may also want to inspect what your potential function looks like by saving the potential function along
this axis and making a graph to check that it has the right shape.
1.6 Calculating acceleration (7 marks)
The gravitational field in space is just the negative of the gradient of the potential function. Since this is a
3D space, the field is a vector ˛g = (gx, gy, gz) = ≠( d
dx , d
dy , d
dz ), so there are three values defined for every cell
in the mesh.
Note that the gravitational field is real valued, and when calculating the gradient you should ignore the
imaginary part of the potential buer.
TASKS
1. Write a function to calculate the gradient of a function defined on your mesh using the finite dierence
method in the background.
• This should take the function to be dierentiated as an argument to make it easier to test.
• Remember that the functions in the box are periodic, so indices should wrap around if they go below
0 or above nc ≠ 1.
20
2. Test your gradient function.
• You do not need to test that the gravitational field for a particle is physically accurate. The function
merely calculates a gradient of an arbitrary input.
• You should test that your function correctly calculates the (approximate) gradient for some test case(s).
1.7 Updating particles (5 marks)
You now need to be able to update the state of your particles according to their acceleration and velocity.
This is called after the gravitational field functions have been calculated for a given timestep.
TASKS
1. Write code to update the acceleration, velocity, and position of each of your particles.
• The acceleration of a particle should be value of the gravitational field in the cell that the particle falls
within.
• Calculate the new velocity as ˛vÕ = ˛v + ˛a ◊ t.
• Calculate the new position as ˛rÕ = ˛r + ˛vÕ ◊ t. (This should use the updated velocity value.)
2. Test your particle update function.
1.8 Expanding the box (3 marks)
We model the expansion of the universe by scaling up the width of the box. This does not change the
position of the particles, as they move with the expansion of the box.
TASKS
1. Write a function which expands the box:
• Multiply the width of the box W by the expansion factor F.
• Divide the velocities of all particles by the expansion factor F.
• If you store any other variables which depend on scale, for example the width of a cell, then these
also need to be updated.
1.9 The simulation loop (2 marks)
It’s now time to complete the simulation class by writing a loop which links all our functions together and
simulates the evolution of our system over time.
*TASKS
1. Complete the Simulation::run function to implement the simulation loop, starting at t = 0 and
stepping by t until t Ø tmax.
In each step you will need to (in this order):
• Calculate the density function
• Calculate the potential
• Calculate the acceleration
• Update the particles
• Expand the box
1.10 Complexity (6 marks)
We mentioned earlier that the particle-particle (PP) method has O(n2
p) complexity due to its pair-wise
calculations. Having implemented all steps of the particle-mesh (PM) method you should now be able to
comment on the complexity of the PM approach. Unlike PP, the PM approach has two variables which
determine the size of the problem: the number of particles and the number of cells in the mesh. You can
consider how the problem scales with each of these separately.
21
TASKS
In Responses.md, use Big-O notation to give the time complexity with respect to Nc, the total number of
cells (Nc = n3
c ), and np (the total number of particles) for the following routines in your program:
1. the density calculation,
2. the potential calculation,
• You may assume that the execution of the FFT is O(n log(n)) for an input/output buer of total size n.
3. the gravitational field calculation (calculating the gradient of the potential),
4. and the particle update function.
Explain your reasoning in Responses.md.
Given that the number of particles must always be larger than the total number of cells in the mesh (in order
to achieve a smooth density function), do you expect the Particle-Mesh method to scale better or worse than
the Particle-Particle method for large simulations?
1.11 Application 1: Visualising A Developing Universe (8 marks)
The first application that you will write will produce a series of images like those below:
Each of these images is a visualisation of the density function on the mesh at a point in time during the
simulation. The simulation starts from a uniform random distribution and particles with 0 velocity, and
evolves for a period of time.
The functionality to produce the images from the denisity buer is provided for you.
TASKS
1. In app, complete the command line application NBody_Visualiser.cpp. If the -h flag is used it should
display a help message; in this case the simulation should not run. Otherwise, it should take the
following command line arguments:
• -nc: the number of cells wide the box is (nc),
• -np: the average number of particles per cell,
• -t: the total time,
• -dt: the time step,
• -F: the expansion factor,
• -o: an output folder,
• -s: and the random seed.
2. It should set up and run a simulation with the following properties:
• The total number of particles np is n3
c multiplied by the average particles per cell (given in the
command line arguments).
• The initial distribution of particles must be uniform random using the given random seed.
• The width of the box W should be 100.
• The mass of the particles should be 105
np . For example, if there are 10 million particles, the mass
should be 0.01. This keeps the total amount of mass the same regardless of the number of particles
that you run with.
22
3. Modify the run method to take an std::optional<string> which defaults to nullopt. If not null,
the run method should save images of your density distribution to the given output folder every 10
timesteps. Your application should call this run function with the output folder string provided by the
command line arguments.
• A function to save your density distribution as an image is provided in Utils.hpp/Utils.cpp.
4. Your images should be clearly named so that their order is obvious.
Run your program (./build/bin/NBody_Visualiser <args>) and record the results with the following
recommended settings:
• Expansion factor = 1.0, and expansion factor = 1.02. This is the only parameter which should change
between recorded runs.
• Number of cells wide = 101. (If you program is too slow then you can scale this down.)
• Particles per cell = 10.
• Total time = 1.5.
• Timestep = 0.01.
• Fixed random seed of your choice. (This means that both runs should have identical initial states. The
final state should dier due to the dierence in expansion.)
The images from each run should be saved into appropriately named subfolders of the Images folder.
Make sure you have optimisations turned on, but if your program runs too slowly then you can adjust
the number of cells down. You should not have the particles per cell any lower than 10 in order to get a
smooth density function; if your code runs quickly, feel free to increase the particles per cell to get a more
detailed simulation!
Document in your Responses.txt exactly what parameters you used to run, and where to find the images.
2 Parallelising Your Simulation (30 marks)
In this part of the assignment you will use OpenMP to parallelise your simulation code, and MPI to run
ensembles of simulations. You will also use timing code to benchmark your simulations and explore how the
performance scales. You will only need to be concerned with parallelising you own code: you can ignore
the FFTW library calls and the helper functions which we have provided for your use (SaveToFile and
CorrelationFunction).
2.1 Shared Memory Parallelism (15 marks)
You should go through the functions you have written for your simulation and decide which you can parallelise
with OpenMP.
The loop in run is inherently serial, since each iteration of the loop depends on the final state of the iteration
before it. The functions inside the loop also cannot be executed out of order. However, you can look into
each of the functions that you call from inside the loop to see if they themselves can be parallelised.
TASKS
1. Write in Responses.md each of the functions which you identify as being possible to parallelise.
2. Use OpenMP to parallelise these loops if you can.
• If you think something can be parallelised in principle but you are not sure how to, make a note of it in
your Responses.md. Describe what you think you would need to be able to in order to implement such
a parallelisation.
3. Benchmark (see below) these pieces of code using 1-4 threads (or more, if available on your CPU).
4. Don’t forget to run your tests after parallelising to make sure you haven’t broken anything! You should
also re-run your simulation and check that you get the same results with multi-threaded and single
threaded runs.
23
Do NOT parallelise the random initialisation of the particle positions. This would introduce
non-determinism in the assignment of the positions and prevent your simulation from being
reproducable. This MUST remain serial.
Benchmarking
Benchmarks should be implemented in benchmarks.cpp. In this file we have provided a class called
BenchmarkData which has some built in methods to time your functions as well as print out information.
You should look carefully at this class and use it to print clear and useful information about your timing
experiments. Alternatively, you can write your own code to do this.
For each function which you parallelise:
• Set the number of threads using omp_set_num_threads defined in omp.h. This is equivalent to setting
the OMP_NUM_THREADS environment variable, but allows us to change this from inside the code.
• When setting up simulations to benchmark, use properties (number of cells and particles) which are the
same as when you ran your app, or larger if you need more time to get an accurate result. Make sure
that any properties that you use which might aect the time of the benchmark are properly reported
(e.g. by setting the info property in the BenchmarkData class.)
• Report the timing of each of the functions for dierent numbers of threads in Responses.md.
• If you experiment with using collapse or schedule, then include the results of these experiments in
your Responses.md as well. Any collapse or schedule statements which are not useful may be removed
from your program, but should be properly reported in the responses.
If an attempt to parallelise a function lead to an increase in time, or result in failed tests, then report this in
Responses.md and then comment out the OMP pragma. Do not delete it entirely, so that we can see
you attempts at parallelisation.
2.2 Application 2: Comparing Universes with Distributed Memory (15 marks)
In this section you will create a new application which uses MPI to run multiple simulations in separate
processes. Each process will run its own simulation with the same initial particle distribution, but a dierent
expansion factor. Once the simulation is complete, the process will calculate a summary statistic (the radial
correlation function), which is represented as a list of values. One process, the master process, will gather the
data from the other processes and output it to a single file so that it can be plotted.
The number of processes is determined by the MPI run command:
mpirun -np <num_processes> <application> <args>
e.g.
mpirun -np 4 ./build/bin/NBody_Comparison <args>
would run the application with 4 processes.
TASKS
1. Your program should take command line arguments for:
• An output folder,
• a minimum expansion factor,
• a maximum expansion factor.
1. You should create one simulation per process. These should dier by their expansion factor F; the
dierent expansion factors should be evenly spaced between the minimum and maximum given in the
command line arguments.
24
• You will need to get the number of processes to work out this spacing.
2. Each simulation should otherwise have the same parameters as your earlier imaging runs.
• These should be hard coded to simplify the command line interface.
• Every simulation must start with the same initial state (initial particle distribution).
3. Each process should run a simulation; you should pass nullopt to the run method now as we don’t
want our MPI programs to output images.
4. Once a process has finished running its simulation, it should use the correlationFunction method
provided in Utils.hpp / Utils.cpp in order to calculate a summary statistic (the log of the radial
correlation function) for the end point of this simulation.
• You can modify this function and/or move it in order to make it work with your choice of data structures
and access specifiers. Be sure to note what you’ve done in your Responses though so it is easy to find.
• The function as provided takes a list of positions as a vector<array<double, 3>>, and a number of
bins as an int. As with the previous coursework, the number of bins is essentially the resolution of the
histogram that this function calculates. It returns the log of the calculated correlation function as a
vector<double> of length n_bins.
• You should call this with n_bins of around 100.
5. Each non-master process should send its correlation function to the master process.
• You can get the pointer to the underlying data buer for a vector<double> using v.data() for a
vector v.
6. The master process should then output all the correlation functions to a csv file.
• The first row (header) should label each column with the expansion factor
• Each correlation function should then be a column
Run your mpi program with at least 2 processes and:
• Report your command line arguments in the Responses.md.
• Save the csv file in the Correlations folder.
• You should then make a plot of this data using python or gnuplot, which should also be saved to the
Correlations folder.
• The plot should feature all of the correlations as lines on a single plot, labelled with their correct
expansion factor.
3 Additional Considerations (10 marks)
As in the first assignment, you are expected to engage in good software engineering practices. Specifically,
you will be marked on:
• Appropriate naming for functions and classes.
• Good code organisation (i.e. appropriate choice of files and folders)
• Informative and useful git commit log.
• Eective in-code commenting.
• Code formatting.
You should make sure that you README.md contains clear and correct instructions for building and using your
code, including examples.
Your Responses.md must clearly label which section of the assignment you are responding to for
every response you give.
Hints:
• Familiarise yourself with writing good git commit messages.
• Familiarise yourself with Best practices for writing code comments.
25
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。