联系方式

您当前位置:首页 >> C/C++编程C/C++编程

日期:2024-11-04 09:34

Sorbonne Université – SESI M2

——–

MU5IN160 – Parallel Programming

Hands-on Session 6 – Dataflow for Motion Application

Very important, about the submission of your work At the end of this session you will have to

upload the following files on Moodle: 1) a zip of the src folder and 2) a zip of the include folder. After

that you will have 2 weeks to complete your work and to update your first submission. You have to work

in group of two people but each of you will have to upload the file on Moodle. Finally, please write your

name plus the name of your pair at the top of all these files.

Short introduction In this session, we will work on a streaming application that detects and tracks

moving objects from a video sequence. Contrary to the previous sessions, we will not use EasyPAP this

time. The later is not adapted for streaming applications. A working streaming application will be given

to you and you will have to use StreamPU to implement the Motion application through an explicit

dataflow representation.

1 Appetizer

First you need to clone the repository of the Motion project:

git clone --recursive https://gitlab.lip6.fr/parallel-programming/motion-sesi.git

The Motion project uses CMake in order to generate a Makefile: follow the README instructions to

compile the code.

grayscale image

(t − 1) motion detection Σ∆ (per pixel) mathematical morphology

opening-closing

connected components

labeling (CCL)

connected components

analysis (CCA)

surface filtering

grayscale image

(t) motion detection Σ∆ (per pixel) mathematical morphology

opening-closing

connected components

labeling (CCL)

connected components

analysis (CCA)

surface filtering

k-nearest neighboor

matching (k-NN)

temporal

tracking

grayscale pixels

p ∈ [0; 255]

blob of binary pixels

p ∈ {0, 1}, 0 → stationary, 1 → moving

image of labels

l ∈ [1; 232 − 1]

CCs = list of regions,

surface S & centroid (xG, yG)

sub-list of CCs with S ∈ [Smin, Smax]

list of (t − 1, t)

associations

final list of

moving objects

Figure 1: Motion detection and tracking processing graph. In gray and italic: the output of each

processing.

Fig. 1 presents the different algorithms used to detect moving objects and to track them over time. To

make it work, two strong assumptions are made: 1) the camera is fixed, 2) the light intensity is constant

over time. First, an image is read from a camera (or a video sequence) and then it is converted in a

grayscale image. Then, the Σ∆ algorithm is triggered. This algorithm is able to detect if a pixel is

moving over time. It returns a binary image, if a pixel value is 0, then it means that it is not moving.

Otherwise, if a pixel value is 1, then it means that it is moving. After that, morphology algorithms are

applied1

. This is a pre-processing to regroup moving pixels together and eliminate isolated pixels. Then,

from a binary image, a connected components labeling (CCL) algorithm is performed. The later, gives

the same label to a group of pixel that are connected to each other. CCL returns an image of labels where

l = 0 means no object and l > 0 means a moving object. From this image of label, some features are

extracted (CCA): for each object the center of mass (xG, yG), the bounding box ([xmin, xmax, ymin, ymax])

and the surface S are extracted. Depending on their surface, the objects are filtered (Smin < S < Smax).

From two images at t − 1 and t, a matching algorithm determines which objects are the same in the two

different images (mainly according to their distance). At the end, the identified objects are tracked to

have a constant identifier over time.

This graph of tasks is then repeated until the video sequence is over. It is not mandatory to understand

perfectly each algorithm. The purpose of this session is to work on a streaming application, representative

of a real application, and to perform optimizations at the task graph level.

1Mathematical morphology: https://en.wikipedia.org/wiki/Mathematical_morphology

In this graph, two tasks cannot be replicated. The per pixel motion algorithm requires its previous

output to compute the current binary image. It detects intensity variations over time. It is almost the

same for the tracking algorithm that maintains a list of tracks that are updated according to the last

frame.

If you’d like to better understand the algorithms used in this project, some of them are described in more

detail in the document’s appendix. In any case, it’s worth noting that you don’t need to understand

exactly what these algorithms do to complete this lab.

1.1 Run Motion

To run the code you will need some input videos. You can download a videos collection on Moodle (see the

“Artifacts” section) or from this web link: http://www.potionmagic.eu/~adrien/data/traffic.zip.

First, unzip the traffic.zip and from the build directory run the code with the following command:

./bin/motion2 --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \

--flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --vid-out-play --vid-out-id

You should see a window with a top view of a highway and some moving cars (see Fig. 2) and you should

see green bounding boxes around the cars.

Figure 2: Motion screenshot (with –-vid-out-play –-vid-out-id parameters).

1.2 Architecture of the Project

Motion is mainly a C-style project but it is compiled in C++ to use StreamPU. The sources are

located in the src folder, and there are 3 sub-folders:

• common: contains implementations of the processing tasks,

• main: contains source files that correspond to a final binary executable,

• wrapper: contains C++ files to wrap the C-style processing functions into StreamPU modules

and tasks.

The headers are located in the include folder. Inside there are two sub-folders: c/motion for the C-style

headers and cpp/motion for the C++ headers.

Page 2

2 From Imperative to Dataflow Programming

We will convert the motion2 main into a dataflow description (= StreamPU modules and tasks). The

motion2 is located here src/main/motion2.c. This implementation is very close to the task graph

presented in Fig. 1.

Task #1 Understand the code, run the motion2 executable and play with the parameters (-h shows

and describes the available parameters).

To help you in the task, we created an other main based on motion2.c and we converted some C functions

into StreamPU modules for you. See the motion2_spu.cpp file.

Task #2 Understand the code, run the motion2-spu executable and play with the parameters (-h

shows and describes the available parameters). Understand the code of motion2_spu.cpp by comparing

it with the C-style motion2.c code.

Task #3 Create new StreamPU stateful modules, each time you will create new .cpp and .hpp

files in the wrapper folders. You will only declare input and output sockets (DON’T use forward

sockets at this time):

1. Sigma_delta: Add a StreamPU compute task that will call the sigma_delta_compute function,

2. Morpho: Add a StreamPU compute task that will call the C morpho_compute_opening3 and

morpho_compute_closing3 functions,

3. CCL: Add a StreamPU apply task that will call the C CCL_LSL_apply function,

4. Features_CCA: Add a StreamPU extract task that will call the C features_extract function,

5. Features_filter: Add a StreamPU filter task that will call the C features_filter_surface

and features_shrink_basic functions (note that the maximum input size of the features differs

from the maximum size of the output features: indeed, the main purpose of the shrink function is

to reduce the maximum number of features and to save memory space),

6. KNN: Add a StreamPU match task that will call the C kNN_match function,

7. Tracking: Add a StreamPU perform task that will call the C tracking_perform function.

Add the StreamPU modules and tasks incrementally in the motion2_spu.cpp file and you will test

if their integration is working (you can compare the logs with a diff, see Note #2 below). Have a look

on how we did this for the other StreamPU tasks that are given to you. You will follow the same

philosophy: 1) bind the sockets to the buffers allocated in the main file and 2) call the exec() method

explicitly.

Note #1 It is NOT possible to create sockets of RoI_t structure. Only the basic C types are supported.

To get around this limitation you can count the number of bytes in the structure. For instance, you can

do something like:

auto si_RoIs = this->template create_socket_in<uint8_t>(t, "in_RoIs", max_size * sizeof(RoI_t));

Note #2 motion2 is our golden model. To compare the results of motion2 and motion2-spu you need

to generate the logs of motion2 executable first (we do it for only 20 frames to execute faster):

./bin/motion2 --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \

--vid-in-stop 20 --flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --log-path logs_refs

Secondly, you need to generate the logs of the motion2-spu executable:

Page 3

./bin/motion2-spu --vid-in-path ./traffic/1080p_day_street_top_view_snow.mp4 \

--vid-in-stop 20 --flt-s-min 2000 --knn-d 50 --trk-obj-min 5 --log-path logs_spu

Finally you need to compare the logs together:

diff logs_refs logs_spu

If the later command returns nothing, it means that motion2 and motion2-spu are equivalent (in term

of features). This is good, your new implementation is correct! If not... it is time to debug :’-(.

Task #4 At this point, you should only have StreamPU tasks that call their exec() method explicitly

(no more C style function calls). However, the code is still using the data allocated in the main function.

This can be improved because StreamPU performs the data allocation and deallocation automatically

for you. In order to remove most of these allocations you have to perform partial “output to input socket”

bindings. For instance, if we only consider to eliminate the IB0 buffer, it is possible to remove “pointer

to output socket” bindings and to add “output to input socket” bindings instead, as shown in Code 1.

Do it for all the buffers, EXCEPT for IG0 and IG1. It is strongly advised to do it step by step and to

check if the code is giving exactly the same results after each modification (please refer to Note #2).

// [...]

// step 1: motion detection (per pixel) with Sigma-Delta algorithm

sd0["compute::in_img"].bind(IG0[0]);

// sd0["compute::out_img"].bind(IB0[0]); // this line can be removed

sd0("compute").exec();

// step 2: mathematical morphology

// mrp0["compute::in_img"].bind(IB0[0]); // this line can be removed

mrp0["compute::in_img"] = sd0["compute::out_img"]; // <-- [NEW] output to input socket binding

// mrp0["compute::out_img"].bind(IB0[0]); // this line can be removed

mrp0("compute").exec();

// step 3: connected components labeling (CCL)

uint32_t n_RoIs_tmp0;

// ccl0["apply::in_img"].bind(IB0[0]); // this line can be removed

ccl0["apply::in_img"] = mrp0["compute::out_img"]; // <-- [NEW] output to input socket binding

ccl0["apply::out_labels"].bind(L10[0]);

ccl0["apply::out_n_RoIs"].bind(&n_RoIs_tmp0);

ccl0("apply").exec();

// [...]

Source code 1: Example of partial socket binding to eliminate IB0 buffer allocation/deallocation in the

main function.

Task #5 Now, replace IG0 and IG1 buffers by the binding of the video["generate::out_img_gray8"]

socket. For this, you will need to use a Delayer module in order to keep the t − 1 image in memory

(previously kept in the IG0 buffer). If you don’t use it, the t − 1 image will always be overwritten when

executing the video("generate") task.

Note #3 In the motion2 executable, some tasks are not executed in the first stream (see the following

condition in the motion2.c file: “if (n_processed_frames > 0)”). To manage it you have two possible

options:

• Always execute the tasks (no control flow) but in this case you need to carefully initialize the

Delayer module to the first frame with the Delayer::set_data() method (this solution is

simpler to implement),

Page 4

• Use a Switcher and a Controller_limit module to implement the control flow (= if condition).

To simplify, you will only put the Sigma_delta.compute() task in the condition. In other terms,

the CCL, the CCA and the filtering will be executed anyway.

Task #6 At this point you should not have memory allocations and deallocations anymore in the main

function. Next objective is to get rid of the multiple exec() calls over the tasks. You will separate the

binding from the execution. To do this, the socket bindings need to be moved outside of the while(1)

loop and the while(1) loop needs to be replaced by a StreamPU Sequence. Once it is done, only one

exec() call should remain: the one over the newly created Sequence object. Of course, you will check

if it works correctly (please refer to Note #2).

Sub-sequence 0

Delayer

exec order: [13]

addr: 0x16b1660a8

memorize (id = 13)

Tracking

exec order: [14]

addr: 0x16b166468

perform (id = 14)

Sigma_delta

Sigma_delta1

exec order: [7]

addr: 0x16b166cc8

compute (id = 7)

Morpho

Morpho1

exec order: [8]

addr: 0x16b166b38

compute (id = 8)

CCL

CCL1

exec order: [9]

addr: 0x16b1669a8

apply (id = 9)

Features_CCA

CCA1

exec order: [10]

addr: 0x16b166808

extract (id = 10)

Features_filter

Ftr_filter1

exec order: [11]

addr: 0x16b166618

filter (id = 11)

KNN

exec order: [12]

addr: 0x16b166548

match (id = 12)

Delayer

exec order: [0]

addr: 0x16b1660a8

produce (id = 0)

Sigma_delta

Sigma_delta0

exec order: [1]

addr: 0x16b166d98

compute (id = 1)

Morpho

Morpho0

exec order: [2]

addr: 0x16b166c00

compute (id = 2)

CCL

CCL0

exec order: [3]

addr: 0x16b166a70

apply (id = 3)

Features_CCA

CCA0

exec order: [4]

addr: 0x16b1668d8

extract (id = 4)

Features_filter

Ftr_filter0

exec order: [5]

addr: 0x16b166710

filter (id = 5)

Video

exec order: [6]

addr: 0x16b166e98

generate (id = 6)

out[0]:out

in[0]:in_img

out[1]:status

out[1]:out_img

in[0]:in_img

out[2]:status

out[1]:out_img

in[0]:in_img

out[2]:status

out[1]:out_labels

in[0]:in_labels

0

in[0]:in_labels

1

out[2]:out_n_RoIs

in[1]:in_n_RoIs

0

in[1]:in_n_RoIs

1

out[3]:status

out[2]:out_RoIs

in[2]:in_RoIs

out[3]:status

out[3]:out_labels out[4]:out_n_RoIs

in[1]:in_n_RoIs0

out[5]:out_RoIs

in[0]:in_RoIs0

out[6]:status

out[0]:out_img

in[0]:in_img

0

in[0]:in

1

out[1]:out_frame

in[0]:in_frame

out[2]:status

out[1]:out_img

in[0]:in_img

out[2]:status

out[1]:out_img

in[0]:in_img

out[2]:status

out[1]:out_labels

in[0]:in_labels

0

in[0]:in_labels

1

out[2]:out_n_RoIs

in[1]:in_n_RoIs

0

in[1]:in_n_RoIs

1

out[3]:status

out[2]:out_RoIs

in[2]:in_RoIs

out[3]:status

out[3]:out_labels out[4]:out_n_RoIs

in[3]:in_n_RoIs1

0

in[2]:in_n_RoIs

1

out[5]:out_RoIs

in[2]:in_RoIs1

out[6]:status

out[4]:out_RoIs0 out[5]:out_RoIs1

in[1]:in_RoIs

out[6]:out_nearest out[7]:out_distances out[8]:status

out[1]:status

out[3]:status

Figure 3: Expected StreamPU task graph without logs, without visualization and without control flow.

Note #4 To help you in the debugging, you can print the sequence graph with the export_dot method.

Enable/disable the logs, enable/disable the visualization and observe the impact on the task graph. If

you chose to do not implement control flow, the output graph should looks like in Fig. 3. Note that you

can personalize the name of a module with the set_custom_name(std::string custom_name) method.

Task #7 Before the sequence execution, you will enable the statistics of the task (call the get_modules

method on a sequence object). And after the sequence execution you will print them at the end

(tools::Stats::show function). The application will display the statistics only if there is the --stats

parameter. What do you see? Is it different than from the motion2 executable? Explain.

[Bonus] Task #8 When you think it’s necessary, create new tasks, postfixed with a f, that use forward

socket instead of input/output sockets combination. For instance, if we consider a task named compute

without forward socket, the task that uses forward socket will be named computef. You will NOT

replace the former compute task. Using forward sockets should help you to remove useless copies.

Do it incrementally to validate that the application is still working (see Note #2). Can you see an

improvement in the statistics of the tasks?

Page 5

Appendix

2.1 Sigma-Delta Algorithm (Σ∆)

The motion detection problem consists in separating moving and static areas in each frame. At each

instant, each pixel must be tagged with a fixed/moving binary identifier. When the camera is fixed, such

detection can be performed using the time differences computed for each pixel.

The following notations apply:

• t : current instant of time, used to identify the frames,

• It: grayscale source image at time t,

• It−1: grayscale source image at time t − 1,

• Mt: background image (mean image),

• Ot: grayscale difference image,

• Vt: image of variance (standard deviation) computed for each pixel,

• Lt: binary label image (motion/background), Lt(x) = {0, 1} or Lt(x) = {0, 255} to encode

{background, movement},

• x: the current pixel with (i, j) coordinates.

Most of motion detection techniques in an image sequence It(x) are based on an estimate of the modulus

of the temporal gradient |

∂I

∂t |. If the light intensity of the scene vary slowly (= is constant between two

consecutive images), then a significant variation in the pixel grayscale (above a threshold) between two

images will imply that there is movement at that point.

The Σ∆ algorithm assumes that the noise level can vary at any point. To achieve this, the pixel grayscale

is modeled by a mean Mt(x) and a variance (standard deviation) Vt(x). If the difference between the

current image and the background image is greater than N times the standard deviation, then movement

occurs. The value of N is a parameter. In this project, N is always set to 2.

This is a motion detection system based on the estimation of static background statistics using Σ∆

modulation: an iterative analog/digital conversion method that increments or decrements the digitized

value by one unit according to the result of the comparison between the analog value and the current

digitized value.

Algorithm 1: Sigma-Delta (Σ∆).

1 [Part #1: mean computation]

2 foreach pixel x do // Step #1: Mt estimation

3 if Mt−1(x) < It(x) then Mt(x) ← Mt−1(x) + 1

4 if Mt−1(x) > It(x) then Mt(x) ← Mt−1(x) − 1

5 otherwise do Mt(x) ← Mt−1(x)

6 [Part #2: difference computation]

7 foreach pixel x do // Step #2: Ot computation

8 Ot(x) = |Mt(x) − It(x)|

9 foreach pixel x do // Step #3: Vt update and clamping

10 if Vt−1(x) < N × Ot(x) then Vt(x) ← Vt−1(x) + 1

11 if Vt−1(x) > N × Ot(x) then Vt(x) ← Vt−1(x) − 1

12 otherwise do Vt(x) ← Vt−1(x)

13 Vt(x) ← max(min(Vt(x), Vmax), Vmin)

14 foreach pixel x do // Step #4: Lˆt estimation

15 if Ot(x) < Vt(x) then Lˆt(x) ← 0

16 else Lˆt(x) ← 1

The algorithm initialization for t = 0 is the following: M0(x) ← I0(x) and V0(x) ← Vmin. Then, the

algorithm is applied to the images from t = 1. The Vmin and Vmax constants are used to restrict the

possible values of Vt. Typically, Vmin = 1 and Vmax = 254. The complete algorithm after initialization is

shown in Alg. 1.

In the Motion project, a naive Σ∆ implementation is given to you:

Page 6

• Header: in the include/c/motion/sigma_delta/sigma_delta_compute.h file,

• Source: in the src/common/sigma_delta/sigma_delta_compute.c file.

See the sigma_delta_compute function.

2.2 Mathematical Morphology

In this project, we consider squared elements B of size 3 × 3. Let X be the set of pixels associated with

the B element. There are two basic operations: the dilation of X noted δB(X) and the erosion of X

noted ϵB(X). The application of mathematical morphology operators is similar to filtering operators

(stencils or convolutions), but with non-linear operations.

For binary images, dilation consists in computing a OR on the B neighborhood in the source image and

writing it to the destination image. Conversely, erosion consists in computing a AND on the neighborhood.

So, if a point in the neighborhood is 1, the dilation produces a 1 (since x OR 1 == 1), thus dilating the

binary connected component. Conversely, if only one pixel is 0 in the B neighborhood, the erosion will

produce a 0 (since x AND 0 == 0), thus eroding the connected component.

Erosion is used to reduce noise in images: if we consider that a small group of pixels is the noise that

we’re trying to remove, then applying erosion with a B element of size 3 × 3 will make any group of

pixels with a radius smaller than its size disappear.

Figure 4: Left: the initial binary image. Center: eroded image with a 3 × 3 squared element: the gray pixels are

removed. Right: dilated image with a 3 × 3 squared element: the gray pixels are added. Source: Wikipedia.

Let r be the radius and d = 2r + 1 the diameter of a squared element B, then an erosion of radius r

removes, to any connected component, a thickness of r pixels of contour while a dilation of radius r adds

a thickness of r pixels to the contour (see Fig. 4, note that in the figure the logic is reversed: pixels at 1

are black while pixels at 0 are white).

Figure 5: Left: the initial binary image. Center: opened image with a 3 × 3 squared element: the gray pixels

are removed. Right: closed image with a 3 × 3 squared element: the gray pixels are added. Source: Wikipedia.

From these two operators, two others can be defined: the closing ϕB(X) = ϵB(δB(X)) and the opening

γB(X) = δB(ϵB(X)). Closing reduces (or even completely close) holes in connected components, while

opening does the opposite, enlarging these same holes (see Fig. 5, note that in the figure the logic is

reversed: pixels at 1 are black, while pixels at 0 are white).

One of the advantages of opening and closing is that they preserve the (discrete) size of the regions,

unlike erosion, which reduces it, or dilation, which increases it. Depending on requirements, either a

closing or an opening can be chosen. As these operators are idempotent, applying them several times

does not change the result (which will be identical to that obtained after a single application). On the

other hand, they can be chained (opening and then closing or closing and then opening) to improve the

result image (noise reduction, filling holes, ...). By gradually increasing their radius, we obtain sequential

alternating filters, which are particularly effective for removing noise.

In the Motion project, naive 3 × 3 mathematical morphology implementations are given to you:

• Header: in the include/c/motion/morpho/morpho_compute.h file,

Page 7

• Source: in the src/common/morpho/morpho_compute.c file.

See the morpho_compute_opening3 and morpho_compute_closing3 functions.

Page 8


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:horysk8