forked from jtpaulo/dedisbench
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
220 lines (141 loc) · 9.68 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
/* DEDISbench
* (c) 2010 2010 U. Minho. Written by J. Paulo
*/
I/O benchmark:
The benchmark is written in C and can simulate several processes writing/reading fixed size
blocks concurrently into multiple files or in a raw device. The location for the write operations is
given by NURand function from TPC-C benchmark that generates hotspots for I/O
operations. A realistic distribution extracted from real storage systems is
used to generate the blocksÕ content. Moreover, it is possible to
use other distributions and load them from a custom file using DEDISgen. DEDISbench provides two
workload modes, one reproduces a fully or peak loaded system, as Bonnie++, that
performs as maximum write I/O operations per second as possible. The second reproduces
a system under a reasonable nominal load, and can be useful for understanding the
behavior of aliasing operations in a stable system with a limited I/O throughput.
For more information regarding DEDISbench algorithm you may read these two published papers:
- "DEDISbench: A Benchmark for Deduplicated Storage Systems"
- "Towards an Accurate Evaluation of Deduplicated Storage Systems"
DEDISbench distributions:
DEDISbench comes with three distinct distributions extracted from real storage systems with different requirements and assumptions:
- An Archival storage where most files have a write-once policy, with non-significant updates, but with sporadic data deletion.
This distribution is called dist_archival
- A Personal Files storage where some files are updated and deleted frequently and the I/O requests latency is expected to be lower
than the one found in archival storages. This distributions is called dist_personalfiles
- A High Performance storage where most files are updated and deleted frequently and I/O latency is expected to be as minimal as possible.
This distribution is called dist_highperf
All these distributions are available and can be simulated by DEDISbench with the option -g <distribution name> or it is possible to input a
distribution file.
NOTE: by default DEDISbench uses the Personal Files Storage distribution.
Duplicate distribution generator: DEDISgen
The binary DEDISgen generates an output file that describes the distribution of duplicates found
at a specific folder and subfolders or disk device. This program can generate (if specified in the arguments)
an output distribution file (with option -o) that can be consumed by DEDISbench and used for simulating that duplicate distribution.
With DEDISgen we also pack DEDISgenutils that may be needed for performing more complex analysis of duplicated data.
For instance, if several datasets must be analysed separately and the merged toguether for generating the output distribution file for
DEDISbench it is possible to use these tools as explained below.
Instalation.
The only lib requires should be libc6-dev and libdb-dev libssl-dev
Run ./configure, make and make install in DEDISbench folder
Running the Benchmark
Required flags
-p or -n<value> Peak or Nominal Load with throughput rate of N operations per
second. If mixed benchmark of read and writes is defined then use -nr<value> and -nw<value>
for nominal rate of reads and writes respectively.
-w or -r or -m Write or Read Benchmark or a mix of write and read operations.
-t<value> or -s<value> Benchmark duration (-t) in Minutes or amount of data to write (-s) in MB
Optional Flags
-a<value> Access pattern for I/O operations: 0-Sequential, 1-Random Uniform, 2-TPCC random
default:2
-c<value> Number of concurrent processes default:4. Each process has an
independent file associated or a common device if -i flag is used
-f<value> Size of the file of each process in MB. If -i flag is used, this parameter defines
the size of the raw device. default:2048 MB
-d<value> choose the directory where DEDISbench writes data
-i<value> Processes write/read from a raw device instead of having an independent file.
If more than one process is defined, each process is assingned with an independet of the raw device,
dependent on the raw device size. By default, if this flag is not set each process writes to an individual file.
-l I/O latency results are written to a log file to extract additional
statistics. Each process writes these values in a file called result(processid) and
each line, corresponds to a single I/O operations and presents:
(latency of I/O operation in microseconds) (current time in seconds) \n
-e or -y Enable or disable the population of process files before running
DEDISbench. test(processid files in DEDISbench directory are the files used by
each process. For write testes and by default, empty files are created,
for read tests and by default the files are populated with content.
It is possible to enable this feature for write tests and acess rewrite
operations or to disable and use a custom file that respects the namings.
-b<value> Size of blocks for I/O operations in Bytes default: 4096
-g<value> Input File with duplicate distribution default:internal file called dist_personalfiles
DEDISbench can simulate three real distributions extracted respectively from an Archival, Personal Files and High Performance Storage\n");
For choosing these distributions the <value> must be dist_archival, dist_personalfiles or dist_highperf respectively.
The input file details the amount of blocks with a certain number of duplicates
and the format is: <number_duplicates> <number_blocks>
See below for more info for customizing distribution files and above for info on the default distributions.
-v<value> Seed for random generator default:current time. Usefull for repeating
-o<value> generate an output log with the distribution actually generated by the benchmark
-k<value> generate an output log with the access pattern generated by the benchmark
-z Disable the destruction of process temporary files generated by the benchmark
-j<value> Write to file path the output of DEDISbench. This feature also writes two additional files with the same name
as given in <value> and a snaplat and snapthr suffix that shows the throughput and lattency average values
for 30 seconds intervals
-o The output of DEDIS gives the exact number of unique and duplicate blocks written.
(This option requires using additional shared memory)
-x<value> I/O Operations synchronization: 0-without fsync and O_DIRECT, 1-O_DIRECT, 2-fsync, 3-both. default: 0
tests with the same settings.
Examples:
run write benchmark in peak mode for 5 minutes
./DEDISbench -p -w -t5
run read benchmark in nominal mode (300 reads/second) for 10 minutes
./DEDISbench -n300 -r -t10
run write benchmark in peak mode for 5 minutes. Enable the population of process files
before actually running the benchmark load and enable log output for results.
Use files with 4GB for each process and 8 processes.
./DEDISbench -p -w -t5 -e -l -f4096 -c8
run read benchmark in nominal mode (300 reads/second) for 8 minutes.
Disable the pre-population of files. Carefull because files test(processid) must be
present and have content or no content will be available for reading resulting in
I/O errors.
./DEDISbench -n300 -r -t8 -d
run 8 minutes mixed benchmark with a nominal load (100reads/s and 50 writes/s) in a raw device with 4GB.
./DEDISbench -m -nr100 -nw50 -t8 -ipath/to/rawdevice
Running the generator DEDISgen:
-f or -d (Find duplicates in folder -f or in a Disk Device -d)
-p<value> (Path for the folder or disk device)
-o<value> (Path for the output distribution file.
This is only necessary if distribution file of duplicates is going to be generated)
Optional Parameters
-z<value> (Path for the folder where duplicates databases are created default: ./gendbs/)
- duplicate db has the database with duplicates information hash->number_dups
-b<value> (Size of blocks for I/O operations in Bytes default: 4096)
-o<value> (Path for the output distribution file.
This is only necessary if distribution file of duplicates is going to be generated)
Examples
//generate duplicate distribution for files in folder /dir/duplicates/ and its subfolders
./DEDISgen -f -p/dir/duplicates/
//generate duplicate distribution for device /path/device. Use a block size of 8192 Bytes (8KB)
./DEDISgen -d -p/path/device -b8192
Deduplication distribution FILE:
This file describes the amount of blocks with a specific number of duplicates and has the following format:
<number duplicates> <number blocks>
Example
file:
0 1000
1 500
5 10
There are 1000 blocks without any duplicate, 500 blocks with one duplicate and 10 blocks with 5 duplicates
****** NEW DEDISgenutils ****************
UTILS for DEDISgen:
-m and -o can be used together or individually if only one operation is intended
-m<db1 folder path> <db2 folder path> Merges hashes and duplicates of db1 into db2.
-o<db folder path> <outputfile> Path for generating the output distribution file of duplicates db.
WARNING: these are supposed to be used in duplicates databases generated by dedisgen that map (hash->number-duplicates).
Not intended for distribution dbs that generate the distribution outputs. These databases are on the
database folder, which can be specified by the user as parameter with the option -z and by default are on ./gendbs/duplicatedb
Examples:
- Merging the duplicates found separately, with DEDISgen, for dataset 1 (in db1) and 2 (in dbw) db1 with db2
//Merge the two duplicates databases. Merge db1 into db2 - WARNING db2 info will be updated!!!!
./DEDISgenutils -mdb1 db2
//generate output with distribution for db1+db2
./DEDISgenutils -odb2 output_dist
For more information please contact:
jtpaulo at di.uminho.pt