Google Summer of Code notes
Performance Co-Pilot timeseries are series of time-stamped values gathered from hosts that make performance
data available. With pmseries queries, users can obtain various information about performance
metrics in real time or from historical data. The existing pmseries query language implementation is
maturing, but some important grammar features and functions are still missing. This
proposal aims to extend the time series functions in the libpcp_web module: extending the
grammar to support scalar operands in expressions, extending the cross-domain operations, implementing
new statistical functions, and implementing new functions for time series sample matching.
Pull Requests
- Time domain operation for max PR #1611
- libpcp_web: add pmseries time domain functions PR #1623
- libpcp_web: solve memory leak PR #1628
- libpcp_web: modify libpcp_web make file PR #1630
- libpcp_web: Pmseries nth percentile PR #1637
- libpcp_web: pmseries language extensions with topk functions PR #1638
- pmseries: scalar multiplication PR #1681
Multi-host monitoring setup
Follow "Record metrics from a remote system" to set up the remote system (it can be a virtual machine on the same computer).
On your local host:
- Use
  sudo vi /etc/hosts
  to edit the hosts file on your machine and add a mapping for the new system.
- Go to /etc/pcp/pmlogger.
  - The control file contains one line per host to be logged.
  - The control.d directory stores the config of each host.
- Copy the local file and give it a name for your remote system. Set the primary option to n, which marks the remote system as not primary; your local machine should be the primary system.
  The arguments for the hosts are
  - -r: reports record sizes and the expected archive growth rate in pmlogger.log
  - -T: the time after which pmlogger terminates
  - -c: config file for pmlogger
  - -v: volume size. Once the archive reaches the set volume size, a new archive volume is created.
- Use
  sudo systemctl restart pmlogger
  to restart pmlogger, and use
  sudo systemctl status pmlogger
  to check its status.
- Go to the directory that holds the files for your second system to see details:
  cd /var/log/pcp/pmlogger/shiyao_fedora
  - Files 20220726.15.27.* hold all the metrics sent from redis. Once a file reaches the set volume size (the -v option), a new one with the same prefix is created and new data is stored there; the old files are compressed.
  - File 20220726.15.27.index is a lookup table for the previous files. It is used for quicker data queries.
  - File 20220726.15.27.meta is the metadata from redis; it stores metric names, descriptors, labels and so on.
  - File Latest is a PCP archive folio.
  - File pmlogger.log is written by pmlogger; with the -r option it also reports how often each metric is logged.
- We can use
  sudo systemctl stop pmproxy
  to stop new messages from redis, and use
  sudo systemctl restart pmproxy
  to restart pmproxy and allow new messages from the redis server.
- We can use
  pmseries -a 805f4cdf368337dd564c365909543cc86a39275e
  to see where the data comes from (local or remote).
Early June (Week 1 & Week 2)
- Set up testing environment
  - Use the following commands to check which .so is linked
    which pmseries
    ls -l /usr/local/lib
    ls -l /usr/lib/libpcp*
    ldd /usr/local/bin/pmseries
    and use one of
    export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
    to load the correct libpcp_web.so.1 (LD_LIBRARY_PATH lists directories, not the .so files themselves)
- Finished time domain operations for the min() and max() functions
Notes
- Use
  sudo ./check -g pmseries
  to run all pmseries tests, or use
  sudo ./check 1886
  to run a specific test.
- If a function is implemented, remember to
  - run tests,
  - update man pages, and
  - run valgrind --leak-check=full
- To update man pages, edit the /pcp/man/man1/pmseries.1 file and run
  man ./pmseries.1
  to preview it once new function descriptions are added.
- Use
  gdb --args pmseries "..."
  to debug.
Late June (Week 3 & Week 4)
- Implemented time domain operations: sum_sample() and avg_sample()
- Implemented time domain and instance domain operations for standard deviation, i.e. stdev_inst() and stdev_sample().
Notes
- Remember to update
  np->value_set.series_values[i].series_desc.type
  np->value_set.series_values[i].series_desc.semantics
  np->value_set.series_values[i].series_desc.units
  if any operation is done to the original redis data. Check out pmSemStr and pmUnitsStr for more information.
- Try to test in a multi-host environment: "Record metrics from a remote system"
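To make the difference between the two standard deviation variants concrete, here is a minimal Python sketch. It is not the libpcp_web C implementation. It assumes a series is a list of samples, each a dict from instance name to value; that layout, the instance names and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import pstdev

# Hypothetical layout: one dict per sample, mapping instance name -> value.
samples = [
    {"cpu0": 2.0, "cpu1": 4.0},
    {"cpu0": 4.0, "cpu1": 8.0},
    {"cpu0": 6.0, "cpu1": 12.0},
]

def stdev_inst(samples):
    """One result per sample: deviation across the instances of that sample."""
    return [pstdev(list(s.values())) for s in samples]

def stdev_sample(samples):
    """One result per instance: deviation across all samples of that instance."""
    return {name: pstdev([s[name] for s in samples]) for name in samples[0]}
```

The same split applies to the sum, average, min and max operations: the _inst form reduces across instances within a sample, and the _sample form reduces across samples within an instance.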
Early July (Week 5 & 6)
- Implemented operations for topk_inst() and topk_sample().
Notes
- Use ./new to create a new QA test under the qa folder.
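The idea behind the two top-k operations can be sketched in Python as follows. This is only an illustration and assumes the same hypothetical layout of one dict per sample (instance name to value); the real functions operate on the libpcp_web C structures.

```python
import heapq

def topk_inst(samples, k):
    """For each sample, keep the k largest (instance, value) pairs."""
    return [heapq.nlargest(k, s.items(), key=lambda kv: kv[1]) for s in samples]

def topk_sample(samples, k):
    """For each instance, keep the k largest values seen across all samples."""
    return {name: heapq.nlargest(k, [s[name] for s in samples])
            for name in samples[0]}

# Made-up data: two samples with three instances each.
data = [
    {"a": 1, "b": 5, "c": 3},
    {"a": 7, "b": 2, "c": 4},
]
top_per_sample = topk_inst(data, 2)
top_per_instance = topk_sample(data, 1)
```

heapq.nlargest returns its results in descending order, so the largest value comes first in each list.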
Late July (Week 7 & 8)
- Implemented nth_percentile_inst() and nth_percentile_sample().
Notes
- HdrHistogram_c provides examples for histogram function. HDR stands for high dynamic range.
- bpftrace provides examples of histogram output.
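These notes do not record which percentile definition the functions use, so the following Python sketch assumes the nearest-rank method purely for illustration. The helper name nth_percentile is hypothetical; it would be applied to the values of one sample (instance domain) or of one instance (time domain).

```python
import math

def nth_percentile(values, n):
    """Nearest-rank percentile: the smallest value that has at least
    n percent of the data at or below it."""
    if not values:
        raise ValueError("no values")
    if not 0 <= n <= 100:
        raise ValueError("n must be between 0 and 100")
    ordered = sorted(values)
    # Rank is 1-based; n == 0 is clamped to the first element.
    rank = max(1, math.ceil(n * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up data for illustration.
values = [15, 20, 35, 40, 50]
median = nth_percentile(values, 50)
```

Other definitions interpolate between neighbouring values and can give different answers on small data sets.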
Early August (Week 9)
- Implemented scalar multiplication and its tests
- Note: overflow is detected but not otherwise handled. The current approach is to report an error when overflow occurs.
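The "report an error on overflow" behaviour can be sketched in Python as below. Python integers do not overflow, so the sketch checks the product against an assumed signed 64-bit range; the range, the function name and the error type are illustrative assumptions, not the libpcp_web code.

```python
# Assumed value range: signed 64-bit integers.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def scalar_multiply(values, scalar):
    """Multiply every value by scalar, raising an error instead of wrapping."""
    result = []
    for v in values:
        product = v * scalar
        if not INT64_MIN <= product <= INT64_MAX:
            raise OverflowError(f"{v} * {scalar} does not fit in 64 bits")
        result.append(product)
    return result

doubled = scalar_multiply([1, 2, 3], 2)
```

Failing the whole query keeps wrapped values out of the result, at the cost of returning nothing for a series that overflows at a single point.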
Early September (Week 10 & 11)
- Understood callback for creating histogram bar chart.
- pcp/src/include/pcp/pmwebapi.h: added a structure to store histogram values.
- pcp/src/pmseries/pmseries.c: added a callback for on_histogram_value.
Notes
- Try to use HdrHistogram; it could become one of the vendored libraries for pcp.
Late September (Week 12 & 13)
Special notes
What if qa fails
Before running ./check ..., run
  pmseries --load "{source.path: \"PATH/pcp/qa/archives/proc\"}"
- If you cannot connect to the redis server and get the error message ‘Segmentation fault (core dumped)’, try running
  sudo make clean
  and rebuilding the project. This should solve the segmentation fault problem.
Early October (Week 14 & 15)
- Implemented the histogram() function: created a new callback structure to send histogram data.
- Understood the timeseries sample matching problem, and created graphs to see metric data trends.
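As a rough picture of what a histogram over series values involves, here is a minimal Python sketch. The bucket layout of the actual histogram() function is not described in these notes, so fixed-width buckets are assumed (HdrHistogram, for example, uses a different scheme), and both helper names are hypothetical.

```python
def histogram(values, bucket_width):
    """Count values per fixed-width bucket, keyed by each bucket's lower bound."""
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")
    counts = {}
    for v in values:
        lower = (v // bucket_width) * bucket_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

def render(counts, bucket_width):
    """Format each bucket as a text bar, in the spirit of bpftrace output."""
    return [f"[{lower}, {lower + bucket_width}) {'#' * count}"
            for lower, count in counts.items()]

# Made-up values for illustration.
counts = histogram([1, 3, 12, 15, 18, 27], 10)
bars = render(counts, 10)
```

A callback-based design would hand each (bucket, count) pair to the caller one at a time rather than returning the whole dict, which is closer to how pmseries receives results.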
Special notes
Change the logging period of a metric:
- Go to /var/lib/pcp/config/pmlogger/config.default
- Add a section for the metric with the new period, such as
  log advisory on 2sec { disk.all.read }
By doing so, the report period of disk.all.read will be changed to 2 seconds instead of 10 seconds.
Late October (Week 16 & 17)
- Designed an algorithm to upsample and interpolate vector operands so that their time series samples match those of other vector operands.
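The sample matching idea can be sketched in Python like this. The notes do not say which interpolation was chosen, so linear interpolation is assumed here, and the function names and the (timestamp, value) layout are hypothetical.

```python
from bisect import bisect_left

def interpolate_at(series, t):
    """Linearly interpolate a time-sorted [(timestamp, value), ...] series at t.
    Returns None outside the sampled range (no extrapolation)."""
    times = [ts for ts, _ in series]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return series[i][1]          # exact timestamp match
    if i == 0 or i == len(times):
        return None                  # before the first or after the last sample
    (t0, v0), (t1, v1) = series[i - 1], series[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def upsample(series, target_times):
    """Resample a coarse series onto the timestamps of a finer one."""
    return [(t, interpolate_at(series, t)) for t in target_times]

# Made-up data: a series sampled every 10s, matched to one sampled every 5s.
coarse = [(0, 0.0), (10, 10.0), (20, 30.0)]
matched = upsample(coarse, [0, 5, 10, 15, 20])
```

Once both operands share the same timestamps, a binary operation can be applied sample by sample.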
Special notes
- Figured out the usage of the two timing_t fields in the node_t and series_t structures. The timing_t in node_t holds the time period (delta/interval in the pcp time series query expression) for the series root node. The timing_t in series_t holds the time intervals for each child root.
TODO: After GSoC
- Keep implementing the Timeseries Sample Matching Function.
- Solve some memory leak problems.