epmt Developer and Advanced Usage Guide¶
This document contains detailed information about epmt configuration, data collection,
database submission, analysis, metrics, debugging, and CI/CD infrastructure.
For installation and quick-start instructions, see README.md.
Table of Contents¶
- Configuration
- Environment Variables
- Getting Current Configuration Information
- The Three Modes of epmt
- The First Mode, Data Collection
- The Second Mode, Data Submission
- The Third Mode, Data Analysis and Visualization
- Debugging
- Performance Metrics Data Dictionary
- Addition of new metrics
- CI/CD Workflows and Caching
- Workflows
- Caches and Invalidation
- Cache Invalidation Gap vs. make
- Forcing a Rebuild
- Weekly Pre-warming
- Troubleshooting
- Virtual Environments
- Common Error Messages
Configuration¶
epmt houses default settings (epmt_default_settings.py), user settings (settings.py),
and a submodule (epmt_settings.py) that applies the user settings after the
default settings, so that user values override the defaults.
Custom user settings should only be added by developers who understand epmt's requirements.
Review the default settings first; they look something like the following:
$ cat src/epmt/epmt_default_settings.py
...
papiex_options = "COLLATED_TSV"
epmt_output_prefix = "/tmp/epmt/"
input_pattern = "*-papiex*.[ct]sv"
...
db_params = {
'url': 'sqlite:///:memory:',
'echo': False,
}
...
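The layering described above can be pictured with a small Python sketch. This is not the real epmt_settings.py: the `load_settings` helper and the override value are hypothetical, and only the setting names and default values are taken from the excerpt above.

```python
# Hypothetical sketch of "defaults first, user settings last".
# Not epmt's actual loader; only the setting names come from the guide.

DEFAULTS = {
    "papiex_options": "COLLATED_TSV",
    "epmt_output_prefix": "/tmp/epmt/",
    "input_pattern": "*-papiex*.[ct]sv",
    "db_params": {"url": "sqlite:///:memory:", "echo": False},
}


def load_settings(user_settings):
    """Return the defaults with any user-supplied names replacing them."""
    merged = dict(DEFAULTS)       # start from a copy of the defaults
    merged.update(user_settings)  # user values are applied last, so they win
    return merged


if __name__ == "__main__":
    # Assumed example override, for illustration only.
    settings = load_settings({"epmt_output_prefix": "/scratch/epmt/"})
    print(settings["epmt_output_prefix"])
    print(settings["papiex_options"])
```

In this sketch a user-supplied db_params would replace the default dictionary wholesale; whether epmt merges nested dictionaries is not stated here.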
Environment Variables¶
The following variables replace, at run-time, the values in the db_params dictionary found in settings.py:
EPMT_DB_PROVIDER
EPMT_DB_USER
EPMT_DB_PASSWORD
EPMT_DB_HOST
EPMT_DB_DBNAME
EPMT_DB_FILENAME
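As a rough illustration of run-time replacement, the sketch below copies a db_params dictionary and applies any EPMT_DB_* variables that are set. The `apply_env_overrides` helper is hypothetical, and the mapping from each variable to a lowercase dictionary key is an assumption based on the key names shown elsewhere in this guide, not a description of epmt's code.

```python
import os

# Assumption: each EPMT_DB_<NAME> variable maps to the lowercase <name> key.
ENV_TO_KEY = {
    "EPMT_DB_PROVIDER": "provider",
    "EPMT_DB_USER": "user",
    "EPMT_DB_PASSWORD": "password",
    "EPMT_DB_HOST": "host",
    "EPMT_DB_DBNAME": "dbname",
    "EPMT_DB_FILENAME": "filename",
}


def apply_env_overrides(db_params, environ=None):
    """Return a copy of db_params with any set EPMT_DB_* values applied."""
    environ = os.environ if environ is None else environ
    result = dict(db_params)  # leave the caller's dictionary untouched
    for var, key in ENV_TO_KEY.items():
        if var in environ:
            result[key] = environ[var]
    return result


if __name__ == "__main__":
    base = {"provider": "postgres", "host": "localhost", "dbname": "EPMT"}
    # Hypothetical host name, for illustration only.
    print(apply_env_overrides(base, {"EPMT_DB_HOST": "db.example.org"}))
```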
Getting Current Configuration Information¶
You can examine all current settings by passing the --help option:
$ ./epmt --help
usage: epmt [-n] [-d] [-h] [-a] [--drop]
[epmt_cmd] [epmt_cmd_args [epmt_cmd_args ...]]
positional arguments:
epmt_cmd start, run, stop, submit, dump
epmt_cmd_args Additional arguments, command dependent
optional arguments:
-n, --dry-run Don't touch the database
-v, --verbose Increase level of verbosity/debug
-h, --help Show this help message and exit
-a, --auto Do start/stop when running
--drop Drop all tables/data and recreate before importing
settings.py (overridden by below env. vars):
db_params {'host': 'localhost', 'password': 'example', 'user': 'postgres', 'dbname': 'EPMT', 'provider': 'postgres'}
debug False
input_pattern *-papiex-[0-9]*-[0-9]*.csv
install_prefix ../papiex-oss/papiex-oss-install/
papiex_options PERF_COUNT_SW_CPU_CLOCK
epmt_output_prefix /tmp/epmt/
environment variables (overrides settings.py):
The Three Modes of epmt¶
There are three modes of epmt usage: data collection, data submission, and data analysis. Each
has different dependencies:
- Collection requires Python 2.6 or higher, and compiled papiex libraries for process tagging
- Submission requires Python packages for SQL and database interactions (sqlalchemy, sqlite, postgres)
- Analysis requires jupyter, iPython, and additionally epmt-dash and plotly for dashboard-style analytics
The First Mode, Data Collection¶
Collecting Performance Data¶
Assuming you have epmt installed and in your path, let's modify a job file:
$ cat my_job.sh
#!/bin/bash
# Example job script for Torque or SLURM
./compute_the_world --debug
This becomes:
$ cat my_job_epmt.sh
#!/bin/bash
# Example job script for Torque or SLURM
epmt start
epmt run ./compute_the_world --debug
epmt stop
Or more succinctly by automating the start/stop cycle with the --auto or -a flag:
$ cat my_job_epmt2.sh
#!/bin/bash
# Example job script for Torque or SLURM
epmt -a run ./compute_the_world --debug
But usually we want to run more than one executable. We could have any number of run statements:
$ cat my_job_epmt3.sh
#!/bin/bash
# Example job script for Torque or SLURM
epmt start
epmt run ./initialize_the_world --random
epmt run ./compute_the_world
epmt run ./postprocess_the_world
epmt stop
Let's skip all the markup and do it with only environment variables. epmt
provides the configuration to export to the environment through the source
command. This is intended for use in a job file and should be evaluated by the
running shell, be it bash or csh. epmt source prints the required
environment variables in Bash format unless either the SHELL or _
environment variable ends in csh.
Note the unset of LD_PRELOAD before stop! This prevents the data
collection routine from running on epmt stop itself.
$ cat my_job_bash.sh
#!/bin/bash
# Example job script for Torque or SLURM
### Preamble, collect job metadata and monitor all processes/threads
epmt start
eval `epmt source`
./initialize_the_world --random
./compute_the_world
./postprocess_the_world
# Postamble, disable monitoring and collect job metadata
unset LD_PRELOAD
epmt stop
Here's an example for csh, run interactively:
$ /bin/csh
> epmt -j1 source
setenv PAPIEX_OPTIONS PERF_COUNT_SW_CPU_CLOCK;
setenv PAPIEX_OUTPUT /tmp/epmt/1/;
setenv LD_PRELOAD /Users/phil/Work/GFDL/epmt.git/../papiex-oss/papiex-oss-install/lib/libpapiex.so:/Users/phil/Work/GFDL/epmt.git/../papiex-oss/papiex-oss-install/lib/libpapi.so:/Users/phil/Work/GFDL/epmt.git/../papiex-oss/papiex-oss-install/lib/libpfm.so:/Users/phil/Work/GFDL/epmt.git/../papiex-oss/papiex-oss-install/lib/libmonitor.so
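The shell-selection rule described above (Bash format unless SHELL or _ ends in csh) can be sketched in a few lines of Python. The `wants_csh` and `format_env` helpers are hypothetical. The setenv form mirrors the csh output shown above, while the export form is an assumption about what the Bash output looks like.

```python
import os


def wants_csh(environ):
    """True if SHELL or _ ends in 'csh' (this also matches tcsh)."""
    return any(environ.get(name, "").endswith("csh") for name in ("SHELL", "_"))


def format_env(variables, environ=None):
    """Render variables as csh 'setenv' or Bash 'export' statements."""
    environ = os.environ if environ is None else environ
    if wants_csh(environ):
        return "\n".join(f"setenv {k} {v};" for k, v in variables.items())
    # Assumed Bash form; values are not quoted in this sketch.
    return "\n".join(f"export {k}={v}" for k, v in variables.items())


if __name__ == "__main__":
    demo = {"PAPIEX_OPTIONS": "PERF_COUNT_SW_CPU_CLOCK", "PAPIEX_OUTPUT": "/tmp/epmt/1/"}
    print(format_env(demo, {"SHELL": "/bin/csh"}))
    print(format_env(demo, {"SHELL": "/bin/bash"}))
```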
Data Collection with SLURM epilog and prolog¶
Using configured prolog and epilogs with SLURM tasks allows one to skip job
instrumentation entirely, except for job tags (EPMT_JOB_TAGS) and process
tags (PAPIEX_TAGS). These are configured in slurm.conf for jobs
submitted with sbatch but can be tested on the command line when using srun.
The above Csh job is equivalent to the below sequence using a prolog and epilog, with the exception of the trailing submit statement.
$ SLURM_TASK_SCRIPT_DIR=${EPMT_PREFIX}/epmt-install/slurm
$ srun -n1 \
--task-prolog=${SLURM_TASK_SCRIPT_DIR}/slurm_task_prolog_epmt.sh \
--task-epilog=${SLURM_TASK_SCRIPT_DIR}/slurm_task_epilog_epmt.sh
$ sleep 1
For this job to work using sbatch, make the following modifications in
slurm.conf, substituting the appropriate path for $EPMT_PREFIX:
SLURM_TASK_SCRIPT_DIR=${EPMT_PREFIX}/epmt-install/slurm
TaskProlog=${SLURM_TASK_SCRIPT_DIR}/slurm_task_prolog_epmt.sh
TaskEpilog=${SLURM_TASK_SCRIPT_DIR}/slurm_task_epilog_epmt.sh
If this fails, the papiex installation is likely either missing or
misconfigured in settings.py. The -a flag tells epmt to treat
this run as an entire job.
The Second Mode, Data Submission¶
After collecting the data, jobs (groups of processes) are imported into the
database with the submit command. This command takes arguments in the form of
directories or tar files that must contain a job_metadata file.
Normal operation is to submit one or more directories:
epmt submit <dir1/> [...]
One can also submit a list of compressed tar files:
epmt submit <compressed_dir_file.*z> [...]
There is also a mode where the current environment is used to determine where to find the data.
epmt submit
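Before submitting, it can be handy to check that a directory or tar file actually contains a job_metadata file. The sketch below is one way to do that with the standard library. The `has_job_metadata` helper is hypothetical and is not part of epmt, and accepting the file at any depth inside an archive is an assumption.

```python
import os
import tarfile


def has_job_metadata(path):
    """Return True if a directory or tar archive contains a job_metadata file."""
    if os.path.isdir(path):
        return os.path.isfile(os.path.join(path, "job_metadata"))
    if os.path.isfile(path) and tarfile.is_tarfile(path):
        # The default mode auto-detects gzip, bzip2, and xz compression.
        with tarfile.open(path) as archive:
            return any(
                os.path.basename(member.name) == "job_metadata"
                for member in archive.getmembers()
            )
    return False


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        status = "ok" if has_job_metadata(candidate) else "missing job_metadata"
        print(f"{candidate}: {status}")
```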
Manual Submission Example¶
We can submit our previous job to the database defined in settings.py by
running the epmt submit command with the directory returned by stage (found
in the location set by settings.py epmt_output_prefix):
$ epmt -v submit /tmp/epmt/1/
INFO:epmt_cmds:submit_to_db(/tmp/epmt/1/,*-papiex-[0-9]*-[0-9]*.csv,False)
INFO:epmt_cmds:Unpickling from /tmp/epmt/1/job_metadata
INFO:epmt_cmds:1 files to submit
INFO:epmt_cmds:1 hosts found: ['linuxkit-025000000001-']
INFO:epmt_cmds:host linuxkit-025000000001-: 1 files to import
INFO:epmt_job:Binding to DB: {'filename': ':memory:', 'provider': 'sqlite'}
INFO:epmt_job:Generating mapping from schema...
INFO:epmt_job:Processing job id 1
INFO:epmt_job:Creating user root
INFO:epmt_job:Creating job 1
INFO:epmt_job:Creating host linuxkit-025000000001-
INFO:epmt_job:Creating metricname usertime
INFO:epmt_job:Creating metricname systemtime
INFO:epmt_job:Creating metricname rssmax
INFO:epmt_job:Creating metricname minflt
INFO:epmt_job:Creating metricname majflt
INFO:epmt_job:Creating metricname inblock
INFO:epmt_job:Creating metricname outblock
INFO:epmt_job:Creating metricname vol_ctxsw
INFO:epmt_job:Creating metricname invol_ctxsw
INFO:epmt_job:Creating metricname num_threads
INFO:epmt_job:Creating metricname starttime
INFO:epmt_job:Creating metricname processor
INFO:epmt_job:Creating metricname delayacct_blkio_time
INFO:epmt_job:Creating metricname guest_time
INFO:epmt_job:Creating metricname rchar
INFO:epmt_job:Creating metricname wchar
INFO:epmt_job:Creating metricname syscr
INFO:epmt_job:Creating metricname syscw
INFO:epmt_job:Creating metricname read_bytes
INFO:epmt_job:Creating metricname write_bytes
INFO:epmt_job:Creating metricname cancelled_write_bytes
INFO:epmt_job:Creating metricname time_oncpu
INFO:epmt_job:Creating metricname time_waiting
INFO:epmt_job:Creating metricname timeslices
INFO:epmt_job:Creating metricname rdtsc_duration
INFO:epmt_job:Creating metricname PERF_COUNT_SW_CPU_CLOCK
INFO:epmt_job:Adding 1 processes to job
INFO:epmt_job:Earliest process start: 2019-03-06 15:36:56.948350
INFO:epmt_job:Latest process end: 2019-03-06 15:37:06.996065
INFO:epmt_job:Computed duration of job: 10047715.000000 us, 0.17 m
INFO:epmt_job:Staged import of 1 processes, 1 threads
INFO:epmt_job:Staged import took 0:00:00.189151, 5.286781 processes per second
INFO:epmt_cmds:Committed job 1 to database: Job[u'1']
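The "Computed duration of job" line in the log above can be reproduced from the earliest start and latest end timestamps. The following is only a worked check of that arithmetic, not epmt's own code.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the log output above.
start = datetime.strptime("2019-03-06 15:36:56.948350", FMT)
end = datetime.strptime("2019-03-06 15:37:06.996065", FMT)

delta = end - start
# Integer arithmetic avoids floating-point rounding in the microsecond count.
duration_us = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
duration_min = duration_us / 60_000_000

print(duration_us)             # 10047715
print(round(duration_min, 2))  # 0.17
```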
Compressed Directory Submission Example¶
This might happen at the end of the day via a cron job:
epmt submit <dir>/*tgz
Internal-batch Job Submission Example¶
These commands could be part of every user's job, or placed in the batch system's configurable preambles/postambles.
$ cat my_job.sh
#!/bin/bash
# Example job script for Torque or SLURM
echo "$PBS_JOBID or $SLURM_JOBID"
epmt start
epmt run ./compute_the_world --debug
epmt stop
epmt submit
The start/stop cycle can be removed with the --auto or -a flag, which performs start and stop for you:
epmt -a run ./debug_the_world --outliers
epmt submit
Data From Current Session Submission Example¶
Outside of a batch environment, epmt falls back to using the session ID as the job ID. This is useful
when performing interactive runs. You may not be able to submit these jobs to a shared database due to
constraints on job ID uniqueness, since the session ID is not guaranteed to be unique across reboots,
much less across other systems. However, this use case is perfectly acceptable when using a private database:
$ epmt start
WARNING:epmt_cmds:JOB_ID unset: Using session id 6948 as JOB_ID
WARNING:epmt_cmds:JOB_NAME unset: Using job id 6948 as JOB_NAME
WARNING:epmt_cmds:JOB_SCRIPTNAME unset: Using process name 6948 as JOB_SCRIPTNAME
WARNING:epmt_cmds:JOB_USER unset: Using username phil as JOB_USER
$ epmt run ./debug_the_world --outliers
$ epmt stop
$ epmt submit
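The fallback shown in the warnings above can be sketched as follows. The `resolve_job_id` helper is hypothetical. Checking PBS_JOBID and then SLURM_JOBID is an assumption taken from the example job script earlier in this guide, and the variables epmt actually consults may differ.

```python
import os


def resolve_job_id(environ=None):
    """Return (job_id, source), falling back to the POSIX session ID."""
    environ = os.environ if environ is None else environ
    # Assumed lookup order; not confirmed against epmt's source.
    for name in ("PBS_JOBID", "SLURM_JOBID"):
        value = environ.get(name)
        if value:
            return value, name
    # os.getsid(0) is the session ID of the calling process (Unix only).
    return str(os.getsid(0)), "session id"


if __name__ == "__main__":
    job_id, source = resolve_job_id()
    print(f"Using {source} {job_id} as JOB_ID")
```

Because session IDs are reused over time, an ID obtained this way is only suitable for a private database, as noted above.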
The Third Mode, Data Analysis and Visualization¶
epmt uses an IPython (Jupyter) notebook environment for data analytics. Start the
Jupyter notebook with the epmt notebook command:
$ epmt notebook
[I 15:39:24.236 NotebookApp] Serving notebooks from local directory: /home/chris/Documents/playground/MM/build/epmt
[I 15:39:24.236 NotebookApp] The Jupyter Notebook is running at:
[I 15:39:24.236 NotebookApp] http://localhost:8888/?token=9c7529e19e12cb8121d66ff471e96fdd3056f6acc4480274
[I 15:39:24.236 NotebookApp] or http://127.0.0.1:8888/?token=9c7529e19e12cb8121d66ff471e96fdd3056f6acc4480274
[I 15:39:24.236 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 15:39:24.263 NotebookApp]
To access the notebook, open this file in a browser:
file:///home/chris/.local/share/jupyter/runtime/nbserver-18690-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=9c7529e19e12cb8121d66ff471e96fdd3056f6acc4480274
or http://127.0.0.1:8888/?token=9c7529e19e12cb8121d66ff471e96fdd3056f6acc4480274
The notebook command supports passing parameters to jupyter, such as host IP for sharing access with machines on the local network, notebook token, and notebook password:
epmt notebook -- --ip 0.0.0.0 --NotebookApp.token='thisisatoken' --NotebookApp.password='hereisasupersecurepassword'
Debugging¶
epmt can be passed both -n (dry-run) and -v (verbosity) to help
with debugging. Add more -v flags to increase verbosity level (-vvv).
epmt -v start
Or to attempt a submit without touching the database:
epmt -vv submit -n /dir/to/jobdata
Also, one can decode and dump the job_metadata file in a dir or compressed dir.
$ epmt dump ~/Downloads/yrs05-25.20190221/CM4_piControl_C_atmos_00050101.papiex.gfdl.19712961.tgz
exp_component atmos
exp_jobname CM4_piControl_C_atmos_00050101
exp_name CM4_piControl_C
exp_oname 00050101
job_el_env_changes {}
job_el_env_changes_len 0
job_el_from_batch []
job_el_status 0
job_el_stop 2019-02-20 22:13:23.131187
job_pl_env {'LANG': 'en_US', 'PBS_QUEUE': 'batch', 'SHELL': '/bin/csh', 'PBS_ENVIRONMENT': 'PBS_BATCH', 'PAPIEX_TAGS': 'atmos', 'SHLVL': '3', 'PBS_WALLTIME': '216000', 'MOAB_NODELIST': 'pp057.princeton.rdhpcs.noaa.gov', 'PBS_VERSION': 'TORQUE-6.0.2', 'PAPIEX_OUTPUT': '/vftmp/Foo.Bar/pbs20345339/papiex', 'LOADEDMODULES': '', 'LC_TIME': 'C', 'MACHTYPE': 'x86_64', 'PAPIEX_OPTIONS': 'PERF_COUNT_SW_CPU_CLOCK', 'MOAB_GROUP': 'f'}
job_pl_env_len 81
job_pl_from_batch []
job_pl_groupnames ['f', 'f']
job_pl_hostname pp057
job_pl_id 20345339.moab01.princeton.rdhpcs.noaa.gov
job_pl_jobname CM4_piControl_C_atmos_00050101
job_pl_scriptname CM4_piControl_C_atmos_00050101
job_pl_start 2019-02-20 19:58:41.274267
job_pl_submit 2019-02-20 19:58:41.274463
job_pl_username Foo.Bar
Performance Metrics Data Dictionary¶
epmt collects data both from the job runtime and the applications run in that
environment. See the src/epmt/models/ directory for the fixed data stored for each
object. Metric data is stored differently; the data collector's data dictionary
can be found in papiex-oss/README.md. At the time of this writing, it looked
like this:
| Key | Scope | Description |
|---|---|---|
| 1. tags | Process | User specified tags for this executable |
| 2. hostname | Process | hostname |
| 3. exename | Process | Name of the application, usually argv[0] |
| 4. path | Process | Path to the application |
| 5. args | Process | All arguments to exe excluding argv[0] |
| 6. exitcode | Process | Exit code |
| 7. exitsignal | Process | Exited due to a signal |
| 8. pid | Process | Process id |
| 9. generation | Process | Incremented after every exec() or PID wrap |
| 10. ppid | Process | Parent process id |
| 11. pgid | Process | Process group id |
| 12. sid | Process | Process session id |
| 13. numtids | Process | Number of threads caught by instrumentation |
| 14. numranks | Process | Number of MPI ranks detected |
| 15. tid | Process | Thread id |
| 16. mpirank | Thread | MPI rank |
| 17. start | Process | Microsecond timestamp at start |
| 18. end | Process | Microsecond timestamp at end |
| 19. usertime | Thread | Microsecond user time |
| 20. systemtime | Thread | Microsecond system time |
| 21. rssmax | Thread | Kb max resident set size |
| 22. minflt | Thread | Minor faults (TLB misses/new page frames) |
| 23. majflt | Thread | Major page faults (requiring I/O) |
| 24. inblock | Thread | 512B blocks read from I/O |
| 25. outblock | Thread | 512B blocks written to I/O |
| 26. vol_ctxsw | Thread | Voluntary context switches (yields) |
| 27. invol_ctxsw | Thread | Involuntary context switches (preemptions) |
| 28. cminflt | Process | minflt (22) for all wait()ed children |
| 29. cmajflt | Thread | majflt (23) for all wait()ed children |
| 30. cutime | Process | usertime (19) for all wait()ed children |
| 31. cstime | Thread | systemtime (20) for all wait()ed children |
| 32. num_threads | Process | Threads in process at finish |
| 33. starttime | Thread | Timestamp in jiffies after boot thread was started |
| 34. processor | Thread | CPU this thread last ran on |
| 35. delayacct_blkio_time | Thread | Jiffies process blocked in D state on I/O device |
| 36. guest_time | Thread | Jiffies running a virtual CPU for a guest OS |
| 37. rchar | Thread | Bytes read via syscall (maybe from cache not dev I/O) |
| 38. wchar | Thread | Bytes written via syscall (maybe to cache not dev I/O) |
| 39. syscr | Thread | Read syscalls |
| 40. syscw | Thread | Write syscalls |
| 41. read_bytes | Thread | Bytes read from I/O device |
| 42. write_bytes | Thread | Bytes written to I/O device |
| 43. cancelled_write_bytes | Thread | Bytes discarded by truncation |
| 44. time_oncpu | Thread | Nanoseconds spent running |
| 45. time_waiting | Thread | Nanoseconds runnable but waiting |
| 46. timeslices | Thread | Number of run periods on CPU |
| 47. rdtsc_duration | Thread | If PAPI, real time cycle duration of thread |
| * | Thread | PAPI metrics |
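Because the metrics in the table use mixed units (microseconds, nanoseconds, 512-byte blocks, kilobytes), analysis code usually normalizes them first. The sketch below shows one way to do so. The `normalize` helper and the sample values are hypothetical, and reading the table's "Kb" as 1024 bytes is an assumption.

```python
def normalize(row):
    """Convert selected raw metrics to seconds and bytes."""
    return {
        "usertime_s": row["usertime"] / 1e6,      # microseconds -> seconds
        "systemtime_s": row["systemtime"] / 1e6,  # microseconds -> seconds
        "time_oncpu_s": row["time_oncpu"] / 1e9,  # nanoseconds -> seconds
        "inblock_bytes": row["inblock"] * 512,    # 512B blocks -> bytes
        "outblock_bytes": row["outblock"] * 512,  # 512B blocks -> bytes
        "rssmax_bytes": row["rssmax"] * 1024,     # assumes "Kb" means 1024 bytes
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {
        "usertime": 2_500_000,
        "systemtime": 500_000,
        "time_oncpu": 1_500_000_000,
        "inblock": 8,
        "outblock": 16,
        "rssmax": 2048,
    }
    for name, value in normalize(sample).items():
        print(name, value)
```

The jiffy-based metrics (starttime, delayacct_blkio_time, guest_time) are left out because converting them depends on the system clock tick rate.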
Addition of new metrics¶
Additional metrics can be configured in either of two ways:
- The papiex_options string in settings.py if using epmt run or epmt source
- The value of the PAPIEX_OPTIONS environment variable if using LD_PRELOAD directly.
In either case, the value should be a comma-separated string:
export PAPIEX_OPTIONS="PERF_COUNT_SW_CPU_CLOCK,PAPI_CYCLES"
To list available and functioning metrics, use one of the included command line tools:
- papi_avail and papi_native_avail (via papi)
- check_events and showevtinfo (via libpfm)
- perf list (linux)
The PERF_COUNT_SW_* events should work on any system that has the proper
/proc/sys/kernel/perf_event_paranoid setting.
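To see what the current setting is, one can simply read that file. The sketch below does only that. The `read_perf_event_paranoid` helper is hypothetical, and it deliberately does not judge whether a value is sufficient, since that depends on the kernel, the distribution, and the events requested.

```python
def read_perf_event_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the integer setting, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        # Missing file (non-Linux system) or unexpected content.
        return None


if __name__ == "__main__":
    level = read_perf_event_paranoid()
    if level is None:
        print("perf_event_paranoid is not readable on this system")
    else:
        print(f"perf_event_paranoid = {level}")
```

Lower values are more permissive, so a failing event on a locked-down system is worth checking against this number first.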
One should verify the functionality of the metric using the papi_command_line tool:
papi_command_line PERF_COUNT_SW_CPU_CLOCK
papi_command_line CYCLES
CI/CD Workflows and Caching¶
epmt's GitHub Actions CI is split into focused workflows that use
actions/cache to avoid rebuilding expensive artifacts on every pull request
run.
Workflows¶
| Workflow | Trigger | Purpose |
|---|---|---|
| docker_build_test.yml | push to main, pull_request | Full build + test pipeline; restores cached artifacts before building |
| slurm_image_build.yml | Weekly (Mon 06:00 UTC), workflow_dispatch | Builds the slurm-cluster Docker image from source and saves it to cache |
| weekly_tarball_build.yml | Weekly (Mon 06:00 UTC), workflow_dispatch | Compiles papiex and downloads epmt-dash, then saves both to cache |
| build_and_test_epmt.yml | push to main, pull_request | Source-tree unit tests (no Docker) |
Caches and Invalidation¶
Each cache step is keyed on its build prerequisites — analogous to how make
uses file modification times to decide whether a target must be rebuilt. Changing
a prerequisite produces a new cache key, causing a cache miss and forcing a fresh
build.
| Cache | Cache key components | Invalidation trigger | Notes |
|---|---|---|---|
| epmt-build Docker image | OS_TARGET + PYTHON_VERSION + SQLITE_VERSION + hashFiles(Dockerfile, requirements.txt.py3) | Edit the Dockerfile or requirements file, or bump any version variable | Fully content-hash based; closest analogy to make |
| papiex compiled tarball | PAPIEX_VERSION + OS_TARGET | Bump PAPIEX_VERSION in docker_build_test.yml, weekly_tarball_build.yml, and Makefile | Version-gated; relies on papiex using immutable release tags |
| test-release Docker image | OS_TARGET + PYTHON_VERSION + SQLITE_VERSION + hashFiles(Dockerfile, requirements.txt.py3) + github.sha | Always rebuilds (by design); restore-keys prefix reuses unchanged early layers via --cache-from | Image content changes every commit; layer reuse keeps it fast |
| slurm-cluster Docker image | IMAGE_TAG + SLURM_TAG + SLURM_CLUSTER_TAG | Bump any of the three version variables in docker_build_test.yml and slurm_image_build.yml | Version-string based; upstream tag mutations without a version bump won't invalidate |
| epmt-dash UI directory | EPMT_DASH_SRC_BRANCH | Change EPMT_DASH_SRC_BRANCH in docker_build_test.yml, weekly_tarball_build.yml, and Makefile | Branch-name based; new commits to the same branch don't invalidate, but the weekly workflow bounds staleness to ≤1 week |
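The difference between version-string keys and content-hash keys can be illustrated with a short Python sketch. It is not GitHub's hashFiles algorithm and not the key layout used by these workflows. The `hash_files` and `cache_key` helpers, the dash-joined layout, and the OS_TARGET and PYTHON_VERSION values are all assumptions for illustration.

```python
import hashlib
import os
import tempfile


def hash_files(paths):
    """SHA-256 over the contents of the given files, in the given order."""
    digest = hashlib.sha256()
    for path in paths:
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def cache_key(name, versions, paths=()):
    """Join a cache name, version strings, and an optional content hash."""
    parts = [name, *versions]
    if paths:
        parts.append(hash_files(paths))
    return "-".join(parts)


if __name__ == "__main__":
    # Version-only key: changes only when someone bumps a version string.
    print(cache_key("papiex", ["2.3.16", "rocky-8"]))

    # Content-hash key: changes whenever the file contents change.
    dockerfile = os.path.join(tempfile.mkdtemp(), "Dockerfile")
    with open(dockerfile, "w") as handle:
        handle.write("FROM rockylinux:8\n")
    print(cache_key("epmt-build", ["rocky-8", "3.9"], [dockerfile]))
```

A version-only key stays the same when upstream content changes silently, which is why the papiex, slurm-cluster, and epmt-dash caches rely on explicit bumps or the weekly build.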
Cache Invalidation Gap vs. make¶
make detects prerequisite changes via file modification times regardless of
version numbers. The GitHub Actions caches above use version strings or content
hashes instead, so:
- epmt-build is fully make-like: changing the Dockerfile or requirements file immediately produces a different hash and forces a rebuild.
- papiex and slurm-cluster require a deliberate version bump in the workflow env block to trigger a rebuild. Remote source changes without a version bump go undetected until the next weekly build.
- epmt-dash requires either a branch rename or waiting for the weekly build. The weekly weekly_tarball_build.yml workflow always fetches fresh content, bounding maximum staleness to one week.
Forcing a Rebuild¶
To force any individual cache to rebuild, bump the relevant version variable
in the env: section of both the weekly workflow and docker_build_test.yml,
and update the Makefile to match:
# docker_build_test.yml (and weekly_tarball_build.yml)
env:
PAPIEX_VERSION: "2.3.16" # bump to force papiex rebuild
EPMT_DASH_SRC_BRANCH: "new-branch" # change to force epmt-dash rebuild
IMAGE_TAG: "25.05.4" # bump to force slurm-cluster rebuild
The epmt-build cache is invalidated automatically whenever
Dockerfiles/Dockerfile.rocky-8-epmt-build or requirements.txt.py3 is
modified — no manual version bump is needed.
Weekly Pre-warming¶
slurm_image_build.yml and weekly_tarball_build.yml run every Monday
morning to pre-warm their respective caches before the working week begins.
docker_build_test.yml then restores those caches on pull request and push
runs, skipping the expensive Docker compile steps. If the cache is missing (first
run, eviction, or new key), docker_build_test.yml falls back to building the
artifact inline so the pipeline never silently skips a required build step.
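The restore-or-build behaviour described above amounts to a simple pattern, sketched here in Python with a dictionary standing in for the cache. The `restore_or_build` helper is hypothetical, and saving the freshly built artifact back to the cache after a miss is an assumption about the workflow rather than something this guide states.

```python
def restore_or_build(cache, key, build):
    """Return (artifact, source), building inline when the key is missing."""
    if key in cache:
        return cache[key], "cache"
    artifact = build()     # fallback: build inline so the step is never skipped
    cache[key] = artifact  # assumed: a miss repopulates the cache
    return artifact, "built"


if __name__ == "__main__":
    cache = {}

    def build_papiex():
        # Stand-in for an expensive compile step.
        return "papiex tarball"

    print(restore_or_build(cache, "papiex-2.3.16", build_papiex))  # first run: built
    print(restore_or_build(cache, "papiex-2.3.16", build_papiex))  # second run: cache
```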
Troubleshooting¶
Virtual Environments¶
Note that hardware counters are often unavailable inside virtual machines.
Common Error Messages¶
version GLIBC_x.xx not found¶
The collector library may not have been built for the current environment, or the OS version targeted by the release does not match the current environment.