This project analyzes open source projects for malware.
Due to the high demand of the community, we decide to open source the code as it is now, to allow collaboration.The majority of the code is updated until May 2019, which indicates that some components may not work any more.Especially the components that depends on external tools (e.g. Sysdig, Airflow) or APIs (e.g. Npm).
We are actively working on the testing and improvements. Please find the todo list here.For how to run commands, please refer to howto section. For how to deploy on machines, please refer to deploy instructions. For how to request access to the supply chain attack samples, please refer to request instructions
This repository is open sourced under MIT license. If you find this repository helpful, please cite our paper:
@inproceedings{duan2021measuring,
title={Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages},
author={Duan, Ruian and Alrawi, Omar and Kasturi, Ranjita Pai and Elder, Ryan and Saltaformaggio, Brendan and Lee, Wenke},
booktitle = {28th Annual Network and Distributed System Security Symposium, {NDSS}},
month = Feb,
year = {2021},
url = {https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-1_23055_paper.pdf}
}
sudo ./setup.sh
setup.sh
and figure out their equivalenciessudo docker build -t maloss .
sudo docker build -t maloss . --no-cache
sudo docker run -it --rm -v $(pwd):/code maloss /bin/bash
cd /code
pip install -r src/requirements.txt --user
sudo apt-get install -yqq curl php git ruby-full rubygems-integration nuget python python-pip python3-pip npm jq strace
sudo ./src/install_dep.sh
main/config
from main/config.tmpl
sudo docker build -t maloss .
sudo docker build -t maloss . --no-cache
sudo docker run -it --rm --cap-add=SYS_PTRACE -v /tmp/result:/home/maloss/result -v /tmp/metadata:/home/maloss/metadata maloss /bin/bash
main/config
from main/config.tmpl
cd main && sudo docker-compose --compatibility -f docker-compose-master.yml up -d
python detector.py install -i ../data/pypi.csv
QUEUING = Celery
line in main/config
, and then the jobs should be running locally and sequentially.main/celery_tasks.py
and the entry point for master it main/detector.py
.python main.py select_pm
python main.py select_pkg ../data/pypi.with_stats.csv ../data/pypi.with_stats.popular.csv -n 10000
python main.py select_pkg ../data/maven.csv ../data/maven.popular.csv -n 10000 -f use_count
python main.py crawl $package_manager $outfile
python main.py crawl $package_manager $outfile -s -p 24
python main.py edit_dist $source -t $target $outfile
python main.py edit_dist ../data/pypi.with_stats.csv ../data/edit_dist/pypi_edist_dist.out -a c_edit_distance_batch -p 16
python main.py edit_dist ../data/pypi.with_stats.popular.csv ../data/edit_dist/pypi_pop_vs_all.out -t ../data/pypi.with_stats.csv -a c_edit_distance_batch -p 16 --pair_outfile ../data/edit_dist/pypi_pop_vs_all.csv
pip download --no-binary :all: --no-deps package
npm pack package
composer require -d ../testdata/php --prefer-source --no-scripts package
gem fetch package
mvn dependency:get -Dartifact=com.google.protobuf:protobuf-java:3.5.1 -Dtransitive=false && cp ~/.m2/repository/com/google/protobuf/protobuf-java/3.5.1/protobuf-java-3.5.1.jar ./
python main.py get_versions ../data/pypi.with_stats.popular.csv ../data/pypi.with_stats.popular.versions.csv -l python -c /data/maloss/info/python
python main.py get_versions ../data/maven.popular.csv ../data/maven.popular.versions.csv -c /data/maloss/info/java -l java
python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num -1
python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num -1 --with_time
python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num 100 --min_gap_days 1
python main.py get_author ../data/pypi.with_stats.popular.csv ../data/pypi.with_stats.with_author.popular.csv -l python -c /data/maloss/info/python
python main.py get_dep -l python -n protobuf -c ../testdata
python main.py get_dep -l python -n scrapy -c ../testdata
python main.py get_dep -l javascript -n eslint -c ../testdata
python main.py get_dep -l ruby -n protobuf -c ../testdata
python main.py get_dep -l php -n designsecurity/progpilot -c ../testdata
python main.py get_dep -l java -n com.google.protobuf/protobuf-java -c ../testdata
python main.py get_stats ../malware/npmjs-mal-pkgs.june2019.txt ../malware/npmjs-mal-pkgs.june2019.with_stats.txt.new -m npmjs
python main.py build_dep -c /data/maloss/info/python -l python ../data/pypi.with_stats.csv ../airflow/data/pypi.with_stats.dep_graph.pickle
python main.py build_dep -c /data/maloss/info/python -v -l python ../data/pypi.with_stats.popular.versions.csv ../airflow/data/pypi.with_stats.popular.versions.dep_graph.pickle
python main.py build_author ../data/author_pkg_graph.popular.pickle -i ../data/pypi.with_stats.with_author.popular.csv ../data/npmjs.with_stats.with_author.popular.csv ../data/rubygems.with_stats.with_author.popular.csv ../data/packagist.with_stats.with_author.popular.csv -l python javascript ruby php -t ../data/top_authors.popular.json
python main.py build_author ../data/author_pkg_graph.pickle -i ../data/pypi.with_stats.with_author.csv ../data/npmjs.with_stats.with_author.csv ../data/rubygems.with_stats.with_author.csv ../data/packagist.with_stats.with_author.csv ../data/maven.with_author.csv -l python javascript ruby php java -t ../data/top_authors.json
tar -zxf ../airflow/data/pypi.with_stats.dep_graph.pickle.tgz
python main.py split_graph ../airflow/data/pypi.with_stats.dep_graph.pickle ../airflow/pypi_dags/ -d ../airflow/data/pypi_static.py -n 20
python main.py split_graph ../airflow/data/pypi.with_stats.popular.versions.dep_graph.pickle ../airflow/pypi_version_dags/ -d ../airflow/data/pypi_static_versions.py -n 10
python main.py split_graph ../airflow/data/maven.dep_graph.pickle ../airflow/maven_dags/ -d ../airflow/data/maven_static.py -n 20
python main.py split_graph ../airflow/data/maven.popular.versions.dep_graph.pickle.tgz ../airflow/maven_version_dags/ -d ../airflow/data/maven_static_versions.py -n 80 -k 4
python main.py split_graph ../airflow/data/pypi.with_stats.dep_graph.pickle ../airflow/pypi_dags/ -d ../airflow/data/pypi_static.py -s ../data/pypi.with_stats.popular.csv
python main.py install -n protobuf -l python -c ../testdata -o ../testdata
python main.py install -n eslint -l javascript -c ../testdata -o ../testdata
python main.py install -n protobuf -l ruby -c ../testdata -o ../testdata
python main.py install -n designsecurity/progpilot -l php -c ../testdata -o ../testdata
python main.py install -n com.google.protobuf/protobuf-java -l java -c ../testdata -o ../testdata
python main.py astgen ../testdata/test-eval-exec.py ../testdata/test-eval-exec.py.out -c ../config/test_astgen_python.config
python main.py astgen ../testdata/html5lib-1.0.1.tar.gz ../testdata/html5lib-1.0.1.tar.gz.out -c ../config/test_astgen_python.config
python main.py astgen ../testdata/python-taint-0.40.tar.gz ../testdata/python-taint-0.40.tar.gz.out -c ../config/test_astgen_python.config
python main.py astgen ../testdata/test-eval.js ../testdata/test-eval.js.out -c ../config/test_astgen_javascript.config -l javascript
python main.py astgen ../testdata/urlgrey-0.4.4.tgz ../testdata/urlgrey-0.4.4.tgz.out -c ../config/test_astgen_javascript.config -l javascript
cd static_proxy && php astgen.php -c ../../config/test_astgen_php.config.bin -i ../../testdata/test-eval-exec.php -o ../../testdata/test-eval-exec.php.out.bin && cd ..
python main.py astgen ../testdata/test-eval-exec.php ../testdata/test-eval-exec.php.out -c ../config/test_astgen_php.config -l php
python main.py astgen ../testdata/test-backtick.php ../testdata/test-backtick.php.out -c ../config/test_astgen_php.config -l php
python main.py astgen ../testdata/php/vendor/guzzlehttp/guzzle/ ../testdata/guzzlehttp_guzzle.out -c ../config/test_astgen_php.config -l php
cd static_proxy && ruby astgen.rb -c ../../config/test_astgen_ruby.config.bin -i ../../testdata/test-eval.rb -o ../../testdata/test-eval.rb.out.bin && cd ..
python main.py astgen ../testdata/test-eval.rb ../testdata/test-eval.rb.out -c ../config/test_astgen_ruby.config -l ruby
cd static_proxy/astgen-java && java -jar target/astgen-java-1.0.0-jar-with-dependencies.jar -help && cd ../../
cd static_proxy/astgen-java && java -jar target/astgen-java-1.0.0-jar-with-dependencies.jar -inpath ../../../testdata/Test.jar -outfile ../../../testdata/Test.jar.out -intype JAR -config ../../../config/astgen_java_smt.config -process_dir ../../../testdata/Test.jar && cd ../../
python main.py astgen ../testdata/protobuf-java-3.5.1.jar ../testdata/protobuf-java-3.5.1.jar.out -c ../config/test_astgen_java.config -l java
python main.py astgen ../testdata/Test.jar ../testdata/Test.jar.out -c ../config/astgen_java_smt.config -l java
../config/astgen_XXX_smt.config
for each language (e.g. ../config/astgen_javascript_smt.config
) in astfilter jobpython main.py astfilter -n protobuf -c $python_config -d ../testdata/ -o ../testdata/
python main.py astfilter -n eslint-scope -c $javascript_config -d ../testdata/ -o ../testdata/ -l javascript
python main.py astfilter -n designsecurity/progpilot -c $php_config -d ../testdata/ -o ../testdata/ -l php
python main.py astfilter -n protobuf -c $ruby_config -d ../testdata/ -o ../testdata -l ruby
python main.py astfilter -n com.google.protobuf/protobuf-java -c $java_config -d ../testdata/ -o ../testdata -l java
python main.py taint -n json -d /data/maloss/info/ruby -o /data/maloss/result/ruby -l ruby -c ../config/astgen_ruby_smt.config
python main.py taint -n urllib -i ../malware/pypi-samples/urllib-1.21.1.tgz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
python main.py taint -n django-server -i ../malware/pypi-samples/django-server-0.1.2.tgz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
pip download --no-binary :all: --no-deps trustme && python main.py taint -n trustme -i trustme-0.5.1.tar.gz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
python main.py taint -n eslint-scope -i ../malware/npmjs-samples/eslint-scope-3.7.2.tgz -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
python main.py taint -n custom8 -i static_proxy/jsprime/jsprimetests/custom8.js -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
python main.py taint -n stream-combine -i ../malware/npmjs-samples/stream-combine-2.0.2.tgz -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
python main.py taint -n test-eval-exec -i ../testdata/test-eval-exec.php -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
python main.py taint -n test-multiple-flows -i static_proxy/progpilot/projects/tests/tests/flows/ -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
python main.py taint -n test-flow -i ../testdata/test-flow.php -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
python main.py taint -n active-support -l ruby -c ../config/astgen_ruby_smt.config -i ../malware/rubygems-samples/active-support-5.2.0.gem -o ./
python main.py taint -n bootstrap-sass -l ruby -c ../config/astgen_ruby_smt.config -i ../malware/rubygems-samples/bootstrap-sass-3.2.0.3.gem -o ./
python main.py taint -n brakeman-rails4 -l ruby -c ../config/astgen_ruby_smt.config -i ../testdata/rails4/ -o ./
python main.py filter_pkg ../data/pypi.with_stats.csv ../data/pypi.with_stats.with_taint_apis.csv -c ../config/astgen_python_taint_apis.config -o /data/maloss/result/python -d /data/maloss/info/python -l python
python main.py filter_pkg ../data/rubygems.with_stats.csv ../data/rubygems.with_stats.with_taint_apis.csv -c ../config/astgen_ruby_taint_apis.config -o /data/maloss/result/ruby -d /data/maloss/info/ruby -l ruby
python main.py filter_pkg ../data/npmjs.with_stats.csv ../data/npmjs.with_stats.with_taint_apis.csv -c ../config/astgen_javascript_taint_apis.config -o /data/maloss/result/javascript -d /data/maloss/info/javascript -l javascript
python main.py filter_pkg ../data/packagist.with_stats.csv ../data/packagist.with_stats.with_taint_apis.csv -c ../config/astgen_php_taint_apis.config -o /data/maloss/result/php -d /data/maloss/info/php -l php
python main.py filter_pkg ../data/maven.csv ../data/maven.with_taint_apis.csv -c ../config/astgen_java_taint_apis.config -o /data/maloss/result/java -d /data/maloss/info/java -l java
python main.py static -n protobuf -c $python_config -d ../testdata/ -o ../testdata/
python main.py dynamic -n protobuf -l python -c ../testdata -o ../testdata
sudo python main.py interpret_trace -l python --trace_dir /data/maloss1/sysdig/pypi -c /data/maloss/info/python -o /data/maloss/result/python -p 8
python main.py compare_ast -i ../malware/npmjs-samples/flatmap-stream-0.1.1.tgz ../benignware/npmjs-samples/flatmap-stream-0.1.0.tgz -o ../testdata/ ../testdata/flatmap-stream.json -l javascript -c ../config/astgen_javascript_smt.config
python main.py compare_ast -i ../testdata/test-backtick.php ../testdata/test-eval-exec.php -o tempout/ tempout/test_eval_backtick.json -l php -c ../config/astgen_php_smt.config
python main.py compare_ast -i ../malware/rubygems-samples/bootstrap-sass-3.2.0.3.gem ../benignware/rubygems-samples/bootstrap-sass-3.2.0.2.gem -l ruby -c ../config/astgen_ruby_smt.config -o ../testdata/ --outfile ../testdata/bootstrap-sass-compare.txt
python main.py compare_ast -i ../malware/rubygems-samples/active-support-5.2.0.gem ../benignware/rubygems-samples/activesupport-5.2.3.gem -c ../config/astgen_ruby_smt.config -o ../testdata/ --outfile ../testdata/activesupport-compare.txt -l ruby
python main.py filter_versions ../data/2019.07/packagist.versions.with_time.csv ../data/2019.07/packagist_ast_stats.apis.json ../data/2019.07/packagist.versions.with_time.filtered_loose_apis.csv
python main.py compare_hash -i ../data/maven.csv ../data/jcenter.csv -d /data/maloss/info/java /data/maloss/info/jcenter -o ../data/maven_jcenter.json
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter.json
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered.json --inspect_content
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered.json --inspect_api -c ../config/astgen_java_smt.config
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered_api.json --inspect_api -c ../config/astgen_java_smt.config --compare_hash_cache ../data/jitpack_jcenter_filtered.json
python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.csv ../data/pypi_api_stats.json
python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.popular.csv ../data/pypi_pop_api_stats.json
python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.csv ../data/pypi_api_mapping.json -d --detail_filename
python main.py interpret_result --data_type domain -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.csv ../data/2019.06/pypi_domain_stats.json
python main.py interpret_result --data_type domain -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.csv ../data/2019.06/pypi_domain_mapping.json -d
python main.py interpret_result --data_type dependency -l python ../data/pypi.with_stats.popular.csv ../data/pypi_pop_dep_stats.json
python main.py interpret_result --data_type compare_ast -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.with_stats.popular.csv ../data/2019.06/pypi_compare_ast_stats.json
python main.py interpret_result --data_type compare_ast -c /data/maloss/info-2019.07/javascript -o /data/maloss/result-2019.07/javascript -l javascript ../data/2019.07/npmjs.csv ../data/2019.07/npmjs_ast_stats.json --compare_ast_options_file ../data/2019.07/compare_ast_options.json
python main.py interpret_result --data_type install_with_network -c /data/maloss/info/javascript -o /data/maloss/result/javascript -l javascript -m npmjs ../data/2019.06/npmjs.csv ../data/2019.06/npmjs.install_with_network.json
python main.py interpret_result --data_type reverse_dep -l javascript -m npmjs ../airflow/data/high_impact.csv ../airflow/data/high_impact_npmjs.json
python main.py interpret_result --data_type reverse_dep -l python -m pypi ../airflow/data/high_impact.csv ../airflow/data/high_impact_pypi.json
python main.py interpret_result --data_type reverse_dep -l ruby -m rubygems ../airflow/data/high_impact.csv ../airflow/data/high_impact_rubygems.json
python main.py interpret_result --data_type correlate_info_api_compare_ast -c /data/maloss/info-2019.07/javascript -o /data/maloss/result-2019.07/javascript -l javascript -m npmjs -s ../data/2019.07/npmjs_skip_list.json ../data/2019.07/npmjs_ast_stats.json ../data/2019.07/npmjs_correlate_info_api_compare_ast.json
python main.py interpret_result --data_type correlate_info_api_compare_ast -c /data/maloss/info-2019.07/php -o /data/maloss/result-2019.07/php -l php -m packagist -s ../data/2019.07/packagist_skip_list.json ../data/2019.07/packagist_ast_stats.json ../data/2019.07/packagist_correlate_info_api_compare_ast.json
python main.py interpret_result --data_type taint -c /data/maloss/info-2019.07/php -o /data/maloss/result-2019.07/php -l php ../data/2019.07/packagist.csv ../data/2019.07/packagist_flow_stats.json
python main.py grep_pkg ../data/2019.07/rubygems.csv ../data/2019.07/rubygems.csv.pastebin.com pastebin.com -l ruby -p 80
python main.py grep_pkg ../data/2019.07/npmjs.csv ../data/2019.07/npmjs.csv.pastebin.com pastebin.com -l javascript -p 20
python main.py speedup ../data/2019.01/pypi.with_stats.popular.csv speedup.log -l python
-s
option specifies the maximum string size to print-v
option print unabbreviated argv, stat, termios, etc. argsWhereas Facter and osquery are predominantly about querying infrequently changing information, Sysdig is much more suited to working with real-time data streams – for example, network or file I/O, or tracking errors in running processes.
--extra-index-url
for internal/external packages will choose the version with higher version numbergem install --source