#!/usr/bin/env perl
use strict;
use warnings;

use FindBin qw($Bin);
use lib "$Bin/../lib";
use Pheno::Ranker::CLI;

exit Pheno::Ranker::CLI->new( pod_file => __FILE__ )->run(@ARGV);

=head1 NAME

pheno-ranker: A script that performs semantic similarity analysis on PXF/BFF data structures and beyond (JSON|YAML)

=head1 SYNOPSIS

 pheno-ranker -r <individuals.json> -t <patient.json> [-options]

   Arguments:
     * Cohort mode:
       -r, --reference <file>         JSON/YAML BFF/PXF file(s) (array/object), supports .gz

     * Patient mode:
       -t, --target <file>            JSON/YAML BFF/PXF file (object or single-object array), supports .gz 

   Options:
     -age                             Include age-related variables; excludes agent-like terms (BFF/PXF-only) [>no-age|age]
     -a, --align [path/basename]      Write alignment file(s). If not specified, default filenames are used [default: alignment.*]
     -append-prefixes <prefixes>      Prefixes for primary_key when #cohorts >= 2 [default: C]
     -config <file>                   YAML config file to modify default parameters [default: share/conf/config.yaml]
     -cytoscape-json [file]           Writes an undirected graph in Cytoscape-compatible JSON [default: graph.json]
     -e, --export [path/basename]     Export miscellaneous JSON files. If not specified, default filenames are used [default: export.*]
     -exclude-terms <terms>           Exclude BFF/PXF terms (e.g., --exclude-terms sex id) or column names in JSON derived from CSV
     -graph-stats [file]              Generates a text file with key graph metrics, for use with <-cytoscape-json> [default: graph_stats.txt]
     -graph-min-weight <number>       Keep graph edges with weight greater than or equal to this value
     -graph-max-weight <number>       Keep graph edges with weight less than or equal to this value
     -include-hpo-ascendants          Include ascendant terms from the Human Phenotype Ontology (HPO)
     -include-terms <terms>           Include BFF/PXF terms (e.g., --include-terms diseases) or column names in JSON derived from CSV
     -max-matrix-records-in-ram <number> In cohort mode, set max records before switching to RAM-efficient mode [default: 5000]
     -matrix-format <format>          Matrix output format in cohort mode [>dense|mtx]
     -max-number-vars <number>        Maximum number of variables for binary string [default: 10000]
     -max-out <number>                Print only N comparisons [default: 50]
     -o, --out-file <file>            Output file path [default: -r matrix.txt | -t rank.txt]
     -poi, --patients-of-interest <id_list>   Export JSON files for the selected individual IDs during a dry-run
     -poi-out-dir <directory>         Directory for JSON files (used with --poi)
     -prp, --precomputed-ref-prefix [path/basename]   Use precomputed data for the reference cohort(s). No need to use -r
     -retain-excluded-phenotypicFeatures     Retains features set to "excluded": true by appending '_excluded' to their IDs
     -similarity-metric-cohort <metric>  Similarity metric for cohort mode [>hamming|jaccard]
     -sort-by <metric>                Sort by Hamming distance or Jaccard index [>hamming|jaccard]
     -w, --weights <file>             YAML file with weights

   Generic Options:
     -debug <level>                   Print debugging information (levels 1 to 5; 5 is the most verbose)
     -h, --help                       Brief help message
     -log                             Save log file [default: pheno-ranker-log.json]
     -man                             Full documentation
     -no-color                        Toggle color output [>color|no-color]
     -v, --verbose                    Verbosity on
     -V, --version                    Print version

=head1 SUMMARY

Pheno-Ranker is a lightweight, easy-to-install tool for performing semantic similarity analysis on phenotypic data in JSON/YAML formats, including Beacon v2 Models and Phenopackets v2. It also supports pre-processed CSV files prepared using the included C<csv2pheno-ranker> utility.

=head1 INSTALLATION

If you plan to use only the C<pheno-ranker> CLI, we recommend installing it from CPAN. See the details below.

=head2 Non-containerized

The Perl command-line interface is tested on Linux, macOS, and Windows via GitHub Actions. The commands below focus on Debian-based Linux systems, where Perl 5 is typically available by default and extra CPAN modules can be installed with C<cpanminus>. On Windows, use Docker, WSL, or a Perl environment such as Strawberry Perl.

=head3 Method 1: From CPAN

First, install the system-level dependencies:

  sudo apt-get install cpanminus libperl-dev gcc make

Next, install Pheno-Ranker and its dependencies in C<~/perl5>:

  cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
  cpanm --notest Pheno::Ranker
  pheno-ranker --help

To ensure Perl recognizes your local modules every time you start a new terminal, you should type:

  echo 'eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)' >> ~/.bashrc

To B<update> to the newest version:
    
  cpanm Pheno::Ranker

=head3 Method 2: From CPAN in a Conda environment

Please follow L<these instructions|https://cnag-biomedical-informatics.github.io/pheno-ranker/download-and-installation/#__tabbed_1_2>.

=head3 Method 3: From GitHub

To clone the repository for the first time:

  git clone https://github.com/cnag-biomedical-informatics/pheno-ranker.git
  cd pheno-ranker

To update an existing clone, navigate to the repository folder and run:

  git pull

Install the system-level dependencies:
  
  sudo apt-get install cpanminus libperl-dev

Now choose one of the two options below:

B<Option 1:> Install the dependencies system-wide with C<sudo> (they are harmless to your system):

  cpanm --notest --sudo --installdeps .
  bin/pheno-ranker --help            

B<Option 2:> Install the dependencies in C<~/perl5>:

  cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
  cpanm --notest --installdeps .
  bin/pheno-ranker --help

To ensure Perl recognizes your local modules every time you start a new terminal, you should type:

  echo 'eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)' >> ~/.bashrc

I<Optional:> If you want to use C<utils/barcode> or C<utils/bff_pxf_plot>:

  sudo apt-get install python3-pip libzbar0
  pip3 install -r requirements.txt

=head2 Containerized

=head3 Method 4: From Docker Hub 

(Estimated Time: Approximately 10 seconds)

Download the latest version of the Docker image (supports both amd64 and arm64 architectures) from L<Docker Hub|https://hub.docker.com/r/manuelrueda/pheno-ranker> by executing:

  docker pull manuelrueda/pheno-ranker:latest
  docker image tag manuelrueda/pheno-ranker:latest cnag/pheno-ranker:latest

See additional instructions below.

=head3 Method 5: With Dockerfile

(Estimated Time: Approximately 1 minute)

Please download the C<Dockerfile> from the repo:

  wget https://raw.githubusercontent.com/cnag-biomedical-informatics/pheno-ranker/main/Dockerfile

And then run:

  # Docker Version 19.03 and Above (Supports buildx)
  docker buildx build -t cnag/pheno-ranker:latest .

  # Docker Version Older than 19.03 (Does Not Support buildx)
  docker build -t cnag/pheno-ranker:latest .

=head3 Additional instructions for Methods 4 and 5

To run the container (detached) execute:

  docker run -tid -e USERNAME=root --name pheno-ranker cnag/pheno-ranker:latest

To enter:

  docker exec -ti pheno-ranker bash

The command-line executable can be found at:

  /usr/share/pheno-ranker/bin/pheno-ranker

The default container user is C<root>, but you can also run the container as C<$UID=1000> (C<dockeruser>):

  docker run --user 1000 -tid --name pheno-ranker cnag/pheno-ranker:latest
 
=head3 Mounting volumes

Docker containers are fully isolated. If you need to mount a volume in the container, use the following syntax (C<-v host:container>).
The example below shows how (change the paths to match yours):

  docker run -tid --volume /media/mrueda/4TBT/data:/data --name pheno-ranker-mount cnag/pheno-ranker:latest

Then I will do something like this:

  # First I create an alias to simplify invocation (from the host)
  alias pheno-ranker='docker exec -ti pheno-ranker-mount /usr/share/pheno-ranker/bin/pheno-ranker'

  # Now I use the alias to run the command (note that I use the flag -o to specify the filepath)
  pheno-ranker -r /data/individuals.json -o /data/matrix.txt

=head3 System requirements

  * OS/ARCH supported: linux/amd64 and linux/arm64.
  * Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, OpenSUSE) should work as well (untested).
    The Perl CLI is also tested on macOS and Windows; container images are Linux-based.
  * Perl 5 (>= 5.26 core; installed by default in most Linux distributions). Check the version with "perl -v".
  * >= 4GB of RAM
  * 1 core
  * At least 16GB HDD

=head1 HOW TO RUN PHENO-RANKER

To run C<pheno-ranker> you need one or more PXF/BFF files in JSON or YAML format. The reference cohort must be a JSON array in which each individual's data are consolidated into a single object.

You can download examples from L<this location|https://github.com/CNAG-Biomedical-Informatics/pheno-ranker/tree/main/share/ex>.

There are two modes of operation:

=over 4

=item Cohort mode:

B<Intra-cohort:> With the C<-r> argument and one cohort.

B<Inter-cohort:> With C<-r> and multiple cohort files. It can be combined with C<--append-prefixes> to add a prefix to each individual's ID.

=item Patient mode:

With C<-r> reference cohort(s) and C<-t> patient data.

=back

B<Examples:>

 $ bin/pheno-ranker -r phenopackets.json  # intra-cohort

 $ bin/pheno-ranker -r phenopackets.yaml -o my_matrix.txt # intra-cohort

 $ bin/pheno-ranker -r phenopackets.json -w weights.yaml --exclude-terms sex ethnicity exposures # intra-cohort with weights

 $ $path/pheno-ranker -r individuals.json others.yaml --append-prefixes CANCER CONTROL  # inter-cohort

 $ $path/pheno-ranker -r individuals.json -t patient.yaml -max-out 100 # patient mode


=head2 COMMON ERRORS AND SOLUTIONS

 * Error message: R plotting
     Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
     line 1 did not have X elements
     Calls: as.matrix -> read.table -> scan
     Execution halted
   Solution: Make sure that the values of your primary key (e.g., "id") do not contain spaces (e.g., "my fav id" must be "my_fav_id")
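   Check (hedged sketch): the hypothetical shell function below, which is not
   part of Pheno-Ranker, lists primary-key values that contain whitespace. It
   assumes an uncompressed JSON file whose primary key is "id" and uses only
   JSON::PP (bundled with Perl):

     check_ids() {
       perl -MJSON::PP -0777 -ne '
         my $data = decode_json($_);
         # Accept either an array of objects or a single object
         for my $rec (ref($data) eq "ARRAY" ? @$data : ($data)) {
           next unless ref($rec) eq "HASH" && defined $rec->{id};
           print "$rec->{id}\n" if $rec->{id} =~ /\s/;
         }
       ' "$1"
     }

     # Usage: prints one offending ID per line (no output means no spaces found)
     # check_ids individuals.json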

=head1 CITATION

The author requests that any published work that uses C<Pheno-Ranker> cite the following reference:

Leist, I.C. et al. (2024). Pheno-Ranker: a toolkit for comparison of phenotypic data stored in GA4GH standards and beyond. I<BMC Bioinformatics>. DOI: 10.1186/s12859-024-05993-2

=head1 AUTHOR 

Written by Manuel Rueda, PhD. Info about CNAG can be found at L<https://www.cnag.eu>.

=head1 COPYRIGHT AND LICENSE

This Perl file is copyrighted. See the LICENSE file included in this distribution.

=cut
