Graphic abstract

Network approaches have proven useful in biomedicine for organizing high-dimensional datasets and extracting meaningful information from them. A network represents a dataset in a way that emphasizes the relationships between its nodes. Examples include DNA interaction networks, expression networks, genomic regulatory networks, metabolic networks, phenotypic networks, drug-target/drug networks, protein-protein networks, and human disease networks. Depending on the objects under investigation, nodes can be genes, mRNAs, microRNAs, proteins, metabolites, drug molecules, diseases, side effects, or any other entity capable of interacting in the modeled system. Studying the node relationships within a biological network helps us infer new associations between nodes and therefore has many applications in biomedical and drug discovery research. For instance, several interaction-profile similarities are used in gene regulatory networks or protein-protein interaction networks to annotate and cluster genes or proteins, predict new associations, and find network modules. Likewise, they can be applied to drug-target interaction networks to predict new drug-target associations, find new uses for old drugs, and identify new protein targets.

Generally speaking, there are two ways to define the similarity between nodes. The first is based on the intrinsic attributes of the nodes. For example, drug molecules are represented by their chemical structures, and proteins are represented by the physicochemical properties of their amino acids. Two nodes are considered similar if they share many features. However, node attributes are often hidden or not readily available. For example, in drug-side effect or drug-disease networks, representations of the nodes that stand for phenotypes, such as side effects or diseases, are difficult to obtain. Additionally, it is very difficult to select the node attributes that are most relevant to the object of study. The second is based solely on the network structure. Network-based node similarity captures and characterizes node relationships by considering the network environment in which the nodes are located. Several such measures have been used successfully in biomedical and drug research.
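To make the structure-based idea concrete, the following minimal Python sketch computes two widely known neighborhood-based indices, the common-neighbors count and the Jaccard index, on a toy network. The edge list, the node labels, and the function names are hypothetical and chosen only for illustration; this is not the PyNetSim interface, and it uses no node attributes at all.

```python
from itertools import combinations

# Toy undirected network: hypothetical drug (d*) and target (t*) nodes,
# invented purely for illustration.
edges = [
    ("d1", "t1"), ("d1", "t2"),
    ("d2", "t1"), ("d2", "t2"), ("d2", "t3"),
    ("d3", "t3"),
]


def build_neighbors(edge_list):
    """Map each node to the set of nodes it is directly linked to."""
    neighbors = {}
    for u, v in edge_list:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def common_neighbors(neighbors, a, b):
    """Number of neighbors shared by nodes a and b."""
    return len(neighbors[a] & neighbors[b])


def jaccard(neighbors, a, b):
    """Shared neighbors divided by the size of the combined neighborhood."""
    union = neighbors[a] | neighbors[b]
    if not union:
        return 0.0
    return len(neighbors[a] & neighbors[b]) / len(union)


neighbors = build_neighbors(edges)
drugs = ["d1", "d2", "d3"]

# Pairwise similarity between drugs, derived from network structure alone.
scores = {(a, b): jaccard(neighbors, a, b) for a, b in combinations(drugs, 2)}
for pair, score in scores.items():
    print(pair, round(score, 3))
```

In this toy example, d1 and d2 share two of the three targets in their combined neighborhood, so they score higher than d1 and d3, which share none.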

We implemented a selection of sophisticated network-based node similarity measures and provide them as a package for Python, a free and open-source software environment. The PyNetSim package aims to give the user comprehensive implementations of these similarity measures in a unified framework, allowing easy and transparent computation. To our knowledge, PyNetSim is the first open-source package that computes a number of node similarity indices based solely on the network structure. We recommend PyNetSim for analyzing and representing the node relationships in the biomedical network under investigation. Further, we hope that the package will be helpful for exploring questions concerning function annotation, link prediction, and module identification in the context of network biology, network pharmacology, and network medicine.