Antimicrobial peptides are a unique and diverse group of molecules, divided into subgroups based on their amino-acid composition and structure, that have been demonstrated to kill bacteria, viruses and fungi, and even transformed or cancerous cells. Because antimicrobial resistance is a threat to global health, there is a pressing need to discover new antimicrobial peptides. High-throughput simulation, machine learning, and data analysis and representation can help accelerate the discovery process. A large number of protein and peptide sequences, annotated with a range of information and properties, are available for analysis in public databases (such as UniProt, InterPro and CAMPR3); we want to explore these datasets from a genomic perspective and cluster sequences that share some functionality.
In support of this activity, we are developing a k-mer based framework for the clustering, graph representation and visualization of amino-acid sequences, more precisely antimicrobial peptides, based on their functionalities, properties and structural features. By extracting antimicrobial signals from sequences, the tool can provide insights into the data and inspiration for discovering novel antimicrobial peptides.
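To make the k-mer idea concrete, the following is a minimal sketch of how sequences might be compared and grouped. It is not the framework itself: the choice of Jaccard similarity on k-mer sets, the threshold value, the use of connected components as clusters, and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def kmers(sequence, k=3):
    """Return the set of overlapping k-mers in an amino-acid sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (assumed measure, not from the source)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_graph(sequences, k=3, threshold=0.3):
    """Link two sequences when their k-mer sets are similar enough.

    `sequences` maps a name to a sequence; `threshold` is a hypothetical cutoff.
    """
    profiles = {name: kmers(seq, k) for name, seq in sequences.items()}
    graph = {name: set() for name in sequences}
    for u, v in combinations(sequences, 2):
        if jaccard(profiles[u], profiles[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clusters(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result
```

With invented toy strings (not real peptides) such as `{"pepA": "GIGKFLHSAKKF", "pepB": "GIGKFLHSAGKF", "pepC": "RRWWRRWW"}`, calling `clusters(similarity_graph(toy))` groups `pepA` with `pepB`, which share most of their 3-mers, and leaves `pepC` on its own. The resulting adjacency structure is also the kind of graph that could be passed on to a visualization step.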