- P. Millan Arias, J.Butler, G.Randhawa, M. Soltysiak, K.Hill, L.Kari. Environment
and taxonomy shape the genomic signature of prokaryotic extremophiles. Nature Scientific Reports, issue 13, no. 16105, 2023.
This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome.
Computational experiments classified/clustered genomic signatures extracted from a curated dataset of ∼700 extremophile (temperature, pH)
bacteria and archaea genomes, at multiple scales of analysis, 1≤𝑘≤6. The supervised learning resulted in high accuracies
for taxonomic classifications at 2≤𝑘≤6, and medium to medium-high accuracies for environment category classifications of the same datasets
at 3≤𝑘≤6. For 𝑘=3, our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions,
previously attributed to extreme environment adaptations. The unsupervised learning of unlabelled sequences identified several exemplars of hyperthermophilic organisms with large similarities in their genomic signatures, in spite of belonging to different domains in the Tree of Life.
- G.Randhawa, M.Soltysiak, H. El Roz, C. de Souza, K.Hill, L.Kari.
Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study.
PLOS ONE, April 24, 2020.
pdf
-
G.Randhawa, K.Hill, L.Kari.
MLDSP-GUI: An alignment-free standalone tool with an interactive graphical user interface for DNA sequence
comparison and analysis. Bioinformatics, vol. 36, issue 7, 2020, pp. 2258–2259.
pdf.
-
ML-DSP: Machine learning with Digital Signal Processing for ultrafast, accurate and scalable
DNA sequence classification at all taxonomic levels. G.Randhawa, K.Hill, L.Kari. BMC Genomics, 2019, 20:267.
pdf .
-
An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-I
genomes. S. Solis-Reyes, M.Avino, A. Poon, L.Kari. PLOS ONE 13(11): e0206409.
https://doi.org/10.1371/journal.pone.0206409. pdf
-
ModMaps3D: An interactive webtool for the quantification and 3D visualization of interrelationships
in a dataset of DNA sequences. R. Karamichalis, L.Kari. Bioinformatics, June 2017.
pdf
-
L.Kari, S.Kopecki. Deciding whether a regular language is generated by a splicing system. Journal of
Computer and Systems Sciences, JCSS, vol.84, 2017, 263-287
pdf
-
R. Karamichalis, L. Kari, S. Konstantinidis, S.Kopecki, S.Solis-Reyes. Additive methods for genomic
signatures. BMC Bioinformatics, 17:313, 2016.
pdf
-
Mapping the space of genomic signatures. L.Kari, K.Hill, A.Sayem, R.Karamichalis, N.Bryans, K.Davis,
N.Dattani. PLoS ONE 10(5): e0119815. doi:10.1371/journal.pone.0119815, 2015,
pdf
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history.
-
El.Czeizler, Eu.Czeizler, L.Kari, S.Seki.
An extension of the Lyndon-Schutzenberger result to pseudoperiodic words.
Information and Computation, 209, 2011, 717-730.
pdf
- L.Kari, S.Seki.
An improved bound for an extension of Fine and Wilf's theorem, and its optimality.
Fundamenta Informaticae, 101 (3), 2010, 215-236.
(pdf)
-
L.Adleman, J.Kari, L.Kari, D.Reishus, P.Sosik, The
Undecidability of the Infinite Ribbon Problem: Implications for Computing by Self-Assembly.
SIAM J. Comput. Volume 38, Issue 6, pp. 2356-2381 (2009)
(pdf)
Self-assembly, the process by which simple objects autonomously come together to form complex structures, has recently become of great practical
importance in the context of the developments in nanotechnolodgy and nanocomputation.
A systematic study of self-assembly as a mathematical process has been initiated by Adleman. The individual components are modelled as
square tiles on the infinite two-dimensional plane. Each side of a tile is covered by a specific ``glue'', and two adjacent tiles will
stick iff they have matching glues on their abutting edges. Tiles that stick to each other may form various two-dimensional ``structures''
such as squares, rectangles, or may cover the entire plane. In this paper we focus on a special type of structure, called ribbon:
A non-self-crossing sequence of tiles on the plane, in which successive tiles are adjacent along an edge, and abutting edges of
consecutive tiles have matching glues. We namely prove that, given a finite set of tiles with glues (infinite supply of each tile
type available) it is undecidable whether or not there exists an infinite ribbon that can be formed with these tiles.
The result settles a decade-old open problem formerly known as the ``unlimited infinite snake problem''. An immediate
consequence is the undecidability of existence of arbitrarily large structures self-assembled using tiles from a given tile set.
-
L. Kari, S. Seki,
On pseudoknot-bordered words and their properties,
Journal of Computer and System Sciences, 75 (2009) 113-121.
pdf
-
L. Kari, K. Mahalingam, S. Seki,
Twin-roots of words and their properties, Theoretical Computer Science, vol. 410, nr.24-25 (2009),
2393-2400.
pdf
-
L.Kari, G.Rozenberg. The many facets of natural computing. Communications
of the ACM, Oct.2008, vol.51, no.10, 72-83.
Cover Story of
Communications of the ACM. (pdf file)
pdf file of the paper with no pictures but
with extended reference list .
-
E.Czeizler, L.Kari, S.Seki.
On a Special Class of Primitive Words. MFCS 2008: 265-277
pdf
-
L.Kari, K.Mahalingam, Involutively bordered words, International
Journal of Foundations of Computer Science, 18(5) 2007, 1089-1106.
pdf
-
L.Kari, S.Konstantinidis, P.Sosik.
On properties of bond-free DNA languages
Theoretical Computer Science, vol. 334, no. 1-3 (2005), 131-159.
pdf
-
L. Kari, S. Konstantinidis, P. Sosik: Bond-free Languages:
Formalizations, Maximality and Construction Methods. International
Journal of Foundations of Computer Science 16 (2005), 1039--1070.
pdf
The computation language of a DNA-based system consists of all the
words (DNA strands) that can appear during any biocomputation of the system.
In this paper we investigate properties of such
languages which ensure that their words will not form undesirable
bonds when used in DNA computations.
The pioneering aspect and contribution of this research is to establish
a link between classical
coding theory and DNA computing. It turns out that the newly defined
notions of DNA compliant and sticky-free languages
are theoretically elegant generalizations of classical types of codes. Furthermore,
the use of such languages as input data ensures that the DNA computations will proceed error-free,
therefore their study has practical relevance for
experimental DNA computing.
-
L. Kari, S. Konstantinidis: Language Equations, Maximality and Error-detection. J. of Computer and System Sciences 70 (2005), 157-178.
pdf
-
Y.Wang, K.Hill, S.Singh, L.Kari.
The spectrum of genomic signatures: From dinucleotides to Chaos Game Representation.
GENE, 346 (2005), 173-185. pdf
In the post genomic era, access to complete genome sequence data
for numerous diverse species has triggered multiple avenues for
examining and comparing primary DNA sequence organization of
entire genomes. Previously, the concept of a genomic signature
was introduced with the observation of species-type specific
Dinucleotide Relative Abundance Profiles (DRAPs); dinucleotides
were identified as the subsequence with the greatest bias in
representation in a majority of genomes. Herein, we demonstrate
that DRAP is one particular genomic signature contained within a
broader spectrum of signatures. In this spectrum, an alternative
genomic signature, Chaos Game Representation (CGR), provides a
unique visualization of patterns in sequence organization.
A genomic signature is associated with a particular integer order or
subsequence length that represents a measure of the resolution or
granularity in the analysis of primary DNA sequence organization.
We quantitatively explore the organizational information provided
by genomic signatures of different orders through different
distance measures, including a novel Image Distance. The Image
Distance and other existing distance measures are evaluated by
comparing the phylogenetic trees they generate for 26 complete
mitochondrial genomes from a diversity of species. The
phylogenetic tree generated by the Image Distance is compatible
with the known relatedness of species. Quantitative evaluation of
the spectrum of genomic signatures may be used to ultimately gain
insight into the determinants and biological relevance of the
genome signatures.
-
L.Kari, L.F.Landweber. Computational power of gene rearrangement.
Proc. DNA 5, DIMACS Series, 54(2000), 207-216
pdf , ps
Ciliates are a diverse group of a few thousand types of
unicellular eukaryotes distinguished by
the presence of two nuclei: an active macronucleus and a
functionally inert micronucleus which contributes only to sexual reproduction.
The macronuclear DNA forms after
sexual reproduction from the micronuclear DNA by recombinations that accomplish a rearrangement of
gene fragments in the desired order and the deletion of non-coding sequences.
In this paper we develop a model for the guided homologous recombinations that take place
during gene rearrangement
and prove that it has the computational power of a Turing machine.
This is a proof of concept indicating that unicellular organisms may have the capacity to perform
any computation carried out by an electronic computer.
-
L.Landweber, L.Kari. The evolution of cellular computing:
nature's solution to a computational problem. Biosystems 52(1999) 3-13.
pdf
-
L.Kari. DNA computing -- The arrival of biological mathematics.
The Mathematical Intelligencer, 19(1997), 9--22.
pdf
-
L.Kari.
On Insertion and Deletion in Formal Languages.
Ph.D. Thesis, University of Turku, 1991.
Cover page ,
Title page ,
ISBN ,
Acknowledgements ,
Part I ,
Part II .
-
G.Paun, L.Santean.
Parallel communicating grammar systems-- the regular case.
Analele Universitatii din Bucuresti, Seria matematica-informatica,nr.2, 1989, 55-63. pdf