
MLDSP: Machine learning with Digital Signal Processing for ultrafast, accurate and scalable
DNA sequence classification at all taxonomic levels. G.Randhawa, K.Hill, L.Kari.
pdf .

An opensource kmer based machine learning tool for fast and accurate subtyping of HIVI
genomes. S. SolisReyes, M.Avino, A. Poon, L.Kari. PLOS ONE 13(11): e0206409.
https://doi.org/10.1371/journal.pone.0206409. pdf

ModMaps3D: An interactive webtool for the quantification and 3D visualization of interrelationships
in a dataset of DNA sequences. R. Karamichalis, L.Kari. Bioinformatics, June 2017.
pdf

L.Kari, S.Kopecki. Deciding whether a regular language is generated by a splicing system. Journal of
Computer and Systems Sciences, JCSS, vol.84, 2017, 263287
pdf

R. Karamichalis, L. Kari, S. Konstantinidis, S.Kopecki, S.SolisReyes. Additive methods for genomic
signatures. BMC Bioinformatics, 17:313, 2016.
pdf

Mapping the space of genomic signatures. L.Kari, K.Hill, A.Sayem, R.Karamichalis, N.Bryans, K.Davis,
N.Dattani. PLoS ONE 10(5): e0119815. doi:10.1371/journal.pone.0119815, 2015,
pdf
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome and speciesspecific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history.

El.Czeizler, Eu.Czeizler, L.Kari, S.Seki.
An extension of the LyndonSchutzenberger result to pseudoperiodic words.
Information and Computation, 209, 2011, 717730.
pdf
 L.Kari, S.Seki.
An improved bound for an extension of Fine and Wilf's theorem, and its optimality.
Fundamenta Informaticae, 101 (3), 2010, 215236.
(pdf)

L.Adleman, J.Kari, L.Kari, D.Reishus, P.Sosik, The
Undecidability of the Infinite Ribbon Problem: Implications for Computing by SelfAssembly.
SIAM J. Comput. Volume 38, Issue 6, pp. 23562381 (2009)
(pdf)
Selfassembly, the process by which simple objects autonomously come together to form complex structures, has recently become of great practical
importance in the context of the developments in nanotechnolodgy and nanocomputation.
A systematic study of selfassembly as a mathematical process has been initiated by Adleman. The individual components are modelled as
square tiles on the infinite twodimensional plane. Each side of a tile is covered by a specific ``glue'', and two adjacent tiles will
stick iff they have matching glues on their abutting edges. Tiles that stick to each other may form various twodimensional ``structures''
such as squares, rectangles, or may cover the entire plane. In this paper we focus on a special type of structure, called ribbon:
A nonselfcrossing sequence of tiles on the plane, in which successive tiles are adjacent along an edge, and abutting edges of
consecutive tiles have matching glues. We namely prove that, given a finite set of tiles with glues (infinite supply of each tile
type available) it is undecidable whether or not there exists an infinite ribbon that can be formed with these tiles.
The result settles a decadeold open problem formerly known as the ``unlimited infinite snake problem''. An immediate
consequence is the undecidability of existence of arbitrarily large structures selfassembled using tiles from a given tile set.

L. Kari, S. Seki,
On pseudoknotbordered words and their properties,
Journal of Computer and System Sciences, 75 (2009) 113121.
pdf

L. Kari, K. Mahalingam, S. Seki,
Twinroots of words and their properties, Theoretical Computer Science, vol. 410, nr.2425 (2009),
23932400.
pdf

L.Kari, G.Rozenberg. The many facets of natural computing. Communications
of the ACM, Oct.2008, vol.51, no.10, 7283.
Cover Story of
Communications of the ACM. (pdf file)
pdf file of the paper with no pictures but
with extended reference list .

E.Czeizler, L.Kari, S.Seki.
On a Special Class of Primitive Words. MFCS 2008: 265277
pdf

L.Kari, K.Mahalingam, Involutively bordered words, International
Journal of Foundations of Computer Science, 18(5) 2007, 10891106.
pdf

L.Kari, S.Konstantinidis, P.Sosik.
On properties of bondfree DNA languages
Theoretical Computer Science, vol. 334, no. 13 (2005), 131159.
pdf

L. Kari, S. Konstantinidis, P. Sosik: Bondfree Languages:
Formalizations, Maximality and Construction Methods. International
Journal of Foundations of Computer Science 16 (2005), 10391070.
pdf
The computation language of a DNAbased system consists of all the
words (DNA strands) that can appear during any biocomputation of the system.
In this paper we investigate properties of such
languages which ensure that their words will not form undesirable
bonds when used in DNA computations.
The pioneering aspect and contribution of this research is to establish
a link between classical
coding theory and DNA computing. It turns out that the newly defined
notions of DNA compliant and stickyfree languages
are theoretically elegant generalizations of classical types of codes. Furthermore,
the use of such languages as input data ensures that the DNA computations will proceed errorfree,
therefore their study has practical relevance for
experimental DNA computing.

L. Kari, S. Konstantinidis: Language Equations, Maximality and Errordetection. J. of Computer and System Sciences 70 (2005), 157178.
pdf

Y.Wang, K.Hill, S.Singh, L.Kari.
The spectrum of genomic signatures: From dinucleotides to Chaos Game Representation.
GENE, 346 (2005), 173185. pdf
In the post genomic era, access to complete genome sequence data
for numerous diverse species has triggered multiple avenues for
examining and comparing primary DNA sequence organization of
entire genomes. Previously, the concept of a genomic signature
was introduced with the observation of speciestype specific
Dinucleotide Relative Abundance Profiles (DRAPs); dinucleotides
were identified as the subsequence with the greatest bias in
representation in a majority of genomes. Herein, we demonstrate
that DRAP is one particular genomic signature contained within a
broader spectrum of signatures. In this spectrum, an alternative
genomic signature, Chaos Game Representation (CGR), provides a
unique visualization of patterns in sequence organization.
A genomic signature is associated with a particular integer order or
subsequence length that represents a measure of the resolution or
granularity in the analysis of primary DNA sequence organization.
We quantitatively explore the organizational information provided
by genomic signatures of different orders through different
distance measures, including a novel Image Distance. The Image
Distance and other existing distance measures are evaluated by
comparing the phylogenetic trees they generate for 26 complete
mitochondrial genomes from a diversity of species. The
phylogenetic tree generated by the Image Distance is compatible
with the known relatedness of species. Quantitative evaluation of
the spectrum of genomic signatures may be used to ultimately gain
insight into the determinants and biological relevance of the
genome signatures.

L.Kari, L.F.Landweber. Computational power of gene rearrangement.
Proc. DNA 5, DIMACS Series, 54(2000), 207216
pdf , ps
Ciliates are a diverse group of a few thousand types of
unicellular eukaryotes distinguished by
the presence of two nuclei: an active macronucleus and a
functionally inert micronucleus which contributes only to sexual reproduction.
The macronuclear DNA forms after
sexual reproduction from the micronuclear DNA by recombinations that accomplish a rearrangement of
gene fragments in the desired order and the deletion of noncoding sequences.
In this paper we develop a model for the guided homologous recombinations that take place
during gene rearrangement
and prove that it has the computational power of a Turing machine.
This is a proof of concept indicating that unicellular organisms may have the capacity to perform
any computation carried out by an electronic computer.

L.Landweber, L.Kari. The evolution of cellular computing:
nature's solution to a computational problem. Biosystems 52(1999) 313.
pdf

L.Kari. DNA computing  The arrival of biological mathematics.
The Mathematical Intelligencer, 19(1997), 922.
pdf

L.Kari.
On Insertion and Deletion in Formal Languages.
Ph.D. Thesis, University of Turku, 1991.
Cover page ,
Title page ,
ISBN ,
Acknowledgements ,
Part I ,
Part II .

G.Paun, L.Santean.
Parallel communicating grammar systems the regular case.
Analele Universitatii din Bucuresti, Seria matematicainformatica,nr.2, 1989, 5563. pdf