Constructing Antidictionaries in OutputSensitive Space
Ayad, Lorraine A. K.; Badkobeh, Golnaz; Fici, Gabriele; Heliou, Alice and Pissis, Solon P.. 2019. 'Constructing Antidictionaries in OutputSensitive Space'. In: 2019 Data Compression Conference. Snowbird, Utah, United States 2629 March 2019. [Conference or Workshop Item]

Text
DCC_Computing_Minimal_Absent_Words_in_Output_Sensitive_Space (1).pdf  Accepted Version Available under License Creative Commons Attribution Noncommercial. Download (365kB)  Preview 
Abstract or Description
A word x that is absent from a word y is called minimal if all its proper factors occur in y. Given a collection of k words y_1,y_2,...,y_k over an alphabet Σ, we are asked to compute the set M^ℓ_y_1#...#y_k of minimal absent words of length at most ℓ of word y=y_1#y_2#...#y_k, #∉Σ. In data compression, this corresponds to computing the antidictionary of k documents. In bioinformatics, it corresponds to computing words that are absent from a genome of k chromosomes. This computation generally requires Ω(n) space for n=y using any of the plenty available O(n)time algorithms. This is because an Ω(n)sized text index is constructed over y which can be impractical for large n. We do the identical computation incrementally using outputsensitive space. This goal is reasonable when M^ℓ_y_1#...#y_N=o(n), for all N∈[1,k]. For instance, in the human genome, n ≈ 3× 10^9 but M^12_y_1#...#y_k ≈ 10^6. We consider a constantsized alphabet for stating our results. We show that all M^ℓ_y_1,...,M^ℓ_y_1#...#y_k can be computed in O(kn+∑^k_N=1M^ℓ_y_1#...#y_N) total time using O(MaxIn+MaxOut) space, where MaxIn is the length of the longest word in {y_1,...,y_k} and MaxOut={M^ℓ_y_1#...#y_N:N∈[1,k]}. Proofofconcept experimental results are also provided confirming our theoretical findings and justifying our contribution.
Item Type: 
Conference or Workshop Item (Paper) 

Identification Number (DOI): 

Keywords: 
Bioinformatics, Gold, Data structures, Data compression, Genomics, Biological cells, Indexes; absent words, antidictionaries, string algorithms, output sensitive algorithms, data compression. 

Related URLs: 


Departments, Centres and Research Units: 

Dates: 


Event Location: 
Snowbird, Utah, United States 

Date range: 
2629 March 2019 

Item ID: 
25916 

Date Deposited: 
28 Feb 2019 14:57 

Last Modified: 
16 Oct 2020 17:52 

URI: 
View statistics for this item...
Edit Record (login required) 