Triclustering Toolbox

Dmitrii Egurnov

(0000-0002-8195-1670) [email protected] and

Dmitry I. Ignatov

(0000-0002-6584-8534) [email protected]

National Research University Higher School of Economics, Moscow, Russia

Abstract. Triclustring Toolbox is a collection of triclustering methods

consolidated into a single interface. It provides access to both box- and

prime-based OAC (Object-Attribute-Condition) triclustering, Spectral

triclustering and features implementations of DataPeeler and Trias. The

application also contains algorithms for mining triclusters of similar val-

ues: NOAC and Tri-K-Means. Quality of triclusters is measured in terms

of density, diversity, coverage, and variance, if applicable. Formats for

input and output data of all the methods are universal, which makes

comparison and interpretation of the results easier. The code is written

in C# (.Net 4.5) and runs on Windows. Triclustring Toolbox was used

to provide experimental results in several articles on triclustering.

Keywords: Triadic Formal Concept Analysis · Triclustering · OAC-

triclustering · Real-valued context · Software

1 Introduction

Triadic Formal Concept Analysis (3-FCA) was introduced by Lehman and

Wille [7] and is aimed at analysis of object-attribute-condition relational data.

However, in some cases its strict requirements may be relaxed, that is we could

search for less dense structures called triclusters instead of triconcepts [6, 4].

Such patterns found to be useful in several domains, among those mining com-

munities in folksonomies and (multimodal) social networks [6, 5] and semantic

frame induction in computational linguistics [9].

The remainder of the paper consists of three sections: Section 2 brieﬂy intro-

duces basic notions, Section 3 describes our Triclustering Toolbox, and Section 4

concludes the paper with future prospects of triclustering software development.

2 Triadic Contexts and Triclusters

Suppose we have a Triadic Formal Context. It has 3 dimensions, or modalities:

objects, attributes and conditions.

 2019 for this paper by its authors. Copying permitted for private and

academic purposes.

2 D. Egurnov and D.I. Ignatov

Deﬁnition 1. Let G, M and B be arbitrary sets. Subset of their Cartesian

product deﬁnes a triadic relation I ⊆ G×M ×B. The quadruple K = (G, M, B, I)

is called a triadic formal context, or tricontext. The sets G, M and B are called

set of objects, set of attributes, and set of conditions, respectively.

For each triple (g, m, b) ⊆ I, where g ∈ G, m ∈ M, and b ∈ B, it is said that

“object g has attribute m under condition b”. In case of numeric context we add

a value function: V : I → R.

Now we deﬁne a tricluster in its most general form:

Deﬁnition 2. Let A

n×k×l

be a three-dimensional binary matrix. Let sets G =

, g

, . . . , g

}, M = {m

, m

, . . . , m

}, and B = {b

, b

, . . . , b

} be index sets

of A. Then for some arbitrary sets X ⊆ G, Y ⊆ M and Z ⊆ B submatrix

XY Z

= {a

xy z

| x ∈ X, y ∈ Y, z ∈ Z} is called a tricluster. Sets X, Y and Z are

called respectively extent, intent and modus of the tricluster.

Various constraints may be applied to triclusters. Usually they feature struc-

ture requirements, cardinality restrictions on extent, intent and modus, and lim-

itations of other parameters. For example, the most common conditions, which

eliminate small and meaningless structures from the output, are minimal support

condition (|X| ≥ s

, |Y | ≥ s

, |Z| ≥ s

) and minimum density threshold:

ρ(A

XY Z

) =

|X|

i=1

|Y |

j=1

|Z|

k=1

|X||Y ||Z|

≥ ρ

min

In case of numeric triclusters we can also measure variance of the values of

triples in the tricluster. Lower variance means that values are more similar. If

we consider each tricluster T = (X, Y, Z) as an independent sample S(T ) =

{V (g, m, b) | (g, m, b) ∈ I ∩ X × Y × Z}, then unbiased estimate of the variance

is:

(S) =

s∈S



−



|S| − 1

3 Triclustering Toolbox description

Triclustring Toolbox is a Windows Forms UI application. It was programmed in

C# using .Net Framework 4.5. The program takes an input ﬁle that contains a

triadic context as set of triples. Each triple is described in a separate line with

its tab-separated components, which are members of the context’s respective

modalities. For numerical tricontexts lines also contain the triple’s value. The

path to the ﬁle should be speciﬁed in the ”Context ﬁle” ﬁeld of the interface.

Other parameters are:

– Output folder, where the results and log ﬁles of the processing will be placed.

– Limited context uploading. If only several ﬁrst triples of the context should

be processed, the number of triples should be speciﬁed.

– Selection of triclustering method:

Triclustering Toolbox 3

• OAC-triclustering [3]

• Spectral triclustering [3]

• Box triclustering [8]

• DataPeeler [1]

• Trias [6]

• NOAC [2]

• Tri-K-Means [2]

– Algorithm-speciﬁc options and additional constraints on the output.

When all of the necessary options are set, user can press the “Start” button.

For example, let us look at speciﬁc options for numerical triclustering algo-

rithms. For NOAC it is the parameter δ, which is set to 0 by default. It also

supports minimal density thresholds for extent, intent, and modus. Tri-K-Means

requires the number of clusters k and parameter γ, which deﬁnes degree of ex-

tending the tricluster over closeness of its values. User can also set upper bound

for the number of steps the algorithm takes before terminating and the manner

of initialization for centroids, which can be random or predeﬁned. The options

are delimited by commas, decimals are separated with a dot. In case several

experiments are planned, the sets of options are separated by semicolon.

The program writes two ﬁles in the output folder. The ﬁrst one contains a list

of the extracted triclusters along with calculated measures. The ﬁrst string of the

ﬁle contains a header, explaining the order of values: Density, Variance, Average

Coverage, Objects coverage, Attributes coverage, Conditions coverage, Extent,

Intent, and Modus. Then, in separate lines, the triclusters follow in the format

described in the header. Name of the ﬁle is composed automatically, using names

of the method, input ﬁle and the set of options. The second ﬁle is a log ﬁle. If it

already exists in the folder, it will be appended by new execution information.

Each line of the ﬁle corresponds to a separate experiment and contains the set

of options, execution time, number of found triclusters, total coverage of the

context by the tricluster set, as well as coverage of extent, intent and modus.

Figure 1 shows an example of Triclustering Toolbox application interface

with all described controls.

4 Future work

With recent development of the C# programming language the application may

be easily transferred to a cross-platform framework, namely .Net Core 2.0, but

only as a console application. UI options are announced in upcoming .Net Core

3.0 release. We would also like to support more recent triclustering methods and

provide extension possibilities for other developers.

Another possible important direction is using distributed and parallel com-

puting [10].

Acknowledgements We would like to thank Dmitry V. Gnatyshak for his

valuable help with the initial development of the toolbox and Engelbert Mephu

4 D. Egurnov and D.I. Ignatov

Fig. 1. Triclustering Toolbox User Interface

Nguifo for the inspiration. The work of Dmitry I. Ignatov (contributed to all

the sections) was supported by the Russian Science Foundation under grant

17-11-01294 and performed at National Research University Higher School of

Economics, Russia.

References

1. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Data peeler: Contraint-based

closed pattern mining in n-ary relations. In: SDM (2008)

2. Egurnov, D., Ignatov, D.I., Nguifo, E.M.: Mining triclusters of similar values in

triadic real-valued contexts. In: 14th International Conference on Formal Concept

Analysis - Supplementary Proceedings. pp. 31–47 (2017)

3. Ignatov, D.I., Gnatyshak, D.V., Kuznetsov, S.O., Mirkin, B.G.: Triadic for-

mal concept analysis and triclustering: searching for optimal patterns. Machine

Learning 101(1-3), 271–302 (2015). https://doi.org/10.1007/s10994-015-5487-y,

http://dx.doi.org/10.1007/s10994-015-5487-y

4. Ignatov, D.I., Kuznetsov, S.O., Magizov, R.A., Zhukov, L.E.: From triconcepts

to triclusters. In: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing

- 13th International Conference, RSFDGrC 2011, Moscow, Russia, June 25-27,

2011. Proceedings. pp. 257–264 (2011). https://doi.org/10.1007/978-3-642-21881-

1 41, http://dx.doi.org/10.1007/978-3-642-21881-1 41

5. Ignatov, D.I., Semenov, A., Komissarova, D., Gnatyshak, D.V.: Multimodal clus-

tering for community detection. In: Missaoui, R., Kuznetsov, S.O., Obiedkov, S.A.

(eds.) Formal Concept Analysis of Social Networks, pp. 59–96. Lecture Notes

in Social Networks, Springer (2017). https://doi.org/10.1007/978-3-319-64167-6 4,

https://doi.org/10.1007/978-3-319-64167-6 4

6. J¨aschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: TRIAS - an al-

gorithm for mining iceberg tri-lattices. In: Proceedings of the 6th IEEE In-

ternational Conference on Data Mining (ICDM 2006), 18-22 December 2006,

Triclustering Toolbox 5

Hong Kong, China. pp. 907–911 (2006). https://doi.org/10.1109/ICDM.2006.162,

http://dx.doi.org/10.1109/ICDM.2006.162

7. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: Con-

ceptual Structures: Applications, Implementation and Theory, Third International

Conference on Conceptual Structures, ICCS ’95, Santa Cruz, California, USA,

August 14-18, 1995, Proceedings. pp. 32–43 (1995). https://doi.org/10.1007/3-540-

60161-9 27, http://dx.doi.org/10.1007/3-540-60161-9 27

8. Mirkin, B.G., Kramarenko, A.V.: Approximate bicluster and tricluster boxes

in the analysis of binary data. In: Kuznetsov, S.O., Slezak, D., Hepting,

D.H., Mirkin, B.G. (eds.) Rough Sets, Fuzzy Sets, Data Mining and Gran-

ular Computing - 13th International Conference, RSFDGrC 2011, Moscow,

Russia, June 25-27, 2011. Proceedings. Lecture Notes in Computer Science,

vol. 6743, pp. 248–256. Springer (2011). https://doi.org/10.1007/978-3-642-21881-

1 40, https://doi.org/10.1007/978-3-642-21881-1 40

9. Ustalov, D., Panchenko, A., Kutuzov, A., Biemann, C., Ponzetto, S.P.: Unsu-

pervised semantic frame induction using triclustering. In: Gurevych, I., Miyao,

Y. (eds.) Proceedings of the 56th Annual Meeting of the Association for Com-

putational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Vol-

ume 2: Short Papers. pp. 55–62. Association for Computational Linguistics (2018),

https://aclanthology.info/papers/P18-2010/p18-2010

10. Zudin, S., Gnatyshak, D.V., Ignatov, D.I.: Putting oac-triclustering on mapreduce.

In: Yahia, S.B., Konecny, J. (eds.) Proceedings of the Twelfth International Con-

ference on Concept Lattices and Their Applications, Clermont-Ferrand, France,

Octob er 13-16, 2015. CEUR Workshop Proceedings, vol. 1466, pp. 47–58. CEUR-

WS.org (2015), http://ceur-ws.org/Vol-1466/paper04.pdf