xref: /aosp_15_r20/external/toolchain-utils/crosperf/machine_image_manager.py (revision 760c253c1ed00ce9abd48f8546f08516e57485fe)
1*760c253cSXin Li# -*- coding: utf-8 -*-
2*760c253cSXin Li# Copyright 2015 The ChromiumOS Authors
3*760c253cSXin Li# Use of this source code is governed by a BSD-style license that can be
4*760c253cSXin Li# found in the LICENSE file.
5*760c253cSXin Li
6*760c253cSXin Li"""MachineImageManager allocates images to duts."""
7*760c253cSXin Li
8*760c253cSXin Li
9*760c253cSXin Liimport functools
10*760c253cSXin Li
11*760c253cSXin Li
12*760c253cSXin Liclass MachineImageManager(object):
13*760c253cSXin Li    """Management of allocating images to duts.
14*760c253cSXin Li
15*760c253cSXin Li    * Data structure we have -
16*760c253cSXin Li
17*760c253cSXin Li      duts_ - list of duts, for each duts, we assume the following 2 properties
18*760c253cSXin Li      exist - label_ (the current label the duts_ carries or None, if it has an
19*760c253cSXin Li      alien image) and name (a string)
20*760c253cSXin Li
21*760c253cSXin Li      labels_ - a list of labels, for each label, we assume these properties
22*760c253cSXin Li      exist - remote (a set/vector/list of dut names (not dut object), to each
23*760c253cSXin Li      of which this image is compatible), remote could be none, which means
24*760c253cSXin Li      universal compatible.
25*760c253cSXin Li
26*760c253cSXin Li      label_duts_ - for each label, we maintain a list of duts, onto which the
27*760c253cSXin Li      label is imaged. Note this is an array of lists. Each element of each list
28*760c253cSXin Li      is an integer which is dut oridnal. We access this array using label
29*760c253cSXin Li      ordinal.
30*760c253cSXin Li
31*760c253cSXin Li      allocate_log_ - a list of allocation record. For example, if we allocate
32*760c253cSXin Li      l1 to d1, then l2 to d2, then allocate_log_ would be [(1, 1), (2, 2)].
33*760c253cSXin Li      This is used for debug/log, etc. All tuples in the list are integer pairs
34*760c253cSXin Li      (label_ordinal, dut_ordinal).
35*760c253cSXin Li
36*760c253cSXin Li      n_duts_ - number of duts.
37*760c253cSXin Li
38*760c253cSXin Li      n_labels_ - number of labels.
39*760c253cSXin Li
40*760c253cSXin Li      dut_name_ordinal_ - mapping from dut name (a string) to an integer,
41*760c253cSXin Li      starting from 0. So that duts_[dut_name_ordinal_[a_dut.name]]= a_dut.
42*760c253cSXin Li
43*760c253cSXin Li    * Problem abstraction -
44*760c253cSXin Li
45*760c253cSXin Li      Assume we have the following matrix - label X machine (row X col). A 'X'
46*760c253cSXin Li      in (i, j) in the matrix means machine and lable is not compatible, or that
47*760c253cSXin Li      we cannot image li to Mj.
48*760c253cSXin Li
49*760c253cSXin Li           M1   M2   M3
50*760c253cSXin Li      L1    X
51*760c253cSXin Li
52*760c253cSXin Li      L2             X
53*760c253cSXin Li
54*760c253cSXin Li      L3    X    X
55*760c253cSXin Li
56*760c253cSXin Li      Now that we'll try to find a way to fill Ys in the matrix so that -
57*760c253cSXin Li
58*760c253cSXin Li        a) - each row at least get a Y, this ensures that each label get imaged
59*760c253cSXin Li        at least once, an apparent prerequiste.
60*760c253cSXin Li
61*760c253cSXin Li        b) - each column get at most N Ys. This make sure we can successfully
62*760c253cSXin Li        finish all tests by re-image each machine at most N times. That being
63*760c253cSXin Li        said, we could *OPTIONALLY* reimage some machines more than N times to
64*760c253cSXin Li        *accelerate* the test speed.
65*760c253cSXin Li
66*760c253cSXin Li      How to choose initial N for b) -
67*760c253cSXin Li      If number of duts (nd) is equal to or more than that of labels (nl), we
68*760c253cSXin Li      start from N == 1. Else we start from N = nl - nd + 1.
69*760c253cSXin Li
70*760c253cSXin Li      We will begin the search with pre-defined N, if we fail to find such a
71*760c253cSXin Li      solution for such N, we increase N by 1 and continue the search till we
72*760c253cSXin Li      get N == nl, at this case we fails.
73*760c253cSXin Li
74*760c253cSXin Li      Such a solution ensures minimal number of reimages.
75*760c253cSXin Li
76*760c253cSXin Li    * Solution representation
77*760c253cSXin Li
78*760c253cSXin Li      The solution will be placed inside the matrix, like below
79*760c253cSXin Li
80*760c253cSXin Li           M1   M2   M3   M4
81*760c253cSXin Li      L1    X    X    Y
82*760c253cSXin Li
83*760c253cSXin Li      L2    Y         X
84*760c253cSXin Li
85*760c253cSXin Li      L3    X    Y    X
86*760c253cSXin Li
87*760c253cSXin Li    * Allocation algorithm
88*760c253cSXin Li
89*760c253cSXin Li      When Mj asks for a image, we check column j, pick the first cell that
90*760c253cSXin Li      contains a 'Y', and mark the cell '_'. If no such 'Y' exists (like M4 in
91*760c253cSXin Li      the above solution matrix), we just pick an image that the minimal reimage
92*760c253cSXin Li      number.
93*760c253cSXin Li
94*760c253cSXin Li      After allocate for M3
95*760c253cSXin Li           M1   M2   M3   M4
96*760c253cSXin Li      L1    X    X    _
97*760c253cSXin Li
98*760c253cSXin Li      L2    Y         X
99*760c253cSXin Li
100*760c253cSXin Li      L3    X    Y    X
101*760c253cSXin Li
102*760c253cSXin Li      After allocate for M4
103*760c253cSXin Li           M1   M2   M3   M4
104*760c253cSXin Li      L1    X    X    _
105*760c253cSXin Li
106*760c253cSXin Li      L2    Y         X    _
107*760c253cSXin Li
108*760c253cSXin Li      L3    X    Y    X
109*760c253cSXin Li
110*760c253cSXin Li      After allocate for M2
111*760c253cSXin Li           M1   M2   M3   M4
112*760c253cSXin Li      L1    X    X    _
113*760c253cSXin Li
114*760c253cSXin Li      L2    Y         X    _
115*760c253cSXin Li
116*760c253cSXin Li      L3    X    _    X
117*760c253cSXin Li
118*760c253cSXin Li      After allocate for M1
119*760c253cSXin Li           M1   M2   M3   M4
120*760c253cSXin Li      L1    X    X    _
121*760c253cSXin Li
122*760c253cSXin Li      L2    _         X    _
123*760c253cSXin Li
124*760c253cSXin Li      L3    X    _    X
125*760c253cSXin Li
126*760c253cSXin Li      After allocate for M2
127*760c253cSXin Li           M1   M2   M3   M4
128*760c253cSXin Li      L1    X    X    _
129*760c253cSXin Li
130*760c253cSXin Li      L2    _    _    X    _
131*760c253cSXin Li
132*760c253cSXin Li      L3    X    _    X
133*760c253cSXin Li
134*760c253cSXin Li      If we try to allocate for M1 or M2 or M3 again, we get None.
135*760c253cSXin Li
136*760c253cSXin Li    * Special / common case to handle seperately
137*760c253cSXin Li
138*760c253cSXin Li      We have only 1 dut or if we have only 1 label, that's simple enough.
139*760c253cSXin Li    """
140*760c253cSXin Li
141*760c253cSXin Li    def __init__(self, labels, duts):
142*760c253cSXin Li        self.labels_ = labels
143*760c253cSXin Li        self.duts_ = duts
144*760c253cSXin Li        self.n_labels_ = len(labels)
145*760c253cSXin Li        self.n_duts_ = len(duts)
146*760c253cSXin Li        self.dut_name_ordinal_ = dict()
147*760c253cSXin Li        for idx, dut in enumerate(self.duts_):
148*760c253cSXin Li            self.dut_name_ordinal_[dut.name] = idx
149*760c253cSXin Li
150*760c253cSXin Li        # Generate initial matrix containg 'X' or ' '.
151*760c253cSXin Li        self.matrix_ = [
152*760c253cSXin Li            ["X" if l.remote else " " for _ in range(self.n_duts_)]
153*760c253cSXin Li            for l in self.labels_
154*760c253cSXin Li        ]
155*760c253cSXin Li        for ol, l in enumerate(self.labels_):
156*760c253cSXin Li            if l.remote:
157*760c253cSXin Li                for r in l.remote:
158*760c253cSXin Li                    self.matrix_[ol][self.dut_name_ordinal_[r]] = " "
159*760c253cSXin Li
160*760c253cSXin Li        self.label_duts_ = [[] for _ in range(self.n_labels_)]
161*760c253cSXin Li        self.allocate_log_ = []
162*760c253cSXin Li
163*760c253cSXin Li    def compute_initial_allocation(self):
164*760c253cSXin Li        """Compute the initial label-dut allocation.
165*760c253cSXin Li
166*760c253cSXin Li        This method finds the most efficient way that every label gets imaged at
167*760c253cSXin Li        least once.
168*760c253cSXin Li
169*760c253cSXin Li        Returns:
170*760c253cSXin Li          False, only if not all labels could be imaged to a certain machine,
171*760c253cSXin Li          otherwise True.
172*760c253cSXin Li        """
173*760c253cSXin Li
174*760c253cSXin Li        if self.n_duts_ == 1:
175*760c253cSXin Li            for i, v in self.matrix_vertical_generator(0):
176*760c253cSXin Li                if v != "X":
177*760c253cSXin Li                    self.matrix_[i][0] = "Y"
178*760c253cSXin Li            return
179*760c253cSXin Li
180*760c253cSXin Li        if self.n_labels_ == 1:
181*760c253cSXin Li            for j, v in self.matrix_horizontal_generator(0):
182*760c253cSXin Li                if v != "X":
183*760c253cSXin Li                    self.matrix_[0][j] = "Y"
184*760c253cSXin Li            return
185*760c253cSXin Li
186*760c253cSXin Li        if self.n_duts_ >= self.n_labels_:
187*760c253cSXin Li            n = 1
188*760c253cSXin Li        else:
189*760c253cSXin Li            n = self.n_labels_ - self.n_duts_ + 1
190*760c253cSXin Li        while n <= self.n_labels_:
191*760c253cSXin Li            if self._compute_initial_allocation_internal(0, n):
192*760c253cSXin Li                break
193*760c253cSXin Li            n += 1
194*760c253cSXin Li
195*760c253cSXin Li        return n <= self.n_labels_
196*760c253cSXin Li
197*760c253cSXin Li    def _record_allocate_log(self, label_i, dut_j):
198*760c253cSXin Li        self.allocate_log_.append((label_i, dut_j))
199*760c253cSXin Li        self.label_duts_[label_i].append(dut_j)
200*760c253cSXin Li
201*760c253cSXin Li    def allocate(self, dut, schedv2=None):
202*760c253cSXin Li        """Allocate a label for dut.
203*760c253cSXin Li
204*760c253cSXin Li        Args:
205*760c253cSXin Li          dut: the dut that asks for a new image.
206*760c253cSXin Li          schedv2: the scheduling instance, we need the benchmark run
207*760c253cSXin Li                   information with schedv2 for a better allocation.
208*760c253cSXin Li
209*760c253cSXin Li        Returns:
210*760c253cSXin Li          a label to image onto the dut or None if no more available images for
211*760c253cSXin Li          the dut.
212*760c253cSXin Li        """
213*760c253cSXin Li        j = self.dut_name_ordinal_[dut.name]
214*760c253cSXin Li        # 'can_' prefix means candidate label's.
215*760c253cSXin Li        can_reimage_number = 999
216*760c253cSXin Li        can_i = 999
217*760c253cSXin Li        can_label = None
218*760c253cSXin Li        can_pending_br_num = 0
219*760c253cSXin Li        for i, v in self.matrix_vertical_generator(j):
220*760c253cSXin Li            label = self.labels_[i]
221*760c253cSXin Li
222*760c253cSXin Li            # 2 optimizations here regarding allocating label to dut.
223*760c253cSXin Li            # Note schedv2 might be None in case we do not need this
224*760c253cSXin Li            # optimization or we are in testing mode.
225*760c253cSXin Li            if schedv2 is not None:
226*760c253cSXin Li                pending_br_num = len(schedv2.get_label_map()[label])
227*760c253cSXin Li                if pending_br_num == 0:
228*760c253cSXin Li                    # (A) - we have finished all br of this label,
229*760c253cSXin Li                    # apparently, we do not want to reimaeg dut to
230*760c253cSXin Li                    # this label.
231*760c253cSXin Li                    continue
232*760c253cSXin Li            else:
233*760c253cSXin Li                # In case we do not have a schedv2 instance, mark
234*760c253cSXin Li                # pending_br_num as 0, so pending_br_num >=
235*760c253cSXin Li                # can_pending_br_num is always True.
236*760c253cSXin Li                pending_br_num = 0
237*760c253cSXin Li
238*760c253cSXin Li            # For this time being, I just comment this out until we have a
239*760c253cSXin Li            # better estimation how long each benchmarkrun takes.
240*760c253cSXin Li            # if (pending_br_num <= 5 and
241*760c253cSXin Li            #     len(self.label_duts_[i]) >= 1):
242*760c253cSXin Li            #     # (B) this is heuristic - if there are just a few test cases
243*760c253cSXin Li            #     # (say <5) left undone for this label, and there is at least
244*760c253cSXin Li            #     # 1 other machine working on this lable, we probably not want
245*760c253cSXin Li            #     # to bother to reimage this dut to help with these 5 test
246*760c253cSXin Li            #     # cases
247*760c253cSXin Li            #     continue
248*760c253cSXin Li
249*760c253cSXin Li            if v == "Y":
250*760c253cSXin Li                self.matrix_[i][j] = "_"
251*760c253cSXin Li                self._record_allocate_log(i, j)
252*760c253cSXin Li                return label
253*760c253cSXin Li            if v == " ":
254*760c253cSXin Li                label_reimage_number = len(self.label_duts_[i])
255*760c253cSXin Li                if (can_label is None) or (
256*760c253cSXin Li                    label_reimage_number < can_reimage_number
257*760c253cSXin Li                    or (
258*760c253cSXin Li                        label_reimage_number == can_reimage_number
259*760c253cSXin Li                        and pending_br_num >= can_pending_br_num
260*760c253cSXin Li                    )
261*760c253cSXin Li                ):
262*760c253cSXin Li                    can_reimage_number = label_reimage_number
263*760c253cSXin Li                    can_i = i
264*760c253cSXin Li                    can_label = label
265*760c253cSXin Li                    can_pending_br_num = pending_br_num
266*760c253cSXin Li
267*760c253cSXin Li        # All labels are marked either '_' (already taken) or 'X' (not
268*760c253cSXin Li        # compatible), so return None to notify machine thread to quit.
269*760c253cSXin Li        if can_label is None:
270*760c253cSXin Li            return None
271*760c253cSXin Li
272*760c253cSXin Li        # At this point, we don't find any 'Y' for the machine, so we go the
273*760c253cSXin Li        # 'min' approach.
274*760c253cSXin Li        self.matrix_[can_i][j] = "_"
275*760c253cSXin Li        self._record_allocate_log(can_i, j)
276*760c253cSXin Li        return can_label
277*760c253cSXin Li
278*760c253cSXin Li    def matrix_vertical_generator(self, col):
279*760c253cSXin Li        """Iterate matrix vertically at column 'col'.
280*760c253cSXin Li
281*760c253cSXin Li        Yield row number i and value at matrix_[i][col].
282*760c253cSXin Li        """
283*760c253cSXin Li        for i, _ in enumerate(self.labels_):
284*760c253cSXin Li            yield i, self.matrix_[i][col]
285*760c253cSXin Li
286*760c253cSXin Li    def matrix_horizontal_generator(self, row):
287*760c253cSXin Li        """Iterate matrix horizontally at row 'row'.
288*760c253cSXin Li
289*760c253cSXin Li        Yield col number j and value at matrix_[row][j].
290*760c253cSXin Li        """
291*760c253cSXin Li        for j, _ in enumerate(self.duts_):
292*760c253cSXin Li            yield j, self.matrix_[row][j]
293*760c253cSXin Li
294*760c253cSXin Li    def _compute_initial_allocation_internal(self, level, N):
295*760c253cSXin Li        """Search matrix for d with N."""
296*760c253cSXin Li
297*760c253cSXin Li        if level == self.n_labels_:
298*760c253cSXin Li            return True
299*760c253cSXin Li
300*760c253cSXin Li        for j, v in self.matrix_horizontal_generator(level):
301*760c253cSXin Li            if v == " ":
302*760c253cSXin Li                # Before we put a 'Y', we check how many Y column 'j' has.
303*760c253cSXin Li                # Note y[0] is row idx, y[1] is the cell value.
304*760c253cSXin Li                ny = functools.reduce(
305*760c253cSXin Li                    lambda x, y: x + 1 if (y[1] == "Y") else x,
306*760c253cSXin Li                    self.matrix_vertical_generator(j),
307*760c253cSXin Li                    0,
308*760c253cSXin Li                )
309*760c253cSXin Li                if ny < N:
310*760c253cSXin Li                    self.matrix_[level][j] = "Y"
311*760c253cSXin Li                    if self._compute_initial_allocation_internal(level + 1, N):
312*760c253cSXin Li                        return True
313*760c253cSXin Li                    self.matrix_[level][j] = " "
314*760c253cSXin Li
315*760c253cSXin Li        return False
316