HW1 : Regular Expressions and Finite Automata

CSE105Sp23

Due: April 11th at 5pm (no penalty late submission until 8am next morning), via Gradescope

In this assignment,

You will practice reading and applying the definitions of alphabets, strings, languages, Kleene star, and regular expressions. You will use regular expressions and relate them to languages and finite automata. You will use precise notation to formally define the state diagram of finite automata, and you will use clear English to describe computations of finite automata informally.

Resources: To review the topics for this assignment, see the class material from Week 1. We will post frequently asked questions and our answers to them in a pinned Piazza post.

Reading and extra practice problems: Sipser Section 0, 1.3, 1.1. Chapter 1 exercises 1.1, 1.2, 1.3, 1.18, 1.23.

For all HW assignments: Weekly homework may be done individually or in groups of up to 3 students. You may switch HW partners for different HW assignments. The lowest HW score will not be included in your overall HW average. Please ensure your name(s) and PID(s) are clearly visible on the first page of your homework submission and then upload the PDF to Gradescope. If working in a group, submit only one submission per group: one partner uploads the submission through their Gradescope account and then adds the other group member(s) to the Gradescope submission by selecting their name(s) in the “Add Group Members" dialog box. You will need to re-add your group member(s) every time you resubmit a new version of your assignment. Each homework question will be graded either for correctness (including clear and precise explanations and justifications of all answers) or fair effort completeness. You may only collaborate on HW with CSE 105 students in your group; if your group has questions about a HW problem, you may ask in drop-in help hours or post a private post (visible only to the Instructors) on Piazza.

All submitted homework for this class must be typed. You can use a word processing editor if you like (Microsoft Word, Open Office, Notepad, Vim, Google Docs, etc.) but you might find it useful to take this opportunity to learn LaTeX. LaTeX is a markup language used widely in computer science and mathematics. The homework assignments are typed using LaTeX and you can use the source files as templates for typesetting your solutions. To generate state diagrams of machines, we recommend using Flap.js or JFLAP. Photographs of clearly hand-drawn diagrams may also be used. We recommend that you submit early drafts to Gradescope so that in case of any technical difficulties, at least some of your work is present. You may update your submission as many times as you’d like up to the deadline.

Integrity reminders

You will submit this assignment via Gradescope (https://www.gradescope.com) in the assignment called “hw1CSE105Sp23”.

Assigned questions

  1. Functions over sets of strings (17 points):
    For this question, fix the alphabets \(\Sigma = \{0,1\}\) and \(\Gamma = \{0,1,2\}\).

    Whenever \(K\) is a set of strings over \(\Gamma\) and \(L\) is a set of strings over \(\Sigma\), we can use the following rules to define associated sets of strings: \[\begin{aligned} \textsc{Substring}(K) &:= \{ w \in \Gamma^* \mid \text{there exist } a,b \in \Gamma^* \text{ such that } awb \in K\} \\ \textsc{Rep}(L) &:= \{ w \in \Gamma^* \mid \text{between every pair of successive $2$s in $w$ is a string in $L$}\}\\ &\phantom{:}=\{w \in \Gamma^* \mid \text{for all } v \in \Sigma^* \text{ if } 2v2 \in \textsc{Substring}(\{w\}) \text{, then } v \in L\} \end{aligned}\] Note: Formally, \(\textsc{Substring}\) and \(\textsc{Rep}\) are functions whose domains and codomains are specified as \[\textsc{Substring}: \mathcal{P}(\Gamma^*) \to \mathcal{P}(\Gamma^*)\] and \[\textsc{Rep}: \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Gamma^*)\] In other words, \(\textsc{Substring}\) maps sets of strings with characters \(\{0,1,2\}\) to associated sets of strings with characters \(\{0,1,2\}\); and \(\textsc{Rep}\) maps sets of strings with characters in \(\{0,1\}\) to associated sets of strings with characters in \(\{0,1,2\}\).

    1. (Graded for correctness) 1 Consider \(w = 0120\) (which is a string in \(\Gamma^*\)). List every element of the set \(\textsc{Substring}(\{w\})\). In other words, fill in the blank \[\textsc{Substring}(\{w\}) = \{ \underline{\phantom{\hspace{3in}}} \}\] Briefly justify your answer by referring back to the relevant definitions.

      Not graded, but good to think about: Why do we need the curly braces—“\(\{\)” and “\(\}\)”—around \(w\) for the input to \(\textsc{Substring}\)?

    2. (Graded for correctness) Specify an example language \(A\) over \(\Gamma\) such that \(A \neq \Gamma^*\) and yet \(\textsc{Substring}(A) = \Gamma^*\), or explain why there is no such example. A complete solution will include either (1) a precise and clear description of your example language \(A\) and a precise and clear description of the result of computing \(\textsc{Substring}(A)\) using relevant definitions to justify this description and to justify the set equality with \(\Gamma^*\), or (2) a sufficiently general and correct argument why there is no such example, referring back to the relevant definitions.

    3. (Graded for completeness) 2 Define the language \(B\) to be the language over \(\Sigma\) described by the regular expression \[\Sigma^* 1 \Sigma^*\] In plain English, we might explain that \(B\) is the set of all strings of \(0\)s and \(1\)s that contain a \(1\). Give a plain English explanation for the set of strings \(\textsc{Rep}(B)\).

    4. (Graded for correctness) Prove/disprove: For every finite language \(L\) over \(\Sigma\), \(\textsc{Rep}(L)\) is also a finite set of strings. A complete answer will either give a general argument starting with an arbitrary finite language and proving that the result of applying \(\textsc{Rep}\) is also finite, or will give a counterexample (which is a specific example of a finite language \(L\) for which applying \(\textsc{Rep}\) gives an infinite language, with justification referring back to the relevant definitions).

      Note: A finite language is a set of finitely many strings. This includes the possibility that \(L\) is the empty set!

    5. (Graded for completeness) Write a template for a regular expression that describes \(\textsc{Rep}(L)\) when \(L\) is described by a regular expression \(R\). You may use union, concatenation, Kleene star, and \(\Sigma\), \(\Gamma\), and \(R\). (We’re using the shorthand for regular expressions describing alphabets from page 64.)

  2. Deciphering regular expressions (22 points):
    For this question, let’s fix the regular expression over the alphabet \(\{0,1\}\) \[R = 0^* (1 \cup 10)^*\]

    For each choice of strings of length \(3\), \(a, b, c \in \{0,1\}^3\) we can define the regular expression: \[X_{a,b,c} = 0 (a \cup b \cup c)^*\]

    1. (Graded for completeness) Give a plain English explanation for the language described by the regular expression \(R\). This continues a theme from Problem 1—before trying to prove formal statements about a specific regular expression, it’s often good to try to translate it into a form that is more easy to reason about. Typically speaking, the shorter and more concise your plain English description is, the more useful it will be in reasoning about the language.

    2. (Graded for correctness) Suppose \(a = 000\), \(b=001\), \(c=011\) so \[X_{a,b,c} = 0 ( 000 \cup 001 \cup 011)^*\] Show that \(L(R) \not\subseteq L(X_{a,b,c})\) by giving some string in \(L(R)\) which is not in \(L(X_{a,b,c})\), and justifying this choice referring back to relevant definitions.

    3. (Graded for correctness) More generally, prove that \[L(R) \not\subseteq L(X_{a,b,c})\] for all possible strings \(a, b, c \in \{0,1\}^3\). Hint: What are the possible lengths of strings in \(L(R)\) (and why does this help)?

    4. (Graded for correctness) Give a specific example of three distinct strings \(a, b, c \in \{0,1,2\}^3\) such that \[L(X_{a,b,c}) \subseteq L(R)\] Briefly justify your answer by explaining how an arbitrary element of \(L(X_{a,b,c})\) is guaranteed to be an element of \(L(R)\).

    5. (Graded for correctness) Give a specific example of three distinct strings \(a, b, c \in \{0,1,2\}^3\) such that \[L(X_{a,b,c}) \not\subseteq L(R)\] Briefly justify your answer by giving a counterexample string that is in \(L(X_{a,b,c})\) and is not in \(L(R)\) (and explaining why using relevant definitions).

  3. The right transition function can make or break a DFA (6 points):
    Consider the finite automaton \((Q, \Sigma, \delta, q_0, F)\) depicted below

    where \(Q = \{q_0, q_1, q_2\}\), \(\Sigma = \{0,1\}\), and \(F = \{q_0\}\).

    1. (Graded for completeness) Find and fix the mistake in the following symbolic description of the transition function \(\delta \colon Q \times \Sigma \to Q\): for each \(j \in \{0,1\}\) \[\delta(q_0, j) = q_j \hspace{2cm} \delta(q_1, j) = q_{1-j} \hspace{2cm} \delta(q_2, j) = q_{1+j}\]

    2. (Graded for correctness) Keeping the same set of states \(Q = \{q_0, q_1, q_2\}\), alphabet \(\Sigma = \{0,1\}\), starting state \(q_0\), and set of accepting states \(F = \{q_0\}\), change the transition function \(\delta\) so that the resulting finite automaton recognizes the language described by the regular expression \[0^* \cup \Sigma^* 1000^*\] Briefly justify why the resulting finite automaton works by describing the role of each state with your new transition function and relating it to a plain English description of the language described by the regular expression.

      Note: with regular expressions \(*\) binds more tightly than concatenation so \(1000^* = (100)(0^*)\).

    (Challenge question, not graded) There is a beautiful plain English description of the language recognized by the finite automaton with the state diagram depicted at the start of Problem 3. What is it?

  4. Being precise with terminology (5 points):
    For each of the following statements, determine if it is true, false, or if the question doesn’t even make sense (because the statement isn’t well formed or doesn’t use terms in ways consistent with definitions from class).

    1. (Graded for completeness) The empty string is in every language.

    2. (Graded for completeness) \(\Sigma^*\) is a language.

    3. (Graded for completeness) Every language is a regular expression.

    4. (Graded for completeness) Alphabets are infinite.

    5. (Graded for completeness) There is a (finite) number \(k \in \mathcal N\) such that every DFA has fewer than \(k\) states.


  1. This means your solution will be evaluated not only on the correctness of your answers, but on your ability to present your ideas clearly and logically. You should explain how you arrived at your conclusions, using mathematically sound reasoning. Whether you use formal proof techniques or write a more informal argument for why something is true, your answers should always be well-supported. Your goal should be to convince the reader that your results and methods are sound.↩︎

  2. This means you will get full credit so long as your submission demonstrates honest effort to answer the question. You will not be penalized for incorrect answers. To demonstrate your honest effort in answering the question, we ask that you include your attempt to answer *each* part of the question. If you get stuck with your attempt, you can still demonstrate your effort by explaining where you got stuck and what you did to try to get unstuck.↩︎