I am a mostly self-taught programmer. As such, I am not trained in classical computer science. For a long time I did not consider myself a "real" programmer, so whenever I heard statements about how programming should be conducted that I didn't agree with, I assumed I was wrong and that "real programmers" knew better. As I gained experience and started amassing a large body of code (much of it available as open source if you are curious), I started to wonder why I was (in my view) able to be so much more productive despite not following well-established computer science philosophy.
I started noticing that the problems real software development encountered were not solved by computer science, and were often even made worse. I started noticing that many successful projects were criticized for bad practice by leading computer science advocates. I also found that many of the people whose work I admired would, after a few drinks, admit they didn't really gain much value from following many of the generally accepted principles of development. Eventually I accepted that I think much of the way computer science thinks about software development is wrong. I simply found that the ideals promoted by computer science were irrelevant or often counterproductive to the development of software.
This document is the result of me trying to distill my experience into principles I call "Explicit programming" that I think are useful for the practice of developing software.
The most common counter to my ideas is that developing code the way I do is simply not feasible from a productivity standpoint. I think my output as a developer disproves that. My body of code is larger than almost any I know of produced by a single developer. The number of applications, libraries and algorithms I have produced rivals many substantial teams. If using high-level languages, cutting corners, relying on dependencies and commonly touted programming paradigms did produce orders of magnitude more productivity, there would be plenty of people whose output was far more substantial than mine, but I haven't found any.
If you are a software developer, there are probably things in this document that go against what you have been taught and what you are comfortable with. Luckily, you don't have to agree with this document to gain value from it. The goal is to make you think about what we consider to be good software development practice, and to do that it's not necessary for you to agree with my principles. We learn from those we disagree with, so I don't mind if you do. In fact, I'm counting on it, because I need to grow and learn too. Software development should be evolving. I am evolving as a software developer, and will likely disagree with, modify or add things to this text in the future as I gain experience and insight. This is my tome of knowledge at a moment in time.
Many professional occupations, like chef, air traffic controller, doctor, soldier, firefighter, and fisherman, take great pride in their workmanship and discipline. They have a culture of excellence. Everyone is expected to do things right and shortcuts aren't tolerated. The tone in these professions can be rough, especially towards beginners, but it's because the outcome of a failure can be disastrous. You are expected to pay your dues and prove your worth.
Programming doesn't share this culture. Much of our culture is focused on avoiding work: relying on the work of others, "hacking things together", and doing things with the fewest lines of code possible. We have a culture of constantly searching for a new language, paradigm, or library that will act as a panacea to all our ills, rather than putting in the effort to solve problems. We are like a group of chefs where the majority can't be bothered to sharpen their knives, keep their workstations clean and pick out the best ingredients, and instead advocate buying pre-made meals and sticking them in the microwave, because it's easier and faster.
This very lax attitude is why the range of skill among software developers is so great. There are plenty of developers who output just a tenth of what an average developer does, but there are also a fair few developers who produce ten times or more than the average developer. The difference isn't always immediately clear. Sometimes a microwaved frozen pizza can be mistaken for greatness. Explicit programming tries to distill what makes a programmer great and why they can be orders of magnitude more productive than their peers. For anyone who wants to be that kind of programmer, and is willing to put in the effort, this is for you. It's part programming advice, and part life wisdom.
Explicit programming values real-world performance, reliability, productivity, deep knowledge, and work ethic over theories, ease of use and paradigms.
The name derives from the explicit nature of the code produced: it is code that explicitly says what it does. It doesn't try to stay within a style or paradigm, but derives its style from the practical requirements of the task at hand.
Imagine you have a problem, and the optimal set of computer instructions to solve that problem. The computer does not do anything superfluous. It just solves the problem. Explicit programming is a school of programming that favors writing code that explicitly describes these instructions and little else. The style is guided by the problem rather than a paradigm. Explicit programmers try to think like a computer, because that is what we are programming. Explicit programming values practical solutions over abstract paradigms and abstractions.
These principles can be applied to any language (although Explicit programming favors low-level programming languages) and any problem. As such, Explicit programming is paradigm agnostic, but it rejects the idea that a paradigm can be a panacea that should be applied to all development. A hammer is not better or worse than a screwdriver; they are just different tools.
Everything is memory and instructions. This is the only programming paradigm that is true, because that is the hardware architecture we currently have. Functional programming, object-oriented programming, declarative programming, constraint-based programming, event-driven programming, or any other paradigm may be useful as a tool or thought experiment, but none of them makes programming easy. None of them are fit for all purposes. Know them, and use them when needed, but don't be seduced by them and don't shoehorn a problem to fit them. Every time you choose a high-level paradigm, you run the risk of obfuscating what your program really is: instructions modifying memory.
Code should only do what you explicitly ask it to do. When you read code, it should be clear when things happen and why. This may seem obvious, but many programmers spend a lot of time writing code that either tries to hide what it does, or is there to manage other code. Explicit code tries to only do what needs to be done, and to do so in a transparent way. If something happens, you can see it happening in the code.
If your code is full of handlers, controllers and managers that are there to manage other code, you are most likely wasting your time. Solve the problem in front of you, not some imaginary future code. Managers force each component to conform to their rules; a code base should instead conform to its modules. Each module should be designed using the right approach for its particular task. This means you can re-use modules in other designs, rather than having to adopt a large system.
Every hour you use a design you know is wrong, you embed that bad design deeper. Drop everything and fix it now. It may be a lot of work, but it will be more work for every day you ignore it. Don't ever think "I'll go back and fix it later". Fixing it later is more work than fixing it now. I advocate rewriting rather than fixing.
In many ways this is trying to define what good software development is. It's trying to distill the ethos that makes the very best programmers the programmers they are.
Explicit programming is for people who are drawn to programming because they like to make things; they like to tinker and take things apart to figure out how they work. It's for people who want to make every part, and want to make things right. It's for people who are practical rather than theoretical. It's not for people who just want to get things over and done with, or make money, or impress others.
These are basic traits in all the best programmers (and engineers) I know. It can be used by single developers, teams, or even people who manage teams or organizations that develop software.
If these things don't apply to you, then Explicit programming does not apply to you. You may think that everybody wants to be good, but in my experience that is a big obstacle: people are taken in by promises of easy programming and of not needing to learn or put in the effort. Deciding that you are going to be good, and that there is going to be a cost associated with that, is an important life decision. My experience tells me that taking this decision is key to becoming successful.
You don't need an expensive computer or special software to be a good programmer. In fact, I would argue that a slow, simple computer and basic software can be an advantage. You don't need to go to a fancy school, or need to know anyone, to become a good programmer. Having internet access can be useful, but books from a public library can do the job as well. I grew up in a house without a computer, and was only able to access one at a community center for a few years before I got my own. I also dropped out of school and was, and still am, dyslexic (as I'm sure you will be constantly reminded while reading this). I'm not mentioning this to say that I had it hard, quite the opposite; I'm saying it to let you know that in the grand scheme of things, these adversities had little impact on me becoming a good programmer. What really made me a good programmer is my curiosity and my will to improve and make things.
This is the goal of this document: to make us better. That is why I would say that the only prerequisite you need to make use of it is that you want to be a better programmer. If you want programming to be easy, you want to spend as little time as possible programming, or you don't care to write particularly good software, then this is not a document for you. This is for the people who want to be the best they can be and be part of writing the greatest software.
All kids want to be like the wise kung fu master. They say "That guy is awesome, he can kick anyone in the head, please kung fu master, show us!", but the wise kung fu master refuses. The kids say "Please, kung fu master, if you won't show us, then at least teach us how to kick people in the head!", and the wise kung fu master says "Yes, I will teach you how to kick people in the head, but only if we do it my way". The wise kung fu master makes the kids train hard and get up early in the morning; he makes them meditate; he makes them keep the house in perfect order. The kids say "Why aren't we just practicing kicks? Maybe the kung fu master is just tricking us into cleaning his house?" Eventually the kids do become kung fu masters who can kick people in the head. But they will find that the kung fu master did indeed trick them. He tricked them into becoming wise. A kung fu master is wise, not because he can kick people in the head, but because he has conquered the process of doing something hard. They needed to learn respect for the craft, how to be patient, humble, relentless, and to care about the details, in order to be kung fu masters. These are all skills needed to master something. This has brought them wisdom, and as wise people they will also no longer want to kick anyone in the head.
In order to be good at something, you need to not just want to be good, you need to want to engage with the steps necessary to make you good. A master is a master not just because they can do the hardest things, but because they want to do the hardest things. Master chefs, drill sergeants, kung fu and Jedi masters all ride their students hard on a bunch of details that don't seem focused on learning the skill at hand. That's because they are trying to teach the students to focus on doing the task at hand well, instead of trying to find a shortcut to the end.
That's really all we want to teach you: stop fucking around and start taking whatever you do seriously. It's easy to say, but it's really hard to communicate without sounding like a grumpy old dude complaining about the youth of today. The key to getting good is really that simple, but it seems like everyone has to try every other possible shortcut before accepting this.
Being good at something is not a requirement in life. I fully respect people who choose to live an easier life, and you should too. Life is hard enough for most people as it is. What I want to teach is mastery; what I don't want to do is advocate for mastery. I think mastery should be given the kind of respect that is earned, but also that everyone, if they choose to live a different life, should be given the kind of respect that is owed to everyone.
When I started out with computers, the few people who cared about computers were passionate about them. You had to be, because computers weren't exactly enticing unless you had a lot of imagination and curiosity. Today everyone has to interact with a computer in one form or another, whether they want to or not. Computers are also a lot more alluring and offer easy, enjoyable distractions that don't require you to understand very much to enjoy them.
It sometimes irks me when people suggest that my skills come from being given privileged access to the exclusive club of programming. This is a fundamental misunderstanding of the challenges faced by young programmers of today. I didn't grow up with access to computers. Campus police had to escort me out more than once after I snuck into computer labs of a nearby university. Why? Because computers were hard to come by, now everyone has one in their pocket. I remember going to the library asking for programming books and hearing how they might consider buying a programming book next fall. Today the net is filled with more tutorials than you could absorb in a lifetime for every conceivable language and programming technique, available for anyone for free.
"Access" is everywhere today; what is hard today is to focus. Computers are today consumption devices that offer an infinite sea of distraction no matter your interest or inclination. We had the luxury of boredom. If we wanted to play a game we had to first make it. If we wanted to use a computer we first had to understand it.
Many programmers today aren't that interested in computers. There is a new generation for whom learning to program is about being employable and securing a stable income, much like becoming a bookkeeper or a dentist. Programming has become an "easy" way to earn a living for many people. Take a four-week course, and start earning money. There is nothing wrong with that, but it presents a great cultural rift between people who are passionate about computing and people who just want to clock out and get a paycheck.
If you have ever tried to write a web page, the complexity immediately hits you. Why does it have all these tags when all I want to do is put some text online? Why does it have to be this complicated? As soon as you manage to make your text appear on the page, you are struck by how bad the margins, the font, the colors and everything else looks. So you start adding more and more tags. First you change the text to be "blue" but then you realize that's not the right blue so you resort to using hex codes for your colors. The longer you spend on the page the more control you require. In the end your complaint about the web editing process isn't that there are too many things you have to do, but that there aren't enough things you can do.
Internalize this experience. It's true for so many things. When you first start to learn something, be it programming, any other technology, or fields as wide as medicine, politics or the law, it seems overwhelming and needlessly complex. It's easy to fall into the trap of asking whether things really need to be this complicated. We reach for simple answers, but the more we learn and the more proficient we get, the more we start to see the complexity as an asset, as possibility, and eventually we see that the world's problems are often a result of us not taking a nuanced enough approach that engages with the complexities of the world.
A lot of programming ideas revolve around getting results fast, not around being in control. Can't we do this in fewer lines? Can't the compiler figure it out for me? Why do I have to do all this typing? These are all the result of a naive view of what is of value. These questions are asked by people starting out on a project, not by people knee-deep in actually building something.
So, why does programming have to be hard? Because what we are trying to do is complex. Code is written to interact with the world, and the world we live in is complex. Embrace that challenge. There is no magic solution. Realize that in the end you will want to know how things work, because you want to be in control. You will want to understand. 100 lines of code that you understand fully are better than 10 lines that magically appear to do what you think they should.
Don't be afraid of complexity. Complexity is unavoidable, and it is a sign of control. Think about how to manage complexity instead.
Politicians love to talk about getting rid of complexity in the form of bureaucracy, laws, and taxes. They ask why everything has to be so complicated. They love to say "Can't we just...". However, almost all of our problems exist because our systems are not refined enough to handle an infinitely complex world. Rather than trying to understand the system and improve it, it's much easier to sell a simple solution. Simple solutions are attractive because they don't require us to think and learn. But complex systems always win out in the end. They handle more situations. They have more flexibility. They offer control.
It's alluring to imagine that a language or coding style should magically make complex things simple, just as it's alluring to think that there are simple solutions to societal problems. The most important barrier to break, to achieve greatness, is to want to know more and engage deeper.
A regular person may drive a car with an automatic gearbox. They simply don't want to think about the mechanics of the powertrain; they just want to get where they are going. A race car driver uses a manual gearbox, because they want control. A truly great driver doesn't just want to drive; they care about everything that goes into racing. They read the 1000-page rule book, they constantly listen to the mechanics and follow the work of the engineers. Ideally a race car driver wants to control not just the gearbox but the turbo boost, weight distribution, suspension setup, rake, differential, tire choice and anything else that can give them an edge. A race car driver wants all these things because they aspire to be as good as they can be. Someone who doesn't care about driving can drive an automatic and not be aware of the make of the car that they drive. That someone may say "Why would anyone want to drive a manual car? It's so much easier to drive an automatic". Their goal is to not have to understand something, while someone who wants to be good should always want to understand more. When I hear programmers complain that they don't want to care about memory allocation, CPU cache structures, compiler design, OS design, and the many other things that influence how software performs, then I know they can't be great programmers. If you are going to be good at something, you have to want to be good at it.
Simple solutions are desirable. They are elegant. If you can build something with fewer parts, there is less that can break, and it's easier to make, service, and learn. We should always look for simple, elegant solutions, but they still need to solve the problem at hand. Building an airplane is much easier if you build it without a landing gear, but eventually you are going to realize that a landing gear would be nice to have, and at that point you are going to wish you had planned for it all along. Good simple solutions are hard to come by, and they often take more work than a more complex solution. Always look to make things simpler and more elegant to make them better; never use simplicity as an excuse to avoid doing the job of an engineer.
Ideal circumstances don't exist; reality is messy and full of special cases, and code needs to reflect that. There is no one paradigm that will make programming easy, because code has to handle reality and reality is not easy. What is beautiful to a computer scientist is a small, clever recursive algorithm. What is beautiful to a user is an algorithm that encompasses every special case in the best possible way to solve the task for each situation. There is a huge difference between the two.
It is very natural for people who want to do something to ask "What do I need to learn to do X?". While there is nothing wrong with this, I encourage you to learn in order to be able to ask "Now that I know this, what can I do with it?". It might seem like semantics, but the mindset is fundamentally different. When you are confronted with something you don't understand, it's easy to be frustrated and just want to get past it. Learning becomes a barrier you have to overcome in order to do what you want.
I encourage you to see learning as adding tools to your arsenal. You don't buy a screwdriver in order to screw in one screw; you buy it because you know the world is full of screws and a screwdriver will allow you to screw and unscrew a lot of them. Ask yourself: is learning something I have to slog through in order to do what I want, or is learning what opens doors to new possibilities? By learning this way you are open to coming up with new ideas that challenge your assumptions about what you are trying to do.
This is a fundamental conflict between people who embrace technical knowledge and those who just want to use it. Many people who don't work with technology see technology as a limitation. They go to an engineer and say "I want a flying car", and the engineer will start off listing all the technical issues with flying cars, like safety, noise, control, pilot qualifications, energy requirements, pollution, battery weight and so on. The non-technical person often reacts negatively, thinking that the engineer lacks imagination, just sees the problems in everything, or is a bad engineer. They can forge ahead to build what they will eventually realize is a helicopter, hitting their heads against every law of physics, engineering and reason on the way. What the non-technical person is missing is that the engineer probably knows of much better ways to solve transportation than flying cars, things that a non-technical person could never imagine. The layman can't recognize the engineer's imagination, because he or she can't imagine what it's like to think as someone who has a firm grasp of technology.
Science is the search for what can be proven to be the truth, engineering is the search for what practically works. Both are incredibly valuable, but don't confuse the two. Just because something can be proven to be true doesn't mean it is useful. In science proof is the only requirement, in engineering we have many requirements that need to be met.
Most bugs happen because you and the compiler have different ideas about what your code does. Debugging is the process of figuring out why your code doesn't do what you think it does. The compiler is usually right about what your code does; you are usually wrong. Good code has to be readable by you, not just the compiler. It is therefore the job of a compiler and a language to be as clear as possible about what it does. If there is any ambiguity, the compiler should notify the user and say, "I don't understand what you mean, please clarify", or "You are doing this, are you sure that's what you want?". Software should not think it's smarter than users. The more things that are hidden from the programmer, the harder it is to understand what the compiler does, and the more likely it is that the programmer misunderstands the compiler.
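To make this concrete, here is a small made-up fragment (it is not from any real code base) with two of the ambiguities I mean:

#include <stdio.h>

int main (void)
{
	int limit = 10;
	int x = 0;
	double ratio = 1 / 3;		/* integer division; ratio silently becomes 0.0 */

	if (x = limit)			/* assignment, not comparison; is that really what was meant? */
		printf ("x is %d, ratio is %f\n", x, ratio);
	return 0;
}

A compiler that refuses to quietly accept lines like these, and instead asks "are you sure that's what you want?", is doing exactly what I am asking for.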
Get over the idea that the act of typing code is time consuming or hard. It is not. Designing an algorithm is hard, architecting a system is hard, debugging is hard. Monkeys can type, and I'm writing this as possibly the slowest typist on the planet. Fewer lines of code is not a virtue. Clarity is. Performance is. Stating explicitly when costly operations like memory allocation, disk access, system calls, networking, and mutex locking happen is important because it makes the cost clear. Copying some code in order to write a different version of the same thing is often clearer than trying to have one general implementation filled with if statements for various uses. If you know a better way to implement something you have already written, do it. If you know what the better design is, you have already done the hard work. A lot of programming environments, languages, and systems pride themselves on how little you need to type to accomplish things. Typing is easy, so I would rather have a system with better clarity, better debugging, shorter compile time, or a range of other features.
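As a small illustration (the file name here is made up), this is what I mean by making the costly operations explicit; reading the code you can see exactly where the file system is touched, where memory is allocated and where the disk is read:

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
	FILE *f;
	char *buffer;
	long size;

	f = fopen ("data.bin", "rb");		/* the file system access is visible here */
	if (f == NULL)
		return 1;
	fseek (f, 0, SEEK_END);
	size = ftell (f);
	fseek (f, 0, SEEK_SET);
	if (size < 0)
	{
		fclose (f);
		return 1;
	}
	buffer = malloc ((size_t)size);		/* the one and only allocation is visible here */
	if (buffer != NULL)
		fread (buffer, 1, (size_t)size, f);	/* the disk read is visible here */
	fclose (f);
	free (buffer);
	return 0;
}

It is more typing than a one-line "read the whole file" helper from some library, but nothing is hidden, and the cost of every line is plain.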
If you started reading this to learn clever syntax tricks from a master programmer, let me disappoint you right away. The syntax I write is plain. It looks like your first C program, and that's the point. It's there to be as easy to read as possible, not to be compact, or a way to show off the depth of my command of the language. The syntax is there to be as simple as possible. It doesn't jump in and out of abstractions, it uses basic types, and it reads from top to bottom.
Your users will never pat you on the back for writing what should have been 10 lines of code in only one. But you may some day look back at that code and wonder: what the hell does this line do, and why is it so complex? Your cleverness should be dedicated to solving the user's problem as efficiently and elegantly as possible. The code you write to do so should be as straightforward and simple as possible.
If someone takes a screenshot of a randomly selected page of code from your code base and reads it, is it understandable without context? In other words, how much does understanding the code depend on things that are defined elsewhere? Keeping code understandable without context should be a prime objective for clarity. Obviously not everything can be defined in place, but when things are not defined in place, it needs to be very clear that they are not, and it needs to be clear to what extent they impact the code that is in place.
This is why functionality such as function and operator overloading, macros, and namespaces is so dangerous: it changes the context of the code. Copying the code from one part of the code base to another yields a different result, even if the code looks identical. Any part of the language such as keywords, and basic concepts such as flow control, should never be redefined or obfuscated. To a large extent I also discourage the redefinition or renaming of basic types. Consider:
for (i = 0 ; i < 10 ; i ++)
Your brain should instantly recognize the pattern, down to using "i" as an iterator. If we compare this to a macro that does the same thing:
loop (10 )
Your brain won't instantly recognize what is going on. You may have saved a few key presses, but your brain is working overtime not to read this as a function call that is missing a semicolon.
The C preprocessor is an incredibly powerful tool that can be used for a wide range of things. As such, I limit the use of "#define" to either fully capitalized constants, or function-like macros that genuinely behave like function calls. When you read this:
for (i = 0 ; i < NUMBER_OF_LOOPS ; i ++)
you expect NUMBER_OF_LOOPS to be a constant. If it is instead defined as:
#define NUMBER_OF_LOOPS slow_function_with_lots_of_side_effects ()
then it is very misleading. I find that there are two reasons to use the preprocessor to define macros with parameters: either to create debug versions of functions that add the __LINE__ and __FILE__ macros to the parameter list of a function, or when you need to force the compiler to inline code. In both cases, you should implement versions of your code that don't do this, so that you can switch back and forth to verify that they yield identical results, and to help in debugging:
#include <stdio.h>
#include <stdlib.h>

/* Wraps malloc so that every allocation reports the file and line it came from. */
void *debug_malloc (size_t size , char *file , unsigned int line )
{
	printf ("Allocating %zu bytes in file %s on line %u\n", size , file , line );
	return malloc (size );
}

#define malloc(a) debug_malloc (a , __FILE__ , __LINE__ )
I think a good rule of thumb is that any code copied from one place to another should either break, because things are not defined, or work the same way, because they are defined the same way. This means: never define something in one place to be functionally different from how it is defined in another.
Every time you abstract, you run the risk of unintended consequences. Any kind of ambiguity creates hazards. Programmers need to be able to trust the code they see. Programming features like macros, function overloading, implicit initialization, operator overloading, templates, and implicit type conversions hide what happens from the user. Use the language as it is; do not try to hide it by redefining it to something else. If a handle is a pointer to a structure, do not hide that it's a pointer. If a pointer is the language's natural way of expressing a reference to an object, then the programmer is used to seeing pointers and knows what they are. The special type you define just for your thing is foreign to a programmer, and won't be as easy to internalize, no matter how brilliant it is to you. Any time the language is clever, it forces the programmer to use more of her brain power to understand how the compiler will interpret the code.
Good software is software where the user is exposed to a few simple concepts that give them as much power to accomplish things as possible.
LEGO is a great design because with a few simple parts you can build anything you want. You don't need to own special-purpose pieces in order to build a spaceship; you can just use the basic pieces. Special pieces may aid in building a spaceship, but even these special pieces conform to a general format that makes them easy to understand and integrate into a design.
Unix is a great design because you can use pipes together with various command line tools to create all kinds of functionality. The system doesn't need specific features, because you can use the basic functionality to build the specific features you need.
Users can only use what they understand, so these concepts have to be understandable. In order to make them understandable, you need to decide what is exposed to the user and what is not. A program that shows images may need to expose the user to the fact that images are stored in files, but does not have to expose the user to the specifics of the file formats used. A database can let the user store and retrieve data, but does not have to expose the user to what algorithm it uses to index the data. If it does, it empowers the user to better tune the performance, but also adds more complexity and concepts for the user to manage. Good software magically removes concepts that aren't empowering the user and gives the user concepts that are.
Great advances are often made in software when someone manages to remove the need for a concept to be managed by the user: 3D rendering, video editors where the user doesn't have to manage different formats, automation tools that don't require the user to write code, a system that can automatically convert data types seamlessly. Often an insight into how a concept can be removed or automated is a key factor.
When you design software, the goal is to build as few things as possible that can do as much as possible. The way to do this is to allow the users to combine different aspects of the software in as many ways as possible. You don't want to add to your software, you want to add dimensions to your software. If the system you have designed is clear, simple and fit for purpose, you need fewer features designed for a specific purpose, since the basic design lets users do what they want within that framework. Good software design lets the user do things you haven't thought of. Good software lets the user use and combine its capabilities to solve problems the developer has never encountered. Users often think in terms of features, but it's your job as a software architect to translate that into flexible systems that can do what your users request, but are also flexible enough to do what they haven't yet asked you to implement.
At a product presentation for a video editor, I once heard a product manager proudly announce that, after many user requests, they had added an option to hide tracking points, because the points would obscure the main view if you had a lot of them. This is terrible design, disguised as "listening to the users' feedback". If the tracking points are annoying, maybe redesign them to not be annoying? Did I hear that they obscure the view once you have too many? Maybe count them and make them smaller or more transparent as they grow in number? Instead of fixing the problem they added yet another thing that users have to learn and manage.
Quality is the measure of how long something remains fit for purpose, be it buildings, cars or furniture, so the same thing should apply for code. While programmers do talk a lot about code quality, it's rarely talked about in terms of longevity. Often code quality is defined as code that is easy to maintain, rather than code that can be used for a long time without maintenance. Nobody would call a car that constantly breaks but is easy to fix, high quality. Why do we accept this for software?
I would argue that longevity is more important in code because the cost of deteriorating code is much, much higher than for other things. If your table breaks, you can buy a new one without needing to do a major redesign of your entire house. When code has to be taken out and replaced, it tends to be very disruptive and requires a lot of redesign of other things. If your car breaks down and you have to buy a new one, you can usually just get into the new car and drive off. You may have to be told about how to use some obscure feature, but the vast majority of your driving skills will carry over. New code (or god forbid a new language) requires the users to relearn its interfaces.
A programmer's productivity is measured by how long code can be used without needing to be updated divided by how long it takes to implement. The longevity of your code should be the prime metric of code quality. Longevity is obviously valuable, but not only that, "temp code" tends to stick around and bad designs have a tendency of spreading.
Going back to code you wrote a month ago is significantly harder than going back to code you wrote yesterday. Going back to someone else's code is orders of magnitude harder than going back to code you wrote. The reason to complete code now is so that you can clear your mind of the implementation details and take on another task. Every time you have to go back to something you wrote a long time ago, you have to take time and effort to re-familiarize yourself with the implementation details of something you no longer remember. Task switching has a high productivity cost, so stay on one thing, be sure to complete it, then move to the next thing. Leaving things for someone else to deal with is bad programmer manners.
The longer time you have used code, the more tested and therefore trustworthy it has become. The moment you make any change you sow the seed of distrust that the code no longer works as you expect it to. Every time you use the code you are writing yet another test case, that verifies that the code is sound. The moment you make a change you go back to zero.
- Avoid trends
You should always avoid writing code you don't expect to use for decades. Not just because it's good to have long-lasting code, but also because it's bad to have long-lasting code that wasn't intentionally designed to be long-lasting. Whatever is hip and cool now won't be cool in 10 years. When choosing technology, maturity and stability are key. Will this be supported in 10 years? If not, is it possible to transition away from the technology? Will there be people available who can use it? Are the tools mature? In programming, new technology is bad technology. If it's been around for decades it will probably be around for decades more. Switching is more painful than the gain of new features is worth. The users don't care what cool hip technology you use, they care that the thing you make works. Computing is full of trends and fads that constantly change. My argument is that you should almost always avoid trends and instead focus on a long-term strategy.
Changing an API requires everyone who uses it to not only learn the new version, but also adapt all the code that depends on it. This is a huge cost to everyone involved, with the possible exception of the one who makes the change. It is therefore incumbent on you to not make changes unless absolutely necessary. If possible, make the legacy interface available concurrently. If you think it looks bad to retain the old version when you want everyone to move over to your new version, then tough shit. You messed up, so now you have to live with it. Other people have more important things to do with their time than to adapt their code because you messed up.
If you have users depending on your API, changing it is bad manners. The Linux kernel and Windows come to mind as good citizens who respect their users' time and effort, while Apple, Google and most Linux distributions don't.
Even if you are mindful of the effects of changing APIs, the rest of the world isn't. You are going to need to depend on other people's technology, but when you do, you need to be mindful of the risk profile you create. The concept of longevity rests on minimizing risk. Keeping your code free from dependencies means reducing the attack surface for external shocks. If your code depends on an external service, company or software, you have all kinds of risk exposure. What if the API changes? What if they go bankrupt or stop supporting it? What if they change how they charge for the service, change their license agreement, or simply decide they don't want you as a customer? As you add more and more dependencies, these risks compound. You can reduce risk by choosing fewer dependencies, and by choosing technologies that have multiple implementations, are open source, are not tied to a single entity, keep older versions supported, or leave open the possibility of forking. This is true for both hardware and software platforms.
Languages are dependencies too. Languages are probably the dependency that poses the greatest risk. If a library or platform disappears, you may be able to replace it, but if your language is no longer fit for purpose, you have to start over from scratch. Will a tool chain exist for the language of choice in the future? Will compilers be actively developed for new hardware architectures? Will there be enough skilled developers to hire?
In my opinion there is no reason to ever call any outside software directly: wrap EVERYTHING. When you wrap software, try to always wrap your implementation around at least 2 different external APIs. This way, if one goes away, you have another to hold you over until you have added additional implementations. Your code should build and run without installing or downloading any additional libraries, source code or SDKs. Downloading a library is easy, but as you add more dependencies, the chance that one dependency is no longer available, or has changed its interface, increases exponentially. Code that is reliable today won't be in the future. Anybody can be hit by a bus; any company can go out of business or be taken over by people who change course. When you have to rely on existing technology like languages, choose stable tech that is mature and has many independent implementations. If you need your code to interface with an SDK in order to run, make it a dynamically loadable library that the application loads in. This means that if the SDK is not available, the application can still be compiled and run.
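As a sketch of what that wrapping can look like (all the names here are made up; the two back ends stand in for whatever external libraries you actually use), the rest of the code base only ever sees your own header, and the wrapper picks an implementation at build time:

/* my_image.h -- the only interface the rest of the code base ever includes. */
typedef struct {
	unsigned int width;
	unsigned int height;
	unsigned char *pixels;
} MyImage;

MyImage *my_image_load (char *path);

/* my_image_load.c -- the wrapper; swap the back end without touching any caller. */
#include "my_image.h"

#ifdef MY_IMAGE_USE_BACKEND_A
MyImage *my_image_load (char *path)
{
	/* ...calls into external library A go here... */
	return NULL;
}
#else
MyImage *my_image_load (char *path)
{
	/* ...calls into external library B go here... */
	return NULL;
}
#endif

If library A disappears or changes its interface, only this one file changes; every other part of the program keeps calling my_image_load.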
When designing software, the first thing to consider is the scope of the structure that future development will have to live in. At the start of development, many decisions will be made in rapid succession that will have great implications in the future. Once a decision is taken, other decisions are made that depend on the first one, and the longer the original decision has been around, the harder it is to undo. Therefore it is important to think ahead. The goal is not to plan out everything in advance, but to not inadvertently make things hard in the future. The structure you design in the beginning will remain for a long time. Parts may be swapped out, they may even be designed to be swapped out, but that too takes planning.
Some engineers and managers prefer to only have a very limited scope for a project at the beginning, focus on making something simple that works, and then add features. I prefer to know as much as possible up front, and plan as far ahead as possible. The purpose of this is not to set a detailed roadmap that stretches far into the future. What the software will need, and in what order, will change many times, so the goal is not to avoid change but to anticipate and prepare for it. The way to do this is to define the possibility space of the software and where the bounds of its feature set can reasonably be set. It is more important to discuss what it could be versus what it clearly cannot be, rather than precisely what you think it will be. Writing a feature list is useful to make sure that the software meets external requirements, but in my opinion, a focus on features is the enemy of good design.
Let's say you are making a video editor, and in it you have a timeline that produces a video stream. Sounds straightforward. Now let's imagine you want to edit video that is stereoscopic: now the timeline produces 2 video streams. Or you are making an immersive video installation that may employ many screens or projectors: then a timeline may output numerous video streams. If you assume that timelines and video streams are always one to one, and then go back to separate them once the project has grown after a couple of years in order to support stereoscopy, that will be very painful. Deciding that they should be separate data structures has very little cost if it's done up front. Just because you decide to separate them up front doesn't mean you have to actually implement all the features a user may need to do stereoscopic editing or multi-projector installations, or even make it possible for the user to do this at all. What it enables you to do, at a future point, is to add these features if you decide you want them, without a huge rewrite.
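A minimal sketch, in C, of what that up-front decision might look like (the type and member names are made up for illustration):

/* A timeline owns any number of output streams, even if version 1.0 only ever creates one. */
typedef struct {
	unsigned int width;
	unsigned int height;
	unsigned int frame_count;
	/* ...per-stream data... */
} VideoStream;

typedef struct {
	char name[64];
	unsigned int stream_count;	/* 1 today, 2 for stereo, N for an installation */
	VideoStream **streams;		/* allocated per timeline, never assumed to be a single global stream */
	/* ...clips, tracks and other timeline data... */
} Timeline;

Nothing in the first version has to support more than one stream, but nothing is written as if there could only ever be one.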
In my experience, designers and decision makers rarely have a long-term plan, and if they do, they don't share it, or it tends to change. A lot of misguided managers and decision makers think: "I'm not going to tell my engineers this is going to have to be networked in the future, because I don't want to overload them with feature requests that we don't need until next year anyway." This is a terrible practice, and it creates untold wasted hours rewriting systems that were never fit for purpose. You need to know as much as possible up front. Knowing what the app is meant to be like in 5 years is not a distraction from what needs to be done today; it is making sure we are working towards having that app in 5 years and not in 10.
This means that it's up to the engineer who designs the system to think ahead and anticipate what might come. This means asking a lot of questions, thinking about the possibility space, and asking directed questions about precisely what they mean: "When you say the user can load a document, do you mean that the user will never be able to load more than one document at a time?" When asking these questions the scope will inevitably grow. Knowing this information up front is so much more valuable, because expectations get more aligned. Whenever someone says "The software will never be required to do X", I tend not to believe them if I think there is a chance that it will be a requirement later on. In this situation, you may write the software with this possibility in mind if it's trivial to do so. If not supporting this use case would reduce the complexity and effort needed significantly, then I would ask a few more times, make it clear to the decision maker that this is not a decision they can go back on, and explain the added cost of supporting it now, but also the significantly larger cost of supporting it at a future point if they change their mind. If they still think the feature isn't needed, get it in writing and keep it on file.
There are some requirements that are especially important to settle up front, because they are notoriously hard to add late in a project: networking, for example, or support for more than one open document.
Once you have a vague idea of what parts of your software are likely to change and need expanding, you can start planning out what parts should be abstracted. Dividing the various parts of your code into modules, and making them talk to one another, is a large subject, and it will be covered extensively later.
Preferably you can build a small core that interacts with a wide range of modules. If the core, and the way it interacts with modules, can be made stable, a lot of things will become a lot simpler later on.
One valuable consideration is the relationship between different modules. What module is calling what module? In general I advocate having all module interaction be one-directional. Module A calls the API of module B, but module B never calls module A. If B needs to notify A, let A register a callback with B, so that B can call back without knowing anything about A. This creates a one-directional dependency: A depends on B, but B does not depend on A.
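A minimal sketch of that callback pattern in C (all the names are made up): B only knows about a function pointer type it defines itself, so it can notify its caller without ever depending on it:

/* module_b.h -- B's public interface. Nothing from A appears here. */
typedef void (*BEventCallback)(int event , void *user_data );

void b_register_callback (BEventCallback callback , void *user_data );
void b_do_work (void);

/* module_b.c */
#include "module_b.h"

static BEventCallback stored_callback = NULL;
static void *stored_user_data = NULL;

void b_register_callback (BEventCallback callback , void *user_data )
{
	stored_callback = callback;
	stored_user_data = user_data;
}

void b_do_work (void)
{
	/* ...do the actual work... */
	if (stored_callback != NULL)
		stored_callback (1 , stored_user_data );	/* notify whoever registered, without knowing who */
}

/* module_a.c -- A depends on B, never the other way around. */
#include "module_b.h"

static void a_handle_b_event (int event , void *user_data )
{
	/* A reacts to what happened inside B. */
}

void a_init (void)
{
	b_register_callback (a_handle_b_event , NULL );
}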
In general, if code is dependent on other code, it makes sense to statically link to it. There is no point in dividing an application into multiple files if separating them breaks the application. The exception is plugin architectures. A plugin architecture is useful when you have optional features that the application can in theory run without. It also divides a software project into multiple projects, and this can be incredibly useful for managing both the project and its dependencies. Let's imagine we are building a sound application and we design a plugin architecture. Each sound effect can be implemented as a separate code base. If, for instance, you hire a new junior employee, they can be given the specific task of writing a new de-esser, and this code is now entirely separate from the main code base. If the employee turns out to write terrible code, you can bin the entire plugin, and you won't have to worry about bad decisions leaking into other parts of the code. Similarly, if you want to support a specific sound system SDK, you can write a separate plugin that does this. This means that only the people making this integration need to install the SDK to build the plugin. The plugin now has a dependency on the SDK, but the project as a whole has not added a dependency. Another good reason to build plugin architectures is to let outsiders write code that interfaces with a proprietary code base.
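A minimal sketch of what such a plugin interface could look like in C (the struct and function names are hypothetical, not from any real SDK): the host only ever sees a small table of function pointers, so each effect can live in its own code base.

/* sound_plugin.h -- the only header the host and the plugins share. */
typedef struct {
        char name[64];                                                    /* name shown to the user, e.g. "De-esser" */
        void *(*create)(unsigned int sample_rate);                        /* allocate effect state */
        void (*process)(void *state, float *buffer, unsigned int count);  /* process samples in place */
        void (*destroy)(void *state);                                     /* free effect state */
} SoundPlugin;

/* Each plugin implements and exports this one entry point for the host to look up. */
SoundPlugin *sound_plugin_register(void);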
There is a trend of thinking that "software is never finished", and of processes like "continuous integration". I think this is a very bad mental model for producing anything. Yes, you can argue that anything can be improved, and that nothing is ever perfect, but that is fundamentally quite different from saying nothing can ever be finished.
There is a saying that in order to make great art you need two people: one great artist, and another person to stop them from working on it when it's done. Any project should have the right scope; at some point more becomes less. Much software degrades because it was designed to solve one problem, but once it does so well, the developers go on to add more features to solve other problems, and consequently the software bloats and becomes a mess.
The idea that software has to evolve instils the idea that new is always better than old. The very notion of newness and modernity becomes a virtue that requires no justification. Let's redesign the interface! Why? Because it is old. Let's change things that work so that they do the same thing but in a different way. Change for the sake of change.
If the project has no set end goal then the direction can change at any time, and do so over and over. No decision is ever final, and everything is always up for debate. If you have an infinite timeline for your development, then why prioritize? Why not put off hard work? On a long enough time scale you will get to everything, so why do it today? If your software only needs to hold together until next month when you are planning to update it, why build it on a stable foundation? Why insulate it against dependencies? If software is never done, then why bother, when the problem will be someone else's at some point anyway?
If your software requires constant upkeep, that is now a tax the world has to keep paying just to be able to do what it was able to do yesterday. You just ate a chunk of the world's productivity indefinitely. Progress is when we can solve problems in such a way that we can move on to solve new problems. Change doesn't just cost the developers time and effort. If software always changes, users have to constantly adapt, re-learn and update software, costing them time and effort that needs to be justified.
Imagine that the software you release will be the last version. Ask yourself: for how long will it be used? Once the software is complete you should be able to walk away from it, confident that you have left something useful behind.
I think the best way to write software is to start with a vision of how it should solve a problem. Then you define a clear scope around that: what the software does, and does not do. Then you implement that. Once you have implemented it, you need to evaluate what you have made. At this point there are two outcomes: either the software just wasn't a good idea and needs to be scrapped, or it proves to be useful but inevitably needs some work. You can plan your software in advance, but until you have it and can use it you can't know how your vision will perform in practice. I tend to find that some workflows can be streamlined, some features are missing, and some are never used. A lot of the time a particular way of using the application proves so good that it renders other features obsolete.
At this point you can do a number of rounds to refine and optimize the application. Eventually you end up with a piece of software that is completed, bug free, and reflects your vision, with the added experience you have gained from the process. You may release some minor patches here and there to fix issues that arise.
At some point you may either decide to rethink the problem and redesign the software in part or in full, or you may decide to add some features or capabilities.
This is reflected in how I think about version numbers. The first number reflects major rewrites, the second number reflects new features or capabilities, and the third number reflects minor fixes and changes. Marketing may want more frequent updates to the major version number, but to me version numbers aren't just a way to communicate progress to users, they are a way to think about software development.
The first number should require major rewrites to change, because a lot of the time that is what is needed to make progress. This is not because of "cruft" building up, but because as time goes on you need to ask yourself: "Is the approach taken by this software to solve this problem still the right one?" Limiting the scope of your software allows you to ask this question.
Knowing what number in the version will change because of what you are working on gives clear guidance on scope and expected impact.
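As a made-up illustration of how I read such a number:

1.3.2 -> 1.3.3   a minor fix or change
1.3.3 -> 1.4.0   a new feature or capability was added
1.4.0 -> 2.0.0   the approach was rethought and large parts were rewritten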
I have found that relentlessly working on a single project indefinitely is not very productive. Work easily becomes a grind, with no end in sight. This is one of the reasons why I think it is so important to have a clear and limited scope. Never seeing the end of the tunnel, and always battling an in-progress mess, is a good way to burn yourself and your team out.
But beyond the mental toll, it also robs you of perspective. You never get any distance from what you are working on.
If you write software, immediately evaluate it, find issues with it, and then immediately go about addressing the issues, you will inevitably be biased towards the first solution that presents itself rather than the best solution. Stanley Kubrick once said that "the best idea is usually the opposite of the first idea". I think about that a lot. I used to make this mistake a lot: something wouldn't work and I would throw spaghetti at the wall to see what stuck. That's a lot of wasted effort, and each iteration would leave behind a lot of dead code. A lot of the time when these experiments failed, if I'm honest with myself, they failed in predictable ways. If someone had asked me to give my best guess as to why the approach I was attempting would not work out, I would have been able to tell them why. Yet I still tried it, simply because I didn't have a better solution right there and then. I was rushing to do "something" rather than slowing down to do the right thing.
I don't work that way any more. What I do is work on several interleaved, more or less independent projects, and switch between them. I work on one project with a clearly defined idea of what I want to accomplish, and then I stop and move on to another project.
This lets me clearly define what I want to accomplish with each "push" forward of a project. I have a clearly thought-out plan that I have had at least the length of the previous push to think about. The plan has had time to mature. I focus on the plan and try not to get distracted. The length of a push varies widely from a week to several months depending on what I want to accomplish. A game I have worked on has gotten a one-week push every 6-9 months.
What is important is that I don't start implementing until I have a plan, and when the plan is accomplished I stop working on the project, whether it was successful or not. A lot of the time you don't want to walk away, because your head is filled with ideas for how to make the software better: either you are on a high from a successful push and want to keep it up, or things didn't work out and you are eager to make things right. Something I have learned the hard way is that most of the time it's better to step away. Let your original vision settle for a little while, work on something else, and come back with a fresh pair of eyes.
To be able to do this you need to have enough options to choose from when you decide what to do next. I tend to have 2-3 major projects going on at one time, but I also do smaller one-push projects like utilities and libraries, and a wider range of software that I occasionally dip back into. This lets me think about what to do in other projects while I'm actively pushing one project forward, and once a push is completed, I can choose to work on the project with the highest priority that has a mature plan ready to be executed.
A format is a definition of how to express something. Many things in software engineering can be thought of as formats. A data structure is a format that defines how data is stored in memory, an API is a format for calling code, a network protocol is a format for encoding and decoding data sent to a remote machine, a file format is a format for describing some data like an image or a document. Even a programming language is a format for describing instructions for a computer to execute.
That is a very wide definition, so wide in fact that it might border on meaningless, but before we get into the specifics, and we will get there, I want to show you that there is a surprising number of lessons that apply to all of them.
All formats are inherently communication devices. You use a format so that something can be understood. Formats are not the message, but they are the medium, and they are there to define what can be expressed and how it is done.
Any form of communication has to be implemented at minimum twice: once by the provider of information and once by the receiver. There is no point in saving an image as a file that no program can open, writing a program in a language no compiler can compile, calling an API that no one has implemented, or writing a variable unless you intend to read it. Ideally, though, you want to reuse a format as many times as possible. Just like when you choose to learn to speak a language, the value of the language goes up if there are lots of other people who speak it.
Some successful formats become a standard. Standards are sometimes officially and sometimes unofficially accepted formats. Some standards originate from standards bodies (they are usually bad because of design by committee), but other standards organically emerge because enough people adopt other people's formats. (Once a de facto standard has been established, it can be useful to create a standards body to maintain it.) These naturally emerging standards have to be good and useful to many people, because adopting them is optional. Given that organically emerging standards have to be good formats, we can use them to learn what makes a successful format.
Understanding how to create a good format that can grow into a standard should be fundamental to any software engineer's skills, yet it's not a topic explored enough in engineering (or in standards bodies, for that matter). A standard may seem like a very rigid structure for a design, one that requires more work and agreement than is necessary for most software projects. That can be true, but standards share so many characteristics of good design with software engineering that it is worth exploring their properties. Good standards aren't complicated at all. If the format you are using internally in your project or organization has the properties that would make it successful as a standard, then you are probably doing really good systems design, and you and your organization will reap many benefits, just like the wider industry would if they adopted it.
Let's say someone needs to measure the length of stuff, so they pick up a stick and say "How about we use this stick to measure stuff? Let's call it a meter!" Congratulations, this person has just invented a format for expressing lengths! At this point really any stick will do; all you really need to do is pick one and stick with it.
Picking up a stick is really easy and anyone can do it, so lots of people will. In fact there are enough sticks in the forest for everyone to have their own. Unfortunately not all sticks are the same length. There is obvious value if everyone could agree to use the same stick, but since anyone can pick up a stick, why would anyone let someone else have the honor of being the stick picker? You may want the honor of being the stick picker, and we may all have opinions about the right length of a stick, but a stick agreed upon by everyone is infinitely better, so get over it. A lot of really bad standards that everyone agrees are bad (NTSC) persist because the value provided by being compatible overrides all other requirements. Compatibility is the difference between something working or not working, and most of the time having things work is pretty high on the requirements list.
Whether you use what everybody else is using or not, once you have measured everything with one stick, it becomes very costly to change sticks. It is very easy to make fun of people who use a bad format, but while it can be easy to spot an out-of-favour format, appreciating the cost and effort required to change the format is much harder, and very often underappreciated.
Software engineers love to complain that systems use the wrong formats, be it APIs, data structures, protocols, or that they were written in the wrong language. Most of the time this is very counterproductive. People underestimate the value of a working format, even a poorly designed one. Any functional system that does what it should is infinitely better than any imaginary design that has yet to be built. A factory full of equipment that operates in imperial units instead of metric units may not be ideal, but the fastest way to go bankrupt is to scrap all that perfectly working equipment just to replace it with the same equipment with different numbers on it. Changing formats is hard, expensive and often very risky, so learn to accept bad but working formats, and learn to manage bad formats.
We will talk more later about how to migrate from one technology to another because it is an important topic, but for now let's be very clear: you want to avoid both migrating and being stuck with a bad format, so do everything in your power to get it right the first time.
Because formats are communication devices, they let you divide a problem up into multiple pieces. This makes everything easier. Once everyone has agreed to use the same stick to measure stuff, one can go out and acquire two pieces of equipment from two different vendors, who do not even know of each other, and they will fit perfectly together. That's magic. When work can easily be distributed this way, productivity goes up a lot. If you are managing a team, manage the formats that connect your team members.
Communication is the hardest thing in a team, so if you solve that by giving everyone common formats to interface with, everything else will be easier. This is why I emphasize the design of good formats over almost everything else. If you and your organization are good at this, you can scale almost without limit. Any problem hard enough can be broken down into multiple less complex problems.
The many-to-many property of a format is notable, because as more things use a format it becomes disproportionately more useful. There is however another side to this that most people do not realize: if we add complexity to the format, that complexity gets distributed to everything that needs to interact with the format, so the cost is multiplied by every implementation. If your format takes a day longer to implement, that's not a day wasted, that's one day for each implementation. This means that making your formats simple is paramount.
Many formats that have become standards were never intended to be widely used. "This was just something I hacked together" is a common statement from the inventors of many of the most important formats in existence. Because simple formats are easy to adopt, and useful formats are so sticky, you can often inadvertently get stuck with something you never intended to reuse. This is why the thing someone hacked together in an afternoon sometimes becomes more successful than a three hundred page specification meticulously designed by the industry's brightest minds.
This is the big challenge of format design: you want to make something as simple to implement and use as possible, but you still want all the features you need. The time you shave off by making your format easier to use will be multiplied by every other user too. On the flip side, the mistakes and bloat you add when designing a format are now everybody's problem, for as long as your format is used. Both features and problems spread and persist. Why not just fix the problems in your format? Well, it's not enough for you to fix them; now everybody has to fix them, and at the same time. This is incredibly hard.
This is where designing formats becomes an art form: you want to balance a forward-looking feature set that encompasses everything you could ever want to do, with something simple and implementable.
So how do you design a format that can do everything but has very few features? The way to do it is to build a simple but flexible system that can easily be understood and implemented, but that can be stretched to do many things. Easy, right? Later we will talk more in depth about how to do this using primitives.
You want your format to be flexible and encompass every possible future use, and this is when people start adding features, lots of features. This is especially common when a format is designed by committee, and why "designed by committee" is commonly recognized as a bad thing. Each participant has their own needs and requirements and everyone piles them on. This is an easy way to bloat the format and make it hard to implement. You don't need to be a committee to make this mistake; many developers pile on features without any regard for implementability.
When designers (or more commonly groups of designers) can't make up their minds about the best way to do something, they may opt to let the user decide. Instead of deciding whether the format should store units in metric or imperial, you let the user decide. The theory goes that the format now supports your favorite measuring unit, whether it is metric or imperial. Two is better than one, right? In reality, it means that everyone now needs to handle both, so everyone is forced to implement the system they think is inferior. What's worse, some users are going to say "well, I use one of the systems so I don't need to implement the other", yet others will choose to only implement the other, and the compatibility falls apart. (USB-C is a good example of this.)
A common way to make a specification less complex is to make parts of the standard optional. In some cases this can be good, but a lot of the time it creates a Venn diagram of features with less overlap than is needed for useful interoperability.
Some standards leave space for arbitrary user extensions. This can be useful in rare cases, but most of the time you want a format to be as complete as possible so that there is one standard for everything. The purpose of using a common format is to have interoperability; if everyone is free to define whatever they want, then it isn't a format that offers interoperability.
In the original specification of FTP (RFC 959), the response to the LIST command, which is used to query a server for the contents of a directory, simply states that the server should transfer a list of files, but it doesn't specify how the data is formatted. A human can read the list regardless of the formatting, but the omission of a strict specification of the format makes it impossible to write a program that can reliably parse the response of any FTP implementation correctly.
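To illustrate (these two listings are made up, but representative of servers in the wild): both are valid answers to LIST under RFC 959, yet a parser written for one will choke on the other.

-rw-r--r--   1 ftp      ftp          4096 Mar 01 12:00 readme.txt
03-01-97  12:00PM                  4096 readme.txt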
A very common source of limitations and complexity is indirection. Can your application have one or more documents loaded at a time? Can resources be shared between documents? Is it a single user application or multi user? Are users divided into groups? Are there sub groups? There is great value in asking yourself "can there be one or many of these?" for each structure you design. These are the kinds of questions you want to ask when considering the indirections of your design. Getting this right is very important because it is often very complicated and laborious to change once it has been implemented. Most often the problem is too few indirections, but having too many indirections tends to become a burden for developers too.
Example: let's say you have a multiplayer strategy game. You have sides that play against each other. One side may be controlled by a player, but multiple players can also be allied. This means that you can have 2 players controlling 2 armies fighting on the same side. You may also have multiple players sharing the control of a single army. Once you have figured out that this is the design you want, you can decide what in your design constitutes Sides, Armies and Players, and how the three relate to one another.
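A sketch in C of what these indirections could look like (the structures are hypothetical and stripped down, just to show the one-to-many relationships):

typedef struct {
        char name[32];                 /* player name */
} Player;

typedef struct {
        Player **players;              /* players sharing control of this army */
        unsigned int player_count;
        /* units, resources, and so on would go here */
} Army;

typedef struct {
        Army **armies;                 /* allied armies fighting on this side */
        unsigned int army_count;
} Side;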
The cost of having too much indirection is that you get complexity. If the code becomes needlessly complex, it gets harder to name and differentiate the different levels of indirection, and indirection can cause a lot of complexity when exposed to the end user. You must weigh this against the probability that the indirection will be needed. I tend to think that too much indirection is better than too little, but it's always a judgment call. I do think that postponing an indirection to the future, when you already know you will need it at some point, is always a bad idea.
The JSON file format supports arrays, but the content of an array does not have to be of a uniform type.
As a general rule, when designing a format the numerical limits of any structure should be zero, one, or bounded only by memory/address size. In other words, if something is supported and you can have many of them, you should be able to have as many as you want. There is a wrinkle to this rule though: every time you allow something to be dynamic in size you are adding an indirection, and any time you have a dynamic size you are adding an allocation. For a lot of things a strict limit is acceptable. The name of an item could be given arbitrary length, but for almost any use a 32 or 64 character limit is more than enough, and it significantly simplifies implementations since you know the buffer size needed to store a name up front.
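A small sketch in C of how this plays out (the names are illustrative): the list of items is unbounded, which costs an allocation and an indirection, while the name has a fixed limit and costs neither.

typedef struct {
        char name[64];                 /* fixed limit: no allocation, buffer size known up front */
} Item;

typedef struct {
        Item *items;                   /* dynamic: as many items as memory allows */
        unsigned int item_count;
} ItemList;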
If a format gets better the more people use it, how do you get people to adopt yours? If you are the boss you can just tell people to adopt it, but even then you want people to want to adopt it. Some may adopt bad formats that have a lot of complexity because they like the long feature list, but we are not in the business of tricking people into making bad decisions. We need a reason for early adopters to adopt, and let's be clear, the early adopters are the ones who are hard to win over. Once your format is a pervasive standard, people will be forced to adopt it whether they like it or not. Trying to convince someone to adopt a format because it will be great once everybody else has adopted it as a standard doesn't help them solve their problem today; it's just you trying to push your format.
Standards emerge not because they are good once they are widely adopted, but because they are good before then. A format has to give the early adopters a reason to adopt it, by being useful and solving a problem. People adopt technology because it makes their lives easier, not harder. Given that adopting is in itself an effort that needs to be undertaken, the return on investment has to be at least greater than the cost of adoption.
The best way to do this is to provide tooling. If you are standing in a forest full of people asking you to use their particular stick to measure everything, and one of them offers you a full range of precision measuring equipment, everything from laser range finders to calipers and micrometers, along with tables letting you convert your measurements into every other stick in the forest, then that's the one you pick. All sticks solve the problem of having a stick to measure against, but that stick solves a lot of other surrounding problems that would have cost you time and effort to solve yourself.
A format with reference implementations, viewers, debuggers, libraries, utilities, converters, documentation, loggers, UIs and so on is a lot more attractive to adopt than one where you have to do all that stuff yourself. The nice thing about tools is that they can be as complex as you want without making the format itself complex. Any complexity that can be moved from the format itself to tools is therefore a win. Can't decide how your format should store something? Pick one, and then write converters to the other options. If users don't like your tools, that's fine too, because they aren't required to use them to use the format. Over time your format should build up a pool of tools that users can choose from, and that makes adopting your format a lot more attractive. Tools, unlike formats, are easy to change, rewrite or replace, and that gives you back the flexibility that formats don't possess.
The simpler your format is to use, the easier it is to write tools for it. The more tools you have, the more attractive your format becomes. A format with lots of tools has also been validated: no one would have bothered writing all the tools unless implementing the format was easy. Use the tool development process as a way not just to make future use easier, but to refine your format before presenting it to others.
As you start to use your format more and more, you will discover new issues you didn't at first consider. If the stick you use to measure things is a cylinder, then at some point someone who uses the stick will tilt it slightly and get a slightly longer stick. So now you don't just need to define what stick to use, but give precise guidance on how that stick is to be measured. Then you realize that in cold weather the stick shrinks, in warm weather it grows, and when it's humid the stick swells. Now you have to define the precise climate the stick should be used in. Soon you have to start worrying about the wear of the stick. All these things might seem negligible, but once people are using your stick to buy copper to build transatlantic wires, or to measure gravitational waves, these things really matter. You need to think hard about what you define and how you define it. You want your format to be as specific as possible so that there is no ambiguity in its use.
Once you have established the one stick to use to measure all things, some crafty person will say "I need to measure strength, how about we use this stick that everybody already has, and we see how far it can be bent!" Now your standard for measuring lengths just became a standard for measuring the strength of burly men. Is this a good thing? Probably not. Now you have an entirely new set of expectations for your format. People now expect the stick to have a specific strength, and that means it can't change material, and maybe having burly men bend these sticks results in... bent sticks? Bent sticks are not good for measuring, so you are bound to have some confusion when some things are measured by straight sticks and some by sticks that have been bent.
It is therefore very useful to explicitly have non-requirements. As important as it is to define what your format defines, it is just as important to define the limits of your format. If you explicitly forbid sticks from being used for anything other than their intended use, or even better, design your format in such a way that it is hard or impossible to misuse, you are much less likely to run into these issues.
If your format is good, people are going to want to use it for both its intended and unintended uses. If the unintended uses become too popular your format may fork, where there is one set of requirements on the books, but an entirely different set of requirements in the field. Some companies have intentionally engaged in "embrace and extend", a tactic where you deliberately embrace a standard, but then add various non-standard extensions to your implementation, forcing anyone who wants to stay compatible with those extensions to use your product. (See both Microsoft and Google in the browser wars.)
If a format requires all data to be sorted in a specific way, then anyone who implements the format also needs to implement sorting. You just added a requirement to your format and increased the implementation burden. You, the designer, may have a beautiful sorting algorithm you can take off the shelf and just use, but that doesn't mean that everyone does. You may not care about performance, but for someone else the time it takes to sort the data may be the difference between the format being useful or useless.
These are hard trade-offs that need to be made. A good question to ask yourself is: does the requirement have to be in the format itself, or is it something that a user who needs it can do themselves? If a user of the data needs it to be sorted, does it have to come sorted, or can they sort it themselves? I tend to keep the requirements on the format itself as low as possible, because they apply to every implementation.
Too often I see standards that casually add loads of requirements that are assumed to be trivial: "Images are stored in this format, checksummed by this algorithm, then compressed using this algorithm and encrypted by that algorithm." This assumes that users have all these technologies readily available on their platform and that they all work flawlessly. The format becomes bloated and fragile. If any of those technologies isn't available or changes, then everything falls apart like a house of cards. What may be a simple addition for you may not be for everyone else.
The lesson is to keep your formats simple and independent, so that they can be implemented from scratch in a reasonable time frame.
Whenever you are designing a format, the underlying implementation you are using easily rubs off on the format.
If you are using a stick to measure stuff, the unit of measure won't be longer than the longest stick you can find. This may be fine, but a lot of the time it bakes the limitations of your way of using the format into the format itself.
Your platform, hardware limitations, and requirements are different from those of other users, and from how the format may be used in the future. The fewer assumptions you make about what the world looks like and what the requirements are, the more flexible your format is going to be.
Ideally you want your format to be reimplemented many times before settling on a design. (Some standards bodies require at least two independent implementations before considering standardization.)
Modern computer architectures, with deep pipelines, caches, out-of-order execution, parallelism and branch prediction, are very different from a PDP-11, yet they still try to appear to the programmer as if they were just a faster PDP-11, because that is what C is best suited for.
For decades GPU API designers have struggled with GPU hardware advancing at a much faster rate than CPUs. Again and again the CPU's ability to feed the GPU has become the bottleneck. In early versions of GPU APIs the CPU would feed the GPU individual vertex properties one by one; the CPU would have to send several commands to draw a single triangle. Modern GPUs allow the CPU to send a single command to draw complex scenes with thousands of objects, with multiple textures and shaders, in complex multiple passes.
You don't know what the future looks like. Sometimes you have to make bets about where technology is going, but it's always best if you can avoid it. At the time of writing, if I compare two computers of roughly the same price from the same vendor, roughly 20 years apart, I get:
As you can see, all the numbers have gone up, but the differences are stark. A design that once relied on getting data from the LAN instead of generating it on the GPU would make very different trade-offs today.
Often a format has two sides: reader/writer, client/server or caller/callee. In this case the burden of implementation does not necessarily weigh equally. Let's say you have a service and you require it to deliver data in a very specific order that the receiver can depend on. By making this requirement, you have made it harder to implement the service. On the other hand, perhaps the burden on the receiver has decreased, because they can always depend on the service to provide correctly ordered data. If the designers of the system only expect one or a few implementations of the service and orders of magnitude more implementations of users of the service, then perhaps this is a good trade-off. Always assume all sides will be reimplemented at some point.
When designing a format, it is valuable to separate the structure from the semantics. The structure defines how data is stored, whereas the semantics define what the data means.
Example: JSON is an entirely structural file format. It only defines how data is stored, not what any of it means. This means that it's possible to write a parser that can parse the data structure, but there is no way to write a parser that can make sense of the data. If you have two databases that store records of people using JSON, they are not necessarily compatible, because the two systems may store the same data, in the same format, in completely different ways.
The SI system on the other hand is an entirely semantic system. It defines what a meter is, but it doesn't define how the data is stored. You can't write a loader that can load in all SI units, because they can be stored in any way, but once you have loaded in SI units you can make sense of them and do calculations on them.
Thinking of them as two separate but linked problems helps you design a system that is both easy to understand and easy to implement. The two have very different requirements, and different functionality has different needs. All interactions need to access the data, so creating a structure that is simple is important even if the semantic description of the data is very complex. Often functionality needs to know very little about the semantics of the data in order to do its job. Many functions only change one thing, and therefore only need to know the semantics of that thing, but in order to do so they need to be able to traverse the entire structure to parse/load it, modify it, and then return/save it. The complexity of your data format has an inverse correlation to the number and capabilities of the tools that operate on it.
Example: imagine you have a very complex data set that stores the plans for a nuclear plant. The data needs to store all kinds of different data types: materials, electrical systems, geometry of machines and buildings, radiation levels over time, and so on. Imagine you realize that the maker of the light fixtures that have been specified has renamed their product line, and you want to write a tool that can replace all occurrences of the old product name with the new one in the data set. If the data is stored in a simple structure, it is possible for someone to write a loader for this data set that places it into memory. Once it is in memory the name of the fixture can be found, modified, and saved in its modified form. This tool does not need to understand the complex semantic meaning of any of the data except the light fixture name, but it does need to be able to traverse the entire structure of the data.
A good way to make a flexible format is to define a core format that defines the data structure and a set of basic functionality, but then leaves room for others to add their own data in a well defined way within the same structure. Preferably this additional data uses the same structure but adds new semantic meaning to it. Many tools can be written without having any semantic understanding of the data. When a new semantic is defined, an auxiliary specification can be written that defines how the data structure is used for that specific use case. This lets a format stay simple and easy to manage, while keeping it extendable and compatible.
Let's say we design a 3D file format. We define basic geometry, but we also allow each object to have a key-value store. If someone wants to make a physics engine, they may want to store the mass of an object in the key-value store. Since the format is defined in meters, the physics engine stores the values as kilograms with the key "mass". Once this is done they can publish an auxiliary spec that defines how they store these properties in the file format. If another team wants to implement a physics engine, they can follow this auxiliary spec and make their software compatible. An editor for the format can display and let the user edit the mass of an object simply by letting the user access the key-value store. The editor does not have to know the semantic meaning of mass in order to edit it. This means that you can write a forward compatible tool that can handle properties that were not yet defined when it was written.
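A rough sketch in C of how this could look (the API and the Object type are made up for illustration): the physics engine follows the auxiliary spec and attaches meaning to the key "mass", while a generic editor just reads and writes keys without knowing what any of them mean.

/* Hypothetical key/value API defined by the core format; Object is the format's object handle. */
double object_key_value_get_real(Object *object, char *key, double default_value);
void object_key_value_set_real(Object *object, char *key, double value);

/* A physics engine following the auxiliary spec treats "mass" as kilograms. */
void physics_mass_set(Object *object, double kilograms)
{
        object_key_value_set_real(object, "mass", kilograms);
}

/* A generic editor can list and edit any key, "mass" included, without understanding it. */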
Let's say you want a data structure that tracks a bunch of kids who share 10 sticks. The obvious way to store this is to have each kid store the number of sticks in their possession. To transfer a stick from one kid to another you need to subtract it from one kid and add it to another. If you add up all the kids' sticks it should always come to 10, but it doesn't have to. It's possible that a stick was added to one kid but not removed from another, or vice versa. If we assume we live in a world where kids don't lose stuff, then the programmer must have lost them. If, on the other hand, we store a list of who has each of the 10 sticks, then we can't lose any sticks, because changing who has a stick is a single operation rather than two matching ones. The data structure is consistent with our requirements by its nature, and is therefore much less error prone.
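A sketch of the two representations in C: in the first, a transfer is two operations that can drift apart; in the second, a transfer is a single assignment, so the total can never be anything but 10.

#define KID_COUNT 5
#define STICK_COUNT 10

/* Representation one: a count per kid. */
unsigned int sticks_per_kid[KID_COUNT];

void stick_transfer_by_count(unsigned int from, unsigned int to)
{
        sticks_per_kid[from]--;   /* forget one of these two lines and a stick silently appears or vanishes */
        sticks_per_kid[to]++;
}

/* Representation two: an owner per stick. */
unsigned int stick_owner[STICK_COUNT];

void stick_transfer_by_owner(unsigned int stick, unsigned int to)
{
        stick_owner[stick] = to;  /* the stick now belongs to exactly one kid, by construction */
}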
Example: the Unix command line lets you cut, paste, and transform data in endless combinations using a few basic commands. However, Unix commands assume that all data is stored as text files. You can't use grep to find out when "As Time Goes By" plays in a video file of Casablanca. It would be possible to build such a system, but it would require a much more complex type system that could operate on such disparate data types as text, audio and video. It would not only make the system more complex, it would also make writing commands such as grep more complex and time consuming.
I tend to favour binary file formats over text file formats. Binary file formats are easier and less error prone to parse, and are smaller and faster than text formats. Text formats on the other hand have the advantage that they are easily human readable and editable, something that is often handy in the development process. So, to follow the "code is for running, tools are for developing" mantra, the best solution is to build a binary format along with some utilities that can convert the binary format to and from a text equivalent. These utilities can then be used to inspect debug data as well as create test data, without impacting the performance of the final product.
OpenGL is an example of an extendable API that has been in use for decades in the fast moving area of graphics hardware. A modern OpenGL application uses extensions that entirely redefine the API into something completely different from OpenGL 1.0. This is only possible because the application can query the OpenGL implementation for its capabilities and make use of whatever extensions it needs. This approach does not work for file formats.
You should not use a programming language you can't implement yourself in a reasonable amount of time. If you can't implement it quickly, then editors, debuggers and other tools won't be created for the language. The larger a language is, the longer it takes to learn and internalize. This means fewer users, and code that is harder to read. Clever is the enemy of language design. When programming you should think about the problem you are trying to solve, not what part of the language to use.
Looking at this list, you can see that I would prefer a stricter, more explicit language that provides less convenience and "compiler magic". Almost all other languages have seen these things as issues that need to be addressed rather than as assets. This may be the reason competitors to C have come and gone, or keep getting redesigned, while C occupies a stable, unthreatened place.
To organize code, I use a scheme that is pervasive throughout all code and files. The first thing you want to do is to distinguish between code, variable definitions, defines, and the other mechanics your language provides. While the precise scheme you choose is a matter of taste and habit rather than of objective right and wrong, having a consistent format greatly helps readability. A consistent design can communicate what things are just as quickly as good syntax highlighting can.
I use capitals spaced with underscores for defines and constants; structs don't have spaces, but the first letter of each word is capitalized; functions and variables are lower case spaced with underscores, and so are my file names.
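A small made-up example of what that looks like:

#define GAME_MAX_OBJECT_COUNT 256               /* defines and constants: capitals with underscores */

typedef struct {                                /* structs: capitalized words, no underscores */
        unsigned int object_count;              /* variables: lower case with underscores */
} GameRenderingState;

void game_rendering_object_create(GameRenderingState *state);   /* functions and file names: lower case with underscores */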
The reason I think this is better is that it creates an address path towards finer and finer granularity. We start the address with "object" to create context, and then we go on to describe what we do with the object. In reality, having just two levels of addressing is almost never enough. A real function may be named:
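game_rendering_object_property_matrix_set()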
We can here follow the path from project, to module, to sub module, down to specific functionality. It also makes your code more easily searchable. By searching for "game_rendering_object_property" you get a list of all uses of all property functions, and where they are being used. This is why using long, globally unique names is very powerful, and why I think namespaces are a bad idea: with namespaces you can't quickly search to find every instance of a particular functionality. With globally unique names the code can easily be identified in its global context, and with no indirection you can copy and paste code between different files and be sure it has the same meaning.
In a header file it becomes incredibly easy to find blocks of functionality that all deal with the same thing. Writing the prefix first not only refines the search as you read it, it also makes it easy to navigate a header file. Here is a list of functions with postfix names:
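(The exact functions don't matter; these names are only illustrative.)

matrix_set_property_object_rendering_game()
matrix_get_property_object_rendering_game()
shader_set_property_object_rendering_game()
create_object_rendering_game()
create_light_rendering_game()

And here is the same list with prefix names:

game_rendering_object_property_matrix_set()
game_rendering_object_property_matrix_get()
game_rendering_object_property_shader_set()
game_rendering_object_create()
game_rendering_light_create()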
Notice how much easier it is to navigate the prefix functions, just because we read and sort from left to right. It's much more organized, and for instance it's easy to identify all functions that deal with rendering objects.
This naming scheme is recognizable for how wide the code it produces is. In general I think that wider code, i.e. code with long names, is desirable.
We can also use the same naming structure for file names. The above functionality could reside in a file called:
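game_rendering_object_property.c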
It is now always abundantly clear where functionality resides. All functions that start with "game_rendering_object_property" should be found in this file. By storing all files in a flat hierarchy, you can easily list the files to see the structure of the project and where functionality resides.
Many people have a header file for each code file, but I find that creates an awful lot of header files with very little in them. I instead prefer to have fewer header files that encompass multiple files. Given our previous example we may have a header file named:
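game_rendering.h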
It would encompass all functions of all of the files that start with "game_rendering". For modules I usually have just 2 header files: one external and one internal. Let's imagine we have a simple module called "hello". It may consist of the following files:
hello.h
hello_object.c
hello_property.c
hello_internal.h
The "hello.h" file includes the entire outward facing API. It does not include anything the user of this module shouldnt have access. "hello_object.c" and "hello_property.c" implements the functionality of the module. Finally the "hello_internal.h" is a header that is included in both "hello_object.c" and "hello_property.c" and it contains shared definitions and datastructures that are not visible to the user of the module. The purpouse of "hello_internal.h" is to link the c files together, and to keep "hello.h" cleen and free from implementation details.
This structure again makes it very easy to know where to look for the functionality you need. These external interface header files are a great place to describe and document the functionality, as they are the starting point for any user.
Due to this naming scheme I find it incredibly important to have a consistent style for parentheses. I do not put spaces between function names and parentheses, so that I can search for either "game_rendering_object_property" to find all functions that start with this path, or "game_rendering_object_property(" to find a specific function. I always put braces on a new line, so that brace pairs are easy to identify.
I find that comments are often harder to read than good code. Outside of header files, comments serve little purpose. I would much rather focus on good naming than write lots of comments. The main reason to write comments in-line in your code is if there is behaviour or there are systems that are not visible in the code. The main example I would give is multi threaded code, where a piece of code runs concurrently with other code residing someplace else. For anyone reading the code, it may not at all be clear why the code is jumping through hoops to avoid a deadlock if they can't see that there is other code running concurrently. Another example is code that is a workaround for a broken API or broken hardware. This can produce code that looks nonsensical, and that someone can easily break by accidentally "fixing" it. In these cases a clear comment warning anyone reading the code is warranted.
Another good use of comments is to add searchable tags. I use "FIX ME" as a universal tag for anything that needs addressing. At any point I can search the code for "FIX ME" and find known issues to work on. Obviously you can invent your own tags that fit your team and project.
Name things after what they are or do, not what they are used for. Very often you write a piece of functionality for a specific reason, and then later you realize that it can be used for other things. If you have named it for its use rather than for what it does, it becomes very confusing.
In computer graphics hardware, the process of wrapping an image around a model is called "texturing", because it gives the object texture. However, as functionality in modern programmable graphics hardware, this translates into a filtered lookup into a multi dimensional lookup table. This is incredibly powerful and versatile functionality and accounts for much of what GPUs do in almost any use, yet it's still called texture hardware, despite texturing being just one possible use of the functionality.
We have already talked about the value of a strict naming structure. A well executed naming scheme also means that the user can correctly guess what functionality should be named. If your game has a matrix property you want to set on an object in the rendering engine, you may guess that it's in "game_rendering_object_property_matrix_set". To make it predictable, make sure that the words used match each other. For instance, if you have a function that ends with the word "create", then the corresponding function should be named "destroy", not "delete", "free", or "subtract". The point is that if a user sees one function they should be able to predict what the corresponding function is named. There are many words that pair well with each other, like "allocate" and "free", or "set" and "get".
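For example (the type and functions here are hypothetical):

GameRenderingObject *game_rendering_object_create(char *name);
void game_rendering_object_destroy(GameRenderingObject *object);   /* not _delete, _free or _remove */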
Beyond having unique names, it makes sense to define unique words with specific meanings for your project. Imagine you are implementing a game of Monopoly. To do this you need to load image "assets" for the board and cards. You also need to implement a structure that stores each player's "assets", in the form of money, streets, houses and hotels. If you then go ahead and implement a function that loads assets, it's hard to know whether that code loads images or the state of a player's finances, if the word "asset" is used interchangeably for image assets and player assets. By renaming the players' assets to something different, like "property", you make it clear that it is distinct from assets. While a word like asset has a wide meaning in the English language, it makes sense to give it a much narrower meaning in a code base.
When designing software, you are (hopefully) designing primitives with specific uses, possibilities and limitations. It makes sense to define strong naming conventions to tell them apart. By naming them you create a strong shorthand for talking about the capabilities of your software. These naming conventions are useful both within the code and for communicating functionality to the end user. Often we use generic words like session, object, node, fragment, layer, agent, operator, task, device, action, set, filter, and handler to describe functionality, but it makes sense to define much more narrowly what these words mean in the context of the software you are writing. By codifying the meaning of the terms, it becomes easier to enforce the implicit rules that may govern different types of code.
A "graph" is a structure that describes a set number of "nodes" that are connected to one and other. "Nodes" are invocation instances of "actions". "Actions" can only access input and output from a "Node", and may not access global data. A session can contain multiple "graphc", but "nodes" in one graph can not be connected to nodes in another "graph".
This kind of naming convention, which puts clear boundaries around different modules, is especially useful when dealing with multi threaded code, where it is extremely important to manage access rights. When choosing these words I tend to avoid words that are already used to describe language features of the implementation language, like "functions", "objects", and "procedures".
Local variables don't need global addresses, so it's much better to focus on clarity. Clarity can come in the form of expressive text, but often it's much better to make the variables recognizable. If we again take the following example:
for (i = 0; i < 10; i++)
It is instantly recognizable; we make an assumption about the type of "i" even if it's not explicitly stated, and we also recognize it as an iterator, even though it's only written out as a single letter. Compare it to:
for (iterator = 0; iterator < 10; iterator++)
It is less clear because it's not instantly recognizable, and therefore you need to read the code to figure out what it does. Using "i" is a very common idiom of programming, so for this to work in a broader sense, we need to expand the number of commonly used variable names to cover as much as possible of our code base.
- i, j, k, l, m
Iterators, always of integer type, usually unsigned. If a floating point iterator is needed, I use fi or di.
for (i = 0; i < 10; i++)
        sum += array[i];
- p, a, b, c
b = &buffer[10];
- v, v2, v3
- found, best
Used for finding the best value in a data set according to some metric. "found" is always an integer or pointer, and "best" can be either a floating point or integer value.
found = 0;
best = array[0];
for (i = 1; i < 10; i++)
{
        if (array[i] > best)
        {
                found = i;
                best = array[i];
        }
}
- f, f2, f3
Used for temporary floating point values.
On top of these I have loads more that are domain specific for particular projects. One common way of deriving these is to use function definitions as a base for variable names. If you have a function that looks like this:
void project_container_destroy(PContainer *container);
It makes sense to call it with a variable called "container". It also makes sense to use the name "container" throughout the implementation of project_container_destroy, and in other functions that use the same type. This way you instantly recognize that a variable called "container" is a pointer to the type PContainer.
Don't get into the habit of writing slow or bad code when you think it doesn't matter. If you solve a problem, it should remain solved, and you should leave no reason to solve it again because it wasn't done right the first time. Returning to rewrite code that you thought didn't need to be fast comes at a big mental cost. If you get into the habit of always writing performant code, you only have to write and maintain one style of code rather than two. Write code that is fast on generic hardware, and allow the compiler to do its job. Don't waste your time optimizing for a specific compiler, hardware or platform unless you have to. The goal is code that is fast and will remain fast, not code you have to go in and re-optimize.
I maintain a mental barrier against writing code that has O(n²) complexity where n is dynamic. A computer can obviously solve problems of quadratic complexity using a brute force approach, but I pretend that it can't, and make it a habit to always avoid writing code with this or worse complexity. In your head you maintain a tool kit of solutions to various problems, and I choose not to carry this one, to force myself into the habit of writing performant code.
The reason code is slow is not because you haven't read up on the latest papers on sorting theory. It's almost always slow because your application does things it doesn't need to do. It's slow because it keeps accessing disk, waits for network connections, garbage collects, hasn't cached what it has already computed, runs in a virtual machine, makes too many system calls and has poorly designed thread locks. The best potential optimization is always figuring out a way to not compute something at all. Be explicit with operations that take time. Amortize large computations and decouple heavy computations from user feedback. Big-O complexity is only rarely the cause of poor performance. Also note that all common search and sorting algorithms assume you know nothing about your data. This is almost never true, and is another good reason why generalized code is less desirable.
You write the function int setting_get(char *name) that opens a file, searches it for the setting and then closes the file. If it can't find the file, it connects to a remote server to query the setting. Then someone uses this function like this:
for (i = 0; i < 10000; i++)
        array[i] = setting_get("Initial value");
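The settings file is now opened, searched and closed ten thousand times, possibly with ten thousand network queries, to fetch the same value over and over. The immediate workaround is to hoist the call out of the loop, but the deeper problem is that nothing in the function's name or signature warns the caller that it touches the disk and the network:

value = setting_get("Initial value");
for (i = 0; i < 10000; i++)
        array[i] = value;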
The job of a programmer is to produce a set of machine instructions that execute on some kind of hardware to do a task. If this is the definition of success, then anything that makes those instructions worse in order to make the programmer's life easier should be questioned.
A garbage collected language objectively makes the software slower and makes it use more memory, in exchange for what is perceived to be easier development. By choosing such a tool, you are choosing your own convenience over the quality of your output. A much better approach is to use a tool that helps the programmer without adverse effects on the end product. A house without a roof may be easier to build, but it's not a better house.
Example: The Linux kernel contains many gotos. While this may not make for the easiest code to read, it does produce better assembly language that saves execution cycles. The Linux kernel is not designed to be bedtime reading, it is designed to be an optimal operating system running on billions of devices. A single saved instruction saves an immeasurable amount of power and compute time across the world, and that is much more valuable than the perceived aesthetics of its source code.
If the choice is between a better user experience and a better developer experience, always choose the better user experience. This, however, is rarely a trade-off that needs to be made. Most problems encountered during the development process can be alleviated with better tools that don't impact the final result negatively.
Low latency is almost always more desirable, and harder to achieve, than high bandwidth. This is true in almost all domains: user feedback, networking, memory access and so on.
-Avoid optimizing for specific hardware
For a software developer, it's easy to think that software is easier and faster to develop than hardware. Yes, it is easier and faster to write hello world than it is to solder the hardware needed to run it, but hardware gets replaced much more often. Most of the software we choose to use is over a decade old, but we rarely choose to use decade-old hardware.
When optimizing, there are always gains to be had by knowing exactly how the hardware works, but the time it takes to optimize for the exact hardware is often outpaced by the development of new hardware. What you need to do is recognize which hardware trends are long term and which are short term. Algorithmic optimizations almost always yield longer-term results than optimizations targeting a specific hardware generation.
It's impossible to know the exact size of cache lines, the number of cache levels, or the memory latency of future CPUs, but one can assume that keeping data compact and accessing it sequentially will continue to be good strategies for memory access performance in the future. If you over-optimize for the specifics of current hardware, you run the risk of having your code run slower, not faster, once new hardware comes out. All code should be designed to last for many hardware generations, and it therefore makes sense not to make assumptions about the number of cores, precise memory access patterns, and other things that are likely to change in the future.
In 1972, Ed Catmull decided that he wanted to make a fully animated feature film. Later, in 1978, while working at Lucasfilm, his team conducted a test to find out the required resolution and bit depth of CG images projected in a theater (roughly 2000 by 1000 pixels at 10-bit depth). The REYES rendering architecture rendered polygons half the size of a pixel, and by benchmarking their renderer and multiplying by resolution, 24 frames per second, 60 seconds per minute, and the running length of a feature film, they could compute the computational requirements of a feature-length computer-generated film. At the time the numbers showed that it would be entirely unfeasible to make a feature film. But using Moore's law, which predicted that cost would fall at a steady rate, they were able to predict that it would be financially and technically possible sometime around 1994. In 1995, Ed and his team, having spun out of Lucasfilm to form Pixar, released "Toy Story", the world's first computer-animated feature film.
Let's say you want to travel 1000 kilometers between two cities. To do this you might use a car. You may invite a friend to join you. The added cost of a passenger, in terms of fuel, maintenance and the use of existing infrastructure, is negligible. Most regular cars can fit four, but if you want to bring even more people, a minivan that seats 8 is an inexpensive upgrade. If you really want to bring a lot of friends you may need a bus. A bus is considerably more expensive than a minivan, but per person it is still cheaper, since it can easily fit 50+ people. If that's not enough, a train may be an option. A train car can fit 150 people, and the number of train cars in a train is mostly limited by the length of the platform. If you get Japan Rail to run your train line you can run one train every 45 seconds on one track. With a ten-car train that moves 120,000 people per hour. If that's not enough, just add more tracks. You could easily have 10 parallel tracks, or why not 100 if you feel like moving the entire population of New York every hour. That's well over a million times more people than where we started.
The 1000 kilometer journey would take about 10 hours. But let's say we want to go faster. We can buy a Porsche and, barring any intervention from law enforcement, we can cut that time in half, but it will be expensive. If you are counting dollars per km/h, a Porsche is not a good investment compared to a moped. If we again want to cut our travel time in half, things start to get really expensive. The fastest production car at the time of writing is a Koenigsegg, and it tops out at around 400 km/h, but it can easily cost you 10x more than the average Porsche. After this, it starts getting really difficult. A Japanese experimental maglev train will only buy you another 100 km/h, so that's out. You could go for a one-off land speed record car, but none of them have the range needed to complete the journey. You pretty much have to go airborne. Your average passenger plane can easily hit 800 km/h, but with take-off, landing and taxiing it will be close. If you want to cut your travel time in half again, you need some good connections with your local air force. A fighter jet can get you close to Mach 2, and with a little help from an afterburner and an ejector seat, you can shave valuable minutes from your journey. If this is still on the slow side, you will need some truly epic connections with the Smithsonian Air and Space Museum to get them to dust off their SR-71 Blackbird for you. Unfortunately it will only get you to Mach 3.5, and with the required space suit and air refueling, it may end up being a bit of a disappointment in the area of practicality. An SR-71 is expensive at $100,000,000 a pop in 1972 dollars, but at least they are reusable. To go faster we need to enter the field of rocketry; this is for the traveler for whom money is no object. The level of complexity, cost and engineering needed to do this starts to push against the limits of human capability. On the subject of human capability, somewhere around this point your body starts to be a problem. Around 10G our bodies start breaking down: we lose consciousness, crack ribs and so on. Even if you can get around the squishy-human problem, eventually you will run into the speed-of-light problem. I'm not saying these issues are unsolvable, but they probably involve a couple of trips to Stockholm to pick up a few Nobel Prizes for changing our understanding of the fabric of reality.
What is the point of this exercise? It illustrates a simple rule: bandwidth is easy and latency is hard. We were able to easily increase the bandwidth of our journey a million-fold using 100+ year old technology, where each doubling of bandwidth cut the cost per person significantly. At the same time we have great difficulty reducing our latency by even 50x, and we are forced to deploy the most advanced technology ever devised, at a punishing cost curve where every km/h is far more expensive than the last one.
Why does this matter for software engineering? It tells us that reducing the time it takes to return a search query by half is harder than responding to twice as many queries. In fact, if your queries return twice as fast, it's likely you can handle more queries too. It tells us that it's far easier to compute than it is to synchronize computation. It's far easier to optimize a network for bandwidth than it is to optimize it for latency.
Knowing this helps you estimate how hard something will be, but it also tells us to be careful about prioritizing bandwidth over latency. A bad design with too much latency is far harder to fix than a bad design with too little bandwidth.
Bandwidth tends to solve itself over time. To increase bandwidth, you can always just do more of what you are already doing in parallel. Add more chips, wider buses, more lanes, more cores, and bandwidth will go up. Latency, however, is much harder to solve. It requires optimization, careful timing and synchronization, and splitting up problems into smaller problems that can be solved in parallel.
A computer from the 80s could access memory in a single cycle. Today a memory access is lucky to take less than 10 cycles, even if it happens to hit the level-one cache. Memory latency on a modern computer is still much lower in absolute terms than on an 80s machine, because the clock runs at a frequency roughly a thousand times higher, but relative to bandwidth (and compute), latency is becoming slower and slower as computers evolve. Latency in a computer is heavily limited by the speed of light: in one cycle at one gigahertz, light travels just 300 millimeters. These are hard limitations that you can't get around, so be careful about wasting this precious resource.
There is no perfect code. There is only better or worse. There is no way to eliminate all possibilities of failure. There are people attempting to build "verifiable" code, where code can be verified to work correctly against a specification. While this is possible, and has been done, you now have new problems: is the specification correct? Is the verifier bug-free? In the end it always comes down to "Does this thing solve its problem in the best possible way?", and that is never something that can be proven. It's always our best guess. Yes, the software that controls the plane you are flying in is just some people's best guess as to what the software in a plane should do. Such is life. There are no guarantees against an asteroid destroying all life on Earth, being hit by lightning, or cosmic background radiation flipping that one critical bit in memory.
So how do we manage? We do our best and prioritize what we think has the highest risk of failing. We have limited time and resources, so let's look at where they are needed. What is most likely to break? What is most likely to be hard to debug? What bugs are most likely to cause catastrophic failure?
Sometimes, when bugs are simple, you can just look at the code and see the problem; you don't need to read a chapter of a book to fix those, so here we will focus on bugs that are harder to find. As lazy people we hope that bugs are solvable by just looking at the code, and too often we avoid systematically investigating a bug in the hope that some obvious solution will present itself.
Users don't see bugs, they see the symptoms of bugs. It's really valuable to separate the symptoms from the bug. Bugs with obvious and instant symptoms are hard for users to live with, but easy for developers to get a handle on. Conversely, bugs that produce barely any symptoms noticeable to a user are hard for developers to deal with.
Bug mitigation and debugging therefore have fundamentally different goals: mitigation wants errors to have no impact at all, whereas debugging wants bugs to have an instant impact. Debugging is often the process of producing more symptoms of a bug, until the bug produces enough symptoms that it can be understood.
Any investigation is the art of time travel. An application does the wrong thing because of something it did in the past, and it's your job to construct a time machine and travel backwards in time to find the fault that caused it. The shorter the time you have to travel back, the easier it is.
This observation tells us that the code with the shortest possible distance between bug and symptom is the easiest to debug. If the cops show up while the killer is still holding the knife stuck in the victim's chest, the crime is a lot easier to solve than if you find a corpse that has decomposed for a couple of years. So ideally you want bugs to call attention to themselves as quickly as possible.
A defensive programming approach is therefore based on having code fail faster when there is a bug. Some bugs blow up right away (dereferencing NULL on a memory-protected system), and therefore, if you run your application in a debugger, they will be easy to solve. Defensive programming is about anticipating the bugs that will be hard, and putting in the systems needed to catch and debug them.
ABA is a class of bug based on the possibility that an item is referenced, then the item is removed, and a new item is created that happens to get the same address as the original item. A reference made to the first object now inadvertently refers to the second object. In normal operation this is a bug caused by a failure to remove the reference to the first object. It can be tricky to find, as it may not be obvious that, for instance, multiple allocations can yield the same pointer. ABA bugs become significantly harder to deal with when you are doing lockless programming that relies on compare-and-exchange instructions.
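A minimal sketch of the hazard (Item, item_create and item_destroy are hypothetical names used purely for illustration):

	/* Hypothetical ABA sketch: after item_a is destroyed, the allocator
	   may hand out the same address again, so a stale reference "looks"
	   valid even though it now points at a different object. */
	Item *item_a , *item_b , *stale ;

	item_a = item_create ();
	stale = item_a ;          /* a reference kept somewhere else */
	item_destroy (item_a );   /* the first item is gone... */
	item_b = item_create ();  /* ...but may reuse the same memory */

	if (stale == item_b )     /* the stale reference now matches a different object */
		...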
Compiler bugs are rare, and therefore you want to eliminate all other possibilities before blaming the compiler. While compiler bugs are rare in mainstream compilers on mainstream platforms, they become more plausible with less-used compilers, exotic targets and aggressive optimization levels.
-Printf
Printf is probably the most common debugging tool. Debugging is mostly about figuring out what your code does, so printing out what it does is an obvious solution. The problem with printf is that you need to know in advance what information you want out of your code, and you need to instrument your code before you run it. A good debugger, on the other hand, can give you the entire state of a program in an explorable form.
Printf's most valuable use is to log the changes of state and the flow control of an application. A debugger can tell you the current state of a program, but often has trouble logging what happened during execution.
When printing out information I try to make the printouts look and feel like the code I'm debugging. You want your debug output to be as readable as possible, and since you are reading the code you are debugging, it makes sense to retain the naming, syntax, and formatting of the code in debug outputs.
printf ("array [%u ] = %u ;", i , array [i ]);
When looking at this output, the variable names will be recognizable, and as the number of printouts grows it will be easier to keep track of which printout corresponds to which data.
Some caveats about printf: printouts are buffered on some platforms, meaning that when an application crashes, some printouts at the end may be lost. Try using unbuffered output like fprintf(stderr, ...) if this is a problem on your platform. Another issue with printf is that it is slow, and may change the timing of the application significantly. If the application is multi-threaded, or depends on time in other ways, this can significantly alter its behaviour. This, however, is also a clue: a bug that goes away or changes behaviour when printfs are added might be timing-based.
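A minimal sketch of the stderr variant (whether stderr is unbuffered by default is platform-dependent, so the explicit fflush is belt and braces):

	/* Write debug output to stderr and flush immediately, so the
	   message survives a crash even on platforms that buffer output. */
	fprintf (stderr , "array [%u ] = %u ;\n", i , array [i ]);
	fflush (stderr );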
-Write code for breakpoints :
void array_set (int element )
{
	int array [10 ];
#ifdef DEBUG_MODE
	if (element >= 10 )
		element += 0 ; /* no-op: a convenient line to hold a breakpoint */
#endif
	array [element ] = 1337 ;
}
In this case I am using the line "element += 0;" as a no-op that I can put a debugger breakpoint on. "+= 0" has no effect, and that can be an advantage, since if you accidentally leave a test like this in your code, the compiler at higher optimization levels will remove it for you. You could use "-= 0", "*= 1", "|= 0", or a number of other operations that also have no effect, but I exclusively use "+= 0;", for the simple reason that this five-character sequence becomes easy to search for in your code base. Every time you see it, you know what it is for and that it's not a mistake.
Sometimes you want to share or even ship code with debug code in it; then breakpoints won't help much. I prefer printing a clear error message followed by an exit. As your code base grows, it helps to always include the module name in the error message. This makes it easier to triage the problem and hand it to the right person right away.
void array_set (int element )
{
	int array [10 ];
	if (element >= 10 )
	{
		printf ("Module name Error : array_set given element value %i\n", element );
		program_exit (0 );
	}
	array [element ] = 1337 ;
}
While an exit may be graceful handling in shared software, it is not very useful for debugging. If you are running the application in a debugger and the application calls exit, you immediately lose all state and are unable to use the debugger. A break, or even a crash, is much more useful, since it will initiate a debug session and give you a stack trace and access to variables and other data. You can put a breakpoint on every exit, but breakpoints are not portable between projects and debuggers, so I prefer to simply crash:
void array_set (int element )
{
	int array [10 ];
#ifdef DEBUG_MODE
	if (element >= 10 )
	{
		unsigned int *a = NULL ;
		a [0 ] = 0 ; /* deliberate write to NULL to trigger a crash */
	}
#endif
	array [element ] = 1337 ;
}
This code deliberately writes to NULL in order to crash. Again, I always use the same pattern when producing this crash to make it easy to search for. If your code has graceful exit calls when something goes wrong in release mode, consider replacing them with a crash in debug mode:
#ifdef DEBUG_MODE
extern void exit_crash (int value );
#define exit(a) exit_crash (a )
#endif

void exit_crash (int value )
{
	unsigned int *a = NULL ;
	a [0 ] = 0 ; /* crash instead of exiting, so the debugger keeps all state */
}
Having a lot of localized debugging information is only useful if you know what part of the code needs to be fixed. There are times when this can be hard, for instance when a set of data is touched by many different systems. If you wrap data access so that all code has to go through specific functions, then those functions are a prime place to put debug code.
void important_value_set (unsigned int value )
{
#ifdef DEBUG_MODE
	if (value == 1337 )   /* trap a specific suspect value */
		value += 0 ;       /* no-op breakpoint line */
#endif
	important = value ;
}
A lot of the time, bugs only show themselves in very specific circumstances. It is therefore often a good idea to write such breakpoint traps with many different tests and analyses. Once you have found a state that causes the issue, you may want to consider writing separate code that recreates various similar test conditions to exercise your code. Writing more code to verify and control the situation is almost always a good idea when you are debugging.
unsigned int my_function (unsigned int a , unsigned int b )
{
	a = math_transform (a );
	a = other_math_transform (a );
	return b / a ;
}
Let's assume this code runs and crashes at the end because 'a' is 0, causing a divide by zero. How do we debug this? Something clearly happens inside either math_transform or other_math_transform to cause 'a' to become zero. The problem is that since 'a' is constantly being overwritten, we don't have a complete history of what went wrong. Ideally we would like to set a breakpoint at the beginning of the function so that we can step through the program and see the evolution of 'a'. The problem is that the function may run hundreds of times before 'a' ever becomes zero, and we don't want to step through the program hundreds of times, making notes of all operations. One solution is to detect the faulty state, and then re-run the code with the faulty input in the debugger:
unsigned int my_function (unsigned int a , unsigned int b )
{
	unsigned int original_a ;

	original_a = a ;
	a = math_transform (a );
	a = other_math_transform (a );
	if (a == 0 ) /* faulty state detected: put a breakpoint here */
	{
		a = math_transform (original_a );   /* re-run the same computation... */
		a = other_math_transform (a );      /* ...so it can be stepped through */
	}
	return b / a ;
}
Whenever the code produces the faulty state, the breakpoint fires, and right after the break we have added a re-run of the code that produces the faulty result. As soon as the code breaks, you can step into the functions and carefully follow their operation in order to find out why they produce the wrong result.
Sometimes this approach is cumbersome to use, because retaining the state needed to re-run the offending code is not so easy. If you have a reproducible, deterministic bug, a simple solution is to add a static counter to the function:
unsigned int my_function (unsigned int a , unsigned int b )
{
	static unsigned int debug_counter = 0 ;

	debug_counter ++;
	if (debug_counter == X )
		debug_counter += 0 ; /* no-op breakpoint line */
	a = math_transform (a );
	a = other_math_transform (a );
	return b / a ;
}
Now we can run the program, wait for our debugger to catch the divide by zero, check the value of "debug_counter", replace X with that value, and set a breakpoint on the "+= 0;" line. Then we re-run, and when the breakpoint hits we can easily step forward and follow the exact invocation that causes 'a' to become zero.
Most bugs are simply logic bugs, where the code doesn't do what you think it does. These are the most common, and one can't give very much advice on how to tackle them, as they depend entirely on the problem at hand. As a general rule, since this class of bugs almost always comes down to you not understanding what the code does, the solution is almost always to get a better understanding of what your code does: by reading the code, running it in a debugger, and adding code that outputs more information about what is going on. In a later section we will discuss more specific classes of harder bugs and how to approach them.
Debugging the stack is harder than debugging heap memory for a few reasons. Since the allocation and layout of the stack are not controlled by the programmer, it's much harder to know how it works. Generally, stack overflows do not trap the way heap overflows are likely to do. Stack overflows also often overwrite local variables, and that may make the code harder to debug. Consider the following bug:
int array [10 ], i ;

for (i = 0 ; i <= 10 ; i ++)
	array [i ] = 0 ;
This C code contains undefined behaviour, so anything can happen, but what is likely to happen is that the variable 'i' will be placed after the array, so "array[10] = 0;" may result in "i = 0;", giving you an infinite loop. This is a simple example with only two variables, but in more complex code it can be easy not to consider that a write to "array" could impact "i". With experience you learn to recognize the signs of stack trashing, but to prevent bugs like this, I recommend avoiding the stack for dynamically addressed arrays, and instead using heap memory.
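A minimal sketch of the heap-based alternative (same loop, same off-by-one bug): the out-of-bounds write lands in heap memory, where a debug allocator or memory tool is much more likely to catch it, and it cannot overwrite the loop counter:

	int *array , i ;

	array = malloc ((sizeof *array ) * 10 );
	for (i = 0 ; i <= 10 ; i ++)   /* same off-by-one bug */
		array [i ] = 0 ;            /* the bad write goes to the heap, not over 'i' */
	free (array );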
Heap memory also has the advantage of being dynamic in size. This means you don't have to put a limit in your code on how big your data set can be. (See the limitations discussion.) Heap memory does have the disadvantage that it needs to be explicitly allocated and freed, and those can be slow operations. One approach is to combine both in cases where the data set is likely to be small but you want to avoid setting a limit:
#ifdef DEBUG_MODE
#define MAX_STACK_USE 1 /* in debug builds, force heap allocation (see note below) */
#else
#define MAX_STACK_USE 256 /* illustrative release limit; pick what fits your data */
#endif

void function (int data_size )
{
	int stack_buffer [MAX_STACK_USE ], *buffer ;

	if (data_size > MAX_STACK_USE )
		buffer = malloc ((sizeof *buffer ) * data_size );
	else
		buffer = stack_buffer ;
	.... /* compute using buffer */
	if (buffer != stack_buffer )
		free (buffer );
}
The above code uses the stack when the data set is small, but allocates memory when the data needed exceeds a set limit. This is a good approach when the data set is expected to be small but cannot be guaranteed to be small. Notice that in debug mode we set the limit to one, to force the use of allocated memory, which is easier to debug. (Unfortunately the C standard does not allow arrays of zero length, with some exceptions.)
When designing an API, you can make using it a lot easier by building in debug facilities, either always on, as part of a debug mode, or configurable by the user. What are the pitfalls you can expect your users to fall into? Often people go straight to coding when using a new API, so building the documentation into the API is very useful. If you get questions about your API, use those as hints to what people are having trouble with, and try to incorporate debugging facilities that address these issues. Some requirements can be hard to express using the API itself, and are therefore great candidates for self-debugging APIs.
If a user uses the API in a pattern that is bad for performance, like allocating and freeing buffers instead of reusing them, the debug mode can detect and report it. The same mechanism can report outright incorrect use, as in the following sketch where parameters are validated and the offending call site is reported using __FILE__ and __LINE__:
/* header */
#ifdef DEBUG_MODE
extern void my_function_debug (int a , int b , char *file , int line );
#define my_function(a, b) my_function_debug (a , b , __FILE__ , __LINE__ )
#else
extern void my_function (int a , int b );
#endif

/* implementation */
#ifdef DEBUG_MODE
void my_function_debug (int a , int b , char *file , int line )
#else
void my_function (int a , int b )
#endif
{
#ifdef DEBUG_MODE
	if (a > b )
	{
		printf ("Error in file %s on line %i calling the function my_function: parameter a (%i) can not be larger than parameter b (%i)\n", file , line , a , b );
		exit (0 );
	}
#endif
	... /* the normal implementation of my_function */
}
You can spend significant time adding facilities like this that foolproof your API. In theory you can make an API that the end user cannot accidentally misuse.
Good code is easy to debug. I advocate writing code just for the purpose of debugging. Especially if you know that a portion of code is going to be hard to get right, it makes sense to think in advance about how it will be debugged. A cornerstone of debugging is being able to find repeatable behaviour. This means that making your code as deterministic as possible helps a great deal. This can be especially hard if your code is multi-threaded or takes a lot of live input. In these cases it may be worth investing the time to create the ability to record all input and play it back in a repeatable manner. Creating facilities like logs, and even calls that let you plot graphics, can be very good preparation for a project. Writing data structure validation code can also be time well spent early in a project. As a rule, writing debug code is a good investment of time.
While it's valuable to think "strategically" about what debug code will be useful for a project, it's also useful in the shorter term. Whenever I'm stuck, I start writing debugging code. It keeps you busy, it gives you something to do, it hopefully contributes to figuring out the issue, and even if it doesn't, it may help you find other issues now or at a later date. Some of this debug code may be deleted the moment it reveals the cause of the bug, but some will grow into facilities that are used repeatedly in development.
Unlike murder investigations, debugging usually lets you have a do-over. You can run the software over and over, each time instrumenting it with more and more debug output that narrows down the issue. This is a huge advantage, but only if the do-over reliably replicates the issue. If the application is 100% deterministic this is easy, but a lot of applications are not deterministic, because they depend on external factors such as time, networking, user input and so forth. Limiting these factors still helps: if not all code can be made deterministic, perhaps portions can be. I find that it's always desirable to be as deterministic as possible. One possibility is to build in the ability to record non-deterministic factors, in order to be able to replay inputs precisely. Being able to record the use of a library is also very useful.
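A minimal sketch of the idea, assuming a hypothetical wrapper that every call site uses instead of calling time() directly; in record mode each value is written to a file, and in replay mode the same values are read back, so timing-dependent behaviour becomes repeatable:

	#include <stdio.h>
	#include <time.h>

	static FILE *record_file = NULL ; /* opened for writing in record mode, reading in replay mode */
	static int replay_mode = 0 ;

	time_t my_time (void )
	{
		time_t t ;

		if (replay_mode )
			fread (&t , sizeof t , 1 , record_file );  /* replay the recorded value */
		else
		{
			t = time (NULL );
			if (record_file != NULL )
				fwrite (&t , sizeof t , 1 , record_file ); /* record the live value */
		}
		return t ;
	}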
Good code tells you when something goes wrong. If you build a module with an interface, give it a debug mode where all inputs are validated and any error results in termination with a descriptive message. Yes, terminate the execution to force the developer to engage with the issue. Kind errors aren't taken seriously. Building a separate debug mode also means the validation can be as strict and as slow as it needs to be without burdening the release build.
Having a single debug mode is usually too coarse, since you rarely want to debug your entire codebase at once. It makes sense to create localized debug modes for files, modules, or even specific things you want to debug. If a debug mode prints out a lot of information, then having it on for your entire application will make finding anything useful in the output very hard. By nesting debug modes you can make sure your debug code is properly turned off in release mode:
#ifdef MAIN_DEBUG_MODE
#ifdef MODULE_SPECIFIC_DEBUG_MODE
	... /* debug code: only compiled when both the main and the module-specific debug modes are on */
#endif
#endif
Systems designed to debug your code should make programming easier, not harder. Unfortunately many approaches to reducing bugs create a lot of friction for developers. Developers are required to over-document, write and run excessive tests, address benign warnings and sanitizer findings, file reports and so on. When things become harder to change, bugs also become harder to fix. As noted earlier, the most precious resource any programmer has is motivation, and it's very important to protect the motivation of the developer and make sure they feel like they can get their work done. While we would all like to write bug-free code, we can't, so we should balance our bug prevention against our ability to quickly address the bugs that are found. Unintuitively, users often perceive buggy software where bug reports are quickly addressed as preferable to less buggy software where issues aren't addressed. People want to feel heard and to know their concerns are being taken seriously. Not all bugs matter, and your users will let you know which bugs to focus on.
When systems are too rigorous, people start trying to work around them. If you have a rule that says any module that sees any change has to be re-certified, then sooner or later developers will start writing code whose main objective is no longer to be good, robust code, but to touch as few modules as possible. A lot of the time, fear of bugs creates bugs. Developers avoid touching things for fear of breaking them, and contort themselves in order not to have to engage with systems perceived as fragile. If a system is perceived as fragile, then maybe it's time to consider why, and how that may be addressed.
In the game development world, I know of a few absolutely game-breaking bugs that were found and fixed in a matter of minutes, but whose patches then had to be delayed for weeks for certification. When you have a known show-stopping bug, why worry about the possibility of an unknown one? If you have a process for releasing software (you should), make sure it includes an escape hatch so that critical fixes can be released without delay. Preferably these fixes should be based on a previously verified release and only include the change that fixes the issue.
Some programmers argue that you should write no code that produces any warnings (or turn on treat-warnings-as-errors). I argue this is the wrong way to look at tools. Tools should help you write and understand your code, not dictate what you should do. Most issues with code are instances where you think the code does something different from what it actually does, and therefore I think tools that help you understand your own code are the most important ones. Instead of thinking that using the right text editor will make you a great programmer, take the time to learn a proper visual debugger.
While I'm hugely in favour of tools that ask you "Is this right?", I think it's equally important that you as a programmer have the ability to say "Yes, this is what I meant". The common practice of treating warnings as errors is, in my opinion, misguided. Consider the following code:
for (i = 0 ; i < 10 ; i ++);
	array [i ] = 0 ;
This is perfectly valid C. However, judging by the general structure of the code, one might suspect that the semicolon at the end of the first line shouldn't be there. A good tool would warn the user about this semicolon and say: "Hey, I noticed that you put a semicolon in a place where it might have been put by mistake, maybe you should have a look at it?" It's not invalid code, so the tool shouldn't force you to change it, but it should draw attention to a possible issue. If you treat this warning as an error, two things happen. First, it forces you to rewrite code to try to trick the compiler into doing what you want while navigating a minefield of warnings. Secondly, it forces the compiler writer to only emit warnings for things where they have high certainty that the issue they discover actually is an issue. This means they give the programmer a lot less feedback on things that are probably right but might be an issue.
Various programming languages have proposed various schemes for producing code that can generate code, such as templates and generics.
When you are stuck on a bug, don't just think your way out of the problem, program your way out of it. Start writing verification and testing code. It keeps you busy and engaged, and often it reveals the issue you are looking for even before you complete the testing code.
While test code is useful in many ways, simply writing tests for everything tends to be a waste of time. Most code is written to be used right away, by the person writing the code, and therefore the code that needs the new code is in itself a test. Not only is it a test, it's a good one, since it tests the code under the conditions it's meant to be used in. To write an additional test we should therefore be much more discerning about the code we write. Most tests will not reveal anything interesting, and a straightforward test does not delve deep enough to find the harder issues. Before we write tests, let's consider what we actually want a test to accomplish, because only a few reasons hold up.
I find that the most valuable test code you can write is not code that returns fail or success, but code that reveals what is going on. Finding out what your code does is always valuable; a binary test only tests your code against one issue or a limited set of issues. If your code fails, it's usually not because it fails at something you anticipated could fail. The real problems are the corner cases you never considered.
The idea some people put forward is that if you make tests that run automatically as part of your development process, you can guarantee that nothing breaks. In my opinion, the moment code is touched, it may be broken. No automation can change this.
Let's say you are writing an algorithm to down-case and up-case strings. You write a simple test that takes the string "hello world", up-cases it, and prints it out. If the output says "HELLO WORLD", all is well. Or is it? The Turkish "I", for instance, cannot be correctly down- and up-cased with the default Unicode case mappings.
When code is working, there is no need to verify that the program is working, because the program itself is the verification. But if it's not working, start writing code that verifies it as soon as you are stuck. It's a common fallacy to think that writing verification code will take time, whereas finding the bug will be a quick moment of clarity. When people ask me how long it will take to find a bug, I usually say it takes as long as finding your lost keys. They may be in the first place you look, or they may never be found, forcing you to change the locks. If you write verification code when you are stuck, you are systematically moving towards the bug by eliminating possible problem areas. Again, typing is easy. I have never found that I have wasted time writing debug code.
With a simple computer, a compiler and the will to do something, it is possible to create any software. The hardest thing about programming is to keep yourself motivated. How you keep yourself motivated is different for each person. For some people it comes easy, and some find it hard. My advice on this subject is therefore very personal and may not apply to you at all. One universal piece of advice I can give is to see motivation as a skill that needs to be worked on constantly. There are plenty of people who know everything but do nothing because they lack motivation. Recognize it as a challenge, and a challenge worth meeting. Since it is so individual, you need to make an effort to figure out what works for you.
Programming can be tedious, so I tend to identify the things I find tedious and try, in one way or another, to make them interesting. This may be done by creating automation for these tasks, by trying a novel approach to the problem, or by implementing it in such a way that it can be reused or solve multiple problems at once. I find that it's much better to expend more effort on a larger solution that I'm excited about than to write something much simpler that doesn't motivate me. It's not just about the number of hours something takes, it's about how many hours you are motivated to spend.
A good programmer is someone who puts in the hours. It's someone who is a self-starter and is always eager to learn. Stay hungry.
There is no code that doesn't deserve to be well written. If it's not important enough to be well written, why are you working on it? Do something that matters with your life. Don't ever even consider whether something you write is too unimportant to deserve your best effort.
Maybe you think there is an opening to revolutionize the world by reinventing software for insurance. Insurance has a lot of money in it, so maybe this is a great opportunity for success. Maybe the status quo is terrible and you can really make a difference. Before starting this endeavour, first consider that it may not be a great opportunity at all. What if it turns out no one wants to pay for your insurance software? Then what? If you are still passionate about writing this software even if no one ends up using it, then you should go for it, because you are probably the perfect person to do it.
Most things fail, but if you make something you would enjoy failing at, then you can't fail. It's the only guarantee of success I know.
There are simple things that are used by billions of people, and there are complex things that are used by very few. Don't assume a niche use case makes something simple to execute on. Also don't assume that just because a huge company dominates a market, what they make requires huge resources. The higher up in the software stack you get, the more niche you get, the more code is written, and the harder it gets to imagine a world where the layers beneath you look different or work in another way. The greatest trick the big tech companies ever played was convincing the world that competing with them is impossible.
Don't waste your time deliberately making things that aren't great. If you can't imagine it taking over the world, why bother? If you are going to do things, do them right. If you are a company making a product, design it with the intention of being better than the best competitor. It sounds like really obvious stuff, but I keep being surprised by how often I hear statements proving that it is not obvious. "Our users don't need all the features", "It doesn't have to be fast", "We don't compete with the big boys", "Ours doesn't cost as much / is free" are all things people tell themselves in order not to have to do what should be done. It never works out. It's always a wasted effort. You don't capture 20% of the market by putting in 20% of the effort. You capture 0%. Writing code that isn't as good as it can be is always more expensive. Sometimes I hear people say their implementations are just training exercises, so they don't have to be good. I keep thinking: what are you training to be? Mediocre? If you are going to do something, do it right. If you aim to make the best thing but don't know how, you can study the subject and run experiments. Seek out information and people who can help you. If you don't aim to make something good, no skills or resources can save you. You are wasting your life.
In life you will find loads of successful people who all have had advantages you don't have. They will talk about how easy some things are that you find impossible. Instead of being discouraged by this, try instead to focus on what your unfair advantages are. The world needs what it doesn't have, therefore trying to follow in the footsteps of success is a sure way to not be needed. Maybe you know something others don't know. Who do you know that can help you, or who do you know that needs help? Do you live near interesting people you can reach out to? Even if you have nothing but yourself, consider the time you save not going to all the meetings you would have to attend if you were part of a big, well-funded team. Someone you admire is probably envious of you in a way you may not see. Always remember that programming does not scale. A single developer can do things billion-dollar companies fail at. Figure it out.
Cancer will not be cured because someone figures out the magical one thing that cures cancer that no one thought of before but could have thought of 20 years earlier. It will be cured because we will build more advanced tools that let us study, understand and then modify cells. Cancer, like so many other problems, is a tooling problem. The world is brought forward by enabling technology. Standardized shipping containers, UPC barcodes, SI units, the electron microscope, the Haber-Bosch process, injection molding, semiconductor lithography, IP routing and GPS have all touched our lives in immeasurable ways. They are not necessarily front and center in our daily lives, but most of the stuff we have around us has been enabled by these things.
Learn to recognize this, not only on the global scale but in the microcosm of the software development you conduct. Truly revolutionary technologies like the transistor don't necessarily have a direct use; rather, they enable others to build things that do. When you choose to develop technology, don't just focus on your goal; think about what technology will enable you to reach that goal. Analyse what problems you are likely to encounter, and build technology to solve those problems. It's easy to think that this only applies to large organizations that have the resources to dedicate to tools development, but in my experience, taking time off from projects to build tools that help the projects has always been a good investment of time, even as a single developer.
When you write code you are building a mountain. Each new piece of software you write should make use of modules you have written in the past, and add new modules you can reuse in the future. When choosing what to do next, don't just consider the product of the project; consider what possible future projects will be enabled by the technology you create for it. The best investments are not in technology that gets you that one great product. The best technology investment is the technology that gives you the widest range of options.
Apple didn't just build the iPhone overnight. They first had to build OS X. They built QuickTime so that they could do media playback. That made iTunes and the iPod possible. iTunes made the iTunes music store possible. Separately, they created Safari in order not to be dependent on Microsoft's Explorer browser. They built iPhoto and iMovie using QuickTime in order to manage media. Only once all these things were in place did the iPhone, and later the App Store and the iPad, become possible.
Let's say you are implementing a video editor. You need to write a module that can encode and decode video. You may think this is just plumbing that you have to get past so that you can implement your amazing new video editor concept that will take the world by storm. This is the wrong mindset. Modules are more valuable than applications. Your video editor can meet two fates: either it is a success or a failure. If it's a success, you will want a strong foundation to build on, so the quality of your video module is important. If it's a failure, you won't use the video editing code, but it's not unlikely that you will do some other project that requires you to encode or decode video. Done right, the survival rate of modules should be higher than that of applications. If you have written good modules, you have hedged yourself against failure. Even projects where video support isn't high on the list of features needed can get it at little cost if the module is robust and easy to integrate. See the video editor project as an opportunity to conquer video encoding and decoding once and for all, so that everything you embark on in the future will be able to make use of video. Make it one of the foundational capabilities in your arsenal.
You can do this at a company level, or on a personal level. If your company doesn't do it, try doing it for yourself. Go home and write the base of the module, and publish it online as open source. Then tell your employer that they would save a lot of time using your open-source library. You agree to guarantee them usage rights, and they give you the right to contribute to the open-source project on work time. Everyone wins, and now if you ever need that code it's yours to use no matter where you work. Don't lose valuable code just because management decides to drop your project. (Tip to managers: programmers HATE to have wasted effort on cancelled projects, so giving them the right to open-source any cancelled project is a valuable incentive.)
The lesson is: think hard and get it right the first time. When things go wrong, own it. A 10x programmer is not a programmer who can implement an algorithm in 10 minutes flat; it's the programmer who can write the same thing in a way that lasts, unchanged, for tens of years.
Get into the habit of always writing good, reusable, dependable, dependency-free, performant code. Don't think that some code is throwaway or that there are cases where it doesn't matter. Don't assume you will have time to rewrite things at a later date, or that performance won't be important. Every time you write something you are practicing your craft, so don't waste your time practicing writing bad code.
Programmers are like multi-processor systems: avoid the need to synchronize at all cost. Like processors, programmers are vastly more efficient at doing work than at interfacing with one another. Divide projects into modules and assign no more than one programmer to each module. If a module is too large, divide it into smaller modules, or create modules with sub-modules (an image loading module with sub-modules for different image formats). No module should be larger than what one programmer can implement from scratch in 6-9 months. If modules are larger than that, projects become too dependent on one person.
Whenever people bring up various collaboration methods, I am reminded that all programmers collaborate with people they have never met. When we read documented APIs written by other people we are (often very successfully) collaborating. In that way many programmers are better at collaborating with Ken Thompson than they are at collaborating with the members of their own team. If this form of external collaboration is so successful, we should try to replicate it internally within organizations.
Often it's hard to justify writing internal interfaces and documentation at the level of quality that is expected of externally visible software. Why spend the time cleaning up code and documentation when you can walk over to whoever wrote the code and just ask what you need to know?
I would argue that collaboration is so detrimental to productivity that even collaborating with your past self is difficult enough to warrant a strategy of modularization and interface design, even on a one-man project. Engaging with code that you wrote just a few years ago can be risky enough that it is warranted to rewrite it rather than trying to understand it in order to make major modifications.
If you are working on a project too large for one person to complete, don't simply add more people. Instead, imagine what the world would look like if it were possible for one person to complete such a project, then add people to create that world. Maybe development tools need to be better? Maybe libraries and utilities need to be available? You can write a list of all the things that would need to exist for one person to accomplish the goal, and then assign people to them. Some of those things may not be possible for one engineer to produce, so then you have to ask yourself what the world would need to look like to make it possible for one person to write that thing. Now you assign people to make that world happen. By doing this you create an isolated structure of modules that increases your productivity and flexibility. Once you have created the world where it is possible to write the project you want with only one person, you can look at what new opportunities this new world affords you. You can, in theory, hire just one person to write an entirely separate product that uses the same modules.
The eleventh person added to a team isn't going to make the team 10% more effective, due to diminishing returns, but an eleventh team member can write a tool that makes the rest of the team more than 10% more productive. As the team grows, the value of that 10% productivity increase grows rather than diminishes. The lesson is: people don't scale, tools do. Enabling technology is force multiplying.
If you erase the difference between a software interface that is used only by you, one that is used only within your team, and one that is publicly available to anyone, we need to dig into what that means for you as a developer. There are a bunch of assumptions that can't be made when you don't know who you are collaborating with. You don't know what they use your code for, you don't know what aspects and features they depend on, and you don't know what their priorities are or what they have time to invest in. This means that you have to be very careful in your decisions in order not to upset their work.
In reality, this form of software development drastically reduces the maintenance burden on developers.
All this stresses the importance of interface design. Interface design is what creates the necessary distance between the implementer and the user, so that both can be productive and innovative without upsetting the work of the other.
We all like to show off as great programmers, and most of us know that making something that looks like it works takes a lot less time than making something that is complete. When you have an early version of something that sort of works, try really hard not to show it off to anyone, especially your superiors. Yes it's cool, and you are excited about it, but don't give anyone false impressions of how much effort is needed to complete it. Show it when it's done. Technical debt is built when engineers show off work that appears to be done when it isn't. If it looks done, your manager will assign you a new task, and you have put yourself in a bad position to negotiate for more time to complete the task they think is already done. When you show off your work too early you front-load all the fun and all the praise, and you have nothing to look forward to once all the hard work is really done. Remember: the first 90% is easy, it's the second 90% that is hard. Save the champagne for the end.
I'm fundamentally wary of hackathons, where the objective is to hack something together in a limited time span. I know of numerous impressive projects that were made in a few days, which then took years to get off the ground, or failed completely, because of a weak foundation. Being good at hackathons doesn't set you up to be a good software engineer.
When working in a team, or for a customer or other stakeholders, you will inevitably come under pressure to do things that will compromise your work. You will be given unrealistic deadlines, feature requests, and scope creep. Most of us want to be good team players, and we want to say yes, but I find it imperative that you stay firm and protect your space. If you give in to unrealistic goals, you will eventually make your work impossible. This is why I think you need to fiercely protect your space. Give yourself the time and space to do things right. If doing it right takes three weeks, but you can hack something together in one week, don't fall for the temptation of doing it in one week. You won't get two weeks at a later point to fix it. Learn to say no.
Protecting your space is not just about protecting yourself, it's about protecting the team and the project. If technical debt builds up, it will impact everyone's work. Instead of having the choice between a one-week hack and a three-week job done right, you will have the choice between a six-week hack and a year-long rewrite. This benefits no one. You are responsible for the quality of the work you do. If you say yes to every unrealistic deadline given, then you are responsible when things go wrong. Management and customers can't know the technical implications of every decision, so they can't be responsible for them. This is a responsibility engineering has to take. Sometimes it's your job to protect them from themselves. Everyone wants everything all the time, but know that in the long run, everyone appreciates consistent, dependable delivery on time and on budget.
I believe there are two kinds of deadlines: aspirational deadlines and dependency deadlines. Aspirational deadlines are "Let's land a man on the moon in this decade". It's a somewhat arbitrary date chosen to rally an effort to get something done. These deadlines have no meaning other than "Let's go do something". Dependency deadlines, on the other hand, are like "The train leaves the station at 5 o'clock". If you are not there at 5, you will miss the train, so you might as well not bother showing up at all. Learn to recognize what kind of deadline you are dealing with. If you are dealing with a dependency deadline, where other people need your work by a set date, then that deadline needs to be respected.
The best way to protect your space is to be vigilant about controlling information. Management and customers won't read your code, so they are entirely reliant on you to tell them what is possible, how long it will take and what the risks are. You provide them with the vast majority of the information they need to make decisions about what you are to do. Give actionable information that lets them make those decisions. You don't need to tell them every detail. If you need to do two weeks of cleanup before you can deliver the feature that takes one week to implement, tell them the feature will take three weeks to deliver. Do not assume that they care about what you care about.
Do your best to try to understand what they want, but also why. A lot of the time they ask for something complicated when they really just need something simple, and it's your job to figure that out. Managers have a tendency to reveal plans step by step, and this can cause a lot of trouble. Ask the important questions about the limitations of the software up front. Does it need to be networked? Is it single-user only? What platforms will we target? Make sure you explain, in the clearest terms possible, that changing their mind later has a huge cost associated with it. The thing you want to avoid at all cost is, after years of developing a native app, having someone come in and say "great, now it just needs to run in a browser. This has been the plan all along, I just wanted you to focus on other stuff until now". At this point you want to be able to bring up an email or something that makes clear that you told management three years ago this wouldn't be possible without starting over. You need to protect your team, your organization and yourself against this. These kinds of decisions kill projects, and entire companies. Protect your space.
Most engineers aren't too fond of management, or of management tasks. Most engineers just want to engineer. At some point or another you are going to want to engineer things where the effort is simply too much for one person. When you reach that point you have three options: relinquish control of the project to someone else who will manage it, give up the project, or manage a team yourself. I don't know who you are as a person, but I would encourage you to be open to the idea of managing other people.
If your heroes include engineers like Henry Ford, Thomas Edison, Jim Keller, Linus Torvalds, Elon Musk, Satoru Iwata, Kelly Johnson, and Edwin Catmull, recognize that they were all able to supercharge their creations by bringing in and managing a lot of people.
But remember: the only thing worse than having to make decisions is when someone else makes them for you.
You may not see yourself as the management type; you may not like making charts and slide shows, wearing a suit and hanging out with other management people who seem to care more about corporate politics, status and money than about making good stuff. The open secret is, of course, that managers don't have to be like that. If you become a manager you can be any kind of manager you want. Don't like suits? Then don't wear one. You can ban slide shows if you don't like them. Don't like useless meetings? Then don't hold useless meetings. Run the meetings the way you want, when you want them, and for as long as you think is right. The best managers are enablers, people who help other people get stuff done. Sometimes by setting rules, but more often by removing red tape and obstacles. A good manager is flexible and works to make the work environment suited to each individual's needs and talents.
Many manageres add layers of structure in order to be able to gain conttrol of something they fundamentaly dont understand. If you become a manager you dont need to do cover for your lack of understanding because you understand. Engineers dont always make the good managers but they do make the best managers. The best managers are the people who could do the work themselves and know how to get stuff done. They earn the respect of the team and they focus on the work.
The first and most obvious step toward being a responsible platform holder is to accept that your users have the right to reject your software. This includes updates. Even if you have made a new version of your software that inarguably is better in every conceivable way, if the old version does what someone needs, then updating may not be a priority to them. This is especially true when what you are providing is an API. API changes require the user to modify their code, and this requires attention and thought. The reason someone is using your API is so that they don't have to engage with something, so if you make them, you are doing the very opposite of what your users want from you. Accept that "but this is better" is a weak argument and that "but this is newer" is no argument at all. As I consider APIs to be the best way for teams to collaborate on software development, you can consider many members of a team to be platform holders who are each responsible to a number of platform users. The lessons are therefore widely applicable both internally and externally.
So what steps should a software developer take to be a responsible platform holder and minimize the strain on their users when changes are made?
If you tell people to do one thing and then you tell them that was wrong and that they should do something else, you have wasted their time. This is true for a manager who changes requirements, a designer who changes a design, or a software developer who changes an API. Valuing other people's time is a sign of respect, and therefore changes that cause people to put in time to learn new procedures, modify their code or, worst case, rewrite it, on top of being distracted and needing to make new releases, should not be made lightly. Making changes is sometimes necessary. Things do change, and it's not possible to get everything right the first time, but at that point it is very important to signal that you understand that the change causes work for others and that you are responsible for them having to do this extra work. You need to communicate to everyone involved that you recognize what the changes mean for your users, and that you don't take this lightly. If you get into the mindset that changes have a cost, in terms of time and the trust of your collaborators, then you are more likely to avoid preventable changes in the future, and to gain more trust from your collaborators. A change that may make things nicer for you, like a name change, is a complete waste of time for your users. No matter how simple the change is to make, they still have to drop everything they are doing to learn about the change, adapt to it, and then make a release. Arguing with your users that the change is super simple to adapt to just wastes their time further.
If you are writing a new version of an API, consider writing a wrapper that lets users use the old API to access the latest version. That way they can adopt the new changes according to their own timeline. Smaller changes or name changes can be made using simple macros. Larger changes can be made using wrappers where the old API is implemented using the new API. I think this is a good exercise, and it also creates great sample code for any developer who wishes to adopt the later version of the API. How long should these backwards compatible layers be maintained? My answer is forever; they shouldn't just be a temporary solution while users are forced to adapt their code. The entire point of having a wrapper is that it shouldn't need any maintenance, therefore you should not be afraid to keep it around indefinitely. You may end up in a situation where you have many layers of wrappers on top of each other, and that's just fine. If writing a wrapper is hard due to the changes made to the API, consider that the changes will be even harder for your users, who don't have the benefit of your deep insight into the API. An API version around which the previous version is hard to wrap is an argument for writing a wrapper, not against it.
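As a rough sketch of what such a layer can look like (the "thing" API and every name in it is made up for this example), a pure rename can be handled with a macro, and a changed signature with a small wrapper implemented on top of the new API:
/* thing_compat.h, lets code written against the old API keep working */
#define thing_size_get thing2_size_get /* simple rename */
void *thing_create(unsigned int size)
{
	/* the old API had no flags, so pass the new API's defaults */
	return thing2_create(size, THING2_FLAG_DEFAULT);
}
Old code keeps compiling unchanged, while new code can target the new API directly.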
Let's say you have written a text parser module. You have used it in various projects, but now you find that it lacks some fundamental features that are needed. One of its functions looks like this:
int alphabetized(char *first_string, char *second_string);
It simply returns TRUE if the two strings are in alphabetical order, and FALSE if they are not. It deals with casing, and over the years it has never given you any trouble, until one day you call it with parameters along these lines:
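alphabetized("page 10", "page 9"); /* the exact strings are made up for the example */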
The function returns TRUE, because '1' < '9'. You realize that "alphabetized" doesn't understand decimal numbers. What do you do? It depends. Where is "alphabetized" used? If it is used in only one place, in code you have written, in software only used by you, you may just change it. But let's assume that the function is used by others, and maybe even by end users. When others depend on the code you can't just change the meaning of it. Don't assume that everyone who uses this function has the same definition of what a good "alphabetized" function is. A user of the function may have read the code and decided to use it only after specifically checking that it does not have a special case for decimal numbers. What you should do is write a new function, named something different. Once you have, you can publish the new one and let people know about it.
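A sketch of what that can look like; the new name here is only an example:
int alphabetized(char *first_string, char *second_string); /* unchanged, keeps its old behaviour */
int alphabetized_numbers(char *first_string, char *second_string); /* new, treats runs of digits as numbers */
The old function stays exactly as it is, and callers move to the new one only if and when they want the new behaviour.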
When you call a function called "alphabetize", you do so because you want the behaviour of the code you are calling, whatever that is. It really has no relation to the semantic meaning of the word "alphabetize". There may be 50 different opinions about how to correctly alphabetize things, but the caller isn't referencing the wider concept of alphabetizing, they are referencing a piece of code that happens to have the string "alphabetize" as its identifier.
At this point, users are entirely within their rights to ignore the new function and keep using the old one. You may think that everyone should use your new and improved version of "alphabetized", but just because you think that the new version is better doesn't mean anyone else does, or that you have the right to take up their valuable time. A lot of the time, the most important feature of software is it not being disrupted. Maybe your old version has a huge security hole in it, and you desperately want to update the end users' code, but that still doesn't give you the right to impose your code on them. Maybe they are working towards a deadline and the risk of not meeting the deadline is much worse than the risk of getting hacked; maybe they are on an air-gapped network; you don't know. The point is you don't know the priorities of your users, and even if you do, you don't have the right to impose your priorities on them. I have sat through lectures at big conferences where 500 people had to sit and wait for 10 minutes while the speaker's laptop force rebooted to install a security update. If that is security, what are we securing against?
Even if the function is only used in code you have 100% control of, I would still suggest writing a new function (possibly by copy-paste-modifying the previous version), then searching for every use of the original, replacing the old one with the new by hand where appropriate, and then removing the original if you find that it is no longer in use.
What if you later realize that you need versions of "alphabetize" that handle hexadecimal numbers, UTF-8, Roman numerals, or a bunch of other things? You could end up with an array of different implementations to maintain, right? Right. That is your problem: you wrote the code, and the fact that it didn't have the features it needed is your problem, not your users'. When you put code out into the world, it's your responsibility. You are in service of your users. The fact that they choose to use your software doesn't give you the right to take up their time or decide their priorities. Learn to see yourself as a platform holder.
If this seems to lead to an impossible maintenance situation, where the amount of effort needed to maintain all versions will overwhelm any developer or team, consider this: there is no maintenance of code that doesn't change. The old version of "alphabetize" doesn't need to be updated, because the entire point of keeping it in its original state is to preserve its behaviour. A single implementation of "alphabetized" that is constantly maintained and updated with new features and behaviours will continuously break the software that depends on it, and that drives the required maintenance up far more.
If you need to change an interface, do so by adding a new interface, while making sure that the old one keeps working, either by writing a wrapper around the new code that emulates the old interface, or by forking the code so that both are available separately. You may document the old version as deprecated, but you can't remove it. Someone may depend on it, and there is no time in the future when that stops being true. You may think this sounds really messy, having old versions of old interfaces in your code, but that's your mess: you created it. It shouldn't be your users' problem. You wrote the bad interface; it's your problem and responsibility to fix it. If forking your code means you have to maintain multiple versions, that's fine. Why is it more work for you to fix a bug in three different versions of your code than for everyone else to stop what they are doing and care about your mistake? If you release code that you expect others to use, you need to take responsibility for it.
You need to get over the idea that having multiple versions of the code available is bad. Many computer scientists got started at a time when computer memory was extremely small, and they therefore put a huge premium on making the executable as small as possible. This is no longer the case, but the culture persists. Many would define "bloat" as a large executable. But who cares how many instructions your program has? What is important is how many are executed, in other words how performant the software is.
I was once told by a developer that Microsoft Visual Studio contained several entirely separate implementations of older versions of Visual Basic. This was told to me as a horror story of how not to do software design. Somehow the existence of older versions upsets programmers' sense of aesthetics. I regard this as excellent software design. There are probably thousands of software projects that have been written in the various old versions of Visual Basic that Visual Studio supports, and to them, being able to still compile and execute these projects is crucial. The knee-jerk reaction from programmers is for these projects to be rewritten to conform to the latest version. But why? Why spend years of development rewriting something that already works, upsetting a functioning workflow in the process, and most likely introducing new bugs? If you are going to rewrite software you should do so for a reason.
It's very important to communicate to your users what state an API is in. There is value in releasing half-baked APIs to get feedback early, but at that point you need to be clear with your users that the API is likely to change, and that any time invested in using it may be wasted. It's also (and I'm afraid this is something that needs to be said) important to communicate if all or parts of an interface are not yet implemented or tested. Make it clear what your intentions are. Have you released something because it is completed, or as a sketch? Do you intend to support it long term, and if so, what changes are expected?
Early on in development you can write faux APIs where the underlying implementation does not implement all features, but still exposes how the features will be exposed in the future. If an implementation only has one mode but is intended to have more than one mode, create the indirection where the user has to select that one available mode. Add bitfields with optimization hints before the underlying implementation is able to take advantage of them. There is a range of things you can do to create a front end for your code that is forward compatible even if it's not yet fully fleshed out under the hood.
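A minimal sketch of the idea, with made-up names; only one mode exists so far, but the caller already has to select it and can already pass hints that the implementation currently ignores:
typedef enum
{
	MY_RENDERER_MODE_SOFTWARE /* the only mode implemented so far */
} MyRendererMode;
#define MY_RENDERER_HINT_STATIC_GEOMETRY 1 /* accepted, but not yet used for anything */
void *my_renderer_create(MyRendererMode mode, unsigned int hint_flags);
When more modes and real optimizations are added later, existing callers do not have to change.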
Another possibility is to make information that may change over time queryable out of the API.
Example: instead of having a struct with a list of settings that needs to be implemented in a UI, make the name and type of every setting queryable programmatically. That way a program can implement a UI, settings file or similar that automatically exposes the settings available. If the settings change, the implementation will automatically support them without any intervention from the user.
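A sketch of such an interface; all names here are hypothetical:
typedef enum
{
	SETTING_TYPE_BOOLEAN,
	SETTING_TYPE_INTEGER,
	SETTING_TYPE_FLOAT,
	SETTING_TYPE_STRING
} SettingType;
unsigned int settings_count_get(void); /* how many settings exist */
char *settings_name_get(unsigned int setting); /* human readable name */
SettingType settings_type_get(unsigned int setting); /* what kind of value it holds */
A UI, or a settings file parser, can loop over the settings and build itself, so adding a setting later requires no work from the user of the API.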
A lot of programmers have an instinctive feeling that something is wrong when they have multiple implementations of things in their code. I think this is a cultural legacy from a time when programs had to fit in very small amounts of memory. Today, as far as executable code goes, we have virtually unlimited memory.
Channel that instinct into making sure you get your APIs right the first time, rather than ignoring the problem.
Some large companies and open source projects have come to the conclusion that they have enough market power that they do not need to consider how much work their changes incur for others. Beyond the obvious bad manners and general disregard for other people this shows, I think it is just plain wrong. People use your platform to do something, not just for the sake of using it. Every moment they have to make changes to accommodate your platform is a moment they aren't making the applications that use your platform better. There is a huge hidden opportunity cost here that should not be ignored. The effect of not managing your platform is one you are likely not to discover for a long time. The sunk cost of your existing users will keep them on for a long time, but new projects will choose a different platform, and once they do it's very difficult to convince them to adopt yours. Because of this lag, many platform holders are overconfident until it's way too late to course correct.
This lax attitude toward the craft is why the range of skills among software developers is so great. There are plenty of developers who output just a tenth of what an average developer does, but there are also a fair few developers who produce ten times or more than the average developer. The difference isn't always immediately clear. Sometimes a microwaved frozen pizza can be mistaken for greatness. Explicit programming is an attempt to distill what makes a programmer great and why they can be orders of magnitude more productive than their peers. For anyone who wants to be that kind of programmer, and is willing to put in the effort, this is for you. It's part programming advice, and part life wisdom.
Explicit programming values real-world performance, reliability, productivity, deep knowledge, and work ethic over theories, ease of use and paradigms.
The name derives from the explicit nature of the code produced. It's code that explicitly says what it does. It doesn't try to stay within a style or paradigm, but derives its style from the practical requirements of the task at hand.
Imagine you have a problem, and the optimal set of computer instructions to solve that problem. The computer does not do anything superfluous; it just solves the problem. Explicit programming is a school of programming that favors writing code that explicitly describes these instructions and little else. The style is guided by the problem rather than by a paradigm. Explicit programmers try to think like a computer, because that is what we are programming. Explicit programming values practical solutions over abstract paradigms and abstractions.
These principles can be applied to any language (although they favour low-level programming languages) or problem. As such Explicit programming is paradigm agnostic, but it rejects the idea that a paradigm can be a panacea that should be applied to all development. A hammer is not better or worse than a screwdriver; they are just different tools.
Everything is memory and instructions. This is the only programming paradigm that is true, because that is the hardware architecture we currently have. Functional programming, object-oriented programming, declarative programming, constraint-based programming, event-driven programming, or any other paradigm may be useful as a tool or thought experiment, but none of them makes programming easy. None of them are fit for all purposes. Know them, and use them when needed, but don't be seduced by them and don't shoehorn a problem to fit them. Every time you choose a high-level paradigm, you run the risk of obfuscating what your program really is: instructions modifying memory.
Code should only do what you explicitly ask it to do. When you read code, it should be clear when things happen and why. This may seem obvious, but many programmers spend a lot of time writing code that either tries to hide what it does, or is there to manage other code. Explicit code tries to only do what needs to be done, and to do so in a transparent way. If something happens, you can see it happening in the code.
If your code is full of handlers, controllers and managers that are there to manage other code, you are most likely wasting your time. Solve the problem in front of you, not some imaginary future code. Managers force each component to conform to their rules; code bases should instead conform to their modules. Each module should be designed using the right approach for its particular task. This means you can reuse modules in other designs, rather than having to adopt a large system.
Every hour you use a design you know is wrong, you embed that bad design deeper. Drop everything and fix it now. It may be a lot of work, but it will be more work for every day you ignore it. Don't ever think "I'll go back and fix it later". Fixing it later is more work than fixing it now. I advocate rewriting rather than fixing.
In many ways Explicit programming is trying to define what good software development is. It's trying to distill the ethos that makes the very best programmers the programmers they are.
Explicit programming is for people who are drawn to programming because they like to make things, who like to tinker and take things apart to figure out how they work. It's for people who want to make every part, and who want to make things right. It's for people who are practical rather than theoretical. It's not for people who just want to get things over and done with, or make money, or impress others.
These are basic traits in all the best programmers (and engineers) I know. Explicit programming can be used by single developers, teams, or even people who manage teams or organizations that develop software.
If these things don't apply to you, then Explicit programming does not apply to you. You may think that everybody wants to be good, but in my experience that assumption is a big obstacle. People are taken in by promises of easy programming and of not needing to learn or put in the effort. Deciding that you are going to be good, and that there is going to be a cost associated with that, is an important life decision. My experience tells me that making this decision is key to becoming successful.
You don't need an expensive computer or special software to be a good programmer. In fact I would argue that a slow, simple computer and basic software can be an advantage. You don't need to go to a fancy school, or need to know anyone, to become a good programmer. Having internet access can be useful, but books from a public library can do the job as well. I grew up in a house without a computer, and was only able to access one at a community center for a few years before I got my own. I also dropped out of school, and was and still am dyslexic. I'm not mentioning this to say that I had it hard, quite the opposite. I'm saying it to let you know that in the grand scheme of things, these adversities had little impact on me becoming a good programmer. What really made me a good programmer is my curiosity and my will to improve and make things.
This is the goal of this document: to make us better. That is why I would say that the only prerequisite you need to make use of it is that you want to be a better programmer. If you want programming to be easy, if you want to spend as little time as possible programming, or if you don't care to write particularly good software, then this is not a document for you. This is for the people who want to be the best they can be and be part of writing the greatest software.
All kids want to be like the wise kung fu master. They say "That guy is awesome, he can kick anyone in the head, please kung fu master, show us!", but the wise kung fu master refuses. The kids say "Please, kung fu master, if you won't show us, then at least teach us how to kick people in the head!", and the wise kung fu master says, "Yes, I will teach you how to kick people in the head, but only if we do it my way". The wise kung fu master makes the kids train hard and get up early in the morning; he makes them meditate; he makes them keep the house in perfect order. The kids say "Why aren't we just practicing kicks? Maybe the kung fu master is just tricking us into cleaning his house?" Eventually the kids do become kung fu masters who can kick people in the head. But they will find that the kung fu master did indeed trick them. He tricked them into becoming wise. A kung fu master is wise not because he can kick people in the head, but because he has conquered the process of doing something hard. They needed to learn respect for the craft, how to be patient, humble, relentless, and to care about the details, in order to become kung fu masters. These are all skills needed to master something. This has brought them wisdom, and as wise people they will also no longer want to kick anyone in the head.
In order to be good at something, you need to not just want to be good, you need to want to engage with the steps necessary to make you good. A master is a master not just because they can do the hardest things, but because they want to do the hardest things. Master chefs, drill sergeants, kung fu and Jedi masters all ride their students hard on a bunch of details that seem unrelated to the skills at hand. That's because they are trying to teach the students to focus on doing the task at hand well, instead of trying to find a shortcut to the end.
Being good at something is not a requirement in life. I fully respect people who choose to live an easier life, and you should too. Life is hard enough for most people as it is. What I want to do is teach mastery; what I don't want to do is advocate for mastery. I think mastery should be given the kind of respect that is earned, but also that everyone should be given the kind of respect that is owed to everyone, even if they choose to live a different life.
If you have ever tried to write a web page, the complexity immediately hits you. Why does it have all these tags when all I want to do is put some text online? Why does it have to be this complicated? As soon as you manage to make your text appear on the page, you are struck by how bad the margins, the font, the colors and everything else look. So you start adding more and more tags. First you change the text to be "blue", but then you realize that's not the right blue, so you resort to using hex codes for your colors. The longer you spend on the page, the more control you require. In the end your complaint about the web editing process isn't that there are too many things you have to do, but that there aren't enough things you can do.
Internalize this experience. It's true for so many things. When you first start to learn something, be it programming, any other technology, or fields as wide as medicine, politics or the law, it seems overwhelming and needlessly complex. It's easy to fall into the trap of asking whether things really need to be this complicated. We reach for simple answers, but the more we learn and the more proficient we get, the more we start to see the complexity as an asset, as possibility, and eventually we see that the world's problems are often a result of us not taking a nuanced enough approach, one that engages with the complexities of the world.
A lot of programming ideas revolve around getting results fast, not about being in control. Can't we do this in fewer lines? Can't the compiler figure it out for me? Why do I have to do all this typing? They are all the result of a naive view of what is of value. These questions are asked by people starting out on a project, not people knee-deep in actually building something.
So, why does programming have to be hard? Because what we are trying to do is complex. Code is written to interact with the world, and the world we live in is complex. Embrace that challenge. There is no magic solution. Realize that in the end you will want to know how things work, because you want to be in control. You will want to understand. 100 lines of code that you understand fully are better than 10 lines that the compiler magically makes appear to do what you think it should.
Don't be afraid of complexity. Complexity is unavoidable, and it is a sign of control. Think about how to manage complexity instead.
Politicians love to talk about getting rid of complexity in the form of bureaucracy, laws, and taxes. They ask why everything has to be so complicated. They love to say "Can't we just...". However, almost all of our problems exist because our systems are not refined enough to handle an infinitely complex world. Rather than trying to understand the system and improve it, it's much easier to sell a simple solution. Simple solutions are attractive because they don't require us to think and learn. But complex systems always win out in the end. They handle more situations. They have more flexibility. They offer control.
It's alluring to imagine that a language or coding style should magically make complex things simple, just like it's alluring to think that there are simple solutions to societal problems. The most important barrier to break, to achieve greatness, is to want to know more and engage deeper.
A regular person may drive a car with an automatic gearbox. They simply don't want to think about the mechanics of the powertrain; they just want to get where they are going. A race car driver uses a manual gearbox, because they want control. A truly great driver doesn't just want to drive; they care about everything that goes into racing. They read the 1000-page rule book, they constantly listen to the mechanics and follow the work of the engineers. Ideally a race car driver wants to control not just the gearbox but the turbo boost, weight distribution, suspension setup, rake, differential, tire choice and anything else that can give them an edge. A race car driver wants all these things because they aspire to be as good as they can be. Someone who doesn't care about driving can drive an automatic and not even be aware of the make of the car they drive. That someone may say "Why would anyone want to drive a manual car? It's so much easier to drive an automatic." Their goal is to not have to understand something, while someone who wants to be good should always want to understand more. When I hear programmers complain that they don't want to care about memory allocation, CPU cache structures, compiler design, OS design, and the many other things that influence how software performs, then I know they can't be great programmers. If you are going to be good at something, you have to want to be good at it.
Simple solutions are desirable. They are elegant. If you can build something with fewer parts, then there is less that can break, and it's easier to make, service, and learn. We should always look for simple, elegant solutions, but they still need to solve the problem at hand. Building an airplane is much easier if you build it without a landing gear, but eventually you are going to realize that a landing gear would be nice to have, and at that point you are going to wish you had planned for it all along. Good simple solutions are hard to come by, and they often take more work than a more complex solution. Always look to make things simpler and more elegant to make them better; never use simplicity as an excuse to avoid doing the job of an engineer.
Ideal circumstances don't exist. Reality is messy and full of special cases, and code needs to reflect that. There is no one paradigm that will make programming easy, because code has to handle reality and reality is not easy. What is beautiful to computer scientists is a small, clever, recursive algorithm. What is beautiful to a user is an algorithm that encompasses every special case in the best possible way to solve the task in each situation. There is a huge difference between the two.
It is very natural for people who want to do something to ask "What do I need to learn to do X?". While there is nothing wrong with this, I encourage you to learn in order to be able to ask "Now that I know this, what can I do with it?". It might seem like semantics, but the mindset is fundamentally different. When you are confronted with something you don't understand, it's easy to get frustrated and just want to get past it. Learning becomes a barrier you have to overcome in order to do what you want.
I encourage you to see learning as adding tools to your arsenal. You don't buy a screwdriver in order to screw in one screw; you buy it because you know the world is full of screws and a screwdriver will allow you to screw and unscrew a lot of them. Ask yourself: is learning something I have to slog through in order to do what I want, or is learning what opens doors to new possibilities? By learning this way you are open to coming up with new ideas that challenge your assumptions about what you are trying to do.
This is a fundamental conflict between people who embrace technical knowledge and those who just want to use it. Many people who don't work with technology see technology as a limitation. They go to an engineer and say "I want a flying car", and the engineer will start off listing all the technical issues with flying cars, like safety, noise, control, pilot qualifications, energy requirements, pollution, battery weight and so on. The non-technical person often reacts negatively, thinking that the engineer lacks imagination, only sees the problems in everything, or is a bad engineer. They may forge ahead to build what they will eventually realize is a helicopter, hitting their heads against every law of physics, engineering and reason on the way. What the non-technical person is missing is that the engineer probably knows of much better ways to solve transportation than flying cars, things that a non-technical person could never imagine. The layman can't recognize the engineer's imagination, because he or she can't imagine what it's like to think as someone who has a firm grasp of technology.
Science is the search for what can be proven to be true; engineering is the search for what practically works. Both are incredibly valuable, but don't confuse the two. Just because something can be proven to be true doesn't mean it is useful. In science proof is the only requirement; in engineering we have many requirements that need to be met.
Most bugs happen because you and the compiler have different ideas about what your code does. Debugging is the process of figuring out why your code doesn't do what you think it does. The compiler is usually right about what your code does; you are usually wrong. Good code has to be readable by you, not just the compiler. It is therefore the job of a compiler and a language to be as clear as possible about what they do. If there is any ambiguity, the compiler should notify the user and say, "I don't understand what you mean, please clarify", or "You are doing this, are you sure that's what you want?". Software should not think it's smarter than its users. The more things are hidden from the programmer, the harder it is to understand what the compiler does, and the more likely it is that the programmer misunderstands the compiler.
Get over the idea that the act of typing code is time consuming or hard. It is not. Designing an algorithm is hard. Architecting a system is hard. Debugging is hard. Monkeys can type. Fewer lines of code is not a virtue; clarity is. Performance is. Stating explicitly when costly operations like memory allocations, disk access, system calls, networking, and mutex locking happen is important because it makes the cost clear. Copying some code in order to write a different version of the same thing is often clearer than trying to have one general implementation filled with if statements for various uses. If you know a better way to implement something you have already written, do it. If you know what the better design is, you have already done the hard work. A lot of programming environments, languages, and systems pride themselves on how little you need to type to accomplish things. Typing is easy, so I would rather have a system with better clarity, better debugging, shorter compile times, or a range of other features.
If someone takes a screenshot of a randomly selected page of code from your code base and reads it, is it understandable without context? In other words, how much does understanding the code depend on things that are defined elsewhere? Keeping code understandable without context should be a prime objective for clarity. Obviously not everything can be defined in place, but when things are not defined in place, it needs to be very clear that they are not, and it needs to be clear to what extent they impact the code that is in place.
This is why functionality such as function and operator overloading, macros, and namespaces are so dangerous: they change the context of the code. Copying the code from one part of the code base to another yields a different result, even if the code looks identical. No part of the language, such as keywords and basic concepts like flow control, should ever be redefined or obfuscated. To a large extent I also discourage the redefinition or renaming of basic types. Consider this line of code:
for (i = 0; i < 10; i++)
Your brain instantly recognizes the pattern, down to the use of "i" as an iterator. Compare this to a macro that does the same thing:
loop(10)
Your brain won't instantly recognize what is going on. You may have saved a few key presses, but your brain is working overtime not to read this as a function call that is missing a semicolon.
The C preprocessor is an incredibly powerful tool that can be used for a wide range of things. As such, I limit the use of "#define" to fully capitalized constants, and to macros that look like function calls and really are function calls. When you read this:
for (i = 0; i < NUMBER_OF_LOOPS; i++)
you naturally assume that NUMBER_OF_LOOPS is a constant. If it is instead defined as:
#define NUMBER_OF_LOOPS slow_function_with_lots_of_side_effects()
then it is very misleading. I find that there are only two good reasons to use the preprocessor to define macros with parameters: either to create debug versions of functions that add the __LINE__ and __FILE__ macros to the parameter list of a function, or when you need to force the compiler to inline code. In both cases, you should implement versions of your code that don't do this, so that you can switch back and forth, verify that they yield identical results, and get help when debugging:
#include <stdio.h>
#include <stdlib.h>

void *debug_malloc(size_t size, char *file, unsigned int line)
{
	printf("Allocating %u bytes in file %s on line %u\n", (unsigned int)size, file, line);
	return malloc(size);
}

/* from here on, every call to malloc() is routed through debug_malloc() */
#define malloc(a) debug_malloc(a, __FILE__, __LINE__)
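One way to make the switch between the two versions explicit is a compile-time switch; DEBUG_MEMORY here is just a name made up for the sketch:
#ifdef DEBUG_MEMORY
#define malloc(a) debug_malloc(a, __FILE__, __LINE__)
#endif
With the switch off, the code uses plain malloc, so the two behaviours can be compared to verify that they yield identical results.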
I think a good rule of thumb is that any code copied from one place to another should either break, because things are not defined, or work the same way, because they are defined the same way. This means: never define anything in one place to be functionally different from what it is in another.
Every time you abstract, you run the risk of unintended consequences. Any kind of ambiguity creates hazards. Programmers need to be able to trust the code they see. Language features like macros, function overloading, implicit initialization, operator overloading, templates, and implicit type conversions hide what happens from the user. Use the language as it is; do not try to hide it by redefining it into something else. If a handle is a pointer to a structure, do not hide that it's a pointer. If a pointer is the language's natural way of expressing a reference to an object, then the programmer is used to seeing pointers, and knows what they are. The special type you define just for your thing is foreign to a programmer, and won't be as easy to internalize, no matter how brilliant it is to you. Any time the language is clever, it forces the programmer to use more of her brain power to understand how the compiler will interpret the code.
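As a small illustration of the handle example, with made-up type names:
typedef struct GameObject GameObject; /* explicit, a handle is visibly a pointer to a structure */
typedef void *GameObjectHandle; /* hidden, the reader can no longer see what a handle really is */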
It's a common trope in computer science to say that "a function should only do one thing", but no one can define what "one thing" is. An addition, is that one thing? Photoshop edits images, is that one thing? I prefer to divide code into sections that can have clear names. In general I prefer longer functions, which are mainly broken apart when sections are called in multiple places. We read from top to bottom; we read sequentially. If your code is sequential, it is easy to follow. If it jumps around, it's easy to miss that the code does things you didn't intend it to do.
To organize code, I use a scheme that is pervasive throughout all code and files. The first thing you want to do is to distinguish between code, variable definitions, defines, and the other mechanics your language provides. While the precise scheme you choose is a matter of taste and habit rather than objective right and wrong, having a consistent format greatly helps with readability. A consistent design can communicate what things are as quickly as good syntax highlighting can.
I use capitals spaced with underscores for defines and constants; structs don't have spaces, but the first letter of each word is capitalized; functions and variables are lower case spaced with underscores, and so are my file names.
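In practice that looks something like this; the names are only examples:
#define MAX_OBJECT_COUNT 256 /* define / constant */
typedef struct GameRenderingObject GameRenderingObject; /* struct */
GameRenderingObject *game_rendering_object_create(void); /* function */
unsigned int object_count; /* variable */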
The reason why I think this is better is that it creates an address path towards finer and finer granularity. We start the address with "object" to create context and then go on to describe what we do with the object. In reality, having just two levels of addressing is almost never enough. A real function may be named:
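game_rendering_object_property_matrix_set()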
We can here follow the path from project, to module, to sub-module, down to specific functionality. It also makes your code more easily searchable. By searching for "game_rendering_object_property" you get a list of all uses of all property functions, and of where they are being used. This is why using long, globally unique names is very powerful, and why I think namespaces are a bad idea. With namespaces you can't quickly search to find every instance of a particular functionality. With globally unique names the code can easily be identified in its global context, and with no indirection you can copy and paste code between different files and be sure it has the same meaning.
In a header file it becomes incredibly easy to find blocks of functionality that all deal with the same thing. Writing the prefix not only refines the search as you read it, it also makes it easy to navigate a header file. Compare a list of functions with postfix names to the same functions with prefix names:
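For example, a handful of made-up functions, first named with the module path as a postfix:
set_matrix_property_object_rendering_game
get_matrix_property_object_rendering_game
create_object_rendering_game
destroy_object_rendering_game
set_color_light_rendering_game
and the same functions named with the module path as a prefix:
game_rendering_object_property_matrix_set
game_rendering_object_property_matrix_get
game_rendering_object_create
game_rendering_object_destroy
game_rendering_light_color_set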
Notice how much easier it is to navigate the prefix-named functions, just because we read and format from left to right. The list is much more organized, and it is, for instance, easy to identify all the functions that deal with rendering objects.
This naming scheme is recognizable for how wide the code it produces is. In general I think that wider code, i.e. code with long names, is desirable.
We can also use the same naming structure for file names. The above functionality could reside in a file called:
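game_rendering_object_property.c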
It is now always abundantly clear where functionality resides. All functions that start with "game_rendering_object_property" should be found in this file. By storing all files in a flat hierarchy, you can easily list the files to see the structure of the project and where functionality resides.
Many people have a header file for each code file, but I find that that creates an awful lot of header files with very little in them. I instead prefer to have fewer header files that encompass multiple files. Given our previous example we may have a header file named:
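game_rendering.h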
It would encompass all functions in all of the files that start with "game_rendering". For modules I usually have just two header files, one external and one internal. Let's imagine we have a simple module called "hello". It may consist of the following files:
hello.h
hello_object.c
hello_property.c
hello_internal.h
The "hello.h" file includes the entire outward facing API. It does not include anything the user of this module shouldnt have access. "hello_object.c" and "hello_property.c" implements the functionality of the module. Finally the "hello_internal.h" is a header that is included in both "hello_object.c" and "hello_property.c" and it contains shared definitions and datastructures that are not visible to the user of the module. The purpouse of "hello_internal.h" is to link the c files together, and to keep "hello.h" cleen and free from implementation details.
This structure again makes it very easy to know where to look for the functionality you need. These external inteterface headerfiles are a great place to describe and document the functionality as it is the starting point for any user.
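A minimal sketch of what the two headers might contain; the contents are made up for illustration:
/* hello.h, the public API and the only header a user of the module needs */
typedef struct HelloObject HelloObject;
HelloObject *hello_object_create(void);
void hello_object_destroy(HelloObject *object);
void hello_property_set(HelloObject *object, unsigned int property, float value);
/* hello_internal.h, included only by hello_object.c and hello_property.c */
struct HelloObject
{
	unsigned int property_count;
	float *properties;
};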
Due to this naming scheme I find it incredibly important to have a consistent style for parentheses. I do not put spaces between function names and parentheses, so that I can search for either "game_rendering_object_property" to find all functions that start with this path, or "game_rendering_object_property(" to find a specific function. I always put braces on a new line, so that brace pairs are easy to identify.
I find that comments are often harder to read than good code. Outside of header files, comments serve little purpose. I would much rather focus on good naming than write lots of comments. The main reason to write comments inline in your code is if there is behaviour or systems that are not visible in the code. The main example I would give of that is multi-threaded code, where a piece of code runs concurrently with other code residing someplace else. For anyone reading the code, it may not at all be clear why the code is jumping through hoops to avoid a deadlock if you can't see that there is other code running concurrently. Another example is code that is a workaround for a broken API or hardware. This can produce code that looks nonsensical, and that someone can easily break by accidentally "fixing" it. In these cases a clear comment warning anyone reading the code can be warranted.
Another good use of comments is to add searchable tags. I use "FIX ME" as a universal tag for anything that needs addressing. At any point I can search the code for "FIX ME" and find known issues to work on. Obviously you can invent your own tags that fit your team and project.
Name things after what they are or do, not what they are used for. Very often you write a piece of functionality for a specific reason, and then later you realize that it can be used for other things. If you have named it for its use rather than for what it does, it becomes very confusing.
In computer graphics hardware, the process of wrapping an image around a model is called "texturing", because it gives the object texture. However, as functionality in modern programmable graphics hardware, this translates into a filtered lookup into a multi-dimensional lookup table. This is incredibly powerful and versatile functionality and accounts for much of what GPUs do in almost any use, yet it's still called texture hardware, despite texturing just being one possible use of the functionality.
We have already talked about the value of a strict naming structure. A well executed naming scheme also means that the user can guess correctly what functionality should be named. If your game has a matrix property you want to set on an object in the rendering engine, you may guess it's "game_rendering_object_property_matrix_set". To make it predictable, make sure that the words used match each other. For instance, if you have a function that ends with the word "create", then the corresponding function should be named "destroy", not "delete", "free", or "subtract". The point is that if a user sees one function they should be able to predict what the corresponding function is named. There are many words that pair well with each other, like "allocate" and "free", or "set" and "get".
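For example, pairs that follow the scheme used earlier in this text:
game_rendering_object_create / game_rendering_object_destroy
game_rendering_object_property_matrix_set / game_rendering_object_property_matrix_get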
Beyond having unique names, it makes sense to define unique words with specific meanings for your project. Imagine you are implementing a game of Monopoly. To do this you need to load image "assets" for the board and cards. You also need to implement a structure that stores each player's "assets", in the form of money, streets, houses and hotels. If you then go ahead and implement a function that loads assets, it's hard to know whether that code loads images or the state of a player's finances if the word "asset" is used interchangeably for image assets and player assets. By renaming the players' assets to something different, like "property", you make it clear that it is distinct from assets. While a word like "asset" has a wide meaning in the English language, it makes sense to give it a much narrower meaning in a code base.
When designing software, you are (hopefully) designing primitives with specific uses, possibilities and limitations. It makes sense to define strong naming conventions to tell them apart. By naming them you create a strong shorthand for talking about the capabilities of your software. These naming conventions are useful both within the code and for communicating functionality to the end user. Often we use generic words like session, object, node, fragment, layer, agent, operator, task, device, action, set, filter, and handler to describe functionality, but it makes sense to define much more narrowly what these words mean in the context of the software you are writing. By codifying the meaning of the terms, it becomes easier to enforce the implicit rules that may govern different types of code.
For example: a "graph" is a structure that describes a set number of "nodes" that are connected to one another. "Nodes" are invocation instances of "actions". "Actions" can only access input and output from a "node", and may not access global data. A session can contain multiple "graphs", but "nodes" in one graph cannot be connected to nodes in another "graph".
This kind of naming convention, which puts clear boundaries around different modules, is especially useful when dealing with multi-threaded code, where it is extremely important to manage access rights. When choosing these words I tend to avoid words that already describe language features of the implementation language, like "function", "object", and "procedure".
Local variables don't need global addresses, so it's much better to focus on clarity. Clarity can come in the form of expressive text, but often it's much better to make the variables recognizable. If we again take the following example:
for (i = 0; i < 10; i++)
it is instantly recognizable, and we make an assumption about the type of "i" even if it's not explicitly stated, and we also recognize it as an iterator, even though it's only written out as a single letter. If we compare it to:
for (iterator = 0; iterator < 10; iterator++)
it is less clear, because it's not instantly recognizable, and therefore you need to read the code to figure out what it does. Using "i" is a very common idiom of programming, so for this to work in a broader sense, we need to expand the number of commonly used variable names to cover as much as possible of our code base. These are the ones I use:
-i, j, k, l, m
Iterators, always of integer type, usually unsigned. If a floating point iterator is needed, I use fi or di.
for (i = 0; i < 10; i++)
	sum += array[i];
-p, a, b, c
Used for pointers.
b = &buffer[10];
-v, v2, v3
-found, best
Used for finding the best value in a data set according to some metric. "found" is always an integer or pointer, and "best" can be either a floating point or integer value.
found = 0;
best = array[0];
for (i = 1; i < 10; i++)
{
	if (array[i] > best)
	{
		found = i;
		best = array[i];
	}
}
-f, f2, f3
Used for temporary floating point values.
On top of these I have loads more that are domain specific to each project. One common way of deriving them is to use function definitions as a base for variable names. If you have a function that looks like this:
void project_container_destroy(PContainer *container);
it makes sense to call it with a variable called "container". It also makes sense to use the name "container" throughout the implementation of project_container_destroy, and in other functions that use the same type. This way you instantly recognize that a variable called "container" is a pointer to the type PContainer.
Good software is software where the user is exposed to a few simple concepts that give them as much power to accomplish things as possible.
LEGO is a great design because with a few simple parts you can build anything you want. You don't need to own specific LEGO pieces in order to build a spaceship; you can just use the basic pieces. Special pieces may aid in building a spaceship, but even these special pieces conform to a general format that makes them easy to understand and integrate into a design.
Unix is a great design because you can use pipes together with various command line tools to create all kinds of functionality. The system doesn't need specific features, because you can use the basic functionality to build the specific features you need.
Users can only use what they understand, so these concepts have to be understandable. In order to make them understandable, you need to decide what is exposed to the user and what is not. A program that shows images may need to expose the user to the fact that images are stored in files, but it does not have to expose the user to the specifics of the file formats used. A database can let the user store and retrieve data without exposing the user to the algorithm it uses to index the data. If it does, it empowers the user to better tune the performance, but it also adds more complexity and concepts for the user to manage. Good software magically removes concepts that aren't empowering the user and gives the user concepts that are.
Great advances are often made in software when someone manages to remove the need for a concept to be managed by the user: 3D rendering that the user doesn't have to implement, video editors where the user doesn't have to manage different formats, automation tools that don't require the user to write code, a system that can automatically convert data types seamlessly. Often an insight into how a concept can be removed or automated is the key factor.
When you design software the goal is to build as few things as possible that can do as much as possible. The way to do this is to alow the users to combine different aspects of the software, in as many ways as possible. You dont want to add to a software, you want to add dimentions to your software. If the system you have designed is clear simple and fit for purpouse you need less features designed for a specifice purpous, since the basic design lets users do what they want within that framework. Good software design, lets the user do things you havent thought of. Good software lets the user use and combine its capabilities to solve problems the developer has never encounterd. Users often think in terms of features, but its your job as a software architect to translate that in to flexible systems that can do what your users request, but are also flexible enough to do what they havent yet asked you to implement.
At a product presentation for a video editor, I once heard a product manager proudly announce that they had after many user requests added an option to hide tracking points that would obscure the main view if you had a lot of tracking points. This is terrible design, duisguiced as "listening to the users feedback". If the tracking points are anoying maybe redesign them to not be anoying? Did i hear they obscure once you have too many? Maybe count them and make them smaller or more transparent as the grow in numbers? Instead of fixing the problem they added yet another thing that users have to learn and manage.
Productivity is messured by how long code can be used without needing to be updated divided by how long it takes to implement. The longevity of your code should the a prime metric of code quality. Longevity is obviuslly valuable. We all consider quality to be the messure of how long something remains fit for purpouse, be it buildings, cars or furniture, so the same thing should apply for code. While programmers do talk a lot about code quality, its rarely talked about in terms of longevity. Often code quality is defined as code that is easy to modify, rather then code that can be used for a long time without modification.
I would argue that longevity is more important in code because the cost of deteriorating code is much, much higher than for other things. If your table breaks, you can buy a new one without needing to do a major redesign of your entire house. When code has to be taken out and replaced, it tends to be very disruptive and requires a lot of redesign of other things. If your car breaks down and you have to buy a new one, you can usually just get into the new car and drive off. You may have to be told how to use some obscure feature, but the vast majority of your driving skills will carry over. New code (or god forbid a new language) requires the users to relearn its interfaces.
Going back to code you wrote a month ago is significantly harder than going back to code you wrote yesterday. Going back to someone else's code is orders of magnitude harder than code you wrote. The reason to complete code now is so that you can clear your mind of the implementation details and take on another task. Every time you have to go back to something you wrote a long time ago, you have to take time and effort to refamiliarize yourself with implementation details you no longer remember. Task switching has a high productivity cost, so stay on one thing, be sure to complete it, then move to the next thing. Leaving things for someone else to deal with is bad programmer manners.
The longer you have used code, the more tested and therefore trustworthy it has become. The moment you make any change you sow the seed of distrust that the code no longer works as you expect it to. Every time you use the code you are running yet another test case that verifies that the code is sound. The moment you make a change you go back to zero.
-Avoid trends
You should always avoid writing code you don't expect to use for decades. Not just because it's good to have long lasting code, but also because it's bad to have long lasting code that wasn't intentionally designed to be long lasting. Whatever is hip and cool now won't be cool in 10 years. When choosing technology, maturity and stability are key. Will this be supported in 10 years? Will there be people available who can use it? Is the tooling mature? In programming, new technology is bad technology. If it's been around for decades, it will probably be around for decades more. Switching is more painful than the gain of new features is worth. The users don't care what cool hip technology you use, they care that the thing you make works. Computing is full of trends and fads that constantly change. My argument is that you should almost always avoid trends and instead focus on a long term strategy.
Changing an API requires everyone who uses it to not only learn the new one, but also adapt all code that depends on it. This is a huge cost to everyone involved, with the possible exception of the one who makes the change. It is therefore incumbent on you to not make changes unless absolutely necessary. If possible, make the legacy interface available concurrently. If you think it looks bad to retain the old version when you want everyone to move over to your new version, then tough shit. You messed up, so now you have to live with it. Other people have more important things to do with their time than to adapt their code because you messed up.
If you have users depending on your API, changing it is bad manners. The Linux kernel and Windows come to mind as good citizens who respect their users' time and effort, while Apple, Google and most Linux distributions don't.
Even if you are mindful of the effects of changing APIs, the rest of the world isn't. You are going to need to depend on other people's technology, but when you do, you need to be mindful of the risk profile you create. The concept of longevity rests on minimizing risk. Keeping your code free from dependencies means reducing the attack surface of external shocks. If your code depends on an external service, company or software, you have all kinds of risk exposure. What if the API changes? What if they go bankrupt or stop supporting it? What if they change how they charge for the service, change their license agreement, or simply decide they don't want you as a customer? As you add more and more dependencies, these risks compound. You can reduce risk by choosing fewer dependencies, but also by choosing technologies that have multiple implementations, are open source, are not tied to a single entity, technologies where older versions will be supported, or where the possibility of forking exists. This is true for both hardware and software platforms.
Languages are dependencies too. Languages are probably the dependency that poses the greatest risk. If a library or platform disappears, you may be able to replace it, but if your language is no longer fit for purpose, you have to start over from scratch. Will a tool chain exist for the language of choice in the future? Will compilers be actively developed for new hardware architectures? Will there be enough skilled developers to hire?
In my opinion there is no reason to ever call any outside software directly: wrap EVERYTHING. When you wrap software, try to always wrap your implementation around at least 2 different external APIs. This way, if one goes away, you have another to hold you over until you have added additional implementations. Your code should build and run without installing or downloading any additional libraries, source or SDKs. Downloading a library is easy, but as you add more dependencies, the chance that one dependency is no longer available, or has changed its interface, increases exponentially. Code that is reliable today won't be in the future. Anybody can be hit by a bus, any company can go out of business or be taken over by people who change course. When you have to rely on existing technology like languages, choose stable tech that is mature and has many independent implementations. If you need your code to interface with an SDK in order to run, make it a dynamically loadable library that the application loads in. This means that if the SDK is not available, the application can still be compiled and run.
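As a rough sketch of what loading an SDK at runtime can look like (POSIX dlopen, with a made-up vendor library and function name), so the application still builds and runs when the SDK is absent:

#include <dlfcn.h>

/* Hypothetical wrapper around a vendor SDK. The rest of the application only
   ever calls my_audio_init(); if the SDK is missing, the wrapper reports
   failure and the program keeps working without it. */
typedef int (*vendor_init_fn)(void);

int my_audio_init(void)
{
    void *handle = dlopen("libvendor_audio.so", RTLD_NOW); /* assumed SDK name */
    if (handle == NULL)
        return 0; /* SDK not installed; run without the feature. */
    vendor_init_fn init = (vendor_init_fn)dlsym(handle, "vendor_audio_init");
    if (init == NULL)
        return 0;
    return init();
}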
When you write software that uses an API, you do so because you can't or don't want to write something yourself. You use an operating system so that you don't have to do your own low-level memory management and task switching. You rely on others to do things for you, because your interest is different from people who are interested in writing the fastest possible task switcher in x86 assembler.
If you need to change an API, then that's because you fucked up, and that's on you. It's not the fault of the people who use your API, so you can't command them to change their code just because you messed up. The argument "but the new one is better" holds very little weight against their priorities. They use your API because they don't want to write that code themselves.
When designing software, the first thing to consider is the scope of the structure that future development will have to live in. At the start of development, many decisions will be made in rapid succession that will have great implications in the future. Once a decision is taken, other decisions are made that depend on the first one, and the longer the original decision has been around, the harder it is to undo. Therefore it is important to think ahead. The goal is not to plan out everything in advance, but to not inadvertently make things hard in the future. The structure you design in the beginning will remain for a long time. Parts may be swapped out, they may even be designed to be swapped out, but that too takes planning.
Some engineers and managers prefer to only have a very limited scope for a project at the beginning, focus on making something simple that works, and then add features. I prefer to know as much as possible up front, and plan as far ahead as possible. The purpose of this is not to set a detailed roadmap that stretches far into the future. What the software will need, and in what order, will change many times, so the goal is not to avoid change but to anticipate and prepare for it. The way to do this is to define the possibility space of the software and where the bounds of its feature set can reasonably be set. It is more important to discuss what it could be versus what it clearly can not be, rather than precisely what you think it will be. Writing a feature list is useful to make sure that the software meets external requirements, but in my opinion, focus on features is the enemy of good design.
Let's say you are making a video editor, and in it you have a timeline that produces a video stream. Sounds straightforward. Now let's imagine you want to edit video that is stereoscopic; now the timeline produces 2 video streams. Or you are making an immersive video installation that may employ many screens or projectors; then a timeline may output numerous video streams. If you assume that timelines and video streams are always one to one, and then go back to separate them once the project has grown after a couple of years in order to support stereoscopy, then that will be very painful. Deciding that they should be separate data structures has very little cost if it's done up front. Just because it's decided to separate them up front doesn't mean you have to actually implement all the features a user may need to do stereoscopy or multi projection installations, or even make it possible for the user to do this at all. What it enables you to do, at a future point, is to add these features if you decide you want them, without a huge rewrite.
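As a sketch of the difference (hypothetical structures), the second layout costs almost nothing extra up front, but leaves the door open for stereoscopic or multi-projector output later:

struct video_stream { int width, height; /* ...frame data... */ };

/* Bakes in the assumption that one timeline is exactly one stream. */
struct timeline_v1 {
    struct video_stream output;
    /* ...clips, cuts, effects... */
};

/* Timeline and streams kept separate: today output_count is always 1,
   but nothing has to be torn up the day it isn't. */
struct timeline_v2 {
    struct video_stream *outputs;
    unsigned int output_count;
    /* ...clips, cuts, effects... */
};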
In my experience designers and decision makers rarely have a long term plan, and if they do, they don't share it or it tends to change. A lot of misguided managers and decision makers think: "I'm not going to tell my engineers this is going to have to be networked in the future, because I don't want to overload them with feature requests that we don't need until next year anyway." This is a terrible practice, and it creates untold wasted hours rewriting systems that were never fit for purpose. You need to know as much as possible up front. Knowing what the app is meant to be like in 5 years is not a distraction from what needs to be done today; it is making sure we are working towards having that app in 5 years and not in 10.
This means that it's up to the engineer who designs the system to think ahead and anticipate what might come. This means asking a lot of questions, thinking about the possibility space, and asking directed questions about precisely what they mean. "When you say the user can load a document, do you mean that the user will never be able to load more than one document at a time?" When asking these questions the scope will inevitably grow. Knowing this information upfront is so much more valuable because expectations get more aligned. Whenever someone says that "The software will never be required to do X", I tend not to believe them if I think there is a chance that it will be a requirement later on. In this situation, you may write it with this possibility in mind if it's trivial to do so. If not supporting this use case would reduce the complexity and effort needed significantly, then I would ask a few more times, make it clear to the decision maker that this is not a decision they can go back on, and explain the added cost of supporting it now, but also the significantly larger cost of supporting it at a future point if they change their mind. If they still think the feature isn't needed, get it in writing and keep it on file.
There are some requirements that are especially important to settle upfront, because they are notoriously hard to add late in a project.
Once you have a vague idea of what parts of your software application are likely to change and need expanding, you can start planning out what parts should be abstracted. Dividing the various parts of your code into modules, and making them talk to one another, is a large subject, and it will be covered extensively later.
Preferably you can build a small core that interacts with a wide range of modules. If the core, and the way it interacts with modules, can be made stable, a lot of things will become a lot simpler later on.
One valuable consideration is the relationship between different modules. What module is calling what module? In general I advocate having all module interaction be one directional. Module A calls the API of module B, but module B never calls module A. If B needs to notify A, let A register a callback with B, so that B can call back into A without knowing anything about it. This creates a one directional dependency: A depends on B, but B does not depend on A.
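A minimal sketch of the callback pattern (all names hypothetical): A knows about B, but B only ever sees a function pointer handed to it at runtime:

/* In module B: B knows nothing about module A. */
typedef void (*b_event_callback)(void *user_data);

static b_event_callback b_callback;
static void *b_callback_data;

void b_set_callback(b_event_callback cb, void *user_data)
{
    b_callback = cb;
    b_callback_data = user_data;
}

/* In module A: A depends on B, never the other way around. */
static void a_on_b_event(void *user_data)
{
    /* react to whatever B reported */
}

void a_init(void)
{
    b_set_callback(a_on_b_event, (void *)0);
}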
In general, if code is dependent on other code, it makes sense to statically link to it. There is no point in dividing an application into multiple files if separating them breaks the application. The exception is plugin architectures. A plugin architecture is useful when you have optional features that the application can in theory run without. It also divides a software project into multiple projects, and this can be incredibly useful for both managing the project and its dependencies. Let's imagine we are building a sound application and we design a plugin architecture. Each sound effect can be implemented as a separate code base. If for instance you hire a new junior employee, they can be given a specific task to write a new de-esser, and this code is now entirely separate from the main code base. If the employee turns out to write terrible code, you can bin the entire plugin, and you won't have to worry about bad decisions leaking into other parts of the code. Similarly, if you want to support a specific sound system SDK, you can write a separate plugin that does this. This means that only the people making this integration need to install the SDK to build the plugin. The plugin now has a dependency on the SDK, but the project as a whole has not added a dependency. Another good reason to build plugin architectures is to let outsiders write code that interfaces with a proprietary code base.
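As a sketch of what such a plugin boundary might look like (names are made up), each effect only fills in a small table of function pointers, and the host needs to know nothing about its internals:

/* effect_plugin.h -- the only header a plugin author needs. */
struct effect_plugin {
    const char *name;
    void *(*create)(unsigned int sample_rate);
    void (*process)(void *instance, float *samples, unsigned int count);
    void (*destroy)(void *instance);
};

/* Every plugin (a de-esser, a reverb, ...) is a separate code base that
   exports this one well-known entry point for the host to look up. */
const struct effect_plugin *effect_plugin_entry(void);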
There is a trend of thinking that "software is never finished", and of processes like "continuous integration". I think this is a very bad mental model for producing anything. Yes, you can argue that anything can be improved, and that nothing is ever perfect, but that's fundamentally quite different from saying nothing can be finished.
There is a saying that in order to make great art you need two people: one great artist and another person to stop them from working on it when it's done. Any project should have the right scope. Too much software degrades because it was designed to solve one problem, but once it does so well, the developers go on to add more features to solve other problems, and consequently the software bloats and becomes a mess.
The idea that software has to evolve instils the idea that new is always better than old. The very notion of newness and modernity itself becomes a virtue that requires no justification. Let's redesign the interface! Why? Because it is old. Let's change things that work so that they do the same thing but in a different way. Change for the sake of change.
If the project has no set end goal then the direction can change at any time, and do so over and over. No decision is ever final, and everything is always up for debate. If you have an infinite timeline for your development, then why prioritize? Why not put off hard work? On a long enough time scale you will get to everything, so why do it today? If your software only needs to hold together until next month when you are planning to update it, why build it on a stable foundation? Why insulate it against dependencies? If software is never done, then why bother, when the problem will be someone else's at some point anyway?
If your software requires constant upkeep, that is now a tax the world has to keep paying just to be able to do what it was able to do yesterday. You just ate a chunk of the world's productivity indefinitely. Progress is when we can solve problems in such a way that we can move on to solve new problems. Change doesn't just cost the developers time and effort. If software always changes, users have to constantly adapt, re-learn and update software, costing them time and effort that needs to be justified.
Imagine that the software you release will be the last version. Ask yourself: for how long will it be used?
I think the best way to write software is to start with a vision of how it should solve a problem. Then you define a clear scope around that: what the software does, and does not do. Then you implement that. Once you have implemented it, you need to evaluate what you have made. At this point there are two outcomes: either the software just wasn't a good idea and needs to be scrapped, or it proves to be useful, but inevitably needs some work. You can plan your software in advance, but until you have it and can use it you can't know how your vision will perform in practice. I tend to find that some workflows can be streamlined, some features are missing, and some are never used. A lot of times a particular way of using the application proves so good that it renders other features obsolete.
At this point you can do a number of rounds to refine and optimize the application. Eventually you end up with a piece of software that is completed, bug free, and reflects your vision, with the added experience you have gained from the process.
Once the software is complete you should be able to walk away from it, confident that you have left something useful behind. However, a lot of times you don't want to walk away, because your head is filled with ideas for how to make the software better. Something I have learned the hard way is that most of the time it's better to step away.
Stanley Kubrick once said that "the best idea is usually the opposite of the first idea". I think about that a lot.
A format is a definition of how to express something. So many things in software engineering can be thought of as formats. A data structure is a format that defines how data is stored in memory, an API is a format for calling code, a network protocol is a format for coding and decoding data sent to a remote machine, a file format is a format for describing some data like an image or a document. Even a programming language is a format for describing instructions for a computer to execute.
That is a very wide definition, so wide in fact that it might border on meaningless, but before we get into the specifics, and we will get there, I want to show you that there is a surprising number of lessons that apply to all of them.
All formats are inherently communication devices. You use a format so that something can be understood. Formats are not the message, but they are the medium, and they are there to define what can be expressed and how it is done.
Any form of communication has to be implemented at minimum twice: once by the provider of the information and once by the receiver. There is no point in saving an image as a file that no program can open, writing a program in a language no compiler can compile, calling an API that no one has implemented, or writing a variable unless you intend to read it. Ideally though, you want to reuse a format as many times as possible. Just like when you choose to learn to speak a language, the value of the language goes up if there are lots of other people who speak that language.
Some successful formats become a standard. Standards are sometimes officially and sometimes unofficially accepted formats. Some standards originate from standards bodies (they are usually bad because of design by committee), but other standards emerge organically because enough people adopt other people's formats. (Once a de facto standard has been established, it can be useful to create a standards body to maintain it.) These naturally emerging standards have to be good and useful to many people, because adopting them is optional. Given that organically emerging standards have to be good formats, we can use them to learn what makes a successful format.
Understanding how to create a good format that can grow into a standard should be fundamental to any software engineer's skills, yet it's not a topic explored enough in engineering (or in standards bodies for that matter). A standard may seem like a very rigid structure for a design, one that requires more work and agreement than what is necessary for most software projects. That can be true, but standards share so many characteristics of good design with software engineering that it's worth exploring their properties. Good standards aren't complicated at all. If the format you are using internally in your project or organization has the properties that would make it successful as a standard, then you are probably doing really good systems design, and you and your organization will reap many benefits, just like the wider industry would if they adopted it.
Let's say someone needs to measure the length of stuff, so they pick up a stick and say "How about we use this stick to measure stuff? Let's call it a meter!". Congratulations, this person has just invented a format for expressing lengths! At this point really any stick will do; all you really need to do is pick one and stick with it.
Picking up a stick is really easy and anyone can do it, so lots of people will. In fact there are enough sticks in the forest for everyone to have their own. Unfortunately not all sticks are the same length. There is an obvious value if everyone could agree to use the same stick, but since anyone can pick up a stick, why would anyone let someone else have the honor of being the stick picker? You may want the honor of being the stick picker, and we may all have opinions about the right length of a stick, but a stick agreed upon by everyone is infinitely better, so get over it. A lot of really bad standards that everyone agrees are bad (NTSC) persist because the value provided by being compatible overrides all other requirements. Compatibility is the difference between something working or not working, and most of the time having things work is pretty high on the requirements list.
Whether you use what everybody else is using or not, once you have measured everything with one stick, it becomes very costly to change sticks. It's very easy to make fun of people who use a bad format, but while it can be easy to spot an out of favour format, appreciating the cost and effort required to change the format is much harder, and very often underappreciated.
Software engineers love to complain that systems use the wrong formats, be it APIs, data structures, protocols, or that something was written in the wrong language. Most of the time this is very counterproductive. People underestimate the value of a working format, even a poorly designed one. Any functional system that does what it should is infinitely better than any imaginary design that has yet to be built. A factory full of equipment that operates in imperial units instead of metric units may not be ideal, but the fastest way to go bankrupt is to scrap all that perfectly working equipment, just to replace it with the same equipment with different numbers on it. Changing formats is hard, expensive and often very risky, so learn to accept bad but working formats, and learn to manage them.
We will talk more later about how to migrate from one technology to another, because it is an important topic, but for now let's be very clear: you want to avoid both migrating and being stuck with a bad format, so do everything in your power to get it right the first time.
Because formats are communication devices, they let you divide a problem up into multiple pieces. This makes everything easier. Once everyone has agreed to use the same stick to measure stuff, one can go out and acquire two pieces of equipment from two different vendors, who do not even know of each other, and they will fit perfectly together. That's magic. When work can easily be distributed this way, productivity goes up a lot. If you are managing a team, manage the formats that connect your team members.
Communication is the hardest thing in a team, so if you solve that by giving everyone common formats to interface with, everything else will be easier. This is why I emphasize the design of good formats over almost everything else. If you and your organization are good at this, you can scale infinitely. Any problem hard enough can be broken down into multiple less complex problems.
The many-to-many property of a format is notable, because as more things use a format it becomes exponentially more useful. There is however another side to this that most people do not realize: if we add complexity to the format, that complexity gets distributed to everything that needs to interact with the format, and we now have exponential growth in complexity! If your format takes a day longer to implement, that's not a day wasted, that's one day for each implementation. This means that making your formats simple is paramount.
Many formats that have become standards were never intended to be widely used. "This was just something I hacked together" is a common statement from many inventors of the most important formats that exist. Because simple formats are easy to adopt, and useful formats are so sticky, you can often inadvertently get stuck with something you never intended to reuse. This is the reason why sometimes the thing someone hacked together in an afternoon becomes more successful than a three hundred page specification meticulously designed by the industry's brightest minds.
This is the big challenge of format design: you want to make something as simple to implement and use as possible, but you still want to have all the features you need. The time you shave off by making your format easier to use for you will be multiplied by every other user too. On the flip side, the mistakes and bloat you add when designing a format are now everybody's problem, for as long as your format is used. Both features and problems spread and persist. Why not just fix the problems in your format? Well, it's not enough for you to fix them; now everybody has to fix them, and at the same time. This is incredibly hard.
This is where designing formats becomes an art form: you want to balance a forward looking feature set that encompasses everything you could ever want to do, with something simple and implementable.
So how do you design a format that can do everything but has very few features? The way to do it is to build a simple but flexible system that can easily be understood and implemented, but that can be stretched to do many things. Easy, right? Later we will talk more in depth about how to do this using primitives.
You want your format to be flexible and encompass every possible future use, and this is when people start adding features, lots of features. This is especially common when a format is designed by committee, and it is why "designed by committee" is commonly recognized as a bad thing. Each participant has their own needs and requirements and everyone piles them on. This is an easy way to bloat the format and make it hard to implement. You don't need to be a committee to make this mistake; many developers pile on features without any regard for implementability.
When designers (or more commonly groups of designers) can't make up their minds about the best way to do something, they may opt to let the user decide. Instead of deciding if the format should store units in metric or imperial, you decide to let the user decide. The theory goes that now the format supports your favorite measuring unit, no matter if it is metric or imperial. Two is better than one, right? In reality, it means that everyone now needs to handle both, so everyone is forced to implement the system they think is inferior. What's worse is that a number of users are going to say "well, I use one of the systems so I don't need to implement the other", and yet others will choose to only implement the other, and the compatibility falls apart. (USB-C is a good example of this.)
A common way to make a specification less complex is to make parts of the standard optional. In some cases this can be good, but a lot of times this creates a Venn diagram of features where there is less overlap than needed to get useful interoperability.
Some standards leave space for arbitrary user extensions. This can be useful in rare cases, but a lot of times you want a format to be as complete as possible so that there is one standard for everything. The purpose of using a common format is to have interoperability; if everyone is free to define whatever they want, then it isn't a format that offers interoperability.
In the original specification of FTP (RFC 959), the response to the command LIST, which is used to query a server for the contents of a directory, simply states that the server should transfer a list of files, but doesn't specify how the data is formatted. A human can read the list disregarding the formatting, but the omission of a strict specification of the format makes it impossible to write a program that can reliably parse the response of any FTP implementation correctly.
A very common source of limitations and complexity is indirection. Can your application have one or more documents loaded at a time? Can resources be shared between documents? Is it a single user application or multi user? Are users divided into groups? Are there sub groups? There is great value in asking yourself "can there be one or many of these?" for each structure you design. These are the kind of questions you want to ask when considering the indirections of your design. Getting this right is very important because it is so often very complicated and laborious to change once it has been implemented. The most common problem is having too few indirections, but having too many indirections tends to become a burden for developers too.
Example: Let's say you have a multiplayer strategy game. You have sides that play against each other. One side may be controlled by a player, but multiple players can also be allied. This means that you can have 2 players controlling 2 armies fighting on the same side. You may also have multiple players sharing the control of a single army. Once you have figured out this is the design you want, you can make a decision about what in your design constitutes Sides, Armies and Players and how the three relate to one another.
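A rough sketch of how those indirections could be laid out (hypothetical structures): players and armies both reference a side, and nothing assumes the counts are one to one:

struct side {
    unsigned int id;
};

struct army {
    unsigned int side_id;            /* the side this army fights for */
    /* ...units, positions... */
};

struct player {
    unsigned int side_id;            /* the side this player belongs to */
    unsigned int *controlled_armies; /* may control zero, one or many armies */
    unsigned int controlled_army_count;
};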
The cost of having too much indirection is complexity. If the code becomes needlessly complex, it gets harder to name and differentiate different levels of indirection, and indirection can often cause a lot of complexity when exposed to the end user. You must weigh this against the probability that the indirection will be needed. I tend to think that too much indirection is better than too little, but it's always a judgment call. I do think that deferring an indirection to the future, when you know that you will need it at some point, is always a bad idea.
The JSON file format supports arrays, but the content of an array does not have to be of a uniform type.
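For example, this is a perfectly valid JSON array even though its elements have nothing in common:

[ 42, "forty-two", true, { "answer": 42 }, [ 4, 2 ] ]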
As a general rule, when designing a format the numerical limits of any structure should be zero, one, or bound only by memory/address size. Or in other words, if something is supported and you can have many of them, you should be able to have as many of them as you want. There is a wrinkle to this rule though: every time you allow for something to be dynamic in size you are adding an indirection, and any time you have a dynamic size you are adding an allocation. For a lot of things a strict limit is acceptable. The name of an item could be given an arbitrary length, but for almost any use a 32 or 64 character limit will be more than enough, and it significantly simplifies implementations since you know the buffer size needed to store a name up-front.
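A small illustration of the trade-off: the fixed-size name removes an allocation and an indirection, at the price of a hard limit:

/* Dynamic: any length, but every item drags in an allocation. */
struct item_dynamic {
    char *name;
};

/* Fixed: a 64 character limit, but the buffer size is known up front and
   the item can be copied, stored and freed as one block. */
struct item_fixed {
    char name[64];
};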
If a format gets better the more people use it, how do you get people to adopt yours? If you are the boss you can just tell people to adopt it, but even then you want people to want to adopt it. Some may adopt bad formats that have a lot of complexity because they like the long feature list, but we are not in the business of tricking people into making bad decisions. We need a reason for early adopters to adopt, and let's be clear, the early adopters are the ones who are hard to win over. Once your format is a pervasive standard, people will be forced to adopt it whether they like it or not. Trying to convince someone to adopt a format because it will be great once everybody else has adopted it as a standard doesn't help them solve their problem today; it's just you trying to push your format.
Standards emerge, not because they are good once they are widely adopted, but because they are good before then. A format has to give the early adopters a reason to adopt it, by being useful and solving a problem. People adopt technology because it makes their lives easier, not harder. Given that adopting is in itself an effort that needs to be undertaken, the return on investment has to be at least more than the cost of adoption.
The best way to do this is to provide tooling. If you are standing in a forest full of people asking you to use their particular stick to measure everything, and one of them offers you a full range of precision measuring equipment, including everything from laser range finders to calipers and micrometers, and tables letting you convert your measurements into every other stick in the forest, then that's the one you pick. All sticks solve the problem of having a stick to measure against, but that stick solves a lot of other surrounding problems that would have cost you time and effort to solve yourself.
A format with reference implementations, viewers, debuggers, libraries, utilities, converters, documentation, loggers, UIs and so on is a lot more attractive to adopt than having to do all that stuff by yourself. The nice thing about tools is that they can be as complex as you want without making the format itself complex. Any complexity that can be moved from the format itself to tools is therefore a win. Can't decide how your format should store something? Pick one, and then write converters to the other options. If the users don't like your tools, that's fine too, because they aren't required to use them to use the format. Over time your format should build up a pool of tools that users can choose from, and that makes adopting your format a lot more attractive. Tools, unlike formats, are easy to change, rewrite or replace, and that gives you back the flexibility that formats don't possess.
The simpler your format is to use, the easier it is to write tools for it. The more tools you have, the more attractive your format becomes. A format with lots of tools has also been validated: no one would bother writing all the tools unless implementing the format was easy. Use the tool development process as a way to not just make future use easier, but to refine your format before presenting it to others.
As you start to use your format more and more, you will discover new issues you didn't at first consider. If the stick you use to measure things is a cylinder, then at some point someone who uses the stick will tilt it slightly and get a slightly longer stick. So now you don't just need to define what stick to use, but give precise guidance on how that stick is to be measured against. Then you realize that in cold weather the stick shrinks, in warm weather it grows, and when it's humid the stick swells. Now you have to define the precise climate the stick should be used in. Soon you have to start worrying about the wear of the stick. All these things might seem negligible, but once people are using your stick to buy copper to build transatlantic wires, or to measure gravitational waves, these things really matter. You need to think hard about what you define and how you define it. You want your format to be as specific as possible so that there is no ambiguity in its use.
Once you have established the one stick to use to measure all things, some crafty person will say "I need to measure strength; how about we use this stick that everybody already has, and we see how far it can be bent!" Now your standard for measuring lengths just became a standard for measuring the strength of burly men. So is this a good thing? Probably not. Now you have an entirely new set of expectations for your format. People now expect the stick to have a specific strength, and that means it can't change material, and maybe having burly men bend these sticks may result in... bent sticks? Bent sticks are not good for measuring, so you are bound to have some confusion when some things are measured by straight sticks and some by sticks that have been bent.
It's therefore very useful to explicitly have non-requirements. As important as it is to define what your format defines, it is just as important to define the limits of your format. If you explicitly forbid sticks from being used for something other than their intended use, or even better, design your format in such a way that it's hard or impossible to misuse it, you are much less likely to run into these issues.
If your format is good, people are going to want to use it for its intended and non-intended uses. If the non-intended uses become too popular, your format may fork, where there is one set of requirements on the books, but an entirely different set of requirements in the field. Some companies have intentionally engaged in "embrace and extend", a tactic where you deliberately embrace a standard, but then add various non-standard extensions to your implementation in order to force the use of your product on anyone who wants to be compatible with the extensions. (See both Microsoft and Google in the browser wars.)
If a format requires all data to be sorted in a specific way, then anyone who implements the format also needs to implement sorting. You just added a requirement to your format and increased the implementation burden. You, the designer, may have a beautiful sorting algorithm you can take off the shelf and just use, but that doesn't mean that everyone does. You may not care about performance, but for someone else the time it takes to sort the data may be the difference between it being useful or useless.
These are hard trade-offs that need to be made. A good question to ask yourself is: does the requirement have to be in the format itself, or is it something that a user who needs it can do themselves? If a user of the data needs it to be sorted, does it have to come sorted or can they sort it themselves? I tend to try to keep the requirements on the format itself as low as possible, because they apply to every implementation.
Too often I see standards that casually add loads of requirements that are assumed to be trivial: "Images are stored in this format, checksummed by this algorithm, then compressed using this algorithm and encrypted by that algorithm". This assumes that the user has all these technologies readily available on their platform and that they all work flawlessly. The format becomes bloated and fragile. If any of those technologies aren't available or change, then everything falls apart like a house of cards. What may be a simple addition to you may not be for everyone else.
The lesson is to keep your formats simple and independent, so that they can be implemented from scratch in a reasonable time frame.
Whenever you are designing a format, the underlying implementation you are using easily rubs off on the format.
If you are using a stick to measure stuff, the unit of measure won't be longer than the longest stick you can find. This may be fine, but a lot of times it bakes the limitations of your way of using the format into the format itself.
Your platform, hardware limitations, and requirements are different from other users' and from how the format may be used in the future. The fewer assumptions you make about what the world looks like and what the requirements are, the more flexible your format is going to be.
Ideally you want your format to be reimplemented many times before settling on a design. (Some standards bodies require at least two independent implementations before considering standardization.)
Modern computer architectures, with deep pipelines, caches, out-of-order execution, parallelism and branch prediction, are very different from a PDP-11, but they still try to appear to the programmer as if they are just a faster PDP-11, because that is what C is best suited for.
For decades GPU API designers have struggled with GPU hardware advancing at a much faster rate than CPUs. Again and again the CPU's ability to feed the GPU has become the bottleneck. In early versions of GPU APIs the CPU would feed the GPU individual vertex properties one by one; the CPU would have to send several commands to draw a single triangle. Modern GPUs allow the CPU to send a single command to draw complex scenes with thousands of objects, with multiple textures and shaders, in multiple complex passes.
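To make the contrast concrete, this is roughly what the two styles look like in OpenGL (a simplified fragment, not a complete program; index_count and instance_count stand for buffers set up elsewhere):

/* Old immediate-mode style: the CPU hands the GPU one vertex at a time. */
glBegin(GL_TRIANGLES);
glVertex3f(0.0f, 0.0f, 0.0f);
glVertex3f(1.0f, 0.0f, 0.0f);
glVertex3f(0.0f, 1.0f, 0.0f);
glEnd();

/* Modern style: one call draws many instances of a mesh whose data already
   lives in GPU buffers. */
glDrawElementsInstanced(GL_TRIANGLES, index_count, GL_UNSIGNED_INT, 0, instance_count);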
You don't know what the future looks like. Sometimes you have to make bets about where technology is going, but it's always best if you can avoid it. At the time of writing, if I compare two computers of roughly the same price from the same vendor, roughly 20 years apart, every number has gone up, but by starkly different amounts. A design that once made sense because it relied on getting data from the LAN instead of generating it on the GPU would be made very differently today.
Often a format has two sides: reader/writer, client/server or caller/callee. In this case the burden of implementation does not necessarily weigh equally. Let's say you have a service and you require it to deliver data in a very specific order that the receiver can depend on. By making this requirement, you have made it harder to implement the service. On the other hand, perhaps the burden on the receiver has decreased, because they can always depend on the service to provide correctly ordered data. If the designers of the system only expect one or a few implementations of the service and orders of magnitude more implementations of users of the service, then perhaps this is a good trade-off. Always assume all sides will be reimplemented at some point.
When designing format, it is valuable to separate the structure from the semantics. The structure defines how data is stored, where as the semantics defines what the data means.
Example: JSON is an entirely structural file format. It only defines how data is stored, not what any of it means. This means that it's possible to write a parser that can parse the data structure, but there is no way to write a parser that can make sense of the data. If you have two databases that store records of people using JSON, they are not necessarily compatible, because the two systems may store the same data, in the same format, in completely different ways.
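For instance, both of these are well-formed JSON and could describe the same person, yet a program written for one layout cannot make sense of the other:

{ "name": "Ada Lovelace", "born": 1815 }

{ "person": { "first": "Ada", "last": "Lovelace", "birth_year": 1815 } }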
The SI system, on the other hand, is an entirely semantic system. It defines what a meter is, but it doesn't define how the data is stored. You can't write a loader that can load in all SI units, because they can be stored in any way, but once you have loaded in SI units, you can make sense of them and do calculations on them.
Thinking of them as two separate but linked problems helps you design a system that is both easy to understand and implement. The two have very different requirements, and different functionality has different needs. All interactions need to access the data, so creating a structure that is simple is important even if the semantic description of the data is very complex. Often functionality needs to know very little about the semantics of the data in order to do its job. Many functionalities only change one thing, and therefore only need to know the semantics of that thing, but in order to do so they need to be able to traverse the entire structure to parse/load it, modify it and return/save it. The complexity of your data format has an inverse correlation to the number and capabilities of the tools that operate on it.
Example: imagine you have a very complex data set that stores the plans for a nuclear plant. The data needs to store all kinds of different data types: materials, electrical systems, geometry of machines and buildings, radiation levels over time and so on. Imagine you realize that the maker of the light fixtures that have been specified has renamed their product line, and you want to write a tool that can replace all the occurrences of the old product name with the new one in the data set. If the data is stored in a simple structure, it is possible for someone to write a loader for this data set that places it into memory. Once it is in memory, the name of the fixture can be found, modified, and saved in its modified form. This tool does not need to understand the complex semantic meaning of any of the data, except the light fixture name, but it does need to be able to traverse the entire structure of the data.
A good way to make a flexible format is to define a core format that defines the data structure and a set of basic functionality, but then leaves room for others to add their own data in a well defined way in the same structure. Preferably this additional data uses the same structure but adds new semantic meaning to it. Many tools can be written without having any semantic understanding of the data. When a new semantic is defined, an auxiliary specification can be written that defines how the data structure is used for that specific use case. This lets a format stay simple and easy to manage, while keeping it extendable and compatible.
Let's say we design a 3D file format. We define basic geometry, but we also allow each object to have a key value store. If someone wants to make a physics engine, they may want to store the mass of an object in the key value store. Since the format is defined in meters, the physics engine stores the values as kilograms with the key "mass". Once this is done they can publish an auxiliary spec that defines how they store these properties in the file format. If another team wants to implement a physics engine, they can follow this auxiliary spec and make their software compatible. An editor for the format can display and let the user edit the mass of an object simply by letting the user access the key value store. The editor does not have to know the semantic meaning of mass in order to edit it. This means that you can write a forward compatible tool that can handle properties that were not yet defined when it was written.
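A minimal sketch of the idea (hypothetical format and key names): the physics auxiliary spec gives the key "mass" a meaning, while a generic editor just lists whatever keys happen to be present:

#include <stdio.h>

struct key_value { const char *key; const char *value; };

struct object {
    struct key_value pairs[8];
    unsigned int pair_count;
};

int main(void)
{
    /* The physics auxiliary spec stores mass in kilograms under "mass". */
    struct object o = { { { "mass", "72.5" } }, 1 };

    /* A generic editor needs no semantic knowledge; it just lists the keys. */
    for (unsigned int i = 0; i < o.pair_count; i++)
        printf("%s = %s\n", o.pairs[i].key, o.pairs[i].value);
    return 0;
}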
Let's say you want a data structure to track a bunch of kids that share 10 sticks. The obvious way to store this is to have each kid store the number of sticks in their possession. To transfer a stick from one kid to another you need to subtract it from one kid and add it to another. If you add up all the kids' sticks it should always add up to 10, but it doesn't have to. It's possible that a stick was added to one kid but not removed from another, or vice versa. If we assume we live in a world where kids don't lose stuff, then the programmer must have lost them. If on the other hand we store a list of who has each of the 10 sticks, then we can't lose any sticks, because when we change who has a stick, we don't have to do two matching operations, only one. The data structure is consistent with our requirements by its nature, and is therefore much less error prone.
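A sketch of the two layouts: in the first, a transfer is two updates that have to stay in sync; in the second, it is a single assignment and the total of ten sticks can never drift:

#define STICK_COUNT 10
#define KID_COUNT 4

/* Layout 1: each kid stores a count. A transfer is two operations that can
   get out of sync if one of them is forgotten. */
unsigned int sticks_per_kid[KID_COUNT];

void transfer_counted(unsigned int from, unsigned int to)
{
    sticks_per_kid[from]--;
    sticks_per_kid[to]++;
}

/* Layout 2: each stick stores its owner. A transfer is one assignment, so
   the total number of sticks cannot change by accident. */
unsigned int owner_of_stick[STICK_COUNT];

void transfer_owned(unsigned int stick, unsigned int new_owner)
{
    owner_of_stick[stick] = new_owner;
}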
Example: the Unix command line lets you cut, paste, and transform data in endless combinations using a few basic commands. However, Unix commands assume that all data is stored as text files. You can't use grep to find out when "As Time Goes By" plays in a video file of Casablanca. It would be possible to build such a system, but it would require a much more complex type system that could operate on such disparate data types as text, audio and video. It would not only make the system more complex, it would also make writing commands such as grep more complex and time consuming.
The first and most obvious way to respect your users is to accept that they have the right to reject your software. This includes updates. Even if you have made a new version of your software that is inarguably better in every conceivable way, if the old version does what someone needs, then updating may not be a priority to them. This is especially true when what you are providing is an API. API changes require the user to modify their code, and this requires attention and thought. The reason someone is using your API is so that they don't have to engage with something, so if you make them, you are doing the very opposite of what your users want from you. Accept that "but this is better" is a weak argument and that "but this is newer" is no argument at all. As I consider APIs to be the best way for teams to collaborate on software development, you can consider many members of a team to be platform holders who are all responsible to a number of platform users. The lessons are therefore widely applicable both internally and externally.
So what steps should a software developer take to be a responsible platform holder and minimize the strain on their users when changes are made?
If you tell people to do one thing and then you tell them that was wrong and that they should do something else, you have wasted their time. This is true for a manager who changes requirements, a designer who changes a design, or a software developer who changes an API. Valuing other people's time is a sign of respect, and therefore changes that cause people to have to put in time to learn new procedures, modify their code or, worst case, rewrite it, on top of being distracted and needing to make new releases, should not be made lightly. Making changes is sometimes necessary. Things do change, and it's not possible to get everything right the first time, but at that point it is very important to signal that you understand that the change causes work for others and that you are responsible for them having to do this extra work. You need to communicate to everyone involved that you recognize what the changes mean for your users, and that you don't take this lightly. If you get into the mindset that changes have a cost, in terms of time and the trust of your collaborators, then you are more likely to avoid preventable changes in the future, and gain more trust from your collaborators. A change that may make things nicer for you, like a name change, is a complete waste of time for your users. No matter how simple the change is to make, they still have to drop everything they are doing to learn about the change, adapt to it, and then make a release. Arguing with your users that the change is super simple to adapt to is just wasting your users' time further.
If you are writing a new version of an API, consider writing a wrapper that lets users use the old API to access the latest version. That way they can adopt the new changes according to their own timeline. Smaller changes or name changes can be handled using simple macros. Larger changes can be made using wrappers where the old API is implemented using the new API. I think this is a good exercise, and it also creates great sample code for any developer who wishes to adopt the later version of the API. How long should these backwards compatible layers be maintained? My answer is forever; they shouldn't just be a temporary solution while users are forced to adapt their code. The entire point of having a wrapper is that it shouldn't need any maintenance, therefore you should not be afraid to keep it around indefinitely. If you end up in a situation where you have many layers of wrappers on top of each other, then that's just fine. If writing a wrapper is hard due to the changes made to the API, then consider that the changes will be even harder for your users, who don't have the benefit of the deep insight into the API that you do. A new API version on which the previous version is hard to wrap is an argument for writing a wrapper, not against it.
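For small renames a macro is often enough, and for bigger changes the old entry point can be implemented on top of the new one (a sketch with made-up names):

struct document;

/* The new, more general API. */
struct document *document_open(const char *path);
int document_save_ex(struct document *doc, const char *path, unsigned int flags);
#define SAVE_DEFAULT_FLAGS 0u

/* A pure rename can be kept alive with a macro. */
#define open_document(path) document_open(path)

/* An older two-argument call implemented as a thin wrapper around the new
   one. It needs no maintenance, so it can stay around forever. */
int save_document(struct document *doc, const char *path)
{
    return document_save_ex(doc, path, SAVE_DEFAULT_FLAGS);
}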
Let's say you have written a text parser module. You have used it in various projects, but now you find that it lacks some fundamental features that are needed. Among its functions is this one:
int alphabetized(char *first_string, char *second_string);
It simply returns TRUE if the two strings are in alphabetical order, and FALSE if they are not. It deals with casing, and over the years it has never given you any trouble, until one day you call it with parameters like these:
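(The exact strings are not important; a hypothetical pair that triggers the problem would be:)

alphabetized("Chapter 10", "Chapter 9");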
The function returns TRUE, because '1' < '9'. You realize that "alphabetized" doesn't understand decimal numbers. What do you do? It depends. Where is "alphabetized" used? If it is used in only one place, in code you have written, in software only used by you, you may just change it. But let's assume that the function is used by others, and maybe even by end users. When others depend on the code you can't just change the meaning of it. Don't assume that every user of this function has the same definition of what a good "alphabetized" function is. The user of the function may have read the code and decided to use it only after specifically checking that it does not have a special case for decimal numbers. What you should do is write a new function named something different. Once you have, you can publish the new one and let people know about it.
When calling a function called "alphabetize", you do so because you want the behaviour of the code you are calling, whatever that is. It really has no relation to the semantic meaning of the word "alphabetize". There may be 50 different opinions about how to correctly alphabetize things, but the caller isn't referencing the wider concept of alphabetizing, they are referencing a piece of code that happens to have the string "alphabetize" as its identifier.
At this point, users are entirely within their rights to ignore the new function and keep using the old one. You may think that everyone should use your new and improved version of "alphabetized", but just because you think that the new version is better doesn't mean anyone else does, or that you have the right to take up their valuable time. A lot of times, the most important feature of software is it not being disrupted. Maybe your old version has a huge security hole in it and you desperately want to update the end users' code, but that still doesn't give you the right to impose your code on them. Maybe they are working towards a deadline and the risk of not meeting the deadline is much worse than the risk of getting hacked, maybe they are on an airgapped network; you don't know. The point is you don't know the priorities of your users, and even if you do, you don't have the right to impose your priorities on them. I have sat through lectures at big conferences where 500 people had to sit and wait for 10 minutes while the speaker's laptop force rebooted to install a security update. If that is security, what are we securing against?
Even if the function is only used in code you have 100% control of, I would still suggest writing a new function (possibly by copy-paste-modifying the previous version), then searching for every use of the original, replacing the old one with the new by hand where appropriate, and then removing the original if you find that it is no longer in use.
What if you later realize that you need versions of "alphabetize" that handle hexadecimal numbers, UTF-8, Roman numerals, or a bunch of other things? You could end up with an array of different implementations to maintain. Right? Right, and that is your problem: you wrote the code, and the fact that it didn't have the features it needed is your problem, not your users'. When you put code out into the world, it's your responsibility. You are in service of your users. If they choose to use your software, it doesn't give you the right to take up their time or decide their priorities. Learn to see yourself as a platform holder.
If this seems to lead to an impossible maintenance situation, where the amount of effort needed to maintain all versions will overwhelm any developer or team, consider this: there is no maintenance of code that doesn't change. The old version of "alphabetize" doesn't need to be updated, because the entire point of keeping it in its original state is to preserve its behaviour. A single implementation of "alphabetized" that is constantly maintained and updated with new features and behaviours will continuously break the software that depends on it, and that causes the maintenance needed to go up much more.
If you need to change an interface, do so by adding a new interface while making sure that the old one keeps working, either by writing a wrapper around the new code to emulate the old interface, or by forking the code so that both are available separately. You may document the old version as deprecated, but you can't remove it. Someone may depend on it, and there is no time in the future when that isn't true. You may think this sounds really messy, having old versions of old interfaces in your code, but that's your mess, you created it. It shouldn't be your users' problem. You wrote the bad interface, so it's your problem and responsibility to fix it. If forking your code means you have to maintain multiple versions, that's fine. Why should everyone else have to stop what they are doing and care about your mistake, rather than you fixing a bug in 3 different versions of your code? If you release code that you expect others to use, you need to take responsibility for it.
You need to get over the idea that having multiple versions of the code available is bad. Many computer scientists got started at a time when computer memory was extremely small, and they therefore put a huge premium on making the executable as small as possible. This is no longer the case, but the culture persists. Many would define "bloat" as a large executable. But who cares how many instructions your program has? What matters is how many are executed, in other words how performant the software is.
I was once told by a developer that Microsoft Visual Studio contained several entirely separate implementations of older versions of Visual Basic. This was told to me as a horror story of how not to do software design. Somehow the existence of older versions upsets programmers' sense of aesthetics. I regard this as excellent software design. There are probably thousands of software projects written in the various old versions of Visual Basic that Visual Studio supports, and to them, being able to still compile and execute those projects is crucial. The knee-jerk reaction from programmers is that these projects should be rewritten to conform to the latest version. But why? Why spend years of development rewriting something that already works, upsetting a functioning workflow in the process and most likely introducing new bugs? If you are going to rewrite software you should do so for a reason.
It is very important to communicate to your users what state an API is in. There is value in releasing half-baked APIs to get feedback early, but at that point you need to be clear with your users that the API is likely to change, and that any time invested in using it may be wasted. It is also important (and I'm afraid this is something that needs to be said) to communicate if the entire interface, or parts of it, is not yet implemented or tested. Make clear what your intentions are: have you released something because it is completed, or as a sketch? Do you intend to support it long term, and if so, what changes are expected?
Early in development you can write faux APIs where the underlying implementation does not implement all features but still exposes how the features will be presented in the future. If an implementation only has one mode but is intended to have more than one, create the indirection where the user has to select that one available mode. Add bitfields with optimization hints before the underlying implementation is able to take advantage of them. There is a range of things you can do to create a front end for your code that is forward compatible even if it is not yet fully fleshed out under the hood.
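As a sketch of what such a faux API can look like (all names here are hypothetical), the caller is forced to select the single mode that exists today and to pass optimization hints that the current implementation simply ignores, so future modes and real hint handling require no changes on the caller's side:

typedef enum {
    RENDER_MODE_SOFTWARE = 0 /* the only mode implemented so far */
} RenderMode;

/* hint bitfield; accepted now, acted on in a later version */
#define RENDER_HINT_NONE      0
#define RENDER_HINT_STATIC    1 /* data will not change between calls */
#define RENDER_HINT_STREAMING 2 /* data is written once and drawn once */

void *renderer_create(RenderMode mode, unsigned int hints);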
Another possibility is to make it possible to query information that may change over time out of the API.
Example: instead of having a struct with a fixed list of settings that needs to be implemented in a UI, make all settings, their names and their types queryable programmatically. That way a program can implement a UI, settings file or similar that automatically exposes whatever settings are available. If the settings change, the implementation will automatically support them without any intervention from the user.
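A minimal sketch of what such a queryable settings interface might look like (the names are hypothetical): a UI or settings-file module can enumerate whatever settings exist, so new settings show up automatically.

typedef enum {
    SETTING_TYPE_BOOLEAN,
    SETTING_TYPE_INTEGER,
    SETTING_TYPE_STRING
} SettingType;

unsigned int settings_count(void);              /* how many settings exist */
const char *settings_name_get(unsigned int i);  /* human readable name */
SettingType settings_type_get(unsigned int i);  /* how to edit and store it */
void settings_value_set(unsigned int i, const void *value);
void settings_value_get(unsigned int i, void *value);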
A lot of programmers have an instinctive feeling that something is wrong when they have multiple implementations of things in their code. I think this is a cultural legacy from a time when programs had to fit in very small amounts of memory. Today, as far as executable code goes, we have virtually unlimited memory.
Channel that instinct into making sure you get your APIs right the first time, rather than ignoring the problem.
Some large companies and open source projects have come to the conclusion that they have enough market power that they do not need to consider how much work their changes impose on others. Beyond the obvious bad manners and general disregard for other people this shows, I think it is just plain wrong. People use your platform to do something, not just for the sake of using it. Every moment they have to make changes to accommodate your platform is a moment they aren't making the applications that use your platform better. There is a huge hidden opportunity cost here that should not be ignored. The effect of not managing your platform is one you are likely not to discover for a long time. The sunk cost of your existing users will keep them on for a long time, but new projects will choose a different platform, and once they do it is very difficult to convince them to adopt yours. Because of this lag, many platform holders are overconfident until it is way too late to course correct.
Don't get into the habit of writing slow or bad code when you think it doesn't matter. If you solve a problem, it should remain solved, and you should leave no reason to solve it again because it wasn't done right the first time. Returning to rewrite code that you thought didn't need to be fast comes at a big mental cost. If you get into the habit of always writing performant code, you only have to maintain and write one style of code rather than two. Write code that is fast on generic hardware, and allow the compiler to do its job. Don't waste your time optimizing for a specific compiler, hardware or platform unless you have to. The goal is to make code that is fast and will remain fast, not code you have to go back in and re-optimize.
I maintain a mental barrier against writing code that has O(n²) complexity where n is dynamic. A computer can obviously solve problems of quadratic complexity using a brute force approach, but I pretend that it can't. I am in the habit of always avoiding code with this or worse complexity. In your head you maintain a toolkit of solutions to various problems, and I choose not to carry this one, to force myself into the habit of writing performant code.
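As a small illustration of the habit (not taken from any particular project), here is a duplicate check written so that it never becomes quadratic: sorting first makes it O(n log n), where the obvious nested loop would be O(n²). Note that it sorts the array in place.

#include <stdlib.h>

static int compare_uint(const void *a, const void *b)
{
    unsigned int x = *(const unsigned int *)a;
    unsigned int y = *(const unsigned int *)b;
    return (x > y) - (x < y);
}

/* returns 1 if the array contains a duplicate value; sorts the array in place */
int contains_duplicate(unsigned int *values, size_t count)
{
    size_t i;

    qsort(values, count, sizeof *values, compare_uint);
    for (i = 1; i < count; i++)
        if (values[i] == values[i - 1])
            return 1;
    return 0;
}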
The reason code is slow is not that you haven't read up on the latest papers on sorting theory. It is almost always slow because your application does things it doesn't need to do. It is slow because it keeps accessing disk, waits for network connections, garbage collects, hasn't cached what it has already computed, runs in a virtual machine, makes too many system calls and has poorly designed thread locks. The best potential optimization is always figuring out a way to not compute something at all. Be explicit with operations that take time. Amortize large computations and decouple heavy computations from user feedback. Big-O complexity is only rarely the cause of poor performance. Also note that all common search and sorting algorithms assume you know nothing about your data. This is almost never true, and is another good reason why generalized code is less desirable.
Say you write the function int setting_get(char *name) that opens a file, searches it for the setting and then closes the file. If it can't find the file, it connects to a remote server to query the setting. Then someone uses this function like this:
for (i = 0; i < 10000; i++)
    array[i] = setting_get("Initial value");

An innocent-looking loop now opens, searches and closes a file, and possibly queries a remote server, ten thousand times.
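The explicit alternative is a sketch like this (reusing the names from the snippet above; initial_value is new and only for illustration): pay the slow lookup once, and make that cost visible at the call site.

int initial_value = setting_get("Initial value"); /* one file/network access */

for (i = 0; i < 10000; i++)
    array[i] = initial_value; /* plain memory writes */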
The job of a programmer is to produce a set of machine instructions that execute on some kind of hardware to do a task. If this is the definition of success, then our choices should be judged by the quality of the machine instructions they produce.
A garbage collected language objectively makes the software slower and use more memory, in exchange for what is perceived to be easier development. By choosing to use such a tool, you are choosing your own convenience over the quality of your output. A much better approach is to use a tool that helps the programmer without adverse effects on the end product. A house without a roof may be easier to build, but it is not a better house.
Example: The Linux kernel contains many gotos. While this may not make for the easiest code to read, it produces better assembly that saves execution cycles. The Linux kernel is not designed to be bedtime reading; it is designed to be an optimal operating system running on billions of devices. A single saved instruction saves an immeasurable amount of power and compute time across the world, and that is much more valuable than the perceived aesthetics of its source code.
If the choice is between a better user experience and a better developer experience, always choose the better user experience. This is, however, rarely a tradeoff that needs to be made. Most problems encountered during the development process can be alleviated with better tools that don't impact the final result negatively.
Low latency is almost always more desirable, and harder to achieve, than bandwidth. This is true in almost all domains: user feedback, networking, memory access and so on.
-Avoid optimizing for specific hardware
For a software developer it is easy to think that software is easier and faster to develop than hardware. Yes, it is easier and faster to write hello world than it is to solder the hardware needed to run it, but hardware gets replaced much more often. Most of the software we choose to use is over a decade old, but we rarely choose to use decade-old hardware.
When optimizing there are always gains to be had by knowing exactly how the hardware works, but the time it takes to optimize for the exact hardware is often outpaced by the development of new hardware. What you need to do is recognize which hardware trends are long term and which are short term. Algorithmic optimizations almost always yield longer-term results than optimizations targeting a specific hardware generation.
It is impossible to know the exact size of cache lines, the number of caches, or the memory latency of future CPUs, but one can assume that keeping data compact and accessing it sequentially will continue to be good strategies for memory access performance in the future. If you over-optimize for the specifics of current hardware you run the risk of having your code run slower, not faster, once new hardware comes out. All code should be designed to last for many hardware generations, and it therefore makes sense not to make assumptions about the number of cores, precise memory access patterns and other things that are likely to change in the future.
In 1972, Ed Catmull decided that he wanted to make a fully animated feature film. Later, in 1978, while working at Lucasfilm, his team conducted a test to find out the required resolution and bit depth of CG images projected in a theater (roughly 2000 by 1000 pixels at 10-bit depth). The REYES rendering architecture rendered polygons half the size of a pixel, so by benchmarking their renderer and multiplying by resolution, 24 frames per second, 60 seconds per minute, and the running length of a feature film, they could compute the computational requirements of a feature-length computer-generated film. At the time the numbers showed that making such a film was entirely unfeasible. But using Moore's law, which predicted that cost would fall at a steady rate, they were able to predict that it would be financially and technically possible sometime around 1994. In 1995, Ed and his team, having spun out of Lucasfilm to form Pixar, released "Toy Story", the world's first computer animated feature film.
With a simple computer, a compiler and the will to do something, it is possible to create any software. The hardest thing about programming is to keep yourself motivated. How you keep yourself motivated is different for each person. For some people it comes easily, and some find it hard. My advice on this subject is therefore very personal and may not apply to you at all. One universal piece of advice I can give is to see motivation as a skill that needs to be worked on constantly. There are plenty of people who know everything but do nothing because they lack motivation. Recognize it as a challenge, and a challenge worth meeting. Since it is so individual, you need to make an effort to figure out what works for you.
Programming can be tedious, so I tend to identify the things I find tedious and try, in one way or another, to find a way to make them interesting. That may be done by creating automation for these tasks, by trying a novel approach to the problem, or by implementing it in such a way that it can be reused or solve multiple problems at once. I find that it is much better to expend more effort on a larger solution that I am excited about than to write something much simpler that doesn't motivate me. It is not just about the number of hours something takes, it is about how many hours you are motivated to spend.
A good programmer is someone who puts in the hours. It is someone who is a self-starter and is always eager to learn. Stay hungry.
There is no code that doesn't deserve to be well written. If it is not important enough to be well written, why are you working on it? Do something that matters with your life. Don't ever even consider whether something you write is beneath your best effort.
Maybe you think there is an opening to revolutionize the world by reinventing software for insurance. Insurance has a lot of money in it, so maybe this is a great opportunity for success. Maybe the status quo is terrible and you can really make a difference. Before starting this endeavour, first consider that it may not be a great opportunity at all. What if it turns out no one wants to pay for your insurance software? Then what? If you are still passionate about writing this software even if no one ends up using it, then you should go for it, because you are probably the perfect person to do it.
Most things fail, but if you make something you would enjoy failing at, then you can't fail. It is the only guarantee of success I know.
There are simple things that are used by billions of people, and there are complex things that are used by very few. Don't assume a niche use case makes something simple to execute on. Also don't assume that just because a huge company dominates a market, what they make requires huge resources. The higher up in the software stack you get, the more niche you get, the more code is written and the harder it gets to imagine a world where the layers beneath you look different or work in another way. The greatest trick the big tech companies ever played was convincing the world that competing with them is impossible.
Don't waste your time deliberately making things that aren't great. If you can't imagine it taking over the world, why bother? If you are going to do things, do them right. If you are a company making a product, design it with the intention of being better than the best competitor. It sounds like really obvious stuff, but I keep being surprised by how often I hear statements proving that it is not obvious. "Our users don't need all the features", "It doesn't have to be fast", "We don't compete with the big boys", "Ours doesn't cost as much / is free" are all things people tell themselves in order to not have to do what should be done. It never works out. It is always a wasted effort. You don't capture 20% of the market by putting in 20% of the effort. You capture 0%. Writing code that isn't as good as it can be is always more expensive. Sometimes I hear people say their implementations are just training exercises, so they don't have to be good. I keep thinking: what are you training to be? Mediocre? If you are going to do something, do it right. If you aim to make the best thing but don't know how, you can study the subject and run experiments. Seek out information and people who can help you. If you don't aim to make something good, no skills or resources can save you. You are wasting your life.
In life you will find loads of successful people who all have had advantages you don't have. They will talk about how easy some things are that you find impossible. Instead of being discouraged by this, try to focus on what your unfair advantages are. The world needs what it doesn't have, so trying to follow in the footsteps of success is a sure way to not be needed. Maybe you know things others don't know. Who do you know that can help you, or who do you know that needs help? Do you live near interesting people you can reach out to? Even if you have nothing but yourself, consider the time you save not going to all the meetings you would have to attend if you were part of a big, well-funded team. Someone you admire is probably envious of you in a way you may not see. Always remember that programming does not scale. A single developer can do things billion-dollar companies fail at. Figure it out.
Cancer will not be cured because someone figures out the one magical cure that no one thought of before but could have thought of 20 years earlier. It will be cured because we will build more advanced tools that let us study, understand and then modify cells. Cancer, like so many other problems, is a tooling problem. The world is brought forward by enabling technology. Standardized shipping containers, UPC barcodes, SI units, the electron microscope, the Haber-Bosch process, injection molding, semiconductor lithography, IP routing, GPS, have all touched our lives in immeasurable ways. They are not necessarily front and center of our daily lives, but most of the stuff we have around us has been enabled by these things.
Learn to recognize this, not only on the global scale but in the microcosm of the software development you conduct. Truly revolutionary technology, like the transistor, doesn't necessarily have a direct use; rather, it enables others to build things that do. When you choose to develop technology, don't just focus on your goal. Think about what technology will enable you to reach that goal. Analyse what problems you are likely to encounter, and build technology to solve those problems. It is easy to think that this only applies to large organizations that have the resources to dedicate to tools development, but in my experience, taking time off from projects to build tools that help those projects has always been a good investment of time, even as a single developer.
When you write code you are building a mountain. Each new piece of software you write should make use of modules you have written in the past and add new modules you can reuse in the future. When choosing what to do next, don't just consider the product of the project; consider what possible future projects will be enabled by the technology you create for it. The best investments are not in technology that gets you that one great product. The best technology investment is the technology that gives you the widest range of options.
Apple didn't just build the iPhone overnight. They first had to build OS X. They built QuickTime so that they could do media playback. That made iTunes and the iPod possible. iTunes made the iTunes music store possible. Separately they created Safari in order not to be dependent on Microsoft's Explorer browser. They built iPhoto and iMovie using QuickTime in order to manage media. Only once all these things were in place were the iPhone, and later the App Store and the iPad, made possible.
Let's say you are implementing a video editor. You need to write a module that can encode and decode video. You may think this is just plumbing that you have to get past so that you can implement your amazing new video editor concept that will take the world by storm. This is the wrong mindset. Modules are more valuable than applications. Your video editor can meet two fates: either it is a success or a failure. If it is a success, you will want a strong foundation to build on, so the quality of your video module is important. If it is a failure, you won't use the video editing code, but it is not unlikely that you will do some other project that requires you to encode or decode video. Done right, the survival rate of modules should be higher than that of applications. If you have written good modules, you have hedged yourself against failure. Even projects where video support isn't high on the list of needed features can get it at little cost if the module is robust and easy to integrate. See the video editor project as an opportunity to conquer video encoding and decoding once and for all, so that everything you embark on in the future will be able to make use of video. Make it one of the foundational capabilities in your arsenal.
You can do this at a company level or on a personal level. If your company doesn't do it, try doing it for yourself. Go home and write the base of the module, and publish it online as open source. Then tell your employer that they would save a lot of time using your open-source library. You agree to guarantee them usage rights, and they give you the right to contribute to the open-source project on work time. Everyone wins, and now, if you ever need that code, it is yours to use no matter where you work. Don't lose valuable code just because management decides to drop your project. (Tip to managers: programmers HATE having wasted effort on cancelled projects, so giving them the right to open-source any cancelled project is a valuable incentive.)
The lesson is: think hard and get it right the first time. When things go wrong, own it. A 10x programmer is not a programmer who can implement an algorithm in 10 minutes flat; it is the programmer who can write the same thing in a way that lasts, unchanged, for tens of years.
Get into the habit of always writing good, reusable, dependable, dependency-free, performant code. Don't think that some code is throwaway or that there are cases when it doesn't matter. Don't assume you will have time to rewrite things at a later date, or that performance won't be important. Every time you write something you are practicing your craft, so don't waste your time practicing writing bad code.
Programmers are like multi-processor systems: avoid the need to synchronize at all cost. Like processors, programmers are vastly more efficient at doing work than at interfacing with one another. Divide projects into modules and assign no more than one programmer to each module. If a module is too large, divide it into smaller modules, or create modules with sub-modules (an image loading module with sub-modules for different image formats). No module should be larger than what one programmer can implement from scratch in 6-9 months. If they are larger than that, projects become too dependent on one person.
Whenever people bring up various collaboration methods, I am reminded that all programmers collaborate with people they have never met. When we read documented APIs written by other people we are (often very successfully) collaborating. In that way, many programmers are better at collaborating with Ken Thompson than they are at collaborating with the members of their own team. If this form of external collaboration is so successful, we should try to replicate it internally within organizations.
Often it is hard to justify writing internal interfaces and documentation at the level of quality that is expected of externally visible software. Why spend the time cleaning up code and documentation when you can walk over to whoever wrote the code and just ask what you need to know?
I would argue that collaboration is so detrimental to productivity that even collaborating with your past self is difficult enough to warrant a strategy of modularization and interface design even on a one-man project. Engaging with code that you wrote just a few years ago can be risky enough that it is warranted to rewrite it rather than try to understand it in order to make major modifications.
If you are working on a project too large for one person to complete, don't simply add more people. Instead, imagine what the world would look like if it were possible for one person to complete such a project, then add people to create that world. Maybe development tools need to be better? Maybe libraries and utilities need to be available? You can write a list of all the things that would need to exist for one person to accomplish the goal, and then assign people to it. Some of those things may not be possible for one engineer to produce, so then you have to ask yourself what the world would look like to make it possible for one person to write that thing. Now you assign people to make that world happen. By doing this you create an isolated structure of modules that increases your productivity and flexibility. Once you have created the world where it is possible to write the project you want with only one person, you can look at what new opportunities this new world affords you. You can, in theory, hire just one person to write an entirely separate product that uses the same modules.
The eleventh person added to a team isn't going to make the team 10% more effective, due to diminishing returns, but an eleventh team member can write a tool that makes the rest of the team more than 10% more productive. As the team grows, the value of that 10% productivity increase grows rather than diminishes. The lesson is: people don't scale, tools do. Enabling technology is force multiplying.
If you erase the difference between a software interface that is used only by you, one that is used only within your team, and one that is publicly available to anyone, we need to dig into what that means for you as a developer. There are a bunch of assumptions that can't be made when you don't know who you are collaborating with. You don't know what they use your code for, you don't know what aspects and features they depend on, and you don't know what their priorities are or what they have time to invest in. This means that you have to be very careful in your decisions in order not to upset their work.
In reality, this form of software development drastically reduces the maintenance needed from developers, because it leads to:
All this stresses the importance of interface design. Interface design is what creates the necessary distance between the implementer and the user, so that both can be productive and innovative without upsetting the work of the other.
We all like to show off as great programmers, and most of us know that making something that looks like it works takes a lot less time than making something that is complete. When you have an early version of something that sort of works, try really hard not to show it off to anyone, especially your superiors. Yes, it is cool, and you are excited about it, but don't give anyone false impressions of how much effort is needed to complete it. Show it when it is done. Technical debt is built when engineers show off work that appears to be done when it isn't. If it looks done, your manager will assign you a new task, and you have put yourself in a bad position to negotiate for more time to complete the task they think is already done. When you show off your work too early you front-load all the fun and all the praise, and you have nothing to look forward to once all the hard work is really done. Remember: the first 90% is easy, it is the second 90% that is hard. Save the champagne for the end.
I am fundamentally wary of hackathons, where the objective is to hack something together in a limited time. I know of numerous impressive projects that were made in a few days, and that then took years to get off the ground, or failed completely, because of a weak foundation. Being good at hackathons doesn't set you up to be a good software engineer.
When working in a team, or for a customer or other stakeholders, you will inevitably come under pressure to do things that will compromise your work. You will be given unrealistic deadlines, feature requests, and scope creep. Most of us want to be good team players, and we want to say yes, but I find it imperative that you stay firm and protect your space. If you give in to unrealistic goals, you will eventually make your work impossible. This is why I think you need to fiercely protect your space. Give yourself the time and space to do things right. If doing it right takes 3 weeks, but you can hack something together in one week, don't fall for the temptation of doing it in one week. You won't get 2 weeks at a later point to fix it. Learn to say no.
Protecting your space is not just about protecting yourself; it is about protecting the team and the project. If technical debt builds, it will impact everyone's work. Instead of having the choice between a 1-week hack and a 3-week job done right, you will have the choice between a 6-week hack or a year-long rewrite. This benefits no one. You are responsible for the quality of the work you do. If you say yes to every unrealistic deadline you are given, then you are responsible when things go wrong. Management and customers can't know the technical implications of every decision, so they can't be responsible for them. This is a responsibility engineering has to take. Sometimes it is your job to protect them from themselves. Everyone wants everything all the time, but know that in the long run, everyone appreciates consistent, dependable delivery on time and on budget.
I believe there are 2 kinds of deadlines: aspirational deadlines and dependency deadlines. Aspirational deadlines are "Let's land a man on the moon in this decade". It is a date chosen to rally an effort to get something done. These deadlines have no meaning other than "Let's go do something". Dependency deadlines, on the other hand, are like "The train leaves the station at 5 o'clock". If you are not there at 5, you will miss the train, so you might as well not bother showing up at all. Learn to recognize what kind of deadline you are dealing with. If you are dealing with dependency deadlines, where other people need your work by a set date, then that deadline needs to be respected.
The best way to protect your space is to be vigilant about controlling information. Management and customers won't read your code, so they are entirely reliant on you to tell them what is possible, how long it will take and what the risks are. You provide them with the vast majority of the information they need to make decisions about what you are to do. Give actionable information that lets them make those decisions. You don't need to tell them every detail. If you need to do 2 weeks of cleanup before you can deliver the feature that takes one week to implement, tell them the feature will take 3 weeks to deliver. Do not assume that they care about what you care about.
Do your best to understand not only what they want but also why. A lot of the time they ask for something complicated when they really just need something simple, and it is your job to figure that out. Managers have a tendency to reveal plans step by step, and this can cause a lot of trouble. Ask the important questions about the limitations of the software upfront. Does it need to be networked? Is it single user only? What platforms will we target? Make sure you explain, in the clearest terms possible, that changing their mind later has a huge cost associated with it. The thing you want to avoid at all cost is to have someone come in, after years of developing a native app, and say "Great, now it just needs to run in a browser. This has been the plan all along, I just wanted you to focus on other stuff until now." At that point you want to be able to bring up an email or something that makes clear that you told management 3 years ago this wouldn't be possible without starting over. You need to protect your team, your organization and yourself against this. These kinds of decisions kill projects, and entire companies. Protect your space.
Most engineers aren't too fond of management, or management tasks. Most engineers just want to engineer. But at some point or another you are going to want to engineer things where the effort is simply too much for one person. When you reach that point you have three options: relinquish control of the project to someone else who will manage it, give up the project, or manage a team yourself. I don't know who you are as a person, but I would encourage you to be open to the idea of managing other people.
If your heroes include engineers like Henry Ford, Thomas Edison, Jim Keller, Linus Torvalds, Elon Musk, Satoru Iwata, Kelly Johnson, and Edwin Catmull, recognize that they were all able to supercharge their creations by bringing in and managing a lot of people.
But remember: the only thing worse than having to make decisions is when someone else makes them for you.
You may not see yourself as the management type; you may not like making charts and slide shows, wearing a suit and hanging out with other management people who seem to care more about corporate politics, status and money than about making good stuff. The open secret, of course, is that managers don't have to be like that. If you become a manager you can be any kind of manager you want. Don't like suits? Then don't wear one. You can ban slide shows if you don't like them. Don't like useless meetings? Then don't hold useless meetings. Do the meetings the way you want, when you want them and for as long as you think is right. The best managers are enablers, people who help other people get stuff done. Sometimes by setting rules, but more often by removing red tape and obstacles. A good manager is flexible and works to make the work environment suited to each individual's needs and talents.
Many managers add layers of structure in order to gain control of something they fundamentally don't understand. If you become a manager, you don't need to cover for a lack of understanding, because you understand. Engineers don't always make good managers, but they do make the best managers. The best managers are the people who could do the work themselves and know how to get stuff done. They earn the respect of the team and they focus on the work.
There is no perfect code. There is only better or worse. There is no way to eliminate all possibilities of failure. There are people attempting to build "verifiable" code, where code can be verified to work correctly against a specification. While this is possible, and has been done, you now have new problems: is the specification correct? Is the verifier bug free? In the end it always comes down to "Does this thing solve its problem in the best possible way?", and that is never something that can be proven. It is always our best guess. Yes, the software that controls the plane you are flying in is just some people's best guess at what the software in a plane should do. Such is life. There are no guarantees against an asteroid destroying all life on earth, being hit by lightning, or cosmic background radiation flipping that one critical bit in memory.
So how do we manage? We do our best and prioritize what we think has the highest risk of failing. We have limited time and resources, so let's look at where they are needed. What is most likely to break? What is most likely to be hard to debug? What bugs are most likely to cause catastrophic failure?
Sometimes, when bugs are simple, you can just look at the code and see the problem; you don't need to read a chapter of a book to fix those, so here we will focus on bugs that are harder to find. As lazy people we hope that bugs are solvable by just looking at the code, and too often we avoid systematically investigating a bug in the hope that some obvious solution will present itself.
Users don't see bugs, they see the symptoms of bugs. It is really valuable to separate the symptoms from the bug. Bugs with obvious and instant symptoms are hard on users, but easy for developers to get a handle on. Conversely, bugs with barely any symptoms noticeable to a user are hard for developers to deal with.
Bug mitigation and debugging therefore have fundamentally different goals: mitigation wants errors to have no impact at all, whereas debugging wants bugs to have an instant impact. Debugging is often the process of producing more symptoms of a bug, until the bug produces enough symptoms that it can be understood.
Any investigation is the art of time travel. An application does the wrong thing because of something it did in the past, and it is your job to construct a time machine and travel backwards in time to find the fault that caused it. The shorter the time you have to go back, the easier it is.
This observation tells us that code with the shortest possible distance between bug and symptom is the easiest to debug. If the cops show up while the killer is still holding the knife stuck in the victim's chest, the crime is a lot easier to solve than if you find a corpse that has decomposed for a couple of years. So ideally you want bugs to call attention to themselves as quickly as possible.
A defensive programming approach is therefore based on having code fail faster when there is a bug. Some bugs blow up right away (de-referencing NULL on a memory protected system), and therefore, if you run your application in a debugger, they will be easy to solve. Defensive programming is about anticipating the bugs that will be hard and putting in the systems needed to catch and debug them.
ABA is a bug based on the possibility that an item is referenced, then the item is removed, and a new item is created that happens to get the same address as the original item. A reference made to the first object now inadvertently refers to the second object. In normal operation this is a bug caused by a failure to remove the reference to the first object. It can be tricky to find, as it may not be obvious that, for instance, multiple allocations can yield the same pointer. ABA bugs become significantly harder to deal with when you are doing lockless programming that relies on compare-and-exchange instructions.
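Here is a contrived sketch of the underlying hazard, not tied to any particular code base: a stale reference to a freed item can end up comparing equal to a brand new item, because the allocator may hand back the same address (whether it actually does is up to the allocator, so treat this purely as an illustration).

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *item_a = malloc(sizeof *item_a);
    int *remembered = item_a; /* a stale reference someone forgot to clear */

    free(item_a);
    int *item_b = malloc(sizeof *item_b); /* may get the same address as item_a */

    if (remembered == item_b)
        printf("the stale reference now points at a completely different item\n");
    return 0;
}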
Compiler bugs are rare, and therefore you want to eliminate all other possibilities before blaming the compiler. While compiler bugs are rare in mainstream compilers on mainstream platforms, they are less rare in less used compilers and on more exotic platforms.
-Printf
Printf is probably the most common debugging tool. Debugging is mostly about figuring out what your code does, so printing out what it does is an obvious solution. The problem with printf is that you need to know in advance what information you want out of your code, and you need to instrument your code before you run it. A good debugger, on the other hand, can give you the entire state of a program in an explorable form.
Printf's most valuable use is to log the changing of state and the flow control of an application. A debugger can tell you the state of a program, but often has trouble logging what happened during execution.
When printing out information I try to make the printouts look and feel like the code I am debugging. You want your debug output to be as readable as possible, and since you are reading the code you are debugging, it makes sense to retain the naming, syntax and formatting of the code in the debug output:
printf ("array [%u ] = %u ;", i , array [i ]);
When looking at this output, the variable names will be recognizable, and as the number of printouts grows it will be easier to keep track of which printout corresponds to which data.
Some caveats about printf: printouts are buffered on some platforms, meaning that when an application crashes, printouts near the end may be lost. Try using unbuffered output, like fprintf(stderr, ...), if this is a problem on your platform. Another issue with printf is that it is slow and may change the timing of the application significantly. If the application is multi-threaded, or depends on time in other ways, this can significantly alter the behaviour of the application. This, however, is also a clue: a bug that goes away or changes behaviour when printfs are added might be timing based.
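Two common remedies, sketched here with a hypothetical variable "state": write to stderr, which is typically unbuffered, or explicitly turn off buffering on stdout.

fprintf(stderr, "about to call suspect_function, state = %u\n", state);
setvbuf(stdout, NULL, _IONBF, 0); /* all later printf output becomes unbuffered */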
-Write code for breakpoints:
void array_set(int element)
{
    int array[10];

#ifdef DEBUG_MODE
    if (element >= 10)
        element += 0;
#endif
    array[element] = 1337;
}
In this case I am using the line "element += 0;" as a no-op that I can put a debugger breakpoint on. "+= 0" has no effect, and that can be an advantage, since if you accidentally leave a test like this in your code, the compiler will remove it for you at higher optimization levels. You could use "-= 0", "*= 1", "|= 0", or a number of other operations that also have no effect, but I exclusively use "+= 0;", for the simple reason that this 5-character sequence becomes easy to search for in your code base. Every time you see it, you know what it is for and that it is not a mistake.
Sometimes you want to share or even ship code with debug code in it, and then breakpoints won't help much. In that case I prefer writing a clear error message printout followed by an exit. As your code base grows it helps to always include the module name in the error message. That makes it easier to triage the problem and hand it to the right person right away.
void array_set(int element)
{
    int array[10];

    if (element >= 10)
    {
        printf("Module name Error: array_set given element value %i\n", element);
        program_exit(0);
    }
    array[element] = 1337;
}
While an exit may be graceful handling in shared software, it is not very useful for debugging. If you are running the application in a debugger and the application calls exit, you immediately lose all state and are unable to use the debugger. A break, or even a crash, is much more useful, since it will initiate a debug session and give you a stack trace and access to variables and other data. You can put a breakpoint on every exit, but breakpoints are not portable between projects and compilers, so I prefer to simply crash:
void array_set(int element)
{
    int array[10];

#ifdef DEBUG_MODE
    if (element >= 10)
    {
        unsigned int *a = NULL;
        a[0] = 0;
    }
#endif
    array[element] = 1337;
}
This code deliberately writes to NULL in order to crash. Again, I always use the same pattern when producing this crash, to make it easy to search for. If your code has graceful exit calls when something goes wrong in release mode, consider replacing them with a crash in debug mode:
#ifdef DEBUG_MODE
extern void exit_crash(int value);
#define exit(a) exit_crash(a)
#endif

void exit_crash(int value)
{
    unsigned int *a = NULL;
    a[0] = 0;
}
Having a lot of localized debugging information is only useful if you know what part of the code needs to be fixed. There are times when that can be hard, for instance when a set of data is touched by many different systems. If you wrap the data access so that all code has to go through specific code, then that is a prime place to put debug code:
void important_value_set(unsigned int value)
{
#ifdef DEBUG_MODE
    if (value == 1337)
        value += 0;
#endif
    important = value;
}
A lot of the time bugs only show themselves in very specific circumstances. It is therefore often a good idea to write such breakpoint traps with many different tests and analyses. Once you have found a state that causes the issue, you may want to consider writing separate code that recreates various similar test conditions, to test your code. Writing more code to verify and control the situation is almost always a good idea when you are debugging.
unsigned int my_function(unsigned int a, unsigned int b)
{
    a = math_transform(a);
    a = other_math_transform(a);
    return b / a;
}
Let's assume this code runs and crashes at the end because 'a' is 0, which causes a divide by zero. How do we debug this? Something clearly happens inside either math_transform or other_math_transform that causes 'a' to become zero. The problem is that since 'a' is constantly being overwritten, we don't have a complete history of what has gone wrong. Ideally we would like to set a breakpoint at the beginning of the function so that we can step through the program and see the evolution of 'a'. The problem with this is that the function may run hundreds of times before 'a' ever becomes zero, and we don't want to step through the program hundreds of times making notes of all operations. One solution is to detect the faulty state, and then re-run the code that produced it, in the debugger:
unsigned int my_function(unsigned int a, unsigned int b)
{
    unsigned int original_a = a;

    a = math_transform(a);
    a = other_math_transform(a);
    if (a == 0) /* faulty state detected: put a breakpoint here */
    {
        a = math_transform(original_a);
        a = other_math_transform(a);
    }
    return b / a;
}
Whenever the code produces the faulty state, the debugger will break, and right after the break the added code re-runs the computation that produced the faulty result. As soon as the code breaks, you can step into the functions and carefully follow their operation in order to find out why they produce the wrong result.
Sometimes this approach is cumbersome to use, because retaining the state needed to re-run the offending code is not easy. If you have a reproducible, deterministic bug, a simple solution is to add a static counter to the function:
unsigned int my_function(unsigned int a, unsigned int b)
{
    static unsigned int debug_counter = 0;

    debug_counter++;
    if (debug_counter == X)
        debug_counter += 0;
    a = math_transform(a);
    a = other_math_transform(a);
    return b / a;
}
Now we can run the program, wait for our debugger to catch the divide by zero, check the value of "debug_counter", replace X with that value, and set a breakpoint on the "+= 0;" line. Now we can easily step forward and follow the execution that will cause 'a' to become zero.
Most bugs are simply logic bugs where the code doesn't do what you think it does. These are the most common, and one can't give very much advice on how to tackle them, as they depend entirely on the problem at hand. As a general rule, since this class of bugs almost always comes down to you not understanding what the code does, the solution is almost always to get a better understanding of what your code does: by reading the code, running it in a debugger, and adding code that outputs more information about what is going on. In a later section we will discuss more specific classes of harder bugs and how to approach them.
Debugging the stack is harder than debugging heap memory for a few reasons. Given that the allocation and layout of the stack is not controlled by the programmer, it is much harder to know how it works. Generally, stack overflows do not trap the way heap overflows are likely to do. Stack overflows also often overwrite local variables, and that may make the code harder to debug. Consider the following bug:
int array[10], i;

for (i = 0; i <= 10; i++)
    array[i] = 0;
This C code contains undefined behaviour, so anything can happen, but what is likely to happen is that the variable 'i' will be placed after the array, so "array[10] = 0;" may result in "i = 0;", giving you an infinite loop. This is a simple example with only two variables, but in more complex code it can be easy to miss that a write to "array" could impact "i". With experience you learn to recognize the signs of stack trashing, but to prevent bugs like this, I recommend avoiding the stack for dynamically addressed arrays, and using heap memory instead.
Heap memory also has the advantage of being dynamic in size. This means you don't have to build a limitation into your code on how big your data set can be. (See the limitations discussion.) Heap memory does have the disadvantage that it needs to be explicitly allocated and freed, and that can be a slow operation. One approach is to combine both in cases where the data set is likely to be small but you want to avoid setting a limit:
#ifdef DEBUG_MODE
#define MAX_STACK_USE 1 /* in debug mode, force the use of allocated memory */
#else
#define MAX_STACK_USE 256 /* threshold for stack use; pick something suitable */
#endif

void function(int data_size)
{
    int stack_buffer[MAX_STACK_USE], *buffer;

    if (data_size > MAX_STACK_USE)
        buffer = malloc((sizeof *buffer) * data_size);
    else
        buffer = stack_buffer;
    .... /* compute using buffer */
    if (buffer != stack_buffer)
        free(buffer);
}
The above code uses the stack when the data set is small, but allocates memory when the data needed exceeds a set limit. This is a good approach when the data set is expected to be small but can't be guaranteed to be small. Notice that in debug mode we set the limit to one, to force the use of allocated memory, which is easier to debug. (Unfortunately the C standard does not allow arrays of length 0, with some exceptions.)
When designing an API, you can make using it a lot easier by building in debug facilities, either always on, as part of a debug mode, or configurable by the user. What are the pitfalls you can expect your users to fall into? Often people go straight to coding when using a new API, so building the documentation into the API is very useful. If you get questions about your API, use those as hints to what people are having trouble with, and try to incorporate debugging facilities that address those issues. Some requirements can be hard to express using the API itself, and are therefore great candidates for self-debugging APIs.
If a user uses the API in a pattern that is bad for performance, like allocating and freeing buffers instead of reusing them, a debug mode can detect the pattern and warn about it; a minimal sketch of such a check follows below.
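A sketch of such a check, with entirely hypothetical names: in debug mode the module counts buffer creations and complains when the count suggests the caller is re-creating buffers instead of reusing them. The threshold is arbitrary.

#ifdef DEBUG_MODE
static unsigned int buffer_create_count = 0;
#endif

Buffer *buffer_create(size_t size)
{
#ifdef DEBUG_MODE
    if (++buffer_create_count == 10000)
        printf("Buffer module warning: 10000 buffers created; consider reusing buffers instead of re-creating them.\n");
#endif
    return buffer_allocate(size); /* hypothetical internal allocation */
}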
Another useful debug facility is to capture where in the user's code a faulty call was made, by routing calls through a macro that passes the caller's file and line to the debug build:

#ifdef DEBUG_MODE
extern void my_function_debug(int a, int b, char *file, int line);
#define my_function(a, b) my_function_debug(a, b, __FILE__, __LINE__)
#else
extern void my_function(int a, int b);
#endif

#ifdef DEBUG_MODE
void my_function_debug(int a, int b, char *file, int line)
#else
void my_function(int a, int b)
#endif
{
#ifdef DEBUG_MODE
    if (a > b)
    {
        printf("Error in file %s on line %i calling the function my_function: parameter a (%i) can not be more than parameter b (%i)\n", file, line, a, b);
        exit(0);
    }
#endif
    ... /* the normal implementation */
}
You can spend significant time adding facilities like this that foolproof your API. In theory you can make an API that the end user cannot accidentally misuse.
Good code is easy to debug. I advocate writing code just for the purpose of debugging. Especially if you know that a portion of code is going to be hard to get right, it makes sense to think in advance about how it will be debugged. A cornerstone of debugging is being able to find a repeatable behaviour. This means that making your code as deterministic as possible helps a great deal. This can be especially hard if your code is multi-threaded or takes a lot of live input. In these cases it may be worth investing the time to create the ability to record all input and play it back in a repeatable manner. Creating facilities like logs, and even calls that let you plot graphics, can be very good preparation for a project. Writing data structure validation code can also be time well spent early in a project. As a rule, writing debug code is a good investment of time.
While it is valuable to think "strategically" about what debug code will be useful for a project, it is also useful in the shorter term. Whenever I am stuck, I start writing debugging code. It keeps you busy, it gives you something to do, it hopefully contributes to figuring out the issue, and even if it doesn't, it may help you find other issues now or at a later date. Some of this debug code may be deleted the moment it reveals the cause of the bug, but some will grow into facilities that are used repeatedly in development.
Unlike murder investigations, debugging usually lets you have a do-over. You can run the software over and over, each time instrumenting it with more and more debug output that narrows down the issue. This is a huge advantage, but only if the do-over can reliably replicate the issue. If the application is 100% deterministic, this is easy, but a lot of applications are not deterministic, because they depend on external factors such as time, networking, user input and so forth. Limiting these factors can still help. If not all code can be made deterministic, perhaps portions can be. If possible, I find it is always desirable to be as deterministic as possible. One possibility is to add the ability to record non-deterministic factors, in order to be able to replay inputs precisely. Being able to record the use of a library is also very useful.
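A sketch of what such a recording wrapper can look like (all names are hypothetical): the rest of the program only ever reads the non-deterministic value through this function, so a recorded run can be replayed exactly.

#include <stdio.h>

static FILE *record_file = NULL; /* opened in record or playback mode at startup */
static int playback = 0;

double time_get(void)
{
    double t;

    if (playback)
    {
        fread(&t, sizeof t, 1, record_file); /* replay the recorded time */
    }
    else
    {
        t = platform_time_get(); /* hypothetical real time source */
        if (record_file != NULL)
            fwrite(&t, sizeof t, 1, record_file); /* record it for later playback */
    }
    return t;
}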
Good code tells you when something goes wrong. If you build a module with an interface, give it a debug mode where all inputs are validated and any error results in termination with a descriptive message. Yes, terminate the execution, to force the developer to engage with the issue. Kind errors aren't taken seriously. Building a separate debug mode lets you afford this kind of strict validation without burdening the release build.
Having a single debug mode is usually too coarse, since you rarely want to debug your entire codebase at once. It makes sense to create localized debug modes per file, per module, or even per thing you want to debug. If a debug mode prints out a lot of information, then having debug mode on for your entire application will make it very hard to find anything useful in the output. By nesting debug modes you can make sure your debug modes are properly turned off in release mode:
#ifdef MAIN_DEBUG_MODE
#define MODULE_SPECIFIC_DEBUG_MODE /* module debug modes only exist inside the main debug mode */
#endif

#ifdef MODULE_SPECIFIC_DEBUG_MODE
... /* debug code */
#endif
Systems designed to debug your code should make programming easier, not harder. Unfortunately many approaches to reducing bugs create a lot of friction for developers. Developers are required to over-document, write and run excessive tests, address benign warnings and sanitizer findings, file reports and so on. When things become harder to change, bugs also become harder to fix. As noted earlier, the most precious resource any programmer has is motivation, and it is very important to protect the motivation of the developer and make sure they feel like they can get their work done. While we would all like to write bug-free code, we can't, so we should balance our bug prevention against our ability to quickly address the bugs that are found. Unintuitively, users often perceive buggy software where bug reports are quickly addressed as preferable to less buggy software where issues aren't addressed. People want to feel heard and to feel that their concerns are taken seriously. Not all bugs matter, and your users will let you know which bugs to focus on.
When systems are too rigorous, people start trying to work around them. If you have a rule that says that any module that sees any change has to be re-certified, then sooner or later developers will start writing code where the main objective is no longer to write good, robust code, but to touch as few modules as possible. A lot of the time, fear of bugs creates bugs. Developers avoid touching things for fear of breaking them, and contort themselves in order not to have to engage with systems perceived as fragile. If a system is perceived as fragile, then maybe it is time to consider why, and how that may be addressed.
In the game development world, I know of a few absolutely game-breaking bugs that were found and fixed in a matter of minutes, but where the patches then had to be delayed for weeks for certification. When you have a known show-stopping bug, why worry about the possibility of an unknown one? If you have a process for releasing software (you should), make sure you include an escape hatch where critical fixes can be released without delay. Preferably these fixes should be based on a previously verified release and only include the change that fixes the issue.
Some programmers argue that you should write no code that produces any warnings (or that you should turn on treat-warnings-as-errors). I argue this is the wrong way to look at tools. Tools should help you write and understand your code, not dictate what you should do. Most issues with code are instances where you think the code does something different from what it actually does, and therefore I think the tools that help you understand your own code are the most important ones. Instead of thinking that using the right text editor will make you a great programmer, take the time to learn a proper visual debugger.
While I am hugely in favour of tools that ask you "Is this right?", I think it is equally important that you, as the programmer, have the ability to say "Yes, this is what I meant". The common practice of treating warnings as errors is, in my opinion, counterproductive. Consider this code:
for (i = 0; i < 10; i++);
    array[i] = 0;
This is valid code. However, judging by the general structure of the code, one might suspect that the semicolon at the end of the first line shouldn't be there. A good tool would warn the user about this semicolon and say: "Hey, I noticed that you put a semicolon in a place where it might have been put by mistake, maybe you should have a look at it?". It is not bad code, so the tool shouldn't force you to make any changes to it, but it should draw attention to what is a possible issue. If you treat this warning as an error, two things happen. First, it forces you to rewrite code to try to trick the compiler into doing what you want while avoiding a minefield of warnings. Second, it forces the compiler writer to only emit warnings for things where they have high certainty that the issue they discover actually is an issue. This means they give the programmer a lot less feedback on things that are probably right but might be an issue.
Various programming languages have proposed various schemes for producing code that can generate code, such as templates and generics.
When you are stuck on a bug, don't just think your way out of the problem, program your way out of it. Start writing verification and testing code. It keeps you busy and engaged, and often it reveals the issue you are looking for even before you complete the testing code.
While test code is useful in many ways, simply writing tests for everything tends to be a waste of time. Most code is written to be used right away by the person writing the code, and therefore the code that needs the new code is in itself a test. Not only is it a test, it is a good one, since it tests the code under the conditions it is meant to be used in. To write an additional test we should therefore be much more discerning about the code we write. Most tests will not reveal anything interesting, and a straightforward test does not delve deep enough to find the harder issues. Before we write tests, let's consider what we want a test to accomplish. These are the main reasons why one might consider writing a test:
I find that the most valuable test code you can write is not code that returns a fail or success, but code that reveals what is going on. Finding out what your code does is always valuable; a binary test only tests your code against one or a limited set of issues. If your code fails, it's usually not because it fails at something you anticipated could fail. The real problems are the corner cases you never considered.
The idea that some people put forward is that if you make tests that run automatically as part of your development process, you can guarantee that nothing breaks. In my opinion, the moment code is touched, it may be broken. No automation can stop this.
Let's say that you are writing an algorithm to down- and up-case strings. You write a simple test that takes the string "hello world", up-cases it, and prints it out. If the output says "HELLO WORLD" all is well. Or is it? The Turkish "I", for instance, cannot simply be down- and up-cased in Unicode.
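A minimal sketch of such a test (the helper up_case_ascii is made up for this example, and it only handles plain ASCII):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* hypothetical helper: naive byte-by-byte up-casing, correct only for ASCII */
void up_case_ascii(char *s)
{
    for (; *s != 0; s++)
        *s = (char)toupper((unsigned char)*s);
}

int main(void)
{
    char buf[32];
    strcpy(buf, "hello world");
    up_case_ascii(buf);
    printf("%s\n", buf); /* prints "HELLO WORLD", so the test "passes" */
    return 0;
}

The test passes, yet it says nothing about the corner cases: feed it UTF-8 Turkish text and the dotless and dotted I are never even touched by a byte-by-byte toupper().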
When code is working, there is no need to verify that the program is working, because the program itself is the verification. But if it's not working, start writing code that verifies it as soon as you are stuck. It's a common fallacy to think that writing verification code will take time, whereas finding the bug will be a quick moment of clarity. When people ask me how long it will take to find a bug, I usually say it takes as long as finding your lost keys. They may be in the first place you look, or they may never be found, forcing you to change the locks. If you write verification code when you are stuck, you are systematically moving towards the bug by eliminating possible problem areas. Again, typing is easy. I have never found that I have wasted time writing debug code.
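As a sketch of what I mean by verification code (the Node type and its fields are made up for this example): a small routine that walks a data structure, checks its invariants, and tells you exactly where they break, rather than just returning pass or fail.

#include <stdio.h>

/* hypothetical doubly linked list node, used only for this example */
typedef struct Node {
    struct Node *prev;
    struct Node *next;
    int value;
} Node;

/* walk the list and report exactly where an invariant breaks */
int list_verify(Node *head)
{
    Node *n;
    unsigned int i = 0;
    for (n = head; n != NULL; n = n->next, i++)
    {
        if (n->next == n)
        {
            printf("list_verify: node %u (value %d) points to itself\n", i, n->value);
            return 0;
        }
        if (n->next != NULL && n->next->prev != n)
        {
            printf("list_verify: node %u (value %d): next->prev does not point back\n", i, n->value);
            return 0;
        }
    }
    printf("list_verify: %u nodes OK\n", i);
    return 1;
}

int main(void)
{
    Node a = {NULL, NULL, 1}, b = {NULL, NULL, 2};
    a.next = &b;
    b.prev = &a;
    list_verify(&a); /* reports 2 nodes OK */
    b.prev = NULL;   /* simulate the kind of bug we are hunting */
    list_verify(&a); /* reports exactly which link is broken */
    return 0;
}

Code like this is rarely wasted: once it exists, you can sprinkle calls to it around the area you suspect and watch the bug narrow itself down.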
Let's say you want to travel 1000 kilometers between two cities. To do this you might use a car. You may invite a friend to join you. The added cost of a passenger, in terms of fuel, maintenance and the use of existing infrastructure, is negligible. Most regular cars can fit four, but if you want to bring even more people, a minivan that seats 8 is an inexpensive upgrade. If you really want to bring a lot of friends you may need a bus. A bus is considerably more expensive than a minivan, but per person it is still cheaper, since it can easily fit 50+ people. If that's not enough, a train may be an option. A train car can fit 150 people, and the number of train cars in a train is mostly limited by the length of the platform. If you get Japan Rail to run your train line, you can run one train every 45 seconds on one track. With a ten-car train that moves 120,000 people per hour. If that's not enough, just add more tracks. You could easily have 10 parallel tracks, or why not 100 if you feel like moving the entire population of New York every hour. That's well over a million times more people than where we started.
The 1000 kilometer journey would take about 10 hours. But let's say we want to go faster. We can buy a Porsche and, barring any intervention from law enforcement, cut that time in half, but it will be expensive. If you are counting dollars per km/h, a Porsche is not a good investment compared to a moped. If we again want to cut our travel time in half, things start to get really expensive. The fastest production car at the time of writing is a Koenigsegg, and it tops out at around 400 km/h, but it can easily cost you 10x more than the average Porsche. After this, it starts getting really difficult. A Japanese experimental maglev train will only buy you another 100 km/h, so that's out. You could go for a one-off land speed record car, but none of them have the range needed to complete the journey. You pretty much have to go airborne. Your average passenger plane can easily hit 800 km/h, but with take-off, landing, and taxiing it will be close. If you want to cut your travel time in half again, you need some good connections with your local air force. A fighter jet can get you close to Mach 2, and with a little help from an afterburner and an ejector seat, you can shave valuable minutes off your journey. If this is still on the slow side, you will need some truly epic connections with the Smithsonian Air and Space Museum, to get them to dust off their SR-71 Blackbird for you. Unfortunately it will only get you to Mach 3.5, and with the required space suit and air refueling, it may end up being a bit of a disappointment in the area of practicality. An SR-71 is expensive at $100,000,000 a pop in 1972 dollars, but at least they are reusable. To go faster we need to enter the field of rocketry; this is for the traveler for whom money is no object. The level of complexity, cost and engineering needed to do this starts to push against the limits of human capability. On the subject of human capability, somewhere around this point your body starts to be a problem. Around 10G our bodies start breaking down; we lose consciousness, crack ribs and so on. Even if you can get around the squishy-human problem, eventually you will run into the speed-of-light problem. I'm not saying these issues are unsolvable, but they probably involve a couple of trips to Stockholm to pick up a few Nobel Prizes for changing our understanding of the fabric of reality.
What is the point of this exercise? It illustrates a simple rule: bandwidth is easy and latency is hard. We were able to easily increase the bandwidth of our journey 1,000,000x using 100+ year old technology, where each doubling of bandwidth cut the per-person cost significantly. At the same time we have great difficulty reducing our latency by just 50x, and we are forced to deploy the most advanced technology ever devised, at a punishing cost curve where every km/h is far more expensive than the last one.
Why does this matter for software engineering? It tells us that reducing the time it takes to return a search query by half is harder than responding to twice as many queries. In fact, if your queries return twice as fast, it's likely you can handle more queries too. It tells us that it is far easier to compute than it is to synchronize computation. It's far easier to optimize a network for bandwidth than it is to optimize it for latency.
Knowing this helps you estimate how hard something will be, but it also tells us to be careful about prioritizing bandwidth over latency. A bad design with too much latency is far harder to fix than a bad design with too little bandwidth.
Bandwidth tends to solve itself over time. To increase bandwidth, you can always just do more of what you are already doing, in parallel. Add more chips, wider buses, more lanes, more cores, and bandwidth will go up. Latency, however, is much harder to solve for. It requires optimization, careful timing and synchronization, and splitting up problems into smaller problems that can be solved in parallel.
A computer from the 80s could access memory in a single cycle. Today a memory access is lucky to take less than 10 cycles, even if it happens to hit the level-one cache. Latency on a modern computer is still much lower in absolute terms, because it runs at a clock frequency a thousand times higher, but relative to bandwidth (and compute), latency is becoming slower and slower as computers evolve. Latency in a computer is heavily limited by the speed of light. At one gigahertz, light travels just 300 millimeters per cycle. These are hard limitations that you can't get around, so be careful about wasting this precious resource.
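A small sketch of the difference in code (the pointer-chasing illustration is my own, not something from the text above): summing an array is a bandwidth problem, you can split it across as many cores as you like; chasing a chain of pointers is a latency problem, where every access has to wait for the previous one and no amount of extra hardware helps.

#include <stddef.h>

/* bandwidth-bound: independent accesses, trivially split across cores or SIMD lanes */
long sum_array(const long *data, size_t count)
{
    long sum = 0;
    size_t i;
    for (i = 0; i < count; i++)
        sum += data[i];
    return sum;
}

/* latency-bound: each step depends on the previous one, so every access
   pays the full memory latency and cannot be overlapped or parallelized */
typedef struct Link { struct Link *next; } Link;

size_t chase(const Link *node)
{
    size_t steps = 0;
    while (node != NULL)
    {
        node = node->next;
        steps++;
    }
    return steps;
}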
Android is a truly terrible software development platform. I often have trouble understanding it, because I can't imagine anyone would design a system so poorly. Here are some baffling discoveries:
Installing a development environment and connecting it to a device is a mess of broken links, poor integration and company turf wars. I managed to get a working environment on the third computer I did a clean install on.
You pretty much can't write clean C/C++ applications for Android. You need to use JNI to access lots of basic things. JNI is ugly, not thread-safe, and different for C and C++. Can't Google hire an engineer to wrap their APIs? Why do I have to do it? Choosing to build an OS around a language is stupid; choosing Java is worse. Low-powered devices should not run code in virtual machines. Android is a layered architecture where they skipped writing the foundation layers.
The project manifest file seems to be the dumping ground for all kinds of functionality not accessible programmatically, like program name, orientation handling, icon, permissions... It should be possible to write a program using only one syntax.
If your application explicitly sets flags to run in fullscreen, on a device that has a higher aspect ratio than 16:9 you still get black bars. To fix this you need to add a line to your manifest file that explains to Android that you want fullscreen when you say fullscreen, even if the screen is wide.
Using the on-screen keyboard does not generate key events, except for space, period, return and a few others. No letters. No numbers. Either an on-screen keyboard emits key events or it edits a string; WTF is this?
You can't publish your app with files. If you want to add a file that you want to open with fopen, you need to include it in your package, then use a custom API to enumerate the contents, use a Java API to find a place where you can read/write files, and extract the files to disk. Google designers: "By storing it in an APK file everything gets zipped and uses less storage!" Reality: you now need to store your files twice, taking double the space. Bravo!
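To make the complaint concrete, here is roughly what that dance looks like in C with the NDK asset API (a sketch; extract_asset and its error handling are mine, and you still need the Java/activity side to hand you the AAssetManager and a writable path such as the internal data directory):

#include <stdio.h>
#include <android/asset_manager.h>

/* copy a packaged asset out of the APK to a real file, so that plain fopen()
   can be used on it afterwards */
int extract_asset(AAssetManager *manager, const char *asset_name, const char *out_path)
{
    AAsset *asset;
    FILE *out;
    char buffer[4096];
    int read;

    asset = AAssetManager_open(manager, asset_name, AASSET_MODE_STREAMING);
    if (asset == NULL)
        return 0;
    out = fopen(out_path, "wb");
    if (out == NULL)
    {
        AAsset_close(asset);
        return 0;
    }
    while ((read = AAsset_read(asset, buffer, sizeof buffer)) > 0)
        fwrite(buffer, 1, (size_t)read, out);
    fclose(out);
    AAsset_close(asset);
    return 1; /* the copy at out_path now behaves like a normal file, at the cost of storing it twice */
}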
When you can get away with it, you can access the underlying Linux core and then things are OK, but which parts of Linux are available is a complete crap-shoot.
Many parameters for functions are entirely without documentation. There is even documentation admitting that some functions were added to the API by accident. Microsoft's Visual Studio documentation of Android is better and more complete than Google's. Imagine how bad your documentation is when your competitor documents your software better than you do.
I would be surprised if there are more than 10 Android apps written from scratch. The documentation is so bad that everyone just copies code they find. Everywhere I search I see the same variable names and "I don't know what it does, but this code works for me". There are entire core concepts of the platform, like "state save", that no one seems to know what they actually do.
If I was put in charge of Android, I doubt 80% of the team would still have jobs by the end of the first week. This is a staggering level of incompetence for a software development team. Give me 50 developers and I would build a platform that would crush Android.
PS: If any Apple fanboys want to use the above rant as proof of why iOS is better, you should be careful not to tempt me into putting into words my feelings about iOS, because you may not like what I have to say.
I really want secure software, but I've had it with security people and their misguided orthodoxy going unquestioned in the software community.
Security people always think that security is the most important aspect of computers. It NEVER EVER IS. Computers exist to get things done. If security were more important, then here is a hot tip: don't plug the computer into either a power socket or a network port. But most people still do that, because most people think that the benefits of running computers are more important, and are willing to take risks when it comes to security in order to do things like make society run. Security people always act as if they are the only ones able to judge what constitutes good software, and they do it from the most narrow of perspectives. Real software developers have LOADS of things to consider; security is only one of them, and many times security doesn't matter.
Software development is a balance between reducing bugs and producing useful functionality for the user. It's always a balancing act. Despite what some people think, software can never be proven to work. You can sometimes prove that it reflects a specification, but you can't prove that there aren't errors in the specification, or that the intention behind the specification isn't flawed to begin with. There is no way to write software without bugs; all we can do is make the right prioritizations to minimize bugs. Screaming and crying about how the world isn't perfect is childish and stands in the way of actually making things better.
This utter failure to understand the reality of software development is very evident when listening to the process advice given by security professionals. Essentially, security advice boils down to this: put so many checks on the process that making any progress becomes so cumbersome that no one will ever do anything. I know of many examples of hacks, and of much-needed rewrites that never happened, stemming from developers doing anything in their power to avoid onerous security reviews and lengthy recertifications. Writing secure software is as much about keeping the software maintained, and keeping the developers motivated to maintain it, as it is about employing reasonable security practices.
I have worked in game development for many years, and I can confidently tell anyone out there that if you are making a new multiplayer game, spending much of your time on preventing hacks and cheats is a complete waste. The vast majority of games fail to find an audience, so focus all your energy on trying to make a game that anyone cares enough about to even consider hacking or cheating in. Only take security seriously once it is a problem where there is some kind of return on investment in engaging with it.
As a C programmer and member of WG14, for once I would like to see a security person consider that if all the most trusted software, like Linux, OpenSSL, Apache, Python and curl, is written in C, then perhaps, just maybe, the people who wrote it aren't complete idiots. And perhaps, just maybe, C has some property that, beyond the obvious shortcomings, makes it the most successful language to write secure code in. Maybe it would be worthwhile trying to figure out what that is, if security researchers were interested in researching reality. But security people don't live in the real world. Case in point: the Linux kernel has over 2000 filed, unaddressed security vulnerabilities. You may think that says something bad about Linux, but if Linux really had 2000 exploitable vulnerabilities then no one would run Linux, and whoever was brave enough to do so would be hacked instantly. Obviously that's not true. In the real world Linux is very secure. What it really says is that the security community has dreamed up 2000 bullshit issues that don't matter in the real world.
The survivor bias around buffer overruns is overwhelming. Speaking of which, buffer overruns are not bugs. Buffer overruns are the symptoms of bugs. And by the way, buffer overruns are not seriously hard bugs. ABA bugs, time-travel bugs, UB-elimination bugs, lockless bugs, and aliasing bugs are hard; the people who find those bugs deserve our adoration.
Security researchers review code, and as such they only see the bugs that are left behind, not the ones caught during development. They see a bug in a design, but they don't see the bugs that a design avoids. As such they want "vulnerability mitigation", trying to somehow make bugs more graceful. Software developers, on the other hand, know that the bugs that get found and fixed are the ones that fail hard and fast, not the ones that almost work and don't make much noise. So while the "mitigation strategies" proposed by security people make some bugs less serious, they often result in more bugs, not fewer.
Security research is a subsection of QA. Their job is to find bugs. That's it. They are not secret agents, or cool hackers from some bad TV show. Finding bugs is good, but it's a job, not a superhero identity. Finding a security bug does not make you cooler than finding any other QA bug. I find many bugs every day, and when I do, I fix them and move on. What I don't do is come up with a cool name for them, register a domain, and then give talks at conferences about it. If you find a security bug in some important software, that's good, but you do not deserve to be more famous than the people who wrote the software to begin with. It's the same bullshit that makes anyone with a badge think they are the lead in an action movie where they are the only ones who can save humanity. You are not cooler than a nurse, teacher, firefighter or someone else who is much more likely to actually save lives. Security people don't know how to be software developers better than software developers do; if they did, they would be software developers, so let's stop pretending they are.
All this wouldn't be so bad if it wasn't for the fact that we are living in a security hellscape, and we desperately need better security.
No, I'm not talking about buffer overruns. I'm talking about the fact that every person is walking around with a tracking device owned and operated by surveillance capitalists whose very existence depends on stealing our data.
Much of today's software stack depends on hundreds of source packages that are without review, any one of which could be compromised. Most of these are protected only by two-factor authentication, the two factors being a password and a phone, where one can be reset with the other, and the second one can be defeated with a SIM swap that is widely available for purchase on the dark web.
Large service providers routinely lock people and companies out of their accounts, data and livelihoods, without any recourse. This happens without explanation, for fear of revealing their methods, on the advice of security people. The same security people who preach public security vulnerability disclosure when it's someone else's software being affected. It's like corporatized ransomware, except that at least real ransomware hackers offer human tech support and the ability to buy our data back once they have stolen it. Windows Update, which is meant to protect me from viruses, has lost me far more data than any hacker has.
Security professionals worrying about the potential hacking of voting machines somewhat misses the point, when the entire electorate has been hacked with disinformation to begin with.
Close to 10 years after Edward Snowden courageously revealed what many of us suspected, that the US and many other governments engage in widespread hacking and surveillance of the internet, and that all major service providers have given them the back door keys to your data, the security community has not been able to present any feasible alternative to a future where we are perpetually beholden to the whims of a few mega corporations who, without any transparency, bend to any government no matter how reprehensible. But that's just the thing: security people don't make things better, they don't create, they just complain about the people who do.
They complain, and people listen and obey without questioning their orthodoxy. Whoever convinced the world that every computer, without exception, needed to be slowed down to mitigate transient-execution CPU attacks is responsible for billions of dollars of increased power consumption and untold tonnage of CO2 released into the atmosphere, and should be considered on par with the Exxon Valdez or Deepwater Horizon disasters.
The threat modeling of the security community sucks so badly that one has to wonder if it's a deliberate tactic to keep us unsafe. Maybe they keep bringing up ridiculous fantasy exploits like Rowhammer, and an attacker's ability to get hold of our credentials by submerging our memory chips in liquid nitrogen, to make us forget that in the real world, dominated by the corporations that employ a lot of the security community, we are all pwned by default.